1
The low energy tautomers and conformers of the dipeptides HisGly and GlyHis and of their sodium ion complexes in the gas phase

2
The low-lying conformers of the dipeptides HisGly and GlyHis, and of their sodium cation complexes, have been studied with a combination of Monte Carlo search with the Amber force field and local geometry optimization at the ab initio HF/6-31G(d) level, completed with MP2(full)/6-311+G(2d,2p) energetics at the HF/6-31G(d) geometries.

3
For each dipeptide, both the Nδ–H and Nε–H tautomers of the imidazole side chain of His were considered.

4
For each of the four isomeric dipeptides, 20–30 conformers were fully characterized at the ab initio level.

5
All low energy structures are found to involve H-bonding at the Nδ position of imidazole, either as an N–H donor or an N acceptor, depending upon the tautomer.

6
In three out of the four species, the most stable conformer involves a C-terminus carboxylic acid in its less favorable trans conformation, in order to maximize intramolecular H bonding.

7
It turns out that the lowest energy tautomer of HisGly is Nε–H, while that of GlyHis is Nδ–H.

8
This result argues in favor of the diversity of His tautomeric states in peptides and proteins.

9
The sodium cation complexes of both GlyHis and HisGly have been studied as well, again considering both tautomers in each case.

10
In three out of the four species, the most stable structure involves chelation of sodium by the two carbonyl oxygens and the imidazole ring.

11
In contrast, the sodium complex of the Nδ–H tautomer of HisGly favors chelation to the peptidic carbonyl oxygen, the imidazole ring and the amino terminus.

12
In the Nε–H tautomers of both peptides, the most favorable binding site of imidazole is the Nδ nitrogen, while in the Nδ–H tautomers, it is the π cloud which provides side chain interaction.

13
As a result, both GlyHisNa+ and HisGlyNa+ favor the Nε–H tautomer of His, in contrast to what was found for the free peptides.

Introduction

14
Among all naturally occurring α-amino acids in living systems, histidine (His) occupies a special position.

15
The side chain of His, methylene-imidazole, bears two nitrogen atoms with variable protonation states.

16
Their pK’s are such that imidazole may easily exchange protons with its surroundings in the biologically significant range of pH, effectively supporting many chemical reactions.

17
Because of this versatility, histidine has been found to be involved in ca. 50% of the active sites of all enzymes, in a recent analysis of the enzyme structural database.1

18
The Lewis basicity of His also makes it a common ligand in the first metal coordination sphere of metallo-proteins.

19
It is even considered that coordinated His may exist in its fully deprotonated form, at least as a transient species.2

20
Therefore, the protonation state of His in peptides and proteins is a matter of great interest.

21
In the pH range 6.3–9.0 in aqueous solution at room temperature, isolated His exists as a zwitterion with a non-protonated side chain imidazole.

22
In this pH range and above, the imidazole exists in either of two tautomeric forms, Nδ–H and Nε–H.

23
Scheme 1 shows the two Nδ–H and Nε–H tautomers in the non-zwitterionic form.

24
Protonation of either form at the imino nitrogen leads to an imidazolium ion from which proton release may occur from either of the two nitrogens, leading to each of the two tautomers.

25
In His-containing peptides and proteins, it is therefore expected that the occurrence of the two tautomers is strongly influenced by the presence of H bond donors and acceptors in the surroundings, and by the chemical reactions which occur.

26
Determining the tautomer populations in proteins usually relies on NMR measurements.

27
One-dimensional measurements on various nuclei suffer from several drawbacks, although 15N spectra have been shown to yield unambiguous information on His tautomers in 15N-enriched proteins, and in isolated His at various pH and temperatures, in several solvents.3

28
More recently, several techniques have been devised to identify the protonation state and tautomeric forms of His, including N–C J couplings1,4 and 2D 1H/13Cδ correlations.5

29
It remains the case, however, that the protonation and tautomeric states of most His residues in most crystallographic structures in protein databases are not unequivocally assigned.

30
In such cases, hydrogen locations are assigned, either in the structure determination and refinement procedure, or in post-structural analysis, using tools whose reliability is not known in general.

31
A recent computational and statistical study6 has indeed issued a warning signal about the significance of proton positions in His residues in protein databases, whenever specific NMR data is not available.

32
This study, based on local energy minimization of His side chain torsions using the Amber force field, showed that proton assignments appear to be no better than random.

33
Thus it is currently difficult to extract from structural databases whether there is a strong preference for one of the tautomers of His in proteins, even with the restriction to the interior of proteins, where the solvent is not expected to have a strong influence.

34
One result of the computational study is that there appears to be no general preference for one tautomer over the other.

35
In order to understand in detail the factors determining the relative stabilities of His tautomers, it is therefore of interest to study model systems.

36
The situation is simple in isolated His, for which 15N NMR, as well as other techniques, have established that the Nε–H tautomer is largely dominant.3

37
This dominance has been discussed in terms of the intramolecular hydrogen bond which may establish between the Nδ and one or two of the N–H bond(s) of the ammonium group (note that in the following, we use the notation Nδ and Nε, with δ and ε as subscripts, for the imidazole nitrogen atom at the δ and ε position, respectively, so that Nδ–H and Nε–H denote the nitrogen–hydrogen bonds at these positions.

38
On the other hand, Nδ–H and Nε–H, with δ and ε as superscripts, are used to describe the tautomers with the N–H bond at the δ and ε position, respectively).

39
The same type of argument was found to explain the structure of the dipeptide HisGly, for which several crystal structures have been determined.7

40
In the crystals of both the chloride7a and dichloride7b salts of HisGly (with one and two chloride ions, and singly and doubly protonated HisGly, respectively), there appears to be a strong hydrogen bond between the ammonium group and the imine Nδ of imidazole.

41
The crystal structure of the hemihydrate of HisGly has also been obtained,7c in which the His side chain is not protonated; the structural parameters are consistent with the existence of a zwitterion, involving a N-terminus ammonium and a C-terminus carboxylate.

42
Rather surprisingly, it was found that the crystal structure contains two peptide molecules per asymmetric crystal unit, in the two different tautomeric forms.

43
One is the expected Nε–H tautomer while the second is the Nδ–H tautomer, involving H-bonding between Nδ–H and the C terminal carboxylate.

44
Thus it appears that HisGly is a simple model in which the coexistence and possible competition between the two tautomers of His already exist.

45
For this reason, we have explored the conformational landscape of HisGly.

46
In order to delineate in detail the interactions which influence the relative stabilities of the tautomers, we have also studied its dipeptide isomer GlyHis, for which there is no structural information available, to the best of our knowledge.

47
Clearly such simple models lack the side chain–side chain interactions in which His imidazoles are often involved in proteins.

48
Yet local interactions within small dipeptides are found to have a significant influence on the relative stabilities of the tautomers.

49
We have also studied the sodium cation complexes of HisGly and GlyHis, since in a recent study of the Na+ complexes of a series of small peptides, we found that there exists a significant sequence effect on the binding affinities of HisGly and GlyHis to Na+.8

50
Sodium is known to bind peptides mostly through their oxygen atoms; however, in GlyHis and HisGly, it is expected that the imidazole side chain can provide an additional binding site, in a way which will depend significantly upon the tautomer.

51
In the present context, it is interesting to see if cation attachment may be a way to differentiate tautomers.

52
The detailed understanding of the conformation landscape of small biomolecules is a challenging task, both because there is generally a large to very large number of low-lying energy minima, and because energy barriers connecting these minima are often small, leading to several conformers being populated at room temperature.

53
Various experimental methods are capable of yielding some structural data on such molecules, yet it remains very difficult to identify the lowest energy conformers with certainty, and to obtain an overview of the potential energy surface.

54
Quantum chemistry is an efficient alternative to experimental methods to explore the conformations of flexible molecules of relatively small size, as it provides relative energies of conformers with good accuracy, and enables a thorough exploration of the potential energy surface.

55
There was ample work on the conformations of Gly in the 1990s,9 and other amino acids have also been studied, including alanine,10 serine, cysteine,11 valine,12 proline,13 glutamic acid14 and arginine.15

56
Several oligopeptides have also been studied.16

57
Herein we use ab initio computations to establish the structures of the low energy conformations of the dipeptides GlyHis and HisGly in each of their possible tautomeric forms.

58
Analogous work has also been carried out for the sodium cation complexes of the four isomers.

Computational methods

59
The number of weakly hindered rotors (seven covalent single bonds between non-hydrogen atoms) is large enough in GlyHis and HisGly that constructing structures based solely on chemical intuition is not appropriate.

60
Thus we resorted to a non-local exploration of the potential energy surface, as a preliminary step prior to local geometry optimization with the ab initio methods mentioned below.

61
Monte Carlo sampling was carried out with the Metropolis criterion at 300 K, in which random values were generated for the torsion angles around all single bonds between non-hydrogen atoms except the peptide bond.

62
In order to keep this procedure computationally tractable, the Amber 94 force field was used for energy calculations, with RESP atomic charges.17

63
At least two independent searches were performed for each case.

64
A limit of 1000 random tries or 500 geometry optimizations was set for each search.

65
The first search was started from a β-strand like extended structure, while the second was started from the lowest energy structure found in the first, ensuring that the two would be largely different.

66
Random sampling was systematically followed by local geometry optimization at the same level.
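The Metropolis acceptance step at the heart of such a torsional search can be sketched as follows. This is a minimal illustration only, not the HyperChem implementation: the function names, the toy threefold torsional potential standing in for the Amber energy, and the choice of redrawing one torsion per trial are all assumptions, and the local geometry optimization that follows sampling is omitted.

```python
import math
import random

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def metropolis_accept(delta_e_kj, temperature=300.0, rng=random):
    """Metropolis criterion: always accept downhill moves, accept uphill
    moves with probability exp(-dE / RT)."""
    if delta_e_kj <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e_kj / (R_KJ * temperature))


def toy_energy(torsions_deg):
    """Hypothetical stand-in for a force-field energy (kJ/mol): a threefold
    torsional term per rotor, with minima at 60, 180 and -60 degrees."""
    return sum(6.0 * (1.0 + math.cos(math.radians(3.0 * t))) for t in torsions_deg)


def mc_search(n_torsions=7, n_tries=1000, temperature=300.0, seed=0):
    """Random torsional search; returns accepted structures sorted by energy."""
    rng = random.Random(seed)
    current = [180.0] * n_torsions  # extended starting structure
    e_current = toy_energy(current)
    accepted = [(e_current, tuple(current))]
    for _ in range(n_tries):
        trial = list(current)
        # Redraw one randomly chosen torsion angle (assumed move type).
        trial[rng.randrange(n_torsions)] = rng.uniform(-180.0, 180.0)
        e_trial = toy_energy(trial)
        if metropolis_accept(e_trial - e_current, temperature, rng):
            current, e_current = trial, e_trial
            accepted.append((e_current, tuple(current)))
    return sorted(accepted)
```

The defaults mirror the values quoted above (seven rotors, 300 K, 1000 random tries); in an actual search each accepted structure would then be optimized locally with the force field.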

67
However, the Amber calculations lead to an energy ordering of conformers which deviates significantly from that obtained with accurate ab initio calculations.

68
In order to locate all of the low-lying structures, a large sample of the Amber results, typically between 20 and 30 structures in each case (each tautomeric form of GlyHis and HisGly), was selected and subjected to geometry re-optimization at the ab initio level.

69
In one case, the Nε–H tautomer of HisGly, we first searched the potential energy surface without a Monte Carlo search, but rather with a combination of scanning the relevant torsion angles and selecting structures on the basis of maximizing hydrogen bond interaction.

70
Then we carried out an MC search starting from one of the low energy structures previously found.

71
As a result, structures ranked 5, 10 and 1 at the optimized Amber level were inserted among those previously obtained, becoming the structures ranked 2, 6 and 18, respectively, after ab initio re-optimization and final energy calculations.

72
Yet this procedure is too lengthy to be used in general; therefore we resorted to initial MC searches for all three other cases.

73
In all cases, the lowest 15 structures at the ab initio level arose from structures in the lower half of the Amber set, which contained from 30 to 40 unique structures, depending upon the case.

74
The experience gained from the Nε–H tautomer of HisGly led us to inspect all of the low energy structures finally obtained at the ab initio level, and change some torsions when it appeared that a related structure might be of lower energy based on the criterion of enhancement of the hydrogen bond network.

75
We believe it is this combination of random and hydrogen bonding searches that may lead to an efficient determination of most, if not all, of the low energy conformers.

76
Yet the present work does not aim at providing an exhaustive description of these very complex potential energy surfaces.

77
Ab initio calculations were carried out at levels which represent a compromise between accuracy and tractability for consideration of a significant number of structures.

78
Geometries were optimized at the HF/6-31G(d) level, and vibrational analyses were carried out at the same level to determine zero-point vibrational energies, thermal corrections to total energies, and entropies.

79
Final energetics were determined at the MP2(full)/6-311+G(2d,2p) level using the HF/6-31G(d) geometries, a level of computation which has been previously shown to yield accurate energetics.18

80
All results mentioned below are computed at this MP2(full)/6-311+G(2d,2p)//HF/6-31G(d) level unless otherwise noted.

81
Relative energies computed at the HF and MP2 levels were usually found to be in satisfactory agreement with each other (differing by less than 10 kJ mol−1, and often by less than 5 kJ mol−1).

82
It turned out, however, that in some cases the differences were as large as 10–25 kJ mol−1.

83
A careful inspection showed that such large differences occur when the conformers being compared have the C-terminal carboxylic acid in different conformations (cis vs. trans).

84
Test calculations on acetic acid itself indicate that the cis-trans relative energy is 7 kJ mol−1 lower at the MP2(full)/6-311+G(2d,2p)//HF/6-31G(d) level, compared to the HF/6-31G(d) result.

85
Moreover, low energy structures always bear a trans carboxylic acid that is H-bond donating, and this H bond is not very accurately described at the HF level.

86
Since the energy differences between the best structures of the two tautomers turned out to be very small for both dipeptides, additional calculations were carried out.

87
The geometries of the most stable conformers were optimized at the MP2/6-31G(d) level, and final energetics were recomputed at the MP2/6-311+G(2d,2p)//MP2/6-31G(d) and MP2/aug-cc-pVTZ(−f)//MP2/6-31G(d) levels.

88
In the latter, “(−f)” indicates that the most diffuse f functions on C, N and O, and the most diffuse d functions on H, have been dropped from the regular aug-cc-pVTZ set. As described in the text, these more accurate levels lead to relative energies in reasonable agreement with the MP2(full)/6-311+G(2d,2p)//HF/6-31G(d) values.

89
In particular, there was no change in the energetic ordering.

90
Monte Carlo calculations were carried out with HyperChem 6.0,19 while the Gaussian03 package was used for ab initio calculations.20

Results and discussion

Conformer descriptors

91
Several different ways are conceivable for the description of conformers, for each of the tautomers of HisGly and GlyHis.

92
This is due to the presence of seven weakly hindered rotors in each species: the Cα–C(O) and Cα–N of the main chain of each residue, the Cα–Cβ and Cβ–Cring of the His side chain, and the C–OH bond of the C-terminus acid.

93
Describing each of the conformers by the value of the torsion angle around each of these bonds would be rather tedious, although it would carry all of the information.
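For reference, the torsion angle around any one of these bonds can be computed from the Cartesian positions of four consecutive atoms. The sketch below is a generic, pure-Python illustration (the function and helper names are assumptions, not part of the original work); it returns a signed angle in degrees, with 0° for a cis (eclipsed) arrangement and ±180° for trans.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def torsion_angle(p0, p1, p2, p3):
    """Signed torsion angle (degrees) around the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane (p0, p1, p2)
    n2 = _cross(b2, b3)  # normal to the plane (p1, p2, p3)
    b2_norm = math.sqrt(_dot(b2, b2))
    y = b2_norm * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Seven such angles per conformer would carry the full conformational information, which is what makes this description complete but tedious.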

94
We have chosen to define families using a hierarchy of structural criteria (descriptors), of which several are non local.

95
The first descriptor is the existence of a hydrogen bond between an atom of the main chain and the Nδ or Nδ–H of the imidazole ring.

96
All of the low energy conformers determined in this work bear such a hydrogen bond; for instance for the Nε–H tautomer of GlyHis, the lowest lying conformer found without a hydrogen bond at Nδ is higher in energy than the most stable by 18 kJ mol−1.

97
We denote the cyclic motif formed by such H bonds as “Cn”, meaning that the H bond generates an n-membered cycle.

98
The various types of Cn structures, C6–C10, are depicted schematically in Scheme 2.

99
The arrows in Scheme 2 are oriented from the hydrogen bond donor to the acceptor.
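The ring size n of a Cn motif can be obtained by counting the atoms on the shortest covalent path from the donor hydrogen to the acceptor atom, both included. The following sketch illustrates this with a breadth-first search over a bond list; the PDB-style atom labels and the reduced His fragment are assumptions made for illustration only.

```python
from collections import deque

# Heavy-atom bonds of a His residue fragment (PDB-style labels, assumed here).
HIS_BONDS = [
    ("N", "CA"), ("CA", "C"), ("C", "O"),
    ("CA", "CB"), ("CB", "CG"),
    ("CG", "ND1"), ("ND1", "CE1"), ("CE1", "NE2"),
    ("NE2", "CD2"), ("CD2", "CG"),
]


def ring_size(bonds, donor_h, acceptor):
    """Number of atoms in the cycle closed by a donor_h...acceptor H bond."""
    graph = {}
    for a, b in bonds:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    n_atoms = {donor_h: 1}  # atoms counted along the path, donor H included
    queue = deque([donor_h])
    while queue:
        atom = queue.popleft()
        if atom == acceptor:
            return n_atoms[atom]
        for neighbour in graph.get(atom, ()):
            if neighbour not in n_atoms:
                n_atoms[neighbour] = n_atoms[atom] + 1
                queue.append(neighbour)
    raise ValueError("donor and acceptor are not covalently connected")


# Ndelta-H tautomer: Ndelta-H donates to the His carbonyl oxygen.
c7 = ring_size(HIS_BONDS + [("ND1", "HD1")], "HD1", "O")
# Nepsilon-H tautomer: main chain N-H of His donates to Ndelta.
c6 = ring_size(HIS_BONDS + [("N", "H")], "H", "ND1")
```

With this fragment the two examples correspond to the C7 and C6 motifs, respectively.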

100
For instance in the Nδ–H tautomers of HisGly and GlyHis, the Nδ–H bond can be a H-donor towards the carbonyl oxygen of His, generating a C7 motif.

101
C7 motifs also exist for the Nε–H tautomers, in which the Nδ atom is a H-bond acceptor from an N–H bond of Gly in HisGly or the C-terminus OH bond in GlyHis.

102
On the other hand, in none of the conformers is the Nε or Nε–H oriented in such a way as to engage in a H bond of any Cn type, because the dipeptide chain is too short.

103
The second conformation descriptor is the conformation of the Cn ring.

104
The types of ring conformations encountered in low energy structures are presented in more detail in Fig. 1.

105
C6 rings connect the Nδ position to the main chain nitrogen of His, which is the peptidic nitrogen in GlyHis and the N terminus in HisGly.

106
In both Nε–H tautomers, there are two possible conformations, chair and half-chair (see Fig. 1).

107
Note that the half-chair may be inverted, leading to a different energy since steric repulsions with the rest of the molecule are different.

108
In the Nδ–H tautomer of HisGly, a chair conformation is formed by H bonding from the Nδ–H to the amino terminus.

109
For GlyHis, the H bond would point to the peptidic nitrogen, but the rigidity of the peptide linkage precludes formation of the C6 ring in this case.

110
Other Cn rings may be formed in several isomers, but the C7 is the only one which occurs in all.

111
The various C7 possibilities have been introduced above.

112
We find two conformations for the C7, half-chair-like and boat-like.

113
Hydrogen bonding is favored when the N–H⋯OC segment is nearly planar in both types of conformation, since it allows the N–H bond to point approximately towards an oxygen lone pair, and the two bond dipoles to be oriented favourably for electrostatic stabilization.

114
This restricts significantly the flexibility of the C7 motif.

115
As shown in Fig. 1, formation of a C8 ring is possible only for the Nδ–H tautomer of GlyHis, connecting the Nδ–H bond to the carbonyl oxygen of Gly.

116
We found two conformations for this C8 which may be loosely defined as half-chair-like and boat-like.

117
As for C8, a C9 ring is compatible with one isomer only; in this case it is the Nε–H tautomer of GlyHis, in which the H bond connects Nδ to one of the N–H bonds of the amino terminus.

118
The C9 ring is associated in some cases with a C6 ring, when the main chain is oriented in a way which also permits interaction of Nδ with the peptidic N–H bond.

119
Finally, C10 rings exist in both tautomers of HisGly.

120
In the Nδ–H tautomer it is due to a H bond between the Nδ–H bond and the oxygens at the C terminus, while in the Nε–H tautomer, the H bond is between the C terminus O–H bond of the trans conformation of carboxylic acid and Nδ.

121
In both cases, we find two conformers, which are deduced from each other by a 180° rotation around the Cα–N bond of Gly (see Fig. 1).

122
This leads to opposite orientations of the OC–N–H peptidic group with respect to the mean plane of the C10 cycle.

123
The third descriptor is the relative orientation of the peptidic OC–N–H plane with respect to that of the C terminal carboxylic acid.

124
There are two rough relative orientations: either coplanar, defining a fragment of a β sheet, or perpendicular.

125
Finally, the fourth descriptor is the relative position of imidazole and of the peptidic plane.

126
“Open” conformers correspond to extended structures, in which the Gly residue and imidazole are distant, i.e. they cannot interact via H bonding.

127
“Closed” conformers are those in which such H bonds can be established, between the Nδ position of His and an atom of either the peptidic bond or of the C terminus.

128
Altogether, each existing combination of these four descriptors defines a “family”.

129
We denote these families with capital Roman numerals, in order of decreasing stability.

130
For instance for the Nδ–H tautomer of GlyHis, the lowest energy conformer belongs to family “I”, which corresponds to the existence of a C7 ring, in a boat-like conformation, in a “closed” relative orientation of Gly and imidazole, and a perpendicular relative orientation of the peptidic and carboxylic planes.
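The idea that a family is simply a unique combination of the four descriptors can be expressed as a small data structure. The sketch below is illustrative only: the class and field names are assumptions, the entries for family I of the Nδ–H tautomer of GlyHis follow the description above, and the second entry is a purely hypothetical placeholder added to show the grouping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyDescriptors:
    ring: str               # first descriptor: Cn motif closed by the H bond at N-delta
    ring_conformation: str  # second descriptor: e.g. "boat-like", "half-chair-like"
    planes: str             # third descriptor: "coplanar" or "perpendicular"
    arrangement: str        # fourth descriptor: "open" or "closed"


def group_by_family(conformers):
    """Group conformer labels by their descriptor combination."""
    families = {}
    for label, descriptors in conformers:
        families.setdefault(descriptors, []).append(label)
    return families


# Family I of the N-delta-H tautomer of GlyHis, as described in the text.
family_i = FamilyDescriptors("C7", "boat-like", "perpendicular", "closed")
# Hypothetical descriptor set, for illustration only (not taken from the paper).
placeholder = FamilyDescriptors("C8", "boat-like", "coplanar", "open")

families = group_by_family([
    ("I-1-ad", family_i),
    ("I-2-ad", family_i),
    ("hypothetical-conformer", placeholder),
])
```

Because the dataclass is frozen, it is hashable and can serve directly as a dictionary key, so conformers that differ only in the orientation of their termini fall into the same family.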

131
Within each family, there remain conformational differences.

132
These are specified by the orientation of the terminal NH2 and COOH groups.

133
The various possibilities encountered in low energy structures are gathered in Fig. 2, together with the numbering used hereafter.

134
For instance in conformations “1” the NH2 terminus is a H bond acceptor towards the peptidic N–H, while in “2” it is a H bond donor towards the peptidic oxygen, etc.

135
In conformations “a” the carboxylic C terminus is in its trans conformation and it is a H bond donor towards the peptidic oxygen, while in “b” it is cis, and a H bond acceptor from the peptidic N–H.

136
There is some redundancy between these notations and the specification of the Cn rings, since, e.g., conformations 5 are C9 rings (see Fig. 2).

137
For all isomers, the most stable conformer of each family is depicted in Figs. 3–6.

138
In order to illustrate the conformational flexibility offered by the NH2 and COOH termini, the six most stable conformations of the most stable family of the Nδ–H tautomer of GlyHis are shown in Fig. 7.

Low energy conformers of the Nδ–H tautomer of GlyHis

139
The relative energies of all conformers, grouped in five families according to the descriptors defined above, are gathered in the left part of Table 1.

140
The structures of the most stable conformers for each of the first five families are shown in Fig. 3, while a series of conformers of family I are shown in Fig. 7.

141
The best structure overall is I-1-ad, which bears a C7 ring with a H bond from the Nδ–H to the His carbonyl oxygen.

142
In order to maximize H bonding, the carboxyl group is trans, enabling H bond donation from its O–H bond to the peptidic oxygen (which happens to be a C7 ring of another type, not used herein for structure specification).

143
The so-defined orientation of the peptide linkage is such that its N–H bond may interact with the π cloud of imidazole.

144
At the same time, the NH2 terminus is oriented in such a way as to be a H bond acceptor from the peptidic N–H.

145
This structure is particularly stable since all heteroatoms are engaged in H bonding, except for the Nε (for which it is structurally impossible).

146
The second most stable conformer is I-2-ad (see Fig. 7), 10 kJ mol−1 less stable than I-1-ad.

147
The only difference between I-1-ad and I-2-ad is the orientation of the CH2NH2 at the N-terminus.

148
In I-1-ad, the amino terminus is a H bond acceptor from the peptidic N–H, while in I-2-ad it is a double H bond donor to the peptidic oxygen.

149
In all cases studied herein, we find the same energy ordering between these two orientations of the N terminus.

150
The next conformer of the same family (the fourth most stable overall) is I-3-ad, which differs from I-2-ad by the orientation of the amino terminus, which interacts with the peptidic oxygen via a single N–H bond rather than both.

151
The fact that this single N–H bond is better oriented towards the peptidic oxygen than either of the N–H bonds in I-2-ad does not compensate completely for the loss of one interaction, leading to a destabilization of 3 kJ mol−1 (i.e. 13 kJ mol−1 less stable than I-1-ad).

152
Other conformers of the same family have the carboxyl group in the cis conformation.

153
Although it is intrinsically more stable than the trans (by 22 kJ mol−1 in acetic acid), it does not allow simultaneous H bonding to the carbonyl oxygen on the one hand, and from the O–H bond on the other.

154
The most stable of such conformers is I-1-d, which differs from I-1-ad only by the orientation of the OH group.

155
It lies 17 kJ mol−1 higher in energy.

156
Other conformers add to this another less favourable interaction relative to I-1-ad, such as the orientation of the amino terminus, or the hydroxyl oxygen instead of the more basic carbonyl oxygen as a H bond acceptor (see Fig. 7), and are more than 20 kJ mol−1 less stable than I-1-ad.

157
We now turn to the lowest energy conformers of other families.

158
The most stable structure in family II, II-1-b, shown in Fig. 3, bears a C8 ring.

159
Here it is the peptide carbonyl oxygen, rather than that of the carboxyl terminus, which interacts with the Nδ–H bond of imidazole.

160
The peptide linkage is oriented in such a way as to allow H bond donation from the peptidic N–H to both the amino terminus and the carbonyl oxygen of the C terminus.

161
It is 12 kJ mol−1 less stable than I-1-ad.

162
The most stable structure in family III, III-1-b, has a C8 ring analogous to that in family II, but in a boat-like conformation.

163
The peptide linkage has the same orientation as II-1-b, and thus forms the same H bonds involving the amidic hydrogen.

164
The most stable structure in family IV, IV-1-bd, has a C7 ring involving the C-terminus carbonyl oxygen as in family I; however, the ring has a half-chair-like conformation, while it is boat-like in family I. The two families also differ by the relative orientations of the peptidic and C-terminus planes (third descriptor above), perpendicular for family I and parallel for family IV.

165
This parallel orientation is common with family II, and it allows again the peptidic N–H bond to interact with both termini.

166
This leaves the peptidic carbonyl oxygen without a H bond, so that this structure is less stable than I-1-ad by 14 kJ mol−1.

167
Family V has a C7 ring of the same type as that of family I, but they differ by their conformation: half-chair-like in I, and boat-like in V. With a cis carboxyl group leaving the OH bond without H bonding, the most stable structure of family V, V-1-d, lies 16 kJ mol−1 above I-1-ad.

168
Some additional, generally less stable, conformers of this species may be found in Table 1.
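To give a feeling for what these energy gaps mean at room temperature, relative energies can be converted into approximate Boltzmann populations. The sketch below uses the relative energies quoted in this section for seven conformers; it is a rough illustration under stated assumptions: the energies are treated as if they were free energies, all other conformers are ignored, and the temperature of 298.15 K and the function name are choices made here, not taken from the original work.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def boltzmann_populations(rel_energies_kj, temperature=298.15):
    """Normalized Boltzmann weights from relative energies in kJ/mol."""
    weights = {
        label: math.exp(-energy / (R_KJ * temperature))
        for label, energy in rel_energies_kj.items()
    }
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Relative energies (kJ/mol) quoted in the text for the N-delta-H tautomer of GlyHis.
GLYHIS_ND_H = {
    "I-1-ad": 0.0,
    "I-2-ad": 10.0,
    "II-1-b": 12.0,
    "I-3-ad": 13.0,
    "IV-1-bd": 14.0,
    "V-1-d": 16.0,
    "I-1-d": 17.0,
}

populations = boltzmann_populations(GLYHIS_ND_H)
```

Under these assumptions I-1-ad strongly dominates the population, since every other conformer lies at least 10 kJ mol−1 higher; entropic contributions, which the vibrational analyses provide, could shift these estimates.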

Low energy conformers of the Nε–H tautomer of GlyHis

169
The relative energies of all conformers are gathered in the right part of Table 1.

170
The most stable conformers for each of the first seven families are shown in Fig. 4.

171
As for the most stable conformer of the Nδ–H tautomer, the most stable structure (I-1-jb) involves a boat-like C7 ring, which is closed in this case by a O–H⋯N bond from the C terminus to the imidazole Nδ.

172
For this to occur, the carboxyl group is in its trans conformation, another feature that is common to both tautomers.

173
The main chain has parallel peptide and carboxyl groups, and has glycine and imidazole in an open arrangement.

174
This allows the peptidic N–H to interact with both the C-terminus carbonyl oxygen and with the amino terminus.

175
On the other hand, the open structure precludes any interaction of the peptide carbonyl, which may explain why this conformer is less stable than the best Nδ–H conformer (see Table 1; at the MP2/6-311+G(2d,2p)//MP2/6-31G* level, this difference is 6.5 kJ mol−1).

176
The second most stable conformer (I-2-jb, not shown) resembles the first, the only difference being that the amino terminus is a H bond donor towards the peptidic carbonyl, rather than an acceptor from the peptidic N–H.

177
Thus the lower portions of the conformational spaces of the Nδ–H and Nε–H tautomers are fairly similar.

178
The third most stable structure is also the first of family II (II-1,5-b, see Fig. 4).

179
It has a C9 ring connecting the amino terminus to the imidazole Nδ, and in addition, the relative orientations of the peptide linkage and of the carboxyl group, and of the main chain and the imidazole, enable the formation of a C6 ring between Nδ and the peptidic N–H.

180
This structure is 7 kJ mol−1 higher in energy than I-1-jb.

181
A structure analogous to that of II-1,5-b, with a C9 ring and planar main chain, is found with IV-1,5-b (the seventh most stable conformer in Table 1).

182
However, the C9 conformation is different, precluding the formation of a C6 ring.

183
This conformer is only 3 kJ mol−1 less stable than II-1,5-b.

184
It is likely that the loss of a H bond is partly compensated by a smaller strain, permitting better relative orientations of the amino terminus and the peptidic N–H for H bonding.

185
A slightly more stable structure is the best conformer of family III, III-1-f.

186
Its favourable features are a C6 ring and a H bond from the peptidic N–H to the amino terminus.

187
However, none of its carboxyl oxygens can engage into H bonds, which is why this structure is 8 kJ mol−1 higher in energy than I-1-jb.

188
Another structure shown in Fig. 4 is V-1-b, which has a stable, planar main chain skeleton, but which lacks a H bond to the imidazole Nδ.

189
This leads to a high energy, 18 kJ mol−1 above I-1-jb.

190
Other structures in Fig. 4, the most stable of families VI and VII, are of even higher energies.

191
There are also conformers of intermediate energies, in the 12–20 kJ mol−1 range above I-1-jb, which are not described in detail here, but for which the specification of descriptors in Table 1 should be reasonably explicit.

Low energy conformers of the Nδ–H tautomer of HisGly

192
The relative energies of all conformers found are gathered in the left part of Table 2.

193
The structures of the most stable conformers for each of the eight families identified are shown in Fig. 5.

194
The best conformer, I-1-h, has a boat-like C7 ring which involves the peptidic carbonyl, as compared to the carboxyl carbonyl in GlyHis.

195
In addition, the perpendicular orientations of the peptidic linkage and the carboxyl terminus enable the latter to interact with the Nδ–H of imidazole, forming a C10 ring.

196
The favourable orientation of the amino terminus towards the peptidic N–H, already described previously, also permits some interaction of one of its N–H bonds with the π cloud of imidazole.

197
As seen in Table 2, this is the most stable conformer of this tautomer; however, it is slightly less stable than the best conformer of the Nε–H tautomer.

198
The most stable conformer of family II, II-1-b (see Fig. 5), differs from I-1-h by the orientation of the Gly backbone, with the terminal carbonyl interacting with the peptidic N–H rather than with Nδ–H.

199
This is a weaker interaction, essentially a dipole–dipole interaction of the CO and N–H bonds with antiparallel dipoles, rather than a H bond.

200
As a consequence, it is less stable than I-1-h by 5 kJ mol−1.

201
The next two conformers (not shown) are higher congeners of family I, differing from I-1-h only by the orientation of the COOH group.

202
They are less stable than I-1-h by 6–8 kJ mol−1.

203
The next most stable structure is the first of family III (III-4-h, see Fig. 5).

204
It has a C10 ring as does I-1-h, but with a different conformation (of type a for III-4-h), in which the C7 ring no longer exists.

205
Although the interaction of the amino terminus with the peptidic N–H is maintained, together with that of one amino N–H with the π cloud of imidazole, the lack of the C7 ring leads to a destabilization of 12 kJ mol−1.

206
Structure IV-1-h, the next higher conformer, presents a b-type C10 ring, no C7 ring, and has no amino N–H-imidazole stabilization.

207
It is about 1 kJ mol−1 less stable than III-4-h.

208
Several conformers belonging to the III, IV and I families follow with increasing energies, and the first members of families V–VIII (shown in Fig. 5) are all at least 20 kJ mol−1 less stable than I-1-h.

Low energy conformers of the Nε–H tautomer of HisGly

209
The relative energies of all conformers found are gathered in the right part of Table 2.

210
The structures of the most stable conformers for each of the seven families identified are shown in Fig. 6.

211
The best structure, I-1-a, involves a C6 ring in a half-chair conformation (as do the next two most stable, I-1-f and I-1-b).

212
The amino terminus is both a H bond donor, to Nδ, and an acceptor, from the peptidic N–H.

213
As discussed above, the latter interaction is common in low energy structures.

214
The carboxyl terminus adopts a trans conformation to act as a H bond donor to the peptidic carbonyl.

215
Its other typical conformation, cis, is adopted in I-1-f, which is 7 kJ mol−1 less stable.

216
Thus here again, the most stable structure involves a trans carboxylic acid.

217
The first member of family II, II-1-i, is the fourth most stable overall.

218
Its ring is a C10, and it is now the carboxyl terminus in its trans conformation which behaves as a H bond donor towards the Nδ.

219
Again the amino terminus can accept H bonding from the peptidic N–H; however, both carbonyl oxygens are left without significant interaction, leading to an energy 10 kJ mol−1 higher than that of I-1-a.

220
While the C7 ring was clearly the most favorable for the three previous isomers, in this case the only possibility to form a C7 is to bind the peptidic N–H bond to the Nδ, which is incompatible with H bond donation of the peptidic N–H to the amino terminus.

221
Not only is the C7 not the best ring in this case, but the most stable conformer bearing a C7, III-2-b, is found to lie 15 kJ mol−1 higher than I-1-a.

222
The first member of family IV, IV-3-b, has a conformation very close to that of III-2-b.

223
They differ only by the C7 conformation (half-chair-like in IV-3-b).

224
Another C7 occurs in family V, V-2-a, with the same C7 conformation as in IV-3.

225
The difference consists in the perpendicular orientation of the acid function with respect to the peptidic plane (third descriptor).

226
This orientation allows the carboxyl group in trans conformation to make a H bond to the peptidic carbonyl, instead of being cis and binding electrostatically with the peptidic N–H (as in IV-3-b).

227
As a balance of these changes, the energies of these three conformers III-2-b, IV-3-b and V-2-a are very close and differ only by 3 kJ mol−1.

Zwitterions

228
Amino acids are known to have zwitterionic structures in aqueous solution in the biologically significant range of pH (ca. 6–9).

229
In the gas phase, the positive and negative charges cannot be as efficiently stabilized as they are in solution, and the zwitterions are expected to be much less stable.

230
In fact, all available evidence points to structures without formal charges in the gas phase.

231
Thus we expect that for dipeptides such as HisGly and GlyHis, zwitterions are not the most stable isomers either.

232
Yet it was deemed necessary to check this issue computationally.

233
The number of stable conformers for the zwitterions of the tautomers of HisGly and GlyHis is expected to be significantly smaller than for the “neutral” structures described above.

234
Preliminary calculations indeed showed that many structures lead to collapse to non-zwitterions.

235
As expected, the most favorable structures involve direct interaction between one of the carboxylate oxygens and one of the ammonium N–H bonds.

236
For such motifs to be stable against proton transfer from the ammonium to the carboxylate, at least one of the charges must be stabilized by a second hydrogen bond, for instance O− by the imidazole Nδ–H, or N+–H by the imidazole Nδ.

237
In all cases, the zwitterions were found to be much less stable than many conformers without formal charges.

238
The smallest difference found between the best zwitterion and the best “neutral” conformer is for the Nε–H tautomer of HisGly, where it is 89 kJ mol−1.

239
This zwitterion happens to have a structure that is fairly similar to that reported for the crystal structure of the hemihydrate of HisGly.7c

240
Since all zwitterions are so high in energy, they are not described further here for the sake of brevity.

General trends

241
All of the low energy structures determined in this work (in a range of ca. 20 kJ mol−1) bear a Cn ring.

242
Moreover, this ring is a C7 in the most stable conformers of three out of the four isomers.

243
In these three cases, the ring conformation is boat-like.

244
Clearly whenever the skeleton is compatible with such a motif, it is highly favorable.

245
A feature that is common to all four lowest energy conformers is the N-terminus conformation, in which the amino group is a H bond acceptor from the peptidic N–H bond.

246
Finally, in three out of the four cases, maximizing the strength of hydrogen bonding leads to a trans carboxyl group being present in the most stable conformer, even though it is intrinsically much less stable than the cis conformer.

247
As expected, though, the specificities of the various isomers are such that not all of the descriptors can take similar values in all of the most stable conformers.

248
The relative orientations of the peptide linkage and the carboxyl group, the relative orientations of the Gly skeleton and the imidazole ring, and the cis or trans conformation of the carboxyl group, lead to structures that are significantly different, in order to optimize the network of hydrogen bonds.

249
Comparison of the energies of the most stable conformers of the Nδ–H and Nε–H tautomers indicates that the most favorable tautomer of GlyHis is the Nδ–H, while that of HisGly is the Nε–H.

250
The energy differences are 4 and 1 kJ mol−1 for GlyHis and HisGly, respectively, at the MP2/6-311+G(2d,2p)//HF/6-31G* level, including ZPE correction.

251
Such values are too small to draw a safe conclusion regarding the lowest energy tautomers; therefore, they were recomputed with more accurate geometries and final energetics.

252
At the MP2/6-311+G(2d,2p)//MP2/6-31G* level, the energy differences between the tautomers are found to be slightly increased, to 5 kJ mol−1 for both GlyHis and HisGly.

253
At the MP2/aug-cc-pVTZ(-f)//MP2/6-31G* level, they increase again, and amount to 7 kJ mol−1 for GlyHis and 8 kJ mol−1 for HisGly.

254
These results leave little doubt that the most stable tautomer is Nδ–H for GlyHis and Nε–H for HisGly.
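To give a feel for what tautomer gaps of this size mean, the sketch below converts the energy differences quoted above into two-state Boltzmann fractions. It is only a rough illustration under assumptions that are not part of the original study: the ZPE-corrected energy differences are treated as if they were free energy differences, each tautomer is represented by its lowest conformer alone, and the temperature (298.15 K) and the function name `minor_fraction` are chosen here for illustration.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ mol^-1 K^-1


def minor_fraction(delta_e_kj_mol, temperature_k=298.15):
    """Two-state Boltzmann fraction of the higher-energy tautomer.

    Assumption (not from the paper): the energy gap is used in place of a
    free energy gap, and only one conformer per tautomer is counted.
    """
    ratio = math.exp(-delta_e_kj_mol / (R_KJ_PER_MOL_K * temperature_k))
    return ratio / (1.0 + ratio)


# Tautomer energy gaps quoted in the text, in kJ mol^-1: (GlyHis, HisGly)
gaps = {
    "MP2/6-311+G(2d,2p)//HF/6-31G*": (4.0, 1.0),
    "MP2/6-311+G(2d,2p)//MP2/6-31G*": (5.0, 5.0),
    "MP2/aug-cc-pVTZ(-f)//MP2/6-31G*": (7.0, 8.0),
}

for level, (glyhis, hisgly) in gaps.items():
    print(
        f"{level}: minor tautomer fraction "
        f"GlyHis {minor_fraction(glyhis):.3f}, HisGly {minor_fraction(hisgly):.3f}"
    )
```

Under these simplifications, a gap of a few kJ mol−1 still leaves a non-negligible fraction of the minor tautomer at room temperature, which is why the gaps were worth recomputing at higher levels.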

Sodium complexes

255
The low energy conformation spaces for the sodium complexes of the four isomers are expected to be significantly restricted as compared to those of the free peptides, since previous studies21 have shown that multidentate binding of Na+ is a strong stability factor in its complexes with amino acids and oligopeptides.

256
For GlyGly, it was shown that the most stable structures involve ion interaction with both carbonyl oxygens, differing by the interaction, or lack thereof, with the amino terminus.22

257
In GlyHis and HisGly, our results above suggest that for the Nε–H tautomers, the imidazole Nδ nitrogen may be an additional chelation site.

258
It may also be anticipated that interaction of sodium with part or all of the imidazole π electrons will introduce an alternative binding capability, open to both types of tautomers.

259
Because the number of low-energy structures is much smaller for sodium complexes than for free peptides, we do not introduce descriptors, and mostly describe the structures in terms of the sodium chelation sites and the intramolecular H bonding they involve.

Low energy structures of the Nδ–H tautomer of GlyHisNa+

260
The principles summarized above are illustrated by the relative energies of the most stable structures (see Table 3 and Fig. 8): the most favorable chelation of sodium is to both carbonyl oxygens and the imidazole ring.

261
Since strong binding to Nε is sterically impossible, the ion interacts with the π cloud of imidazole.

262
In both I and II, Na+ is bound to both carbonyl oxygens, and sits above the plane of imidazole, with closer interaction with one of the CN bonds.

263
The two structures differ by the orientation of the NH2 terminus: it interacts with the peptidic N–H in I, while it is a H-bond acceptor from the imidazole Nδ–H in II.

264
The distances from sodium to its chelation partners are almost the same in both cases.

265
I is more stable than II by 12 kJ mol−1.

266
A third structure of the same type, III, differs from I by the terminal acid which is cis in I and trans in III, with the O–H bond pointing towards the peptidic nitrogen.

267
Since the latter is not a particularly strong H bond acceptor, III is less stable than I by 20 kJ mol−1.

268
A fourth structure, V, is also bound to sodium via the same sites as I, II and III.

269
It now has the NH2 terminus interacting with the peptidic CO.

270
As seen above for the conformations of the free peptides, this is less favorable for NH2 than interacting with the peptidic N–H.

271
V is less stable than I by 28 kJ mol−1, and is essentially degenerate with structure IV, in which the peptidic CO is H-bound to the Nδ–H of imidazole rather than to sodium as in the previous cases.

272
The remaining, less stable structures of this isomer (V, VI and VII) all have sodium bound to the two carbonyl oxygens only.

273
In all cases, the peptidic N–H bond interacts with both the NH2 terminus and imidazole in varying orientations.

274
These three structures have similar energies, 30–40 kJ mol−1 higher than that of I.

Low energy structures of the Nε–H tautomer of GlyHisNa+

275
Here, as for the Nδ–H tautomer above, the most stable structures involve chelation to both carbonyl oxygens (see Table 3 and Fig. 9).

276
However the most favorable binding mode of Na+ to imidazole is no longer to the π cloud, but rather to the Nδ nitrogen.

277
This tridentate binding is found in I, II, IV and VI.

278
In the lowest three of these structures, the NH2 terminus is in its most favorable conformation, interacting with the peptidic N–H.

279
I and II differ by the side chain conformation of His.

280
In I it is such that imidazole is closer to the C terminus, while it is closer to the main chain and peptidic carbonyl in II.

281
I and II are isoenergetic.

282
II and IV share a similar main chain conformation; however, in IV the C terminus acid is in its less favorable trans conformation.

283
This enables interaction of the acid O–H with the peptidic nitrogen, albeit not strongly enough to compensate for the acid destabilization.

284
A better compromise is obtained in III, where Na+ interacts with the carbonyl oxygens and the N terminus nitrogen.

285
Here the acid is trans as in IV, with the O–H now pointing towards the Nδ of imidazole.

286
This leads to favorable H bonding with an O–H⋯N bond angle of 164°.

287
Yet this structure is less stable than I by 14 kJ mol−1.

288
Another type of intramolecular H bond is found in V and VII, in which it is the peptidic N–H which binds to the Nδ.

289
These two structures have very similar backbones, with both carbonyls bound to Na+.

290
They mostly differ by the conformation of the N terminus, and it is the interaction of NH2 with Na+ (in V) rather than with the peptidic N–H (in VII) which is the most favourable.

291
Finally, VI differs from II by the NH2 conformation, and as seen several times before, H donation to the peptidic CO is less favourable than H acceptance from the peptidic N–H, by 25 kJ mol−1 in this case.

Low energy structures of the Nδ–H tautomer of HisGlyNa+

292
The results are summarized in Table 4 and Fig. 10.

293
As for the Nδ–H tautomer of GlyHisNa+, interaction of the ion with the Nε of imidazole is not possible; therefore, Na+ interacts with the π cloud.

294
However, in this case a significant difference may be seen: the most stable structure does not involve Na+ chelation to both carbonyls, but rather to the peptidic carbonyl and the NH2 terminus, with the ion sitting on top of the ring.

295
This arrangement allows for efficient H bond donation from the Nδ–H to the C terminal carbonyl oxygen, with a relatively short H⋯O distance of 2.03 Å.

296
More importantly, the N–H⋯O and H⋯OC angles are 160 and 143°, respectively.

297
The reason why the Nδ–H tautomer of HisGlyNa+ is unique is that this H bond involves a C10 cycle (see Scheme 2) with enough flexibility to allow for nearly optimum N–H⋯OC orientation.

298
In contrast, in the GlyHisNa+ Nδ–H tautomer, there is only a C7 or C8 possible between the Nδ–H and the terminal or peptidic CO, respectively (see, e.g., isomer IV of GlyHisNa+ in Fig. 8, with N–H⋯O and H⋯OC angles of 145 and 129°).
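The H bond geometries compared above (an H⋯O distance plus the N–H⋯O and H⋯OC angles) can be measured from any set of optimized Cartesian coordinates. The following is a minimal sketch of such a measurement; the helper name `angle_deg` and the coordinates are hypothetical placeholders chosen for illustration, not geometries from this work.

```python
import math


def angle_deg(a, b, c):
    """Return the angle a-b-c in degrees, with atom b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Hypothetical coordinates in angstroms (placeholders, NOT from the paper).
n_atom = (0.00, 0.00, 0.00)   # imidazole N(delta)
h_atom = (1.01, 0.00, 0.00)   # H bonded to N(delta)
o_atom = (2.95, 0.60, 0.00)   # carbonyl O acting as acceptor
c_atom = (3.60, 1.65, 0.00)   # carbonyl C bonded to that O

h_o_distance = math.dist(h_atom, o_atom)         # H...O distance
nho_angle = angle_deg(n_atom, h_atom, o_atom)    # N-H...O angle, vertex at H
hoc_angle = angle_deg(h_atom, o_atom, c_atom)    # H...O=C angle, vertex at O

print(f"H...O = {h_o_distance:.2f} A, N-H...O = {nho_angle:.0f} deg, "
      f"H...O=C = {hoc_angle:.0f} deg")
```

An N–H⋯O angle closer to 180° indicates a more linear, and generally more favorable, donor geometry, which is the sense in which the C10 cycle is described as allowing a nearly optimum orientation.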

299
In the Nε–H tautomers of GlyHisNa+ and HisGlyNa+, the Nε–H bond cannot form any significant interaction; the Nδ can indeed engage in H bonds (see, e.g., isomer III of GlyHisNa+ in Fig. 9), but this can only occur at the expense of imidazole no longer being a sodium ligand.

300
Therefore it is only in Nδ–H HisGlyNa+ that such imidazole N–H bonding can compensate for the loss of a carbonyl ligand to Na+.

301
All other structures have Na+ bound to the two carbonyl oxygens and to the π cloud of imidazole.

302
They differ by two main characteristics: (i) the conformation of the side chain of His is such that imidazole is closer either to the C terminus (as in II and IV) or to the peptidic chain (as in III and V), and (ii) the orientation of the amino terminus.

303
The latter is a H bond acceptor from the imidazole Nδ–H in II and III, from the Cε–H in IV and provides a fourth chelation site to Na+ in V. These four structures span an energy range of less than 20 kJ mol−1, while II is 6 kJ mol−1 less stable than I.

Low energy structures of the Nε–H tautomer of HisGlyNa+

304
The results are summarized in Table 4 and Fig. 11.

305
As with the Nε–H tautomer of GlyHisNa+, strong interaction of the ion with the Nδ of imidazole is possible, so that the most favorable chelation of sodium is to both carbonyl oxygens and the Nδ in I. Yet the next two structures in stability order, II and III, have Na+ bound to Nδ, the peptidic oxygen and the amino terminus, differing only by the conformation of the C terminus.

306
Both are very close in energy, 22 and 24 kJ mol−1 higher than I. Structures IV and VI have the same tridentate chelation as I; however, the conformation of the His side chain is now such that imidazole is closer to the peptidic chain, rather than to the C terminus as in I. As a consequence, the amino terminus is a H bond donor to the peptidic oxygen in IV and VI, rather than being an acceptor from the peptidic N–H as in I. These differences lead to IV and VI being less stable than I by 26 and 32 kJ mol−1, respectively.

307
Finally, Na+ is only bound to two sites in V and VII.

308
V is more stable than VII by 8 kJ mol−1, because its two ligands are the two carbonyl oxygens.

309
As seen for other cases previously, it has a trans acid which allows for hydrogen bonding, to the Nδ in this case.

310
Yet with only two chelation sites to Na+, it is less stable than structure I by 31 kJ mol−1.
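The relative energies quoted in this subsection can be collected and ranked to see how the number of sodium chelation sites tracks stability. The sketch below only uses values stated in the text; the energy of VII is derived here as 31 + 8 = 39 kJ mol−1 from the statement that V is 8 kJ mol−1 more stable than VII, and the dictionary layout and variable names are illustrative choices.

```python
# Relative energies (kJ mol^-1, relative to I) and number of Na+ chelation
# sites for the N(epsilon)-H tautomer of HisGlyNa+, as quoted in the text.
# VII is derived: V (31) is more stable than VII by 8, hence 39.
structures = {
    "I": (0, 3),     # both carbonyl O and N(delta)
    "II": (22, 3),   # N(delta), peptidic O, amino terminus
    "III": (24, 3),  # same sites as II, different C-terminus conformation
    "IV": (26, 3),   # same sites as I, different His side chain conformation
    "V": (31, 2),    # both carbonyl O only
    "VI": (32, 3),   # same sites as I
    "VII": (39, 2),  # two sites only
}

# Rank by relative energy, most stable first.
ranking = sorted(structures, key=lambda name: structures[name][0])

# Lowest relative energy found for each number of chelation sites.
best_by_sites = {}
for name, (energy, sites) in structures.items():
    if sites not in best_by_sites or energy < best_by_sites[sites][1]:
        best_by_sites[sites] = (name, energy)

print("Stability order:", ", ".join(ranking))
for sites in sorted(best_by_sites, reverse=True):
    name, energy = best_by_sites[sites]
    print(f"best structure with {sites} chelation sites: {name} ({energy} kJ mol^-1)")
```

The ranking reproduces the roman numeral ordering, and the gap between the best tridentate and the best bidentate structure is the 31 kJ mol−1 stated in the text.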

Conclusion

311
A comprehensive study of the low energy structures of both tautomers of GlyHis and HisGly, and their sodium complexes has been described.

312
Compared to what has been described previously in the literature for GlyGly, the side chain of His provides additional binding capability.

313
While the Nε position is too remote for interaction, the Nδ can engage in H bonding either as an acceptor or as a donor, depending upon the tautomer.

314
This capability has a strong influence on all low energy structures of the four peptides.

315
Maximizing the H-bonding also leads to the C-terminal carboxylic acid being in its trans conformation in three out of the four cases.

316
Overall, the most stable tautomer of gaseous GlyHis is found to be the Nδ–H, while that of HisGly is the Nε–H; however, the energy differences are small in both cases, less than 10 kJ mol−1 at all computational levels used.

317
Because of these small differences, the picture might be significantly changed in the condensed phase, especially in a protic solvent where intra- and inter-molecular hydrogen bonds might be in competition.

318
In the sodium complexes, the Nδ position of the imidazole side chain of His again has a strong influence on stability.

319
In the Nε–H tautomers, Nδ is an efficient sodium chelator or H bond acceptor, while in the Nδ–H tautomers, it may act as a H bond donor, with Na+ interacting with the π cloud of imidazole.

320
The stronger ion–molecule interactions in the Nε–H tautomers lead them to be the most stable in GlyHisNa+ and HisGlyNa+.

321
These results illustrate the role of environment (here, sequence effects and ion chelation) on the relative energies of His tautomers.

322
It is highly probable that environment effects in proteins are even stronger.

323
Tautomeric forms of His in X-ray diffraction structures should therefore be assigned with caution.