Among all naturally occurring α-amino acids in living systems, histidine (His) occupies a special position. The side chain of His, methylene-imidazole, bears two nitrogen atoms with variable protonation states. Their pK’s are such that imidazole may easily exchange protons with its surroundings in the biologically significant range of pH, effectively supporting many chemical reactions. Because of this versatility, histidine has been found to be involved in ca. 50% of the active sites of all enzymes, in a recent analysis of the enzyme structural database.[1] The Lewis basicity of His also makes it a common ligand in the first metal coordination sphere of metallo-proteins. It is even considered that coordinated His may exist in its fully deprotonated form, at least as a transient species.[2] Therefore, the protonation state of His in peptides and proteins is a matter of great interest.
In the pH range 6.3–9.0 in aqueous solution at room temperature, isolated His exists as a zwitterion with a non-protonated side chain imidazole. In this and higher pH range, the imidazole is in either of two tautomeric forms, Nδ–H and Nε–H. Scheme 1 shows the two Nδ–H and Nε–H tautomers in the non-zwitterionic form. Protonation of either form at the imino nitrogen leads to an imidazolium ion from which proton release may occur from either of the two nitrogens, leading to each of the two tautomers. In His-containing peptides and proteins, it is therefore expected that the occurrence of the two tautomers is strongly influenced by the presence of H bond donors and acceptors in the surroundings, and by the chemical reactions which occur. Determining the tautomer populations in proteins usually relies on NMR measurements. One-dimensional measurements on various nuclei suffer from various drawbacks, although 15N spectra have been shown to yield unambiguous information on His tautomers in 15N-enriched proteins, and in isolated His at various pH and temperatures, in several solvents.[3] More recently, several techniques have been devised to identify the protonation state and tautomeric forms of His, including N–C J couplings[1,4] and 2D 1H/13Cδ correlations.[5] It remains the case, however, that the protonation and tautomeric states of most His residues in most crystallographic structures in protein databases are not unequivocally assigned. In such cases, hydrogen locations are assigned, either in the structure determination and refinement procedure, or in post-structural analysis, using tools whose reliability is not known in general. A recent computational and statistical study[6] has indeed issued a warning signal about the significance of proton positions in His residues in protein databases, whenever specific NMR data is not available. 
This study, based on local energy minimization of His side chain torsions using the Amber force field, showed that proton assignments appear to be no better than random. Thus it is currently difficult to extract from structural databases whether there is a strong preference for one of the tautomers of His in proteins, even with the restriction to the interior of proteins, where the solvent is not expected to have a strong influence. One result of the computational study is that there appears to be no general preference for one tautomer over the other. In order to understand in detail the factors determining the relative stabilities of His tautomers, it is therefore of interest to study model systems.
The situation is simple in isolated His, for which 15N NMR, as well as other techniques, have established that the Nε–H tautomer is largely dominant.[3] This dominance has been discussed in terms of the intramolecular hydrogen bond which may establish between the Nδ and one or two of the N–H bond(s) of the ammonium group (note that in the following, we use the notation Nδ and Nε, with δ and ε as subscripts, for the imidazole nitrogen atom at the δ and ε position, respectively, so that Nδ–H and Nε–H denote the nitrogen–hydrogen bonds at these positions. On the other hand, Nδ–H and Nε–H, with δ and ε as superscripts, are used to describe the tautomers with the N–H bond at the δ and ε position, respectively). The same type of argument was found to explain the structure of the dipeptide HisGly, for which several crystal structures have been determined.[7] In the crystals of both the chloride[7a] and dichloride[7b] salts of HisGly (with one and two chloride ions, and singly and doubly protonated HisGly, respectively), there appears to be a strong hydrogen bond between the ammonium group and the imine Nδ of imidazole. The crystal structure of the hemihydrate of HisGly has also been obtained,[7c] in which the His side chain is not protonated; the structural parameters are consistent with the existence of a zwitterion, involving a N-terminus ammonium and a C-terminus carboxylate. Rather surprisingly, it was found that the crystal structure contains two peptide molecules per asymmetric crystal unit, in the two different tautomeric forms. One is the expected Nε–H tautomer while the second is the Nδ–H tautomer, involving H-bonding between Nδ–H and the C terminal carboxylate. Thus it appears that HisGly is a simple model in which the coexistence and possible competition between the two tautomers of His already exist.
For this reason, we have explored the conformational landscape of HisGly. In order to delineate in detail the interactions which influence the relative stabilities of the tautomers, we have also studied its dipeptide isomer GlyHis, for which there is no structural information available, to the best of our knowledge. Clearly such simple models lack the side chain–side chain interactions in which His imidazoles are often involved in proteins. Yet local interactions within small dipeptides are found to have a significant influence on the relative stabilities of the tautomers. We have also studied the sodium cation complexes of HisGly and GlyHis, since in a recent study of the Na+ complexes of a series of small peptides, we found that there exists a significant sequence effect on the binding affinities of HisGly and GlyHis to Na+.[8] Sodium is known to bind peptides mostly through their oxygen atoms; in GlyHis and HisGly, however, it is expected that the imidazole side chain can provide an additional binding site, in a way which will depend significantly upon the tautomer. In the present context, it is interesting to see if cation attachment may be a way to differentiate tautomers.
The detailed understanding of the conformation landscape of small biomolecules is a challenging task, both because there is generally a large to very large number of low-lying energy minima, and because energy barriers connecting these minima are often small, leading to several conformers being populated at room temperature. Various experimental methods are capable of yielding some structural data on such molecules, yet it remains very difficult to identify the lowest energy conformers with certainty, and to obtain an overview of the potential energy surface. Quantum chemistry is an efficient alternative to experimental methods to explore the conformations of flexible molecules of relatively small size, as it provides relative energies of conformers with good accuracy, and enables a thorough exploration of the potential energy surface. There has been ample work on the conformations of Gly in the 1990’s,[9] and other amino acids have also been studied, including alanine,[10] serine, cysteine,[11] valine,[12] proline,[13] glutamic acid[14] and arginine.[15] Several oligopeptides have also been studied.[16] Herein we use ab initio computations to establish the structures of the low energy conformations of the dipeptides GlyHis and HisGly in each of their possible tautomeric forms. Analogous work has also been carried out for the sodium cation complexes of the four isomers.
The number of weakly hindered rotors (seven covalent single bonds between non-hydrogen atoms) is large enough in GlyHis and HisGly that constructing structures solely on the basis of chemical intuition is not appropriate. Thus we resorted to a non-local exploration of the potential energy surface, as a preliminary step prior to local geometry optimization with the ab initio methods mentioned below. Monte Carlo sampling was carried out with the Metropolis criterion at 300 K, in which random values were generated for the torsion angles around all single bonds between non-hydrogen atoms except the peptide bond. In order to keep this procedure computationally tractable, the Amber 94 force field was used for energy calculations, with RESP atomic charges.[17] At least two independent searches were performed for each case. A limit of 1000 random tries or 500 geometry optimizations was set for each search. The first search was started from a β-strand like extended structure, while the second was started from the lowest energy structure found in the first, ensuring that the two would be largely different. Random sampling was systematically followed by local geometry optimization at the same level. However, the Amber calculations lead to an energy ordering of conformers which deviates significantly from that obtained with accurate ab initio calculations. In order to locate all of the low lying structures, a large sample of the Amber results (typically between 20 and 30 structures in each case, i.e. for each tautomeric form of GlyHis and HisGly) was selected and subjected to geometry re-optimization at the ab initio level.
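The sampling step described above can be illustrated with a minimal Python sketch. It is not the HyperChem procedure used in this work: the energy function `toy_energy` is a hypothetical placeholder for the Amber 94 energy, each move changes a single randomly chosen torsion (a simplification), and no local geometry optimization is performed. All function and parameter names are assumptions introduced for illustration.

```python
import math
import random


def toy_energy(torsions):
    """Placeholder energy in kJ/mol: one threefold cosine term per torsion.

    This stands in for a force-field energy and is NOT the Amber 94 function.
    """
    return sum(6.0 * (1.0 + math.cos(3.0 * math.radians(t))) for t in torsions)


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: always accept downhill moves, and accept
    uphill moves with probability exp(-delta_e / RT)."""
    if delta_e <= 0.0:
        return True
    gas_constant = 8.314462618e-3  # kJ mol^-1 K^-1
    return rng.random() < math.exp(-delta_e / (gas_constant * temperature))


def monte_carlo_search(n_torsions=7, max_tries=1000, temperature=300.0, seed=0):
    """Random torsion moves from an extended start.

    Returns the accepted (energy, torsions) pairs sorted by energy.
    """
    rng = random.Random(seed)
    current = [180.0] * n_torsions  # extended starting geometry (assumption)
    current_e = toy_energy(current)
    accepted = [(current_e, tuple(current))]
    for _ in range(max_tries):
        trial = list(current)
        # Assign a random value to one randomly chosen torsion angle.
        trial[rng.randrange(n_torsions)] = rng.uniform(-180.0, 180.0)
        trial_e = toy_energy(trial)
        if metropolis_accept(trial_e - current_e, temperature, rng):
            current, current_e = trial, trial_e
            accepted.append((current_e, tuple(current)))
    return sorted(accepted)


lowest_energy, lowest_torsions = monte_carlo_search()[0]
```

In a real search, each accepted geometry would then be optimized locally and duplicates removed before selecting structures for re-optimization at a higher level.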
In one case, the Nε–H tautomer of HisGly, we first searched the potential energy surface without a Monte Carlo search, but rather with a combination of scanning the relevant torsion angles and selecting structures on the basis of maximizing hydrogen bond interaction. Then we carried out an MC search starting from one of the low energy structures previously found. This led structures of ranks 5, 10 and 1 at the optimized Amber level to insert into those previously obtained, and become structures of ranks 2, 6 and 18, respectively, after ab initio re-optimization and final energy calculations. Yet this procedure is too lengthy to be used in general, therefore we resorted to initial MC searches for all three other cases. In all cases, the lowest 15 structures at the ab initio level arose from structures in the lower half of the Amber set, which contained from 30 to 40 unique structures, depending upon the case. The experience gained from the Nε–H tautomer of HisGly led us to inspect all of the low energy structures finally obtained at the ab initio level, and change some torsions when it appeared that a related structure might be of lower energy based on the criterion of enhancement of the hydrogen bond network. We believe it is this combination of random and hydrogen bonding searches that may lead to an efficient determination of most, if not all, of the low energy conformers. Yet the present work does not aim at providing an exhaustive description of these very complex potential energy surfaces.
Ab initio calculations were carried out at levels which represent a compromise between accuracy and tractability for consideration of a significant number of structures. Geometries were optimized at the HF/6-31G(d) level, and vibrational analyses were carried out at the same level to determine zero-point vibrational energies, thermal corrections to total energies, and entropies. Final energetics were determined at the MP2(full)/6-311+G(2d,2p) level using the HF/6-31G(d) geometries, a level of computation which has been previously shown to yield accurate energetics.[18] All results mentioned below are computed at this MP2(full)/6-311+G(2d,2p)//HF/6-31G(d) level unless otherwise noted. Relative energies computed at the HF and MP2 levels were usually found to be in satisfactory agreement with each other (differing by less than 10 kJ mol−1, and often by less than 5 kJ mol−1). It turned out, however, that in some cases the differences were as large as 10–25 kJ mol−1. A careful inspection showed that such large differences occur when the conformers being compared have the C-terminal carboxylic acid in different conformations (cis vs. trans). Test calculations on acetic acid itself indicate that the cis-trans relative energy is 7 kJ mol−1 lower at the MP2(full)/6-311+G(2d,2p)//HF/6-31G(d) level, compared to the HF/6-31G(d) result. Moreover, low energy structures always bear a trans carboxylic acid that is H-bond donating, and this H bond is not very accurately described at the HF level.
Since the energy differences between the best structures of the two tautomers turned out to be very small for both dipeptides, additional calculations were carried out. The geometries of the most stable conformers were optimized at the MP2/6-31G(d) level, and final energetics were recomputed at the MP2/6-311+G(2d,2p)//MP2/6-31G(d) and MP2/aug-cc-pVTZ(−f)//MP2/6-31G(d) levels. In the latter “(−f)” indicates that the most diffuse f functions on C, N and O, and the most diffuse d functions on H have been dropped from the regular aug-cc-pVTZ set. As described in the text, these more accurate levels lead to relative energies in reasonable agreement with the MP2(full)/6-311+G(2d,2p)//HF/6-31G(d) values. In particular, there was no change in the energetic ordering.
Monte Carlo calculations were carried out with HyperChem 6.0,[19] while the Gaussian03 package was used for ab initio calculations.[20]
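As an illustration of how relative energies of this kind are typically assembled from a single-point energy and a zero-point correction, the following Python sketch combines the two terms and converts the differences to kJ mol−1. The function name, the input layout, and the numerical values are hypothetical; they are not results of this work. Whether and how the zero-point term is scaled is not specified here, so no scaling is applied.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996  # standard conversion factor


def relative_energies(conformers):
    """Return energies in kJ/mol relative to the most stable conformer.

    `conformers` maps a label to (single-point energy, zero-point energy),
    both in hartree. The single point plays the role of the higher-level
    energy, the zero-point term that of the lower-level vibrational analysis.
    """
    totals = {name: e_sp + zpe for name, (e_sp, zpe) in conformers.items()}
    e_min = min(totals.values())
    return {name: (e - e_min) * HARTREE_TO_KJ_PER_MOL for name, e in totals.items()}


# Hypothetical numbers for illustration only; not results from this work.
example = {
    "conf_A": (-755.100000, 0.250000),
    "conf_B": (-755.098000, 0.249500),
}
rel = relative_energies(example)
```

With these made-up inputs, `conf_B` lies 0.0015 hartree above `conf_A`, i.e. a little under 4 kJ mol−1.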
Several different ways are conceivable for the description of conformers, for each of the tautomers of HisGly and GlyHis. This is due to the presence of seven weakly hindered rotors in each species: the Cα–C(O) and Cα–N of the main chain of each residue, the Cα–Cβ and Cβ–Cring of the His side chain, and the C–OH bond of the C-terminus acid. Describing each of the conformers by the value of the torsion angle around each of these bonds would be rather tedious, although it would carry all of the information. We have chosen to define families using a hierarchy of structural criteria (descriptors), of which several are non local.
The first descriptor is the existence of a hydrogen bond between an atom of the main chain and the Nδ or Nδ–H of the imidazole ring. All of the low energy conformers determined in this work bear such a hydrogen bond; for instance for the Nε–H tautomer of GlyHis, the lowest lying conformer found without a hydrogen bond at Nδ is higher in energy than the most stable by 18 kJ mol−1. We denote the cyclic motif formed by such H bonds as “Cn”, meaning that the H bond generates a n-membered cycle. The various types of Cn structures, C6–C10, are depicted schematically in Scheme 2. The arrows in Scheme 2 are oriented from the hydrogen bond donor to the acceptor. For instance in the Nδ–H tautomers of HisGly and GlyHis, the Nδ–H bond can be a H-donor towards the carbonyl oxygen of His, generating a C7 motif. C7 motifs also exist for the Nε–H tautomers, in which the Nδ atom is a H-bond acceptor from an N–H bond of Gly in HisGly or the C-terminus OH bond in GlyHis. On the other hand, in none of the conformers is the Nε or Nε–H oriented in such a way as to engage in a H bond of any Cn type, because the dipeptide chain is too short.
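The Cn count can be made concrete with a short Python sketch: n is taken here as the number of atoms on the shortest covalent path from the donor hydrogen to the acceptor atom, both ends included, since the hydrogen bond then closes the cycle. The atom labels and the bond list are illustrative assumptions describing only a His residue skeleton, not coordinates from this work.

```python
from collections import deque


def ring_size(bonds, donor_h, acceptor):
    """Number of atoms in the cycle closed by an H bond from donor_h to acceptor.

    Counts the atoms on the shortest covalent path between the two atoms,
    both ends included. Returns None if no covalent path exists.
    """
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    queue = deque([(donor_h, 1)])
    seen = {donor_h}
    while queue:  # breadth-first search over the bond graph
        atom, count = queue.popleft()
        if atom == acceptor:
            return count
        for nxt in neighbours.get(atom, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, count + 1))
    return None


# Heavy-atom skeleton of a His residue (illustrative atom labels).
his_skeleton = [
    ("N", "CA"), ("CA", "C"), ("C", "O"), ("CA", "CB"), ("CB", "CG"),
    ("CG", "ND1"), ("ND1", "CE1"), ("CE1", "NE2"), ("NE2", "CD2"),
    ("CD2", "CG"),
]
# Only the hydrogen relevant to each example is added.
n_delta_h = his_skeleton + [("ND1", "HD1")]  # H on the delta nitrogen
n_epsilon_h = his_skeleton + [("N", "H")]    # main-chain N-H, bare delta nitrogen

c7 = ring_size(n_delta_h, "HD1", "O")    # N(delta)-H donating to the His carbonyl O
c6 = ring_size(n_epsilon_h, "H", "ND1")  # main-chain N-H donating to N(delta)
```

For the Nδ–H to His carbonyl oxygen case mentioned above, this count gives 7, in line with the C7 designation; a His main-chain N–H donating to Nδ gives 6.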
The second conformation descriptor is the conformation of the Cn ring. The types of ring conformations encountered in low energy structures are presented in more detail in Fig. 1. C6 rings connect the Nδ position to the main chain nitrogen of His, which is the peptidic nitrogen in GlyHis and the N terminus in HisGly. In both Nε–H tautomers, there are two possible conformations, chair and half-chair (see Fig. 1). Note that the half-chair may be inverted, leading to a different energy since steric repulsions with the rest of the molecule are different. In the Nδ–H tautomer of HisGly, a chair conformation is formed by H bonding from the Nδ–H to the amino terminus. For GlyHis, the H bond would point to the peptidic nitrogen, but the rigidity of the peptide linkage precludes formation of the C6 ring in this case. Other Cn rings may be formed in several isomers, but the C7 is the only one which occurs in all. The various C7 possibilities have been introduced above. We find two conformations for the C7, half-chair-like and boat-like. Hydrogen bonding is favored when the N–H⋯OC segment is nearly planar in both types of conformation, since it allows the N–H bond to point approximately towards an oxygen lone pair, and the two bond dipoles to be oriented favourably for electrostatic stabilization. This restricts significantly the flexibility of the C7 motif. As shown in Fig. 1, formation of a C8 ring is possible only for the Nδ–H tautomer of GlyHis, connecting the Nδ–H bond to the carbonyl oxygen of Gly. We found two conformations for this C8 which may be loosely defined as half-chair-like and boat-like. As for C8, a C9 ring is compatible with one isomer only; in this case it is the Nε–H tautomer of GlyHis, in which the H bond connects Nδ to one of the N–H bonds of the amino terminus. The C9 ring is associated in some cases with a C6 ring, when the main chain is oriented in a way which also permits interaction of Nδ with the peptidic N–H bond. 
Finally, C10 rings exist in both tautomers of HisGly. In the Nδ–H tautomer it is due to a H bond between the Nδ–H bond and the oxygens at the C terminus, while in the Nε–H tautomer, the H bond is between the C terminus O–H bond of the trans conformation of carboxylic acid and Nδ. In both cases, we find two conformers, which are deduced from each other by a 180° rotation around the Cα–N bond of Gly (see Fig. 1). This leads to opposite orientations of the OC−N–H peptidic group with respect to the mean plane of the C10 cycle.
The third descriptor is the relative orientations of the peptidic OC–N–H plane with respect to that of the C terminal carboxylic acid. There are two rough relative orientations: either coplanar, defining a fragment of a β sheet, or perpendicular. Finally, the fourth descriptor is the relative position of imidazole and of the peptidic plane. “Open” conformers correspond to extended structures, in which the Gly residue and imidazole are distant, i.e. they cannot interact via H bonding. “Closed” conformers are those in which such H bonds can be established, between the Nδ position of His and an atom of either the peptidic bond or of the C terminus. Altogether, each existing combination of these four descriptors defines a “family”. We have denoted these families with capital Roman numerals, in order of stability. For instance for the Nδ–H tautomer of GlyHis, the lowest energy conformer belongs to family “I”, which corresponds to the existence of a C7 ring, in a boat-like conformation, in a “closed” relative orientation of Gly and imidazole, and a perpendicular relative orientation of the peptidic and carboxylic planes. Within each family, there remain conformational differences. These are specified by the orientation of the terminal NH2 and COOH groups. The various possibilities encountered in low energy structures are gathered in Fig. 2, together with the numbering used hereafter. For instance in conformations “1” the NH2 terminus is a H bond acceptor towards the peptidic N–H, while in “2” it is a H bond donor towards the peptidic oxygen, etc. In conformations “a” the carboxylic C terminus is in its trans conformation and it is a H bond donor towards the peptidic oxygen, while in “b” it is cis, and a H bond acceptor from the peptidic N–H. There is some redundancy between these notations and the specification of the Cn rings, since, e.g., conformations 5 are C9 rings (see Fig. 2).
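Conformer labels used in the following sections, such as I-1-ad or II-1,5-b, combine the family numeral with the NH2 and COOH orientation codes. A minimal Python sketch of how such labels could be split into their three fields is given below; the class and function names are hypothetical, and the reading of a multi-letter C-terminus code as a single field is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConformerLabel:
    family: str        # Roman numeral of the family, e.g. "I"
    n_terminus: tuple  # numeric code(s) for the NH2 orientation, e.g. ("1", "5")
    c_terminus: str    # letter code(s) for the COOH orientation, e.g. "ad"


def parse_label(label):
    """Split a label such as 'II-1,5-b' into family, N-terminus and C-terminus codes."""
    family, n_codes, c_codes = label.split("-")
    return ConformerLabel(family, tuple(n_codes.split(",")), c_codes)


best = parse_label("I-1-ad")
```

Grouping parsed labels by their `family` field gives a simple way to list the members of each family in a table of conformers.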
For all isomers, the most stable conformer of each family is depicted in Fig. 3–6. In order to illustrate the conformational flexibility offered by the NH2 and COOH termini, the six most stable conformations of the most stable family of the Nδ–H tautomer of GlyHis are shown in Fig. 7.
Low energy conformers of the Nδ–H tautomer of GlyHis
The relative energies of all conformers, grouped in five families according to the descriptors defined above, are gathered in the left part of Table 1. The structures of the most stable conformers for each of the first five families are shown in Fig. 3, while a series of conformers of family I are shown in Fig. 7. The best structure overall is I-1-ad, which bears a C7 ring with a H bond from the Nδ–H to the His carbonyl oxygen. In order to maximize H bonding, the carboxyl group is trans, enabling H bond donation from its O–H bond to the peptidic oxygen (which happens to be a C7 ring of another type, not used herein for structure specification). The so-defined orientation of the peptide linkage is such that its N–H bond may interact with the π cloud of imidazole. At the same time, the NH2 terminus is oriented in such a way as to be a H bond acceptor from the peptidic N–H. This structure is particularly stable since all heteroatoms are engaged in H bonding, except for the Nε (for which it is structurally impossible). The second most stable conformer is I-2-ad (see Fig. 7), 10 kJ mol−1 less stable than I-1-ad. The only difference between I-1-ad and I-2-ad is the orientation of the CH2NH2 at the N-terminus. In I-1-ad, the amino terminus is a H bond acceptor from the peptidic N–H, while in I-2-ad it is a double H bond donor to the peptidic oxygen. In all cases studied herein, we find the same energy ordering between these two orientations of the N terminus. The next conformer of the same family (the fourth most stable overall) is I-3-ad, which differs from I-2-ad by the orientation of the amino terminus, which interacts with the peptidic oxygen via a single N–H bond rather than both. The fact that this single N–H bond is better oriented towards the peptidic oxygen than either of the N–H bonds in I-2-ad does not compensate completely for the loss of one interaction, leading to a destabilization of 3 kJ mol−1 (i.e. 13 kJ mol−1 less stable than I-1-ad). Other conformers of the same family have the carboxyl group in the cis conformation. Although it is intrinsically more stable than the trans (by 22 kJ mol−1 in acetic acid), it does not allow simultaneous H bonding to the oxygen carbonyl on the one hand, and from the O–H bond on the other. The most stable of such conformers is I-1-d, which differs from I-1-ad only by the orientation of the OH group. It lies 17 kJ mol−1 higher in energy. Other conformers add to this another less favourable interaction relative to I-1-ad, such as the orientation of the amino terminus, or the hydroxyl oxygen instead of the more basic carbonyl oxygen as a H bond acceptor (see Fig. 7), and are more than 20 kJ mol−1 less stable than I-1-ad.
We now turn to the lowest energy conformers of other families. The most stable structure in family II, II-1-b, shown in Fig. 3, bears a C8 ring. Here it is the peptide carbonyl oxygen, rather than that of the carboxyl terminus, which interacts with the Nδ–H bond of imidazole. The peptide linkage is oriented in such a way as to allow H bond donation from the peptidic N–H to both the amino terminus and the carbonyl oxygen of the C terminus. It is 12 kJ mol−1 less stable than I-1-ad. The most stable structure in family III, III-1-b, has a C8 ring analogous to that in family II, but in a boat-like conformation. The peptide linkage has the same orientation as in II-1-b, and thus forms the same H bonds involving the amidic hydrogen. The most stable structure in family IV, IV-1-bd, has a C7 ring involving the C-terminus carbonyl oxygen as in family I; however, the ring has a half-chair-like conformation, while it is boat-like in family I. The two families also differ by the relative orientations of the peptidic and C-terminus planes (third descriptor above), perpendicular for family I and parallel for family IV. This parallel orientation is common with family II, and it allows again the peptidic N–H bond to interact with both termini. This leaves the peptidic carbonyl oxygen without a H bond, so that this structure is less stable than I-1-ad by 14 kJ mol−1. Family V has a C7 ring of the same type as that of family I, but they differ by their conformation: boat-like in I, and half-chair-like in V. With a cis carboxyl group leaving the OH bond without H bonding, the most stable structure of family V, V-1-d, lies 16 kJ mol−1 above I-1-ad. Some additional, generally less stable, conformers of this species may be found in Table 1.
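To give a feeling for what such energy gaps imply at room temperature, the following Python sketch converts the relative energies quoted above for the Nδ–H tautomer of GlyHis into Boltzmann weights. This is an illustration under strong assumptions rather than an analysis performed in this work: the relative energies are treated as if they were free energies, entropy differences are ignored, and only the conformers whose energies are quoted in the text are included. The function name and the choice of 298.15 K are assumptions.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def boltzmann_populations(rel_energies, temperature=298.15):
    """Normalized Boltzmann weights for relative energies given in kJ/mol."""
    weights = {k: math.exp(-e / (R_KJ * temperature)) for k, e in rel_energies.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Relative energies (kJ/mol) quoted in the text for the N-delta-H tautomer of GlyHis.
glyhis_n_delta = {
    "I-1-ad": 0.0, "I-2-ad": 10.0, "II-1-b": 12.0, "I-3-ad": 13.0,
    "IV-1-bd": 14.0, "V-1-d": 16.0, "I-1-d": 17.0,
}
populations = boltzmann_populations(glyhis_n_delta)
```

Under these assumptions the lowest conformer carries most of the population, since a gap of 10 kJ mol−1 is roughly four times RT at room temperature.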
Low energy conformers of the Nε–H tautomer of GlyHis
The relative energies of all conformers are gathered in the right part of Table 1. The most stable conformers for each of the first seven families are shown in Fig. 4. As for the most stable conformer of the Nδ–H tautomer, the most stable structure (I-1-jb) involves a boat-like C7 ring, which is closed in this case by a O–H⋯N bond from the C terminus to the imidazole Nδ. For this to occur, the carboxyl group is in its trans conformation, another feature that is common to both tautomers. The main chain has parallel peptide and carboxyl groups, and has glycine and imidazole in an open arrangement. This allows the peptidic N–H to interact with both the C-terminus carbonyl oxygen and with the amino terminus. On the other hand, the open structure precludes any interaction of the peptide carbonyl, which may explain why this conformer is less stable than the best Nδ–H conformer (see Table 1; at the MP2/6-311+G(2d,2p)//MP2/6-31G* level, this difference is 6.5 kJ mol−1). The second most stable conformer (I-2-jb, not shown) resembles the first, the only difference being that the amino terminus is a H bond donor towards the peptidic carbonyl, rather than an acceptor from the peptidic N–H. Thus the lower portions of the conformational spaces of the Nδ–H and Nε–H tautomers are fairly similar. The third most stable structure is also the first of family II (II-1,5-b, see Fig. 4). It has a C9 ring connecting the amino terminus to the imidazole Nδ, and in addition, the relative orientations of the peptide linkage and of the carboxyl group, and of the main chain and the imidazole, enable the formation of a C6 ring between Nδ and the peptidic N–H. This structure is 7 kJ mol−1 higher in energy than I-1-jb. A structure analogous to that of II-1,5-b, with a C9 ring and planar main chain, is found with IV-1,5-b (the seventh most stable conformer in Table 2). 
However, the C9 conformation is different, precluding the formation of a C6 ring. This conformer is only 3 kJ mol−1 less stable than II-1,5-b. It is likely that the loss of a H bond is partly compensated by a smaller strain, permitting better relative orientations of the amino terminus and the peptidic N–H for H bonding. A slightly more stable structure is the best conformer of family III, III-1-f. Its favourable features are a C6 ring and a H bond from the peptidic N–H to the amino terminus. However, none of its carboxyl oxygens can engage in H bonds, which is why this structure is 8 kJ mol−1 higher in energy than I-1-jb. Another structure shown in Fig. 4 is V-1-b, which has a stable, planar main chain skeleton, but which lacks a H bond to the imidazole Nδ. This leads to a high energy, 18 kJ mol−1 above I-1-jb. Other structures in Fig. 4, the most stable of families VI and VII, are of even higher energies. There are also conformers of intermediate energies, in the 12–20 kJ mol−1 range above I-1-jb, which are not described in detail here, but for which the specification of descriptors in Table 2 should be reasonably explicit.
Low energy conformers of the Nδ–H tautomer of HisGly
The relative energies of all conformers found are gathered in the left part of Table 2. The structures of the most stable conformers for each of the eight families identified are shown in Fig. 5. The best conformer, I-1-h, has a boat-like C7 ring which involves the peptidic carbonyl, as compared to the carboxyl carbonyl in GlyHis. In addition, the perpendicular orientations of the peptidic linkage and the carboxyl terminus enable the latter to interact with the Nδ–H of imidazole, forming a C10 ring. The favourable orientation of the amino terminus towards the peptidic N–H, already described previously, also permits some interaction of one of its N–H bonds with the π cloud of imidazole. As seen in Table 2, this is the most stable conformer of this tautomer; it is, however, slightly less stable than the best conformer of the Nε–H tautomer. The most stable conformer of family II, II-1-b (see Fig. 5), differs from I-1-h by the orientation of the Gly backbone, with the terminal carbonyl interacting with the peptidic N–H rather than with Nδ–H. This is a weaker interaction, essentially a dipole–dipole interaction of the CO and N–H bonds with antiparallel dipoles, rather than a H bond. As a consequence, it is less stable than I-1-h by 5 kJ mol−1. The next two conformers (not shown) are higher congeners of family I, differing from I-1-h only by the orientation of the COOH group. They are less stable than I-1-h by 6–8 kJ mol−1. The next most stable structure is the first of family III (III-4-h, see Fig. 5). It has a C10 ring as does I-1-h, but with a different conformation (of type a for III-4-h), in which the C7 ring no longer exists. Although the amino terminus –N–H peptidic bond interaction is maintained, together with that of one amino N–H with the π cloud of imidazole, the lack of the C7 ring leads to a destabilization of 12 kJ mol−1. 
Structure IV-1-h, the next higher conformer, presents a b-type C10 ring, no C7 ring, and has no amino N–H-imidazole stabilization. It is about 1 kJ mol−1 less stable than III-4-h. Several conformers belonging to the III, IV and I families follow with increasing energies, and the first members of families V-VIII (shown in Fig. 5) are all at least 20 kJ mol−1 less stable than I-1-h.
Low energy conformers of the Nε–H tautomer of HisGly
The relative energies of all conformers found are gathered in the right part of Table 2. The structures of the most stable conformers for each of the seven families identified are shown in Fig. 6. The best structure, I-1-a, involves a C6 ring in a half-chair conformation (as do the next two most stable, I-1-f and I-1-b). The amino terminus is both a H bond donor, to Nδ, and an acceptor, from the peptidic N–H. As discussed above, the latter interaction is common in low energy structures. The carboxyl terminus adopts a trans conformation to act as a H bond donor to the peptidic carbonyl. Its other typical conformation, cis, is adopted in I-1-f, which is 7 kJ mol−1 less stable. Thus here again, the most stable structure involves a trans carboxylic acid. The first member of family II, II-1-i, is the fourth most stable overall. Its ring is a C10, and it is now the carboxyl terminus in its trans conformation which behaves as a H bond donor towards the Nδ. Again the amino terminus can accept H bonding from the peptidic N–H; however, both carbonyl oxygens are left without significant interaction, leading to an energy 10 kJ mol−1 higher than that of I-1-a. While the C7 ring was clearly the most favorable for the three previous isomers, in this case the only possibility to form a C7 is to bind the peptidic N–H bond to the Nδ, which is incompatible with H bond donation of the peptidic N–H to the amino terminus. Not only is the C7 not the best ring in this case, but the most stable conformer bearing a C7, III-2-b, is found to lie 15 kJ mol−1 higher than I-1-a. The first member of family IV, IV-3-b, has a conformation very close to that of III-2-b. They differ only by the C7 conformation (half-chair-like in IV-3-b). Another C7 occurs in family V, V-2-a, with the same C7 conformation as in IV-3. The difference consists in the perpendicular orientation of the acid function with respect to the peptidic plane (third descriptor). 
This orientation allows the carboxyl group in trans conformation to make a H bond to the peptidic carbonyl, instead of being cis and binding electrostatically with the peptidic N–H (as in IV-3-b). As a balance of these changes, the energies of these three conformers III-2-b, IV-3-b and V-2-a are very close and differ only by 3 kJ mol−1.
Amino acids are known to have zwitterionic structures in aqueous solution in the biologically significant range of pH (ca. 6–9). In the gas phase, the positive and negative charges cannot be as efficiently stabilized as they are in solution, and the zwitterions are expected to be much less stable. In fact, all available evidence points to structures without formal charges in the gas phase. Thus we expect that for dipeptides such as HisGly and GlyHis, zwitterions are not the most stable isomers either. Yet it was deemed necessary to check this issue computationally. The number of stable conformers for the zwitterions of the tautomers of HisGly and GlyHis is expected to be significantly smaller than for the “neutral” structures described above. Preliminary calculations indeed showed that many structures collapse to non-zwitterions. As expected, the most favorable structures involve direct interaction between one of the carboxylate oxygens and one of the ammonium N–H bonds. For such motifs to be stable against proton transfer from the ammonium to the carboxylate, at least one of the charges must be stabilized by a second hydrogen bond, for instance O− by the imidazole Nδ–H, or N+–H by the imidazole Nδ. In all cases, the zwitterions were found to be much less stable than many conformers without formal charges. The smallest difference found between the best zwitterion and the best “neutral” conformer is for the Nε–H tautomer of HisGly, where it is 89 kJ mol−1. This zwitterion happens to have a structure that is fairly similar to that reported for the crystal structure of the hemihydrate of HisGly.[7c] Since all zwitterions are so high in energy, they are not described further here for the sake of brevity.
All of the low energy structures determined in this work (in a range of ca. 20 kJ mol−1) bear a Cn ring. Moreover, this ring is a C7 in the most stable conformers of three out of the four isomers. In these three cases, the ring conformation is boat-like. Clearly, whenever the skeleton is compatible with such a motif, it is highly favorable. A feature that is common to all four lowest energy conformers is the N-terminus conformation, in which the amino group is a H bond acceptor from the peptidic N–H bond. Finally, in three out of the four cases, maximizing the strength of hydrogen bonding leads to a trans carboxyl group being present in the most stable conformer, even though it is intrinsically much less stable than the cis conformer. As expected, though, the specificities of the various isomers are such that not all of the descriptors can take similar values for all of the most stable conformers. The relative orientations of the peptide linkage and the carboxyl group, the relative orientations of the Gly skeleton and the imidazole ring, and the cis or trans conformation of the carboxyl group, lead to structures that are significantly different, in order to optimize the network of hydrogen bonds.
Comparison of the energies of the most stable conformers, for the Nδ–H and Nε–H tautomers, indicates that the most favorable tautomer of GlyHis is the Nδ–H, while that for HisGly is the Nε–H. The energy differences are 4 and 1 kJ mol−1 for GlyHis and HisGly, respectively, at the MP2/6-311+G(2d,2p)//HF/6-31G* level, including ZPE correction. Such values are too small to draw a safe conclusion regarding the lowest energy tautomers; therefore they were recomputed with more accurate geometries and final energetics. At the MP2/6-311+G(2d,2p)//MP2/6-31G* level, the energy differences between the tautomers are found to be slightly increased, to 5 kJ mol−1 for both GlyHis and HisGly. At the MP2/aug-cc-pVTZ(-f)//MP2/6-31G* level, they increase again, and amount to 7 kJ mol−1 for GlyHis and 8 kJ mol−1 for HisGly. These results leave little doubt that the most stable tautomer is Nδ–H for GlyHis and Nε–H for HisGly.
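A two-state Boltzmann estimate gives a rough sense of what these gaps would imply for the tautomer populations. The worked equation below is only a sketch: it assumes that the computed energy difference ΔE can be used in place of a free energy difference at T = 298.15 K, and that only the lowest conformer of each tautomer contributes. Neither assumption is made in the text, so the ratios are indicative only.

```latex
\begin{align*}
\frac{p_{\text{major}}}{p_{\text{minor}}} &= \exp\!\left(\frac{\Delta E}{RT}\right),
\qquad RT = 8.314\times10^{-3}\ \text{kJ mol}^{-1}\,\text{K}^{-1} \times 298.15\ \text{K} \approx 2.48\ \text{kJ mol}^{-1} \\
\Delta E = 1\ \text{kJ mol}^{-1} &: \quad \exp(0.40) \approx 1.5 \\
\Delta E = 4\ \text{kJ mol}^{-1} &: \quad \exp(1.61) \approx 5.0 \\
\Delta E = 5\ \text{kJ mol}^{-1} &: \quad \exp(2.02) \approx 7.5 \\
\Delta E = 7\ \text{kJ mol}^{-1} &: \quad \exp(2.82) \approx 17 \\
\Delta E = 8\ \text{kJ mol}^{-1} &: \quad \exp(3.23) \approx 25
\end{align*}
```

Under these assumptions, a gap of 1 kJ mol−1 corresponds to a nearly even mixture, whereas gaps of 7 and 8 kJ mol−1 correspond to roughly 94% and 96% of the favored tautomer, respectively.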
The low energy conformation spaces for the sodium complexes of the four isomers are expected to be significantly restricted as compared to those of the free peptides, since previous studies[21] have shown that multidentate binding of Na+ is a strong stability factor in its complexes with amino acids and oligopeptides. For GlyGly, it was shown that the most stable structures involve ion interaction with both carbonyl oxygens, differing by the interaction, or lack thereof, with the amino terminus.[22] In GlyHis and HisGly, our results above suggest that for the Nε–H tautomers, the imidazole Nδ nitrogen may be an additional chelation site. It may also be anticipated that interaction of sodium with part or all of the imidazole π electrons will introduce an alternative binding capability, open to both types of tautomers. Because the number of low-energy structures is much smaller for sodium complexes than for free peptides, we do not introduce descriptors, and mostly describe the structures in terms of the sodium chelation sites and the intramolecular H bonds they involve.
Low energy structures of the Nδ–H tautomer of GlyHisNa+
The principles summarized above are illustrated by the relative energies of the most stable structures (see Table 3 and Fig. 8): the most favorable chelation of sodium is to both carbonyl oxygens and the imidazole ring. Since strong binding to Nε is sterically impossible, the ion interacts with the π cloud of imidazole. In both I and II, Na+ is bound to both carbonyl oxygens, and sits above the plane of imidazole, with closer interaction with one of the CN bonds. The two structures differ by the orientation of the NH2 terminus: it interacts with the peptidic N–H in I, while it is a H-bond acceptor from the imidazole Nδ–H in II. The distances from sodium to its chelation partners are almost the same in both cases. I is more stable than II by 12 kJ mol−1. A third structure of the same type, III, differs from I by the terminal acid which is cis in I and trans in III, with the O–H bond pointing towards the peptidic nitrogen. Since the latter is not a particularly strong H bond acceptor, III is less stable than I by 20 kJ mol−1. A fourth structure, V, is also bound to sodium via the same sites as I, II and III. It now has the NH2 terminus interacting with the peptidic CO. As seen above for the conformations of the free peptides, this is less favorable for NH2 than interacting with the peptidic N–H. V is less stable than I by 28 kJ mol−1, and is essentially degenerate with structure IV, in which the peptidic CO is H-bound to the Nδ–H of imidazole rather than to sodium as in the previous cases. The remaining, less stable structures of this isomer (V, VI and VII) all have sodium bound to the two carbonyl oxygens only. In all cases, the peptidic N–H bond interacts with both the NH2 terminus and imidazole in varying orientations. These three structures have similar energies, 30–40 kJ mol−1 higher than that of I.
Low energy structures of the Nε–H tautomer of GlyHisNa+
Here, as for the Nδ–H tautomer above, the most stable structures involve chelation to both carbonyl oxygens (see Table 3 and Fig. 9). However, the most favorable binding mode of Na+ to imidazole is no longer to the π cloud, but rather to the Nδ nitrogen. This tridentate binding is found in I, II, IV and VI. In the lowest three of these structures, the NH2 terminus is in its most favorable conformation, interacting with the peptidic N–H. I and II differ by the side chain conformation of His. In I it is such that imidazole is closer to the C terminus, while it is closer to the main chain and peptidic carbonyl in II. I and II are isoenergetic. II and IV share a similar main chain conformation; however, in IV the C terminus acid is in its less favorable trans conformation. This enables interaction of the acid O–H with the peptidic nitrogen, albeit not strongly enough to compensate for the acid destabilization. A better compromise is obtained in III, where Na+ interacts with the carbonyl oxygens and the N terminus nitrogen. Here the acid is trans as in IV, with the O–H now pointing towards the Nδ of imidazole. This leads to favorable H bonding with an O–H⋯N bond angle of 164°. Yet this structure is less stable than I by 14 kJ mol−1. Another type of intramolecular H bond is found in V and VII, in which it is the peptidic N–H which binds to the Nδ. These two structures have very similar backbones, with both carbonyls bound to Na+. They mostly differ by the conformation of the N terminus, and it is the interaction of NH2 with Na+ (in V) rather than with the peptidic N–H (in VII) which is the most favorable. Finally, VI differs from II by the NH2 conformation, and as seen several times before, H donation to the peptidic CO is less favorable than H acceptance from the peptidic N–H, by 25 kJ mol−1 in this case.
Low energy structures of the Nδ–H tautomer of HisGlyNa+
The results are summarized in Table 4 and Fig. 10. As for the Nδ–H tautomer of GlyHisNa+, interaction of the ion with the Nε of imidazole is not possible, therefore Na+ interacts with the π cloud. However, in this case, a significant difference may be seen: in the most stable structure, Na+ is not chelated to both carbonyls, but rather to the peptidic carbonyl and the NH2 terminus, and sits on top of the ring. This arrangement allows for efficient H bond donation from the Nδ–H to the C terminal carbonyl oxygen, with a relatively short H⋯O distance of 2.03 Å. More importantly, the N–H⋯O and H⋯OC angles are 160 and 143°, respectively. The reason why the Nδ–H tautomer of HisGlyNa+ is unique is that this H bond involves a C10 cycle (see Scheme 2) with enough flexibility to allow for nearly optimum N–H⋯OC orientation. In contrast, in the GlyHisNa+ Nδ–H tautomer, there is only a C7 or C8 possible between the Nδ–H and the terminal or peptidic CO, respectively (see, e.g., isomer IV of GlyHisNa+ in Fig. 8 with N–H⋯O and H⋯OC angles of 145 and 129°). In the Nε–H tautomers of GlyHisNa+ and HisGlyNa+, the Nε–H bond cannot form any significant interaction; the Nδ nitrogen can indeed engage in H bonds (see, e.g., isomer III of GlyHisNa+ in Fig. 9), but this can only occur at the expense of imidazole no longer being a sodium ligand. Therefore it is only in Nδ–H HisGlyNa+ that such imidazole N–H bonding can compensate for the loss of a carbonyl ligand to Na+. All other structures have Na+ bound to the two carbonyl oxygens and to the π cloud of imidazole. They differ by two main characteristics: (i) the conformation of the side chain of His is such that imidazole is closer either to the C terminus (as in II and IV) or to the peptidic chain (as in III and V), and (ii) the orientation of the amino terminus. The latter is a H bond acceptor from the imidazole Nδ–H in II and III, from the Cε–H in IV and provides a fourth chelation site to Na+ in V. These four structures span an energy range of less than 20 kJ mol−1, while II is 6 kJ mol−1 less stable than I.
Low energy structures of the Nε–H tautomer of HisGlyNa+
The results are summarized in Table 4 and Fig. 11. As with the Nε–H tautomer of GlyHisNa+, strong interaction of the ion with the Nδ of imidazole is possible, so that the most favorable chelation of sodium is to both carbonyl oxygens and the Nδ in I. Yet the next two structures in stability order, II and III, have Na+ bound to Nδ, the peptidic oxygen and the amino terminus, differing only by the conformation of the C terminus. Both are very close in energy, 22 and 24 kJ mol−1 higher than I. Structures IV and VI have the same tridentate chelation as I; however, the conformation of the His side chain is now such that imidazole is closer to the peptidic chain, rather than to the C terminus as in I. As a consequence, the amino terminus is a H bond donor to the peptidic oxygen in IV and VI, rather than being an acceptor from the peptidic N–H as in I. These differences lead to IV and VI being less stable than I by 26 and 32 kJ mol−1, respectively. Finally, Na+ is only bound to two sites in V and VII. V is more stable than VII by 8 kJ mol−1, because its two ligands are the two carbonyl oxygens. As seen for other cases previously, it has a trans acid which allows for hydrogen bonding, to the Nδ in this case. Yet with only two chelation sites to Na+, it is less stable than structure I by 31 kJ mol−1.
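Since the sodium complexes are described by their chelation sites rather than by descriptors, a small lookup structure can make the pattern easier to see. The sketch below tabulates the Nε–H HisGlyNa+ structures discussed above by Na+ binding motif, using only the relative energies quoted in the text. The site labels (`O_pep`, `O_term`, `N_delta`, `N_amino`) are illustrative shorthand, not notation from the paper, and structure VII is left out because its two chelation sites are not specified.

```python
from collections import defaultdict

# (label, Na+ chelation sites, energy relative to I in kJ/mol), as quoted in the text.
# Site names are hypothetical shorthand: peptidic O, C-terminal carbonyl O,
# imidazole N-delta, amino-terminus N.
STRUCTURES = [
    ("I", ("O_pep", "O_term", "N_delta"), 0),
    ("II", ("N_delta", "O_pep", "N_amino"), 22),
    ("III", ("N_delta", "O_pep", "N_amino"), 24),
    ("IV", ("O_pep", "O_term", "N_delta"), 26),
    ("V", ("O_pep", "O_term"), 31),
    ("VI", ("O_pep", "O_term", "N_delta"), 32),
]


def group_by_sites(structures):
    """Group structures by chelation motif, ordered by each motif's best member."""
    groups = defaultdict(list)
    for label, sites, energy in structures:
        groups[frozenset(sites)].append((energy, label))
    # Sort members within each motif, then sort motifs by their lowest energy.
    return sorted((sorted(members), sorted(sites)) for sites, members in groups.items())


ranked = group_by_sites(STRUCTURES)
for members, sites in ranked:
    listing = ", ".join(f"{label} ({energy})" for energy, label in members)
    print(f"{' + '.join(sites)}: {listing}")
```

Grouped this way, the two tridentate motifs account for every structure below 31 kJ mol−1, while the bidentate structure V appears only at 31 kJ mol−1.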