1
Using real-valued multi-objective genetic algorithms to model molecular absorption spectra and Raman excitation profiles in solution

2
The empirical modeling of the absorption spectra and resonance Raman excitation profiles of a large molecule in solution requires adjustment of a minimum of dozens of parameters to fit several hundred data points.

3
This is a difficult optimization problem because all of the observables depend on all of the parameters in a highly coupled and nonlinear manner.

4
Standard nonlinear least-squares fitting methods are highly susceptible to becoming trapped in local minima in the error function unless very good initial guesses for the molecular parameters are made.

5
Here, we demonstrate a method that employs a real-valued genetic algorithm to force a broad search through parameter space to determine the best-fit parameters.

6
The multiobjective genetic algorithm is successful at inverting absorption spectra and Raman excitation profiles to determine molecular parameters.

7
When vibronic structure is evident in the absorption profile, the algorithm returns nearly quantitative results.

8
For broad, featureless profiles, the algorithm returns the correct slope of the excited state surface but cannot independently determine the excited-state frequency and the equilibrium geometry change.

9
Compared with manual adjustment of parameters to obtain a best fit, the genetic algorithm is computationally less efficient but requires less human time.

Introduction

10
Optical absorption spectra and resonance Raman excitation profiles contain information about a variety of important molecular parameters including the excitation energy, transition moment, excited state vibrational frequencies, and geometry changes along each coupled normal mode.

11
We1–4 and others5–8 have utilized Raman excitation profiles to quantify excited state parameters and deepen our understanding of electronic processes.

12
Fig. 1 illustrates a related pair of optical spectra and molecular parameters.

13
Chemical physicists have developed powerful theoretical and computational techniques that allow spectra to be calculated given knowledge of the potential energy surfaces, dynamics, and radiation-matter couplings of a particular system.

14
Spectroscopists address the related problem of measuring the spectra and inverting the spectral observables to extract the physical parameters.

15
Within a given theoretical framework, it can be quite easy to calculate the optical spectra of even very large polyatomic molecules in the condensed phase.

16
However, it is difficult to objectively find the set of parameters that best fits the data, and it is particularly hard to justify those instances when it is necessary to go beyond the simplest assumptions and consider Duschinsky rotation or vibrational coordinate dependence of the electronic transition moment, for example.

17
Traditionally the “intelligently guided random search” has been employed by our group2,9–12 and others.13–19

18
This method involves guessing a set of parameters, calculating the spectra, comparing with experiment, modifying the parameters, recalculating the spectra, and continuing iteratively until a best fit between simulation and experiment is found.

19
Since the spectra depend on the molecular parameters in a highly nonlinear fashion and the Raman data usually carry significant uncertainties, converging on a best fit set of parameters is slow, and it is rarely possible to be certain that the best fit has been found or that the best-fit parameters are unique.

20
In this paper, we use real-valued genetic algorithms to automate this iterative process of parameter optimization and simultaneously determine the uniqueness of the best-fit parameters.

21
Genetic algorithms have been applied to a variety of problems in chemistry and physics that involve optimization in a large parameter space.

22
These include determination of mechanisms and rate coefficients for complex reactions,20 determination of structures from NMR spectra,21,22 molecular and cluster geometry optimization,23–26 parameter optimization in semiempirical electronic structure methods,27 numerical solution of the Schrödinger equation,28 optimization of vibrational force fields,29 and modeling of vibronic30,31 and rovibronic32 molecular spectra.

23
Previously in our group, the absorption spectra of molecules with one vibrational mode were used to quantitatively determine molecular parameters in a three-step process.31

24
First, a neural network which had been trained to associate spectral patterns with their underlying molecular parameters was used to obtain an estimate of the relevant parameters.

25
This initial guess was used to select the range of parameter space to be searched by a micro-genetic algorithm.

26
The range was discretized into 210 bits in each dimension, and a random population was allowed to evolve toward the solution.

27
Since convergence of the population slowed as it neared the extremum, a traditional Levenberg–Marquardt nonlinear least squares search was grafted on to locate the true minimum.

28
The method worked well, but training the neural network is a slow process which does not scale well for larger systems and the Levenberg–Marquardt steps add an extra layer of computational complexity and time.

29
Our goal here is to develop a stand-alone genetic algorithm which can efficiently solve the spectral inversion problem for sets of absorption spectra and resonance Raman excitation profiles.

30
In Section 2 we describe a theoretical framework for calculating absorption spectra and Raman excitation profiles from fundamental molecular parameters, provide a physical interpretation of the parameters and define the objective functions which we seek to minimize in determining the best fit.

31
We develop the simplest implementation of a real-valued genetic algorithm in Section 3 which is designed to meet a single objective (e.g., fitting the optical absorption spectrum).

32
The algorithm is successful at quantitatively fitting absorption spectra for one-mode systems.

33
As the number of modes increases, the “uniqueness” of the fit is sacrificed but a range of parameters which provide quantitative fits can still be determined.

34
In Section 4, we extend our treatment to multiobjective problems in which there are several properties of interest (e.g., the absorption spectrum and Raman excitation profiles for several fundamental, overtone and combination bands) which depend on the same set of parameters in different ways.

35
The inclusion of Raman data enables reliable parameter optimization for multimode systems.

36
The largest calculations attempted describe a simplified p-nitroaniline molecule with five modes coupled to the electronic transition.

37
Finally, in Section 5 we comment on the strengths and weaknesses of genetic algorithms for inverting spectroscopic data.

Electronic spectroscopy

38
The traditional approach to calculating Raman excitation profiles involves approximations to the energy-frame Kramers-Heisenberg-Dirac sum-over-states formula.

39
These calculations quickly become intractable with even a few degrees of freedom when there are significant Franck–Condon displacements.

40
Instead, working directly in the time domain exploits the short-time nature of Raman scattering and leads to efficient calculations: The propagation time required to determine the Raman cross sections stays constant, or even decreases, as the number of degrees of freedom and displacements grows.

41
The computational method is based on the time-dependent formulation of Raman scattering33 which expresses the exact Raman amplitude in terms of a half-Fourier transform of the overlap between the initial ground state wavepacket propagating on the excited state surface |ϕi(t)〉 and the final stationary state |ϕf〉.

42
The scattering amplitude is given by where the frequency Ω = ωi + ωL − ω0–0 is the vibrational frequency of the initial state plus the incident laser frequency minus the purely electronic transition frequency.

43
The time correlation function is modulated by a solvent-induced spectral broadening g(t).

44
In the most naïve approximation, this damping can be modeled as a simple Lorentzian function governed by an electronic dephasing rate Γ, g(t) = Γt.

45
We employ an overdamped Brownian oscillator in the high temperature limit34,35 to more carefully describe solvent interaction.

46
Following refs. 36, 37 and 38, we assume that the initial population is in the vibrational ground state and for clarity consider a single coupled vibrational mode, where q is the ground-state dimensionless normal coordinate with conjugate momentum p. When the molecule is excited, the wavefunction propagates along the upper surface according to the excited state Hamiltonian, |ϕi(t)〉 = exp(−iHet/ħ)|ϕi〉 with where q′ is the excited-state normal coordinate.

47
The excited state surface is displaced Δ′ along the normal coordinate q′.

48
The general expression for the final state is where Hn is the nth Hermite polynomial with n = 0 for absorption, n = 1 for fundamental Raman scattering and n = 2 for overtone Raman scattering.

49
The explicit form of the overlap of the propagated initial wavepacket and the final stationary state is whereω± = ωe ± ωgHere Δ is the displacement in ground-state dimensionless normal coordinates (Δ = (ωg/ωe)1/2Δ′).

50
For molecules with more than one coupled vibrational mode, the total overlap function is a simple product of the single-mode overlaps as long as the ground- and excited-state normal coordinates are parallel (no Duschinsky rotation).

51
For comparison to experiment, we calculate the Raman cross section according to where ωS is the frequency of the scattered photon and M is the electronic transition length integral, assumed independent of vibrational coordinate.

52
A similar expression can be derived for the absorption cross section: where n is the solution refractive index.

53
Absorption is governed by the real part of the Fourier transform of the time autocorrelation function damped by solvent interaction.

54
Many generalizations of these expressions have been developed37–39 which allow for greater complexity in the system including thermal population of initial vibrational levels, Duschinsky rotation, and coordinate dependent transition moments.

55
In this work, the condensed-phase optical absorption spectrum and resonance Raman excitation profiles of a molecule having a single, isolated electronic transition between two harmonic potential energy surfaces are considered.

56
The calculations are performed within the Condon approximation (no vibrational coordinate dependence of the electronic transition moment), and it is assumed that the ground and excited state normal modes are the same (no Duschinsky rotation).

57
Different frequencies in the ground and excited electronic states are allowed.

58
The ground state vibrational frequencies are assumed to be known from IR or Raman spectroscopy.

59
The solvent-induced vibronic line broadening function is modeled by a single Brownian oscillator in the overdamped, high temperature limit35 with the lineshape parameter fixed at κ = 0.05.

60
Within these approximations, the absorption spectrum and Raman excitation profiles can be exactly determined given the following parameters.

61
(i) Δ, the displacement for each normal mode between ground and excited state potential minima in ground state dimensionless normal coordinates which is needed to calculate the correlation function.

62
(ii) ωe/ωg, the ratio of the excited and ground state vibrational frequencies for each normal mode which is needed to calculate the correlation function.

63
(iii) Γ, the vibronic linewidth which is incorporated in the solvent-induced broadening governed by the Brownian oscillator damping function g(t).

64
(iv) ω0–0, the purely electronic transition frequency which governs the position along the frequency axis.

65
(v) M, the electronic transition length which determines the integrated intensity (oscillator strength).

66
The goal of this work is to invert experimental spectra to determine these parameters for systems with several coupled vibrational modes.

67
While the spectroscopic model employed to generate the calculated spectra involves common approximations suited to our molecular systems, the time-dependent approach to determine the absorption and Raman excitation profiles does not generally require them.

68
The algorithm presented here could easily be extended beyond these approximations to allow, for example, vibrational coordinate dependence of the transition moment, inhomogeneous broadening, or more than one contributing resonant excited state.

69
The concomitant increase in the number of molecular parameters would, however, place even more stringent requirements on the quality and quantity of the Raman data.

70
To monitor the progress of our fit, we compare the target spectra (analogous to experimental data) to calculated spectra which correspond to a particular set of parameters {ai}.

71
The error associated with a calculated spectrum is governed by the mean absolute difference between calculated and target spectra where σcA,R(ωj) and σtA,R(ωj) are the calculated and target absorption or Raman cross sections at frequency ωj, and N is the number of points in the target spectrum.

72
Our choice of the absolute error in eqn. 12 instead of the relative error (where each term in the sum is divided by σtA,R(ωj) as used in ref. 31) does not overemphasize the lowest intensity points where experimental error may be more significant in the Raman excitation profiles.

73
The goal of the optimization is to find the {ai} which minimize the error functions.

Genetic algorithms

74
Genetic algorithms40–43 are global optimization methods that mimic the mechanisms of evolution: reproduction, natural selection, and diversification.

75
In biological systems, genetic information is encoded in the quaternary alphabet of DNA base pairs.

76
Sequences of base pairs comprise individual genes which correspond to traits in the developed organism; the genes are grouped into chromosomes.

77
Through meiosis, the parental chromosomes crossover and recombine to form genetically unique children with qualities inherited from both parents.

78
Random mutations further diversify the next generation though these events are infrequent and rarely beneficial.

79
Children develop and compete for resources and the opportunity to mate.

80
Only the most fit phenotypes are expected to survive and propagate; they represent better solutions to adaptive problems.

81
For the current problem of fitting molecular spectra, we consider the molecular parameters described in Section 2 to be analogous to the genes which determine each individual's characteristics.

82
The phenotypes are the calculated spectra, analogous to the physical and behavioral characteristics of biological organisms.

83
The fitness of each individual is a measure of its similarity to the target spectrum.

84
Only the best solutions are allowed to reproduce to drive the population toward the optimum solution.

85
In this section, we detail our application of genetic algorithms to invert a target absorption spectrum, that is, to derive the molecular parameters that produce it.

86
The extension to include resonance Raman data is described in Section 4.

Encoding

87
In recasting the optimization problem in terms of an evolutionary system, we kept the problem conceptually close to the real parameter space by constructing each chromosome from N floating-point numbers which directly correspond to the N parameters.

88
Traditional genetic algorithms encode each parameter in an n-bit binary gene which has the unphysical property of assigning equal importance to (for example) the most and least significant bits.

89
Real-valued genetic algorithms have been shown to be robust, accurate, and efficient.44–47

90
A further advantage for this application is the relative compactness of N floating point numbers as compared to N × n zeros and ones.

91
For complex systems with several parameters, the arrangement of genes within the chromosomes is important.

92
Genetic algorithms are believed to rely on the conservation of successful patterns to guide the evolutionary process.40

93
Successive evolutionary steps juxtapose these independent “building blocks” to construct the optimal solution.41

94
The variables of our system are highly correlated and do not act independently in determining the phenotype.

95
When epistasis, or interaction among design parameters, occurs, convergence of the genetic algorithm is much more difficult.48

96
Grouping entangled genes in the chromosome promotes the formation of robust, quasi-independent building blocks which have a high probability of being propagated to the next generation.

97
For the conjugated organic molecular systems of interest to our group, we chose the parameter ranges shown in Table 1 for the normal-mode displacement, the relative excited-state frequency and the vibronic linewidth.

98
Systems with resolved vibronic progressions and relatively narrow linewidths are best described in terms of the displacements and frequencies of each electronically coupled normal mode individually {Δ1,ω1,Δ2,ω2…}.

99
When the inner-sphere reorganization energy or vibronic linewidth increases, the vibronic progression becomes difficult to resolve.

100
Physically, the spectrum becomes more appropriately described in terms of the overall reorganization energy (width) and shape (asymmetry) of the curve.

101
To speed convergence, we recast the variables Δ and ωe/ωg into the mode-specific inner-sphere reorganization energy λ = ω2eΔ2/2ωg and a shape factor ρ, the ratio of Δ2 and ωe/ωg.

102
The overall width is a collective function of the reorganization energies and the linewidth.

103
Convergence is accelerated by forming parameter blocks of the collective shape and width {ρ1,ρ2,…,λ1,λ2,…}.

104
The comparison of the target and calculated spectra is conducted in an energy window which extends from 2500 cm−1 below to 5000 cm−1 above the target spectrum maximum at 50 cm−1 intervals.

105
To guarantee that the absorption features of the calculated spectrum occur within the observation window we adjust the pure electronic excitation frequency, ω0–0.

106
For systems with well-resolved vibronic structure and a dominant peak, the maxima of the target and calculated spectra are aligned.

107
All other spectra are adjusted such that the first moments of the calculated and target spectra coincide.

108
In both cases, we equate the integrated intensity of the absorption curves within the window to determine the transition moment.

Evolution

109
Upon randomly generating an initial population, we assume that the genetic pool contains the solution, or a better solution, to the adaptive problem of fitting the spectrum.

110
However, the solution is not “active” because the optimal genetic combination is split among several members.

111
Just as in nature where individuals more successful in competing for resources are more likely to survive and propagate their genetic material, the algorithm's fitness function transforms the performance of each member into an allocation of reproductive opportunities.

112
The random crossover of these “most fit” partial solutions enables the algorithm to approach, and eventually find, the optimum.

113
Mutations allow the emergence of new configurations which widen the pool and improve the chances of finding the optimal solution.

114
This process of evaluation, selection, recombination and mutation forms one generation in the execution of a genetic algorithm.

115
The fitness of each member of the population is inversely proportional to the objective function defined in eqn. 12.

116
We take The simplest algorithm for selection maps the population onto a roulette wheel where each individual is represented by a space proportional to its fitness as in Fig. 2.

117
Individuals are entered into the mating pool by repeatedly spinning the roulette wheel.

118
This method leads to genetic drift, or a loss of population diversity, which is overcome using stochastic universal sampling.41

119
To prevent the domination of a single individual, after each spin of the wheel several (in our case, four) parents are selected and copied into the mating pool.

120
Parents are randomly chosen from this intermediate population to reproduce and are then discarded.

121
In nature, crossover occurs when two parents exchange parts of their corresponding chromosomes.

122
The most common approach for recombination of two parents represented by a vector of real numbers is flat crossover, a gene-by-gene weighted average, to give two children.

123
Each gene in the children ci is expressed as a linear combination of their parents, where γ is a random number on the interval [0,1].

124
Flat crossover (see Fig. 2) explores more of the parameter space than the more traditional single-point crossover.

125
To maintain some consistency between successive generations, 20% of the population is generated by asexual reproduction, or with γ = 0.

126
Since the selection and crossover processes are stochastic, there is no guarantee that the algorithm will monotonically approach the optimum.

127
To prevent backward steps, the best 10% of the solutions (the elite members49) are allowed to survive into the next generation.

128
The difference between asexual reproduction and survival of elite members is that elite chromosomes do not undergo mutation.

129
Successful convergence depends on a delicate balance between exploration of the parameter space and exploitation of good solutions.

130
This balance is governed by the mutation operator.

131
Fitness proportionate selection exploits fit members to generate fit children.

132
To avoid premature convergence to a non-optimal solution, and maintain population diversity, we use a dynamically updated mutation operator.50

133
Each gene has an adaptive mutation rate δ normally distributed about zero according to ±δ = 1 − exp{−(0.5 − x)2/2.373}which takes a random x on the interval [0,1] and maps it to a percent deviation between 0 and 10% of the allowed parameter range.

134
We choose δ to be negative if x < 0.5.

135
Every ten generations, adaptation occurs by narrowing the allowed parameter range to the parameter average plus or minus two standard deviations.

136
Generally, an at-first-sight needlessly large search region is sampled by the initial population to avoid missing the global minimum.

137
During the optimization, we narrow the search toward the most promising region of the parameter space to increase the role of exploitation as the optimum is approached.

Results

138
A simple test of the efficacy of our algorithm is to construct target spectra with known parameters within the ranges given in Table 1 and then to invert the spectra.

139
Comparing the actual parameters to those returned by the algorithm affords a measure of our success.

140
As expected, systems with few parameters and substantial vibronic structure are easiest to invert.

141
For systems with a single mode coupled to the electronic transition, we always achieve a quantitative fit to the absorption spectrum for the parameter ranges in Table 1.

142
Six randomly generated target spectra are shown in Fig. 3; the corresponding parameters are the bold entries in Table 2.

143
To invert these spectra, the population size was fixed at 30 chromosomes which were allowed to evolve for 30 generations.

144
The calculated spectra are superimposed on the target spectra in Fig. 3 and no differences are visible.

145
Comparing the parameters (plain-type entries in Table 2) demonstrates that inversion of the spectra yields ∼1% error in the parameter values.

146
Comparable results were obtained in our previous work31 by combining an integer valued genetic algorithm with Levenberg–Marquardt refinement.

147
As we move to larger systems, the inversion becomes more difficult.

148
For example, the width of the absorption spectrum is governed by both the inner-sphere reorganization energy, a function of Δ and ωe/ωg, and the outer-sphere reorganization energy which depends on Γ.

149
The non-separability of the parameters in the model plagues optimization techniques which typically rely on parameter additivity.

150
In the one-mode case this “epistasis” does not hinder inversion of the spectrum.

151
For multi-mode systems, the increase in both the number of parameters and the complexity of the relationship between them place much more stringent demands on the optimization technique.

152
It is also in this regime where an automated, objective fitting algorithm is most necessary.

153
A “difficult” target absorption spectrum of a two-mode system was synthesized (closed circles in Fig. 4) from the parameters shown in the bottom, bold line of Table 3.

154
This absorption spectrum is relatively broad and the vibronic patterns of the individual normal modes cannot be distinguished.

155
Four attempts to invert this spectrum were made.

156
In each, the 60 member population was allowed to evolve for 60 generations.

157
The results are shown as the solid lines in Fig. 4 with the corresponding parameters entered in Table 3.

158
While it is only in the inset that the small errors in the calculated spectra are visible, the calculated parameters do not correspond to the target parameters.

159
Generally, the algorithm fails to correctly invert spectra which are broad or lack clear vibronic progressions because the fits are not unique.

160
As the number of parameters increases and the resolution of spectral features decreases, several parameter sets produce nearly identical, good fits to the absorption spectrum.

161
More data are required to find a single best solution.

Multiobjective genetic algorithm

162
More insight into the molecular parameters may be achieved with resonance Raman intensities.1–8

163
For each mode coupled to the electronic transition, an experimental profile of the Raman cross section vs. laser frequency may be obtained.

164
Each resonance Raman profile isolates the contribution of a single vibrational mode to the often unresolved vibronic structure of the absorption.

165
Incorporating these data into the calculation of a fitness function is not straightforward, as typically there are many fewer data points with much higher uncertainties than in the linear absorption spectrum.

166
A direct statistical combination of the absorption and Raman data would fail to assign enough importance to the Raman experiments.

167
These data are incommensurate (cannot be measured on the same scale) and should not be combined into a single objective function.

168
In this section, we describe an algorithm which simultaneously optimizes several objectives.

169
We begin by discussing multiobjective problems and defining Pareto-optimality.

170
Next, we present a modified version of the strength Pareto evolution algorithm.

171
Finally, we show results for a series of randomly generated target spectra which typify the spectroscopy of conjugated organic molecular materials.

Background

172
Real-world design problems require the simultaneous optimization of multiple objectives which are often incommensurate, and sometimes competing.

173
These multiobjective problems typically do not have a single utopian solution.

174
Instead, they are solved by determining the tradeoff, or Pareto-optimal surface.

175
Solutions along this surface are optimal in the sense that no other solutions in the search space are superior to them with respect to every objective.

176
To make the conditions of Pareto-optimality mathematically rigorous, we state that chromosome if the fitness of x is greater than or equal to the fitness of y with respect to every objective, and that the fitness of x is greater than the fitness of y for at least one objective.

177
Fig. 5 shows the approximate Pareto-optimal surface of a two-dimensional optimization problem as a solid line at the lower left of each panel.

178
Nondominated solutions are indicated with solid circles, while dominated chromosomes are shown as open circles.

179
As the system evolves, the nondominated front approaches the Pareto-optimal surface.

180
For an arbitrary dominated chromosome, in the upper-right-hand panel, the dominated solutions are shown in light gray while the dominant solutions are shown in a darker gray.

181
In general, the goal of multiobjective problems is to find as many nondominated solutions as possible along the Pareto-optimal surface and then to allow a higher-level decision maker, who may possess additional information, to choose among them.

182
Since the absorption spectrum and Raman excitation profiles are experimental data from the same molecule, they must be produced by a common set of molecular parameters–there should be a single best solution.

183
However, there need not be a utopian solution if approximations such as separable harmonic vibrations are included in the model, or when spectral noise is considered.

184
The range of good solutions will provide information about the error tolerance for each molecular parameter and the appropriateness of the model.

185
The absence of an adequate fit indicates either an insufficient model or experimental noise.

186
A problem which illustrates the Pareto-optimal surface can be artificially constructed by adding 25% rms noise to the Raman excitation profiles of a hypothetical two-mode system.

187
Since there is no longer an optimal solution which would generate the exact absorption and Raman spectra, we may examine the algorithm's ability to approach and sample the Pareto-optimal front.

188
(The algorithm is presented in detail in Section 4.2).

189
For this example, we fixed the population size at sixty chromosomes, each composed of seven floating point numbers {Δ1,ω1e/ω1g,Δ2,ω2e/ω2g,Γ,ω0–0,M}, and allowed the population to evolve for sixty generations.

190
The two, simultaneous objectives are (1) to fit the absorption spectrum and (2) to fit the “noisy” Raman excitation profiles of the two coupled modes.

191
The evolution of the population toward the Pareto-optimal surface is shown in Fig. 5 where each chromosome is represented by a point in fitness space (OA,OR).

192
Noise added to the Raman excitation profiles obviates a perfect solution which would be located at the origin.

193
Initially, the nondominated chromosomes (filled circles) approach and spread out along the Pareto-optimal surface.

194
As evolution continues, the dominated chromosomes (open circles) approach the nondominated front.

195
Even with fitness sharing51–55 (see Section 4.2) to minimize genetic drift toward a specific region of the tradeoff surface, the converged solution has reduced diversity.

196
The two extreme nondominated solutions are designated by a star and asterisk with their corresponding spectra shown in Fig. 6 and parameter values in Table 4.

197
The best solution with regard to absorption is shown in the bottom panel while the best solution with regard to the Raman excitation profile is shown in the top panel.

198
There is a clear trade-off in optimizing the two objectives.

199
Instead of allowing the inversion algorithm to choose between these extreme solutions, we allow the decision maker to examine all of the nondominated solutions within the parameter space.

200
We find that neither solution shown in Fig. 6, and none of the other solutions along the Pareto-optimal surface, fit both objectives well.

201
Generally, we can use this negative result to assert that either the data are noisy or the model is inadequate!

Algorithm

202
Genetic algorithms are particularly well suited to multiobjective optimization.56–58

203
Multiple individuals can search for multiple solutions in parallel.

204
By incorporating the concept of Pareto-domination into the selection procedure, and applying a niching pressure to spread the population out along the tradeoff surface, the genetic algorithm described in Section 3 can be modified to handle multiple objective functions.

205
Below, we present a modified version of the strength-Pareto evolution algorithm.53,59

206
An initial population of P chromosomes is randomly generated within the allowed parameter space.

207
With a single objective we are able to unambiguously order the chromosomes in terms of their fitness and allocate reproductive opportunities accordingly.

208
With several objectives, the population is not totally ordered since chromosomes can be indifferent to one another.

209
To order these systems, we define the concept of strength to represent the number of chromosomes a particular individual dominates.

210
As in Fig. 6, nondominated chromosomes do not generally have the same strengths.

211
To give the nondominated individuals the same reproductive opportunities, we define the raw fitness of a chromosome using the sum of the strengths of all the chromosomes which dominate it.

212
The negative sign guarantees that the raw fitness of dominated chromosomes is less than that of nondominated chromosomes which all have R = 0.

213
Fitness sharing is incorporated to spread the population out along the tradeoff surface by adding large “niching penalties” to chromosomes in crowded regions of parameter space.

214
The local density is defined in terms of σ*i, the distance from chromosome i to its √Pth nearest neighbor with P denoting the total population size.

215
We measure the distance between chromosomes i and j in genotype (parameter) space according to σij = ∑n|ai,n − aj,n|/rn so that the distance along each coordinate n is normalized by its allowed range, rn.

216
The density around each individual is D(i) = 1/(σ*i + 2) where the 2 in the denominator ensures D(i) < 1.

217
Subtracting D(i) from the raw fitness yields the overall fitness F(i) = R(i) − D(i).

218
The principle of natural selection dictates that individuals should be given reproductive opportunities according to their fitness.

219
Since a ranking scheme is used to provide the strength-Pareto fitness, we cannot base our selection criteria on the magnitude of the each individual's fitness.

220
Instead, we force the chromosomes to compete: Binary tournament selection with replacement is used to select parents.

221
Each member of the mating pool is selected by randomly choosing two members of the population, comparing their fitness, copying the winner into the pool, and replacing both members into the population.

222
Once the mating pool is full, pairs are randomly chosen, allowed to reproduce, and discarded.

223
To prevent backward steps in evolution, we again allow elite members of the population to survive from one generation to the next.

224
The top ten percent of the solutions are chosen based on their raw fitness value R(i).

225
In the case of a tie, the most redundant solution, the solution closest in genotype space to the other candidates, is iteratively removed from the list of elites.

Results

226
To test our multiobjective genetic algorithm we construct target spectra with known parameters, invert them, and compare the calculated parameters to those used to construct the target spectrum.

227
In our study of the single-objective algorithm, we found that as the number of parameters increased the absorption spectrum failed to provide enough information to uniquely determine the molecular parameters which generated it.

228
In this section, we will show that including data from Raman excitation profiles enables a more complete characterization of the underlying molecular parameters.

229
First, we revisit the “difficult” target absorption spectrum of the two-mode system from Section 4.3.

230
Using the same parameters, we synthesized Raman excitation profiles by calculating the Raman cross section of each mode at seven frequencies resonant with the broad electronic transition.

231
To invert the target spectra (the points in Fig. 7), we add a second objective to minimize the error at these points.

232
Four attempts were made.

233
In each, the 60 member population was allowed to evolve for 60 generations.

234
The results are shown as the solid lines in Fig. 7 with representative parameters entered in Table 5.

235
Since there are multiple objectives there can be several nondominated chromosomes in the final population and only one is entered for each replicate.

236
The best calculated parameters correspond closely to the target parameters, though some large variations (≈20%) exist among the nondominated chromosomes.

237
Again, only in the inset are the small errors in the calculated absorption spectra visible.

238
The Raman excitation profile of mode 2 is about five times more intense than that of mode 1 and the algorithm does an excellent job at fitting the experimental points.

239
For the less Raman-active mode, there is some variability in the excitation profiles, but each passes very close to the target at the prescribed wavelengths.

240
When two coupled modes are considered, we must choose how many objective functions to optimize: we can either lump the two fundamental Raman excitation profiles together into a single objective function, or treat them separately.

241
As the number of objective functions increases the number of non-dominated chromosomes increases.

242
Horn and Nafpliotis52 illustrate the effects of too little and too much dominance pressure.

243
When too much of the population is comprised of non-dominated solutions (tnon ≫ 20%), the algorithm prematurely converges to a small portion of the front.

244
When there are too few non-dominated solutions (tnon ∼ 1%), there is not enough dominance pressure to exploit fit solutions and convergence is slow.

245
Their empirically-derived, best case is tnon ≈ 10% which yields a tight and complete distribution.

246
Keeping the ratio of nondominated to dominated solutions around 0.1, two mode calculations were generally best solved with two objective functions.

247
A more impressive demonstration of the multiobjective genetic algorithm is achieved with a five-mode calculation.

248
The bold parameters in Table 6 were used to synthesize an absorption spectrum and Raman cross section at seven frequencies for the fundamental, overtone, and combination bands of the system.

249
Using three objective functions (the error in the absorption spectrum, the fundamental Raman excitation profiles, and the Raman excitation profiles of the most intense overtones and combinations), we inverted the data to determine the parameters in Table 6 and the corresponding spectra in Figs. 8 and 9.

250
The algorithm is quite successful at inverting these spectra: fewer than 300 data points (more than half from the absorption spectrum) are used to find 13 free parameters!

251
When a spectrum with less structure is chosen, inversion becomes challenging because solvent-induced broadening smears out the signatures of individual modes.

252
Using objective functions which consider the collective deviation of all modes from the target Raman data, as above, many disparate chromosomes (parameter sets) have similar error values and the population does not converge to the optimal solution.

253
The possibility of increasing the number of objective functions to distinguish the behavior of particular modes may be considered, however the cost can be prohibitive since the number of nondominated chromosomes increases very quickly and the population size must grow concomitantly.

254
To focus the population more effectively, the objective functions must be chosen to capture the nuance of the data.

255
The best parceling of experimental data into objective functions strongly depends on the specifics of the system.

256
Generally, schemes for breaking the problem down into smaller pieces and identifying useful objective functions are more physically relevant and intuitive than the iterative search for molecular parameters.

257
Our specific recommendations for this problem are to group vibrational modes with similar experimental cross sections.

258
When more modes are considered, our strategy will be to group similar vibrational modes (i.e., symmetric or asymmetric stretches).

259
While the genetic algorithm has removed some degree of subjectivity in the inversion of small systems, with larger problems the spectroscopist still has to be clever.

260
A more typical five-mode system presents a much more difficult problem.

261
The broad and featureless absorption spectrum and fundamental Raman excitation profiles are shown in Fig. 10.

262
Splitting the Raman active modes into two groups with independent error functions (the intense profiles from modes 2, 3, and 5 and the less intense profiles from modes 1 and 4), we attempt to determine the molecular parameters from three objective functions.

263
With a population size of 100, within five generations there are 20–30% nondominated chromosomes.

264
While the fit to the absorption spectrum is generally good, the Raman excitation profiles are poorly represented by these nondominated chromosomes.

265
The range of parameters that provide nearly quantitative fits to the structureless absorption spectrum is large and spans regions of parameter space which do not produce acceptable Raman excitation profiles.

266
To focus the search on promising regions of parameter space, all of the absorption profiles with an error function below some tolerance value are set equal after 10 generations.

267
Using only the error functions associated with the Raman excitation profiles to determine domination quickly hones the population toward an optimal solution.

268
Three representative nondominated chromosomes are shown in Table 7 with a typical fit in Fig. 10.

269
While the parameters corresponding to all nondominated chromosomes fit the data well, some of the parameters deviate considerably from the target ones.

270
The errors associated with mode 1, for example, exceed 20%.

271
This seemingly paradoxical result occurs because, when the spectra are completely unstructured, the Raman profiles are not sensitive to the excited state frequencies and displacements independently.

272
The slope of the excited state surface along each normal coordinate is the principal factor in determining the intensity of the Raman excitation profiles.1,3,33

273
In Table 8, the slopes of the calculated and target excited state surfaces at the ground state minimum are compared.

274
Although the algorithm cannot reliably extract the displacement Δ and the frequency ratio ωe/ωg independently, the slope of the excited state surface can be determined even when the spectra have no structure.

275
The most useful test of our genetic algorithm is to measure its ability to invert the absorption spectrum and Raman excitation profiles of a randomly generated system against that of a human performing the “intelligently guided random search” methodology.

276
An independent third party was recruited to design a five-mode system with parameters falling within the ranges given in Table 1.

277
The absorption spectrum and Raman intensities shown in Fig. 11, but not the generating parameters, were presented to the authors.

278
One attempted to extract the parameters manually while the other employed the genetic algorithm.

279
In approximately 25 iterations requiring a total of 1.5 h of work, the manual search produced a good fit to the data and reasonable agreement with the target parameters; however, the excited-state frequencies could not be determined with any reliability, and were left equal to the ground-state frequencies for all but the most strongly coupled mode.

280
The genetic algorithm from Section 4.3 found a marginally better fit after 25 generations, with a reduced population of 50 chromosomes.

281
Using a 600 MHz G3 processor, each generation of 50 calculations took about 10 min.

282
Plainly stated, it took the algorithm 50 times as many calculations, and about twice as long, to find an optimal fit but it required almost no “human time”.

283
The genetic algorithm was able to return reasonably good values for the excited-state frequencies as well as the displacements and can therefore be judged more successful than the manual method, although we suspect that the introduction of experimental noise would reduce or eliminate this advantage.

Conclusions

284
Overall, the multiobjective genetic algorithm is an effective means of inverting spectroscopic data.

285
When spectra have well-resolved vibronic progressions, the inversion is nearly quantitative though the algorithm is computationally less efficient than its human counterpart.

286
As the spectra are smeared out by inhomogeneous broadening or large molecular reorganization energies, information is lost and multiple solutions provide good fits to the data.

287
When a unique solution cannot be determined, a range of suitable parameters can be extracted from the set of nondominated chromosomes.

288
This intrinsic determination of error bounds is unique to the genetic algorithm.

289
In those cases where no acceptable fit can be achieved, either experimental noise or physical effects not included in our model are at play.

290
The greatest success of the genetic algorithm over the intelligently guided random search is the ability to objectively decide when additional parameters are required to describe the data.