Temperature Scheduling in Simulated Annealing

The purpose of this page is to describe the notion of temperature scheduling in simulated annealing.  The type of temperature schedule used in one's algorithm and the size and difficulty of the ''terrain'' of the search space have connections to the convergence properties of the algorithm.  These topics are considered briefly below, with an emphasis on ''the diamond problem''.  Additionally, prospective temperature scheduling strategies are proposed, which will be studied in actual simulated annealing runs in the near future.
  
== Introduction to Temperature Scheduling ==
''...bringing a fluid into a low-energy state such as growing a crystal, has been considered...to be similar to the process of finding an optimum solution of a combinatorial optimization problem.  Annealing is a well-known process for growing crystals.  It consists of melting the fluid and then lowering the temperature slowly until the crystal is formed.  The rate of the decrease of temperature has to be very low around the freezing temperature.  The Metropolis Monte Carlo method...can be used to simulate the annealing process.  It has been proposed as an effective method for finding global minima of combinatorial optimization problems.''[[#References|[3]]]
  
Temperature scheduling in simulated annealing refers to the process of controlling the ''temperature'' of a particular run, either through a simple geometric sequence or through an adaptive sequence.  The temperature determines the probability that a solution with a higher cost function value will be accepted over one with a lower value, a move known as ''hill climbing'' [[#References|[1]]], [[#References|[3]]].  There are two main stages of temperature scheduling (warming up and cooling down), which are punctuated by two stopping conditions (the equilibrium and frozen criteria, respectively).  During the warming-up stage the temperature is raised high enough that the probability of accepting a neighboring solution is nearly one; such a high acceptance probability ensures that the final solution does not depend on the initial starting point.  The warming period usually lasts for a predefined number of ''steps'' (known as a chain length).  When the predefined number of steps has been completed, the annealing run has reached the equilibrium point, the term for the point at which warming ends and cooling begins.  Continuing the analogy to classical annealing, the cooling stage of the run is the period during which the temperature is lowered to reduce the acceptance ratio of lesser-quality solutions; this phase lets the algorithm settle on a final solution.  Depending on the parameters selected, the cooling phase ends after either a set number of steps or once the acceptance ratio drops below some value.  This point in the process is known as the freezing point (or frozen criterion) [[#References|[4]]].
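
To make the roles of the temperature and the acceptance probability concrete, the following minimal sketch implements a plain geometric schedule with the Metropolis acceptance rule.  It is an illustration only, not ParSA's implementation; the parameter names (t_start, alpha, chain_length, t_min) and the user-supplied cost and neighbor functions are assumptions made for the example.

<pre>
import math
import random

def anneal(cost, neighbor, x0, t_start=10.0, alpha=0.95,
           chain_length=100, t_min=1e-3):
    """Minimal simulated annealing with a geometric temperature schedule.

    cost(x)     -> cost function value of solution x (assumed supplied)
    neighbor(x) -> random neighboring solution of x (assumed supplied)
    """
    x, t = x0, t_start
    while t > t_min:                      # simplified frozen criterion
        for _ in range(chain_length):     # subchain at constant temperature
            y = neighbor(x)
            delta = cost(y) - cost(x)
            # Metropolis rule: always accept improvements; accept a worse
            # solution ("hill climbing") with probability exp(-delta/t),
            # which approaches one at high temperature.
            if delta <= 0 or random.random() < math.exp(-delta / t):
                x = y
        t *= alpha                        # geometric decrement T_n = alpha * T_{n-1}
    return x
</pre>

Lowering alpha speeds the cooling at the risk of freezing into a local minimum, while lengthening chain_length lets each temperature equilibrate more fully.
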
==ParSA Scheduling Capabilities==
The three cooling schedules featured in the ParSA library are SA_EasyScheduler, SA_AartsScheduler, and SA_MIRScheduler.  SA_EasyScheduler relies on user-set parameters: the initial temperature, the initial chain length per temperature (equilibrium), the decrement factor, and the frozen factor.  SA_AartsScheduler requires the user to set three parameters that govern the temperature scheduling process: epsilon, the frozen parameter; omega, the smoothing parameter; and delta, the distance parameter, which enters the temperature decrement calculation (its rough dependence is shown in Figure 1).  SA_MIRScheduler, which stands for Multiple Independent Runs, is based on the idea that several shorter chains linked together will find a more accurate solution than one sequential run of the same total length: the shorter chains collectively span a larger area of the search space, so the best of their solutions is more likely to lie close to the global optimum.
The following table summarizes ParSA's temperature scheduling classes; for more information see [[#References|[4]]].
  
<table BORDER="3" CELLSPACING="1" CELLPADDING="1">
<tr>
<td>'''ParSA Scheduling Class'''</td>
<td>'''Warming Up'''</td>
<td>'''Equilibrium'''</td>
<td>'''Cooling Down'''</td>
<td>'''Frozen'''</td>
</tr>
<tr>
<td>'''SA_EasyScheduler'''</td>
<td>''user defined''</td>
<td>''user defined''</td>
<td><math>T_n = \alpha T_{n-1}</math></td>
<td>''acceptance ratio less than a predefined value after a given number of temperature steps''</td>
</tr>
<tr>
<td>'''SA_AartsScheduler'''</td>
<td><math>T=\bar{\Delta C^{(+)}}\left(\ln{\frac{m_2}{m_2\chi_0-(1-\chi_0)m_1}}\right)^{-1}</math></td>
<td>''"length of a subchain with constant temperature is set to the size of the local neighborhood"''[[#References|[4]]]</td>
<td><math>T_n=T_{n-1}\left(1+\frac{\ln(1+\delta)\,T_{n-1}}{3\sigma(T_{n-1})}\right)^{-1}</math></td>
<td>''terminates when the mean value of the derivative of the cost function is less than a user-set value''</td>
</tr>
<tr>
<td>'''SA_MIRScheduler'''</td>
<td>''similar to SA_AartsScheduler''</td>
<td><math>T_{start}=-\frac{\Delta C_{max}}{\ln \chi_0}</math></td>
<td><math>T_n = \alpha T_{n-1}</math></td>
<td><math>T_{end}=-\frac{\Delta C_{min}}{\ln \chi_0}</math></td>
</tr>
</table>
  
{|align=center spacing="10"
|<table BORDER="3" CELLSPACING="1" CELLPADDING="1">
<tr>
<td>'''Symbol'''</td>
<td>'''What It Represents'''</td>
</tr>
<tr>
<td>&chi;<sub>0</sub></td>
<td>(initial) acceptance ratio</td>
</tr>
<tr>
<td>&alpha;</td>
<td>constant cooling factor between 0 and 1</td>
</tr>
<tr>
<td>&sigma;(T<sub>n</sub>)</td>
<td>standard deviation of the cost values sampled at temperature T<sub>n</sub></td>
</tr>
<tr>
<td>&delta;</td>
<td>distance parameter (determines the speed of the temperature decrease)</td>
</tr>
<tr>
<td>&Delta;C<sup>(+)</sup></td>
<td>"the mean value of the differences between the cost function of all worse solutions" [[#References|[4]]]</td>
</tr>
<tr>
<td>m<sub>1</sub></td>
<td>number of better (cost-decreasing) solutions encountered during the SA_AartsScheduler pre-warming phase</td>
</tr>
<tr>
<td>m<sub>2</sub></td>
<td>number of worse (cost-increasing) solutions encountered during the SA_AartsScheduler pre-warming phase</td>
</tr>
</table>
|[[Image:Temp_delta.jpg|thumb|Figure 1: A rough portrayal of the dependence of the next temperature step on the delta parameter and the previous temperature's value, for the SA_AartsScheduler cooling phase]]
|}
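
As an illustration of how the symbols above enter the adaptive schedule, the sketch below transcribes the SA_AartsScheduler cooling step from the table.  It assumes &sigma;(T<sub>n-1</sub>) is estimated as the standard deviation of the cost values sampled during the previous subchain; the function name and arguments are illustrative, not ParSA's API.

<pre>
import math

def aarts_decrement(t_prev, costs_prev, delta):
    """One Aarts-style cooling step:
    T_n = T_{n-1} / (1 + ln(1 + delta) * T_{n-1} / (3 * sigma(T_{n-1}))).

    t_prev:     previous temperature T_{n-1}
    costs_prev: cost values sampled during the subchain at T_{n-1}
    delta:      distance parameter; larger delta means faster cooling
    """
    mean = sum(costs_prev) / len(costs_prev)
    sigma = math.sqrt(sum((c - mean) ** 2 for c in costs_prev) / len(costs_prev))
    if sigma == 0.0:
        return t_prev  # no cost variation observed; hold the temperature
    return t_prev / (1.0 + math.log(1.0 + delta) * t_prev / (3.0 * sigma))
</pre>

Because the decrement scales inversely with &sigma;, this schedule cools slowly where the cost landscape fluctuates strongly and faster where it is smooth, which is the behavior sketched in Figure 1.
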
  
==Difficulty of the Problem==
The size and ''terrain'' of the search space are the two main ingredients of a problem's difficulty.  Computing time increases drastically with the size (or dimensionality) of the search space [[#References|[1]]].  However, a higher-dimensional problem may still be unimodal or have very few local minima.  Thus, one must also take the shape of the search space into account when judging difficulty.
  
In ''the diamond problem'', a solution consists of values given to two amplitudes and three phases.  The amplitudes represent the intensity of the light from the reference mirror and from the diamond (the front and back of the diamond differ only by a constant factor).  The phases represent each of the three surfaces: the reference mirror and the front and back of the diamond.  Each of these quantities is expanded in Legendre polynomials up to a certain order, stored as a matrix whose numbers of rows and columns are each one greater than the order.  Since the Legendre polynomial order for each amplitude and phase is set by the user, the dimensionality of the problem varies with the desired order.  The lowest dimensionality ''the diamond problem'' could take on is 11, and the highest (cutting the Legendre polynomial order off at five) is 125.  In practice it will most likely fall in between, probably slightly greater than 50, since one wants to exploit the range of the Legendre polynomials fully to map the diamond surface as accurately as possible.
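
The dimensionality arithmetic can be made explicit with a small sketch.  It assumes, as described above, that a quantity expanded to Legendre order ''k'' is stored as a (''k''+1) &times; (''k''+1) coefficient matrix; with all five quantities at order four (5 &times; 5 matrices) this reproduces the 125-dimensional upper figure quoted above.

<pre>
def diamond_dimensionality(orders):
    """Total number of free parameters for the diamond problem, assuming a
    quantity of Legendre order k contributes a (k+1) x (k+1) coefficient
    matrix.  `orders` lists the orders of the two amplitudes and the three
    phases."""
    return sum((k + 1) ** 2 for k in orders)

# All five quantities at the maximum order of four (5 x 5 matrices):
print(diamond_dimensionality([4, 4, 4, 4, 4]))  # -> 125
</pre>
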
==Convergence and Prospective Strategies==
[[Image:SA_perform.jpg|thumb|Figure 2: An example of convergence for a simulated annealing run]]

Convergence, in the context of simulated annealing, refers to how (if at all) a particular algorithm approaches the correct solution (or, for very difficult problems, a nearly correct solution).  Many convergence proofs have been given for particular classes of simulated annealing algorithms ([[#References|[3]]] and [[#References|[7]]]), each with its own twist on cooling and on other aspects of the algorithm's implementation.  To understand convergence fully in a mathematical sense, one must look into the properties of Markov chains and their connections to Monte Carlo-like algorithms, a topic reserved for another wiki page.  However, citing [[#References|[2]]], the ParSA library suggests that the convergence speed is governed by the following equation:
{|width="50%"
|align="right"|
<math>P(X_n \not\in Cost_{min})=\left(\frac{K}{n}\right)^{\alpha}</math>
|align="center" width="80"|(1)
|}
where ''P'' is the convergence speed (the probability that the solution after a subchain of length ''n'' is still not a minimum-cost solution), ''n'' is the subchain length, and "''K'' and &alpha; are constants specific to the problem" [[#References|[4]]].
The primary objective for prospective cooling strategies lies in determining the &alpha; and ''K'' factors discussed in the ParSA manual's section on improving solution quality in less time.  Taking the natural logarithm of both sides of (1), and tracking both the chain length ''n'' and the convergence measure P(X<sub>n</sub> &notin; Cost<sub>min</sub>), gives a linear relationship among all four quantities:
{|width="50%"
|align="right"|
<math>\ln{P(X_n \not\in Cost_{min})} = \alpha \ln{K} - \alpha \ln{n}</math>
|align="center" width="80"|(2)
|}
Once a sufficient number of runs have been completed, the &alpha; and ''K'' factors will be known and can be exploited to find the most effective chain length for running multiple independent Markov chains.  Given the potential size of the search space, one can expect the &alpha; factor to lie closer to one than to zero: with the multiple-run strategy, faster cooling makes a particular chain settle very quickly into a minimum (which may only be a local minimum).  After settling, the cluster can move on to a new chain, which settles into another minimum.  If this is repeated, the chance of finding the global minimum among the resulting solutions is much greater than if only one chain were used [[#References|[4]]].
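
As a hedged sketch of how &alpha; and ''K'' might be extracted from such runs: equation (2) is linear in ln ''n'', so an ordinary least-squares fit of ln P against ln n yields slope -&alpha; and intercept &alpha; ln ''K''.  The function below is illustrative; the data it expects (miss rates measured at several chain lengths) are an assumption about how the runs would be instrumented.

<pre>
import math

def fit_alpha_k(chain_lengths, miss_rates):
    """Least-squares fit of equation (2): ln P = alpha*ln K - alpha*ln n.

    chain_lengths: subchain lengths n used in the completed runs
    miss_rates:    for each n, the observed fraction of runs whose final
                   solution was not a minimum-cost solution; assumes
                   0 < p <= 1 and at least two distinct chain lengths
    Returns the fitted (alpha, K)."""
    xs = [math.log(n) for n in chain_lengths]
    ys = [math.log(p) for p in miss_rates]
    m = len(xs)
    x_bar, y_bar = sum(xs) / m, sum(ys) / m
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) /
             sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    alpha = -slope                    # slope of ln P versus ln n is -alpha
    k = math.exp(intercept / alpha)   # intercept is alpha * ln K
    return alpha, k
</pre>
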
The figures to the right are convergence graphs for different optimization algorithms.  In Figure 2 [[#References|[6]]], the curves marked with triangles represent simulated annealing algorithms similar to ParSA: the curve whose triangles point downward (base up) denotes adaptive simulated annealing, while the curve whose triangles point upward (base down) denotes non-adaptive simulated annealing.  In Figures 3 and 4 [[#References|[5]]], the dotted line labeled SA represents simulated annealing.
{|align=center
|[[Image:SA_perform_2.jpg|thumb|Figure 3: Another example of convergence for a simulated annealing run]]
|[[Image:SA_perform_3.jpg|thumb|Figure 4: Another example of convergence for a simulated annealing run]]
|}
==References==

<div class="references-small">
1. Esin Onbasoglu and Linet Ozdamar.  ''Parallel Simulated Annealing Algorithms in Global Optimization.''  Journal of Global Optimization.  2001.

2. S.Y. Lee and K.G. Lee.  ''Synchronous and Asynchronous Parallel Simulated Annealing with Multiple Markov Chains.''  IEEE Transactions on Parallel and Distributed Systems.  1991.

3. Debasis Mitra, Fabio Romeo, and Alberto Sangiovanni-Vincentelli.  ''Convergence and Finite-Time Behavior of Simulated Annealing.''  Advances in Applied Probability.  1986.

4. Georg Kliewer and Karsten Klohs.  ''Parallel Simulated Annealing Library (parSA) User Manual.''  Version 2.2.

5. Hongbo Liu, Ajith Abraham and Weishi Zhang.  ''A fuzzy adaptive turbulent particle swarm optimisation.''  Int. J. Innovative Computing and Applications.  2007.

6. Hyeong Soo Chang, Michael C. Fu, Jiaqiao Hu and Steven I. Marcus.  ''An Asymptotically Efficient Simulation-Based Algorithm for Finite Horizon Stochastic Dynamic Programming.''  IEEE Transactions on Automatic Control.  2007.

7. Claude J. P. Belisle.  ''Convergence Theorems for a Class of Simulated Annealing Algorithms on &real;<sup>d</sup>.''  Journal of Applied Probability.  1992.

</div>
