Optimization algorithms

1. Search space size in the real world

The number of possible solutions for a planning problem can be mind-blowing. For example:

  • Four queens has 256 possible solutions (4^4) and two optimal solutions.

  • Five queens has 3125 possible solutions (5^5) and one optimal solution.

  • Eight queens has 16777216 possible solutions (8^8) and 92 optimal solutions.

  • 64 queens has more than 10^115 possible solutions (64^64).

  • Most real-life planning problems have an incredible number of possible solutions and only one or a few optimal solutions.

For comparison: the number of atoms in the known universe is estimated at a minimum of 10^80. As a planning problem gets bigger, the search space tends to blow up very quickly. Adding only one extra planning entity or planning value can heavily multiply the running time of some algorithms.

[Figure: Cloud Balance search space size]

Calculating the number of possible solutions depends on the design of the domain model:

[Figure: search space size calculation]
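
For example, in a Cloud Balancing style model where every process independently gets assigned one computer, the search space size is the number of computers to the power of the number of processes. A minimal sketch with hypothetical counts (plain Python, not OptaPy API):

computer_count = 100   # planning values (hypothetical count)
process_count = 300    # planning entities (hypothetical count)

# Every process independently picks one computer:
search_space_size = computer_count ** process_count
print(f"about 10^{len(str(search_space_size)) - 1} possible solutions")  # about 10^600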

This search space size calculation includes infeasible solutions (if they can be represented by the model), because:

  • The optimal solution might be infeasible.

  • There are many types of hard constraints that cannot be incorporated in the formula practically. For example, in Cloud Balancing, try incorporating the CPU capacity constraint in the formula.

Even in cases where adding some of the hard constraints in the formula is practical (for example, Course Scheduling), the resulting search space is still huge.

An algorithm that checks every possible solution (even with pruning, such as in Branch And Bound) can easily run for billions of years on a single real-life planning problem. The aim is to find the best solution in the available timeframe. Planning competitions (such as the International Timetabling Competition) show that Local Search variations (Tabu Search, Simulated Annealing, Late Acceptance, …​) usually perform best for real-world problems given real-world time limitations.
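
For intuition, here is a tiny pure-Python brute force (not OptaPy code) over the four queens search space listed above; it checks all 4^4 = 256 candidates and finds the two optimal solutions:

from itertools import product

# Each queen owns one column and picks one of four rows: 4^4 = 256 candidates.
candidates = list(product(range(4), repeat=4))
assert len(candidates) == 256

def conflict_count(rows):
    # Count queen pairs that share a row or a diagonal (columns are fixed).
    return sum(1
               for a in range(4) for b in range(a + 1, 4)
               if rows[a] == rows[b] or abs(rows[a] - rows[b]) == b - a)

optimal_solutions = [rows for rows in candidates if conflict_count(rows) == 0]
print(len(optimal_solutions))  # 2, matching the list above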

2. Does OptaPy find the optimal solution?

The business wants the optimal solution, but they also have other requirements:

  • Scale out: Large production data sets must not crash and must also deliver good results.

  • Optimize the right problem: The constraints must match the actual business needs.

  • Available time: The solution must be found in time, before it becomes useless to execute.

  • Reliability: Every data set must have at least a decent result (better than a human planner).

Given these requirements, and despite the promises of some salesmen, it is usually impossible for anyone or anything to find the optimal solution. Therefore, OptaPy focuses on finding the best solution in available time.

The nature of NP-complete problems makes scaling a prime concern.

The quality of a result from a small data set is no indication of the quality of a result from a large data set.

Scaling issues cannot be mitigated by hardware purchases later on. Start testing with a production-sized data set as soon as possible. Do not assess quality on small data sets (unless production encounters only such data sets). Instead, solve a production-sized data set and compare the results of longer executions, different algorithms and, if available, the human planner.

3. Architecture overview

OptaPy uses OptaPlanner, the first framework to combine optimization algorithms (metaheuristics, …​) with score calculation by a rule engine (such as Drools). This combination is very efficient, because:

  • A rule engine, such as Drools, is great for calculating the score of a solution of a planning problem. It makes it easy and scalable to add additional soft or hard constraints. It does incremental score calculation (deltas) without any extra code. However, it tends not to be suitable for actually finding new solutions.

  • An optimization algorithm is great at finding new improving solutions for a planning problem, without necessarily brute-forcing every possibility. However, it needs to know the score of a solution and offers no support in calculating that score efficiently.

[Figure: architecture overview]

4. Optimization algorithms overview

OptaPy supports three families of optimization algorithms: Exhaustive Search, Construction Heuristics and Metaheuristics. In practice, Metaheuristics (in combination with Construction Heuristics to initialize) are the recommended choice:

[Figure: scalability of optimization algorithms]

Each of these algorithm families has multiple optimization algorithms:

Table 1. Optimization Algorithms Overview

| Algorithm                                          | Scalable? | Optimal? | Easy to use? | Tweakable? | Requires CH? |
|----------------------------------------------------|-----------|----------|--------------|------------|--------------|
| Exhaustive Search (ES)                             |           |          |              |            |              |
|   Brute Force                                      | 0/5       | 5/5      | 5/5          | 0/5        | No           |
|   Branch And Bound                                 | 0/5       | 5/5      | 4/5          | 2/5        | No           |
| Construction heuristics (CH)                       |           |          |              |            |              |
|   First Fit                                        | 5/5       | 1/5      | 5/5          | 1/5        | No           |
|   First Fit Decreasing                             | 5/5       | 2/5      | 4/5          | 2/5        | No           |
|   Weakest Fit                                      | 5/5       | 2/5      | 4/5          | 2/5        | No           |
|   Weakest Fit Decreasing                           | 5/5       | 2/5      | 4/5          | 2/5        | No           |
|   Strongest Fit                                    | 5/5       | 2/5      | 4/5          | 2/5        | No           |
|   Strongest Fit Decreasing                         | 5/5       | 2/5      | 4/5          | 2/5        | No           |
|   Cheapest Insertion                               | 3/5       | 2/5      | 5/5          | 2/5        | No           |
|   Regret Insertion                                 | 3/5       | 2/5      | 5/5          | 2/5        | No           |
| Metaheuristics (MH): Local Search (LS)             |           |          |              |            |              |
|   Hill Climbing                                    | 5/5       | 2/5      | 4/5          | 3/5        | Yes          |
|   Tabu Search                                      | 5/5       | 4/5      | 3/5          | 5/5        | Yes          |
|   Simulated Annealing                              | 5/5       | 4/5      | 2/5          | 5/5        | Yes          |
|   Late Acceptance                                  | 5/5       | 4/5      | 3/5          | 5/5        | Yes          |
|   Great Deluge                                     | 5/5       | 4/5      | 3/5          | 5/5        | Yes          |
|   Step Counting Hill Climbing                      | 5/5       | 4/5      | 3/5          | 5/5        | Yes          |
|   Variable Neighborhood Descent                    | 3/5       | 3/5      | 2/5          | 5/5        | Yes          |
| Metaheuristics (MH): Evolutionary Algorithms (EA)  |           |          |              |            |              |
|   Evolutionary Strategies                          | 3/5       | 3/5      | 2/5          | 5/5        | Yes          |
|   Genetic Algorithms                               | 3/5       | 3/5      | 2/5          | 5/5        | Yes          |

To learn more about metaheuristics, see Essentials of Metaheuristics or Clever Algorithms.

5. Which optimization algorithms should I use?

The best optimization algorithms configuration to use depends heavily on your use case. However, this basic procedure provides a good starting configuration that will produce better than average results.

  1. Start with a quick configuration that involves little or no configuration and optimization code: see First Fit.

  2. Next, turn it into First Fit Decreasing and add Late Acceptance behind it:

    1. First Fit Decreasing.

    2. Late Acceptance.

At this point, the return on invested time lowers and the result is likely to be sufficient.

However, the result can still be improved, at a lower return on invested time.

6. Power tweaking or default parameter values

Many optimization algorithms have parameters that affect results and scalability. OptaPy applies configuration by exception, so all optimization algorithms have default parameter values. This is very similar to the Garbage Collection parameters in a JVM: most users have no need to tweak them, but power users often do.

The default parameter values are sufficient for many cases (and especially for prototypes). The documentation for each optimization algorithm also declares the advanced configuration for power tweaking.

The default parameter values may change between minor versions to improve them for most users. The advanced configuration can be used to prevent such unwanted changes; however, this is not recommended.

7. Solver phase

A Solver can use multiple optimization algorithms in sequence. Each optimization algorithm is represented by one solver Phase. There is never more than one Phase solving at the same time.

Some Phase implementations can combine techniques from multiple optimization algorithms, but it is still just one Phase. For example: a Local Search Phase can do Simulated Annealing with entity Tabu.

Here is a configuration that runs three phases in sequence:

<solver xmlns="https://www.optaplanner.org/xsd/solver" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://www.optaplanner.org/xsd/solver https://www.optaplanner.org/xsd/solver/solver.xsd">
  ...
  <constructionHeuristic>
    ... <!-- First phase: First Fit Decreasing -->
  </constructionHeuristic>
  <localSearch>
    ... <!-- Second phase: Late Acceptance -->
  </localSearch>
  <localSearch>
    ... <!-- Third phase: Tabu Search -->
  </localSearch>
</solver>

The solver phases are run in the order defined by the solver configuration.

  • When the first Phase terminates, the second Phase starts, and so on.

  • When the last Phase terminates, the Solver terminates.

Usually, a Solver will first run a construction heuristic and then run one or multiple metaheuristics:

[Figure: general phase sequence]

If no phases are configured, OptaPy will default to a Construction Heuristic phase followed by a Local Search phase.
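
For example, a programmatic configuration without any explicit phases (so that this default applies) might look like the sketch below. It assumes the SolverConfig and solver_factory_create API used in the OptaPy quickstarts; the domain classes (CloudBalance, CloudProcess) and constraint provider (define_constraints) are hypothetical placeholders:

from optapy import solver_factory_create
from optapy.types import SolverConfig, Duration

# No constructionHeuristic or localSearch phase is configured, so OptaPy
# defaults to a Construction Heuristic phase followed by a Local Search phase.
solver_config = SolverConfig() \
    .withSolutionClass(CloudBalance) \
    .withEntityClasses(CloudProcess) \
    .withConstraintProviderClass(define_constraints) \
    .withTerminationSpentLimit(Duration.ofMinutes(5))

solver = solver_factory_create(solver_config).buildSolver()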

Some phases (especially construction heuristics) will terminate automatically. Other phases (especially metaheuristics) will only terminate if the Phase is configured to terminate:

<solver xmlns="https://www.optaplanner.org/xsd/solver" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://www.optaplanner.org/xsd/solver https://www.optaplanner.org/xsd/solver/solver.xsd">
  ...
  <termination><!-- Solver termination -->
    <secondsSpentLimit>90</secondsSpentLimit>
  </termination>
  <localSearch>
    <termination><!-- Phase termination -->
      <secondsSpentLimit>60</secondsSpentLimit><!-- Give the next phase a chance to run too, before the Solver terminates -->
    </termination>
    ...
  </localSearch>
  <localSearch>
    ...
  </localSearch>
</solver>

If the Solver terminates (before the last Phase terminates itself), the current phase is terminated and all subsequent phases will not run.

8. Scope overview

A solver will iteratively run phases. Each phase will usually iteratively run steps. Each step, in turn, usually iteratively runs moves. These form four nested scopes:

  1. Solver

  2. Phase

  3. Step

  4. Move

[Figure: scope overview]

Configure logging to display the log messages of each scope.

9. Termination

Not all phases terminate automatically, and solving may take a significant amount of time. A Solver can be terminated synchronously by up-front configuration, or asynchronously from another thread.

Metaheuristic phases in particular need to be instructed to stop solving. This can be for a number of reasons: for example, the time is up, or the perfect score has been reached just before its solution is used. Finding the optimal solution cannot be relied upon (unless you know the optimal score), because a metaheuristic algorithm is generally unaware of the optimal solution.

This is not an issue for real-life problems, as finding the optimal solution may take more time than is available. Finding the best solution in the available time is the most important outcome.

If no termination is configured (and a metaheuristic algorithm is used), the Solver will run forever, until terminateEarly() is called from another thread. This is especially common during real-time planning.

For synchronous termination, configure a Termination on a Solver or a Phase when it needs to stop. The built-in implementations of these should be sufficient, but custom terminations are supported too. Every Termination can calculate a time gradient (needed for some optimization algorithms), which is a ratio between the time already spent solving and the estimated entire solving time of the Solver or Phase.
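
For example, halfway through a phase with a 90 second spent limit, the time gradient is 0.5 (plain Python illustration, not OptaPy API):

seconds_spent_limit = 90   # estimated entire solving time of the Phase
seconds_spent = 45         # time already spent solving
time_gradient = seconds_spent / seconds_spent_limit  # 0.5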

9.1. Time spent termination

Terminates when an amount of time has been used.

  <termination>
    <!-- 2 minutes and 30 seconds in ISO 8601 format P[n]Y[n]M[n]DT[n]H[n]M[n]S -->
    <spentLimit>PT2M30S</spentLimit>
  </termination>

Alternatively to a duration in ISO 8601 format, you can also use:

  • Milliseconds

      <termination>
        <millisecondsSpentLimit>500</millisecondsSpentLimit>
      </termination>
  • Seconds

      <termination>
        <secondsSpentLimit>10</secondsSpentLimit>
      </termination>
  • Minutes

      <termination>
        <minutesSpentLimit>5</minutesSpentLimit>
      </termination>
  • Hours

      <termination>
        <hoursSpentLimit>1</hoursSpentLimit>
      </termination>
  • Days

      <termination>
        <daysSpentLimit>2</daysSpentLimit>
      </termination>

Multiple time types can be used together. For example, to configure 150 minutes, either configure it directly:

  <termination>
    <minutesSpentLimit>150</minutesSpentLimit>
  </termination>

Or use a combination that sums up to 150 minutes:

  <termination>
    <hoursSpentLimit>2</hoursSpentLimit>
    <minutesSpentLimit>30</minutesSpentLimit>
  </termination>
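
When the SolverConfig is built programmatically in Python instead of XML, the same 150 minute limit can presumably be set as a single Duration (this assumes the quickstart-style SolverConfig API from the sketch in the solver phase section; verify against your OptaPy version):

from optapy.types import Duration  # assumed import, as in the earlier sketch

# Equivalent of the 150-minute XML examples above.
solver_config.withTerminationSpentLimit(Duration.ofHours(2).plusMinutes(30))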

This Termination will most likely sacrifice perfect reproducibility (even with environmentMode REPRODUCIBLE) because the available CPU time differs frequently between runs:

  • The available CPU time influences the number of steps that can be taken, which might be a few more or less.

  • The Termination might produce slightly different time gradient values, which will send time gradient-based algorithms (such as Simulated Annealing) on a radically different path.

9.2. Unimproved time spent termination

Terminates when the best score has not improved in a specified amount of time. Each time a new best solution is found, the timer basically resets.

  <localSearch>
    <termination>
      <!-- 2 minutes and 30 seconds in ISO 8601 format P[n]Y[n]M[n]DT[n]H[n]M[n]S -->
      <unimprovedSpentLimit>PT2M30S</unimprovedSpentLimit>
    </termination>
  </localSearch>

Alternatively to a duration in ISO 8601 format, you can also use:

  • Milliseconds

      <localSearch>
        <termination>
          <unimprovedMillisecondsSpentLimit>500</unimprovedMillisecondsSpentLimit>
        </termination>
      </localSearch>
  • Seconds

      <localSearch>
        <termination>
          <unimprovedSecondsSpentLimit>10</unimprovedSecondsSpentLimit>
        </termination>
      </localSearch>
  • Minutes

      <localSearch>
        <termination>
          <unimprovedMinutesSpentLimit>5</unimprovedMinutesSpentLimit>
        </termination>
      </localSearch>
  • Hours

      <localSearch>
        <termination>
          <unimprovedHoursSpentLimit>1</unimprovedHoursSpentLimit>
        </termination>
      </localSearch>
  • Days

      <localSearch>
        <termination>
          <unimprovedDaysSpentLimit>1</unimprovedDaysSpentLimit>
        </termination>
      </localSearch>

Just like time spent termination, combinations are summed up.

It is preferred to configure this termination on a specific Phase (such as <localSearch>) instead of on the Solver itself.

This Termination will most likely sacrifice perfect reproducibility (even with environmentMode REPRODUCIBLE) as the available CPU time differs frequently between runs:

  • The available CPU time influences the number of steps that can be taken, which might be a few more or less.

  • The Termination might produce slightly different time gradient values, which will send time gradient based algorithms (such as Simulated Annealing) on a radically different path.

Optionally, configure a score difference threshold by which the best score must improve in the specified time. For example, if the score doesn’t improve by at least 100 soft points every 30 seconds or less, it terminates:

  <localSearch>
    <termination>
      <unimprovedSecondsSpentLimit>30</unimprovedSecondsSpentLimit>
      <unimprovedScoreDifferenceThreshold>0hard/100soft</unimprovedScoreDifferenceThreshold>
    </termination>
  </localSearch>

If the score improves by 1 hard point and drops 900 soft points, it still meets the threshold, because 1hard/-900soft is larger than the threshold 0hard/100soft.

On the other hand, a threshold of 1hard/0soft is not met by any new best solution that improves 1 hard point at the expense of 1 or more soft points, because 1hard/-100soft is smaller than the threshold 1hard/0soft.
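
The comparison is lexicographic, level by level, which a plain Python illustration (not OptaPy API) makes explicit:

# Score differences compare like tuples: the hard level dominates the soft level.
print((1, -900) >= (0, 100))   # True:  1hard/-900soft meets a 0hard/100soft threshold
print((1, -100) >= (1, 0))     # False: 1hard/-100soft misses a 1hard/0soft threshold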

To require a feasibility improvement every 30 seconds while avoiding the pitfall above, use a wildcard * for lower score levels that are allowed to deteriorate if a higher score level improves:

  <localSearch>
    <termination>
      <unimprovedSecondsSpentLimit>30</unimprovedSecondsSpentLimit>
      <unimprovedScoreDifferenceThreshold>1hard/*soft</unimprovedScoreDifferenceThreshold>
    </termination>
  </localSearch>

This effectively implies a threshold of 1hard/-2147483648soft, because it relies on Integer.MIN_VALUE.

9.3. BestScoreTermination

BestScoreTermination terminates when a certain score has been reached. Use this Termination where the perfect score is known, for example for four queens (which uses a SimpleScore):

  <termination>
    <bestScoreLimit>0</bestScoreLimit>
  </termination>

A planning problem with a HardSoftScore may look like this:

  <termination>
    <bestScoreLimit>0hard/-5000soft</bestScoreLimit>
  </termination>

A planning problem with a BendableScore with three hard levels and one soft level may look like this:

  <termination>
    <bestScoreLimit>[0/0/0]hard/[-5000]soft</bestScoreLimit>
  </termination>

In this instance, terminating once a feasible solution has been reached is not practical, because it would require a bestScoreLimit such as 0hard/-2147483648soft. Use the next termination instead.

9.4. BestScoreFeasibleTermination

Terminates as soon as a feasible solution has been discovered.

  <termination>
    <bestScoreFeasible>true</bestScoreFeasible>
  </termination>

This Termination is usually combined with other terminations.

9.5. StepCountTermination

Terminates when a number of steps has been reached. This is useful for hardware performance independent runs.

  <localSearch>
    <termination>
      <stepCountLimit>100</stepCountLimit>
    </termination>
  </localSearch>

This Termination can only be used for a Phase (such as <localSearch>), not for the Solver itself.

9.6. UnimprovedStepCountTermination

Terminates when the best score has not improved in a number of steps. This is useful for hardware performance independent runs.

  <localSearch>
    <termination>
      <unimprovedStepCountLimit>100</unimprovedStepCountLimit>
    </termination>
  </localSearch>

If the score has not improved recently, it is unlikely to improve in a reasonable timeframe. It has been observed that once a new best solution is found (even after a long time without improvement on the best solution), the next few steps tend to improve the best solution.

This Termination can only be used for a Phase (such as <localSearch>), not for the Solver itself.

9.7. ScoreCalculationCountTermination

ScoreCalculationCountTermination terminates when a number of score calculations have been reached. This is often the sum of the number of moves and the number of steps. This is useful for benchmarking.

  <termination>
    <scoreCalculationCountLimit>100000</scoreCalculationCountLimit>
  </termination>

Switching EnvironmentMode can heavily impact when this termination ends.

9.8. Combining multiple terminations

Terminations can be combined, for example: terminate after 100 steps or if a score of 0 has been reached:

  <termination>
    <terminationCompositionStyle>OR</terminationCompositionStyle>
    <bestScoreLimit>0</bestScoreLimit>
    <stepCountLimit>100</stepCountLimit>
  </termination>

Alternatively you can use AND, for example: terminate after reaching a feasible score of at least -100 and no improvements in 5 steps:

  <termination>
    <terminationCompositionStyle>AND</terminationCompositionStyle>
    <bestScoreLimit>-100</bestScoreLimit>
    <unimprovedStepCountLimit>5</unimprovedStepCountLimit>
  </termination>

This example ensures it does not just terminate after finding a feasible solution, but also completes any obvious improvements on that solution before terminating.

9.9. Asynchronous termination from another thread

Asynchronous termination from another thread occurs when a Solver needs to be terminated early from another thread, for example, due to a user action or a server restart. This cannot be configured by a Termination as it is impossible to predict when and if it will occur. Therefore the Solver interface has the following thread-safe methods:

class Solver:
    ...
    def terminateEarly(self) -> bool:
        ...
    def isTerminateEarly(self) -> bool:
        ...

When calling the terminateEarly() method from another thread, the Solver will terminate at its earliest convenience and the solve(Solution) method will return (in the original Solver thread).

Interrupting the Solver thread (which is the thread that called Solver.solve(Solution)) has the same effect as calling terminateEarly(), except that it leaves that thread in the interrupted state. This guarantees a graceful shutdown when an Executor (such as ThreadPoolExecutor) is shut down, because that only interrupts all active threads in the pool.
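
A minimal sketch of such an early termination, assuming a solver and problem have already been created (for example as in the configuration examples above):

import threading
import time

result = {}

def run_solver():
    # solve() blocks in this background thread until a Termination fires
    # or terminateEarly() is called from another thread.
    result["best_solution"] = solver.solve(problem)

solver_thread = threading.Thread(target=run_solver)
solver_thread.start()

time.sleep(5.0)           # for example, the user clicks "Stop" after five seconds
solver.terminateEarly()   # thread-safe
solver_thread.join()      # solve() returns the best solution found so far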

10. SolverEventListener

Each time a new best solution is found, a new BestSolutionChangedEvent is fired in the Solver thread.

To listen to such events, add a SolverEventListener to the Solver:

class Solver:
    ...
    def addEventListener(self, event_listener: Callable[[SolverEvent], None]) -> None:
        ...
    def removeEventListener(self, event_listener: Callable[[SolverEvent], None]) -> None:
        ...

The solution returned by the BestSolutionChangedEvent's getNewBestSolution() may not be initialized or feasible. Use the isFeasible() method on the BestSolutionChangedEvent's new best Score to detect such cases:

def new_best_solution_listener(event):
    # Ignore infeasible (including uninitialized) solutions
    if event.getNewBestSolution().getScore().isFeasible():
        ...


solver.addEventListener(new_best_solution_listener)

Use Score.isSolutionInitialized() instead of Score.isFeasible() to only ignore uninitialized solutions, but also accept infeasible solutions.

The event listener is called in the solver’s thread, as part of Solver.solve(), so it should return quickly to avoid slowing down the solving.
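
One way to keep the listener fast is to hand each new best solution to another thread through a queue and do the expensive work (displaying, persisting) there. A sketch using the listener API above:

from queue import Queue

best_solution_queue = Queue()

def enqueue_best_solution(event):
    # Runs in the solver thread as part of Solver.solve(), so only enqueue here.
    best_solution_queue.put(event.getNewBestSolution())

solver.addEventListener(enqueue_best_solution)

# In another thread (for example a UI or persistence thread):
#     solution = best_solution_queue.get()
#     ...  # display or persist the solution outside the solver thread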

11. Custom solver phase

Custom solver phases are currently not supported in OptaPy.

12. No change solver phase

In rare cases, it’s useful not to run any solver phases. But by default, configuring no phase will trigger running the default phases. To avoid those, configure a NoChangePhase:

<solver xmlns="https://www.optaplanner.org/xsd/solver" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://www.optaplanner.org/xsd/solver https://www.optaplanner.org/xsd/solver/solver.xsd">
  ...
  <noChangePhase/>
</solver>

13. Multithreaded solving

Python has a Global Interpreter Lock (GIL), and as a result, only one thread can run at a time in Python. This prevents OptaPy from utilizing multiple cores on a machine, since only one of the cores can execute Python code at a time. Hence, utilizing multithreaded solving will probably degrade performance instead of improving it at this time.

There are several ways of doing multithreaded solving:

  • Multitenancy: solve different datasets in parallel

    • The SolverManager will make it even easier to set this up, in a future version.

  • Multi bet solving: solve 1 dataset with multiple, isolated solvers and take the best result.

    • Not recommended: This is a marginal gain for a high cost of hardware resources.

    • Use multithreaded incremental solving instead.

  • Partitioned Search: Split 1 dataset in multiple parts and solve them independently.

  • Multithreaded incremental solving: solve 1 dataset with multiple threads without sacrificing incremental score calculation.

    • Donate a portion of your CPU cores to OptaPy to scale up the score calculation speed and get the same results in a fraction of the time.

    • Configure multithreaded incremental solving.

[Figure: multithreading strategies]

A logging level of debug or trace might cause congestion in multithreaded solving and slow down the score calculation speed.

13.1. @planning_id

For some functionality (such as multithreaded solving and real-time planning), OptaPy needs to map problem facts and planning entities to an ID. OptaPy uses that ID to rebase a move from one thread’s solution state to another’s.

To enable such functionality, specify the @planning_id annotation on the identification field or getter method, for example on the database ID:

from optapy import planning_entity, planning_id

@planning_entity
class CloudComputer:
    computer_id: int
    ...
    @planning_id
    def get_computer_id(self):
        return self.computer_id

Or alternatively, on another type of ID:

from optapy import problem_fact, planning_id

@problem_fact
class User:
    user_name: str
    ...
    @planning_id
    def get_user_name(self):
        return self.user_name

A @planning_id property must be:

  • Unique for that specific class

    • It does not need to be unique across different problem fact classes (except in the rare case that those classes are mixed in the same value range or planning entity collection).

  • An instance of a type that implements __hash__(self) and __eq__(self, other).

    • It’s recommended to use the type int, str or uuid.

  • Never None by the time Solver.solve() is called.
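
For example, a sketch that satisfies these requirements with a generated uuid (the Room class is hypothetical):

from uuid import uuid4
from optapy import problem_fact, planning_id

@problem_fact
class Room:
    def __init__(self):
        # Unique within Room; str implements __hash__ and __eq__ and is never None.
        self.id = str(uuid4())

    @planning_id
    def get_id(self):
        return self.id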

13.2. Multithreaded incremental solving

Enable multithreaded incremental solving by adding a @planning_id annotation on every planning entity class and planning value class. Then configure a moveThreadCount:

<solver xmlns="https://www.optaplanner.org/xsd/solver" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://www.optaplanner.org/xsd/solver https://www.optaplanner.org/xsd/solver/solver.xsd">
  <moveThreadCount>AUTO</moveThreadCount>
  ...
</solver>

Advanced configuration:

<solver xmlns="https://www.optaplanner.org/xsd/solver" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://www.optaplanner.org/xsd/solver https://www.optaplanner.org/xsd/solver/solver.xsd">
  <moveThreadCount>4</moveThreadCount>
  <moveThreadBufferSize>10</moveThreadBufferSize>
  <threadFactoryClass>...MyAppServerThreadFactory</threadFactoryClass>
  ...
</solver>

The following moveThreadCounts are supported:

  • NONE (default): Don’t run any move threads. Use the single threaded code.

  • AUTO: Let OptaPy decide how many move threads to run in parallel. On machines or containers with little or no CPU, this falls back to the single threaded code.

  • Static number: The number of move threads to run in parallel.

    <moveThreadCount>4</moveThreadCount>

    This can be 1 to enforce running the multithreaded code with only 1 move thread (which is less efficient than NONE).

It is counterproductive to set a moveThreadCount that is higher than the number of available CPU cores, as that will slow down the score calculation speed. The number of available cores in Python is effectively 1, due to the Global Interpreter Lock (GIL). One good reason to do it anyway is to reproduce a bug of a high-end production machine.

Multithreaded solving is still reproducible, as long as the resolved moveThreadCount is stable. A run of the same solver configuration on two machines with a different number of CPUs is still reproducible, unless the moveThreadCount is set to AUTO or a function of availableProcessorCount.

The moveThreadBufferSize power tweaks the number of moves that are selected but won’t be foraged. Setting it too low reduces performance, but so does setting it too high. Unless you’re deeply familiar with the inner workings of multithreaded solving, don’t configure this parameter.