
RCRA 2011

The 18th RCRA International Workshop on
“Experimental Evaluation of Algorithms for solving
problems with combinatorial explosion”

A workshop of the
22nd International Joint Conference on Artificial Intelligence (IJCAI 2011)

Barcelona, 17–18 July 2011


Preface

Many problems in Artificial Intelligence show an exponential explosion of the search space. Although stemming from different research areas in AI, such problems are often addressed with algorithms that share a common goal: the effective exploration of huge state spaces. Many algorithms developed in one research area are applicable to other problems, or can be hybridised with techniques from other areas. Artificial Intelligence tools often exploit or hybridise techniques developed by other research communities, such as Operations Research.

In recent years, research in AI has increasingly focused on the experimental evaluation of algorithms, the development of suitable methodologies for experimentation and analysis, the study of languages, and the implementation of systems for the definition and solution of problems.

The scope of this workshop series is to foster the cross-fertilisation of ideas stemming from different areas, to propose benchmarks for new challenging problems, to compare models and algorithms from an experimental viewpoint, and, in general, to compare different approaches with respect to efficiency, problem modelling, and ease of development.

RCRA workshops are organised by “Rappresentazione della Conoscenza e Ragionamento Automatico” (RCRA, rcra.aixia.it), a scientific community interested in Knowledge Representation and Automated Reasoning. The RCRA group, founded in 1993, is part of the Italian Association for Artificial Intelligence (AI*IA, www.aixia.it).



Workshop committees

Workshop chairs

Marco Gavanelli, University of Ferrara, Italy
Toni Mancini, Sapienza University, Roma, Italy

Programme committee

Marco Alberti, Universidade Nova de Lisboa, Portugal
Tolga Bektas, University of Southampton, UK
Francesco Calimeri, University of Calabria, Italy
Agostino Dovier, University of Udine, Italy
Esra Erdem, Sabanci University, Istanbul, Turkey
Wolfgang Faber, University of Calabria, Italy
Pierre Flener, Uppsala University, Sweden
Scott E. Grasman, Missouri University of Science and Technology, USA
Angel Juan, IN3-Universitat Oberta de Catalunya, Barcelona, Spain
Henry Kautz, University of Rochester, USA
Daniel Le Berre, Université d’Artois, Lens Cedex, France
Inês Lynce, INESC-ID, Lisboa, Portugal
Marco Maratea, University of Genova, Italy
Joao Marques-Silva, University College Dublin, Ireland
Michela Milano, University of Bologna, Italy
Massimo Narizzano, University of Genova, Italy
Angelo Oddi, ISTC-CNR, Roma, Italy
Gilles Pesant, University of Montreal, Canada
Pilar Pozos Parra, Universidad Juarez Autonoma de Tabasco, Tabasco, Mexico
Steve Prestwich, University College Cork, Ireland
Helena Ramalhinho Dias Lourenço, Universitat Pompeu Fabra, Barcelona, Spain
Daniel Riera i Terrén, IN3-Universitat Oberta de Catalunya, Barcelona, Spain
Fabrizio Riguzzi, University of Ferrara, Italy
Rubén Ruiz Garcìa, Universidad Politécnica de Valencia, Spain
Alessandro Saetti, University of Brescia, Italy
Andrea Schaerf, University of Udine, Italy
Bart Selman, Cornell University, USA



Helmut Simonis, University College Cork, Ireland
Mirek Truszczyński, University of Kentucky, Lexington, KY, USA

External referees

Sara Ceschia, University of Udine, Italy
Raffaele Cipriano, University of Udine, Italy
Giuseppe Filippone, University of Calabria, Italy
Jonathan Gaudreault, FOCAC, Université Laval, Canada
Giovambattista Ianni, University of Calabria, Italy
Marco Kuhlmann, Uppsala University, Sweden
Marco Manna, University of Calabria, Italy
Vasco Manquinho, INESC-ID, Lisboa, Portugal
Malek Mouhoub, University of Regina, Canada
Luca Pulina, University of Sassari, Italy
Kristian Reale, University of Calabria, Italy
Francesco Ricca, University of Calabria, Italy
Greg Rix, University of Montreal, Canada
Andrea Roli, University of Bologna, Italy
Giorgio Terracina, University of Calabria, Italy

Local organisation

Angel Juan, IN3-Universitat Oberta de Catalunya, Barcelona, Spain
Daniel Riera i Terrén, IN3-Universitat Oberta de Catalunya, Barcelona, Spain
Josep Jorba, IN3-Universitat Oberta de Catalunya, Barcelona, Spain
Helena R. Lourenço, Universitat Pompeu Fabra, Spain
David Masip, IN3-Universitat Oberta de Catalunya, Barcelona, Spain
Joan M. Marques, IN3-Universitat Oberta de Catalunya, Barcelona, Spain
Joan A. Pastor, IN3-Universitat Oberta de Catalunya, Barcelona, Spain



Table of contents

Full papers

Parallel Search for Boolean Optimization
Ruben Martins, Vasco Manquinho and Inês Lynce . . . . . . . . . . 1

Hydra-MIP: Automated Algorithm Configuration and Selection for Mixed Integer Programming
Lin Xu, Frank Hutter, Holger Hoos and Kevin Leyton-Brown . . . . . . . . . . 16

Predicting Natural Hazards Evolution: How to Overcome the Impact of Input-parameter Uncertainty
Andrés Cencerrado, Ana Cortés and Tomás Margalef . . . . . . . . . . 31

A Hybrid Algorithm Combining Path Scanning and Biased Random Sampling for the Arc Routing Problem
Sergio González Martín, Angel A. Juan, Daniel Riera and José Cáceres Cruz . . . . . . . . . . 46

Algorithms for Interval Data Minmax Regret Paths
Carolinne Torres, César Astudillo, Matthew Bardeen and Alfredo Candia . . . . . . . . . . 55

Community of Scientist Optimization: Foraging and Competing for Research Resources
Alfredo Milani and Valentino Santucci . . . . . . . . . . 66

An Empirical Study of Learning and Forgetting Constraints
Neil Charles Armour Moore, Ian Gent and Ian Miguel . . . . . . . . . . 81

Job Shop Scheduling with Routing Flexibility and Sequence Dependent Setup-Times
Angelo Oddi, Riccardo Rasconi, Amedeo Cesta and Stephen Smith . . . . . . . . . . 96

Automatic Generation of Efficient Domain-Specific Planners from Generic Parametrized Planners
Mauro Vallati, Chris Fawcett, Alfonso Gerevini, Holger Hoos and Alessandro Saetti . . . . . . . . . . 111

Taking Advantage of Domain Knowledge in Optimal Hierarchical Deepening Search Planning
Pascal Schmidt, Florent Teichteil-Königsbuch and Patrick Fabiani . . . . . . . . . . 124

Short papers

Solving Disjunctive Temporal Problems with Preferences using Boolean Optimization solvers
Marco Maratea, Maurizio Pianfetti and Luca Pulina . . . . . . . . . . 139

Visualizing Learning Dynamics in Large-Scale Networks
Manal Rayess and Sherief Abdallah . . . . . . . . . . 147

ACO Algorithms for Solving a New Fleet Assignment Problem
Javier Diego, Miguel Ortega-Mier, Alvaro Garcia-Sanchez and Ignacio Rubio . . . . . . . . . . 155

A New Guillotine Placement Heuristic for the Orthogonal Cutting Problem
Slimane Aboumsabah and Ahmed Riadh Baba-Ali . . . . . . . . . . 163

Solving Distributed FCSPs with Naming Games
Stefano Bistarelli, Giorgio Gosti and Francesco Santini . . . . . . . . . . 171



Already published papers

On Improving MUS Extraction Algorithms
Joao Marques-Silva and Inês Lynce . . . . . . . . . . 179

Applying UCT to Boolean Satisfiability
Alessandro Previti, Raghuram Ramanujan, Marco Schaerf and Bart Selman . . . . . . . . . . 180

An Efficient Hierarchical Parallel Genetic Algorithm for Graph Coloring Problem
Reza Abbasian and Malek Mouhoub . . . . . . . . . . 181

Checking Safety of Neural Networks with SMT Solvers: a Comparative Evaluation
Luca Pulina and Armando Tacchella . . . . . . . . . . 182

Plan Stability: Replanning versus Plan Repair
Maria Fox, Alfonso Gerevini, Derek Long and Ivan Serina . . . . . . . . . . 183



Parallel Search for Boolean Optimization

Ruben Martins, Vasco Manquinho, and Inês Lynce

IST/INESC-ID, Technical University of Lisbon, Portugal
{ruben,vmm,ines}@sat.inesc-id.pt

Abstract. The predominance of multicore processors has increased the interest in developing parallel Boolean Satisfiability (SAT) solvers, and more parallel SAT solvers are emerging as a result. Even though parallel approaches are known to boost performance, parallel approaches to Boolean optimization remain scarce. This paper proposes parallel search algorithms for Boolean optimization and introduces a new parallel solver for Boolean optimization problem instances. Using two threads, an unsatisfiability-based algorithm searches on the lower bound value of the objective function while, at the same time, a linear search is performed on the upper bound value of the objective function. Searching in both directions and exchanging learned clauses between these two orthogonal approaches makes the search more efficient. This idea is further extended to a larger number of threads by dividing the search space according to different local upper values of the objective function. The parallel search on different local upper values leads to constant updates of the lower and upper bound values, which reduce the search space. Moreover, different search strategies are performed on the upper bound value, increasing the diversification of the search.

1 Introduction

An increasing number of parallel Boolean Satisfiability (SAT) solvers have come to light in the recent past, as multicore processors have become the dominant platform. The use of SAT is widespread, with many practical applications, and it is clear that the optimization version of SAT, i.e. Boolean optimization, can be applied to solve many practical optimization problems. Competitive performance and robustness of Boolean optimization solvers are certainly required to achieve this goal.

When compared with SAT instances, Boolean optimization instances tend to be more intricate, as it is not sufficient to find an assignment that satisfies all the constraints; rather, an optimization function has to be taken into account. Hence, it is a natural step to develop parallel algorithms for Boolean optimization, following the recent success in the SAT field.

Although this is a natural step, there are only a few parallel implementations for solving Boolean optimization. SAT4J PB RES//CP 1 implements a resolution-based algorithm that competes with a cutting-plane-based algorithm to find a new upper bound or to prove optimality. When one of the algorithms finds a new upper bound, it terminates the search of the other algorithm and both restart their search within the new upper bound. If one of the algorithms proves optimality, then the problem is solved and the search is stopped. Clause sharing is not performed between these two algorithms. In the context of Integer Linear Programming (ILP), the commercial solver CPLEX is known to have the option of performing parallel search 2, but no detailed description is available.

Proceedings of the 18th RCRA workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion (RCRA 2011). In conjunction with IJCAI 2011, Barcelona, Spain, July 17–18, 2011.

1 http://www.satcompetition.org/PoS/presentations-pos/leberre.pdf

Parallel algorithms have the advantage of allowing the implementation of orthogonal approaches that complement each other. That is the case in SAT4J PB RES//CP, where cutting planes are run against resolution. Another alternative, which will be explored in this paper, is to run an algorithm that searches to increase the lower bound value against an algorithm that searches to decrease the upper bound value. Furthermore, one may have more than one algorithm searching on the upper bound value.

The main contribution of this paper is two-fold. First, we introduce a parallel search algorithm for Boolean optimization that uses two threads: one thread searches to reduce the upper bound value, while the other searches to increase the lower bound value. Second, a more complex parallel algorithm is introduced, which extends the previous algorithm with additional threads searching to reduce the upper bound value.

The paper is organized as follows. The next section describes the preliminaries, namely Maximum Satisfiability (MaxSAT) and Pseudo-Boolean Optimization (PBO). Section 3 describes a parallel two-thread search algorithm for Boolean optimization, which is extended to a multithread algorithm in Section 4. Afterwards, an experimental evaluation of the new algorithms is presented and the paper concludes.

2 Preliminaries

In this section we briefly describe the Boolean optimization formalisms to be used in the remainder of the paper, namely Maximum Satisfiability (MaxSAT) and Pseudo-Boolean Optimization (PBO). Moreover, we also review the encoding from MaxSAT to PBO.

The MaxSAT problem can be defined as finding an assignment to the problem variables that minimizes (maximizes) the number of unsatisfied (satisfied) clauses in a CNF formula ϕ. MaxSAT has several variants, such as partial MaxSAT, weighted MaxSAT and weighted partial MaxSAT. In the partial MaxSAT problem, some clauses in ϕ are declared as hard, while the remainder are declared as soft. The objective in partial MaxSAT is to find an assignment such that all hard clauses are satisfied while minimizing the number of unsatisfied soft clauses. Finally, in the weighted versions of MaxSAT, soft clauses can have weights greater than 1 and the objective is to satisfy all hard clauses while minimizing the total weight of unsatisfied soft clauses.
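As a concrete illustration of the weighted partial variant just defined, the following sketch (hypothetical helper names, brute force over a toy instance; real MaxSAT solvers are far more sophisticated) computes the minimum total weight of unsatisfied soft clauses, with clauses written DIMACS-style as lists of signed integers:

```python
from itertools import product

# Toy weighted partial MaxSAT instance over variables 1..3.
hard = [[1, 2], [-1, 3]]                    # must all be satisfied
soft = [([-2], 2), ([-3], 1), ([1], 1)]     # (clause, weight) pairs

def satisfied(clause, assign):
    """A clause holds if some literal is true under the assignment."""
    return any(assign[abs(l)] == (l > 0) for l in clause)

def optimum(hard, soft, nvars):
    """Minimum total weight of unsatisfied soft clauses over all
    assignments that satisfy every hard clause."""
    best = None
    for bits in product([False, True], repeat=nvars):
        assign = {i + 1: b for i, b in enumerate(bits)}
        if all(satisfied(c, assign) for c in hard):
            cost = sum(w for c, w in soft if not satisfied(c, assign))
            best = cost if best is None else min(best, cost)
    return best
```

Here the optimum is 1: setting x1 = x3 = true and x2 = false satisfies both hard clauses and falsifies only the soft clause (¬x3) of weight 1.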

A related Boolean optimization formalism is Pseudo-Boolean Optimization (PBO). PBO is defined as finding an assignment to the problem variables such that all pseudo-Boolean constraints are satisfied and the value of a linear cost function is minimized. Unlike MaxSAT, constraints in PBO are more general and there are no soft constraints.
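The standard encoding from MaxSAT to PBO (the paper's own Example 1 falls on a page missing from this extraction, so the sketch below is a generic illustration with hypothetical names, not the authors' exact formulation) hardens each soft clause (c, w) with a fresh relaxation variable r, replacing it by c ∨ r, and minimizes the weighted sum of the relaxation variables:

```python
from itertools import product

def encode_to_pbo(hard, soft, nvars):
    """Turn a weighted partial MaxSAT instance into clausal PBO:
    every soft clause (c, w) becomes the hard clause (c OR r) for a
    fresh relaxation variable r, and the objective is min sum(w * r)."""
    clauses, objective = list(hard), []
    for clause, weight in soft:
        nvars += 1                        # fresh relaxation variable
        clauses.append(clause + [nvars])  # c OR r must now hold
        objective.append((weight, nvars))
    return clauses, objective, nvars

def pbo_optimum(clauses, objective, nvars):
    """Brute-force minimum of the linear objective over satisfying assignments."""
    best = None
    for bits in product([False, True], repeat=nvars):
        a = {i + 1: b for i, b in enumerate(bits)}
        if all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses):
            value = sum(w for w, v in objective if a[v])
            best = value if best is None else min(best, value)
    return best
```

An optimal PBO solution sets a relaxation variable to 1 exactly when its soft clause is falsified, so the PBO optimum equals the MaxSAT optimum of the original instance.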

2 http://www.ibm.com/software/integration/optimization/cplex-optimizer/<br />


3 Parallel Search on the Lower and Upper Bound Values

Unsatisfiability-based algorithms are very effective for several Boolean optimization problems [10, 18, 2]. These algorithms work by iteratively identifying unsatisfiable sub-formulas ϕU of the original formula ϕ. At each step, a SAT (or pseudo-Boolean) solver is used to check whether the formula is unsatisfiable. If that is the case, for each soft constraint 3 in the identified unsatisfiable sub-formula ϕU, a new relaxation variable is added such that, when assigned value 1, the soft constraint becomes satisfied [18]. Moreover, additional constraints are added to ϕ such that only one of the newly created relaxation variables can be assigned value 1. Next, the solver checks whether the formula remains unsatisfiable. The procedure ends when the working formula becomes satisfiable and the solver returns a solution (i.e. the optimum value was found), or when ϕU contains only hard constraints (i.e. the original problem instance is unsatisfiable) [10].
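The loop described above can be sketched for unweighted MaxSAT as follows. This is a deliberately naive rendering, not the Fu–Malik implementation the paper cites: the "SAT solver" is a brute-force enumerator, the "core" is simply the whole set of soft clauses (a valid, if maximally weak, unsatisfiable subset), and the one-of-the-new-relaxation-variables constraint is checked directly rather than encoded into clauses. The count of UNSAT iterations is the optimum:

```python
from itertools import product

def satisfied(clause, relax, assign):
    """A relaxed soft clause holds if the clause or any of its relaxation vars is true."""
    return any(assign[abs(l)] == (l > 0) for l in clause) or any(assign[r] for r in relax)

def solve(hard, soft, relax, one_groups, nvars):
    """Brute-force stand-in for a SAT solver: find an assignment satisfying all
    hard clauses, all relaxed soft clauses, and every constraint saying exactly
    one relaxation variable of a group is assigned value 1."""
    for bits in product([False, True], repeat=nvars):
        a = {i + 1: b for i, b in enumerate(bits)}
        if (all(satisfied(c, [], a) for c in hard)
                and all(satisfied(c, r, a) for c, r in zip(soft, relax))
                and all(sum(a[v] for v in g) == 1 for g in one_groups)):
            return a
    return None

def fu_malik_sketch(hard, soft, nvars):
    """Unsatisfiability-based lower-bound search, relaxing every soft clause
    per iteration; each UNSAT answer raises the lower bound by one."""
    relax = [[] for _ in soft]       # relaxation vars attached to each soft clause
    one_groups, cost = [], 0
    while solve(hard, soft, relax, one_groups, nvars) is None:
        group = []                   # one fresh relaxation var per soft clause
        for r in relax:
            nvars += 1
            r.append(nvars)
            group.append(nvars)
        one_groups.append(group)     # exactly one new var may be set to 1
        cost += 1
    return cost
```

A real solver instead extracts a (much smaller) core from the SAT solver's UNSAT proof and encodes the cardinality constraint into CNF, which is what the improved encodings cited next are about.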

The original procedure proposed by Fu and Malik [10] has been improved, namely by using more effective encodings [21, 20] for the constraints on the relaxation variables, as well as different strategies to minimize the overall number of relaxation variables needed [20, 2]. Moreover, generalizations to the weighted MaxSAT variants have also been proposed [18, 3].

The most classical approach to Boolean optimization is the use of branch-and-bound algorithms, where an upper bound on the value of the objective function is updated whenever a new solution is found. In these algorithms, lower bounds are estimated and, whenever the lower bound is higher than or equal to the upper bound, the search procedure can safely backtrack, since extending the current set of variable assignments will surely not result in a better solution. Several MaxSAT and PBO algorithms follow this approach using different lower bounding procedures [15, 16, 4, 12, 17].

Another classical approach is to perform a linear search on the value of the objective function [8]. In this case, whenever a new solution is found, the upper bound value is updated and a new constraint is added such that all solutions with a higher value are excluded. Several PBO solvers use this approach [23, 9, 14, 1]. Moreover, by using an encoding to PBO, MaxSAT instances can also be solved with this approach [14].
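The linear upper-bound search just described can be sketched as follows (hypothetical helper names; a brute-force enumerator stands in for the pseudo-Boolean solver, and the "new constraint" is the strict bound passed to it):

```python
from itertools import product

def cost(assign, terms):
    """Value of a linear objective sum(w * x_v) under an assignment."""
    return sum(w for w, v in terms if assign[v])

def find_model(clauses, terms, bound, nvars):
    """Brute-force stand-in for a PB solver: any assignment satisfying all
    clauses whose objective value is strictly below the given bound."""
    for bits in product([False, True], repeat=nvars):
        a = {i + 1: b for i, b in enumerate(bits)}
        if all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses) \
                and cost(a, terms) < bound:
            return a
    return None

def linear_search(clauses, terms, nvars):
    """Tighten the upper bound until the bounding constraint makes the
    formula unsatisfiable; the last recorded solution is then optimal."""
    ub, best = sum(w for w, _ in terms) + 1, None
    while True:
        model = find_model(clauses, terms, ub, nvars)
        if model is None:
            return best, ub
        best, ub = model, cost(model, terms)
```

For the instance min 2·x1 + 3·x2 subject to (x1 ∨ x2), the loop finds a solution of value 3, tightens the bound, finds value 2, and then proves no value below 2 exists.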

Notice that the unsatisfiability-based procedures correspond to searching on the lower bound of the value of the optimal solution: at each iteration the working formula is unsatisfiable, and the algorithm terminates when the working formula becomes satisfiable. On the other hand, a linear search on the values of the objective function corresponds to searching on the upper bound. In this case, the working formula is satisfiable at each iteration; the algorithm terminates when the problem instance becomes unsatisfiable, and the optimum value is given by the last recorded solution.

An algorithm that searches on both the lower and upper bounds of the objective function has already been proposed [19]. The search is initially done by a pseudo-Boolean solver that searches on the upper bound value of the objective function. However, the use of the pseudo-Boolean solver is limited to 10% of the time limit given to solve the formula. If the PBO solver proves optimality within this time limit, the optimal solution has been found without having to search on the lower bound side. On the other hand, if the PBO solver is not able to prove optimality within the time limit, an unsatisfiability-based algorithm is used to search on the lower bound value of the objective function. wbo [18, 19] is a weighted Boolean optimization solver that uses this approach. Experimental results show that searching on the upper and lower bound values leads to solving more instances. Since these approaches are orthogonal, they complement each other on several classes of problem instances. In this paper, for simplicity of the algorithmic description, it is assumed that the Boolean optimization problem to be solved is weighted partial MaxSAT. However, the algorithms described next can be easily generalized to other Boolean formulations.

3 In MaxSAT a soft constraint is a clause, but in more general formulations it can be any linear pseudo-Boolean constraint.

3.1 Parallel Search

Nowadays, extra computing power no longer comes from higher processor frequencies but rather from a growing number of cores and processors. Exploiting this architecture will allow Boolean optimization solvers to become more effective and to solve more problem instances. In this section we propose to perform a parallel search on the upper and lower bound values of the objective function. Even though searching on both the upper and the lower bound is not new [21, 19], searching on both of them in parallel is, to the best of our knowledge, novel. In this paper we propose a parallelization of the wbo solver; the new solver is named pwbo. pwbo uses a linear search algorithm to search on the upper bound side and an unsatisfiability-based algorithm to search on the lower bound side.

A parallel search with these two orthogonal strategies results in a performance as good as the best strategy for each problem instance. However, if both threads cooperate through clause sharing, it is possible to perform better than the best strategy. Additionally, both strategies can also cooperate in finding the optimum value. If, during the search, the lower bound value provided by the unsatisfiability-based algorithm and the upper bound value provided by the other thread become the same, the optimum solution has been found. Therefore, it is not necessary for either thread to continue the search to prove optimality, since their combined information already proves it.
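The meet-in-the-middle termination condition can be sketched with two threads over shared bounds. This is a coordination sketch only, not pwbo's code: the scripted update sequences stand in for the unsatisfiability-based and linear-search algorithms, and the class and function names are hypothetical:

```python
import threading

class Bounds:
    """Shared lower/upper bounds; the search stops as soon as they meet."""
    def __init__(self, lb, ub):
        self.lb, self.ub = lb, ub
        self.lock = threading.Lock()
        self.done = threading.Event()

    def raise_lb(self, value):
        with self.lock:
            self.lb = max(self.lb, value)
            if self.lb >= self.ub:
                self.done.set()      # combined information proves optimality

    def lower_ub(self, value):
        with self.lock:
            if value < self.ub:
                self.ub = value
            if self.lb >= self.ub:
                self.done.set()

def worker(bounds, updates, apply_update):
    for v in updates:
        if bounds.done.is_set():
            return                   # the other thread already closed the gap
        apply_update(v)

# Scripted stand-ins for the two search algorithms (the optimum here is 4):
b = Bounds(lb=0, ub=11)
t_lower = threading.Thread(target=worker, args=(b, [1, 2, 4], b.raise_lb))
t_upper = threading.Thread(target=worker, args=(b, [9, 6, 4], b.lower_ub))
t_lower.start(); t_upper.start()
t_lower.join(); t_upper.join()
```

Neither thread has to prove optimality on its own: the run ends once the lower-bound thread has reached 4 and the upper-bound thread has matched it.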

3.2 Clause Sharing

It is well known that conflict-driven clause learning is crucial to the efficiency of modern Boolean optimization solvers. The description of conflict-driven clause learning procedures is outside the scope of this paper and is assumed; we refer to the literature for detailed explanations of these procedures [22, 25]. In the context of parallel solving, sharing learned clauses is expected to further prune the search space and boost the performance of the parallel solver.

In parallel SAT solving, learned clauses that have fewer than a given number of literals are shared among the different threads. More advanced heuristics can be used for controlling the throughput and quality of the shared clauses [11]. Moreover, the literal block distance [5] can also be used for sharing clauses in a parallel context [13]. In our approach, we start by sharing clauses that have 5 or fewer literals. This cutoff is dynamically adjusted using the throughput and quality heuristic proposed by Hamadi et al. [11]. Additionally, all clauses with literal block distance 2 are also shared.
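The sharing policy of the paragraph above can be sketched as a pair of small predicates. The size/LBD filter follows the text directly; the adjustment function is only a crude illustrative stand-in (target, lo and hi are invented here) for the throughput-and-quality heuristic of Hamadi et al. [11], which is considerably more elaborate:

```python
def should_share(clause, lbd, size_cutoff):
    """Share a learned clause if it is short enough, or unconditionally
    if its literal block distance is 2."""
    return len(clause) <= size_cutoff or lbd == 2

def adjust_cutoff(shared_recently, cutoff, target=10, lo=1, hi=20):
    """Illustrative throughput control: widen the size cutoff when too few
    clauses were shared in the last window, narrow it when too many were."""
    if shared_recently < target:
        return min(hi, cutoff + 1)
    if shared_recently > target:
        return max(lo, cutoff - 1)
    return cutoff
```

With the initial cutoff of 5, a 3-literal clause is shared regardless of its LBD, and an 8-literal clause is shared only when its LBD is 2.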


It should be noted that in the pwbo solver not all conflict-driven learned clauses can be shared between the two threads. This is due to the fact that the working formulas are different. In the unsatisfiability-based algorithm, the input formula ϕMS is a weighted partial MaxSAT formula with soft and hard constraints. However, in the thread that performs the linear search on the upper bound value of the objective function, we encode the input formula ϕMS into a PBO formulation ϕPBO. As a result of that encoding (see example 1), the set of variables in ϕPBO might have been extended with additional relaxation variables necessary to encode the soft clauses of the original formula ϕMS. In order to define the conditions for safe clause sharing, we start by defining soft and hard learned clauses.

Definition 1 (Soft and Hard Learned Clauses). If, in the conflict analysis procedure used in the unsatisfiability-based algorithm, at least one soft clause is used in the clause learning process, then the generated learned clause is labeled soft. On the other hand, if only hard clauses are used, then the generated learned clause is labeled hard.

Since ϕMS contains both soft and hard clauses, it will also have soft and hard learned clauses. On the other hand, ϕPBO only has hard clauses and, as a result, will only have hard learned clauses. Nevertheless, as mentioned previously, ϕPBO may contain additional variables not present in ϕMS. As a result, the safe sharing procedure between the two threads is as follows:

– A hard learned clause from the unsatisfiability-based algorithm can be safely shared with the other thread. This is due to the fact that the resolution operations used in ϕMS can also be reproduced in ϕPBO, since all original hard clauses of ϕMS are also present in ϕPBO.

– A soft learned clause from the unsatisfiability-based algorithm is not shared, since it may not be valid for formula ϕPBO.

– A hard learned clause generated when solving ϕPBO can be shared with the unsatisfiability-based algorithm if the learned clause does not contain relaxation variables. This is safe since one can reproduce the generation of the hard learned clause by resolution steps using only hard clauses also present in ϕMS.

Finally, between iterations of the unsatisfiability-based algorithm, the working formula ϕMS is also extended with additional relaxation variables. However, since these variables are added to soft clauses, if a conflict-based learned clause contains any relaxation variable, then it will necessarily be labeled soft, since at least one soft clause must have been used in the learning procedure.
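The safe-sharing conditions above reduce to two small predicates, one per direction (a sketch with hypothetical names; the soft/hard label is the one from Definition 1, and relaxation variables are those introduced by the encoding or by the relaxation iterations):

```python
def can_share_to_pbo(label):
    """MaxSAT thread -> PBO thread: only hard learned clauses (no soft
    clause took part in conflict analysis) are valid in the PBO formula."""
    return label == "hard"

def can_share_to_maxsat(clause, relaxation_vars):
    """PBO thread -> MaxSAT thread: all PBO learned clauses are hard, but
    they are only valid in the MaxSAT formula when they mention no
    encoding-introduced relaxation variables."""
    return not any(abs(lit) in relaxation_vars for lit in clause)
```

So a clause learned on the PBO side such as (x1 ∨ ¬x3) is shareable, while one mentioning a relaxation variable is not.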

4 Parallel Search on the Upper Bound Value

The previous section presented a parallel search solver for Boolean optimization based on two orthogonal strategies, with one thread used for each strategy. For computer architectures with more than two cores, we can extend the previous idea by performing a parallel search on the upper bound value of the objective function. Therefore, if n cores are available, we can use one thread to search on the lower bound value of the objective function, while at the same time k threads search on different local upper bound values of the objective function and n − k − 1 threads search on the upper bound value of the objective function. Local bound threads enforce a local upper bound value in their search. The iterative search on different local upper bound values leads to constant updates of the lower and upper bound values, which reduce the search space. Next, an example of this approach is described; afterwards, a more detailed description of the algorithm is provided.

Example 2. Consider a weighted partial MaxSAT formula ϕMS as input. For the input formula, one can easily find initial lower and upper bounds; suppose the initial lower and upper bound values are 0 and 11, respectively. Moreover, consider that the optimal solution is 3 and that our goal is to find it using four threads, t0, t1, t2 and t3. Thread t0 applies an unsatisfiability-based algorithm (i.e., it searches on the lower bound of the optimum value of the objective). This thread starts with a lower bound of 0 and iteratively increases the lower bound until the optimum value is found.

Thread t1 searches on the upper bound value of the objective function, while threads t2 and t3 search on different local upper bound values of the objective function. The initial input formula ϕMS is encoded into the pseudo-Boolean formalism (see Section 2) and an additional constraint is added to limit the value of the objective function in each thread. For example, thread t1 starts its search with an upper bound value of 11, and threads t2 and t3 can start their search with local upper bound values of 3 and 7, respectively.

Suppose that thread t2 finishes its computation and finds that the formula is unsatisfiable for an upper bound of 3. This means that there is no solution with value 0, 1 or 2 for the objective function; therefore, the global lower bound value can be updated to 3. Thread t2 is now free to search on a different local upper bound value, for example 5. In the meantime, thread t3 finds a solution with objective value 6; hence, the global upper bound value can be updated to 6. Thread t1 updates its upper bound value to 6, and thread t3 is now free to search on a different local upper bound value, for example 4. Afterwards, consider that thread t1 finds a solution with objective value 3. Again, the global upper bound value can be updated to 3. Since the global lower bound value is now the same as the global upper bound value, the optimum has been found and the search terminates.
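The bound bookkeeping of Example 2 can be replayed with a small sketch (hypothetical class, sequential rather than threaded): an UNSAT answer at a local upper bound v rules out every value below v and so raises the global lower bound, while any solution lowers the global upper bound:

```python
class BoundManager:
    """Global bounds shared by the threads of Example 2."""
    def __init__(self, lb, ub):
        self.lb, self.ub = lb, ub

    def report_unsat(self, local_ub):
        self.lb = max(self.lb, local_ub)   # no solution with value < local_ub exists

    def report_solution(self, value):
        self.ub = min(self.ub, value)      # a solution of this value was found

    def optimum_found(self):
        return self.lb == self.ub

# Replay of Example 2's events:
mgr = BoundManager(lb=0, ub=11)
mgr.report_unsat(3)      # t2: unsatisfiable with local upper bound 3 -> lb = 3
mgr.report_solution(6)   # t3: solution of value 6 -> ub = 6
mgr.report_solution(3)   # t1: solution of value 3 -> ub = 3, bounds meet
```

After the last event the bounds meet at the optimum, 3, and the search terminates.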

4.1 Algorithmic Description

In what follows, it is shown how the parallel search on the values of the objective function can be implemented in pwbo. Algorithm 1 describes pwbo. It receives a weighted partial MaxSAT formula (ϕMS) and the number of available threads (n). The thread with index 0 is referred to as the lower bound thread and applies an unsatisfiability-based algorithm to ϕMS. The thread with index 1 is referred to as the upper bound thread and searches on the upper bound value of the objective function. The threads indexed 2 to n − 1 are referred to as local upper bound threads and search on different local upper bound values of the objective function. For the sake of simplicity, it is considered that there is only one thread that searches on the upper bound value of the objective function. However, this algorithm can be easily generalized for k local upper



Algorithm 2 Parallel Algorithms for Boolean Optimization

PARALLELLOWERALG(ϕ)
 1  localLB ← 0
 2  while (search)
 3  do (st, ϕU, model) ← PBSOLVER(ϕ)
 4     if st = UNSAT
 5        then localLB ← localLB + COREWEIGHT(ϕU)
 6             RELAXCORE(ϕ, ϕU)
 7             UPDATELOWERBOUND(localLB, 0)
 8             if globalUB = globalLB
 9                then search ← false
10     else if st = SAT
11        then UPDATEUPPERBOUND(localLB, 0)
12             globalModel ← model
13             search ← false

PARALLELUPPERALG(ϕ, id)
 1  while (search)
 2  do (st, ϕU, model) ← PBSOLVER(ϕ ∪ { Σ cj lj ≤ threadUB[id] − 1 })
 3     if st = UNSAT
 4        then UPDATELOWERBOUND(threadUB[id], id)
 5             CLEARLOCALCONSTRAINTS(ϕ)
 6     else if st = FORCED ABORT
 7        then CLEARLOCALCONSTRAINTS(ϕ)
 8     else if st = SAT
 9        then UPDATEUPPERBOUND(VALUE(model), id)
10             globalModel ← model
11     if globalUB = globalLB
12        then search ← false

unsatisfiable sub-formula provided by the PB solver if ϕ is unsatisfiable, and model<br />

contains an assignment to the variables of ϕ when the formula is satisfiable. In this<br />

thread, if the outcome of the PB solver is forced abort, it means an optimal solution has<br />

been found by another thread (search was set to false) and the procedure terminates.<br />

When the status of the PB solver is unsatisfiable (line 3), the unsatisfiable sub-formula<br />

ϕU is relaxed in the procedure RELAXCORE. We refer to the literature for<br />

the details of this procedure [2, 18]. Next, if localLB is greater than the current global<br />

lower bound, the global lower bound is updated in UPDATELOWERBOUND (line 7).<br />

Notice that this may result in forcing one or more upper bound threads to abort and<br />

updating their upper bound limits. Otherwise, it means that an upper thread has already<br />

proved a better lower bound, and the search proceeds. If the status of the PB solver<br />

is satisfiable (line 10) it means that the unsatisfiability-based algorithm has found an<br />

optimal solution. As a result, the upper bound is updated (line 11), the solution is stored<br />

(line 12) and the flag search is set to false so that the remaining threads terminate.<br />



10 R. Martins, V. Manquinho, I. Lynce<br />

The PARALLELUPPERALG procedure takes as input a PBO formula ϕ (section 2)<br />

and a thread identifier. At each iteration, a PB solver is used to solve ϕ (line 2), with an<br />

additional constraint that limits the value of the objective function. Let this constraint<br />

be named the thread bound constraint.<br />

Notice that the thread bound constraint cannot be shared among all threads, since<br />

it is only valid if the optimum value is lower than the thread upper bound. The same<br />

sharing rules must apply to conflict-driven learned clauses that depend on the thread<br />

bound constraint. Therefore, it is necessary to define what is a local constraint and in<br />

what conditions it can be shared with other threads.<br />

Definition 2 (Local Constraint). The thread bound constraint is labeled as a local<br />

constraint. Let ω be a conflict-driven learned clause and let ϕω be the set of constraints<br />

used in the implication graph to learn ω. The new clause ω is defined as a local constraint<br />

if at least one constraint in ϕω is a local constraint.<br />
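Definition 2 amounts to a transitive "taint" rule on learned clauses, which can be sketched in a few lines of Python (the identifiers are ours, not pwbo's):<br />

```python
def label_learned_clause(clause_id, antecedents, local_set):
    """Mark a learned clause as local iff any antecedent used in the
    implication graph to derive it is itself local (Definition 2)."""
    if any(a in local_set for a in antecedents):
        local_set.add(clause_id)

local_set = {"thread_bound"}   # the thread bound constraint is local by definition

label_learned_clause("w1", ["c3", "c7"], local_set)            # only global parents
label_learned_clause("w2", ["thread_bound", "c3"], local_set)  # one local parent
label_learned_clause("w3", ["w2", "c9"], local_set)            # locality is transitive

assert "w1" not in local_set        # w1 may be shared with other threads
assert {"w2", "w3"} <= local_set    # w2 and w3 must stay local
```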

After the call to the PB solver (line 3), if it returns unsatisfiable, it means that a new<br />

lower bound has been found. The lower bound is updated (line 4) and if the thread is<br />

searching on a local upper bound then it gets a new local upper bound value. Since the<br />

formula given to the PB solver was unsatisfiable, it is necessary to remove the thread<br />

bound constraint (line 5). Additionally, all local clauses are also removed since they<br />

may not be valid with the new local upper bound.<br />

If the status of the solver is forced abort, it means that some other thread already<br />

proved that the current search space is redundant. This can happen if the thread local<br />

upper bound is smaller than the global lower bound, or if the thread local upper bound<br />

is greater than the global upper bound. Local constraints are therefore removed (line<br />

7). In fact, the local constraints are only removed when the forced abort is caused by<br />

an update on the global lower bound value. Otherwise, local constraints remain valid.<br />

If the PB solver returns satisfiable, a new upper bound has been found. Therefore, the<br />

global upper bound is updated (line 9) and the model is stored (line 10). If the thread<br />

is searching on a local upper bound then it gets a new local upper bound value, since<br />

the upper bound thread will continue the search on the new upper bound that has been<br />

found. If the thread is searching on the global upper bound the search then proceeds<br />

as usual. Finally, after the necessary updates depending on the PB solver status, it is<br />

checked whether the global upper bound is equal to the global lower bound. If this<br />

occurs, optimality is proved and the search terminates (lines <strong>11</strong>-12).<br />

We should note that some details are not fully described in this algorithmic description<br />

due to lack of space. In particular, updates to global data structures are inside<br />

critical regions and locks are used to prevent two or more threads from updating these<br />

data structures at the same time. Moreover, updates to global lower and upper bounds<br />

only take place when the new values improve the current ones. Additionally, the update<br />

on the saved model is also inside a critical region and is only done when the global<br />

upper bound is updated.<br />
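The locking discipline just described can be illustrated with a small sketch (a hedged reconstruction; the class and attribute names are ours, using Python's `threading.Lock` in place of pwbo's actual synchronization code):<br />

```python
import threading

class SharedBounds:
    """Global bounds shared by all threads. Updates happen inside a
    critical region and only when they improve the current value; the
    saved model is updated under the same lock as the upper bound."""
    def __init__(self):
        self.lock = threading.Lock()
        self.global_lb = 0
        self.global_ub = float("inf")
        self.model = None

    def update_lower(self, value):
        with self.lock:                    # critical region
            if value > self.global_lb:     # improve-only update
                self.global_lb = value

    def update_upper(self, value, model):
        with self.lock:
            if value < self.global_ub:
                self.global_ub = value
                self.model = model         # model saved under the same lock

bounds = SharedBounds()
bounds.update_upper(6, "model@6")
bounds.update_upper(8, "model@8")   # worse value: silently ignored
assert bounds.global_ub == 6 and bounds.model == "model@6"
```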

Finally, when the global bounds are updated at UPDATELOWERBOUND and<br />

UPDATEUPPERBOUND, this may force the PB solver in other threads to stop<br />

(resulting in a forced abort status). As a result, new thread local upper bounds must be<br />

defined for the aborted threads. Hence, each aborted thread is assigned a new local upper<br />

bound that covers the broadest range of yet untested bounds. More formally, the new<br />


Table 1. Number of industrial partial MaxSAT instances solved by sequential and parallel solvers<br />

Benchmark set #I QMaxSAT pm2 wbo pwbo-2T pwbo-4T pwbo-4T-CNF<br />

bcp-fir 59 50 58 42 44 44 56<br />

bcp-hipp-yRa1 55 46 45 22 22 24 40<br />

bcp-msp 64 26 14 16 15 15 20<br />

bcp-mtg 40 40 40 31 32 33 40<br />

bcp-syn 74 32 39 34 36 36 40<br />

CircuitTraceCompaction 4 4 4 4 4 4 4<br />

HaplotypeAssembly 6 0 5 5 5 5 5<br />

pbo-mqc 168 153 129 147 167 168 168<br />

pbo-routing 15 15 15 15 15 15 15<br />

PROTEIN INS 12 6 3 1 1 1 2<br />

Total 497 372 352 317 341 345 390<br />

those categories. The evaluation was performed on two AMD Opteron 6172 processors<br />

(2.1 GHz with 64 GB of RAM) running Fedora Core 13 with a timeout of 1,800 seconds<br />

(wall clock time).<br />

The results were obtained by running each parallel solver on each instance for three<br />

times. Similarly to what is done when analyzing randomized solvers, the median time<br />

was taken into account. This means that an instance must be solved by at least two of<br />

the three runs to be considered solved. We should note, however, that this measure is<br />

more conservative than the one used in the SAT Race 2008 5 which is commonly used<br />

by parallel SAT solvers [11].<br />
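The median-of-three scoring rule can be stated precisely as a small helper (a sketch; the names and the `None` convention are ours):<br />

```python
# Each solver/instance pair is run three times and scored by the median
# wall-clock time; an instance counts as solved only if at least two of
# the three runs finish within the 1,800 s timeout.
TIMEOUT = 1800.0

def median_result(times):
    """times: three wall-clock times, with float('inf') for a timed-out run.
    Returns the median time, or None if the instance is not solved."""
    med = sorted(times)[1]                   # median of three runs
    return med if med < TIMEOUT else None

assert median_result([100.0, 120.0, float("inf")]) == 120.0   # 2/3 solved
assert median_result([90.0, float("inf"), float("inf")]) is None  # 1/3 solved
```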

Table 1 gives the number of partial MaxSAT instances from the industrial category<br />

that were solved by sequential and parallel solvers. The sequential solvers considered<br />

were QMaxSAT 6 (ranked 1st in the MaxSAT Evaluation 2010), pm2 [2] (ranked 2nd)<br />

and wbo [18, 19] (ranked 3rd). Note that wbo is also our reference solver, as the new<br />

parallel algorithms were implemented on top of wbo. SAT4J MAXSAT [14] and SAT4J<br />

MAXSAT RES//CP were not evaluated since their performance is not comparable to the<br />

remaining state-of-the-art partial MaxSAT solvers. For the 497 instances tested, SAT4J<br />

MAXSAT 2.2.3 and SAT4J MAXSAT RES//CP can only solve 277 and 290 instances,<br />

respectively.<br />

The parallel solvers evaluated correspond to the different versions of pwbo. pwbo<br />

is a parallel solver implemented on top of wbo. pwbo 2T uses two threads according<br />

to what is described in section 3, thus having one thread searching on the lower<br />

bound value and another thread searching on the upper bound value. pwbo 4T and pwbo<br />

4T-CNF use four threads according to what is described in section 4, thus having one<br />

thread searching on the lower bound value and three threads searching on the upper<br />

bound value. The difference between pwbo 4T and pwbo 4T-CNF is on the number<br />

of threads that search on local and global upper bound values. Increasing the number<br />

of threads that search on local upper bound values helps to reduce the search space<br />

by finding new lower and upper bounds. On the other hand, increasing the number of<br />

5 http://baldur.iti.uka.de/sat-race-2008/<br />

6 http://www.maxsat.udl.cat/10/solvers/QMaxSat.pdf<br />

Fig. 1. Cactus plot with running times of solvers (x-axis: instances, 0–400; y-axis: time in seconds, 0–1,800; series: wbo, pwbo T2, pwbo T4, PM2, QMaxSAT, pwbo T4-CNF).<br />

threads that search on the global upper bound increases the diversification of the search,<br />

since those threads are searching using different strategies. pwbo 4T uses two threads to<br />

search on local upper bound values and one thread to search on the global upper bound<br />

value. On the other hand, pwbo 4T-CNF uses one thread to search on local upper bound<br />

values and two threads to search on the global upper bound value with the different<br />

strategies described in section 4.2. The objective function for partial MaxSAT instances<br />

corresponds to a cardinality constraint, since all coefficients are 1. Therefore, pwbo<br />

4T-CNF uses Sinz’s encoding [24] to translate the cardinality constraint into clauses.<br />
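Sinz's sequential-counter encoding [24] for an at-most-k cardinality constraint can be sketched as follows. This is a generic reconstruction of the published encoding, not pwbo's implementation; the brute-force check at the end verifies it for n = 4, k = 2:<br />

```python
from itertools import count, product

def sinz_at_most_k(xs, k, fresh):
    """CNF clauses (DIMACS-style signed ints) enforcing sum(xs) <= k via
    Sinz's sequential counter [24]; `fresh` yields unused variable ids.
    Auxiliary s[i][j] reads: at least j+1 of xs[0..i] are true."""
    n = len(xs)
    s = [[next(fresh) for _ in range(k)] for _ in range(n - 1)]
    cls = [[-xs[0], s[0][0]]] + [[-s[0][j]] for j in range(1, k)]
    for i in range(1, n - 1):
        cls += [[-xs[i], s[i][0]], [-s[i - 1][0], s[i][0]]]
        for j in range(1, k):
            cls += [[-xs[i], -s[i - 1][j - 1], s[i][j]],
                    [-s[i - 1][j], s[i][j]]]
        cls += [[-xs[i], -s[i - 1][k - 1]]]   # the (k+1)-th true literal is forbidden
    cls += [[-xs[n - 1], -s[n - 2][k - 1]]]
    return cls

# Brute-force sanity check for n = 4, k = 2: with the auxiliary variables
# projected out, the encoding must accept exactly the assignments with <= 2 ones.
n, k = 4, 2
clauses = sinz_at_most_k([1, 2, 3, 4], k, count(n + 1))
n_aux = (n - 1) * k

def satisfiable_with(x_bits):
    for aux in product([False, True], repeat=n_aux):
        assign = dict(enumerate(x_bits + aux, start=1))
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

for x_bits in product([False, True], repeat=n):
    assert satisfiable_with(x_bits) == (sum(x_bits) <= k)
```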

Clearly, all versions of pwbo perform better than the sequential solver wbo. When<br />

analyzing each benchmark family, one can conclude that the benefits obtained from<br />

parallel solvers are not the same for all benchmarks families, although in general the<br />

number of solved instances tends to increase for all families. There is a significant boost<br />

when using two threads (pwbo T2), showing that a parallel search on the lower and<br />

upper bounds makes the search more efficient and solves more instances. When using<br />

four threads the number of solved instances still increases. pwbo T4 shows that reducing<br />

the search space by doing a local upper bound search allows solving more instances.<br />

Another significant boost is given by the diversification of the search. Indeed, pwbo<br />

T4-CNF with its combination of search diversification and search space reduction is<br />

able to solve more instances than the best sequential solver (QMaxSAT), thus improving<br />

the current state of the art.<br />

Figure 1 contains a cactus plot with the running times of all the solvers for which<br />

data is given in Table 1. Without doubt, the parallel versions of pwbo perform better<br />

than wbo. Moreover, the best performing solver is pwbo T4-CNF, which clearly outperforms<br />

all other solvers, including the best sequential solver QMaxSAT. Finally, Table 2<br />


Table 2. Speedup on the 312 instances solved by wbo and all pwbo solvers<br />

Solver Time (s) Speedup<br />

wbo 36,208.33 1.00<br />

pwbo 2T 22,798.28 1.59<br />

pwbo 4T 18,203.79 1.99<br />

pwbo 4T-CNF 13,236.87 2.74<br />

contains the speedup resulting from using pwbo, the parallel version of wbo. wbo is<br />

compared against pwbo 2T, pwbo 4T and pwbo 4T-CNF. The results are conclusive. The<br />

speedup increases as the number of threads increases, being almost 2 in pwbo 4T when<br />

local upper bound search is used and close to 3 in pwbo 4T-CNF when diversification of<br />

the search is combined with reduction of the search space.<br />
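The speedup column in Table 2 is simply wbo's total time divided by each parallel solver's total time; the table's arithmetic can be checked directly:<br />

```python
# Speedup = wbo's total time / parallel solver's total time (Table 2).
wbo_time = 36208.33

speedup = {name: round(wbo_time / t, 2)
           for name, t in [("pwbo 2T", 22798.28),
                           ("pwbo 4T", 18203.79),
                           ("pwbo 4T-CNF", 13236.87)]}

# Matches the reported speedups of 1.59, 1.99 and 2.74.
assert speedup == {"pwbo 2T": 1.59, "pwbo 4T": 1.99, "pwbo 4T-CNF": 2.74}
```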

6 Conclusions<br />

This paper introduces new parallel algorithms for Boolean optimization. This work was<br />

in part motivated by the recent success of parallel SAT algorithms, also taking into account<br />

that parallel algorithms for Boolean optimization are scarce. Two new algorithms<br />

were proposed. The first algorithm uses two threads, one searching on the lower bound<br />

value and the other one searching on the upper bound value of the objective function.<br />

The second algorithm uses an additional number of threads to search on local upper<br />

bound values. Moreover, this algorithm is further improved by increasing the diversification<br />

of the search through different search strategies on the global upper bound.<br />

Experimental results, obtained on a significant number of problem instances, clearly<br />

show the efficiency of the new proposed algorithms.<br />

Due to the success of our approach in partial MaxSAT, we plan to further extend<br />

our evaluation to weighted Boolean optimization, as future work. Moreover, we propose<br />

to further increase the diversification of the search by implementing a portfolio of<br />

complementary algorithms. The portfolio of algorithms can then be used to search on<br />

local and global upper bounds thus increasing the efficiency of the solver. Finally, an<br />

experimental study of the scalability of our approach should also be performed.<br />

Acknowledgement. This work was partially supported by FCT under research projects<br />

BSOLO (PTDC/EIA/76572/2006) and iExplain (PTDC/EIA-CCO/102077/2008), and<br />

INESC-ID multiannual funding through the PIDDAC program funds.<br />

References<br />

1. F. Aloul, A. Ramani, I. Markov, and K. A. Sakallah. Generic ILP versus specialized 0-1 ILP:<br />

An update. In International Conference on Computer-Aided Design, pages 450–457, 2002.<br />

2. C. Ansótegui, M. Bonet, and J. Levy. Solving (Weighted) Partial MaxSAT through Satisfiability<br />

Testing. In International Conference on Theory and Applications of Satisfiability<br />

Testing, pages 427–440, 2009.<br />

3. C. Ansótegui, M. Bonet, and J. Levy. A New Algorithm for Weighted Partial MaxSAT. In<br />

AAAI Conference on Artificial Intelligence, pages 3–8, 2010.<br />


4. J. Argelich, C. M. Li, and F. Manyà. An improved exact solver for partial Max-SAT. In<br />

International Conference on Nonconvex Programming: Local and Global Approaches, pages<br />

230–231, 2007.<br />

5. G. Audemard and L. Simon. Predicting Learnt Clauses Quality in Modern SAT Solvers. In<br />

International Joint Conference on Artificial Intelligence, pages 399–404, 2009.<br />

6. O. Bailleux, Y. Boufkhad, and O. Roussel. A Translation of Pseudo Boolean Constraints to<br />

SAT. Journal on Satisfiability, Boolean Modeling and Computation, 2:191–200, 2006.<br />

7. O. Bailleux, Y. Boufkhad, and O. Roussel. New Encodings of Pseudo-Boolean Constraints<br />

into CNF. In International Conference on Theory and Applications of Satisfiability Testing,<br />

pages 181–194, 2009.<br />

8. P. Barth. A Davis-Putnam Enumeration Algorithm for Linear Pseudo-Boolean Optimization.<br />

Technical Report MPI-I-95-2-003, Max Planck Institute for Computer Science, 1995.<br />

9. N. Eén and N. Sörensson. Translating pseudo-Boolean constraints into SAT. Journal on<br />

Satisfiability, Boolean Modeling and Computation, 2:1–26, 2006.<br />

10. Z. Fu and S. Malik. On solving the partial MAX-SAT problem. In International Conference<br />

on Theory and Applications of Satisfiability Testing, pages 252–265, 2006.<br />

11. Y. Hamadi, S. Jabbour, and L. Sais. Control-Based Clause Sharing in Parallel SAT Solving.<br />

In International Joint Conference on Artificial Intelligence, pages 499–504, 2009.<br />

12. F. Heras, J. Larrosa, and A. Oliveras. MiniMaxSAT: An efficient weighted Max-SAT solver.<br />

Journal of Artificial Intelligence Research, 31:1–32, 2008.<br />

13. S. Kottler. SArTagnan. SAT Race, Solver Description, 2010.<br />

14. D. Le Berre and A. Parrain. The Sat4j library, release 2.2: system description. Journal on<br />

Satisfiability, Boolean Modeling and Computation, 7:59–64, 2010.<br />

15. C. M. Li, F. Manyà, and J. Planes. New inference rules for Max-SAT. Journal of Artificial<br />

Intelligence Research, 30:321–359, 2007.<br />

16. H. Lin and K. Su. Exploiting inference rules to compute lower bounds for MAX-SAT solving.<br />

In International Joint Conference on Artificial Intelligence, pages 2334–2339, 2007.<br />

17. V. Manquinho and J. Marques-Silva. Search pruning techniques in SAT-based branch-and-bound<br />

algorithms for the binate covering problem. IEEE Transactions on Computer-Aided<br />

Design, 21(5):505–516, 2002.<br />

18. V. Manquinho, J. Marques-Silva, and J. Planes. Algorithms for Weighted Boolean Optimization.<br />

In International Conference on Theory and Applications of Satisfiability Testing, pages<br />

495–508, 2009.<br />

19. V. Manquinho, R. Martins, and I. Lynce. Improving Unsatisfiability-Based Algorithms for<br />

Boolean Optimization. In International Conference on Theory and Applications of Satisfiability<br />

Testing, pages 181–193, 2010.<br />

20. J. Marques-Silva and V. Manquinho. Towards more effective unsatisfiability-based maximum<br />

satisfiability algorithms. In International Conference on Theory and Applications of<br />

Satisfiability Testing, pages 225–230, 2008.<br />

21. J. Marques-Silva and J. Planes. Algorithms for Maximum Satisfiability using Unsatisfiable<br />

Cores. In Design, Automation and Testing in Europe Conference, pages 408–413, 2008.<br />

22. J. Marques-Silva and K. Sakallah. GRASP: A new search algorithm for satisfiability. In<br />

International Conference on Computer-Aided Design, pages 220–227, 1996.<br />

23. H. Sheini and K. Sakallah. Pueblo: A Modern Pseudo-Boolean SAT Solver. In Design,<br />

Automation and Testing in Europe Conference, pages 684–685, March 2005.<br />

24. C. Sinz. Towards an Optimal CNF Encoding of Boolean Cardinality Constraints. In International<br />

Conference on Principles and Practice of Constraint Programming, pages 827–831,<br />

2005.<br />

25. L. Zhang, C. F. Madigan, M. W. Moskewicz, and S. Malik. Efficient conflict driven learning<br />

in a Boolean satisfiability solver. In International Conference on Computer-Aided Design,<br />

pages 279–285, 2001.<br />



Hydra-MIP: Automated Algorithm Configuration and<br />

Selection for Mixed Integer Programming<br />

Lin Xu, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown<br />

Department of Computer Science<br />

University of British Columbia, Canada<br />

{xulin730, hutter, hoos, kevinlb}@cs.ubc.ca<br />

Abstract. State-of-the-art mixed integer programming (MIP) solvers are highly<br />

parameterized. For heterogeneous and a priori unknown instance distributions, no<br />

single parameter configuration generally achieves consistently strong performance,<br />

and hence it is useful to select from a portfolio of different configurations. HYDRA<br />

is a recent method for using automated algorithm configuration to derive multiple<br />

configurations of a single parameterized algorithm for use with portfolio-based<br />

selection. This paper shows that, leveraging two key innovations, HYDRA can<br />

achieve strong performance for MIP. First, we describe a new algorithm selection<br />

approach based on classification with a non-uniform loss function, which<br />

significantly improves the performance of algorithm selection for MIP (and SAT).<br />

Second, by modifying HYDRA’s method for selecting candidate configurations,<br />

we obtain better performance as a function of training time.<br />

1 Introduction<br />

Mixed integer programming (MIP) is a general approach for representing constrained<br />

optimization problems with integer-valued and continuous variables. Because MIP serves<br />

as a unifying framework for NP-complete problems and combines the expressive power<br />

of integrality constraints with the efficiency of continuous optimization, it is widely used<br />

both in academia and industry. MIP used to be studied mainly in operations research,<br />

but has recently become an important tool in AI, with applications ranging from auction<br />

theory [19] to computational sustainability [8]. Furthermore, several recent advances in<br />

MIP solving have been achieved with AI techniques [7, 13].<br />

One key advantage of the MIP representation is that highly optimized solvers can<br />

be developed in a problem-independent way. IBM ILOG’s CPLEX solver 1 is particularly<br />

well known for achieving strong practical performance; it is used by over 1,300<br />

corporations (including one-third of the Global 500) and researchers at more than 1,000<br />

Proceedings of the 18th RCRA workshop on Experimental Evaluation of Algorithms for Solving<br />

Problems with Combinatorial Explosion (RCRA 2011).<br />

In conjunction with IJCAI 2011, Barcelona, Spain, July 17–18, 2011.<br />

1 http://ibm.com/software/integration/optimization/cplex-optimization-studio/<br />



universities [16]. Here, we propose improvements to CPLEX that have the potential to<br />

directly impact this massive user base.<br />

State-of-the-art MIP solvers typically expose many parameters to end users; for<br />

example, CPLEX 12.1 comes with a 221-page parameter reference manual describing<br />

135 parameters. The CPLEX manual warns that “integer programming problems are more<br />

sensitive to specific parameter settings, so you may need to experiment with them.” How<br />

should such solver parameters be set by a user aiming to solve a given set of instances?<br />

Obviously—despite the advice to “experiment”—effective manual exploration of such a<br />

huge space is infeasible; instead, an automated approach is needed.<br />

Conceptually, the most straightforward option is to search the space of algorithm<br />

parameters to find a (single) configuration that minimizes a given performance metric<br />

(e.g., average runtime). Indeed, CPLEX itself includes a self-tuning tool that takes this<br />

approach. A variety of problem-independent algorithm configuration procedures have<br />

also been proposed in the AI community, including I/F-Race [3], ParamILS [15, 14],<br />

and GGA [2]. Of these, only PARAMILS has been demonstrated to be able to effectively<br />

configure CPLEX on a variety of MIP benchmarks, with speedups up to several orders<br />

of magnitude, and overall performance substantially better than that of the CPLEX<br />

self-tuning tool [13].<br />

While automated algorithm configuration is often very effective, particularly when<br />

optimizing performance on homogeneous sets of benchmark instances, it is no panacea.<br />

In fact, it is characteristic of NP-hard problems that no single solver performs well on<br />

all inputs (see, e.g., [30]); a procedure that performs well on one part of an instance<br />

distribution often performs poorly on another. An alternative approach is to choose a<br />

portfolio of different algorithms (or parameter configurations), and to select between<br />

them on a per-instance basis. This algorithm selection problem [24] can be solved by<br />

gathering cheaply computable features from the problem instance and then evaluating<br />

a learned model to select the best algorithm [20, 9, 6]. The well-known SATZILLA [30]<br />

method uses a regression model to predict the runtime of each algorithm and selects<br />

the algorithm predicted to perform best. Its performance in recent SAT competitions<br />

illustrates the potential of portfolio-based selection: it is the best known method for<br />

solving many types of SAT instances, and almost always outperforms all of its constituent<br />

algorithms.<br />

Portfolio-based algorithm selection also has a crucial drawback: it requires a strong<br />

and sufficiently uncorrelated portfolio of solvers. While the literature has produced many<br />

different approaches for solving SAT, there are few strong MIP solvers, and the ones that<br />

do exist have similar architectures. However, algorithm configuration and portfolio-based<br />

algorithm selection can be combined to yield automatic portfolio construction methods<br />

applicable to domains in which only a single, highly parameterized algorithm exists.<br />

Two such approaches have been proposed in the literature. HYDRA [28] is an iterative<br />

procedure. It begins by identifying a single configuration with the best overall performance,<br />

and then iteratively adds algorithms to the portfolio by applying an algorithm<br />

configurator with a customized, dynamic performance metric. At runtime, algorithms<br />

are selected from the portfolio as in SATZILLA. ISAC [17] first divides instance sets into<br />



clusters based on instance features using the G-means clustering algorithm, then applies<br />

an algorithm configurator to find a good configuration for each cluster. At runtime,<br />

ISAC computes the distance in feature space to each cluster centroid and selects the<br />

configuration for the closest cluster. We note two theoretical reasons to prefer HYDRA to<br />

ISAC. First, ISAC’s clustering is solely based on distance in feature space, completely<br />

ignoring the importance of each feature to runtime. Thus, ISAC’s performance can<br />

change dramatically if additional features are added (even if they are uninformative).<br />

Second, no amount of training time allows ISAC to recover from a misleading initial<br />

clustering or an algorithm configuration run that yields poor results. In contrast, HYDRA<br />

can recover from poor algorithm configuration runs in later iterations.<br />

In this work, we show that HYDRA can be used to build strong portfolios of CPLEX<br />

configurations, dramatically improving CPLEX’s performance for a variety of MIP<br />

benchmarks, as compared to ISAC, algorithm configuration alone, and CPLEX’s default<br />

configuration. This achievement leverages two modifications to the original HYDRA approach,<br />

presented in Section 2. Section 3 describes the features and CPLEX parameters<br />

we identified for use with HYDRA, along with the benchmark sets upon which we evaluated<br />

it. Section 4 evaluates HYDRA-MIP and presents evidence that our improvements to<br />

HYDRA are also useful beyond MIP. Section 5 concludes and describes future work.<br />

2 Improvements to Hydra<br />

It is difficult to directly apply the original HYDRA method to the MIP domain, for two<br />

reasons. First, the data sets we face in MIP tend to be highly heterogeneous; preliminary<br />

prediction experiments (not reported here for brevity) showed that HYDRA’s linear<br />

regression models were not robust for such heterogeneous inputs, sometimes yielding<br />

extreme mispredictions of more than ten orders of magnitude. Second, individual HYDRA<br />

iterations can take days to run—even on a large computer cluster—making it difficult<br />

for the method to converge within a reasonable amount of time. (We say that HYDRA<br />

has converged when substantial increases in running time stop leading to significant<br />

performance gains.)<br />

In this section, we describe improvements to HYDRA that address both of these issues.<br />

First, we modify the model-building method used by the algorithm selector, using a<br />

classification procedure based on decision forests with a non-uniform loss function.<br />

Second, we modify HYDRA to add multiple solvers in each iteration and to reduce the<br />

cost of evaluating these candidate solvers, speeding up convergence. We denote the<br />

original method as HydraLR,1 (“LR” stands for linear regression and “1” indicates the<br />

number of configurations added to the portfolio per iteration), the new method including<br />

only our first improvement as HydraDF,1 (“DF” stands for decision forests), and the<br />

full new method as HydraDF,k.<br />



2.1 Decision forests for algorithm selection<br />

There are many existing techniques for algorithm selection, based on either regression<br />

[30, 26] or classification [10, 9, 25, 23]. SATZILLA [30] uses linear basis function<br />

regression to predict the runtime of each of a set of K algorithms, and picks the one<br />

with the best predicted performance. Although this approach has led to state-of-the-art<br />

performance for SAT, it does not directly minimize the cost of running the portfolio<br />

on a set of instances, but rather minimizes the prediction error separately in each of K<br />

predictive models. This has the advantage of penalizing costly errors (picking a slow<br />

algorithm over a fast one) more than less costly ones (picking a fast algorithm over a<br />

slightly faster one), but cannot be expected to perform well when training data is sparse.<br />

Stern et al. [26] applied the recent Bayesian recommender system Matchbox to algorithm<br />

selection; similar to SATZILLA, this approach is cost-sensitive and uses a regression<br />

model that predicts the performance of each algorithm. CPHYDRA [23] uses case-based<br />

reasoning to determine a schedule of constraint satisfaction solvers (instead of picking a<br />

single solver). Its k-nearest neighbor approach is simple and effective, but determines<br />

similarity solely based on instance features (ignoring instance hardness). Finally, ISAC<br />

uses a cost-agnostic clustering approach for algorithm selection. Our new selection<br />

procedure uses an explicit cost-sensitive loss function—punishing misclassifications in<br />

direct proportion to their impact on portfolio performance—without predicting runtime.<br />

Such an approach has never before been applied to algorithm selection: all existing classification<br />

approaches use a simple 0–1 loss function that penalizes all misclassification<br />

equally (e.g., [25, 9, 10]). Specifically, this paper describes a cost-sensitive classification<br />

approach based on decision forests (DFs). Particularly for heterogeneous benchmark<br />

sets, DFs offer the promise of effectively partitioning the feature space into qualitatively<br />

different parts. In contrast to clustering methods, DFs take runtime into account when<br />

determining that partitioning.<br />

We constructed cost-sensitive DFs as collections of T cost-sensitive decision trees [27].<br />

Following [4], given n training data points with k features each, for each tree we construct<br />

a bootstrap sample of n training data points sampled uniformly at random with<br />

repetitions; during tree construction, we sample a random subset of log2(k) + 1 features<br />

at each internal node to be considered for splitting the data at that node. Predictions are<br />

based on majority votes across all T trees. For a set of m algorithms {s1,...,sm}, an<br />

n × k matrix holding the values of k features for each of n training instances, and an<br />

n × m matrix P holding the performance of the m algorithms on the n instances, we<br />

construct our selector based on m · (m − 1)/2 pairwise cost-sensitive decision forests,<br />

determining the labels and costs as follows. For any pair of algorithms (i, j), we train a<br />

cost-sensitive decision forest DF(i, j) on the following weighted training data: we label<br />

an instance q as i if P (q, i) is better than P (q, j), and as j otherwise; the weight for that<br />

instance is |P (q, i) − P (q, j)|. For test instances, we apply each DF(i, j) to vote for<br />

either i or j and select the algorithm with the most votes as the best algorithm for that<br />

instance. Ties are broken by only counting the votes from those decision forests that<br />

involve algorithms which received equal votes; further ties are broken randomly.<br />
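The labeling, weighting, and voting scheme just described can be sketched compactly. The following is a minimal illustration, not the authors' implementation; the function names are ours, and the forests themselves are abstracted away, since any classifier honoring per-instance weights could stand in.

```python
import random
from collections import Counter

def pairwise_labels_and_weights(P, i, j):
    # P[q][s]: performance (e.g. runtime) of algorithm s on instance q; lower is better.
    # Label each instance with the better of i and j, and weight it by the
    # performance gap, so that a misclassification costs exactly what it
    # would cost the portfolio.
    labels = [i if P[q][i] <= P[q][j] else j for q in range(len(P))]
    weights = [abs(P[q][i] - P[q][j]) for q in range(len(P))]
    return labels, weights

def select_algorithm(pair_votes, rng=random):
    # pair_votes: {(i, j): winner}, one vote per pairwise forest DF(i, j)
    # for a single test instance.
    counts = Counter(pair_votes.values())
    top = max(counts.values())
    tied = [a for a, c in counts.items() if c == top]
    if len(tied) > 1:
        # Tie-break: recount only the forests whose two algorithms both tied.
        sub = Counter(w for (i, j), w in pair_votes.items()
                      if i in tied and j in tied)
        if sub:
            best = max(sub.values())
            tied = [a for a, c in sub.items() if c == best]
    return tied[0] if len(tied) == 1 else rng.choice(tied)
```

With two instances and runtimes P = [[1.0, 5.0], [2.0, 1.0]], the pair (0, 1) yields labels [0, 1] with weights [4.0, 1.0]: the first instance matters four times as much to get right.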



We made one further change to the mechanism gleaned from SATZILLA. Originally, a subset of candidate solvers was chosen by determining the subset for which portfolio performance is maximized, taking model mispredictions into account; a similar procedure was used to determine presolver policies. These internal optimizations were performed on the same instance set used to train the models, which can be problematic if the models overfit the training data; in this work, we therefore use 10-fold cross-validation instead.

2.2 Speeding up convergence

HYDRA uses an automated algorithm configurator as a subroutine, which is called in every iteration to find a configuration that augments the current portfolio as well as possible. Since algorithm configuration is a hard problem, configuration procedures are incomplete and typically randomized. Because a single run of a randomized configuration procedure might not yield a high-performing parameter configuration, it is common practice to perform multiple runs in parallel and to use the configuration that performs best on the training set [12, 14, 28, 13].

Here, we make two modifications to HYDRA to speed up its convergence. First, in each iteration, we add k promising configurations to the portfolio, rather than just the single best. If algorithm configuration runs were inexpensive, this modification to HYDRA would not help: additional configurations could always be found in later iterations, if they indeed complemented the portfolio at that point. However, when each iteration must repeatedly solve many difficult MIP instances, it may be impossible to perform more than a small number of HYDRA iterations within any reasonable amount of time, even when using a computer cluster. In such a case, when many good (and rather different) configurations are found in an iteration, it can be wasteful to retain only one of them.

Our second change to HYDRA concerns the way that the 'best' configurations returned by different algorithm configuration runs are identified. HydraDF,1 determines the 'best' of the configurations found in a number of independent configurator runs by evaluating each configuration on the full training set and selecting the one with the best performance. This evaluation phase can be very costly: e.g., if we use a cutoff time of 300 seconds per run during training and have 1,000 instances, then computing the training performance of each candidate configuration can take nearly four CPU days. Therefore, in HydraDF,k, we select the configuration for which the configuration procedure's internal estimate of the average performance improvement over the existing portfolio is largest. This alternative is computationally cheap: it does not require any evaluations of configurations beyond those already performed by the configurator. However, it is also potentially risky: different configurator runs typically use the training instances in a different order and evaluate configurations using different numbers of instances. It is thus possible that the configurator's internal estimate of improvement for a parameter configuration is high, but that the configuration turns out not to help on instances the configurator has not yet used.



Fortunately, adding k parameter configurations to the portfolio in each iteration mitigates this problem: if each of the k selected configurations has independent probability p of yielding a poor configuration, the probability of all k configurations being poor is only p^k.
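This argument is easy to quantify; the following toy helper (ours, with p = 0.3 chosen purely for illustration) makes the point:

```python
def prob_all_poor(p, k):
    # If each of the k configurations added per iteration is poor with
    # independent probability p, this is the chance that all of them are.
    return p ** k
```

With p = 0.3, adding k = 4 configurations per iteration leaves a probability of only 0.3^4 = 0.0081 that every one of them is poor, versus 0.3 for a single pick.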

3 MIP: Features, Data Sets, and Parameters

While the improvements to HYDRA presented above were motivated by MIP, they can nevertheless be applied to any domain. In this section, we describe all domain-specific elements of HYDRA-MIP: the MIP instance features upon which our models depend, the CPLEX parameters we configured, and the data sets upon which we evaluated our methods.

3.1 Features of MIP Instances

We constructed a large set of 139 MIP features, drawing on 97 existing features [21, 11, 17] and also including 42 new probing features. Specifically, existing work used features based on problem size, graph representations, proportions of different variable types (e.g., discrete vs. continuous), constraint types, coefficients of the objective function, the linear constraint matrix, and the right-hand side of the constraints. We extended those features by adding further descriptive statistics where applicable, such as medians, variation coefficients, and interquantile distances of vector-based features. For the first time, we also introduce a set of MIP probing features based on short runs of CPLEX with default settings. These comprise 20 single probing features and 22 vector-based features. The single probing features are as follows. Presolving features (6 in total) are the CPU times for presolving and relaxation, and the numbers of constraints, variables, nonzero entries in the constraint matrix, and clique table inequalities after presolving. Probing cut usage features (8 in total) are the number of each of 7 different cut types, and the total number of cuts applied. Probing result features (6 in total) are the MIP gap achieved, # of nodes visited, # of feasible solutions found, # of iterations completed, # of times CPLEX found a new incumbent by primal heuristics, and # of solutions or incumbents found. Our 22 vector-based features contain descriptive statistics (averages, medians, variation coefficients, and interquantile distances, i.e., q90 − q10) for the following 6 quantities reported by CPLEX over time: (a) improvement of the objective function; (b) number of integer-infeasible variables at the current node; (c) improvement of the best integer solution; (d) improvement of the upper bound; (e) improvement of the gap; (f) nodes left to be explored (average and variation coefficient only).
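As a rough sketch of how such vector-based statistics can be computed (the function name and the choice of population standard deviation are ours; the paper does not specify these details):

```python
import statistics

def vector_feature_stats(xs):
    # Descriptive statistics used for the vector-based probing features:
    # average, median, variation coefficient (std / mean), and the
    # interquantile distance q90 - q10.
    mean = statistics.mean(xs)
    qs = statistics.quantiles(xs, n=10)   # 9 cut points: q10, q20, ..., q90
    return {
        "avg": mean,
        "median": statistics.median(xs),
        # Assumption: population standard deviation; the paper leaves this open.
        "vc": statistics.pstdev(xs) / mean if mean else 0.0,
        "q90_q10": qs[-1] - qs[0],
    }
```

Applied to each time series CPLEX reports during the 5-second probing run, this turns a variable-length vector into a fixed-size block of features.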



3.2 CPLEX Parameters

Out of CPLEX 12.1's 135 parameters, we selected a subset of 74 parameters to be optimized. These are the same parameters considered in [13], minus two parameters governing the time spent on probing and solution polishing. (These led to problems when the captime used during parameter optimization differed from that used at test time.) We were careful to keep fixed all parameters that change the problem formulation (e.g., parameters such as the optimality gap below which a solution is considered optimal). The 74 parameters we selected affect all aspects of CPLEX. They include 12 preprocessing parameters; 17 MIP strategy parameters; 11 parameters controlling how aggressively to use which types of cuts; 8 MIP "limits" parameters; 10 simplex parameters; 6 barrier optimization parameters; and 10 further parameters. Most parameters have an "automatic" option as one of their values. We allowed this value, but also included other values (all other values for categorical parameters, and a range of values for numerical parameters). Taking into account that 4 parameters were conditional on others taking certain values, these parameters gave rise to 4.75 × 10^45 distinct parameter configurations.

3.3 MIP Benchmark Sets

Our goal was to obtain a MIP solver that works well on heterogeneous data. Thus, we selected four heterogeneous sets of MIP benchmark instances, composed of many well-studied MIP instances. They range from a relatively simple combination of two homogeneous subsets (CL∪REG) to heterogeneous sets drawing on instances from many sources (e.g., MIX). While previous work on automated portfolio construction for MIP [17] has only considered very easy instances (ISAC(new), with a mean CPLEX default runtime below 4 seconds), our three new benchmark sets are much more realistic, with CPLEX default runtimes ranging from seconds to hours.

CL∪REG is a mixture of two homogeneous subsets, CL and REG. CL instances come from computational sustainability; they are based on real data used for the construction of a wildlife corridor for endangered grizzly bears in the Northern Rockies [8] and are encoded as mixed integer linear programming (MILP) problems. We randomly selected 1,000 CL instances from the set used in [13], 500 for training and 500 for testing. REG instances are MILP-encoded instances of the winner determination problem in combinatorial auctions. We generated 500 training and 500 test instances using the regions generator from the Combinatorial Auction Test Suite [22], with the number of bids selected uniformly at random between 750 and 1,250, and a fixed bids/goods ratio of 3.91 (following [21]).

CL∪REG∪RCW is the union of CL∪REG and another set of MILP-encoded instances from computational sustainability, RCW. These instances model the spread of the endangered red-cockaded woodpecker, conditional on decisions about which parcels of land to protect. We generated 990 RCW instances (10 random instances for each combination of 9 maps and 11 budgets) using the generator from [1] with the same parameter settings, except for a smaller sample size of 5. We split these instances 50:50 into training and test sets.

ISAC(new) is a subset of the MIP data set from [17]; we could not use the entire set, since the authors had irretrievably lost their test set. We thus divided their 276 training instances into a new training set of 184 instances and a test set of 92 instances. Due to the small size of the data set, we did this in a stratified fashion, first ordering the instances by CPLEX default runtime and then picking every third instance for the test set.
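This stratified split can be sketched as follows (a hypothetical helper, not the authors' code); with 276 instances it reproduces the 184/92 division:

```python
def stratified_split(instances, runtime):
    # Order instances by (CPLEX default) runtime, then send every third one
    # to the test set: train and test then cover the whole hardness range,
    # giving a roughly 2:1 split (184/92 for 276 instances).
    ordered = sorted(instances, key=runtime)
    test = ordered[2::3]
    train = [x for k, x in enumerate(ordered) if k % 3 != 2]
    return train, test
```

The point of stratifying by runtime is that, with so few instances, a random split could easily concentrate the hard instances in one of the two sets.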

MIX combines subsets of the sets studied in [13]. It includes all instances from MASS (100 instances), MIK (120 instances), and CLS (100 instances), as well as subsets of CL (120 instances) and REG200 (120 instances); see [13] for a description of each underlying set. We preserved the training/test split from [13], resulting in 280 training and 280 test instances.

4 Experimental Results

In this section, we examine HYDRA-MIP's performance on our MIP data sets. We begin by describing the experimental setup, and then evaluate each of our improvements to HydraLR,1.

4.1 Experimental setup

For algorithm configuration we used PARAMILS version 2.3.4 with its default instantiation of FOCUSEDILS with adaptive capping [14]. We always executed 25 parallel configuration runs with different random seeds and a 2-day cutoff. (Running times were always measured as CPU time.) During configuration, the captime for each CPLEX run was set to 300 seconds, and the performance metric was penalized average runtime (PAR-10, where PAR-k of a set of r runs is the mean over the r runtimes, counting each timed-out run as having taken k times the cutoff time). For testing, we used a cutoff time of 3,600 seconds. In our feature computation, we used a 5-second cutoff for computing probing features. We omitted these probing features (only) for the very easy ISAC(new) benchmark set. We used the MATLAB R2010a implementation of cost-sensitive decision trees; our decision forests consisted of 99 such trees. All of our experiments were carried out on a cluster of 55 dual 3.2 GHz Intel Xeon PCs with 2 MB cache and 2 GB RAM, running OpenSuSE Linux 11.1.
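The PAR-k metric defined above can be computed with a small illustrative helper (ours, not the authors' code):

```python
def par_k(runtimes, cutoff, k=10):
    # Penalized average runtime: the mean over all runs, with each run that
    # reached the cutoff counted as k times the cutoff (k = 10 gives PAR-10).
    return sum(t if t < cutoff else k * cutoff for t in runtimes) / len(runtimes)
```

For example, with a 300-second cutoff, runtimes [100, 200, 300] give PAR-10 = (100 + 200 + 3000)/3 = 1100 seconds; the heavy penalty on timeouts is what makes PAR-10 sensitive to unsolved instances, not just average speed.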

In our experiments, the total running time of the various HYDRA procedures was often dominated by the time required to run the configurator, and therefore turned out to be roughly proportional to the number of HYDRA iterations performed. Each iteration required 50 CPU days for algorithm configuration, as well as validation time to (1) select the best configuration in each iteration (only for HydraLR,1 and HydraDF,1); and (2) gather performance data for the selected configurations. Since HydraDF,4 selects 4 solvers in each iteration, it has to gather performance data for 3 additional solvers per iteration (using the same captime as at test time, 3,600 seconds), which roughly offsets its savings from skipping the validation step. Using the format (HydraDF,1, HydraDF,4), the overall runtime requirements in CPU days were as follows: (366, 356) for CL∪REG; (485, 422) for CL∪REG∪RCW; (256, 263) for ISAC(new); and (274, 269) for MIX. Thus, the computational cost per iteration of HydraLR,1 and HydraDF,1 was similar.

| Data set    | Model | Train time | Train PAR (solved) | Test time | Test PAR (solved) | SF time | SF PAR |
|-------------|-------|------------|--------------------|-----------|-------------------|---------|--------|
| CL∪REG      | LR    | 39.7       | 39.7 (100%)        | 39.4      | 39.4 (100%)       | 1.00×   | 1.00×  |
|             | DF    | 39.7       | 39.7 (100%)        | 39.3      | 39.3 (100%)       |         |        |
| CL∪REG∪RCW  | LR    | 105.1      | 105.1 (100%)       | 102.6     | 102.6 (100%)      | 1.04×   | 1.04×  |
|             | DF    | 98.8       | 98.8 (100%)        | 98.8      | 98.8 (100%)       |         |        |
| ISAC(new)   | LR    | 2.68       | 2.68 (100%)        | 2.36      | 2.36 (100%)       | 1.18×   | 1.18×  |
|             | DF    | 2.19       | 2.19 (100%)        | 2.00      | 2.00 (100%)       |         |        |
| MIX         | LR    | 52         | 52 (100%)          | 56        | 172 (99.6%)       | 1.17×   | 1.05×  |
|             | DF    | 48         | 48 (100%)          | 48        | 164 (99.6%)       |         |        |

Table 1. MIPzilla performance (average runtime and PAR in seconds, and percentage solved), varying predictive models. Train columns report cross-validation performance; column SF gives the speedup factor achieved by cost-sensitive decision forests (DF) over linear regression (LR) on the test set.

4.2 Algorithm selection with decision forests

To assess the impact of our improved algorithm selection procedure, we evaluated it in the context of SATZILLA-style portfolios of different CPLEX configurations, dubbed MIPzilla. As component solvers, we always used the CPLEX default plus CPLEX configurations optimized for the various subsets of our four benchmarks. Specifically, for ISAC(new) we used the six configurations found by GGA in [17]. For CL∪REG, CL∪REG∪RCW, and MIX we used one configuration optimized for each of the benchmark instance sets that were combined to create the distribution (e.g., CL and REG for CL∪REG). We took all such optimized configurations from [13] where available, and optimized the remaining configurations ourselves using PARAMILS.

In Table 1, we present performance results for MIPzilla on our four MIP benchmark sets, contrasting the original linear regression (LR) models with our new cost-sensitive decision forests (DF). Overall, MIPzilla was never worse with DF than with LR, and sometimes substantially better. For relatively simple data sets, such as CL∪REG and CL∪REG∪RCW, the difference between the models was quite small. For more heterogeneous data sets, MIPzilla performed much better with DF than with LR: e.g., 18% and 17% better in terms of final portfolio runtime on ISAC(new) and MIX, respectively. Overall, our new cost-sensitive classification-based algorithm selection was clearly preferable to the previous mechanism based on linear regression. In further experiments, we also evaluated alternative approaches based on random regression forests (trained separately for each algorithm, as in the linear regression approach), decision forests without costs, and support vector machines (SVMs) both with and without costs. We found that the cost-sensitive variants always outperformed the cost-free ones. In these more extensive experiments, we observed that cost-sensitive DFs always performed very well, whereas linear regression performed inconsistently, with especially poor performance on heterogeneous data sets.

Our improvements to the algorithm selection procedure, although motivated by the application to MIP, are in fact problem independent. We therefore conducted an additional experiment to evaluate the effectiveness of SATZILLA based on our new cost-sensitive decision forests, compared to the original version using linear regression models. We used the same data used for building SATzilla2009 [29]. The numbers of training/test instances were 1211/806 (RAND category, with 17 candidate solvers), 672/447 (HAND category, with 13 candidate solvers), and 570/379 (INDU category, with 10 candidate solvers). Table 2 shows that, by using our new cost-sensitive decision forests, we improved SATZILLA's PAR score by 29% (on average over the three categories) compared to the previous (competition-winning) version of SATZILLA; for the important industrial category, we observed PAR improvements of 34%. Because there exists no highly parameterized SAT solver with strong performance across problem classes (analogous to CPLEX for MIP), we did not investigate HYDRA for SAT.² However, we note that this paper's findings suggest that there is merit in constructing such highly parameterized solvers for SAT and other NP-hard problems.

| Category | Model | Train time | Train PAR (solved) | Test time | Test PAR (solved) | SF time | SF PAR |
|----------|-------|------------|--------------------|-----------|-------------------|---------|--------|
| RAND     | LR    | 172        | 332 (99.5%)        | 177       | 458 (99.1%)       | 1.08×   | 1.13×  |
|          | DF    | 147        | 308 (99.5%)        | 164       | 405 (99.3%)       |         |        |
| HAND     | LR    | 518        | 2224 (94.7%)       | 549       | 2858 (92.9%)      | 1.16×   | 1.26×  |
|          | DF    | 363        | 1327 (97.0%)       | 475       | 2268 (94.4%)      |         |        |
| INDU     | LR    | 459        | 2195 (94.6%)       | 545       | 3085 (92.1%)      | 1.12×   | 1.34×  |
|          | DF    | 382        | 1635 (96.1%)       | 487       | 2300 (94.4%)      |         |        |

Table 2. SATZILLA performance (average runtime and PAR in seconds, and percentage solved), varying predictive models. Train columns report cross-validation performance; column SF gives the speedup factor achieved by cost-sensitive decision forests (DF) over linear regression (LR) on the test set.

² The closest SAT equivalent of what CPLEX is for MIP would be MiniSAT [5], but it does not expose many parameters and does not perform well on random instances. The highly parameterized SATenstein solver [18] cannot be expected to perform well across the board for SAT; in particular, local search is not the best method for highly structured instances.



| Data set    | Solver           | Train time | Train PAR (solved) | Test time | Test PAR (solved) |
|-------------|------------------|------------|--------------------|-----------|-------------------|
| CL∪REG      | Default          | 424        | 1687 (96.7%)       | 424       | 1493 (96.7%)      |
|             | ParamILS         | 145        | 339 (99.4%)        | 134       | 296 (99.5%)       |
|             | HydraDF,1        | 64         | 97 (99.9%)         | 63        | 63 (100%)         |
|             | HydraDF,4        | 42         | 42 (100%)          | 48        | 48 (100%)         |
|             | MIPzilla         | 40         | 40 (100%)          | 39        | 39 (100%)         |
|             | Oracle(MIPzilla) | 33         | 33 (100%)          | 33        | 33 (100%)         |
| CL∪REG∪RCW  | Default          | 405        | 1532 (96.5%)       | 406       | 1424 (96.9%)      |
|             | ParamILS         | 148        | 148 (100%)         | 151       | 151 (100%)        |
|             | HydraDF,1        | 89         | 89 (100%)          | 95        | 95 (100%)         |
|             | HydraDF,4        | 106        | 106 (100%)         | 112       | 112 (100%)        |
|             | MIPzilla         | 99         | 99 (100%)          | 99        | 99 (100%)         |
|             | Oracle(MIPzilla) | 89         | 89 (100%)          | 89        | 89 (100%)         |
| ISAC(new)   | Default          | 3.98       | 3.98 (100%)        | 3.77      | 3.77 (100%)       |
|             | ParamILS         | 2.06       | 2.06 (100%)        | 2.13      | 2.13 (100%)       |
|             | HydraLR,1        | 1.67       | 1.67 (100%)        | 1.52      | 1.52 (100%)       |
|             | HydraDF,1        | 1.2        | 1.2 (100%)         | 1.42      | 1.42 (100%)       |
|             | HydraDF,4        | 1.05       | 1.05 (100%)        | 1.17      | 1.17 (100%)       |
|             | MIPzilla         | 2.19       | 2.19 (100%)        | 2.00      | 2.00 (100%)       |
|             | Oracle(MIPzilla) | 1.83       | 1.83 (100%)        | 1.81      | 1.81 (100%)       |
| MIX         | Default          | 182        | 992 (97.5%)        | 156       | 387 (99.3%)       |
|             | ParamILS         | 139        | 717 (98.2%)        | 126       | 357 (99.3%)       |
|             | HydraLR,1        | 74         | 74 (100%)          | 90        | 205 (99.6%)       |
|             | HydraDF,1        | 60         | 60 (100%)          | 65        | 181 (99.6%)       |
|             | HydraDF,4        | 53         | 53 (100%)          | 62        | 177 (99.6%)       |
|             | MIPzilla         | 48         | 48 (100%)          | 48        | 164 (99.6%)       |
|             | Oracle(MIPzilla) | 34         | 34 (100%)          | 39        | 155 (99.6%)       |

Table 3. Performance (average runtime and PAR in seconds, and percentage solved) of HydraDF,4, HydraDF,1, and HydraLR,1 after 5 iterations. Train columns report cross-validation performance.

4.3 Evaluating HYDRA-MIP

Next, we evaluated our full HydraDF,4 approach for MIP; on all four MIP benchmarks, we compared it to HydraDF,1, to the best configuration found by PARAMILS, and to the CPLEX default. For ISAC(new) and MIX we also assessed HydraLR,1. We did not do so for CL∪REG and CL∪REG∪RCW because, based on the results in Table 1, we expected the DF and LR models to perform almost identically. Table 3 presents these results. First, comparing HydraDF,4 to PARAMILS alone and to the CPLEX default, we observed that HydraDF,4 achieved dramatically better performance, yielding between 2.52-fold and 8.83-fold speedups over the CPLEX default and between 1.35-fold and 2.79-fold speedups over the configuration optimized with PARAMILS, in terms of average runtime. Note that (probably due to the heterogeneity of the data sets) the built-in CPLEX self-tuning tool was unable to find any configuration better than the default for any of our four data sets. Compared to HydraLR,1, HydraDF,4 yielded a 1.3-fold speedup for ISAC(new) and a 1.5-fold speedup for MIX. HydraDF,4 also typically performed better than our intermediate procedure HydraDF,1, with speedup factors of up to 1.21 (ISAC(new)). However, somewhat surprisingly, it actually performed worse on one distribution, CL∪REG∪RCW. We analyzed this case further and found that in HydraDF,4, after iteration three, PARAMILS did not find any configurations that would further improve the portfolio, even with a perfect algorithm selector. This poor PARAMILS performance can be explained by the fact that HYDRA's dynamic performance metric only rewards configurations that make progress on solving some instances better; almost certainly starting in a poor region of configuration space, PARAMILS did not find configurations that made progress on any instances over the already strong portfolio, and thus lacked guidance towards better regions of configuration space. We believe this problem can be addressed by means of better configuration procedures in the future.

[Figure 1: Performance per iteration of HydraDF,4, HydraDF,1, and HydraLR,1, evaluated on test data, alongside MIPzilla DF and Oracle(MIPzilla). Panels: (a) CL∪REG; (b) CL∪REG∪RCW; (c) ISAC(new); (d) MIX. Each panel plots PAR score against the number of Hydra iterations (1–5).]



Figure 1 shows the test performance that the different HYDRA versions achieved as a function of their number of iterations, as well as the performance of the MIPzilla portfolios we built manually. When building these MIPzilla portfolios for CL∪REG, CL∪REG∪RCW, and MIX, we exploited ground-truth knowledge about the constituent subsets of instances, using a configuration optimized specifically for each of these subsets. As a result, these portfolios yielded very strong performance. Although our various HYDRA versions did not have access to this ground-truth knowledge, they still roughly matched MIPzilla's performance (indeed, HydraDF,1 outperformed MIPzilla on CL∪REG). For ISAC(new), our baseline MIPzilla portfolio used CPLEX configurations obtained by ISAC [17]; all HYDRA versions clearly outperformed MIPzilla in this case, which suggests that its constituent configurations are suboptimal. For ISAC(new), we also observed that for (only) the first three iterations, HydraLR,1 outperformed HydraDF,1. We believe this occurred because in later iterations the portfolio contained stronger solvers, making the predictive models more important. We also observed that HydraDF,4 consistently converged more quickly than HydraDF,1 and HydraLR,1. While HydraDF,4 stagnated after three iterations on data set CL∪REG∪RCW (see our discussion above), it achieved the best performance at every given point in time on the three other data sets. For ISAC(new), HydraDF,1 had not converged after 5 iterations, while HydraDF,4 converged after 4 iterations and achieved better performance. For the other three data sets, HydraDF,4 converged after two iterations. The performance of HydraDF,4 after the first iteration (i.e., with 4 candidate solvers available to the portfolio) was already very close to that of the best portfolios for MIX and CL∪REG.

4.4 Comparing to ISAC

We spent considerable effort attempting to compare HydraDF,4 with ISAC [17], since ISAC is also a method for automatic portfolio construction and was previously applied to a distribution of MIP instances. ISAC's authors supplied us with their training instances and the CPLEX configurations their method identified, but are generally unable to make their code available to other researchers and, as mentioned previously, were unable to recover their test data. We therefore compared HydraDF,4's and ISAC's relative speedups over the CPLEX default (thereby controlling for different machine architectures) on their training data. We note several caveats: HydraDF,4 was given only 2/3 as much training data as ISAC (due to the need to recover a test set from [17]'s original training set); the methods were evaluated using only the original ISAC training set; the data set is very small, and hence high-variance; and all instances were quite easy even for the CPLEX default. In the end, HydraDF,4 achieved a 3.6-fold speedup over the CPLEX default, as compared to the 2.1-fold speedup reported in [17].

As shown in Figure 1, all versions of HYDRA performed much better than a MIPzilla portfolio built from the configurations obtained from ISAC's authors for the ISAC(new) data set. In fact, even a perfect oracle over these configurations only achieved an average runtime of 1.82 seconds, which is a factor of 1.67 slower than HydraDF,4.

5 Conclusion

In this paper, we showed how to extend HYDRA to achieve strong performance on heterogeneous MIP distributions, outperforming CPLEX's default, PARAMILS alone, ISAC, and the original HYDRA approach. This was done using a cost-sensitive classification model for algorithm selection (which also led to performance improvements in SATZILLA), along with improvements to HYDRA's convergence speed. In future work, we plan to investigate more robust selection criteria for adding multiple solvers in each iteration of HydraDF,k, taking into account both performance improvement and performance correlation; this may allow us to avoid the stagnation we observed on CL∪REG∪RCW. We expect that HydraDF,k can be further strengthened by using improved algorithm configurators, such as model-based procedures. Overall, the availability of effective procedures for constructing portfolio-based algorithm selectors, such as our new HYDRA, should encourage the development of highly parameterized algorithms for other prominent NP-hard problems in AI, such as planning and CSP.

References

1. K. Ahmadizadeh, C. Dilkina, B.and Gomes, and A. Sabharwal. An empirical study of<br />

optimization for maximizing diffusion in networks. In CP, 2010.<br />

2. C. Ansotegui, M. Sellmann, and K. Tierney. A gender-based genetic algorithm for the<br />

automatic configuration of solvers. In CP, pages 142–157, 2009.<br />

3. M. Birattari, Z. Yuan, P. Balaprakash, and T. Stüzle. Empirical Methods for the Analysis of<br />

Optimization Algorithms, chapter F-race and iterated F-race: an overview. 2010.<br />

4. L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.<br />

5. N. Eén and N. Sörensson. An extensible SAT-solver. In <strong>Proceedings</strong> of the 6th Intl. Conf. on<br />

Theory and Applications of Satisfiability Testing, LNCS, volume 2919, pages 502–518, 2004.<br />

6. C. Gebruers, B. Hnich, D. Bridge, and E. Freuder. Using CBR to select solution strategies in<br />

constraint programming. In ICCBR, pages 222–236, 2005.<br />

7. A. Gilpin and T. Sandholm. Information-theoretic approaches to branching in search. Discrete<br />

Optimization, 2010. doi:10.1016/j.disopt.2010.07.001.<br />

8. C. P. Gomes, W. van Hoeve, and A. Sabharwal. Connections in networks: A hybrid approach.<br />

In CPAIOR, 2008.<br />

9. A. Guerri and M. Milano. Learning techniques for automatic algorithm portfolio selection. In<br />

ECAI, pages 475–479, 2004.<br />

10. E. Horvitz, Y. Ruan, C. P. Gomes, H. Kautz, B. Selman, and D. M. Chickering. A Bayesian<br />

approach to tackling hard computational problems. In UAI, pages 235–244, 2001.<br />

<strong>11</strong>. F. Hutter. Automated Configuration of Algorithms for Solving Hard Computational Problems.<br />

PhD thesis, University Of British Columbia, Computer Science, 2009.<br />

14<br />

29


12. F. Hutter, D. Babić, H. H. Hoos, and A. J. Hu. Boosting Verification by Automatic Tuning of<br />

Decision Procedures. In FMCAD, pages 27–34, Washington, DC, USA, 2007. IEEE Computer<br />

Society.<br />

13. F. Hutter, H. H. Hoos, and K. Leyton-Brown. Automated configuration of mixed integer<br />

programming solvers. In CPAIOR, 2010.<br />

14. F. Hutter, H. H. Hoos, K. Leyton-Brown, and T. Stützle. ParamILS: an automatic algorithm<br />

configuration framework. Journal of Artificial Intelligence Research, 36:267–306, 2009.<br />

15. F. Hutter, H. H. Hoos, and T. Stützle. Automatic algorithm configuration based on local<br />

search. In AAAI, pages <strong>11</strong>52–<strong>11</strong>57, 2007.<br />

16. IBM. IBM ILOG CPLEX Optimizer – Data Sheet. Available online: ftp://public.dhe.<br />

ibm.com/common/ssi/ecm/en/wsd14044usen/WSD14044USEN.PDF, 20<strong>11</strong>.<br />

17. S. Kadioglu, Y. Malitsky, M. Sellmann, and K. Tierney. ISAC - instance specific algorithm<br />

configuration. In ECAI, 2010.<br />

18. A. KhudaBukhsh, L. Xu, H. H. Hoos, and K. Leyton-Brown. SATenstein: Automatically building local search SAT solvers from components. In IJCAI, pages 517–524, 2009.<br />

19. D. Lehmann, R. Müller, and T. Sandholm. The winner determination problem. In Combinatorial<br />

Auctions, chapter 12, pages 297–318. 2006.<br />

20. K. Leyton-Brown, E. Nudelman, G. Andrew, J. McFadden, and Y. Shoham. A portfolio<br />

approach to algorithm selection. In IJCAI, pages 1542–1543, 2003.<br />

21. K. Leyton-Brown, E. Nudelman, and Y. Shoham. Empirical hardness models: Methodology<br />

and a case study on combinatorial auctions. Journal of the ACM, 56(4):1–52, 2009.<br />

22. K. Leyton-Brown, M. Pearson, and Y. Shoham. Towards a universal test suite for combinatorial<br />

auction algorithms. In ACM-EC, pages 66–76, 2000.<br />

23. E. O’Mahony, E. Hebrard, A. Holland, C. Nugent, and B. O’Sullivan. Using case-based<br />

reasoning in an algorithm portfolio for constraint solving. In Irish Conference on Artificial<br />

Intelligence and Cognitive Science, 2008.<br />

24. J. R. Rice. The algorithm selection problem. Advances in Computers, 15:65–118, 1976.<br />

25. H. Samulowitz and R. Memisevic. Learning to solve QBF. In AAAI, pages 255–260, 2007.<br />

26. D. Stern, R. Herbrich, T. Graepel, H. Samulowitz, L. Pulina, and A. Tacchella. Collaborative<br />

expert portfolio management. In AAAI, pages 210–216, 2010.<br />

27. K. M. Ting. An instance-weighting method to induce cost-sensitive trees. IEEE Trans. Knowl.<br />

Data Eng., 14(3):659–665, 2002.<br />

28. L. Xu, H. H. Hoos, and K. Leyton-Brown. Hydra: Automatically configuring algorithms for<br />

portfolio-based selection. In AAAI, pages 210–216, 2010.<br />

29. L. Xu, F. Hutter, H. Hoos, and K. Leyton-Brown. SATzilla2009: an Automatic Algorithm<br />

Portfolio for SAT. Solver description, SAT competition 2009, 2009.<br />

30. L. Xu, F. Hutter, H. H. Hoos, and K. Leyton-Brown. SATzilla: portfolio-based algorithm<br />

selection for SAT. Journal of Artificial Intelligence Research, 32:565–606, June 2008.<br />



1 Introduction<br />

Natural hazards can dramatically threaten people's lives through the many kinds of damage they cause: disrupting daily activities, producing economic losses, and even breaking the peace of a whole country. For this reason, any effort to minimize the impact of natural catastrophes, such as tornados, floods, and forest fires, is welcome. Since the occurrence of these phenomena is hard to predict, most research efforts focus on predicting their evolution over time, relying on physical or mathematical models.<br />

Nevertheless, environmental hazards are very difficult systems to simulate. Theoretical and model-related issues aside, many simulators lack precision in their results because of the inherent uncertainty of the data needed to define the state of the system. This uncertainty arises because it is difficult to gather precise values at the exact places where the catastrophe is taking place, or because the hazard itself distorts the measurements. In many cases, the only alternative is to work with interpolated, outdated, or even completely unknown values, which obviously degrades the accuracy and quality of the resulting predictions.<br />

To overcome this input-uncertainty problem, we have developed a two-stage prediction strategy. First, a parameter adjustment process compares the results provided by the simulator with the real observed disaster evolution. Then, the underlying simulator is executed with the adjusted parameters obtained in the previous phase, in order to predict the evolution of the particular hazard at a later time instant. The success of this method depends mainly on the effectiveness of the adjustment technique. In this regard, our research group has developed several solutions for input-parameter optimization, all characterized by intensive data management: a statistical approach based on exhaustive exploration of databases of previous fires [5], evolutionary computation [8], calibration based on domain-specific knowledge [4], and even hybrids of some of the above [7]. Since all these approaches perform the calibration stage in a data-driven fashion, they all match the Dynamic Data Driven Application Systems (DDDAS) paradigm [13, 14].<br />

These adjustment techniques have been shown to improve the quality of the predictions. However, in some cases applying them becomes a problem of exploring huge search spaces. This is a serious disadvantage because, in urgent situations, a successful prediction is not determined only by the accuracy of the results: time restrictions must also be taken seriously. While a natural catastrophe is taking place, urgent decisions must be made to fight it effectively. There are often several constraints that raise the question of how to deal with the combinatorial explosion of the search space.<br />



To be useful, any evolution prediction for an ongoing hazard must be delivered as fast as possible so that it is not outdated. Consequently, we face an urgency–accuracy trade-off. For this purpose, we introduce a new methodology to characterize each element of the proposed DDDAS prediction process, with the aim of enhancing the efficiency of our strategy. In particular, we have carried out this research using forest fires as a case study, designing experimental testbeds based on the statistical analysis of the results obtained from thousands of simulations with different well-known simulators.<br />

This work is part of a more ambitious project that aims to determine in advance how a certain combination of natural hazard simulator, computational resources, adjustment strategy, and frequency of data acquisition will perform, in terms of execution time and prediction quality.<br />

This paper is organized as follows. In the next section, an overview of the two-stage DDDAS for forest fire spread prediction is given. In Section 3, we discuss how this framework can be generalized to any natural hazard, and we describe the methodology for prediction time assessment. In Section 4, the experimental study is reported and, finally, the main conclusions are presented in Section 5.<br />

2 DDDAS for Forest Fire Spread Prediction<br />

In the field of physical systems modelling, and specifically forest fire behavior modelling, there exist several fire propagation simulators [9–11], based on physical or mathematical models [1], whose main objective is to predict the fire evolution. These simulators need certain input data, which define the characteristics of the environment where the fire is taking place, in order to evaluate its future propagation. This data usually consists of the current fire front, terrain topography, vegetation type, and meteorological data such as humidity, wind direction, and wind speed. Some of this data can be retrieved in advance and with noticeable accuracy, for example the topography of the area and the predominant vegetation types. However, some data is very difficult to obtain reliably. For instance, an accurate fire perimeter is very hard to get because of the difficulties involved in obtaining images or measurements in real time. Meteorological data is also sensitive to imprecision, since it is often distorted by the fire itself. This circumstance is not specific to forest fires; it occurs in any system whose state evolves dynamically over time (e.g. floods [19], thunderstorms [20, 21], etc.). These restrictions concerning uncertainty in the input parameters, added to the fact that the inputs are set up only at the very beginning of the simulation process, become an important drawback: as the simulation goes on, variables previously initialized may change dramatically, misleading the simulation results. To overcome these restrictions, we need a system capable of dynamically obtaining real-time input data where possible and, otherwise, properly estimating the values of the input parameters needed by the underlying simulator.<br />



(a) Classic prediction (b) Two-stage prediction<br />

Fig. 1. Prediction Methods<br />

The classic way of predicting forest fire behaviour, summarised in Figure 1(a), takes as input the initial state of the fire front (RF = real fire) as well as the input parameters given for some time tx. The simulator then returns the prediction (SF = simulated fire) for the state of the fire front at a later time tx+1. When the simulation result SF for time tx+1 is compared with the real fire RF at the same instant, the forecasted fire front tends to differ to a greater or lesser extent from the real fire line. One reason for this behaviour is that the classic calculation of the simulated fire is based on a single set of input parameters afflicted with the insufficiencies explained above. To overcome this drawback, a simulator-independent data-driven prediction scheme was proposed to optimize dynamic model input parameters [3]. By introducing a prior calibration step, as shown in Figure 1(b), the set of input parameters is optimized before every prediction step. The proposed solution comes from reversing the problem: find a parameter configuration such that, given this configuration as input, the fire simulator would produce predictions that match the actual fire behavior. Having detected the simulator input that best describes the current environmental conditions, the same set of parameters can also be used to describe the immediate future, assuming that meteorological conditions remain constant during the next prediction interval. The prediction then becomes the result of a series of automatically adjusted input configurations.<br />
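The calibration-then-predict loop just described can be sketched as follows. Everything here is an illustrative assumption, not the API of any real fire simulator: `simulate` stands for the underlying simulator, fire fronts are represented as sets of burned-cell identifiers, and candidate parameter sets are simply given as a list.

```python
# Sketch of the two-stage (calibration + prediction) strategy.
# `simulate(front, params)` is a hypothetical simulator interface.

def symmetric_difference_size(a, b):
    """Size of the mismatch between two fire fronts (sets of cells)."""
    return len(set(a) ^ set(b))

def calibrate(simulate, real_front_t0, real_front_t1, candidates):
    """Return the candidate parameter set whose simulation from t0
    best matches the real fire front observed at t1."""
    def error(params):
        predicted = simulate(real_front_t0, params)
        return symmetric_difference_size(predicted, real_front_t1)
    return min(candidates, key=error)

def two_stage_predict(simulate, fronts, candidates):
    """fronts[i] is the observed fire front at time t_i. For each
    interval, calibrate on [t_i, t_i+1], then predict from t_i+1."""
    predictions = []
    for i in range(len(fronts) - 1):
        best = calibrate(simulate, fronts[i], fronts[i + 1], candidates)
        predictions.append(simulate(fronts[i + 1], best))
    return predictions
```

The key design point, mirrored from the text, is that calibration reverses the problem: the parameter set is scored by how well its simulation reproduces the already-observed evolution, and the winner is reused for the next interval.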

This strategy works under the hypothesis that the environmental conditions are stable throughout the adjustment and calibration steps. However, this assumption is not always true, especially when dealing with very dynamic parameters such as wind speed and wind direction. For this reason, new techniques had to be introduced so that the system could dynamically acquire data when sudden changes in the initial conditions are detected [7].<br />

Previous works proposed several calibration techniques that make the problem of fire spread prediction fit the DDDAS paradigm rather than the classic prediction scheme [6–8]. Since the two-stage DDDAS for forest fire spread prediction described in Figure 1(b) constitutes a simulator-independent prediction method, the same technique can be extrapolated to any kind of natural disaster by simply exchanging the underlying simulator. Figure 2 shows<br />



a general scheme of a two-stage DDDAS for natural hazard management. In the following section, we describe a methodology to perform prediction time assessment under this prediction framework.<br />

Fig. 2. General two-stage DDDAS for natural hazard evolution prediction<br />

3 Prediction Time Assessment<br />

As stated in Section 1, when dealing with emergency simulation, it is essential to optimize the urgency–accuracy trade-off. The goal is to provide the personnel in charge of deciding how to face an ongoing emergency with intelligent tools able to evaluate, in advance, how a certain combination of simulator, computational resources, adjustment strategy, and frequency of data acquisition will perform, in terms of execution time and prediction quality. In order to bound the problem, we work under certain assumptions:<br />

– We focus on those emergencies where the corresponding simulators present high input-data sensitivity.<br />
– We assume scenarios where the computational resources are dedicated. We are currently working on adapting tools that allow urgent execution of tasks in distributed-computing environments, e.g. SPRUCE [15].<br />
– We rely on the two-stage DDDAS prediction strategy.<br />

Taking these premises into account and bearing in mind the scheme shown in Figure 2, we can define three levels of prediction time assessment: Simulator level assessment (SLA), Adjustment level assessment (ALA), and Prediction level assessment (PLA).<br />

3.1 Simulator level assessment (SLA)<br />

Prediction time assessment at this level must be done independently of the underlying simulator (natural hazard) and the particular setting of its input parameters. The main objective at this level is to define a simulator-independent<br />



methodology to obtain a clustering classification of the simulator execution time, where each cluster has an associated upper bound on the execution time depending on the values of the input parameters. This process is carried out offline and is explained in detail later in this paper. Since this characterization process depends on the execution platform, a different simulator characterization is performed for each available computational resource.<br />

3.2 Adjustment level assessment (ALA)<br />

This level estimates the prediction time increase due to the calibration strategy used in the adjustment stage. As previously mentioned, several calibration strategies have been demonstrated to be useful for improving the prediction quality of a hazard evolution. Each of these optimization schemes must be modeled independently, because they operate in quite different ways. As can be observed in Figure 2, the results obtained at the SLA are tightly related to this level: the SLA lies inside the ALA, so the ALA time is directly proportional to the SLA time.<br />

3.3 Prediction level assessment (PLA)<br />

At this level one can either rely on dynamic data injection into the system or not. A pure DDDAS takes real-time data injection into account, and this is how the DDDAS for forest fire spread prediction has been designed in its advanced form. However, a preliminary version did not consider dynamic data injection and was based on the working hypothesis that the environmental conditions remain constant from the calibration stage to the prediction stage. For this reason, the PLA methodology has been designed in two steps: first, we determine a standard methodology for the prediction stage without real-time data injection; afterwards, the PLA characterization is performed, taking into account the data-gathering frequency and the data sources. The aim is to be able to determine the probability distribution indicating what percentage of prediction improvement has historically been obtained when data was acquired with a certain frequency and from certain data sources. This characterization level, as in the SLA, relies on a massive statistical study. Thus, we can assess in advance the probability of improvement that the dynamic data injection process may produce in the prediction, without the need to modify the underlying simulator.<br />

It is important to notice that in the characterization of the simulator we use the execution time as the classification criterion, whereas the quality of the prediction is the factor taken into account when characterizing the adjustment stage (ALA). This is because the quality of the initial prediction given by the simulator has no influence on the final prediction, while the execution time of each calibration technique is directly proportional to the execution time of the simulator. Organizing the study this way therefore lets us estimate both the accuracy of the prediction and the time needed to produce it.<br />



In the next section, the method followed for the Simulator Level Assessment is detailed in an empirical study and the obtained results are analyzed.<br />

4 Experimental Study<br />

In this section, we present the experimental studies carried out to validate the first two steps of the proposed methodology.<br />

We first describe how we deal with the SLA. As stated above, a good characterization of each simulator in terms of execution time is crucial for the validation of the whole methodology. Afterwards, we present the application of the described strategy to the ALA, where we chose a Genetic Algorithm as the adjustment technique.<br />

4.1 Prediction Time Classification<br />

Simulation time estimation can be tackled by carrying out large sets of executions of the underlying simulator and then analyzing its behavior from the obtained results. However, this may not be trivial in certain cases. While it is easy to detect, even intuitively, that the application is highly sensitive to certain input parameters, some of them produce simulator behavior that is hard to predict. Figures 3 and 4 show examples of each case, respectively. In the former, one can observe that the dimension of the simulated map has a direct influence on the execution time (as was to be expected), whereas in the latter the relation between execution time and wind direction is far less clear (this anomaly is reported in [12]), and it becomes stranger still when variations in wind direction are combined with variations in vegetation type.<br />

Fig. 3. Execution time as a function of number of cells.<br />

Currently, this characterization is performed by carrying out large sets of executions (on the order of tens of thousands) on different initial scenarios (different input data sets), and then applying knowledge-extraction techniques to the information they provide. We record the execution times from the<br />


Fig. 4. Variations in execution time according to variations in wind direction and<br />

vegetation type.<br />

experiment, and then we establish a classification of the input parameter sets according to the elapsed times they produced. At this point, we can apply machine learning techniques to determine classification criteria and, therefore, given a new set of input parameters, estimate how long the execution will last.<br />

This learning process is carried out offline, i.e. the classification rules are established before the hazard occurs. Therefore, at the moment of emergency management, we only have to apply the classification technique, which involves a negligible computational cost.<br />

This highlights the need to rely on complex criteria in order to successfully classify the input data sets according to the execution time they will cause. Consequently, we draw on the Artificial Intelligence field to reach this objective. Specifically, this experimental study shows the results obtained using decision trees as the classification technique.<br />

Test bed description. FireLib is a C function library for predicting the spread rate and intensity of free-burning wildfires, developed in 1996. It is based on the Rothermel fire model [1] to determine the direction and magnitude of the maximum rate of spread. The simulated scenario is described below:<br />

– Domain: for the characterization of fireLib, an artificial 1001×1001-cell map was used (cell width and height: 100 feet). The topography remained constant for all executions.<br />
– Simulation duration: fireLib simulations end once the fire reaches one edge of the map.<br />
– Ignition point: the ignition point was the central cell of the map.<br />

Table 1 shows the probability distribution assigned to each type of input parameter. For wind speed and direction, the chosen distributions and their associated parameters are those used in [18]. The vegetation models correspond to the 13 standard Northern Forest Fire Laboratory (NFFL) fuel models [2].<br />



Input               Distribution   µ, σ          Min, Max<br />
Vegetation model    Uniform        —             1, 13<br />
Wind speed          Normal         12.83, 6.25   —<br />
Wind direction      Normal         56.6, 13.04   —<br />
Dead fuel moisture  Uniform        —             0, 1<br />
Live fuel moisture  Uniform        —             0, 4<br />
Table 1. Input parameter distributions.<br />

Once the distribution of each input parameter was established, a set of 38750 different combinations of input data sets was generated, and the simulations for each scenario were performed.<br />
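As a sketch, sampling such input scenarios from the distributions of Table 1 could look as follows. The distribution parameters are those in the table; the scenario representation (a plain dict) is an illustrative assumption.

```python
import random

def sample_scenario(rng):
    """Draw one input data set according to the Table 1 distributions."""
    return {
        "vegetation_model": rng.randint(1, 13),       # uniform over NFFL models 1..13
        "wind_speed": rng.gauss(12.83, 6.25),         # Normal(mu, sigma)
        "wind_direction": rng.gauss(56.6, 13.04),     # Normal(mu, sigma)
        "dead_fuel_moisture": rng.uniform(0.0, 1.0),  # uniform on [0, 1]
        "live_fuel_moisture": rng.uniform(0.0, 4.0),  # uniform on [0, 4]
    }

def generate_scenarios(n, seed=0):
    """Generate n independent scenarios with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return [sample_scenario(rng) for _ in range(n)]
```

In the paper's setting, `generate_scenarios(38750)` would produce the training scenarios, each of which is then run through the simulator to record its execution time.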

Regarding the computational platform, all the experiments in this work were run on a cluster of 32 IBM x3550 nodes, each with two dual-core Intel Xeon 5160 CPUs at 3.00 GHz, 4 MB of L2 cache (2×2), and 12 GB of Fully Buffered DIMM memory at 667 MHz, running Linux kernel 2.6.16.<br />

Fig. 5. Execution times using fireLib.<br />

Empirical evaluation. As one can see in Figure 5, the variance in the simulation time is very noticeable. The great majority of the executions lie under the 2500-second threshold, but several executions lasted more than 30000 seconds, and some even more than 50000 seconds.<br />

From the point of view of emergency prediction, it is crucial to keep the execution time under control, since we may face cases that drastically<br />


slow down the prediction process. An elapsed-time prediction for a simulator execution with an error on the order of thousands of seconds would be prohibitive, so from cases like this one there arises the need to predict how the simulator is going to behave and, therefore, the need for an efficient classification technique.<br />

To respond to this need, the experimental study carried out in this work used decision trees as the classification method, in order to estimate, in advance, the execution time of fireLib given a new, unknown set of input parameters.<br />

The decision trees used in this research were generated by the C4.5 algorithm [17]; specifically, we used J48, the open-source Java implementation of C4.5 in the Weka data mining tool [16]. The data obtained from the 38750 executions was used as a training set, and 1000 new instances were generated (according to the distributions shown in Table 1) to be used as a test set.<br />
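The actual classifier is Weka's J48 (C4.5). Purely as a self-contained illustration of the same idea, the following minimal CART-style decision tree (Gini impurity, axis-aligned numeric splits) maps input-parameter vectors to execution-time classes. This is a sketch, not the J48 implementation.

```python
from collections import Counter

def majority(labels):
    """Most frequent class label."""
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    """Gini impurity of a non-empty label list."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Find (impurity, feature, threshold) minimizing weighted Gini."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build(rows, labels, depth=5):
    """Recursively grow the tree; leaves are class labels."""
    if depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    split = best_split(rows, labels)
    if split is None:
        return majority(labels)
    _, f, t = split
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build([rows[i] for i in li], [labels[i] for i in li], depth - 1),
            build([rows[i] for i in ri], [labels[i] for i in ri], depth - 1))

def classify(tree, row):
    """Descend from the root until a leaf (a class label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree
```

In the paper's pipeline, `rows` would hold the sampled input parameters of the 38750 training executions and `labels` their recorded execution-time classes; classifying a new row then costs only a handful of comparisons, which is why applying the learned rules during an emergency is negligible.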

The number of classes, and the execution-time intervals they represent, were determined by the setting of our work, i.e. the intervals chosen for each class are those that would matter in a real emergency situation (it makes no sense, for example, to classify by intervals of 10 seconds when predicting forest fire spread). The defined classes are the following, where ET stands for execution time:<br />

– Class A: ET ≤ 900 seconds.<br />

– Class B: 900 seconds < ET ≤ 1800 seconds.<br />

– Class C: 1800 seconds < ET ≤ 3600 seconds.<br />

– Class D: 3600 seconds < ET ≤ 7200 seconds.<br />

– Class E: 7200 seconds < ET.<br />
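These class boundaries translate directly into a lookup (thresholds in seconds, exactly as defined above):

```python
def time_class(et_seconds):
    """Map an execution time in seconds to its class A..E."""
    if et_seconds <= 900:
        return "A"
    if et_seconds <= 1800:
        return "B"
    if et_seconds <= 3600:
        return "C"
    if et_seconds <= 7200:
        return "D"
    return "E"
```

This is the labeling function applied to the recorded execution times of the training runs before the decision tree is learned.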

The results of applying the decision trees to the test set are summarized in Table 2. The main aspect to highlight is the prominence of the main diagonal, which means that perfect matches predominate over the whole set of predictions. Furthermore, the values decrease as one moves away from the main diagonal. Indeed, the worst possible cases (predicting A when the real class is E, and vice versa) never happened.<br />

              Predicted class<br />
Real class    A    B    C    D    E<br />
A           669   14    4    2    0<br />
B            17   72    9    4    0<br />
C             2   12   72   12    4<br />
D             5    6   14   24    5<br />
E             0    3    2   12   36<br />
Table 2. Correspondence between real and predicted classes.<br />



Fig. 6. Classification accuracy.<br />

Figure 6 shows the absolute number of predictions that exactly hit the real class, as well as the absolute number of predictions whose accuracy is characterized by the distance between classes. A distance-X accuracy means that there are X−1 classes between the predicted class and the real class. The most noticeable aspect of this graphic is that if we consider distance 1 as a good prediction accuracy, then the obtained results present 96.8% satisfactory classifications.<br />
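The distance-based accuracy can be recomputed directly from Table 2. The matrix below transcribes that table (rows: real class A..E, columns: predicted class A..E):

```python
CONFUSION = [
    [669, 14,  4,  2,  0],   # real A
    [ 17, 72,  9,  4,  0],   # real B
    [  2, 12, 72, 12,  4],   # real C
    [  5,  6, 14, 24,  5],   # real D
    [  0,  3,  2, 12, 36],   # real E
]

def accuracy_within(matrix, max_distance):
    """Fraction of predictions within max_distance classes of the real one."""
    total = sum(sum(row) for row in matrix)
    ok = sum(matrix[i][j]
             for i in range(len(matrix))
             for j in range(len(matrix))
             if abs(i - j) <= max_distance)
    return ok / total
```

With this matrix, `accuracy_within(CONFUSION, 0)` gives 0.873 (exact hits) and `accuracy_within(CONFUSION, 1)` gives 0.968, matching the 96.8% figure quoted above.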

4.2 Adjustment Time Restriction<br />

Once we have the capability to classify the time incurred by a simulation with a particular set of parameters, the following experimental study shows how we can take great advantage of this technique by applying it in the two-stage prediction strategy described in Section 2.<br />

The aim of this experiment is to demonstrate the benefits of being able to discard, in advance, those initial simulation settings whose execution times would cause the adjustment technique to exceed the pre-set deadline for the prediction. Furthermore, we demonstrate that applying the above-described classification technique does not impact the quality of the prediction results.<br />

Test bed description. In this case, we have used FARSITE [9] as the fire spread simulator. We carried out a process analogous to the one described in Section 4.1, with a training database of 20934 simulations, executed on the same computational platform. The distribution of each input parameter again corresponds to Table 1. This experiment uses the GIS data from the benchmark provided with FARSITE (the Ashley project), and in every case a simulation of 30 hours is performed.<br />

The adjustment technique chosen for this study was a Genetic Algorithm. We analyse the results obtained from the calibration step, considering the adjustment time interval [0 hours – 5 hours]. Ten experiments were carried out, starting from ten different initial random populations of fifty individuals each, and evolving them through five generations.<br />



Population  Calibration error  #generations with  Average estimated<br />
                               Class C members    execution time<br />
0           0.31238            0                  6000 s<br />
1           0.120206           0                  6000 s<br />
2           0.203242           2                  9000 s<br />
3           0.127323           4                  12000 s<br />
4           0.13543            2                  9000 s<br />
5           0.022934           0                  6000 s<br />
6           0.071767           1                  7500 s<br />
7           0.178331           2                  9000 s<br />
8           0.1724             0                  6000 s<br />
9           0.209174           0                  6000 s<br />
Table 3. Results obtained in the calibration interval [0 hours – 5 hours] for each population.<br />

It is worth emphasizing that the computational resource used in this work provides enough computing elements to execute every individual of a given generation on a different node; i.e. all individuals of each generation start their corresponding simulations at the same time and are processed in parallel. This implies that the time incurred in processing each generation depends on the individual that produces the slowest simulation.<br />

In this experiment, we also established a timeout of one hour, so simulations that reached this threshold were discarded from the study.<br />

Analysis of the results. Table 3 summarizes the results obtained in this experiment. The values in the second column correspond to the error of the best individual after five generations for each particular population. Since the underlying fire simulator produces a raster file indicating the time of arrival of the fire at each cell of the simulated map, the quality error is calculated by means of the following formula:<br />

E = ((Cells∪ − InitCells) − (Cells∩ − InitCells)) / (RealCells − InitCells)<br />

This equation measures the difference in the number of cells burned, whether missing or in excess, between the simulated and the real fire. Cells∪ is the number of cells in the union of the cells burned in the real fire and the cells burned in the simulation, Cells∩ is the number of cells in their intersection, RealCells is the number of cells burned in the real fire, and InitCells is the number of cells burned at the starting time.<br />
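With fire fronts represented as sets of burned-cell identifiers (an illustrative representation; the simulator actually produces a raster of arrival times), the error formula reads:

```python
def prediction_error(sim_cells, real_cells, init_cells):
    """E = ((Cells_union - InitCells) - (Cells_intersection - InitCells))
           / (RealCells - InitCells), with all quantities as cell counts."""
    union = len(sim_cells | real_cells)
    inter = len(sim_cells & real_cells)
    init = len(init_cells)
    return ((union - init) - (inter - init)) / (len(real_cells) - init)
```

Note that the initial-cell terms cancel in the numerator, so E is simply the number of mismatched cells (missing plus excess) normalized by the cells newly burned in the real fire; E = 0 means a perfect match.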

As expected, different initial populations lead to different result quality. Nevertheless, since our techniques are meant to be applied in an urgent<br />


situation, it is worth examining the time spent on each evolution process. For this purpose, we perform a post-mortem classification of the individuals involved in that process. This classification follows the methodology described in the previous section, defining the following classes, where ET stands for execution time:<br />

– Class A: ET ≤ 600 seconds.<br />

– Class B: 600 seconds < ET ≤ 1800 seconds.<br />

– Class C: 1800 seconds < ET ≤ 3600 seconds.<br />

As stated above, the time spent in each generation depends on the individual that produces the slowest simulation in that particular generation. Therefore, to evaluate the elapsed time of each evolution process, we analyse, for each population, how many generations contain individuals that notably delay its evolution, i.e. how many generations contain individuals classified as C. The third column of Table 3 summarizes this information.<br />

From the complete set of input parameter combinations tested for the Genetic Algorithm, 23 individuals belonged to Class C. Applying the classification schema described in the previous section, 19 of them were correctly classified, i.e. the process provided an 82.36% hit ratio.<br />

It is worth mentioning that in those cases where a generation included one or more Class C members, at least one of them was correctly classified and, consequently, the misclassified individuals did not affect the average estimated execution time. This value, summarized in the fourth column of Table 3, is the summation of the estimated average time for each generation, which is the average value of the interval corresponding to the slowest class present in the generation.<br />
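The estimate described above can be sketched as follows, assuming (our assumption, not stated explicitly in the paper) that Class A spans the interval from 0 to 600 seconds, so that the class midpoints are 300, 1200 and 2700 seconds:<br />

```python
# Midpoints of the class intervals defined above; Class A is assumed
# (our assumption) to span 0-600 seconds.
CLASS_MIDPOINT = {"A": 300.0, "B": 1200.0, "C": 2700.0}
CLASS_ORDER = {"A": 0, "B": 1, "C": 2}

def estimated_total_time(generations):
    """Sum, over all generations, the midpoint of the interval of the
    slowest predicted class in that generation. `generations` is a list of
    lists of predicted classes ("A"/"B"/"C"), one inner list per generation."""
    return sum(CLASS_MIDPOINT[max(gen, key=CLASS_ORDER.get)]
               for gen in generations)
```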

Another interesting result that should be pointed out from this experiment is the lack of relation between the time incurred during the calibration step and the quality of the results obtained. This fact becomes clear when examining the most and least favorable cases (populations 5 and 3, respectively). As can be seen, the error obtained in population 5 is approximately one sixth of the one obtained in population 3. Moreover, the average execution time of the former was half that of the latter, which in absolute terms means a difference of 200 minutes.<br />

The main conclusion of this experiment is that, if we apply the classification strategy prior to the submission of the individuals to the computing platform, we are able to detect, in advance, combinations of input parameters that would prohibitively increase the duration of the adjustment process. Therefore, we can remove them from the process, and this elimination will not affect the accuracy of the results.<br />

Furthermore, in this experimental study each generation was processed in parallel, so one can appreciate the substantial gains that can be obtained from applying the proposed methodology for prediction-time and quality-enhancement assessment.<br />



5 Conclusions<br />

Natural hazard management is undoubtedly a relevant application area in which the DDDAS paradigm can play a very important role. As has been shown in previous works, the application of this paradigm becomes crucial in order to improve the quality of the predictions given by the simulators. In particular, its combination with the two-stage prediction method described above contributes to relieving the input uncertainty problem and, therefore, to enhancing the quality of the prediction.<br />

This work constitutes an essential part of a very ambitious project, which<br />

consists of determining in advance how a certain combination of natural hazard<br />

simulator, computational resources, adjustment strategy, and frequency of data<br />

acquisition will perform, in terms of execution time and prediction quality.<br />

Since we are working in the area of natural hazard management, it is absolutely necessary to take into account the time incurred by the prediction method. For this purpose, we have designed a methodology to assess the urgency–accuracy trade-off in each particular case.<br />

As is well known, the execution time of a particular simulator depends on the specific setting of its input parameters. However, as has been shown, it is hard to predict how certain variations in certain input parameters will affect the execution time. In this work, we approach this challenge by means of Artificial Intelligence techniques. In particular, we present how we deal with simulator characterization through the use of decision trees as a classification technique.<br />

The experimental studies have been carried out using two different forest fire spread simulators. The proposed classification scheme has been applied to the FireLib and FARSITE simulators and a huge set of input parameter combinations, in order to validate the classification strategy under different setup conditions.<br />

The results obtained demonstrate that the use of decision trees as the classification strategy is suitable for this research, reaching up to 96.8% satisfactory classification predictions. Furthermore, it has been demonstrated that it is possible to notably speed up the calibration process by applying this classification strategy, without any loss in the quality of the results. These results represent a great advance and allow us to tackle the subsequent steps of the proposed methodology on a solid basis.<br />

References<br />

1. R. C. Rothermel. How to Predict the Spread and Intensity of Forest and Range Fires. USDA Forest Service, Ogden, UT, Gen. Tech. Rep. INT-143, pp. 1–5. 1983.<br />

2. F. A. Albini. Estimating wildfire behavior and effects. Gen.Tech.Rep.INT-GTR-<br />

30. Ogden, UT: U.S. Department of Agriculture, Forest Service, Intermountain<br />

Forest and Range Experiment Station. 1976.<br />



3. B. Abdalhaq, A methodology to enhance the Prediction of Forest Fire Propagation,<br />

PhD Thesis dissertation. Universitat Autònoma de Barcelona (Spain). June 2004.<br />

4. K. Wendt, A. Cortés and T. Margalef, Knowledge-guided Genetic Algorithm for<br />

input parameter optimisation in environmental modelling, Procedia Computer Science<br />

2010, Volume 1(1), International Conference on Computational Science (ICCS<br />

2010), pp. 1361–1369.<br />

5. G. Bianchini, A. Cortés, T. Margalef and E. Luque, Improved Prediction Methods for Wildfires Using High Performance Computing: A Comparison, LNCS, Volume 3991, pp. 539–546, 2006.<br />

6. G. Bianchini, M. Denham, A. Cortés, T. Margalef and E. Luque, Wildland Fire<br />

Growth Prediction Method Based on Multiple Overlapping Solution, Journal of<br />

Computational Science, Volume 1, Issue 4, pp. 229–237. Ed. Elsevier Science. 2010.<br />

7. R. Rodríguez, A. Cortés and T. Margalef, Injecting Dynamic Real-Time Data into<br />

a DDDAS for Forest Fire Behavior Prediction, Lecture Notes in Computer Science,<br />

Volume 5545(2), pp. 489–499, 2009.<br />

8. M. Denham, A. Cortés A. and T. Margalef, Computational Steering Strategy to<br />

Calibrate Input Variables in a Dynamic Data Driven Genetic Algorithm for Forest<br />

Fire Spread Prediction, Lecture Notes in Computer Science, Volume 5545(2),<br />

pp. 479–488, 2009.<br />

9. M. A. Finney, FARSITE: Fire Area Simulator-model development and evaluation,<br />

Res. Pap. RMRS-RP-4, Ogden, UT: U.S. Department of Agriculture, Forest Service,<br />

Rocky Mountain Research Station, 1998.<br />

10. A. Lopes, M. Cruz and D. Viegas. FireStation - An integrated software system for the numerical simulation of fire spread on complex topography. Environmental Modelling and Software, 17(3), pp. 269–285. 2002.<br />

11. FIRE.ORG - Public Domain Software for the Wildland fire Community.<br />

http://www.fire.org.<br />

12. fireLib User Manual and Technical Reference (online).<br />

http://www.fire.org/downloads/fireLib/1.0.4/doc.html.<br />

13. F. Darema, Dynamic Data Driven Applications Systems: A New Paradigm for Application<br />

Simulations and Measurements, ICCS 2004, LNCS 3038, Springer Berlin<br />

/ Heidelberg, pp. 662–669. 2004.<br />

14. Dynamic Data Driven Application Systems homepage. http://www.dddas.org.<br />

15. P. Beckman, S. Nadella, N. Trebon and I. Beschastnikh, SPRUCE: A System for<br />

Supporting Urgent High-Performance Computing, Grid-Based Problem Solving Environments,<br />

Volume 239/2007, pp. 295–311. 2007.<br />

16. G. Holmes, A. Donkin and I. H. Witten. Weka: A machine learning workbench,<br />

Proceedings of the Second Australia and New Zealand Conference on Intelligent<br />

Information Systems, Brisbane, Australia. pp. 357–361. 1994.<br />

17. J. R. Quinlan. Improved use of continuous attributes in C4.5, Journal of Artificial<br />

Intelligence Research, Volume 4, pp. 77–90. 1996.<br />

18. R. E. Clark, A. S. Hope, S. Tarantola, D. Gatelli, P. E. Dennison and M. A. Moritz,<br />

Sensitivity Analysis of a Fire Spread Model in a Chaparral Landscape, Fire Ecology,<br />

Volume 4(1), pp. 1–13. 2004.<br />

19. H. Madsen and F. Jakobsen Cyclone induced storm surge and flood forecasting in<br />

the northern Bay of Bengal, Coastal Engineering, Volume 51, Issue 4, pp. 277–296.<br />

2004.<br />

20. S. D. Aberson, Five-day tropical cyclone track forecasts in the North Atlantic basin,<br />

Weather and Forecasting, Volume 13, pp. 1005–1015. 1998.<br />

21. H. C. Weber, Hurricane Track Prediction Using a Statistical Ensemble of Numerical<br />

Models, Monthly Weather Review, Volume 131, pp. 749–770. 2003.<br />



A Hybrid Algorithm combining Path Scanning<br />

and Biased Random Sampling for the Arc<br />

Routing Problem<br />

Sergio González 1 , Angel A. Juan 1 , Daniel Riera 1 , and José Cáceres 1<br />

Estudis d’informàtica, Multimèdia i Telecomunicació<br />

Open University of Catalonia - IN3<br />

Barcelona, Spain<br />

{sgonzalezmarti,ajuanp,drierat,jcaceresc}@uoc.edu<br />

Abstract. The Arc Routing Problem is a class of NP-hard routing problems in which the demand is located on some of the arcs connecting nodes and must be completely served while fulfilling certain constraints. This paper presents a hybrid algorithm which combines a classical heuristic with biased random sampling to solve the Capacitated Arc Routing Problem (CARP). This new algorithm is compared with the classical Path Scanning heuristic, obtaining results which outperform it. As discussed in the paper, the methodology presented is flexible, can be easily parallelised, and does not require any complex fine-tuning process. Some preliminary tests show the potential of the proposed approach as well as its limitations.<br />

Keywords: Arc Routing Problem, Combinatorial Optimisation, Hybrid Algorithms,<br />

Metaheuristics, Simulation.<br />

1 Introduction<br />

The Arc Routing Problem (ARP) is the counterpart of the Vehicle Routing Problem (VRP). In the latter, the demand is placed in nodes (i.e. clients), whereas in the ARP it is located on arcs. The existing literature on the ARP is not as extensive as that for the VRP, although there are approaches to the VRP which can be adapted to the ARP, obtaining fairly good results. The objective of this research is to adapt the general ideas proposed in [15], which were developed for the Capacitated Vehicle Routing Problem (CVRP), and apply them to the Capacitated Arc Routing Problem (CARP).<br />

The structure of this paper is as follows. First, the CARP is stated, establishing some assumptions and the basic notation. Section 3 reviews the existing literature on the state of the art. Later, in section 4, the proposed<br />

Proceedings of the 18th RCRA workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion (RCRA 2011). In conjunction with IJCAI 2011, Barcelona, Spain, July 17-18, 2011.<br />



algorithm is introduced, first presenting the classic Path Scanning algorithm proposed by Golden et al. in [9]. Section 5 displays and discusses some results, and finally, section 6 states some conclusions and future work.<br />

2 The Capacitated Arc Routing Problem<br />

The Arc Routing Problem is a family of NP-hard problems where the objective is to determine a routing plan on a graph to serve a given set of nodes and arcs (also known as tasks), in contrast with the Vehicle Routing Problem, where the customers' demands occur in nodes and the arcs are only used to model paths interconnecting nodes. The CARP is a particular case of the ARP in which every vehicle serving a route has a maximum capacity.<br />

The CARP is subject to the following constraints:<br />

(a) All routes begin and end at the depot,<br />

(b) every vehicle has a maximum load capacity, which is considered to be the<br />

same for all vehicles,<br />

(c) every arc with positive demand must be satisfied,<br />

(d) every arc is served by a single vehicle,<br />

(e) every arc can be traversed as many times as required (though it can only be<br />

served by a single vehicle),<br />

(f) and the total routing cost is minimised.<br />

2.1 Basic Notation and Assumptions<br />

The CARP is defined over a non-complete graph G = (V, E), where V = {0, 1, 2, ..., n} represents the set of n nodes, and E ⊆ A, with A = {(i, j) | i, j ∈ V; i ≠ j}, represents the set of m arcs connecting pairs of nodes, on which the demand is located. A fleet of identical vehicles of capacity W, based at depot node 0, must serve the set of customers denoted by E. Every edge e = (i, j) has a traversal cost or length cij and a positive or zero demand dij and, being undirected, can be traversed in both directions. Thus, a subset R ⊆ E of required edges (or tasks) must be served by the fleet, namely those with positive demand, dij > 0.<br />

In this scenario, the classical CARP goal is to determine a minimum length<br />

set of vehicle trips covering all the tasks with the following constraints:<br />

(a) Every trip leaves the depot, visits a subset of tasks whose total demand does<br />

not exceed W and returns to the depot,<br />

(b) and every task must be served by one and only one vehicle, though edges<br />

can be traversed multiple times.<br />
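As a minimal sketch of the two constraints above, the following code (illustrative names and data layout, not from the paper) checks a candidate set of trips against the capacity and single-service requirements; it deliberately ignores route continuity along the graph:<br />

```python
from collections import namedtuple

# Illustrative CARP data layout (not from the paper): an undirected edge
# with its traversal cost and demand.
Edge = namedtuple("Edge", "i j cost demand")

def is_feasible(trips, edges, W, depot=0):
    """Check constraints (a) and (b) above: each trip starts and ends at the
    depot with served demand at most W, and every required edge (demand > 0)
    is served by exactly one vehicle. Each trip is a list of
    (edge_index, served) traversals; route continuity is not checked here."""
    served_count = {k: 0 for k, e in enumerate(edges) if e.demand > 0}
    for trip in trips:
        if not trip:
            return False
        first, last = edges[trip[0][0]], edges[trip[-1][0]]
        # (a) the trip must leave from and return to the depot...
        if depot not in (first.i, first.j) or depot not in (last.i, last.j):
            return False
        # ...and the total served demand must not exceed the capacity W.
        if sum(edges[k].demand for k, served in trip if served) > W:
            return False
        for k, served in trip:
            if served:
                if k not in served_count:
                    return False  # serving an edge with no demand
                served_count[k] += 1
    # (b) every required edge is served exactly once.
    return all(c == 1 for c in served_count.values())
```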



3 Related work on the Capacitated Arc Routing Problem<br />

The ARP has been widely studied, though the number of existing publications in this field is considerably lower than for the VRP. Some authors have published reviews of the advances in the ARP field; particularly good surveys are [25], [14] and [5].<br />

The study of the ARP began in 1735, when Leonhard Euler presented his solution to the Königsberg bridge problem [12]. This problem, also known as the Euler Tour Problem, asks, given a connected graph G = (V, E), to find a closed tour visiting every edge in E exactly once, or to determine that no such tour exists.<br />

The next ARP historically proposed was the Chinese Postman Problem (CPP), suggested in [21]. It is stated as follows: given a connected graph G = (V, E, C), where C is a distance matrix, find a tour which crosses every edge at least once and does so in the shortest possible way. When G is completely directed or completely symmetric, it can be solved in polynomial time [4].<br />

Following the CPP, Orloff suggested the Rural Postman Problem (RPP) in [22], which is formally stated as follows: given an undirected graph G = (V, E, C), where C is the cost matrix for the edges, find a minimum cost tour which passes through every edge in a subset R of E at least once. It can be proved that the RPP is NP-hard [16], and its hardness comes from determining how the tour should connect the various components of R. After that, the Min-Max k-Chinese Postman Problem (MM k-CPP) can be found in [7]. In this case, given a connected undirected graph G = (V, E, C), where C is a cost matrix, with a special depot node, the aim is to find k tours, starting and ending at<br />

with a special depot node, the aim is to find k tours, starting and ending in<br />

the depot node, such that every edge is covered by at least one tour and the<br />

length of the longest tour is minimised. It should be noted that for this problem<br />

the objective is to minimise the makespan, whereas most other problems with<br />

multiple postmen (or vehicles) seek to minimise the total distance travelled.<br />

The CARP was introduced in [8], and was originally stated as follows: given a connected undirected graph G = (V, E, C, Q), where C is a cost matrix and Q is a demand matrix, and given a number of identical vehicles with capacity W, find the necessary tours such that:<br />

(a) Every arc with positive demand is served by exactly one vehicle,<br />

(b) the sum of demands on those arcs served by every vehicle does not exceed<br />

W ,<br />

(c) and the total cost of the tours is minimised.<br />

This problem is considered the classical CARP. It can be proved that the Capacitated Vehicle Routing Problem (CVRP) can be transformed into a CARP [8], and the CARP can be transformed into a CVRP [1], which makes the two classes of problems equivalent, so algorithms used to solve one class can easily be adapted to solve the other, as we intend to do with our algorithm. For the transformation of a CARP into a CVRP, the resulting CVRP requires either fixing some variables or using edges with infinite cost. Furthermore, the resulting CVRP is a complete<br />



graph of larger size than that of the original CARP. There also exists a problem halfway between the CARP and the CVRP, namely the Stringed CVRP [20], in which customers are to be served as in the CVRP but some of them are located along the streets.<br />

During the eighties, problem specific heuristics were the most widely used<br />

methods for solving the CARP. They include the Construct-Strike algorithm,<br />

the Path-Scanning and the Augment-Merge algorithms [9]. The performance<br />

of those classical methods is generally 10% to 40% above the optimal solution.<br />

Pearn [23] proposed modified versions for those heuristics by adding several types<br />

of randomness, outperforming original heuristics. There exist several benchmarks<br />

to test the performance of the algorithms against the classical CARP, which can<br />

be downloaded from [2], and which will be used in this paper to test the proposed<br />

algorithm.<br />

More recently, other problem specific heuristics have been proposed. Some of<br />

them are the Double Outer Scan heuristic [24], which combines the Augment-<br />

Merge and the Path-Scanning methods, and the Node Duplication heuristic [24],<br />

which uses similar ideas to those proposed in the Node Duplication Lower Bound<br />

[<strong>11</strong>].<br />

Recently, most advances in development of heuristics for the classical CARP<br />

regard metaheuristics. Tabu Search algorithms have been constructed for solving<br />

the CARP. The first, called CARPET, was proposed in [13]. In it, unfeasible<br />

solutions are allowed but are also penalised. This algorithm outperformed the<br />

existing ones and is still one of the best performing algorithms for CARP. Also a<br />

combination of Tabu Search and Scatter Search to construct Tabu Scatter Search<br />

was proposed by Greistorfer [10]. Lacomme presented both a Genetic Algorithm<br />

[17] and a Memetic Algorithm [18]. In both algorithms crossover is performed<br />

on a giant tour, and fitness of a chromosome is based on the partitioning of<br />

the tour into vehicle tours. Currently these algorithms are among the very best<br />

performing solutions for the CARP.<br />

Another even younger generation of metaheuristics is that of the Ant Colony<br />

Systems. Lacomme [18] proposes an algorithm where two types of ants are used,<br />

one that makes the solution converge towards a minimum cost solution and<br />

another which ensures diversification to avoid getting trapped in a local minimum.<br />

A Guided Local Search algorithm is proposed in [3], suggesting that the<br />

distance of each edge is penalised according to some function which is adjusted<br />

throughout the algorithm. Computational experiments show that this approach<br />

is promising.<br />

4 Proposed algorithm<br />

In this section the proposed algorithm is detailed. To do so, the Path Scanning algorithm is first reviewed, and then our Randomised Path Scanning approach is presented.<br />



4.1 Path Scanning algorithm<br />

The Path Scanning algorithm [9] is a simple and efficient algorithm which aims<br />

to get competitive results for CARP in low computational times. Its main idea is<br />

to construct five complete solutions, every one of them following an optimisation<br />

criterion. The final solution of the algorithm is the best —in terms of cost— of<br />

the five obtained.<br />

The way every route is constructed is not clearly defined in the original Golden paper, so it allows different interpretations when implementing it. The approach followed in this paper is to extend the current route by selecting only adjacent arcs with unserved demand, choosing the one which best satisfies the given criterion. The five criteria assume that the vehicle is at node i and would extend the route through the selected arc e to node j:<br />

(1) Minimise the cost per unit demand (min{cij/dij})<br />

(2) Maximise the cost per unit demand (max{cij/dij})<br />

(3) Minimise the distance from node j back to depot.<br />

(4) Maximise the distance from node j back to depot.<br />

(5) If vehicle is less than half-full, minimise distance from node j back to depot,<br />

otherwise maximise this distance.<br />

When there is no adjacent arc with unserved demand, the closest arc with unserved demand (in terms of shortest-path distance) is selected. If there exists more than one arc at the same minimum distance, the arc which best satisfies the current optimisation criterion is selected.<br />

Finally, once the vehicle capacity is exhausted, the current route is closed by returning the vehicle to the depot through the shortest path. The original algorithm does not state how the shortest path is computed; in our approach an implementation of Dijkstra's algorithm is used.<br />
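A sketch of the arc-selection step under the five criteria above (illustrative data layout; the paper does not prescribe one) might look as follows, with shortest-path distances back to the depot assumed precomputed with Dijkstra's algorithm:<br />

```python
def best_arc(candidates, criterion, dist_to_depot, load, W):
    """Select the next arc under one of the five Path Scanning criteria.
    Each candidate is a dict with cost "c", demand "d" and end node "j";
    dist_to_depot[j] is the shortest-path distance from node j back to the
    depot (assumed precomputed with Dijkstra). Names are illustrative."""
    if criterion == 1:  # (1) minimise the cost per unit demand
        return min(candidates, key=lambda a: a["c"] / a["d"])
    if criterion == 2:  # (2) maximise the cost per unit demand
        return max(candidates, key=lambda a: a["c"] / a["d"])
    if criterion == 3:  # (3) minimise the distance from j back to the depot
        return min(candidates, key=lambda a: dist_to_depot[a["j"]])
    if criterion == 4:  # (4) maximise the distance from j back to the depot
        return max(candidates, key=lambda a: dist_to_depot[a["j"]])
    # (5) minimise the distance if the vehicle is less than half-full,
    # otherwise maximise it.
    pick = min if load < W / 2 else max
    return pick(candidates, key=lambda a: dist_to_depot[a["j"]])
```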

4.2 Randomised Path Scanning<br />

Recent advances in the development of high-quality pseudo-random number generators have opened new perspectives on the use of Monte Carlo Simulation (MCS) in combinatorial problems. As stated previously, the idea behind our algorithm is based on that of Juan et al. [15] for the CVRP. In that paper, a classical heuristic for the CVRP, the Clarke and Wright Savings (CWS) heuristic, was chosen and combined with the MCS methodology. Thus, some random behaviour was introduced into the CWS heuristic in order to start an efficient search process inside the space of feasible solutions, which makes it possible to improve on the original CWS results.<br />

Notice that this general approach has similarities with the Greedy Randomised Adaptive Search Procedure (GRASP) [6], but our approach does not contain an expensive local search phase and includes a more detailed randomised construction step.<br />

In the case studied here, the Path Scanning heuristic for the CARP has been chosen and combined with MCS to add randomness, allowing it to reach better results.<br />



With that, the Randomised Path Scanning (RPS) is obtained. In the proposed<br />

algorithm, two random processes are introduced into the original algorithm:<br />

(1) When constructing every solution, the optimisation criterion to select the<br />

next arc is not known beforehand. A criterion is randomly selected, with<br />

uniform probability distribution ([23] states that it gets better results than<br />

other probability distributions).<br />

(2) When selecting the next arc, the arc which best satisfies the selected criterion is not chosen by default. Instead, all the candidate arcs are sorted according to the selected criterion and a weight is assigned to every one of them following a geometric distribution. Thus, the next arc is selected randomly with a geometric probability distribution.<br />
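The geometric weighting in step (2) can be sketched as follows (the parameter alpha and the function names are illustrative; the paper does not specify the exact parametrisation):<br />

```python
import random

def biased_choice(sorted_candidates, alpha=0.3, rng=random):
    """Pick one candidate from a list sorted from best to worst under the
    current criterion. Position k receives geometric weight alpha*(1-alpha)^k,
    so better-ranked arcs are chosen more often while any candidate may still
    be selected. The value of alpha is illustrative, not from the paper."""
    weights = [alpha * (1 - alpha) ** k for k in range(len(sorted_candidates))]
    return rng.choices(sorted_candidates, weights=weights, k=1)[0]
```

This kind of rank-based biased sampling is what keeps the construction greedy on average while generating many distinct feasible solutions across iterations.<br />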

With this randomisation, many valid solutions can be generated. An efficient search process inside the space of feasible solutions is started, where each of these feasible solutions consists of a set of round-trip routes from the depot that, altogether, satisfy the demand of the arcs.<br />

Pseudo-code The algorithm is implemented as described next. First, the problem instance is loaded from the data files. Next, the arcs are extracted from the problem instance and stored in a static structure. After that, all the shortest paths between all pairs of nodes are computed using an implementation of Dijkstra's algorithm. A loop constructing complete solutions is started and, finally, the best solution among all the generated ones is selected.<br />

procedure RandPS;<br />
begin<br />
&nbsp;&nbsp;arp = getInstanceInputs();<br />
&nbsp;&nbsp;arcs = getArcs(arp);<br />
&nbsp;&nbsp;paths = constructShortestPathsMatrix(arcs);<br />
&nbsp;&nbsp;while stopping criterion not satisfied do<br />
&nbsp;&nbsp;&nbsp;&nbsp;sol = buildRandomizedSolution(arcs, paths);<br />
&nbsp;&nbsp;&nbsp;&nbsp;if sol.cost < bestSolution.cost then<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;bestSolution = sol;<br />
&nbsp;&nbsp;&nbsp;&nbsp;end if<br />
&nbsp;&nbsp;end while<br />
&nbsp;&nbsp;return bestSolution;<br />
end<br />

5 Results<br />

The RPS algorithm has been implemented as a Java application, using state-of-the-art pseudo-random number generators. In particular, some classes from the SSJ library [19] were used, among them the<br />



LFSR113 generator, with a period of approximately 2^113. To test the new algorithm, instances from [2] have been used, which are based on those of [9] and allow comparing the results with the original Path Scanning algorithm.<br />

Table 1 shows the results obtained on the gdb instances. The Path Scanning solutions were obtained from Golden's original article [9]. RPS results and times were obtained with the Java implementation described in the previous section, generating 10000 solutions in the loop and selecting the best one.<br />

Problem Nodes Arcs LB BKS PS PS Time RandPS RandPS Time Gap (%)<br />
gdb1 12 22 316 316 316 0.005 316 0.61 0.00<br />
gdb2 12 26 339 339 367 0.006 339 0.71 7.63<br />
gdb3 12 22 275 275 289 0.003 275 0.63 4.84<br />
gdb4 11 19 287 287 320 0.002 287 0.51 10.31<br />
gdb5 13 26 377 377 417 0.002 383 0.76 8.15<br />
gdb6 12 22 298 298 316 0.001 298 0.56 5.70<br />
gdb7 12 22 325 325 357 0.003 325 0.57 8.96<br />
gdb8 27 46 344 348 416 0.015 358 2.45 13.94<br />
gdb9 27 51 303 303 355 0.017 324 2.74 8.73<br />
gdb10 12 25 275 275 302 0.003 275 0.62 8.94<br />
gdb11 22 45 395 395 424 0.003 395 1.90 6.84<br />
gdb12 13 23 458 458 560 0.001 490 0.58 12.50<br />
gdb13 10 28 536 536 592 0.002 536 0.68 9.46<br />
gdb14 7 21 100 100 102 0.001 100 0.46 1.96<br />
gdb15 7 21 58 58 58 0.001 58 0.42 0.00<br />
gdb16 8 28 127 127 131 0.002 127 0.68 3.05<br />
gdb17 8 28 91 91 93 0.002 91 0.66 2.15<br />
gdb18 9 36 164 164 168 0.003 164 0.92 2.38<br />
gdb19 11 11 55 55 57 0.001 55 0.23 3.51<br />
gdb20 11 22 121 121 125 0.002 121 0.53 3.20<br />
gdb21 11 33 156 156 168 0.002 156 0.89 7.14<br />
gdb22 11 44 200 200 207 0.003 200 1.33 3.38<br />
gdb23 11 55 233 233 241 0.005 235 1.80 2.49<br />

Table 1. Results obtained with the gdb instances. LB = Lower Bound; BKS = Best Known Solution, obtained from [18]. Times are expressed in seconds. Bold indicates that RandPS achieves the BKS, and underline that it outperforms the PS solution.<br />
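The Gap column in Table 1 is consistent with the relative cost difference between the PS and RandPS solutions; a sketch of this (our inference from the table values, not a definition given in the paper):<br />

```python
def gap_percent(ps_cost, rps_cost):
    """Relative cost difference between PS and RandPS, in percent.
    For example, the gdb2 row gives (367 - 339) / 367 * 100, i.e. about 7.63."""
    return (ps_cost - rps_cost) / ps_cost * 100
```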

6 Conclusions and future work<br />

The results obtained show that the classical Path Scanning is outperformed by the new RPS. Furthermore, competitive results are obtained on small and medium-sized instances, so the new algorithm accomplishes the main objective of this research, which was to verify whether the ideas from [15] for the CVRP are valid for the CARP.<br />



Future work in this research will be to add splitting and cache techniques to the algorithm, in order to improve its results and optimise its performance. Due to the independence of all the generated solutions, the algorithm could easily be parallelised in order to improve its performance when attempting to solve larger instances of the CARP.<br />

An additional future objective of this research is to apply the algorithm to different variants of the ARP, especially the Arc Routing Problem with Stochastic Demand (ARPSD): given the randomisation in the proposed algorithm, we think that RPS will be well suited to this problem, with its random behaviour in the arcs' demand.<br />

Acknowledgements<br />

This work has been partially supported by the Spanish Ministry of Science and<br />

Innovation (TRA2010-21644-C03), and has been developed in the context of the<br />

CYTED-IN3-HAROSA network (http://dpcs.uoc.edu).<br />

References<br />

[1] A.A. Assad, B.L. Golden and W.L. Pearn. Transforming arc routing into node<br />

routing problems. Computers and Operations Research, 14(4):285-288, 1987.<br />

[2] J.M. Belenguer. http://www.uv.es/belengue/carp.html<br />

[3] P. Beullens, D. Cattrysse, L. Muyldermans and D. Van Oudheusden. A guided<br />

local search heuristic for the capacitated arc routing problem. European Journal<br />

of Operational Research, 147:629-643, 2003.<br />

[4] N. Christofides. The optimum traversal of a graph. OMEGA, The International<br />

Journal of Management Science, 1(6):719–732, 1973.<br />

[5] A. Corberán and C. Prins. Recent results on arc routing problems: an annotated<br />

bibliography. Networks, 56(1):50-69, 2010.<br />

[6] T.A. Feo and M.G. Resende. Greedy randomized adaptive search procedures. Journal<br />

of Global Optimization, 6:109-133, 1995.<br />

[7] G.N. Frederickson, M.S. Hecht and C.E. Kim. Approximation algorithms for some<br />

routing problems. SIAM Journal of Computing, 7(2):178-193, 1978.<br />

[8] B.L. Golden and R.T. Wong. Capacitated arc routing problems. Networks, 11:305-315, 1981.<br />

[9] B.L. Golden, J.S. DeArmon and E.K. Baker. Computational experiments with<br />

algorithms for a class of routing problems. Computers and Operations Research<br />

10:47-59, 1983.<br />

[10] P. Greistorfer. A Tabu Scatter Search Metaheuristic for the Arc Routing Problem.<br />

Computers & Industrial Engineering, 44:249-266, 2003.<br />

[11] R. Hirabayashi, N. Nishida and Y. Saruwatari. Node duplication lower bounds for the capacitated arc routing problems. Journal of the Operations Research Society of Japan, 35(2):119-133, 1992.<br />

[12] H. Sachs, M. Stiebitz and R.J. Wilson. An historical note: Euler's Königsberg letters. Journal of Graph Theory, 12(1):133-139, 1988.<br />

[13] A. Hertz, G. Laporte and M. Mittaz. A tabu search heuristic for the capacitated<br />

arc routing problem. Operations Research, 48(1):129-135, 2000.<br />



[14] A. Hertz. Recent Trends in Arc Routing. In Graph Theory, Combinatorics and Algorithms: Operations Research/Computer Science Interfaces Series, M.C. Golumbic and I.B.A. Hartman (eds.), 2005.<br />

[15] A.A. Juan, J. Faulin, R. Ruiz, B. Barrios and S. Caballé. The SR-GCWS hybrid<br />

algorithm for solving the capacitated vehicle routing problem. Applied Soft<br />

Computing, 10:215-224, 2010.<br />

[16] A.H.G. Rinnooy Kan and J.K. Lenstra. On general routing problems. Networks,<br />

6:273-280, 1976.<br />

[17] P. Lacomme, C. Prins and W. Ramdana-Chérif. Competitive genetic algorithms<br />

for the capacitated arc routing problem and its extensions. Lecture Notes in Computer<br />

Science, 2037:473-483, 2001.<br />

[18] P. Lacomme, C. Prins and W. Ramdane-Chérif. Competitive memetic algorithms<br />

for arc routing problems. Annals of Operations Research, 131:159-185, 2004.<br />

[19] P. L’Ecuyer. SSJ: A framework for stochastic simulation in Java. <strong>Proceedings</strong> of<br />

the Winter Simulation Conference, 2002, 234-242.<br />

[20] A. Løkketangen and J. Oppen. Arc routing in a node routing environment. Computers<br />

and Operations Research, 33(4):1033-1055, 2006.<br />

[21] K. Mei-Ko. Graphic programming using odd or even points. Chinese Mathematics,<br />

1:237-277, 1962.<br />

[22] C.S. Orloff. A fundamental problem in vehicle routing. Networks, 4:35-64, 1974.<br />

[23] W.L. Pearn. Approximate solutions for the capacitated arc routing problem. Computers<br />

& Operations Research, 16(6):589-600, 1989.<br />

[24] S. Wøhlk. Contributions to arc routing. PhD thesis, University of Southern Denmark,<br />

2005.<br />

[25] S. Wøhlk. A decade of the capacitated arc routing problem. The Vehicle Routing<br />

Problem: Latest Advances and New Challenges. Springer 2010.<br />

54<br />

9


Algorithms for Interval Data Minmax Regret Paths

Carolinne Torres 1, César Astudillo 2, Matthew Bardeen 3, and Alfredo Candia-Véjar 4

1 Facultad de Ingeniería, Universidad de Talca, Chile
carolinne@alumnos.utalca.cl
2 castudillo@utalca.cl
3 mbardeen@utalca.cl
4 acandia@utalca.cl

Abstract. This paper addresses the exact and heuristic resolution of the interval data minmax regret shortest path problem. It is assumed that, in the input graph, only lower and upper bounds are known for the edge lengths, defining a combinatorial optimization problem under uncertainty. The goal is to find an s-t path which minimizes the maximum regret. The literature includes algorithms that solve the classic version efficiently; however, the variant studied in this manuscript is known to be NP-Hard. We propose a Simulated Annealing (SA) algorithm to tackle this problem, and we compare its performance with three other schemes, namely, an exact algorithm that utilizes a Mixed Integer Programming (MIP) formulation and two state-of-the-art heuristics. Our experimental results consider numerous instances possessing different topologies and sizes.

We study a variant of the well-known Shortest Path (SP) problem, for which efficient algorithms have been designed since 1959 [4]. Given a digraph G = (V, A) (V is the set of nodes and A is the set of arcs) with a non-negative cost associated to each arc and two nodes s and t in V, SP consists of finding an s-t path of minimum total cost. Dijkstra designed a polynomial time algorithm and, from this, a number of other approaches have been proposed. Ahuja et al. present the different algorithmic alternatives to solve the problem [1].

Our interest is focused on shortest path problems where uncertainty exists only in the objective function parameters. In the SP problem, for each arc we have a closed interval defining the possibilities for the arc length. A scenario is a vector where each component represents an element of an arc length interval. The uncertainty model used here is the minmax regret approach (MMR), sometimes called robust deviation; in this model the problem is to find a feasible solution that is ε-optimal for any possible scenario, with ε as small as possible.
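For a fixed s-t path, a standard observation in this line of work is that the regret-maximizing scenario sets the arcs on the path to their upper bounds and all remaining arcs to their lower bounds, so the maximum regret can be evaluated with a single shortest-path computation. A minimal sketch of this evaluation (the interval encoding and function names are our own illustrative choices, not the paper's notation):

```python
import heapq

def dijkstra(adj, s, t):
    """Shortest s-t distance in a digraph given as {u: [(v, cost), ...]}."""
    dist = {s: 0.0}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, c in adj.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float("inf")

def max_regret(path_arcs, intervals, s, t):
    """Maximum regret of an s-t path, given arc intervals {(u, v): (lo, hi)}.

    Worst-case scenario: arcs on the path take their upper bound,
    all other arcs take their lower bound.
    """
    scenario = {a: (hi if a in path_arcs else lo)
                for a, (lo, hi) in intervals.items()}
    adj = {}
    for (u, v), c in scenario.items():
        adj.setdefault(u, []).append((v, c))
    path_len = sum(scenario[a] for a in path_arcs)
    # regret = path length in the worst case minus the optimum in that scenario
    return path_len - dijkstra(adj, s, t)
```

For example, with intervals {('s','a'): (1, 4), ('a','t'): (1, 4), ('s','t'): (3, 3)}, the direct arc s-t has maximum regret 1, while the two-arc path has maximum regret 5.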

Proceedings of the 18th RCRA workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion (RCRA 2011). In conjunction with IJCAI 2011, Barcelona, Spain, July 17-18, 2011.


One of the properties of the minmax regret model is that it is not as pessimistic as the (absolute) minmax model. This model has been widely studied in combinatorial optimization in recent years; see the review of Aissi et al. [2] and the book by Kasperski [6].

It is known that minmax regret combinatorial optimization problems with interval data (MMRCO) are usually NP-hard even when the classic problem is easy to solve; this is the case for the minimum spanning tree problem, the shortest path problem, assignment problems and others, see [2] for details. In a few cases, an NP-Hard MMRCO problem can be solved by a polynomial time algorithm for a restricted class of problem instances. This is the case for the MMR spanning arborescences problem in digraphs when the input graph is acyclic, for which a polynomial time algorithm exists, see [3]. Several efforts have been made to obtain exact solutions using the broad repertoire of exact methods, principally by formulating an MMR problem as a MIP and then using a commercial code, or by applying branch and bound, branch and cut or Benders decomposition approaches in a dedicated scheme. Several studies have shown that Benders decomposition outperforms both branch and bound and branch and cut for obtaining exact solutions when applied to the following problems: MMR Spanning Trees [8], MMR Paths [11], MMR Assignment [13] and MMR Traveling Salesman [9].

Exact algorithms for Minmax Regret Paths have been proposed in [5, 6, 10, 11]. All these papers show that exact solutions for MMR SP can be obtained by different methods, taking into account several types of graphs and degrees of uncertainty. However, the size of the graphs tested is limited to 2000 nodes.

We present the results of two basic and fast heuristics defined by fixing specific scenarios, namely the midpoint scenario and the upper limit scenario. Several applications of the shortest path problem consider networks with many thousands of nodes or more, so the task of designing heuristics for large MMRP instances is important; see [14] for a recent application. In this context, our main contributions in this paper are the analysis of the performance of the CPLEX solver on a MIP formulation of MMRP, the analysis of the performance of known heuristics for the problem and, finally, the analysis of the performance of a proposed simulated annealing approach. For the experiments we consider two classes of networks, random networks and a class of networks used in telecommunications, both with different problem sizes. Instances containing from 100 to 20000 nodes with different degrees of uncertainty were considered.

In Section 2 we present some notation and the problem definition; in Section 3 a mathematical programming formulation for MMRP is presented, together with the proposed simulated annealing approach and two known heuristics for MMRP. Definitions of the tested problem instances and analysis of the experiments conducted are discussed in Section 4. Finally, in Section 5, conclusions and future work are presented.


Algorithm 1 Heuristic HM
Input: Network G and interval cost function c
Output: A feasible solution Y for MMR-SP
1. For all e ∈ A do
2.   c_e^{sM} = (c_e^+ + c_e^-)/2
3. end For
4. Y ← OPT(s^M)
5. Return Y, Z(Y)

Algorithm 2 Heuristic HU
1. For all e ∈ A do
2.   c_e^{sU} = c_e^+
3. end For
4. Y ← OPT(s^U)
5. Return Y, Z(Y)
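Both heuristics reduce MMR-SP to a single deterministic shortest-path computation on a fixed scenario: HM fixes every arc at the midpoint of its interval, HU at its upper limit. A minimal sketch, assuming the OPT(·) step is realized with Dijkstra's algorithm (function names and the interval encoding are illustrative, not the paper's notation):

```python
import heapq

def shortest_path(adj, s, t):
    """Dijkstra's algorithm returning the arc list of a shortest s-t path."""
    dist, prev = {s: 0.0}, {}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, c in adj.get(u, []):
            if d + c < dist.get(v, float("inf")):
                dist[v], prev[v] = d + c, u
                heapq.heappush(pq, (d + c, v))
    path, v = [], t          # reconstruct the arc sequence from t back to s
    while v != s:
        u = prev[v]
        path.append((u, v))
        v = u
    return list(reversed(path))

def heuristic(intervals, s, t, scenario="U"):
    """HM fixes the midpoint scenario ("M"), HU the upper-limit scenario ("U")."""
    adj = {}
    for (u, v), (lo, hi) in intervals.items():
        c = (lo + hi) / 2 if scenario == "M" else hi
        adj.setdefault(u, []).append((v, c))
    return heuristic_path(adj, s, t) if False else shortest_path(adj, s, t)
```

Note that HM and HU can disagree: with intervals {('s','a'): (0, 1), ('a','t'): (0, 1), ('s','t'): (1.5, 1.5)}, HU picks the direct arc while HM prefers the two-arc path.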

2.3 Simulated Annealing for MMRP

Simulated Annealing (SA) is a well-known probabilistic metaheuristic proposed by Kirkpatrick et al. for solving hard combinatorial optimization problems in an approximate way [7].

Usually, SA seeks to avoid being trapped in local optima, as would normally occur in algorithms using local search methods. A key characteristic of SA is the possible acceptance of solutions worse than the current one during the exploration of the local neighborhood. In accordance with the physical analogy of SA with metallurgy, several parameters must be tuned in order to find good solutions. Typical parameters are associated with concepts like the neighborhood, the cooling schedule, the size of the internal loop and the termination criterion. These parameters are usually adjusted through experimentation and testing.
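A minimal, problem-independent SA skeleton with geometric cooling and an internal loop of fixed length, matching the general structure just described, can be sketched as follows (the parameter defaults are illustrative placeholders, not the authors' tuned values):

```python
import math
import random

def simulated_annealing(init, neighbor, cost,
                        T0=100.0, beta=0.95, L=50, T_min=1e-3):
    """Generic SA: geometric cooling T <- beta*T, internal loop of length L."""
    current = best = init
    T = T0
    while T > T_min:
        for _ in range(L):                    # internal loop at fixed temperature
            cand = neighbor(current)
            delta = cost(cand) - cost(current)
            # accept improvements always; accept worse moves with prob. exp(-delta/T)
            if delta <= 0 or random.random() < math.exp(-delta / T):
                current = cand
            if cost(current) < cost(best):    # keep the best solution seen so far
                best = current
        T *= beta                             # geometric descent of the temperature
    return best
```

For MMRP, `init` would be the HU solution, `neighbor` the subgraph-modification move described below, and `cost` the maximum regret of the candidate path.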

We now describe the main definitions of the concepts and parameters generally used in SA algorithms.

Search Space A subgraph S of the original graph G is defined such that this subgraph contains an s-t path. In S a classical s-t shortest path subproblem is solved, where the arc costs are chosen as the upper limit arc costs. Then, the optimum solution of this subproblem is evaluated for acceptance.

Initial Solution The initial solution is obtained by applying the heuristic HU to the original network S_1.

Cooling Schedule A geometric descent of the temperature was used, governed by the parameter β.

Internal Loop This loop is defined by a parameter L and depends on the size of the instances tested.

Neighborhood Search Moves Let S_i be the subgraph of G considered at iteration i and let x_i be the solution given by the search space at iteration i. Then we generate a new subgraph S_{i+1} of G from S_i by changing the status of some components of the vector characterizing S_i. The number of components



For Karasan graphs, Table 1 shows the structure of the instances, the value (exact or approximated) given by CPLEX, the values given by the heuristics and their gaps from the optimum value, and the value and gap given by the simulated annealing approach.

We note that CPLEX was able to solve optimally instances with up to 1000 nodes. For instances with 5000, 10000 and 20000 nodes, memory problems only permitted feasible solutions to be found. The gap value, given as a relative percentage, is also presented. It can be seen that the gap increases with the complexity of the instances. With respect to the heuristics HM and HU, it is clear that HU performs better than HM, and the gap of the HU values is always small. SA consistently improves the values found by the heuristics.

For Random graphs, Table 3 shows the structure of each instance, the optimum value (exact or approximated) given by CPLEX, the values given by the heuristics and their gaps from the optimum value, and the value given by the Simulated Annealing approach along with its gap. In this case CPLEX was able to optimally solve all of the instances. Again HU performs better than HM and almost always finds an optimal solution. In two particular cases HM and HU only find good feasible solutions; in these cases, SA was not able to improve on the solution provided by HU.

Table 2 shows the times taken by CPLEX, HM, HU and SA for Karasan graphs. CPLEX had memory problems when solving problems with 10000 and 20000 nodes. The execution time for CPLEX on these instances is also very high (in some cases, over 10 hours). Both heuristic methods take very little time to execute, and Simulated Annealing takes slightly over 2000 seconds in the worst case.

Table 4 shows the times taken by CPLEX, HM, HU and SA for random graphs. We note that the execution times for the optimal solutions given by CPLEX increase as a function of the number of nodes, but the highest time is under two minutes. Execution times for HM and HU are negligible and, for SA, the highest time is about two minutes.

4 Conclusions and Future Work

A simulated annealing algorithm was proposed for solving the interval data minmax regret path problem. Its performance was compared with that of two known simple but effective heuristics for MMRCO problems; the optimal solution (in most cases) was provided by CPLEX from a linear integer programming formulation known for MMRP. Two classes of instances were considered for experimentation: random graphs and Karasan graphs. For random graphs, the optimal solutions were always obtained by CPLEX in reasonable times. The heuristic using the upper bound scenario outperforms the one using the midpoint scenario and almost always finds the optimal solutions. In these cases, Simulated Annealing was obviously not able to improve on the initial solution given by the heuristics.



Table 1. The results of the analyses for Karasan graphs. For graph instances where HU did not find the optimum value, SA was always able to improve the results. In cases with many nodes, optimum values were sometimes not found by CPLEX (G > 0). In these cases, b is used to indicate an unknown gap for the heuristic and SA methods.

Instance            |A|     CPLEX/G      HM/G       HU/G       SA/G
K-100-200-0.9-7     651     79/0         89/12.7    80/1.3     80/1.3
K-100-200-0.9-15    1275    29/0         33/13.8    30/3.4     30/3.4
K-100-200-0.9-50    2500    34/0         34/0       34/0       34/0
K-500-200-0.9-5     2475    1715/0       1840/7.3   1767/3.0   1742.8/1.6
K-500-200-0.9-12    5856    220/0        243/10.5   227/3.2    223/1.4
K-500-200-0.9-17    8211    95/0         95/0       95/0       95/0
K-1000-200-0.9-7    6951    1438/0       1534/6.7   1445/0.5   1442/0.3
K-1000-200-0.9-21   20559   182/0        205/12.6   186/2.2    183.6/0.9
K-1000-200-0.9-60   56400   46/0         48/4.3     47/2.2     47/2.2
K-5000-200-0.9-4    19984   18860/4.56   19684/b    19635/b    19186/b
K-5000-200-0.9-10   49990   3289/0       3406/3.6   3299/0.3   3297/0.3
K-5000-200-0.9-18   89676   1201/0       1268/5.6   1219/1.5   1210.4/0.8
K-10000-200-0.9-4   39984   36537/6.97   38346/b    37685/b    36954/b
K-10000-200-0.9-8   79936   10780/4.02   11361/b    10940/b    10863/b
K-10000-200-0.9-15  149775  3020/1.90    3220/b     3039/b     3034.2/b
K-20000-200-0.9-3   59991   118101/8.62  123392/b   124170/b   122227/b
K-20000-200-0.9-5   99975   50711/6.23   53169/b    51735/b    51319.3/b
K-20000-200-0.9-8   159936  20968/4.25   22107/b    21231/b    21142.7/b

Table 2. Execution times (in seconds) for Karasan graphs.

Instance            |A|     CPLEX     HM    HU    SA
K-100-200-0.9-7     651     0.05      0.00  0.00  0.10
K-100-200-0.9-15    1275    0.05      0.00  0.00  0.12
K-100-200-0.9-50    2500    0.05      0.01  0.01  0.14
K-500-200-0.9-5     2475    9.41      0.01  0.01  1.89
K-500-200-0.9-12    5856    0.85      0.02  0.02  2.56
K-500-200-0.9-17    8211    0.86      0.03  0.03  3.01
K-1000-200-0.9-7    6951    3736.00   0.03  0.03  9.79
K-1000-200-0.9-21   20559   5.21      0.06  0.06  14.80
K-1000-200-0.9-60   56400   4.74      0.16  0.15  28.01
K-5000-200-0.9-4    19984   5319.99   0.09  0.09  216.29
K-5000-200-0.9-10   49990   48183.76  0.19  0.18  145.76
K-5000-200-0.9-18   89676   6567.90   0.31  0.30  142.41
K-10000-200-0.9-4   39984   6304.30   0.19  0.19  2028.52
K-10000-200-0.9-8   79936   12288.09  0.32  0.31  765.58
K-10000-200-0.9-15  149775  43200.00  0.52  0.51  534.51
K-20000-200-0.9-3   59991   11523.00  0.25  0.18  324.20
K-20000-200-0.9-5   99975   10608.20  0.17  0.17  326.60
K-20000-200-0.9-8   159936  12328.30  0.63  0.54  328.80



Table 3. The results of the analyses for Random graphs. In the two instances where HU did not find the optimum value, SA was not able to improve the results further.

Instance                 |A|     CPLEX/G  HM/G       HU/G      SA/G
R-100-200-0.9-0.060      593     164/0    176/7.32   164/0     164/0
R-100-200-0.9-0.110      1088    189/0    190/0.52   190/0.52  190/0.52
R-100-200-0.9-0.250      2475    151/0    151/0      151/0     151/0
R-500-200-0.9-0.010      2494    476/0    477/0.21   477/0.21  477/0.21
R-500-200-0.9-0.020      4989    320/0    322/0.63   320/0     320/0
R-500-200-0.9-0.035      8732    252/0    295/17.06  252/0     252/0
R-1000-200-0.9-0.005     4994    328/0    384/17.07  328/0     328/0
R-1000-200-0.9-0.010     9989    215/0    215/0      215/0     215/0
R-1000-200-0.9-0.060     59939   113/0    139/23.43  113/0     113/0
R-5000-200-0.9-0.001     24995   317/0    317/0      317/0     317/0
R-5000-200-0.9-0.002     49990   320/0    395/23.43  320/0     320/0
R-5000-200-0.9-0.004     99980   262/0    270/3.05   262/0     262/0
R-10000-200-0.9-0.0004   39995   483/0    488/1.04   483/0     483/0
R-10000-200-0.9-0.0008   79991   303/0    303/0      303/0     303/0
R-10000-200-0.9-0.0015   149985  341/0    362/6.16   341/0     341/0
R-20000-200-0.9-0.00013  51997   745/0    745/0      745/0     745/0
R-20000-200-0.9-0.00023  91995   498/0    498/0      498/0     498/0
R-20000-200-0.9-0.00043  171991  289/0    289/0      289/0     289/0

Table 4. Execution times (in seconds) for Random graphs.

Instance                 |A|     CPLEX  HM    HU    SA
R-100-200-0.9-0.060      593     0.04   0.00  0.00  0.03
R-100-200-0.9-0.110      1088    0.08   0.00  0.00  0.04
R-100-200-0.9-0.250      2475    0.16   0.00  0.00  0.07
R-500-200-0.9-0.010      2494    0.29   0.00  0.00  0.65
R-500-200-0.9-0.020      4989    0.61   0.01  0.01  1.06
R-500-200-0.9-0.035      8732    0.71   0.01  0.01  1.53
R-1000-200-0.9-0.005     4994    0.61   0.01  0.01  2.59
R-1000-200-0.9-0.010     9989    0.93   0.01  0.01  3.86
R-1000-200-0.9-0.060     59939   6.03   0.07  0.07  18.34
R-5000-200-0.9-0.001     24995   7.92   0.05  0.05  14.61
R-5000-200-0.9-0.002     49990   16.31  0.08  0.08  2974.00
R-5000-200-0.9-0.004     99980   82.35  0.36  0.37  49.87
R-10000-200-0.9-0.0004   39995   22.12  0.10  0.10  44.21
R-10000-200-0.9-0.0008   79991   21.97  0.14  0.14  5.27
R-10000-200-0.9-0.0015   149985  58.42  0.26  0.27  118.21
R-20000-200-0.9-0.00013  51997   30.58  0.13  0.13  47.40
R-20000-200-0.9-0.00023  91995   68.40  0.23  0.24  77.01
R-20000-200-0.9-0.00043  171991  83.78  0.34  0.33  119.26



For Karasan graphs, the optimal solution given by CPLEX was obtained only for instances with less than 5000 nodes. All of the instances with 10000 or more nodes had estimated gaps between 1.9% and 8.62%. CPLEX was not able to find the optimum in these cases because of memory overflow errors. The heuristic HU almost always outperformed HM and found solutions with small gaps in the cases where the optimum solution was known. Simulated Annealing consistently improved on (or tied) the solutions found by the HU heuristic. Although the solutions found by SA were slightly worse than those found by CPLEX, the time spent finding these solutions was significantly smaller.

It seems clear that for large MMRP networks (over 10000 nodes), it is convenient to use the heuristic HU to find reasonable solutions and, if time permits, to use the Simulated Annealing approach described here to improve those solutions. CPLEX could still offer aid in analyzing the heuristic results by providing bounds for the optimal solution.

For future work with SA for MMRP, it would be important to consider more instances during testing and also to consider more variety in some parameters when defining the test instances, e.g., the parameter c. The neighborhood scheme used here could also be considered for the application of SA to other MMRCO problems, like the minmax regret spanning tree problem.

References

1. R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network flows: theory, algorithms, and applications. Prentice Hall, Upper Saddle River, NJ, 1993.
2. H. Aissi, C. Bazgan, and D. Vanderpooten. Minmax and minmax regret versions of combinatorial optimization problems: A survey. European Journal of Operational Research, 197(2):427–438, Sept. 2009.
3. E. Conde and A. Candia. Minimax regret spanning arborescences under uncertain costs. European Journal of Operational Research, 182(2):561–577, Oct. 2007.
4. E. W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1(1):269–271, Dec. 1959.
5. O. Karasan, M. Pinar, and H. Yaman. The robust shortest path problem with interval data, 2001.
6. A. Kasperski. Discrete Optimization with Interval Data, volume 228 of Studies in Fuzziness and Soft Computing. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.
7. S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
8. R. Montemanni. A Benders decomposition approach for the robust spanning tree problem with interval data. European Journal of Operational Research, 174(3):1479–1490, Nov. 2006.
9. R. Montemanni, J. Barta, M. Mastrolilli, and L. M. Gambardella. The Robust Traveling Salesman Problem with Interval Data. Transportation Science, 41(3):366–381, Aug. 2007.
10. R. Montemanni and L. Gambardella. An exact algorithm for the robust shortest path problem with interval data. Computers & Operations Research, 31(10):1667–1680, Sept. 2004.



11. R. Montemanni and L. M. Gambardella. The robust shortest path problem with interval data via Benders decomposition. 4OR, 3(4):315–328, Dec. 2005.
12. Y. Nikulin. Simulated annealing algorithm for the robust spanning tree problem. Journal of Heuristics, 14(4):391–402, Oct. 2007.
13. J. Pereira and I. Averbakh. Exact and heuristic algorithms for the interval data robust assignment problem. Computers & Operations Research, 38(8):1153–1163, Aug. 2011.
14. P. Sanders and D. Schultes. Engineering fast route planning algorithms. In C. Demetrescu, editor, Proceedings of the 6th international conference on Experimental algorithms, volume 4525 of Lecture Notes in Computer Science, pages 23–36, Berlin, Heidelberg, 2007. Springer-Verlag.



Community of Scientist Optimization: Foraging and Competing for Research Resources

Alfredo Milani 1,2, Valentino Santucci 1

1 Department of Mathematics and Computer Science, University of Perugia, Italy
2 Department of Computer Science, Hong Kong Baptist University, Hong Kong
milani@unipg.it, valentino.santucci@dmi.unipg.it

Abstract. A novel optimization paradigm, called Community of Scientist Optimization (CoSO), is presented in this paper. The approach is based on the metaphor of the behaviour of a community of scientists pursuing research results and foraging for the funds needed to organize and develop research activities. The expressivity of the metaphor makes it possible to devise a wide variety of strategies which can be applied to general optimization domains. Experiments on benchmark problems in numerical optimization show the effectiveness of the approach. From a theoretical standpoint the CoSO framework subsumes other hybrid evolutionary optimization approaches, and it also has great potential for application in non-numerical and agent-based domains.

1 Introduction

Computational solutions to hard optimization problems greatly benefit from the use of evolutionary techniques [1, 10, 11]. Indeed, they are very useful for tackling hard optimization problems such as the ones arising in continuous numerical [21] and combinatorial optimization domains [6, 7]. Evolutionary techniques make use of different metaphors, often inspired by biological [5, 14, 20] or physical phenomena [3, 15], in order to design heuristics and strategies which can be employed during the exploration of the solution search space. Evolutionary algorithms, in general, can be characterized as approximate methods based on some notion of time, i.e. generations or iterations, and some notion of solution items, i.e. individuals, particles, ants, etc., which evolve over time, giving successive approximations of the optimal solution. Different population dynamics [18] characterize the different approaches; in general, individuals can breed, die, move, and change their characteristics or behavior, thus improving their local search performance and eventually collectively approaching the optimal solution.



In recent years, biological behaviors [5, 12] have been the constant inspirational metaphor for evolutionary algorithms. The underlying hypothesis is that the emerging behavior of a number of simple distributed agents (such as bees, ants, schooling fish, birds, etc.) can exhibit a high level of organization and valuable high-level properties, such as converging to some optimal solution. The starting point of this paper can be posed as a somewhat provocative question: if simple organisms offer collective emergent optimization behavior, why not take inspiration from the behavior of highly evolved organisms, i.e. humans?

The idea of Community of Scientist Optimization (CoSO) originates from the investigation of the mechanism of the collective emerging behavior of a very interesting biological organization: the human scientific community. CoSO is based on the distributed optimization process which has produced, and continues to produce, the highest results in the advancement of knowledge, i.e. the method used by the scientific community. In other words, CoSO exploits the mechanisms that humans employ in order to organize, select and finance scientific research; but, despite the suggestive starting inspiration, our final purpose is to investigate the effectiveness and the actual applicability of those mechanisms to computational optimization processes and domains.

The rest of the paper is organized as follows. The next section analyzes the main features of the modern scientific research process, seen as a collective emergent behavior. Section 3 introduces the CoSO approach and the evolutionary foraging optimization model in the framework of numerical optimization. Section 4 discusses experiments with a CoSO implementation on some benchmark problems and its comparison with PSO performance [12]. Finally, in Section 6, a discussion of the theoretical aspects of CoSO, its relationships with other foraging and evolutionary approaches, and a description of possible future lines of research concludes the paper.

2 Research Process as Emerging Behaviour

2.1 Science from Patronage to Self Organization

Scientific research, from the remote times of the great Greek mathematicians, scientists and philosophers until the Middle Ages and the Renaissance, relied mostly on the goodwill of a "mecenate", or patron. The mecenate was usually a prince, a king or a very rich person, who sponsored the activity of a "recognized scientist", often by paying him/her a regular salary and admitting him/her to his court in exchange for a variety of services, from art performances to scientific talks (this was the case, for example, of Leonardo Da Vinci with the court of Ludwig the Moor in Milan, and of the Greek scientist Archimedes, who applied his scientific findings to weapon design for the Greek navy against the Romans). The ability of the mecenate in selecting the right scientists and, conversely, the ability of the scientist to cope with the personal idiosyncrasies of the patron, as well as with the court's social environment, were crucial for the development of science in these early times. In other cases the scientists were people who lived in a favorable situation which allowed them a lot of thinking time: rich people themselves, monks, employees in obscure offices (remember that Einstein himself was an employee in a Swiss patent office at the time of his relativity papers), etc.

It was with the first scientific societies of the 1600s, and eventually with the advent of the industrial revolution, that the process of scientific production started to take off, leading to the huge production of the XIX century and the still never-ceasing flow of highly valuable findings and results produced continuously since.

It must be noted that the most relevant change which took place in the scientific production process was the move from a top-down process, driven by the arbitrary, gracious judgment of the patron, to a distributed, self-organized system which regulates its own expansion and evaluation criteria. Relevant elements in modern scientific research are the role of scientific journals and committees in selecting the papers to publish, and the more objective criteria adopted by government agencies to assign research funds. Fund assignment criteria are often based on the publications and previous results of the proponents, so it is the scientific community itself which indirectly assesses the projects to finance. On the other hand, funds also represent a foraging mechanism which establishes priorities in the use of the resources consumed by the individuals.

2.2 Features and Emerging Behavior of the Research Process<br />

There are some features of the modern process of scientific production which<br />

are worth to be briefly discussed and which will be later integrated in the CoSO<br />

model:<br />

– Funds: scientists need funds to do their research, i.e. to hire new researchers<br />

or to buy tools, labs, books, etc.<br />

– Journals: journals are collections of results which acts as communication<br />

channels among scientists. Scientists read journals: to take inspiration from<br />

previous researches, to avoid to rediscover already known results and to<br />

improve previous results.<br />

– Selection/Publication: the scientific production is self selected. Reputable<br />

scientists select the papers that other scientists want to publish, i.e. to draw<br />

to public attention. The selection is held by mean of (hopefully) objective<br />

criteria, such as considering, if the proposed results improve the published<br />

ones.<br />

– Results: scientific results are findings worth publishing, i.e. which improve on previous knowledge. Sometimes the information contained in a scientific publication is negative, i.e. the work informs the community that a certain hypothesis is false or that a certain research line is not promising; although it is often misconceived, negative information is also very useful for the progress of knowledge.<br />

– Research projects: investigators propose research projects, i.e. programs containing a description of the research area and research plans detailing which resources are needed and how to employ them. A typical research-management dilemma is deciding whether to hire new researchers or to devote existing researchers to the project.<br />

– Fund-assignment criteria and policies: funds are assigned to projects based on the scientific results of the proponents, i.e. the scientific groups with the best results are more likely to obtain funds. Moreover, governmental agencies also guarantee that additional (and possibly conflicting) criteria are met, such as prioritizing strategic topics and ensuring topic diversity (for instance, the European Union funds a limited number of outsider, challenging, high-risk projects per year; as a required feature, these projects must concern new and diverse areas).<br />
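The features above can be sketched as a minimal data model. This is an illustrative sketch only; the class and field names below are not from the paper:<br />

```python
from dataclasses import dataclass, field

@dataclass
class Paper:
    value: float          # objective value f(x) of the reported result
    point: tuple          # the point x that achieved it

@dataclass
class Journal:
    length: int                                # max results per issue (k_j)
    issue: list = field(default_factory=list)  # current list of Papers

@dataclass
class Researcher:
    position: tuple       # current point explored in the search space
    funds: int            # one unit is consumed per iteration

    def alive(self) -> bool:
        # a researcher survives only while funds remain
        return self.funds > 0
```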

The scientific research process is probably the most notable example of collective intelligence, where the valuable emergent behavior is the advancement of knowledge. In the scientific community, each researcher interacts with the others by reading journals and produces new results, either by exploring new directions of research or by deepening existing lines of research. In both cases, if the results improve on previous ones, the new ideas can be published and spread in the community, thus serving as inspiration for further research. Successful research will more likely lead to funds to continue it, while unsuccessful research means no funds and the end of the research activity. Funds are the food of modern scientists: competition for research funds introduces a foraging mechanism which indirectly acts as a selection mechanism. Once funds have been obtained, the successful proponent has to decide on a fund-management strategy, since funds can be used in quite different ways: hiring new researchers, or simply continuing the research alone for a longer time. Decisions must also be taken about research directions: where and how the new and existing researchers should explore. Fund policies are also important as a global regulating mechanism: governmental agencies can establish that certain areas are strategic, that certain areas of research cannot fall below a minimal amount of resources, or that too many projects insist on the same area. Policies which aim at topic diversity can be seen as general heuristics which guarantee a balanced advancement of knowledge and redistribute the risk of failure when research projects are too densely concentrated in one area.<br />

The next section introduces CoSO, an evolutionary foraging approach based on the described features of the collective research process, whose emergent behaviour is suited to numerical optimization problems.<br />

3 Community of Scientist Optimization: an Evolutionary<br />

Foraging Optimization Model<br />

CoSO is an evolutionary foraging optimization algorithm whose key features are inspired by the metaphor of the scientific research process taking place in a community of scientists.<br />



Let a multidimensional numerical optimization problem be represented by an objective function f : Θ → R to be minimized (maximized) over the space of feasible solutions Θ ⊆ R^d (where d is a positive integer indicating the dimensionality of the problem).<br />

CoSO is composed of a dynamic set of researchers R = {r1,...,rn} that share one or more journals Jj and compete to publish their best results, i.e. the best points visited in Θ with respect to the cost function f. Researchers use funds to organize their research, for instance by hiring new researchers to help them. At each iteration a researcher consumes one unit of funds, so researchers can die by fund exhaustion. The activities of searching, publishing, fund distribution, and fund investment are synchronized by discrete time instants, also called iterations. As the iterations progress, the journals reflect the advancement of knowledge about the function f, eventually converging toward the optimal minimum (maximum) value.<br />
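One iteration of such a system can be sketched as follows. The move rule, the publication bound k, and the fund policy here are placeholders for illustration, not the authors' implementation:<br />

```python
import random

def coso_step(researchers, journal, k, f):
    """One CoSO-style iteration: search, publish, fund use, survival.

    `researchers` is a list of dicts with keys 'x' (position tuple) and
    'funds'; `journal` is a list of (value, point) pairs kept sorted
    ascending, holding at most the best k results seen so far.
    """
    for r in researchers:
        # search: perturb the current position (placeholder move rule)
        r['x'] = tuple(xi + random.uniform(-0.1, 0.1) for xi in r['x'])
        # submit the new result to the shared journal
        journal.append((f(r['x']), r['x']))
    # publication: only the best k results survive into the next issue
    journal.sort(key=lambda p: p[0])
    del journal[k:]
    # fund consumption: one unit per iteration; exhausted researchers die
    for r in researchers:
        r['funds'] -= 1
    researchers[:] = [r for r in researchers if r['funds'] > 0]
```

Over many iterations, the journal's best entry tracks the best point found so far, while fund exhaustion removes unproductive researchers from the population.<br />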

3.1 Journals<br />

CoSO journals {Jj} are a set of data structures which record the significant progress of exploration made by the researchers over time. Each journal Jj is statically characterized by:<br />

– a journal length kj, i.e. the maximum number of results which can be published<br />

in a journal issue,<br />

– a set of readers/authors Rj ⊆ R,<br />

– a sequence of journal issues {Jj,t}, one issue for each discrete time instant<br />

t.<br />

A journal issue Jj,t is a list of at most kj papers, i.e. pairs (f(xi,u), xi,u) (where i ranges over the researcher set Rj and u ∈ [0,t]∩N), ordered with respect to f(xi,u), containing (at most) the best kj results obtained up to iteration t by the journal's readers. Researchers refer to the latest journal issues they have read in order to decide their direction of research, and publish in the journals they know. In this sense, journals act as communication channels among researchers.<br />

Finally, note that papers submitted at time t to a journal Jj are published in the journal issue Jj,t+1 if and only if they rank among the best kj results submitted so far, i.e. they must improve on the best previously published results, and at most kj of them are kept.<br />

3.2 Researchers<br />

Researchers represent the active search elements of CoSO. At each time instant<br />

t, a researcher ri is associated with a certain set of properties:<br />

– xi,t, a research position in the multidimensional space Θ,<br />

– vi,t, a movement vector indicating the direction of research with respect to<br />

the previous position at time step t − 1,<br />



Table 2. Experiments Results<br />
<br />
             CoSO                     PSO<br />
Function     Qm       C        Pc    Qm       C        Pc<br />
Sphere       8269     8269     1.00  8024     8024     1.00<br />
Rosenbrock   1339203  1339203  1.00  —        —        0.00<br />
Ackley       340734   299846   0.88  46000    31280    0.68<br />
Rastrigin    67660    67660    1.00  —        —        0.00<br />
Griewank     18460    17722    0.96  136852   120430   0.88<br />

optimization problems, while it shows a similar behavior for simple unimodal<br />

problems.<br />

Fig. 1. Convergence graph on Sphere Function<br />

5 Theoretical aspects of CoSO and Related Works<br />

CoSO shares many elements with PSO [12, 17] and, more generally, subsumes the latter 5 . On the other hand, many important differences exist, first of all<br />
<br />
5 A PSO with all particles connected can be modeled by CoSO with a single journal of dimension 1 and a fund-distribution strategy reassigning one unit of funds to each researcher at each iteration. In this way no researchers are created or deleted.<br />



Fig. 2. Convergence graph on Griewank Function<br />

the introduction of inheritance, foraging, and selection mechanisms, which are completely absent in pure PSO approaches.<br />
<br />
An easy parallel can be drawn between the notion of researchers and PSO particles, since both are characterized by changing their position in the search space. On the other hand, the use of journals in CoSO, combined with foraging, allows a richer dynamic.<br />

It is interesting to recognize some basic mechanisms of foraging: survival and indirect communication (see, for instance, indirect communication through pheromone in ACO [5]). Journals act as communication channels that researchers use to indirectly exchange information about where the areas with good results are, i.e. where the food for survival is. Funds are computational resources which are guaranteed to the best computing entities, i.e. the best researchers. Connecting foraging to performance through the fund-distribution strategy, and allowing communication through channels/journals, makes it possible to obtain a collective emergent behavior consisting in optimizing the performance, i.e. a collective converging behavior.<br />

CoSO also uses elements from classical Genetic Algorithms (GAs) [8, 16, 20]. The foraging mechanism induced by the notion of research funds introduces a selection mechanism which resembles the genetic survival-of-the-fittest strategy [16]. In other words, CoSO implements a kind of "publish or perish" rule, which can be restated as a "get good results, then get funds, or perish" rule. The inheritance methods used in CoSO can certainly also be related to GAs, while the foraging selection mechanism promotes the features of the best researchers.<br />



A remarkable difference is that self-reproduction is used in CoSO, i.e. hired researchers tend to reflect most of the features of their creator, or to evolve them by mutation (see the perturbation of the fund-management strategy si). In this respect, CoSO can also be related to bacterial foraging hybrid algorithms such as [2, 13, 14].<br />

Other hybrid approaches can be related to CoSO, for instance [19], which introduces diversity in a particle swarm optimizer, or the recent [18, 22], where population dynamics and genetic operators are used in the PSO framework. Despite the many connections with existing hybrid approaches, CoSO has the remarkable merit of bringing similar mechanisms together within the single coherent metaphor of the scientific production process.<br />

Another interesting aspect of CoSO is that, despite its application here to numerical optimization, it can easily be extended to other areas and used as a framework for managing distributed agents in problems suited to being solved by collective emergent behavior. Consider, for instance, application domains where agents (e.g. retrieval agents crawling for "interesting" documents, planning agents, web services, etc.) produce non-numerical solution instances or services, which can be compared and shared through journals. The agents can have, in general, different computational capabilities, which will be rewarded with different computational resources by the foraging mechanism.<br />

6 Conclusion<br />

CoSO is an innovative evolutionary approach to computational optimization based on the mechanisms used by the scientific community to manage the process of scientific production. Among its main features are a foraging mechanism, i.e. competition for research funds, which indirectly acts as a selection mechanism; a self-regulating "outsider" strategy which maintains diversity in the research topics; and dynamically adapting research-management strategies for hiring new researchers.<br />

Despite the many points of contact with recent hybrid PSO and foraging proposals [2, 14, 18], the CoSO metaphor offers a single framework where foraging, competition, communication, and search dynamics lead to a collective emergent behavior which results in an efficient optimization process. Experimental results on numerical optimization problems are encouraging, since CoSO outperforms classical PSO [4, 17] on difficult benchmark problems.<br />

Future lines of research will regard the exploration of different criteria for fund assignment (e.g. taking into account the historical performance of researchers) and different evolution mechanisms for journal-relevance distribution and for fund-management strategies (which currently evolve not within a single researcher but in its offspring). Another interesting line of research will be the experimentation of CoSO as a framework for organizing the collective behavior of distributed sets of agents on non-numerical problems.<br />



References<br />

1. Bäck T. (1996) Evolutionary Algorithms in Theory and Practice, Oxford, NY<br />

2. Biswas A., Dasgupta S., Das S., Abraham A. (2007) Synergy of PSO and Bacterial<br />

Foraging Optimization – A Comparative Study on Numerical Benchmarks, In:<br />

Innovations in Hybrid Intelligent Systems, ASC 44, pp. 255–263, Springer<br />

3. Cerny V. (1985) A Thermodynamical Approach to the Travelling Salesman Problem:<br />

an Efficient Simulation Algorithm. In: Journal of Optimization Theory and<br />

Applications, 45:41–51<br />

4. Clerc M., Kennedy J. (2002) The Particle Swarm-Explosion, Stability, and Convergence<br />

in a Multidimensional Complex Space. In: IEEE Transactions on Evolutionary<br />

Computation 6(1):58–73<br />

5. Colorni A., Dorigo M., Maniezzo V. (1991) Distributed Optimization by Ant<br />

Colonies. In: Proceedings of First European Conference on Artificial Life, Elsevier<br />

Publishing, pp. 134–142<br />

6. Dorigo M., Gambardella L. M. (1997) Ant Colony System : A Cooperative Learning<br />

Approach to the Traveling Salesman Problem. In: IEEE Transactions on Evolutionary<br />

Computation, 1(1):53–66<br />

7. Dorigo M., Maniezzo V., Colorni A. (1996) Ant System: Optimization by a Colony<br />

of Cooperating Agents. In: IEEE Transactions on Systems, Man, and Cybernetics<br />

– Part B, 26(1):29–41<br />

8. Eiben A. E., Raué P. E., Ruttkay Z. S. (1994) Genetic Algorithms with Multi-<br />

Parent Recombination. In: Proceedings of Third Conference on Parallel Problem<br />

Solving from Nature, pp. 78–87<br />

9. Feoktistov V. (2006) Differential Evolution. In search of solutions. Springer<br />

10. Hingston P. F., Barone L. C., Michalewicz Z. (2008) Design by Evolution: Advances<br />

in Evolutionary Design. Springer<br />

11. Holland J. H. (1975) Adaptation in Natural and Artificial Systems. University of<br />

Michigan Press, Ann Arbor<br />

12. Kennedy J., Eberhart R. (1995) Particle Swarm Optimization. In: Proceedings of<br />

IEEE Conference on Neural Networks, IEEE Press, pp. 1942–1948<br />

13. Kim D. H., Abraham A., Cho J. H. (2007) A Hybrid Genetic Algorithm and<br />

Bacterial Foraging Approach for Global Optimization. In: Information Sciences,<br />

177(18):3918–3937<br />

14. Kim D. H., Cho C. H. (2005) Bacterial Foraging Based Neural Network Fuzzy<br />

Learning. In: Proceedings of Indian International Conference on Artificial Intelligence,<br />

pp. 2030–2036<br />

15. Kirkpatrick S., Gelatt C. D., Vecchi M. P. (1983) Optimization by Simulated<br />

Annealing. In: Science New Series 220(4598):671–680<br />

16. Michalewicz Z. (1999) Genetic Algorithms + Data Structures = Evolution Programs.<br />

Springer-Verlag<br />

17. Poli R., Kennedy J., Blackwell T. (2007) Particle Swarm Optimization. An<br />

overview. In: Swarm Intelligence 1(1):33–57<br />

18. Qi K., Lei W., Qi-Di W. (2008) A Novel Ecological Particle Swarm Optimization<br />

Algorithm and its Population Dynamics Analysis. In: Applied Mathematics and<br />

Computation 205(1):61–72<br />

19. Riget J., Vesterstrøm J. S. (2002) A Diversity-Guided Particle Swarm Optimizer<br />

– the ARPSO. In: EVALife Technical Report no. 2002-02<br />

20. Schmitt L. M. (2001) Theory of Genetic Algorithms. In: Theoretical Computer<br />

Science, 259:1–61<br />



21. Storn R., Price K. (1997) Differential evolution – A Simple and Efficient Heuristic<br />

for Global Optimization over Continuous Spaces. In: Journal of Global Optimization,<br />

11(4):341–359<br />

22. Yanjiang M., Zhihua C., Jianchao Z. (2009) Dynamic Population-Based Particle<br />

Swarm Optimization Combined with Crossover Operator. In: Proceedings of Ninth<br />

International Conference on Hybrid Intelligent Systems, vol. 1, pp. 399–404<br />



An Empirical Study of Learning and Forgetting<br />

Constraints<br />

Ian P. Gent, Ian Miguel and Neil C.A. Moore<br />

{ipg,ianm,ncam}@cs.st-andrews.ac.uk<br />

School of Computer Science, University of St Andrews, St Andrews, Scotland, UK.<br />

Abstract. Conflict-driven constraint learning provides big gains on many<br />

CSP and SAT problems. However, time and space costs to propagate<br />

the learned constraints can grow very quickly, so constraints are often<br />

discarded (forgotten) to reduce overhead. We conduct a major empirical<br />

investigation into the overheads introduced by unbounded constraint<br />

learning in CSP. To the best of our knowledge, this is the first published<br />

study in either CSP or SAT. We obtain two significant results.<br />

The first is that a small percentage of learnt constraints do most propagation.<br />

While this is conventional wisdom, it has not previously been<br />

the subject of empirical study. Second, we show that even constraints<br />

that do no effective propagation can incur significant time overheads.<br />

Finally, by implementing forgetting, we confirm that it can significantly<br />

improve the performance of modern learning CSP solvers, contradicting<br />

some previous research.<br />

1 Introduction<br />

In this paper, we conduct an empirical investigation into the overheads introduced<br />

by unbounded constraint learning in CSP. To the best of our knowledge,<br />

this is the first published study in either CSP or SAT. We obtain two primary<br />

results. The first is that a small percentage of learnt constraints do most propagation.<br />

Although this is conventional wisdom, no published study exists. Second,<br />

we show that even constraints that do no effective propagation can incur significant<br />

time overheads. This clarifies conventional wisdom which suggests that<br />

watched literal propagators can have lower overheads when not in use. Finally, we<br />

show that forgetting can improve performance of modern learning CSP solvers<br />

by exhibiting a working implementation, contradicting some previous published<br />

research.<br />

2 Background: Learning and Forgetting in SAT and CSP<br />

Nogood learning is an important CSP search technique. In brief, when the solver<br />

reaches a dead-end, a new constraint is added to rule out future branches that<br />

Proceedings of the 18th RCRA workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion (RCRA 2011).<br />
In conjunction with IJCAI 2011, Barcelona, Spain, July 17-18, 2011.<br />



are effective in learning solvers is consistent with this belief: if a few constraints dominate, most can collectively be thrown away without harming search. However, constraint forgetting in some form is a necessity in order to avoid running out of memory, so it would still benefit the solver even if individual constraints were comparably effective. Regardless, the effect must be quantified, and understanding it quantitatively might help in designing effective forgetting strategies.<br />

Procedure Measuring the effectiveness of an individual constraint is more difficult in a learning solver than in a standard backtracking solver, because the learning procedure combines constraints together. Hence a constraint may do little propagation itself, while constraints derived from it during the learning process may do a lot; the influence of a constraint may therefore be wide. This is a subtle issue and we have not attempted to measure it. Rather, we measure only the direct effects of individual constraints, and not their "influence".<br />

Therefore, in this section, the number of unit propagations is used as a measure of the effectiveness of a learnt constraint. This choice is not obvious, so we now discuss why it was made. The problem is that propagations are not necessarily beneficial if they remove values but do not contribute to domain wipeouts or other failures. To get around this issue, as part of its clause-forgetting system (see §4.1), MiniSat [6] measures the number of times a constraint has been identified as part of the reason for a failure. Hence, we did consider using the number of propagations that lead to failure as a measure of constraint effectiveness, rather than the raw number of propagations. However, over our 2050 instances and 566,059 learned constraints, the correlation coefficient between propagation count and count of involvement in conflicts is 0.96. In other words, each propagation is roughly equally likely to be involved in a conflict, so the following results should apply almost equally to propagations resulting in failure. The advantage of using the total number of propagations is that it is more easily defined and less coupled with learning.<br />
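The reported 0.96 figure is an ordinary Pearson correlation over the two per-constraint counts. For reference, it can be computed as in this sketch:<br />

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length
    sequences, e.g. per-constraint propagation counts and conflict
    involvement counts."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```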

For efficiency reasons, solvers do not collect this data by default. In order<br />

to carry out these experiments our solver was amended to print out a short<br />

message whenever a constraint propagated, giving the unique constraint number<br />

and the node at which the propagation occurred. These data were then analysed<br />

externally with the aid of a statistical package. Although this slows the solver<br />

down, the experiment is fair because counts are not affected.<br />

Note that the later a constraint is posted, the less time it has to propagate. Hence the raw numbers of propagations carried out by each constraint are not directly comparable. To get around this, only constraints learned during approximately the first 50% of nodes are included, and for each constraint the propagations are counted only over the following 50% of nodes, so that every count covers the same number of nodes. For example, if the problem is solved in 9999 nodes, constraints learned between nodes 1 and 5000 are included, and the constraint learned at node 278 is counted from node 278 to node 5277.<br />
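This windowing rule can be sketched as an illustrative helper (not the authors' code):<br />

```python
def count_window(total_nodes, learned_at):
    """Node range over which a constraint's propagations are counted.

    Only constraints learned in (approximately) the first half of the
    search are included; each is counted over the next 50% of nodes,
    so that every count covers the same number of nodes.
    """
    half = round(total_nodes / 2)
    if learned_at > half:
        return None          # learned too late: excluded from the study
    return (learned_at, learned_at + half - 1)
```

With 9999 search nodes, `count_window(9999, 278)` yields the range (278, 5277) from the example above.<br />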



Fig. 1. What proportion of constraints are responsible for what propagation? – single instance (y-axis: cumulative sum of UPs; x-axis: percentile of constraint)<br />

Results and analysis For the instance latinSquare-dg-8 all.xml.minion we exhibit a graph that we will later show is representative of other instances. The upper curve 2 in Figure 1 shows what proportion of the best constraints are responsible for what proportion of all unit propagations (UPs). By "best" we mean doing the most propagations. Each point is an individual constraint, and the constraints are sorted by increasing propagation count moving from left to right along the x-axis. The x-axis is the percentile of the constraint's propagation. The y-axis is the number of propagations accounted for by that constraint and those with a lower percentile. For example, the circled point on the x-axis is the median (50th percentile) constraint by propagation count: it is the 5223rd constraint, out of 10446. The total propagation count for all 5223 constraints is exactly 5223 [sic], out of a total of 26220 for all constraints, i.e. 20% of the total. Hence the bottom 50% of constraints account for just 20% of all propagation. The slope is shallow until the 80th-percentile constraint (marked by a small square), after which it steepens dramatically. Hence the top 20% of constraints do a lot more work than the rest. This agrees with the hypothesis that a minority of constraints do most propagation.<br />
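The curve in Figure 1 is a cumulative sum over constraints sorted by propagation count; its construction can be sketched as:<br />

```python
def cumulative_share(counts):
    """For constraints sorted by ascending propagation count, return
    the fraction of total propagations accounted for by each prefix
    (i.e. by that constraint and all lower-percentile ones)."""
    ordered = sorted(counts)
    total = sum(ordered)
    running, shares = 0, []
    for c in ordered:
        running += c
        shares.append(running / total)
    return shares

# e.g. if 8 constraints propagate once each and 2 propagate 20 times
# each, the bottom 80% of constraints account for only 8/48 of all UPs
shares = cumulative_share([1] * 8 + [20] * 2)
```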

In §2 we noted that each constraint is guaranteed to propagate at least once. This first propagation has the effect of a right branch, so it does not contribute effectively, since the solver would have made it anyway. Hence we now report results with these ineffective propagations deleted. In the black (lower) curve in Figure 1 the same graph is shown with 1 subtracted from the propagation count of each constraint. Here the curve is zero until the 80th percentile, meaning that the worst 80% of constraints contribute no additional propagation after the right branch, i.e. just one propagation each: just 20% of constraints do all useful propagation, and 10% do almost all.<br />

The previous results focus on a specific instance, so we now expand the analysis to all 949 instances from the test set that cannot be solved within 1000 nodes of search. This is done to ensure that a trend has a chance to establish itself: to<br />

2 The points are close enough together to appear as a single curve, rather than as distinct points.<br />



P      Min.  1st Qu.  Median  Mean   3rd Qu.  Max.<br />
1%     0.01  0.01     0.01    0.04   0.03     2.04<br />
5%     0.01  0.02     0.04    0.09   0.09     2.04<br />
10%    0.01  0.05     0.08    0.19   0.18     3.64<br />
15%    0.01  0.09     0.13    0.31   0.31     3.91<br />
20%    0.01  0.12     0.19    0.46   0.47     5.46<br />
25%    0.01  0.17     0.27    0.64   0.68     6.80<br />
30%    0.01  0.23     0.35    0.86   0.92     8.24<br />
35%    0.01  0.30     0.46    1.11   1.22     9.69<br />
40%    0.01  0.37     0.58    1.40   1.58     11.13<br />
45%    0.01  0.47     0.72    1.73   1.99     12.57<br />
50%    0.01  0.57     0.86    2.11   2.51     14.02<br />
55%    0.02  0.67     1.00    2.56   3.22     16.33<br />
60%    0.02  0.78     1.18    3.07   3.93     18.76<br />
65%    0.02  0.89     1.34    3.65   4.86     21.27<br />
70%    0.02  0.99     1.51    4.34   6.09     24.39<br />
75%    0.02  1.09     1.70    5.15   7.56     27.51<br />
80%    0.02  1.19     1.89    6.15   9.50     30.83<br />
85%    0.02  1.32     2.08    7.40   11.75    37.07<br />
90%    0.02  1.44     2.27    9.11   15.37    43.32<br />
95%    0.02  1.55     2.48    11.68  21.88    50.00<br />
100%   0.02  1.65     2.71    16.03  37.06    69.89<br />
<br />
Table 1. What proportion of constraints are responsible for what propagation? – all instances<br />

analyse only a few constraints might be less meaningful. In Table 1, for each chosen percentage P, we give what percentage of the best constraints is needed to account for P% of overall non-branching propagation 3 . These results show that usually a small proportion of the best constraints performs a disproportionate amount of propagation. For example, 10% of all propagation is performed by a median of 0.08% and a maximum of 3.64% of constraints, and 100% by a median of 2.71% and a maximum of 69.89%. Hence the behaviour described above for a single benchmark is robust over many instances: the best few constraints overwhelmingly perform most non-branching propagation. If anything, the sample instance above understates the effect, since it required about 20%, instead of the median of 2.71%, of constraints to do all propagation.<br />

Conclusion We have shown empirically that the best constraints are responsible<br />

for much of the propagation and thus search space reduction.<br />

3 It may seem anomalous that some entries exceed P %, since the best P % constraints<br />

must do at least P % of propagations. This apparent anomaly is because there may<br />

be no integer number of constraints doing P % of propagation, so it is necessary to<br />

overcount.<br />



3.3 Clauses have high time as well as space costs<br />

Unit propagation by watched literals [19] is designed to reduce the amount of time spent propagating infrequently-propagating constraints, through the possibility of watches migrating to inactive literals that do not trigger and thus cost nothing to propagate. Before describing the experiment, we first briefly outline how watched-literal propagation works.<br />

Unit propagation (UP) is a way of propagating clauses. Watched literals are an efficient implementation of UP, first described in [19]. The idea is to watch a pair of literals that are not set to false. Provided such literals exist, the clause can still be satisfied and unit propagation need not happen yet. Suppose one of these literals is set to false: if another non-false literal can be found, the propagator watches it instead; otherwise the single remaining non-false literal must be unit-propagated to true immediately, to avoid the clause becoming unsatisfied. The empirical evidence suggests that, since the propagator only cares about assignments to two literals, it is efficient compared to other unit propagators that watch all assignments (e.g. ones that count false assignments). If the watched literals are set to true early in search, the clause is essentially zero-cost until the solver backtracks beyond that point, because it will never be triggered on those literals.<br />
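A minimal sketch of this two-watched-literal scheme, for a single clause, is shown below. The representation and names are illustrative, and conflict handling is omitted:<br />

```python
FALSE, TRUE, UNSET = 0, 1, 2

def on_watched_false(clause, watches, assignment):
    """Called when a watched literal of `clause` becomes false.

    `clause` is a list of literals, `watches` a mutable pair of indices
    into it, and `assignment` maps literals to FALSE/TRUE/UNSET.
    Returns a literal to unit-propagate to true, or None if the watch
    migrated and the clause needs no further work yet.
    """
    a, b = watches
    if assignment[clause[a]] != FALSE:      # make `a` the false watch
        a, b = b, a
    # try to migrate the watch to a replacement non-false literal
    for i, lit in enumerate(clause):
        if i not in (a, b) and assignment[lit] != FALSE:
            watches[0], watches[1] = i, b
            return None
    # no replacement found: the other watched literal must become true
    return clause[b]
```

Note how a clause whose two watched literals stay non-false is never even inspected, which is the source of the low-overhead intuition discussed above.<br />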

Hence, perhaps weakly-propagating constraints do not cost much time, provided space is available to store them, since infrequently-propagating constraints may do little work. The next question is therefore: do constraints which do not propagate much cost significant time as well as space?<br />

Procedure The minimum amount of time to process a single domain event with a watched-literal propagator can be on the order of a handful of machine instructions, taking nanoseconds to run, during which time the system clock may not tick. Hence, to obtain nano-scale timings, the solver keeps a running total of the number of processor clock ticks as recorded by the RDTSC register specific to Intel processors [13]. Each tick occupies 1/(2.66 × 10^9) seconds, since we used a 2.66 GHz Xeon E5430. The overhead of collecting data is very low, taking only one assembly instruction to get the number and a few more cycles to add it to the running total.<br />

At the end of search, all the cycle counts are printed out and analysed externally<br />

with the aid of a statistical package.<br />

Results and analysis How does time spent correlate with unit propagations performed? Figure 2 is a scatterplot for the single instance used in §3.2. Each point represents a single constraint. The x-axis gives the number of unit propagations (including the initial right-branching one), and the y-axis the total number of processor cycles used to propagate the constraint during the entire search. First, and unsurprisingly, as an individual constraint propagates more, it often requires more time to do so. What may be surprising is that the worst case for constraints is roughly constant, and independent of the number of propagations. That is,<br />



Fig. 2. How much time does propagation take? (y-axis: time in cycles; x-axis: number of UPs)<br />

constraints which do no effective propagation can take a similar amount of time to propagate as constraints which propagate almost 1000 times. For this sample instance, 74% of propagation time is occupied by constraints that never propagate again after the first time. This suggests that learnt constraints can incur a significant time overhead without doing any useful propagation.<br />

Table 2 extends the study to the 1,923 instances out of the full set of 2,028<br />

where at least one constraint is learned. Each row is a chosen percentage R%<br />

of the total non-branching propagations, and the columns are summary statistics<br />

for what % of the overall propagation time the best constraints take to<br />

achieve R% of all propagation. A constraint is “better” than another if it does<br />

more propagations per second of time spent propagating. For example, the third<br />

row says that, in the median over all instances, the best available constraints can<br />

achieve 10% of all non-branching propagation in just 0.62% of the total propagation<br />

time. Using the most efficient constraints, all non-branching propagation<br />

can be achieved in a mean of less than a quarter of the time needed when using all constraints.<br />

All other time spent is completely wasted since it leads to no effective<br />

propagation.<br />
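
The per-row statistic can be reconstructed as follows (our own sketch of the analysis, not the authors' script; all names are ours, and per-constraint cycle counts are assumed positive): constraints are ranked by propagations per cycle, and the most efficient are accumulated until they cover the target fraction of all propagations.<br />

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct ConstraintStats { long propagations; long cycles; };

// Fraction of total propagation time that the "best" constraints (those doing
// the most propagations per cycle spent) need in order to cover a fraction R
// of all non-branching propagations (a reconstruction of Table 2's statistic).
double time_share_for_coverage(std::vector<ConstraintStats> cs, double R) {
    std::sort(cs.begin(), cs.end(),
              [](const ConstraintStats& a, const ConstraintStats& b) {
                  // more propagations per cycle first
                  return static_cast<double>(a.propagations) / a.cycles >
                         static_cast<double>(b.propagations) / b.cycles;
              });
    long total_props = 0, total_cycles = 0;
    for (const auto& c : cs) { total_props += c.propagations; total_cycles += c.cycles; }
    const long target = static_cast<long>(std::ceil(R * total_props));
    long props = 0, cycles = 0;
    for (const auto& c : cs) {
        if (props >= target) break;  // coverage reached
        props += c.propagations;
        cycles += c.cycles;
    }
    return static_cast<double>(cycles) / total_cycles;
}
```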

Conclusion The results on all instances confirm the result from the single<br />

instance, and show that learnt constraints which do no propagation contribute<br />

significantly to the time overhead of the solver.<br />

The design of watched literal propagators makes it possible that constraints<br />

that do not propagate will cost the solver very little in time. This is because the<br />

watches could potentially migrate to “silent literals” that do not trigger often.<br />

Hence, we feel it significant that we have shown that this is often not the case,<br />

and useless clauses can be very costly on an individual basis.<br />
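
To make the mechanism concrete, here is a textbook-style sketch of two-watched-literal maintenance for a single clause (an illustration, not Minion's code, with an assumed encoding of truth values): when a watched literal becomes false, the propagator first tries to migrate the watch to another non-false literal, and only propagates when none exists.<br />

```cpp
#include <utility>
#include <vector>

// Truth value of each literal under the current assignment:
// -1 false, 0 unassigned, +1 true (an illustrative encoding).
enum class Watch { Replaced, Unit, Satisfied };

// A clause is a vector of literal indices; positions 0 and 1 are watched.
// Called when the literal at watch position w (0 or 1) has just become false.
Watch on_watch_false(std::vector<int>& clause, const std::vector<int>& value, int w) {
    const int other = clause[1 - w];
    if (value[other] == 1) return Watch::Satisfied;  // clause already true
    for (int i = 2; i < static_cast<int>(clause.size()); ++i) {
        if (value[clause[i]] != -1) {         // found a non-false literal
            std::swap(clause[w], clause[i]);  // migrate the watch there
            return Watch::Replaced;           // no propagation triggered
        }
    }
    return Watch::Unit;  // every alternative is false: propagate the other watch
}
```

If both watches settle on literals that rarely become false, the clause costs almost nothing thereafter; the measurements above show that this favourable case often does not occur.<br />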

4 Clause forgetting<br />

The above results suggest that, if constraints are picked carefully, the solver can often<br />

remove them to save a lot of time at only a small cost in search size. As described<br />

in §2, this is a well known and well used technique in both CSP and SAT.<br />



R Min. 1st Qu. Median Mean 3rd Qu. Max.<br />

1% 0.00 0.02 0.17 6.12 3.32 100.00<br />

5% 0.00 0.05 0.33 6.17 3.32 100.00<br />

10% 0.00 0.11 0.62 6.30 3.52 100.00<br />

15% 0.00 0.18 0.95 6.50 3.82 100.00<br />

20% 0.00 0.26 1.38 6.79 4.38 100.00<br />

25% 0.00 0.35 1.88 7.12 5.11 100.00<br />

30% 0.00 0.45 2.31 7.52 5.82 100.00<br />

35% 0.00 0.54 2.85 8.07 6.82 100.00<br />

40% 0.00 0.63 3.38 8.46 7.75 100.00<br />

45% 0.00 0.71 4.03 9.01 9.10 100.00<br />

50% 0.00 0.79 4.54 9.46 9.97 100.00<br />

55% 0.00 0.91 5.38 10.50 11.67 100.00<br />

60% 0.00 1.04 6.08 11.16 13.32 100.00<br />

65% 0.00 1.20 6.87 11.97 15.10 100.00<br />

70% 0.00 1.38 7.99 13.06 17.73 100.00<br />

75% 0.00 1.58 9.06 14.00 19.62 100.00<br />

80% 0.00 1.78 10.07 15.27 22.59 100.00<br />

85% 0.00 2.03 11.35 16.78 25.91 100.00<br />

90% 0.00 2.29 12.56 18.55 30.03 100.00<br />

95% 0.00 2.59 14.31 20.76 34.05 100.00<br />

100% 0.00 2.89 15.23 24.01 41.02 100.00<br />

Table 2. How much time does propagation take? (all instances)<br />

Indeed, Katsirelos and Bacchus have implemented relevance-bounded learning<br />

for a g-learning solver in [16]. They report poor results, showing that relevance<br />

bounding with k = 3 leads to more timeouts and slower solution times. However,<br />

only a very small number of similar problems are tried, so the results are inconclusive.<br />

In this section, we try a range of well-known existing strategies for forgetting<br />

learned constraints.<br />

4.1 Context<br />

For size-bounded and relevance-bounded learning [5, 8] the solver must, respectively,<br />

either not learn the constraint if it has more than k literals in it, or remove the<br />

constraint once k literals become unset for the first time. Both have been applied<br />

successfully to the CSP in the past, but using an s-learning solver. Since<br />

they were last tried, algorithms for propagating disjunctions have progressed<br />

significantly with the introduction of watched literal propagation [19], meaning<br />

that learned constraints are faster to propagate. Hence the techniques may no<br />

longer be useful and, if they are useful, the optimal choice of parameters will<br />

probably have changed as long clauses become less burdensome. Also, the learning<br />

algorithms applied have fundamentally changed with the advent of g-nogood<br />

learning. Katsirelos has shown [15] that the properties of clauses change as a<br />

result of g-learning, for example the average clause length can reduce. This also<br />



motivates the re-evaluation of existing forgetting strategies. Finally, theoretical<br />

results [14, 3] from SAT show that there is an exponential separation between<br />

solvers using size-bounded learning and learning unrestricted on length, meaning<br />

that the former may need exponentially more search than the latter on particular<br />

problems. This means that size-bounded learning is theoretically discredited,<br />

but it remains to be seen how it performs in practice.<br />

Recently there has been a collection of new forgetting heuristics in SAT<br />

solvers, which are based on activity. Using activity-based heuristics, the clauses<br />

that are least used for conflict analysis are removed when the solver needs to free<br />

space to learn new clauses. As well as guessing which clauses are least beneficial,<br />

new strategies must also decide how many to keep. This is a difficult trade-off, because<br />

keeping more clauses increases propagation time, but throwing them away reduces<br />

inference power. The best choice is problem dependent. We will experiment on<br />

what we will call the minisat strategy after the solver it originated in [6].<br />

The strategy has 3 main components:<br />

activity each clause has an activity score, which is incremented by 1 each time<br />

it is used as an explanation in the firstUIP procedure<br />

decay periodically, activities are reduced, so that clauses that have been active<br />

recently are prioritised<br />

forgetting just before the scores are decayed each time, half of all constraints<br />

are removed, with the following exceptions:<br />

– those that have unit propagated in the current branch of search are kept,<br />

– those with scores below a fixed threshold are removed first even if the<br />

target of removing half has already been reached, and<br />

– binary and unary clauses are always kept.<br />

In order to implement this algorithm, the frequency of decay and forgetting and<br />

the divisor for decay must be supplied. The threshold below which all clauses<br />

are removed is simply 1 over the size of the clause database.<br />
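
A sketch of this forgetting step (an assumed data layout with names of our choosing; the real MiniSat implementation differs in detail, and the decay component is omitted):<br />

```cpp
#include <algorithm>
#include <vector>

struct LearntClause { double activity; int size; bool locked; };  // locked = currently unit

// Remove roughly half of the learnt clauses, as in the minisat strategy:
// keep locked and binary/unary clauses, remove low-activity clauses first,
// and always remove clauses below the threshold 1 / (database size).
void reduce_db(std::vector<LearntClause>& db) {
    std::sort(db.begin(), db.end(),
              [](const LearntClause& a, const LearntClause& b) {
                  return a.activity < b.activity;  // least active first
              });
    const double threshold = 1.0 / db.size();
    const std::size_t target = db.size() / 2;  // aim to remove half
    std::vector<LearntClause> kept;
    std::size_t removed = 0;
    for (const LearntClause& c : db) {
        const bool removable = !c.locked && c.size > 2;  // keep locked, binary, unary
        if (removable && (removed < target || c.activity < threshold))
            ++removed;
        else
            kept.push_back(c);
    }
    db = std::move(kept);
}
```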

4.2 Experimental evaluation<br />

We will describe an experiment to test the effectiveness of the forgetting strategies<br />

from the literature described above.<br />

Implementing constraint forgetting As mentioned in §3.2 each learned constraint<br />

propagates at least once and this is necessary for the completeness of<br />

g-learning. Hence, when implementing bounded learning, our solver propagates<br />

each new constraint once anyway, even if it is going to be discarded immediately.<br />

In our implementation, currently unit clauses, a.k.a. locked clauses 4 , can be<br />

slated for deletion, meaning that they are not propagated any more, but the<br />

memory cannot be freed until the clause is no longer unit. In our solver, restarts are not<br />

4 nomenclature due to [6]<br />



used. It is possible to prove that deleting clauses is safe (i.e. the solver is still<br />

complete), provided that they are not locked.<br />

For k relevance bounding, recall that the solver must remove the constraint<br />

when k literals become unset for the first time. Our implementation works as<br />

follows: when the constraint is created the literals are sorted by descending<br />

depth at which they became false 5 and the k’th depth is selected. When the<br />

solver backtracks beyond the selected depth, exactly k literals will have become unset<br />

and the constraint can be deleted. There is little runtime overhead using this<br />

implementation.<br />
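
This bookkeeping amounts to the following computation (a simplified sketch with names of our choosing):<br />

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Given the decision depths at which each literal of a newly learned
// constraint became false, return the depth at which the constraint becomes
// deletable under k-relevance bounding: once the solver backtracks above the
// k'th deepest of these depths, exactly k literals have become unset.
int deletion_depth(std::vector<int> falsified_at, int k) {
    std::sort(falsified_at.begin(), falsified_at.end(), std::greater<int>());
    return falsified_at[k - 1];  // the k'th deepest falsification depth
}
```

The solver then needs only to watch for a backtrack past this single precomputed depth, which is why the runtime overhead is small.<br />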

The implementation of size-bounded learning and the minisat strategy follow<br />

straightforwardly from the definitions given above.<br />

Experimental methodology Each of the 2028 instances was executed four<br />

times with a 10 minute timeout, over 3 Linux machines each with 2 Intel Xeon<br />

cores at 2.4 GHz and 2 GB of memory, running kernel version 2.6.18 SMP.<br />

Parameters to each run were identical, and the minimum time for each is used<br />

in the analysis, in order to approximate the run time in perfect conditions (i.e.<br />

with no system noise) as closely as possible. Each instance was run on its own<br />

core, with 1 GB of memory. Minion was compiled statically (-static) using<br />

g++ version 4.4.3 with flag -O3.<br />

Beauty contest We tried each strategy with a wide range of parameters and in<br />

Table 3 report a selection of the best parameters for each. The best parameters<br />

were found by testing a wide interval of possible parameters, and finding a local<br />

optimum. Close to the local optimum more parameters were tried to locate the<br />

best single value where possible (e.g. for discrete parameters). Minion with no<br />

learning at all is also included in the comparison under the name “stock.undefined”.<br />

In the table, the strategies are abbreviated to name.parameter, except minisat<br />

which is abbreviated to minisat.interval.decayfactor.<br />

The “Beauty Contest” columns give both the number of instances solved and<br />

the total amount of time spent. Hence an instance that times out does not count<br />

towards instances solved and costs 600 seconds. The best strategy is that which<br />

solved the most instances, taking into account overall time to break ties. In<br />

the table the best strategies are listed first. Finally, the first and third quartiles and<br />

median nodes per second are given. These statistics show the increase or decrease<br />

in search speed. A solver with forgetting should have a higher search speed<br />

because it has fewer constraints to propagate. The ‘Search measures’ columns<br />

give measures of what effect each strategy has compared to unbounded learning:<br />

that is, how effective search is, as<br />

opposed to how fast. The columns are as follows:<br />

Instances means the number of instances the variant and unbounded both<br />

complete. This is the number of instances being compared in the following two<br />

statistics.<br />

5 this information is available from the learning subsystem<br />



Fig. 3. Graph comparing the best strategy (relevance-bounded k = 6) against other strategies. (Two log-log scatterplots of per-instance speedup with rel.6: (a) No learning, speedup versus stock solve time; (b) Unbounded learning, speedup versus unbounded solve time.)<br />

Nodes inc. means the factor of additional nodes the strategy needs on those<br />

instances. The smaller the number 6 , the less propagation is lost as a result<br />

of forgetting.<br />

Speedup means speedup factor, e.g. speedup factor of 2 means that the strategy<br />

takes half the time to solve all the instances together. Note that because only<br />

instances completed by both are included, there are no timeouts in the total.<br />

The aim is to maximise nodes per second, while keeping the node increase<br />

as little as possible.<br />

Analysis of results In these results, most of the strategies for forgetting clauses<br />

improve over unbounded learning (none.undefined in Table 3) in terms of both<br />

instances solved and overall time. There is an overall increase in the number<br />

of instances solved: provided that the increased node rate compensates for the<br />

increase in the number of nodes searched, there will be a net win. There is an<br />

apparent paradox because for some strategies that beat unbounded learning,<br />

e.g. size.2, the number of nodes increases more than the node rate in the “search<br />

measures” section. However this is not a problem, because “beauty contest” is<br />

based on all instances, whereas “search measures” is based only on instances that<br />

didn’t time out. Hence the paradox arises because, for these strategies, the instances<br />

that timed out were the most improved in terms of nodes and node rate. This<br />

makes sense when the instances that run the longest with unbounded learning<br />

are the most encumbered by useless clauses.<br />

These results are interesting because, contrary to [16], relevance- and size-bounded<br />

learning work well for certain choices of k. However, the results in this<br />

paper were based on a larger set of benchmarks and a larger range of parameters<br />

were tried. Also, different implementation decisions in our solver will result<br />

6 constraint forgetting could occasionally lead to less search, as in backjumping [21],<br />

so a number under 1 is possible in principle<br />



Strategy Beauty contest Search measures<br />

Instances Time 1st Q NPS Median NPS 3rd Q NPS Instances Nodes inc. Speedup<br />

stock.undefined 1667 248598.9 403.9 1353.0 10390.0 1312 129.6 6.7<br />

relevance.6 1641 278203.7 205.3 502.4 1257.0 1336 2.4 4.2<br />

relevance.5 1639 277357.3 217.6 541.6 1433.0 1336 2.8 4.7<br />

relevance.4 1639 280652.1 222.5 533.4 1549.0 1333 3.6 4.3<br />

relevance.7 1637 278973.3 201.7 482.9 1184.0 1336 1.9 4.4<br />

size.10 1637 280804.7 196.7 534.4 1225.0 1336 4.1 5.1<br />

relevance.10 1636 279244.4 178.1 454.1 1021.0 1335 1.6 5.2<br />

relevance.3 1635 280366.6 242.1 566.2 1728.0 1336 5.5 3.4<br />

size.8 1635 281008.0 214.6 566.2 1383.0 1335 5.2 4.5<br />

size.5 1634 283213.5 235.9 595.7 1574.0 1335 7.5 3.9<br />

relevance.14 1631 281037.3 141.7 409.5 874.6 1334 1.3 5.6<br />

size.12 1631 282370.3 187.6 504.2 1143.0 1335 2.1 5.5<br />

size.13 1631 282911.4 180.1 485.7 1081.0 1335 1.8 5.5<br />

size.14 1631 283324.7 180.1 469.2 1044.0 1335 1.6 5.7<br />

relevance.15 1629 282680.8 136.6 404.9 865.1 1335 1.3 5.9<br />

size.9 1629 283146.9 205.9 541.2 1298.0 1334 4.5 5.0<br />

size.11 1629 283882.0 193.7 516.0 1170.0 1333 3.0 5.3<br />

relevance.16 1629 284854.4 134.5 406.7 860.9 1335 1.3 5.6<br />

size.15 1628 287587.7 176.5 463.9 1007.0 1333 1.7 4.7<br />

relevance.13 1627 281439.7 155.0 427.0 928.2 1335 1.4 5.3<br />

relevance.2 1625 287833.7 250.6 580.3 2006.0 1329 61.3 3.2<br />

relevance.12 1623 284866.5 159.0 420.5 928.9 1334 1.4 5.3<br />

size.2 1621 289421.7 257.4 604.3 2088.0 1327 21.6 3.7<br />

relevance.17 1620 288246.0 126.1 402.2 830.4 1335 1.3 5.1<br />

size.20 1619 295401.9 155.1 413.9 907.9 1335 1.3 4.9<br />

relevance.20 1618 293226.9 119.2 361.1 783.1 1334 1.2 5.3<br />

size.1 1616 294566.6 262.4 611.1 2192.0 1323 61.6 3.1<br />

mostrecent.1 1600 302325.7 227.2 544.0 2102.0 1319 65.8 3.1<br />

mostrecent.2 1600 305267.5 206.9 500.7 2008.0 1323 37.0 2.8<br />

mostrecent.10 1569 326114.8 155.6 381.5 1683.0 1323 34.8 2.6<br />

relevance.30 1555 333292.2 98.4 255.6 686.2 1335 1.2 4.1<br />

size.30 1554 330743.5 124.0 359.9 786.2 1335 1.2 4.2<br />

minisat.1.1 1517 349391.3 112.9 278.1 1164.0 1326 8.0 2.1<br />

relevance.40 1501 360096.1 70.5 166.2 635.5 1335 1.1 3.3<br />

size.40 1498 354322.2 108.1 260.1 720.8 1334 1.1 3.9<br />

mostrecent.100 1475 386555.2 77.2 217.8 1002.0 1326 6.1 2.2<br />

minisat.201.501 1440 410767.3 60.8 173.3 810.8 1321 2.0 2.0<br />

minisat.201.1001 1439 411044.4 60.9 170.6 800.4 1321 2.0 2.0<br />

minisat.201.1 1438 410130.1 60.9 174.2 805.6 1321 2.0 2.1<br />

minisat.401.501 1419 431958.5 46.4 152.4 698.8 1319 1.8 1.9<br />

minisat.401.1001 1417 438939.3 45.6 146.5 676.0 1320 1.8 1.7<br />

minisat.401.1 1413 444863.3 43.8 143.5 660.1 1319 1.8 1.6<br />

relevance.100 1404 406542.4 31.4 99.2 564.3 1330 1.0 2.0<br />

size.100 1397 406529.6 40.5 110.5 581.3 1330 1.1 1.9<br />

minisat.601.1001 1373 500036.1 36.8 127.9 586.7 1319 1.6 1.4<br />

minisat.601.501 1371 502484.1 36.1 121.2 583.9 1318 1.5 1.4<br />

mostrecent.1000 1371 559058.3 31.6 106.3 566.1 1330 1.3 1.6<br />

minisat.601.1 1367 510004.5 35.8 126.0 581.4 1316 1.4 1.5<br />

minisat.1.1001 1344 440553.2 22.7 100.7 585.6 1322 3.0 0.9<br />

none.undefined 1343 440552.2 22.2 76.4 510.0 1343 1.0 1.0<br />

minisat.1.501 1343 442209.0 22.6 97.6 574.2 1321 3.0 0.9<br />

Table 3. Comparison of various strategies for forgetting constraints<br />

in a different time-space trade-off. In fact, the best strategy solves 298 more<br />

instances than unbounded learning in about 45 hours less runtime. However, it<br />

still trails stock Minion by 26 instances and about 8 hours of runtime. In spite<br />

of this, Figure 3(a) gives evidence that learning is still valuable and promising<br />

in specific cases. Each point is an instance, with the x-axis the runtime taken<br />

by stock Minion and the y-axis the ratio of stock runtime to rel.6 runtime; points above<br />

the line are speedups and points below are slowdowns. Whilst many instances<br />

are slowed down, speedups of up to 5 orders of magnitude are available on<br />

some types of problem. Apart from the best strategy, various parameters for<br />

relevance-bounded learning perform similarly to k = 6, as well as some size-bounded<br />

learning parameters. It seems clear that they are significantly better<br />

than unbounded learning, but not much different to each other.<br />

The minisat strategy is not effective for any choice of parameters that we<br />

tried. However there is reason to believe that a better implementation might<br />

improve matters. Notice that the increase in nodes for the better strategies<br />

(200.X) is relatively small. Using a profiler, we have discovered that the reason<br />

for slowness is the amount of time taken to maintain and process the scores, and<br />

to process the constraints periodically. Hence perhaps a better implementation<br />

would turn out to perform competitively overall.<br />

Now we will analyse the best forgetting strategy more carefully. Figure 3(b)<br />

depicts the speedup on each instance for relevance-bounded k = 6 compared to<br />

unbounded. It shows that most individual instances are speeded up, sometimes<br />

by two orders of magnitude, although a few are slowed down by up to an order<br />

of magnitude.<br />

In conclusion, whether to use learning remains a modelling decision, where<br />

big wins are sometimes available, but sometimes learning is better turned off.<br />

5 Conclusions<br />

In this paper, we have carried out the first detailed empirical study of the effectiveness<br />

and costs of individual constraints in a CDCL solver. We found that,<br />

typically, a very small minority of constraints contribute most of the propagation<br />

added by learning. While this is conventional wisdom, it has not previously<br />

been the subject of empirical study. It is important to verify and make precise<br />

folklore results, for until evidence exists and is published, folklore is unverifiable and<br />

acts as a barrier to entry for new researchers, who may not yet be aware of folk<br />

knowledge.<br />

Furthermore, these best constraints cost only a small fraction of the runtime<br />

cost. Conversely, constraints that do no effective propagation can incur significant<br />

time overheads. This contradicts conventional wisdom which suggests that<br />

watched literal propagators have lower overheads when not in use. This result<br />

shows why it is important to experiment on “known” results, because they are<br />

not always entirely correct.<br />

Together, these results explain why forgetting can work so well. It is obvious<br />

that forgetting is a positive necessity due to memory constraints, but this research<br />

shows that forgetting is not only necessary but also fortuitously effective<br />

because of the disparity in propagation between constraints.<br />

Finally, we performed an empirical survey of several simple techniques for<br />

forgetting constraints in g-learning, and found that they are extremely effective<br />

in making the learning solver more robust and efficient, contrary to some<br />

previously published evidence.<br />

References<br />

1. R. J. Bayardo and R. C. Schrag. Using CSP look-back techniques to solve real-world SAT instances, pages 203–208. AAAI Press, 1997.<br />

2. N. Beldiceanu, M. Carlsson, and J.-X. Rampon. Global constraint catalog. Technical Report 08, Swedish Institute of Computer Science, 2005.<br />

3. E. Ben-Sasson and J. Johannsen. Lower bounds for width-restricted clause learning on small width formulas. In SAT, volume 6175 of LNCS, pages 16–29, 2010.<br />

4. F. Boussemart, F. Hemery, C. Lecoutre, and L. Sais. Boosting systematic search by weighting constraints. In ECAI 04, pages 482–486, August 2004.<br />

5. R. Dechter. Enhancement schemes for constraint processing: backjumping, learning, and cutset decomposition. Artif. Intell., 41(3):273–312, 1990.<br />

6. N. Eén and N. Sörensson. An extensible SAT-solver. In E. Giunchiglia and A. Tacchella, editors, SAT, volume 2919 of LNCS, pages 502–518. Springer, 2003.<br />

7. T. Feydy and P. J. Stuckey. Lazy clause generation reengineered. In I. P. Gent, editor, CP, volume 5732 of LNCS, pages 352–366. Springer, 2009.<br />

8. D. Frost and R. Dechter. Dead-end driven learning. In AAAI-94, volume 1, pages 294–300. AAAI Press, 1994.<br />

9. I. Gent, I. Miguel, and N. Moore. Lazy explanations for constraint propagators. In PADL 2010, number 5937 in LNCS, January 2010.<br />

10. I. P. Gent, C. Jefferson, and I. Miguel. Minion: A fast scalable constraint solver. In ECAI, pages 98–102, 2006.<br />

11. M. L. Ginsberg. Dynamic backtracking. JAIR, 1:25–46, 1993.<br />

12. E. Goldberg and Y. Novikov. Berkmin: A fast and robust SAT-solver. Discrete Applied Mathematics, 155(12):1549–1561, 2007.<br />

13. Intel. IA-32 Intel Architecture Software Developer’s Manual Volume 1: Basic Architecture. Intel, Inc., 2000.<br />

14. J. Johannsen. An exponential lower bound for width-restricted clause learning. In O. Kullmann, editor, SAT, volume 5584 of LNCS, pages 128–140, 2010.<br />

15. G. Katsirelos. Nogood Processing in CSPs. PhD thesis, University of Toronto, Jan 2009. http://hdl.handle.net/1807/16737.<br />

16. G. Katsirelos and F. Bacchus. Unrestricted nogood recording in CSP search. In CP, pages 873–877, 2003.<br />

17. G. Katsirelos and F. Bacchus. Generalized nogoods in CSPs. In M. M. Veloso and S. Kambhampati, editors, AAAI, pages 390–396, 2005.<br />

18. J. P. Marques-Silva and K. A. Sakallah. GRASP: A new search algorithm for satisfiability. In International Conference on Computer-Aided Design, pages 220–227, November 1996.<br />

19. M. W. Moskewicz, C. F. Madigan, Y. Zhao, L. Zhang, and S. Malik. Chaff: Engineering an Efficient SAT Solver. In DAC 01, 2001.<br />

20. O. Ohrimenko, P. J. Stuckey, and M. Codish. Propagation via lazy clause generation. Constraints, 14(3):357–391, 2009.<br />

21. P. Prosser. Domain filtering can degrade intelligent backtracking search. In 13th International Joint Conference on Artificial Intelligence. Morgan Kaufmann, 1993.<br />

22. G. Richaud, H. Cambazard, and N. Jussien. Automata for nogood recording in constraint satisfaction problems. In CP06 Workshop on the Integration of SAT and CP techniques, 2006.<br />



Job Shop Scheduling with Routing Flexibility and<br />

Sequence Dependent Setup-Times<br />

Angelo Oddi 1, Riccardo Rasconi 1, Amedeo Cesta 1, and Stephen F. Smith 2<br />

1 Institute of Cognitive Science and Technology, CNR, Rome, Italy<br />

angelo.oddi,riccardo.rasconi,amedeo.cesta@istc.cnr.it<br />

2 Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA sfs@cs.cmu.edu<br />

Abstract. This paper presents a meta-heuristic algorithm for solving a job shop<br />

scheduling problem involving both sequence dependent setup-times and the possibility<br />

of selecting alternative routes among the available machines. The proposed<br />

strategy is a variant of the Iterative Flattening Search (IFS) schema. This<br />

work provides three separate results: (1) a constraint-based solving procedure<br />

that extends an existing approach for classical Job Shop Scheduling; (2) a new<br />

variable and value ordering heuristic based on temporal flexibility that takes into<br />

account both sequence dependent setup-times and flexibility in machine selection;<br />

(3) an original relaxation strategy based on the idea of randomly breaking<br />

the execution orders of the activities on the machines, with an activity selection<br />

criterion based on their proximity to the solution’s critical path. The efficacy of the<br />

overall heuristic optimization algorithm is demonstrated on a new benchmark set<br />

which is an extension of a well-known and difficult benchmark for the Flexible<br />

Job Shop Scheduling Problem.<br />

1 Introduction<br />

This paper describes an iterative improvement approach to solve job-shop scheduling<br />

problems involving both sequence dependent setup-times and the possibility of selecting<br />

alternative routes among the available machines. In recent years there has been<br />

an increasing interest in solving scheduling problems involving both setup-times and<br />

flexible shop environments [3, 2]. This fact stems mainly from the observation that in<br />

various real-world industry or service environments there are tremendous savings when<br />

setup times are explicitly considered in scheduling decisions. In addition, the possibility<br />

of selecting alternative routes among the available machines is motivated by interest in<br />

developing Flexible Manufacturing Systems (FMS) [25] able to use multiple machines<br />

to perform the same operation on a job’s part, as well as to absorb large-scale changes,<br />

in volume, capacity, or capability.<br />

The proposed problem, called in the rest of the paper Flexible Job Shop Scheduling<br />

Problem with Sequence Dependent Setup Times (SDST-FJSSP), is a generalization<br />

of the classical Job Shop Scheduling Problem (JSSP) where a given activity may be<br />

Proceedings of the 18th RCRA workshop on Experimental Evaluation of Algorithms for Solving<br />

Problems with Combinatorial Explosion (RCRA 2011).<br />

In conjunction with IJCAI 2011, Barcelona, Spain, July 17-18, 2011.<br />



processed on any one of a designated set of available machines and there are no setup-times.<br />

This problem is more difficult than the classical JSSP (which is itself NP-hard),<br />

since it is not just a sequencing problem; in addition to deciding how to sequence activities<br />

that require the same machine (involving sequence-dependent setup-times), it<br />

is also necessary to choose a routing policy, i.e., deciding which machine will process<br />

each activity. The objective remains that of minimizing makespan.<br />

Although this problem is often met in real manufacturing systems, not many papers<br />

take into account sequence dependent setup-times in flexible job-shop environments.<br />

On the other hand, a richer literature is available when setup-times and flexible job-shop<br />

environments are considered separately. In particular, on the side of setup-times<br />

a first reference work is [7], which relies on an earlier proposal presented in [6]. More<br />

recent works are [28] and [13], which propose effective heuristic procedures based on<br />

genetic algorithms and local search. In these works, the introduced local search procedures<br />

extend an approach originally proposed by [19] for the classical job-shop scheduling<br />

problem to the setup times case. A last noteworthy work is [5], which extends the<br />

well-known shifting bottleneck procedure [1] to the setup-time case. Both [5] and [28]<br />

have produced reference results on a previously studied benchmark set of JSSP with<br />

sequence dependent setup-times problems initially proposed by [7]. Regarding the Flexible<br />

Job Shop Scheduling Problem (FJSSP), an effective synthesis of the existing solving approaches is<br />

proposed in [14]. The core set of procedures which generate the best results include the<br />

genetic algorithm (GA) proposed in [10], the tabu search (TS) approach of [16] and the<br />

discrepancy-based method, called climbing depth-bound discrepancy search (CDDS),<br />

defined in [14]. Among the papers dealing with both sequence dependent setup times<br />

and flexible shop environments there is the work [23], which considers a shop type<br />

composed of pools of identical machines as well as two types of setup times: one modeling<br />

the transportation times between different machines (sequence dependent) and the<br />

other one modeling the required reconfiguration times (not sequence dependent) on the<br />

machines. The other work that deals with sequence dependent setup times and routing<br />

flexibility is [24], which considers a flow-shop environment with multi-purpose machines<br />

such that each stage of a job can be processed by a set of unrelated machines<br />

(the processing times of the jobs depend on the machine they are assigned to). [26] considers<br />

a problem similar to the previous one, where the jobs are composed of a single<br />

step, but setup-times are both sequence and machine dependent. Finally, [27] considers<br />

a job-shop problem with parallel identical machines, release times and due dates but<br />

sequence independent setup-times.<br />

This paper focuses on a family of solving techniques referred to as Iterative Flattening<br />

Search (IFS). IFS was first introduced in [8] as a scalable procedure for solving<br />

multi-capacity scheduling problems. IFS is an iterative improvement heuristic designed<br />

to minimize schedule makespan. Given an initial solution, IFS iteratively applies two steps:<br />

(1) a subset of solving decisions are randomly retracted from a current solution<br />

(relaxation-step); (2) a new solution is then incrementally recomputed (flattening-step).<br />

Extensions to the original IFS procedure were made in two subsequent works [17, 12]<br />

and more recently [20] performed a systematic study aimed at evaluating the effectiveness<br />

of single component strategies within the same uniform software framework.<br />
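
In outline, the IFS schema is the following loop (a generic sketch under assumed interfaces, not the authors' implementation; relax and flatten stand for the two steps just described):<br />

```cpp
#include <functional>

// Generic Iterative Flattening Search skeleton. Solution is left abstract:
// relax retracts a random subset of solving decisions from the incumbent,
// flatten incrementally recomputes a complete solution, and makespan
// evaluates it. Only improving candidates replace the incumbent.
template <typename Solution>
Solution iterative_flattening(
    Solution best, int max_iterations,
    std::function<Solution(const Solution&)> relax,
    std::function<Solution(const Solution&)> flatten,
    std::function<int(const Solution&)> makespan) {
    for (int it = 0; it < max_iterations; ++it) {
        Solution candidate = flatten(relax(best));  // the two-step move
        if (makespan(candidate) < makespan(best))   // keep only improvements
            best = candidate;
    }
    return best;
}
```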

The IFS variant that we propose relies at its core on a constraint-based solver. This<br />



procedure is an extension of the SP-PCP procedure proposed in [21]. SP-PCP generates<br />

consistent orderings of activities requiring the same resource by imposing precedence<br />

constraints on a temporally feasible solution, using variable and value ordering heuristics<br />

that discriminate on the basis of temporal flexibility to guide the search. We extend<br />

both the procedure and these heuristics to take into account both sequence dependent<br />

setup-times and flexibility in machine selection. To provide a basis for embedding this<br />

core solver within an IFS optimization framework, we also specify an original relaxation<br />

strategy based on the idea of randomly breaking the execution orders of the activities on<br />

the machines with an activity selection criterion based on their proximity to the solution’s

critical path.<br />

The paper is organized as follows. Section 2 defines the SDST-FJSSP problem<br />

and Section 3 introduces a CSP representation. Section 4 describes the core constraint-based

search procedure while Section 5 introduces details of the IFS meta-heuristics.<br />

An experimental section (Section 6) describes the performance of our algorithm on a<br />

set of benchmark problems, and explains the most interesting results. Some conclusions<br />

end the paper.<br />

2 The Scheduling Problem<br />

The SDST-FJSSP entails synchronizing the use of a set of machines (or resources)<br />

R = {r1,...,rm} to perform a set of n activities A = {a1,...,an} over time. The set<br />

of activities is partitioned into a set of nj jobs J = {J1,...,Jnj}. The processing of a<br />

job Jk requires the execution of a strict sequence of nk activities ai ∈ Jk and cannot be<br />

modified. All jobs are released at time 0. Each activity ai requires, for its entire duration, the exclusive use of a single resource ri chosen among a set of available resources Ri ⊆

R. No preemption is allowed. Each machine is available at time 0 and can process more<br />

than one operation of a given job Jk (recirculation is allowed). The processing time pir<br />

of each activity ai depends on the selected machine r ∈ Ri, such that ei − si = pir,<br />

where the variables si and ei represent the start and end time of ai. Moreover, for each<br />

resource r, the value st^r_ij represents the setup time between two generic activities ai and aj (aj scheduled immediately after ai) requiring the same resource r, such that ei + st^r_ij ≤ sj. As is traditionally assumed in the literature, the setup times st^r_ij satisfy the so-called triangle inequality (see [7, 4]): for any three activities ai, aj, ak requiring the same resource, the inequality st^r_ij ≤ st^r_ik + st^r_kj

holds. A solution S = {(s1, r1), (s2, r2),...,(sn, rn)} is a set of pairs (si, ri), where<br />

si is the assigned start time of ai, ri is the selected resource for ai and all the above<br />

constraints are satisfied. Let Ck be the completion time of job Jk; the makespan is the value Cmax = max_{1≤k≤nj} Ck. An optimal solution S* is a solution S with the

minimum value of Cmax. The SDST-FJSSP is NP-hard since it is an extension of the<br />

JSSP [11].
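To make the definition above concrete, the following sketch checks a candidate solution against the SDST-FJSSP constraints and returns its makespan. The data layout (dictionaries keyed by activity id) and the function name are our own illustrative choices, not part of the paper.

```python
# Illustrative SDST-FJSSP solution checker (names and layout are ours).
# A solution assigns each activity a start time and a machine; we verify job
# precedences, machine capacity with sequence-dependent setup separation,
# and return the makespan, or None if some constraint is violated.

def makespan_if_feasible(jobs, proc, setup, starts, assign):
    """jobs: list of job -> ordered list of activity ids;
    proc[a][r]: processing time of activity a on machine r;
    setup[r][(a1, a2)]: setup time on r when a2 runs right after a1;
    starts[a], assign[a]: chosen start time and machine of activity a."""
    end = {a: starts[a] + proc[a][assign[a]] for j in jobs for a in j}
    # job precedence: each activity starts after its predecessor ends
    for j in jobs:
        for a1, a2 in zip(j, j[1:]):
            if starts[a2] < end[a1]:
                return None
    # machine capacity + setup times: check consecutive pairs on each machine
    by_machine = {}
    for j in jobs:
        for a in j:
            by_machine.setdefault(assign[a], []).append(a)
    for r, acts in by_machine.items():
        acts.sort(key=lambda a: starts[a])
        for a1, a2 in zip(acts, acts[1:]):
            if end[a1] + setup[r][(a1, a2)] > starts[a2]:
                return None
    return max(end.values())
```

For instance, two single-activity jobs on one machine with a setup time of 1 between them are feasible only if the second job starts at least one time unit after the first one ends.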

3 A CSP Representation<br />

There are different ways to model the problem as a Constraint Satisfaction Problem<br />

(CSP) [18]; here we use an approach similar to [21]. In particular, we focus on assigning<br />



IFS(S, MaxFail, γ)
begin
1. Sbest ← S
2. counter ← 0
3. while (counter ≤ MaxFail) do
4.   RELAX(S, γ)
5.   S ← PCP(S, Cmax(Sbest))
6.   if Cmax(S) < Cmax(Sbest) then
7.     Sbest ← S; counter ← 0
8.   else counter ← counter + 1
9. return (Sbest)
end


path set). As is well known, an activity ai belongs to the critical path (i.e., meets the critical

path condition) when, given ai’s end time ei and its feasibility interval [lbi,ubi], the<br />

condition lbi = ubi holds. For each activity ai, the smaller the difference ubi − lbi<br />

computed on ei, the closer ai is to the critical path condition. At each IFS iteration,

the critical path set is built so as to contain any activity ai with a probability directly<br />

proportional to the γ parameter and inversely proportional to the ubi − lbi value. For<br />

obvious reasons, the critical path-biased relaxation entails a smaller disruption of the

solution S, as it operates on a smaller set of activities; the activities that are farther<br />

from the critical path condition have only a minimal probability of being selected. As

explained in the following section, this difference has important consequences on the<br />

experimental behavior.<br />
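The contrast between the two relaxation strategies can be sketched as follows; the function name, the uniform-random baseline, and the particular way slack scales the selection probability are our own simplifications of the scheme described above, not the paper's exact formula.

```python
import random

# Sketch of the two relaxation selections (our naming, simplified).
# Random relaxation retracts each activity's ordering decisions with
# probability γ; the critical-path-biased variant keeps γ as a ceiling and
# scales each activity's probability down as its end-time slack ub_i - lb_i
# grows, so activities on (or near) the critical path are preferred.

def select_for_relaxation(activities, gamma, slack=None):
    """activities: iterable of ids; slack: optional dict id -> ub_i - lb_i."""
    if slack is None:                      # purely random relaxation
        return [a for a in activities if random.random() < gamma]
    max_slack = max(slack.values()) or 1   # guard against all-zero slack
    chosen = []
    for a in activities:
        # slack 0 (critical path condition met) -> probability gamma;
        # maximal slack -> probability close to 0
        p = gamma * (1.0 - slack[a] / (max_slack + 1))
        if random.random() < p:
            chosen.append(a)
    return chosen
```

Under this sketch, activities far from the critical path are rarely retracted, which matches the smaller-disruption behavior discussed above.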

6 Experimental Analysis<br />

The empirical evaluation has been carried out on an SDST-FJSSP benchmark set purposely synthesized from the first 20 instances of the edata subset of the FJSSP HUdata

testbed from [15], and will therefore be referred to as SDST-HUdata. Each one<br />

of the SDST-HUdata instances has been created by adding to the original HUdata instance<br />

one Setup-Time matrix st^r of size (nJ × nJ) for each machine r, where nJ is the number of jobs. Without loss of generality, the same randomly generated

Setup-Time matrix was added for each machine of all the benchmark instances. Each<br />

value st^r_ij in the Setup-Time matrix models the setup time necessary to reconfigure the

r-th machine to switch from job i to job j. Note that machine reconfiguration times are<br />

sequence dependent: setting up a machine to process a product of type j after processing<br />

a product of type i can generally take a different amount of time than setting up the<br />

same machine for the opposite transition. The elements st^r_ij of the Setup-Time matrix satisfy the triangle inequality [7, 4]: for any three activities ai, aj, ak requiring the same machine, st^r_ij ≤ st^r_ik + st^r_kj holds. The 20 instances taken from

HUdata (namely, the instances la01-la20) are divided into four groups of five (nJ × nA)

instances each, where nJ is the number of jobs and nA is the number of activities per<br />

job for each instance. More precisely, group la01-la05 is (10 × 5), group la06-la10 is<br />

(15×5), group la11-la15 is (20×5), and group la16-la20 is (10×10). In all instances,

the processing times on machines assignable to the same activity are identical, as in the<br />

original HUdata set. The algorithm used for these experiments has been implemented<br />

in Java and run on an AMD Phenom II X4 Quad 3.5 GHz machine under Linux Ubuntu 10.4.1.
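The paper does not spell out how the random Setup-Time matrices were generated. One simple way to obtain an asymmetric matrix that is guaranteed to satisfy the triangle inequality is to draw random entries and then take their min-plus (shortest-path) closure, as in the following sketch; this is our own construction, offered purely as an illustration.

```python
import random

# Synthesize a sequence-dependent setup-time matrix satisfying the triangle
# inequality: draw random times, then apply a Floyd-Warshall style min-plus
# closure, which forces st[i][j] <= st[i][k] + st[k][j] for all i, k, j
# while keeping the matrix asymmetric in general.

def random_setup_matrix(n_jobs, lo=1, hi=20, seed=None):
    rng = random.Random(seed)
    st = [[0 if i == j else rng.randint(lo, hi) for j in range(n_jobs)]
          for i in range(n_jobs)]
    for k in range(n_jobs):               # min-plus closure
        for i in range(n_jobs):
            for j in range(n_jobs):
                st[i][j] = min(st[i][j], st[i][k] + st[k][j])
    return st
```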

Results. Tables 1 and 2 show the results obtained by running our algorithm on the

SDST-HUdata set using the Random or Slack-based procedure in the IFS relaxation step,<br />

respectively. Both tables are composed of 10 columns and 23 rows (one row per problem<br />

instance plus three data wrap-up rows). The best column lists the shortest makespans<br />

obtained in the experiments for each instance; underlined values represent the best values<br />

obtained from both tables (global bests). The columns labeled γ =0.2 to γ =0.9<br />

(see Section 4) contain the results obtained running the IFS procedure with a different<br />

value for the relaxing factor γ. For each problem instance (i.e., for each row) the values<br />

in bold indicate the best makespan found among all the tested γ values (γ runs).<br />



Table 1. Results with random selection procedure<br />

inst. best γ<br />

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9<br />

la01 726 772 731 728 726 729 726 729 740<br />

la02 749 785 785 749 749 749 749 749 768<br />

la03 652 677 658 658 658 652 652 658 675<br />

la04 673 673 673 673 689 689 680 680 690<br />

la05 603 613 613 603 605 605 606 607 632<br />

la06 950 965 950 954 954 971 997 995 1020<br />

la07 916 946 916 925 919 947 950 987 1000<br />

la08 954 973 961 964 954 963 958 1000 1001<br />

la09 1002 1039 1002 1039 1020 1042 1020 1045 1068<br />

la10 977 1017 977 1022 977 1027 1008 1042 1048<br />

la11 1265 1265 1312 1285 1282 1345 1332 1372 1368
la12 1088 1088 1114 1130 1167 1165 1199 1209 1198

la13 1255 1255 1255 1255 1300 1280 1300 1316 1315<br />

la14 1292 1292 1315 1344 1346 1362 1351 1345 1372<br />

la15 1298 1298 1302 1338 1355 1352 1367 1388 1429<br />

la16 1012 1028 1012 1012 1012 1012 1012 1012 1023<br />

la17 864 881 885 885 864 888 864 864 902<br />

la18 985 1021 1007 1029 999 985 985 985 985<br />

la19 956 1006 992 975 956 956 978 959 981<br />

la20 997 1008 1010 997 997 997 997 997 999<br />

B (N) 12 6(1) 7(5) 6(4) 8(5) 6(5) 7(5) 5(3) 1(1)<br />

Av.C. 20149 17579 14767 11215 10950 9530 7782 7588

Av.MRE 19.34 18.29 18.66 18.37 19.42 19.43 20.60 22.44<br />

For each γ run, the last three rows of both tables show, respectively (top to bottom):

(1) the number B of best solutions found locally (i.e., within the current table) and,<br />

underlined within round brackets, the number N of best solutions found globally (i.e.,<br />

between both tables); (2) the average number of utilized solving cycles (Av.C.), and<br />

(3) the average mean relative error (Av.MRE) 6 with respect to the lower bounds of<br />

the original HUdata set (i.e., without setup times), reported in [16]. For all runs, a<br />

maximum CPU time limit was set to 800 seconds.<br />

One significant result that the tables show is the difference in the average number of utilized

solving cycles (Av.C. row) between the random and the slack-based relaxation<br />

procedure. In fact, it can be observed that on average the slack-based approach uses<br />

more solving cycles in the same allotted time than its random counterpart (i.e., the<br />

slack-based relaxation heuristic is faster in the solving process). This is explained by<br />

observing that the slack-based relaxation heuristic entails a less severe disruption of the<br />

current solution at each solving cycle compared to the random heuristic, as the former<br />

generally relaxes a lower number of activities (given the same γ value). The lower the<br />

disruption level of the current solution in the relaxation step, the easier it is to re-gain<br />

solution feasibility in the flattening step. In addition to this efficiency advantage, the slack-based relaxation approach also provides the extra effectiveness that derives from operating

in the vicinity of the critical path of the solution, as demonstrated in [8].<br />

The good performance exhibited by the slack-based heuristic can be also observed<br />

by inspecting the B(N) rows in both tables. Clearly, the slack-based approach finds a<br />

6 The individual MRE of each solution is computed as follows: MRE = 100 × (Cmax −<br />

LB)/LB, where Cmax is the solution makespan and LB is the instance’s lower bound.
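As a quick worked instance of this formula (with made-up numbers, not values from the tables): a makespan of 600 against a lower bound of 500 gives an MRE of 20%.

```python
# MRE formula from the footnote; the sample values are illustrative only.
def mre(cmax, lb):
    return 100.0 * (cmax - lb) / lb

print(mre(600, 500))  # 20.0
```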



Table 2. Results with slack-based selection procedure<br />

inst. best γ<br />

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9<br />

la01 726 739 736 726 726 726 726 726 726<br />

la02 749 785 749 749 749 749 749 749 749<br />

la03 652 658 658 658 658 658 652 658 658<br />

la04 673 686 686 686 673 686 680 673 680<br />

la05 603 613 603 613 605 603 604 603 605<br />

la06 960 963 963 971 960 963 962 970 970<br />

la07 925 941 966 941 925 931 946 972 1000<br />

la08 948 983 963 948 964 993 967 994 973<br />

la09 1002 1020 1020 1002 1002 1040 1069 1052 1042<br />

la10 985 993 991 1007 1022 1022 1017 985 1024<br />

la11 1256 1256 1257 1295 1295 1308 1318 1324 1332
la12 1082 1082 1097 1098 1159 1152 1188 1163 1207
la13 1215 1222 1240 1240 1223 1215 1311 1301 1311
la14 1285 1308 1285 1285 1311 1295 1335 1372 1345
la15 1291 1333 1291 1330 1302 1311 1383 1389 1412

la16 1007 1012 1012 1012 1007 1012 1012 1012 1012<br />

la17 858 889 868 893 895 888 858 859 872<br />

la18 985 1019 1025 1021 1007 985 985 985 985<br />

la19 956 1006 976 987 984 956 980 956 959<br />

la20 997 997 1033 997 997 997 1003 997 997<br />

B (N) 17 3(3) 4(4) 5(5) 8(6) 7(7) 5(5) 8(7) 4(4)<br />

Av.C. 21273 18068 15503 13007 10643 10653 8639 8575<br />

Av.MRE 18.67 18.09 18.26 18.19 18.14 19.58 19.44 20.16<br />

higher number of best solutions (17 against 12), which is confirmed by comparing the<br />

number of locally found bests (B) with the global ones (N), for each γ value, and for<br />

both heuristics.<br />

Another interesting aspect emerges from analyzing the range of γ values where the best performance is obtained (Av.MRE row). Inspecting the Av.MRE values, the

following can in fact be stated: (1) the slack-based heuristic finds solutions of higher<br />

quality w.r.t. the random heuristic over the complete γ variability range; (2) in the random<br />

case, the best results are obtained in the [0.3, 0.5] γ range, while in the slack-based<br />

case the best γ range is wider ([0.3, 0.6]).<br />

7 Conclusions<br />

In this paper we have proposed the use of Iterative Flattening Search (IFS) as a means of<br />

effectively solving the SDST-FJSSP. The proposed algorithm uses as its core solving<br />

procedure an extended version of the SP-PCP procedure proposed by [21] and a new<br />

relaxation strategy targeted to the case of SDST-FJSSP. The effectiveness of the procedure<br />

was demonstrated on 20 modified instances of the edata subset of the FJSSP<br />

HUdata testbed from [15], a well known and difficult Flexible Job Shop Scheduling<br />

benchmark set. In particular, we show that the new slack-based relaxation strategy exhibits

better performance than the random selection one. Further improvement of the<br />

current algorithm may be possible by incorporating additional heuristic information<br />

and search mechanisms. One of the next steps will be the collection of the benchmarks proposed in the cited works [23, 24, 26, 27]; although none of the problems proposed



in these papers coincides exactly with the SDST-FJSSP, they can all be seen as slight variations of it; hence, the proposed IFS procedure can be adapted to solve an interesting and large class of flexible manufacturing scheduling problems. This will be the focus of our future work, together with the realization of a web repository collecting all the relevant benchmark sets.

Acknowledgments<br />

CNR authors are partially supported by EU under the ULISSE project (Contract FP7.218815),<br />

and MIUR under the PRIN project 20089M932N (funds 2008).<br />

References<br />

1. J. Adams, E. Balas, and D. Zawack. The shifting bottleneck procedure for job shop scheduling.<br />

Management Science, 34(3):391–401, 1988.<br />

2. A. Allahverdi, C. Ng, T. Cheng, and M. Kovalyov. A survey of scheduling problems with<br />

setup times or costs. European Journal of Operational Research, 187(3):985–1032, 2008.<br />

3. A. Allahverdi and H. Soroush. The significance of reducing setup times/setup costs. European<br />

Journal of Operational Research, 187(3):978–984, 2008.<br />

4. C. Artigues and D. Feillet. A branch and bound method for the job-shop problem with<br />

sequence-dependent setup times. Annals OR, 159(1):135–159, 2008.<br />

5. E. Balas, N. Simonetti, and A. Vazacopoulos. Job shop scheduling with setup times, deadlines<br />

and precedence constraints. Journal of Scheduling, 11(4):253–262, 2008.

6. P. Brucker, B. Jurisch, and B. Sievers. A branch and bound algorithm for the job-shop<br />

scheduling problem. Discrete Applied Mathematics, 49(1-3):107–127, 1994.<br />

7. P. Brucker and O. Thiele. A branch & bound method for the general-shop problem with<br />

sequence dependent setup-times. OR Spectrum, 18(3):145–161, 1996.<br />

8. A. Cesta, A. Oddi, and S. F. Smith. Iterative Flattening: A Scalable Method for Solving<br />

Multi-Capacity Scheduling Problems. In AAAI/IAAI, 17th National Conference on Artificial

Intelligence, pages 742–747, 2000.<br />

9. R. Dechter, I. Meiri, and J. Pearl. Temporal constraint networks. Artificial Intelligence,<br />

49:61–95, 1991.<br />

10. J. Gao, L. Sun, and M. Gen. A hybrid genetic and variable neighborhood descent algorithm<br />

for flexible job shop scheduling problems. Computers & Operations Research, 35:2892–<br />

2907, 2008.<br />

11. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of

NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1979.<br />

12. D. Godard, P. Laborie, and W. Nuitjen. Randomized Large Neighborhood Search for Cumulative<br />

Scheduling. In Proceedings of ICAPS-05, pages 81–89, 2005.

13. M. A. González, C. R. Vela, and R. Varela. A Tabu Search Algorithm to Minimize Lateness<br />

in Scheduling Problems with Setup Times. In Proceedings of the CAEPIA-TTIA 2009 13th

Conference of the Spanish Association on Artificial Intelligence, 2009.<br />

14. A. B. Hmida, M. Haouari, M.-J. Huguet, and P. Lopez. Discrepancy search for the flexible<br />

job shop scheduling problem. Computers & Operations Research, 37:2192–2201, 2010.<br />

15. J. Hurink, B. Jurisch, and M. Thole. Tabu search for the job-shop scheduling problem with<br />

multi-purpose machines. OR Spectrum, 15(4):205–215, February 1994.<br />

16. M. Mastrolilli and L. M. Gambardella. Effective neighbourhood functions for the flexible<br />

job shop problem. Journal of Scheduling, 3:3–20, 2000.<br />



17. L. Michel and P. Van Hentenryck. Iterative Relaxations for Iterative Flattening in Cumulative<br />

Scheduling. In Proceedings of ICAPS-04, pages 200–208, 2004.

18. U. Montanari. Networks of Constraints: Fundamental Properties and Applications to Picture<br />

Processing. Information Sciences, 7:95–132, 1974.<br />

19. E. Nowicki and C. Smutnicki. An advanced tabu search algorithm for the job shop problem.<br />

Journal of Scheduling, 8(2):145–159, 2005.<br />

20. A. Oddi, A. Cesta, N. Policella, and S. F. Smith. Iterative flattening search for resource<br />

constrained scheduling. J. Intelligent Manufacturing, 21(1):17–30, 2010.<br />

21. A. Oddi and S. Smith. Stochastic Procedures for Generating Feasible Schedules. In Proceedings

14th National Conference on AI (AAAI-97), pages 308–314, 1997.<br />

22. N. Policella, A. Cesta, A. Oddi, and S. Smith. From Precedence Constraint Posting to Partial<br />

Order Schedules. AI Communications, 20(3):163–180, 2007.<br />

23. A. Rossi and G. Dini. Flexible job-shop scheduling with routing flexibility and separable<br />

setup times using ant colony optimisation method. Robotics and Computer-Integrated Manufacturing,<br />

23(5):503–516, 2007.<br />

24. R. Ruiz and C. Maroto. A genetic algorithm for hybrid flowshops with sequence dependent<br />

setup times and machine eligibility. European Journal of Operational Research, 169(3):781<br />

– 800, 2006.<br />

25. A. K. Sethi and S. P. Sethi. Flexibility in manufacturing: A survey. International Journal of<br />

Flexible Manufacturing Systems, 2:289–328, 1990. 10.1007/BF00186471.<br />

26. E. Vallada and R. Ruiz. A genetic algorithm for the unrelated parallel machine scheduling<br />

problem with sequence dependent setup times. European Journal of Operational Research,<br />

211(3):612–622, 2011.

27. V. Valls, M. A. Perez, and M. S. Quintanilla. A tabu search approach to machine scheduling.<br />

European Journal of Operational Research, 106(2-3):277 – 300, 1998.<br />

28. C. R. Vela, R. Varela, and M. A. González. Local search and genetic algorithm for the job<br />

shop scheduling problem with sequence dependent setup times. Journal of Heuristics, 2009.<br />



Automatic Generation of Efficient Domain-Optimized<br />

Planners from Generic Parametrized Planners<br />

Mauro Vallati¹, Chris Fawcett², Alfonso E. Gerevini¹,
Holger H. Hoos², and Alessandro Saetti¹

¹ Dipartimento di Ingegneria dell’Informazione

Università di Brescia, Italy<br />

{mauro.vallati,gerevini,saetti}@ing.unibs.it<br />

² Computer Science Department

University of British Columbia, Canada<br />

{fawcettc,hoos}@cs.ubc.ca<br />

Abstract. When designing state-of-the-art, domain-independent planning systems,<br />

many decisions have to be made with respect to the domain analysis or<br />

compilation performed during preprocessing, the heuristic functions used during<br />

search, and other features of the search algorithm. These design decisions can<br />

have a large impact on the performance of the resulting planner. By providing<br />

many alternatives for these choices and exposing them as parameters, planning<br />

systems can in principle be configured to work well on different domains. However,<br />

usually planners are used in default configurations that have been chosen because<br />

of their good average performance over a set of benchmark domains, with<br />

limited experimentation of the potentially huge range of possible configurations.<br />

In this work, we propose a general framework for automatically configuring a parameterized<br />

planner, showing that substantial performance gains can be achieved.<br />

We apply the framework to the well-known LPG planner, which has 62 parameters<br />

and over 6.5 × 10^17 possible configurations. We demonstrate that by using

this highly parameterized planning system in combination with the off-the-shelf,<br />

state-of-the-art automatic algorithm configuration procedure ParamILS, the planner<br />

can be specialized, obtaining significantly improved performance.

Introduction<br />

When designing state-of-the-art, domain-independent planning systems, many decisions<br />

have to be made with respect to the domain analysis or compilation performed<br />

during preprocessing, the heuristic functions used during search, and several other features<br />

of the search algorithm. These design decisions can have a large impact on the<br />

performance of the resulting planner. By providing many alternatives for these choices<br />

and exposing them as parameters, highly flexible domain-independent planning systems<br />

are obtained, which then, in principle, can be configured to work well on different<br />

domains, by using parameter settings specifically chosen for solving planning problems<br />

Proceedings of the 18th RCRA workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion (RCRA 2011).
In conjunction with IJCAI 2011, Barcelona, Spain, July 17-18, 2011.



from each given domain. However, usually such planners are used with default configurations<br />

that have been chosen because of their good average performance over a set<br />

of benchmark domains, based on limited exploration within a potentially vast space of<br />

possible configurations. The hope is that these default configurations will also perform<br />

well on domains and problems beyond those for which they were tested at design time.<br />

In this work, we advocate a different approach, based on the idea of automatically<br />

configuring a generic, parameterized planner using a set of training planning problems<br />

in order to obtain planners that perform especially well in the domains of these training<br />

problems. Automated configuration of heuristic algorithms has been an area of intense<br />

research focus in recent years, producing tools that have improved algorithm performance<br />

substantially in many problem domains. To our knowledge, however, these techniques<br />

have not yet been applied to the problem of planning.<br />

While our approach could in principle utilize any sufficiently powerful automatic<br />

configuration procedure, we have chosen the FocusedILS variant of the off-the-shelf,<br />

state-of-the-art automatic algorithm configuration procedure ParamILS [8]. At the core<br />

of the ParamILS framework lies Iterated Local Search (ILS), a well-known and versatile<br />

stochastic local search method that iteratively performs phases of a simple local search,<br />

such as iterative improvement, interspersed with so-called perturbation phases that are<br />

used to escape from local optima. The FocusedILS variant of ParamILS uses this ILS<br />

procedure to search for high-performance configurations of a given algorithm by evaluating<br />

promising configurations, using an increasing number of runs in order to avoid<br />

wasting CPU-time on poorly-performing configurations. ParamILS also avoids wasting<br />

CPU-time on low-performance configurations by adaptively limiting the amount of<br />

runtime allocated to each algorithm run using knowledge of the best-performing configuration<br />

found so far.<br />
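The iterated-local-search scheme underlying ParamILS can be illustrated with a minimal skeleton over a discrete configuration space. Real ParamILS/FocusedILS adds adaptive run counts and runtime capping, so everything below (names, neighbourhood, perturbation) is a simplified sketch of the general idea, not the actual tool.

```python
import random

# Minimal iterated local search over a discrete parameter space, in the
# spirit of the ILS outline above: iterative improvement interspersed with
# random perturbation phases that escape local optima.

def ils_configure(space, cost, iters=100, perturb_strength=3, seed=0):
    """space: dict param -> list of values; cost: config dict -> float."""
    rng = random.Random(seed)

    def neighbors(cfg):                    # one-exchange neighbourhood
        for p, vals in space.items():
            for v in vals:
                if v != cfg[p]:
                    yield {**cfg, p: v}

    def local_search(cfg):                 # simple first-improvement descent
        improved = True
        while improved:
            improved = False
            for nb in neighbors(cfg):
                if cost(nb) < cost(cfg):
                    cfg, improved = nb, True
                    break
        return cfg

    best = local_search({p: vals[0] for p, vals in space.items()})
    for _ in range(iters):
        cand = dict(best)                  # perturbation phase
        for p in rng.sample(list(space), min(perturb_strength, len(space))):
            cand[p] = rng.choice(space[p])
        cand = local_search(cand)
        if cost(cand) < cost(best):
            best = cand
    return best
```

On a toy separable cost function the descent alone already reaches the optimum; the perturbation phases matter on rugged configuration landscapes like those of real planners.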

ParamILS has previously been applied to configure state-of-the-art solvers for SAT<br />

[7] and mixed integer programming (MIP) [9]. This resulted in a version of the SAT<br />

solver Spear that won the first prize in one category of the 2007 Satisfiability Modulo<br />

Theories Competition [7]; it further contributed to the SATzilla solvers that won prizes<br />

in 5 categories of the 2009 SAT Competition and led to large improvements in the<br />

performance of CPLEX on several types of MIP problems [9]. Unlike in SAT

and MIP, in planning, explicit domain specifications are available through a planning<br />

language, which creates more opportunities for planners to take problem structure into<br />

account in parameterized components (e.g., specific search heuristics). This can lead to<br />

more complex systems, with greater opportunities for automatic parameter configuration,<br />

but also greater challenges (bigger, richer design spaces can be expected to give<br />

rise to trickier configuration problems).<br />

One such planning system is LPG (e.g., [3, 4]). Based on a stochastic local search<br />

procedure, LPG is a well-known efficient and versatile planner with many components<br />

that can be configured very flexibly via 62 exposed configurable parameters, which<br />

jointly give rise to over 6.5 × 10^17 possible configurations. The default settings of these

parameters have been chosen to allow the system to work well on a broad range of<br />

domains. In this work, we used ParamILS to automatically configure LPG on various<br />

propositional domains; LPG’s configuration space is one of the largest considered so<br />

far in applications of ParamILS.<br />
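The size of such a configuration space is simply the product of the per-parameter domain sizes. The sketch below shows the computation on an invented mix of 62 domain sizes; these are not LPG's actual parameter domains, which jointly yield the ~6.5 × 10^17 figure quoted above.

```python
from math import prod

# Size of a discrete configuration space = product of per-parameter domain
# sizes. The mix below (Boolean, small categorical, numeric-choice domains)
# is invented purely to illustrate the computation.

def config_space_size(domain_sizes):
    return prod(domain_sizes)

sizes = [2] * 30 + [3] * 12 + [4] * 10 + [10] * 10   # 62 illustrative domains
print(len(sizes), config_space_size(sizes))
```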



We tested our approach using ParamILS and LPG on 11 domains of planning problems

used in previous international planning competitions (IPC-3–6). Our results demonstrate<br />

that by using automatically determined, domain-optimized configurations (LPG.sd),<br />

substantial performance gains can be achieved compared to the default configuration<br />

(LPG.d). Using the same automatic configuration approach to optimize the performance<br />

of LPG on a merged set of benchmark instances from different domains also results in<br />

improvements over the default, but these are less pronounced than those obtained by<br />

automated configuration for single domains.<br />

We also investigated to which extent the domain-optimized planners obtained by<br />

configuring the general-purpose LPG planner perform well compared to other state-of-the-art

domain-independent planners. Our results indicate that, for the class of domains<br />

considered in our analysis, LPG.sd is significantly faster than LAMA [10], the top-performing

propositional planner of the last planning competition (IPC-6). 3<br />

Moreover, in order to understand how well our approach works compared to state-of-the-art

systems in automated planning with learning, we have experimentally<br />

compared LPG.sd with the planners of the learning track of IPC-6, showing that in<br />

terms of speed and usefulness of the learned knowledge our system outperforms the<br />

respective IPC-6 winners PbP.s [5] and ObtuseWedge [11].

While in this work we focus on the application of the proposed framework to the

LPG planner, we believe that similarly good results can be obtained for highly parameterized<br />

versions of other existing planning systems. In general, our results suggest that<br />

in the future development of efficient planning systems, it is worth including many<br />

different variants and a wide range of settings for the various components, instead of<br />

committing at design time to particular choices and settings, and using automated procedures

for finding configurations of the resulting highly parameterized planning systems<br />

that perform well on the problems arising in a specific application domain under<br />

consideration.<br />

In the rest of this paper, we first provide some background and further information<br />

on LPG and its parameters. Next, we describe in detail our experimental analysis and<br />

results, followed by concluding remarks and a discussion of some avenues for future<br />

work.<br />

The Generic Parameterized Planner LPG<br />

In this section, we provide a very brief description of LPG and its parameters. LPG<br />

is a versatile system that can be used for plan generation, plan repair and incremental<br />

planning in PDDL2.2 domains [6]. The planner is based on a stochastic local search procedure<br />

that explores a space of partial plans represented through linear action graphs,<br />

which are variants of the very well-known planning graph [1].<br />

Starting from the initial action graph containing only two special actions representing<br />

the problem initial state and goals, respectively, LPG iteratively modifies the<br />

3 The version of LAMA used in the competition has only four Boolean parameters exposed,<br />

which its authors recommend to leave unchanged; it is therefore not suitable for studying automatic<br />

parameter configuration. A newer, much more flexibly configurable version of LAMA<br />

has become available very recently, as part of the Fast Downward system, which we are studying<br />

in ongoing work.<br />



Domain Configuration P1 P2 P3 P4 P5 P6 P7 Total<br />

Blocksworld 1 1 2 1 5 1 2 13<br />

Depots 2 2 1 1 2 2 2 12<br />

Gold-miner 2 3 0 1 4 2 1 13<br />

Matching-BW 1 2 2 1 3 0 2 11
N-Puzzle 4 5 3 2 14 5 2 35
Rovers 0 1 0 0 0 2 1 4
Satellite 2 7 3 1 11 5 3 32
Sokoban 0 1 1 1 1 1 2 7
Zenotravel 3 5 2 3 11 5 3 32
Merged set 0 1 0 1 5 2 2 11

Number of parameters 6 15 8 6 17 7 3 62<br />

Table 1. Number of parameters of LPG that are changed by ParamILS in the configurations<br />

computed for nine domains independently considered (2nd–10th lines) and jointly considered<br />

(“merged set” line). Each P1–P7 column corresponds to a different parameter category (or planner<br />

component).<br />

The last line of Table 1 shows the number of LPG’s parameters that fall into each of<br />

these seven categories (planner components).<br />

Experimental Analysis<br />

In this section, we present the results of a large experimental study examining the effectiveness<br />

of the automated approach outlined in the introduction. While our analysis<br />

is focused on planning speed, we also report preliminary results on plan quality.<br />

Benchmark domains and instances<br />

In our first set of experiments, we considered problem instances from eight known<br />

benchmark domains used in the last four international planning competitions (IPC-3–<br />

6), Depots, Gold-miner, Matching-BW, N-Puzzle, Rovers, Satellite, Sokoban,<br />

and Zenotravel, plus the well-known domain Blocksworld. These domains were selected<br />

because they are not trivially solvable and random instance generators are available<br />

for them, such that large training and testing sets of instances can be obtained.<br />

For each domain, we used the respective random instance generator to derive three<br />

disjoint sets of instances: a training set with 2000 relatively small instances (benchmark<br />

T), a testing set with 400 middle-size instances (benchmark MS), and a testing set<br />

with 50 large instances (benchmark LS). The size of the instances in training set T was chosen such that they can be solved by the default configuration of LPG in 20 to 40 CPU seconds on average. For testing sets MS and LS, the size of the instances was chosen such that they can, on average, be solved by the default configuration of LPG in 50 seconds to 2 minutes and in 3 to 7 minutes, respectively. This does not mean that all our problem instances can be solved by LPG: we only chose the size of the instances according to the performance of the default configuration, and then used the random generators to derive the actual instances.



For the experiments comparing automatically determined configurations of LPG<br />

against the planners that entered the learning track of IPC-6, we employed the same<br />

instance sets as those used in the competition.<br />

Automated configuration using ParamILS<br />

For all configuration experiments we used the FocusedILS variant of ParamILS version<br />

2.3.5 with default parameter settings. Using the default configuration of LPG as the<br />

starting point for the automated configuration process, we concurrently performed 10<br />

independent runs of FocusedILS per domain, using random orderings of the training<br />

set instances. 4 Each run of FocusedILS had a total CPU-time cutoff of 48 hours, and a<br />

cutoff time of 60 CPU seconds was used for each run of LPG performed during the configuration<br />

process. The objective function used by ParamILS for evaluating the quality<br />

of configurations was mean runtime, with timeouts and crashes assigned a penalized<br />

runtime of ten times the per-run cutoff. Out of the 10 configurations produced by these<br />

runs, we selected the configuration with the best training set performance (as measured<br />

by FocusedILS) as the final configuration of LPG for the respective domain.<br />
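The penalized objective and the best-of-n selection described above can be sketched as follows. This is a minimal illustration only; the function and configuration names are ours, not part of ParamILS:

```python
def penalized_mean_runtime(runtimes, cutoff=60.0, penalty_factor=10):
    """Mean runtime over training instances; timeouts and crashes
    (reported here as None) count as penalty_factor * cutoff seconds,
    as in the objective used for the configuration experiments."""
    penalized = [cutoff * penalty_factor if t is None or t >= cutoff else t
                 for t in runtimes]
    return sum(penalized) / len(penalized)

# Select the best of several independent FocusedILS runs by training-set score
# (illustrative data: each entry is the runtimes of one run's final configuration).
runs = {
    "config_A": [12.0, None, 30.5],   # one timeout -> heavy penalty
    "config_B": [25.0, 40.0, 55.0],
}
best = min(runs, key=lambda c: penalized_mean_runtime(runs[c]))
```

With the default cutoff of 60 CPU seconds, a single timeout costs 600 penalized seconds, so `config_B` wins despite its slower solved runs.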

Additionally, we used FocusedILS for optimizing the configuration of LPG across<br />

all of the selected domains together. As with our approach for individual domains, we<br />

performed 10 independent runs of FocusedILS starting from the default configuration;<br />

again, the single configuration with the best performance on the merged training set as<br />

measured by FocusedILS was selected as the final result of the configuration process.<br />

The final configurations thus obtained were then evaluated on the two testing sets<br />

of instances (benchmarks MS and LS) for each domain. We used a timeout of 600 CPU<br />

seconds for benchmark MS, and 900 CPU seconds for benchmark LS.<br />

For convenience, we define the following abbreviations corresponding to configurations<br />

of LPG:<br />

– Default (LPG.d): The default configuration of LPG.<br />

– Random (LPG.r): Configurations selected independently at random from all possible<br />

configurations of LPG.<br />

– Specific (LPG.sd): The specific configuration of LPG found by ParamILS for each<br />

domain.<br />

– Merged (LPG.md): The configuration of LPG obtained by running ParamILS on<br />

the merged training set.<br />

Table 1 shows, for each parameter category of LPG, the number of parameters that<br />

are changed from their defaults by ParamILS in the derived domain-optimized configurations<br />

and in the configuration obtained for the merged training set.<br />

Empirical result 1 Domain-optimized configurations of LPG differ substantially from<br />

the default configuration.<br />

Moreover, we observed that the optimized configurations usually differ considerably from one another.

4 Multiple independent runs of FocusedILS were used, because this approach can help ameliorate<br />

stagnation of the configuration process occasionally encountered otherwise.<br />



                     LPG.d               LPG.r
Domain           Score   % solved    Score   % solved
Blocksworld      99.00      99        0.00      16
Depots           86.00      86        0.00      18
Gold-miner       91.00      91        0.00      19
Matching-BW      14.00      14        0.15       9
N-Puzzle         59.10      89       34.75      86
Rovers           85.81     100       31.21      53
Satellite        96.02     100       18.99      37
Sokoban          73.20      74        2.06      28
Zenotravel       98.70     100        2.47      24
Total           702.8       83.7     89.6       32.2

Table 2. Speed scores and percentage of problems solved by LPG.d and LPG.r for 100 problems<br />

in each of 9 domains of benchmark MS.<br />

Results on specific domains<br />

The performance of each configuration was evaluated using the performance score functions adopted in IPC-6 [2]. The speed score of a configuration C is defined as the sum of the speed scores assigned to C over all test problems. The speed score assigned to C for a planning problem p is 0 if p is unsolved and T*_p / T_p(C) otherwise, where T*_p is the lowest measured CPU time to solve problem p among those of the compared solvers, and T_p(C) denotes the CPU time required by C to solve problem p. Higher values of the speed score indicate better performance.
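The score computation can be sketched as follows (an illustrative implementation of the IPC-6 speed score as defined above; the function name and data layout are our own):

```python
def speed_scores(times_by_solver):
    """IPC-6 speed score: for each problem, a solver scores T*/T if it
    solves the problem (None = unsolved, scoring 0), where T* is the best
    time among the compared solvers; per-solver scores are then summed."""
    solvers = list(times_by_solver)
    n_problems = len(next(iter(times_by_solver.values())))
    totals = {s: 0.0 for s in solvers}
    for p in range(n_problems):
        solved = [times_by_solver[s][p] for s in solvers
                  if times_by_solver[s][p] is not None]
        if not solved:
            continue  # no solver gets points for an unsolved problem
        best = min(solved)
        for s in solvers:
            t = times_by_solver[s][p]
            if t is not None:
                totals[s] += best / t
    return totals

# Two solvers, two problems (times in CPU seconds; None = unsolved):
scores = speed_scores({"LPG.d": [10.0, None], "LPG.sd": [1.0, 5.0]})
```

Here the faster solver earns the full point on each problem it wins, while the slower one earns the fraction best/T, and unsolved problems contribute nothing.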

Table 2 shows the results of the comparison between LPG.d and LPG.r, which we<br />

conducted to assess the performance of the default configuration on our benchmarks.<br />

Empirical result 2 LPG.d is considerably faster and solves many more problems than<br />

LPG.r.<br />

Specifically, LPG.r solves very few problems in 6 of the 9 domains we considered, while<br />

LPG.d solves most of the considered problems in all but one domain. This observation<br />

also suggests that the default configuration is a much better starting point for deriving<br />

configurations using ParamILS than a random configuration. In order to confirm this<br />

intuition, we performed an additional set of experiments using a random configuration as the starting point. As expected, the resulting configurations of LPG performed much worse than LPG.sd, and sometimes even worse than LPG.d.

Figure 2 provides results in the form of a scatterplot, showing the performance of<br />

LPG.sd and LPG.d on the individual benchmark instances. We consider all instances<br />

solved by at least one of these planners. Each cross symbol indicates the CPU time<br />

used by LPG.d and LPG.sd to solve a particular problem instance of benchmarks MS and<br />

LS. When a cross appears under (above) the main diagonal, LPG.sd is faster (slower)<br />

than LPG.d; the distance of the cross from the main diagonal indicates the performance<br />

gap (the greater the distance, the greater the gap). The results in Figure 2 indicate that<br />

LPG.sd performs almost always better than LPG.d, often by 1–2 orders of magnitude.<br />



[Figure 2: two scatterplots, CPU seconds of LPG.sd (y-axis) versus CPU seconds of LPG.d (x-axis), both on log scales from 0.1 to 100 seconds; U marks timed-out runs.]

Fig. 2. CPU time (log scale) of LPG.sd versus LPG.d on the problems of benchmarks MS (upper plot) and LS (lower plot). U corresponds to runs that timed out under the given runtime cutoff.

Table 3 shows the performance of LPG.d, LPG.md, and LPG.sd for each domain<br />

of benchmarks MS and LS in terms of speed score, percentage of solved problems and<br />

average CPU time (computed over the problems solved by all the considered configurations).<br />

These results indicate that LPG.sd solves many more problems, is on average<br />

much faster than LPG.d and LPG.md, and that for some benchmark sets LPG.sd always<br />

performs better than or equal to the other configurations, as the IPC score of LPG.sd is<br />

sometimes the maximum score (i.e., 400 points for benchmark MS, and 50 for benchmark<br />

LS). 5<br />

Empirical result 3 LPG.sd performs much better than both LPG.d and LPG.md.<br />

Interestingly, the results in Figure 2 and Table 3 also indicate that, for larger test problems, the performance gap between LPG.sd and LPG.d tends to increase: for example, on the middle-size instances of Matching-BW, LPG.sd is on average about one order of magnitude faster than LPG.d, while on the largest instances it has an average performance advantage of more than two orders of magnitude.

5 Additional results (not detailed here for lack of space), using 2000 test problems for each of the nine considered domains of the same size as those used for the training, indicate a performance behavior very similar to the one observed for the MS and LS instances considered in Table 3.

MS problems
                  Speed score (% solved)                     Average CPU time
Domain          LPG.d          LPG.md         LPG.sd       LPG.d   LPG.md   LPG.sd
Blocksworld     21.3  (98.8)   74.8  (100)    400 (100)    105.3    28.17     4.29
Depots          124   (90.3)   164   (99)     345 (98.5)    78.1    42.4      5.7
Gold-miner      18.5  (90.5)   232   (100)    374 (100)     94.4     7.4      1.6
Matching-BW     9.74  (15.8)   72.5  (55.3)   375 (97.8)    93.8    42.3      5.6
N-Puzzle        20.1  (85)     27.0  (86.3)   347 (86.8)   321.0   247       31.20
Rovers          131   (100)    162   (100)    400 (100)     72.2    52.9     21.2
Satellite       104   (100)    111   (100)    400 (100)     64.0    59.2      1.3
Sokoban         26.7  (75.8)   191   (94.8)   335 (96.5)    24.6     6.15     1.19
Zenotravel      49.1  (100)    97.2  (99.8)   397 (100)    103.7    57.6     11.1
All above       280.3 (83.3)   304.3 (91.5)   –            115.4    38.8     –

LS problems
                  Speed score (% solved)                     Average CPU time
Domain          LPG.d          LPG.md         LPG.sd       LPG.d   LPG.md   LPG.sd
Blocksworld     5.12  (100)    11.1  (100)    50   (100)   320.9   144.8     30.8
Depots          3.91  (100)    17.4  (100)    44.1 (98)    326.6   181.1     25.7
Gold-miner      1.54  (100)    32.6  (100)    35.9 (100)   327      21.0     21.2
Matching-BW     1.51  (86)     15.2  (94)     47.4 (100)   225      72.3      1.90
N-Puzzle        0.66  (100)    1.41  (100)    50   (100)   344     158        4.44
Rovers          9.61  (100)    48.5  (100)    45.6 (100)   248      48.3     52.7
Satellite       9.43  (100)    28.8  (100)    50   (100)   263      85.4     48.9
Sokoban         4.55  (62)     24.0  (82)     38.7 (94)     70.8     7.00     4.23
Zenotravel      0.52  (100)    4.26  (100)    50   (100)   294      42.9      2.90
All above       12.6  (96)     49.7  (100)    –            309.7    81.3     –

Table 3. Speed score, percentage of solved problems, and average CPU time of LPG.d, LPG.md and LPG.sd for 400 MS and 50 LS instances in each of 9 domains, considered independently, and in all domains (last line).

Empirical result 4 LPG.sd is faster than LPG.d also for instances considerably larger<br />

than those used for deriving the planner configurations.<br />

This observation indicates that the approach used for deriving configurations scales well<br />

with increasing problem instance size.<br />

As can be seen from Table 3, LPG.md usually performs better than LPG.d on the individual domain test sets. Moreover, as the last line shows, it performs better than LPG.d on the set obtained by merging the test sets of all individual domains, which indicates that, by using a merged training set, we successfully produced a configuration with good average performance across all selected domains.



                LPG.sd vs. LAMA         LPG.sd vs. PbP.s
Domain          ∆-speed   ∆-solved      ∆-speed   ∆-solved
Blocksworld     +377.4      +52         +361.7       ±0
Depots          +393.9     +381         +211.1      +54
Gold-miner      +400       +400         +395.6     +319
Matching-BW     +227.8     +118          +40.7     +330
N-Puzzle        +255.7       +4         +279.8      −20
Rovers          +392.9      +14         +313.4       +9
Satellite       +388.1     +157         +253.6       +9
Sokoban         +340.1     +278          −41.6       +5
Zenotravel      +368.3       ±0         −282.1       +8
Total           +3144     +1404         +1532      +714

Table 4. Performance gap between LPG.sd and LAMA (2nd–3rd columns) and between LPG.sd and PbP.s (4th–5th columns) for 400 MS problems in each of 9 domains, in terms of speed score and number of solved problems.

Empirical result 5 LPG.md performs better than LPG.d.<br />

Next, we compared our LPG configurations with state-of-the-art planning systems,<br />

namely, the winner of the IPC-6 classical track LAMA (configured to stop when the<br />

first solution is computed), and the winner of the IPC-6 learning track, PbP. The performance gaps between LPG.sd and these planners on MS problems are shown in Table 4,

where we report the speed score and the number of solved problems (positive numbers<br />

mean that LPG.sd performs better). These experimental results indicate clearly that<br />

our configurations of LPG are significantly faster and solve many more problems than<br />

LAMA.<br />

Empirical result 6 LPG.sd performs significantly better than LAMA on well-known<br />

non-trivial domains.<br />

Moreover, LPG.sd outperforms PbP.s in most of the selected domains: only on Sokoban and Zenotravel does PbP.s obtain a better speed score (while performing slightly worse in terms of solved problems), and only on N-Puzzle does it solve more problems (while generally being slower). Interestingly, for these domains the PbP.s multi-planner runs a single planner with an associated set of macro-actions; these macro-actions clearly help to significantly speed up the search phase of this planner.

to significantly speed up the search phase of this planner.<br />

Empirical result 7 For the considered well-known benchmark domains, LPG.sd performs<br />

significantly better than PbP.s.<br />

Results on learning track of IPC-6<br />

To evaluate the effectiveness of our approach against recent learning-based planners,<br />

we compared our LPG.sd configurations with planners that entered the learning track<br />



Planner          # unsolved   Speed score   ∆-score
LPG.sd               38          93.23       +59.7
ObtuseWedge          63          63.83       +33.58
PbP.s                 7          69.16        −3.54
RFA1                 85          11.44        –
Wizard+FF           102          29.5        +10.66
Wizard+SGPlan        88          38.24        +7.73

Table 5. Performance of the top five planners that took part in the learning track of IPC-6, plus LPG.sd, in terms of number of unsolved problems, speed score, and score gap with vs. without the learned knowledge, on the problems of the learning track of IPC-6.

of IPC-6, based on the same performance criteria as used in the competition. Table 5<br />

shows performance in terms of the number of unsolved problems, speed score, and performance<br />

gap with and without using the learned knowledge (positive numbers mean<br />

that the planner performs better using the knowledge); the results in this table indicate<br />

that LPG.sd performs better than every solver that participated in the IPC-6 learning<br />

track, including the version of PbP.s which won the IPC-6 learning track. Although<br />

LPG.sd solves fewer problems than PbP (obtaining zero score for each unsolved problem),<br />

it achieves the best score as it is the fastest planner on 3 domains (Gold-miner,<br />

N-Puzzle and Sokoban), and it performs close to PbP.s on one additional domain<br />

(Matching-BW). Furthermore, the results in Table 5 indicate that the performance gap between LPG.sd and LPG.d is significant, and greater than the gap achieved by ObtuseWedge, the planner recognised as the best learner at IPC-6.

Empirical result 8 According to the evaluation criteria of IPC-6, LPG.sd performs<br />

better than the winners of the learning track for speed and best-learning.<br />

Further preliminary results on plan quality<br />

Although the experimental analysis in this paper focuses on planning speed, we give<br />

some preliminary results indicating that automatic algorithm configuration is also promising<br />

for optimizing plan quality. Additional experiments to confirm this observation are<br />

in progress. Figure 3 shows results on two benchmark domains (100 problems each<br />

from the MS set) in terms of relative solution quality of LPG.sd and LPG.d over CPU<br />

time spent by the planner; in this context, LPG.sd refers to LPG configured for optimizing plan quality. Training was conducted on LPG runs with a cutoff of 2 CPU minutes, with the objective of minimising the best plan cost (number of actions) found within that time limit (LPG is an incremental planner that computes a sequence of plans of increasing quality). The quality score of a configuration is defined analogously to the speed score described above, but using plan cost instead of CPU time.

Overall, these results indicate that, at least for the domains considered here, LPG.sd<br />

always finds considerably better plans than LPG.d, unless small CPU-time limits are<br />

used, in which case they perform similarly.<br />



[Figure 3: quality score (0–100, y-axis) versus CPU-time limit (1 to 900 seconds, x-axis) for LPG.d and LPG.sd on domains Depots and Gold-miner.]

Fig. 3. Quality score of LPG.d and LPG using domain-optimized configurations for computing<br />

high-quality plans w.r.t. an increasing CPU-time limit (x-axis: ranging from 1 to 900 seconds) for<br />

domains Depots and Gold-miner.<br />

Conclusions and Future Work<br />

We have investigated the application of computer-assisted algorithm design to automated<br />

planning and proposed a framework for automatically configuring a generic planner<br />

with several parameterized components to obtain specialized planners that work efficiently<br />

on given domains. In a large-scale empirical analysis, we have demonstrated<br />

that our approach, when applied to the state-of-the-art, highly parameterized LPG planning<br />

system, effectively generates substantially improved domain-optimized planners.<br />

Our work and results also suggest a potential method for testing new heuristics and<br />

algorithm components, based on measuring the performance improvements obtained by<br />

adding them to an existing highly parameterized planner, followed by automatic configuration for specific domains. The results may reveal not only to what extent new design elements are useful, but also under which circumstances they are most effective, something that would be very difficult to determine manually.

We see several avenues for future work. Concerning the automatic configuration<br />

of LPG, we are conducting an experimental analysis of the usefulness of the proposed framework for identifying configurations that improve planner performance in terms of plan quality, for which this paper has reported preliminary results. Moreover,

we plan to apply the framework to metric-temporal planning domains. Finally,<br />

we believe that our approach can yield good results for other planners that have been<br />

rendered highly configurable by exposing many parameters. In particular, preliminary<br />

results from ongoing work indicate that substantial performance gains can be obtained<br />

when applying our approach to a very recent, highly parameterized version of the IPC-4<br />

winner Fast Downward.<br />

References<br />

1. Blum, A., and Furst, M. L. 1997. Fast planning through planning graph analysis. Artificial Intelligence 90:281–300.



2. Fern, A.; Khardon, R.; and Tadepalli, P. 2008. Learning track of the 6th international planning competition. http://eecs.oregonstate.edu/ipc-learn/.

3. Gerevini, A.; Saetti, A.; and Serina, I. 2003. Planning through stochastic local search and<br />

temporal action graphs. Journal of Artificial Intelligence Research 20:239–290.<br />

4. Gerevini, A.; Saetti, A.; and Serina, I. 2008. An approach to efficient planning with numerical<br />

fluents and multi-criteria plan quality. Artificial Intelligence 172(8-9):899–944.<br />

5. Gerevini, A.; Saetti, A.; and Vallati, M. 2009. An automatically configurable portfolio-based<br />

planner with macro-actions: PbP. In Proc. of ICAPS-09.<br />

6. Hoffmann, J., and Edelkamp, S. 2005. The deterministic part of IPC-4: An overview. Journal<br />

of Artificial Intelligence Research 24:519–579.<br />

7. Hutter, F.; Babić, D.; Hoos, H. H.; and Hu, A. J. 2007. Boosting verification by automatic<br />

tuning of decision procedures. In Formal Methods in Computer-Aided Design, 27–34. IEEE<br />

CS Press.<br />

8. Hutter, F.; Hoos, H. H.; Leyton-Brown, K.; and Stützle, T. 2009. ParamILS: An automatic<br />

algorithm configuration framework. Journal of Artificial Intelligence Research 36:267–306.<br />

9. Hutter, F.; Hoos, H. H.; and Leyton-Brown, K. 2010. Automated configuration of mixed<br />

integer programming solvers. In Proc. of CPAIOR-10.<br />

10. Richter, S.; Helmert, M.; and Westphal, M. 2007. Landmarks revisited. In Proc. of AAAI-07.

11. Yoon, S.; Fern, A.; and Givan, R. 2008. Learning control knowledge for forward search planning. Journal of Machine Learning Research 9:683–718.



Taking Advantage of Domain Knowledge in<br />

Optimal Hierarchical Deepening Search Planning<br />

Pascal Schmidt 1,2 , Florent Teichteil-Königsbuch 1 , and Patrick Fabiani 1<br />

1 Onera - The French Aerospace Lab<br />

F-31055, Toulouse, France<br />

surname.lastname@onera.fr<br />

2 Université de Toulouse<br />

F-31000, Toulouse, France<br />

Abstract. In this paper, we propose a new algorithm, named HDS<br />

for Hierarchical Deepening Search, to solve large structured classical<br />

planning problems using the divide and conquer motto. A large majority<br />

of planning problems can be easily and recursively decomposed<br />

in many easier subproblems, what is efficiently exploited for instance by<br />

domain-independent approaches such as landmark techniques or domainknowledge<br />

formalisms like Hierarchical Task Networks (HTN). We propose<br />

to exploit domain knowledge in the form of HTNs to guide the<br />

generation of multiple levels of subgoals during the search. Compared<br />

with traditional HTN approaches, we rely on task effects and task-level<br />

heuristics to recursively optimize the plan level-by-level, instead of depthfirst<br />

non-optimal planning in the network. Higher level plan solutions are<br />

decomposed into subproblems and refined into finer level plans, which are<br />

in turn decomposed and refined. Backtracks between levels occur when<br />

costs of refined plans exceed the expected costs of higher-level plans,<br />

thus ensuring to produce optimal plans at each level of the hierarchy.<br />

We demonstrate the relevance of our approach on several well-known<br />

domains compared with state-of-the-art domain-knowledge planners.<br />

1 INTRODUCTION<br />

Automated planning is a field of Artificial Intelligence which aims at automatically<br />

computing a sequence of actions that lead to some goals from a given initial<br />

state. Many subareas have been explored, some assuming that effects of actions<br />

are deterministic [6]. Even in this case, solving realistic problems is challenging, because finding a solution path may require exploring a number of states that is exponential in the number of state variables. To cope with this combinatorial explosion, efficient algorithms resort to heuristics, which guide the search towards optimistic or approximate solutions. Notably, hierarchical methods iteratively decompose the planning problem into smaller and much simpler ones.

Proceedings of the 18th RCRA workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion (RCRA 2011). In conjunction with IJCAI 2011, Barcelona, Spain, July 17–18, 2011.



In a vast majority of problems, the planner must deal with constraints, such<br />

as multiple predefined phases or protocols. Such constraints generally help in solving the planning problem, because they prune many search paths on which these constraints do not hold. They can be given by an expert of the problem to solve, which is often the case in realistic applications such as military missions, or automatically deduced from the model beforehand. In this paper,

we assume that these constraints are known and given to the planner. We thus<br />

propose a new method to model and solve a deterministic planning problem,<br />

based on a hierarchical and heuristic approach and taking advantage of these<br />

constraints.<br />

1.1 Intuition on a simple example<br />

Fig. 1. Path planning graph with high level choice<br />

We illustrate our idea on a simple navigation problem, although our approach primarily targets complex structured problems formalized in a kind of hierarchical STRIPS semantics [6]. In the graph of Figure 1, the robot must go from A to L. A human operator who sees this graph can immediately tell that there is an important choice to make: go around the wall to the north through G, or to the south through H, as shown in Figure 1. It therefore seems promising to solve this problem at coarse grain first, using this information to decide where we should pass before exploring the chosen solution at fine grain, thus avoiding exploration of the non-chosen branch. Refining the chosen path into elementary steps may call the previous choice into question by revealing an unforeseen difficulty. For instance, a hole at E may be discovered when exploring the path via G in detail during planning, forcing the agent to reappraise the choice of this path and to change its higher-level decision to the path via H. We then replan at coarse grain using this new information, until the solution converges.

Intuitively, this approach consists in making jumps in the state graph, and then refining these jumps by recursively making shorter ones, until only elementary steps are applied.

1.2 Related work<br />

The idea of adding domain-dependent control knowledge to help find a plan is widespread. We can cite TLPlan [1], in which the authors use temporal logic



(LTL) to give properties defining "good" plans (i.e., cheap plans that lead to the goal) over a sequence of actions or states (not only the current state). This allows for very precise guidance of the search, either by checking whether the current partial plan is correct, or whether it may lead to a complete plan that satisfies the formulas.

Other approaches use what is called procedural knowledge: an operator who writes a planning problem knows by experience some techniques and some groups of actions (and so on, recursively) that achieve a subgoal, and knows how to break down each goal and subgoal into finer subgoals. Several lines of work follow this direction. In Hierarchical Task Networks (HTNs) [4], the global mission is recursively broken down into a combination of subtasks, until the planner applies only elementary actions. The High-level Actions (HLA) framework [10] differs from HTNs in that no recipe is given for the whole mission: the planner has to build the first high-level plan and then refine it in the same way as for HTNs. Planning algorithms are also associated with the BDI formalism [3]. The main difference between our approach and these formalisms and their associated planning techniques is that we plan one hierarchical level at a time and maintain coherence in the abstraction level of the different tasks within each hierarchical plan. Thus, we allow the planner to foresee shortcuts and difficulties at each level of the hierarchy, avoiding planning an elementary step without knowing its long-term effect at coarse grain.

without knowing the long-term effect of this step at coarse grain.<br />

Other works aim at automatically learning some kind of procedural knowledge: for instance, landmark planning techniques as used in Lama [12], where the planner deduces a set of subgoals from the problem; Macro-FF [2], where the planner tries to build groups of actions that have interesting effects; or HTN-MAKER [8], where the algorithm tries to generalize tasks by analyzing admissible plans. While these works are interesting, they assume that knowledge is learned rather than given by human experts, which targets applications with different design and operational constraints.
with different design and operational constraints.<br />

We now present how we extended the HTN formalism to implement our<br />

contribution, and the algorithm we developed to solve problems expressed in<br />

this formalism. In the last part, we compare the performance of our planner with

SHOP2, dynDFS and TLPlan on several planning benchmarks.<br />

2 FORMALISM<br />

PDDL planning The goal of "classical" planning is to compute a strategy, called a plan, to reach a goal, given exact knowledge of the applicable actions and their effects in a completely known world. A classical planning problem is a tuple P = (s0, g, A), where s0 is the initial state of the world, g is the goal to reach, defined as a set of states, and A is a set of actions. The initial state and all other states of the world are represented by a set of literals L describing the world. The goal is defined by a set of literals, either true or false. If all literals (and their values) are given in the goal description, the goal state is unique; otherwise, the goal defines a set of states. Each action is a tuple a = (name(a), precond(a), effects(a)), where name(a) is the name of the action, precond(a) are the preconditions required on the current state to apply a, and


effects(a) are the modifications made to the current state by the application of a. A plan π is a sequence of actions; π is a solution of the problem if applying all its actions in order from s0 leads to g.
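Under these definitions, plan validity can be checked mechanically. The sketch below is our own illustration, not the authors' implementation: a state is a set of true literals, and each action carries positive/negative preconditions and add/delete effects.

```python
def applicable(state, action):
    """An action applies when its positive preconditions hold in the
    state and none of its negative preconditions do."""
    pre_pos, pre_neg = action["precond"]
    return pre_pos <= state and not (pre_neg & state)

def apply_action(state, action):
    """Effects: remove the delete list, then add the add list."""
    add, delete = action["effects"]
    return (state - delete) | add

def is_solution(plan, s0, goal_pos, goal_neg=frozenset()):
    """A plan solves P = (s0, g, A) if every action is applicable in
    turn from s0 and the final state satisfies the goal literals."""
    state = set(s0)
    for a in plan:
        if not applicable(state, a):
            return False
        state = apply_action(state, a)
    return goal_pos <= state and not (goal_neg & state)

# Illustrative action: move from B to C.
goto_BC = {"name": "goto(B,C)",
           "precond": ({"at(B)"}, set()),
           "effects": ({"at(C)"}, {"at(B)"})}
```

For example, `is_solution([goto_BC], {"at(B)"}, {"at(C)"})` holds, while starting from `{"at(A)"}` the plan fails because the action's precondition does not.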

In order to describe planning problems, the PDDL language (presented in [5]) and its various extensions are widely used. It is based on the STRIPS formalism and splits the problem into two parts: the domain, which contains the set of actions A, and the problem, which contains the initial state of the world s0, the goal definition g, and the formula defining the cost.


Fig. 2. Example of HTN<br />

Expressing hierarchy with HTNs A Hierarchical Task Network (HTN) [4]<br />

is an extension to classical planning that consists in modeling tasks, that is,<br />

abstract actions with different methods to break them down. An HTN problem<br />

is a tuple (s0, g, A, T ), where s0 is the initial state, g is the goal, A the set of<br />

elementary actions (as above), and T the set of tasks. A task t ∈ T is a set of<br />

preconditions and a set of methods: t = (precond(t), M(t)) where precond(t) is a<br />

literal formula that represents the set of states where the task can be performed,<br />

and M(t) is the set of methods m(t). Each method m(t) defines a possible<br />

decomposition of the task into subtasks or elementary actions. There are two<br />

ways of breaking down a task, parallel and sequential. A parallel decomposition<br />

gives the subtasks the possibility to be executed in parallel whereas a sequential<br />

decomposition forces the planner to put the subtasks one after the other in the<br />

given ordering. In most applications, the set of tasks is given by a human expert<br />

of the domain, and has a significant influence on the performance of the planner.<br />

A graphical representation of an HTN is shown in Figure 2. Tasks and elementary<br />

actions are represented in boxes, a horizontal line shows the different<br />

choices of methods for that task, and slanted bars show the decompositions of<br />

methods. Sequential decompositions are represented by arrows. We can see here<br />

a model to solve a path planning problem, where moveTo ?a ?l represents the<br />

highest level task (the mission) consisting for an agent ?a to reach the location<br />



?l, jumpTo a high-level move and goTo an elementary step. The void task is the<br />

termination case, necessary to stop the recursion.<br />

Meta-effects to link tasks In the standard HTN formalism as defined by [4],<br />

each task represents a group of methods to achieve a sub-goal, but the planner<br />

does not have knowledge of the accomplished subtask. Therefore, it is impossible<br />

to tie up a task after another one without exploring it in detail to know its<br />
<br />
effects. In other words, the standard formalism does not allow for helpful coarse-grained<br />
<br />
exploration of the problem. In order to use HTN tasks directly as macro-operators<br />
<br />
at any level, the first extension we need to add to the HTN formalism<br />

must give the planner the ability to know the effect of a task.<br />

In order to do so, we introduce meta-effects for HTNs. These meta-effects<br />

are attached to tasks like effects are attached to elementary actions. This allows<br />

the planner to get knowledge of the main effects of a task and to assemble<br />
<br />
high-level tasks to make a high-level plan. With that knowledge, it will be able<br />

to check the pre-conditions of the next task and compute a high-level heuristic.<br />

Our task t is now a set of preconditions, a set of methods and a set of effects:<br />

t = (precond(t),M(t), effects(t)).<br />

Here is the BNF of the meta-effects in the PDDL language (nonterminal names reconstructed from the trailing description):<br />
<br />
<metaEffect> ::= ":metaEffect" <effect><br />
<br />
<effect> ::= <pEffect><br />
<br />
|"(not" <effect> ")"<br />
<br />
|"(forall" "(" <typedVars> ")" <effect> ")"<br />
<br />
|"(when" <cond> <effect> ")"<br />
<br />
|"(and" <effect>+ ")"<br />
<br />
<pEffect> ::= <boolExp><br />
<br />
|"(assign" <fHead> <fExp> ")"<br />
<br />
|"(increase" <fHead> <fExp> ")"<br />
<br />
|"(decrease" <fHead> <fExp> ")"<br />
<br />
where <fHead> is a function, <fExp> an expression, <boolExp> a<br />
<br />
boolean expression, <cond> a boolean condition and <typedVars> a list of<br />
<br />
typed variables.<br />

Fig. 3. Meta-effects in HTN<br />



An example is shown on Figure 3. We give meta-effects to high-level tasks<br />

moveTo and jumpTo that give the result of the task, i.e. the position of the<br />

robot at the end of the task. These meta-effects are written in a rounded box<br />

on the graphical representation of the HTN. Meta-effects can be more or less precise<br />

depending on which points are considered as relevant by the expert. Here, an<br />

estimated cost of a move or a jump, computed with the Euclidean distance from the<br />

starting point of the task to its destination, can be associated to the meta-effect<br />

if the underlying planner can deal with it. In PDDL, this example is written as:<br />

:metaEffect<br />

(and<br />

(increase (cost) (dist (at ?a) ?l))<br />

(assign (at ?a) ?l)<br />

)<br />
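As an illustration (our own sketch, not the paper's planner), this meta-effect can be applied to a state that maps fluents to values exactly like a normal effect; the coordinate table and location names below are hypothetical:<br />

```python
import math

# Hypothetical location coordinates used for the Euclidean distance estimate.
coords = {"l0": (0.0, 0.0), "l1": (3.0, 4.0)}

def dist(l1, l2):
    (x1, y1), (x2, y2) = coords[l1], coords[l2]
    return math.hypot(x2 - x1, y2 - y1)

def apply_moveto_meta_effect(state, agent, loc):
    """state maps fluents to values, e.g. {("at", "r1"): "l0", "cost": 0.0}."""
    new = dict(state)
    # (increase (cost) (dist (at ?a) ?l))
    new["cost"] = state["cost"] + dist(state[("at", agent)], loc)
    # (assign (at ?a) ?l)
    new[("at", agent)] = loc
    return new
```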

The level of precision of the meta-effects has an important influence on<br />

the planning process: if they are exhaustive with respect to the effects of the<br />

underlying actions, then the effects of the task are totally predictable, and the<br />

choice of a given task will not need to be reconsidered. Contrary to the work<br />
<br />
of [10], who also define meta-effects, our effects are generally not complete, i.e.<br />

some numerical estimations of the final state are not well evaluated and some<br />

predicate changes are not present. This simplification allows us to use meta-effects<br />

the same way as normal effects in any forward planning algorithm.<br />

Inspired by admissible heuristics in classical planning, we define optimistic<br />

meta-effects such that the long term cost of a meta-effect is lower than the real<br />

one.<br />

Macro-tasks to avoid recursion Another weakness of standard HTNs concerns<br />

the modeling of methods that must be decomposed into an unknown number<br />

of subtasks (determined at planning time). For instance, consider our navigation<br />

graph of Figure 2. To break down the jumpTo task, we need to recursively<br />

write that jumpTo is a sequence of one goTo to a given point next to the starting<br />

point, followed by a jumpTo from there to the goal.<br />

This may cause several problems. For modelers and readers who do not have<br />

expert programming skills, it is not very intuitive to break this task down using<br />

recursion. One must deal with termination cases, and decide whether to use<br />
<br />
right or left recursion. Most importantly, task recursion is incoherent with<br />

our idea of doing jumps in the state graph. At a high level of the hierarchy, the planner<br />

tries to plan a moveTo by refining it into a jumpTo and a moveTo. At the next<br />

level, the plan will be a goTo then a jumpTo, then another jumpTo and a moveTo.<br />

That is, the computed plan does not have any consistency in terms of hierarchy.<br />

Thus, we introduce macro-tasks as another extension in the spirit of regular<br />

expressions. The aim is to break down a task into an unknown number of<br />

subtasks that are all at the same level in the hierarchy. A method m is now defined<br />

as a precondition and a macro-task: m = (precond(m), macroTask(m)).<br />

A macro-task is recursively defined by several alternatives that express how to<br />

group subtasks together, the terminal case being a single subtask:<br />

– ordered: subtasks are executed in sequence;<br />



– multseq: a subtask is executed an unknown number of times until a final<br />

condition is met;<br />

– optional: a subtask is executed only if needed;<br />

– pickOne: a subtask is executed, where the value of a variable satisfying a<br />

given constraint is set. If no variable satisfies the constraint, the current<br />

planning branch is considered a dead end and the planner backtracks.<br />
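These alternatives can be illustrated by a small Python sketch (ours, not the paper's parser) that enumerates the subtask sequences a macro-task may expand to; here multseq is bounded by a repetition limit, whereas the actual planner refines lazily against the final condition:<br />

```python
# Macro-tasks as nested tuples: ("task", name), ("ordered", mt1, mt2, ...),
# ("optional", mt), ("multseq", mt), ("pickOne", domain, mt).
def expand(mt, max_rep=3):
    kind = mt[0]
    if kind == "task":                 # terminal case: a single subtask
        yield [mt[1]]
    elif kind == "ordered":            # subtasks executed in sequence
        def seqs(parts):
            if not parts:
                yield []
                return
            for head in expand(parts[0], max_rep):
                for tail in seqs(parts[1:]):
                    yield head + tail
        yield from seqs(mt[1:])
    elif kind == "optional":           # with or without the subtask
        yield []
        yield from expand(mt[1], max_rep)
    elif kind == "multseq":            # repeat until the final condition
        for n in range(max_rep + 1):
            yield from expand(("ordered",) + (mt[1],) * n, max_rep)
    elif kind == "pickOne":            # bind a variable from its domain
        for value in mt[1]:
            for seq in expand(mt[2], max_rep):
                yield [s.replace("?x", value) for s in seq]
```

For example, expanding ("ordered", ("task", "a"), ("optional", ("task", "b"))) yields the sequences ["a"] and ["a", "b"].<br />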

Macro-tasks are lazily refined by the parser, so that the planner’s algorithm<br />

described in the next section only needs to assume that macro-tasks are sequences<br />

of subtasks.<br />

The definition of the grammar is the following (nonterminal names reconstructed from the trailing description):<br />
<br />
<macroTask> ::= <subtask><br />
<br />
| ordered <macroTask>*<br />
<br />
| multseq <macroTask> until <test><br />
<br />
| optional <macroTask><br />
<br />
| pickOne <vars> <test> <macroTask><br />
<br />
where <vars> is a list of variables, <subtask> a subtask with its parameters<br />
<br />
and <test> a boolean test.<br />

Fig. 4. Macro-task example<br />

An example of this extension is presented in Figure 4. Compared with Figure<br />
<br />
3, the model is simpler and more understandable and, above all, every<br />
<br />
occurrence of a same task is at the same level of the hierarchy.<br />

The PDDL decomposition of jumpTo is written as:<br />

:subtasks<br />

(:multseq<br />

(:pickOne (?l1 - loc) (isElem (at ?a) ?l1)<br />

(goTo ?a ?l1)<br />

)<br />

until (= (at ?a) ?l)<br />

)<br />

where loc is a type representing a location and isElem is a boolean function<br />

that tests whether the path between its two arguments is elementary and possible or<br />

not. The keyword multseq represents a set of subtasks with an unknown arity,<br />

and pickOne defines the point of choice of a given variable.<br />



3 ALGORITHM DESCRIPTION<br />

In this section, we present an algorithm that is able to solve any problem expressed<br />

in the previous formalism. The main idea of this algorithm, named HDS<br />

for Hierarchical Deepening Search, consists in first computing a plan with a low<br />

level of precision, then using this plan as a guide to compute a more precise plan,<br />

until we obtain a detailed plan that contains only elementary actions. At each<br />

step, the algorithm backtracks to a previous higher level plan if the cost of the<br />

current plan is higher than the expected quality of the lower-level plans.<br />

3.1 Using a lower precision plan as a guide<br />

Let n be a given level of the hierarchy. We assume first that complete plans<br />

have been constructed for all upper levels including n. The by-level planner uses<br />

the macro-tasks at level n + 1 and the plan Pn at level n to compute a higher<br />

precision plan Pn+1 that solves level n + 1. We can use any forward planning<br />

algorithm, for instance A*, slightly modified to handle constraints from Pn in<br />

its exploration.<br />

The first idea is that the by-level planner uses all tasks (actions and macro-tasks)<br />

as elementary actions, using the meta-effects of macro-tasks as normal<br />

effects. The second idea consists in keeping track, for each state, of its position<br />

in the HTN by means of an extended state σ := (σ.s, σ.p), composed of the<br />

state σ.s and the position σ.p in the HTN. Using this extended state, we can<br />

significantly restrict the branching factor: in each state, we pick the actions that<br />

can be applied according to the position σ.p in the HTN among the ones whose<br />

preconditions are satisfied in state σ.s. This is quite similar to works by [9] and<br />

[10], except that we run this algorithm at each level of the hierarchy, not only<br />

the finest one.<br />
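A minimal sketch of the extended state and the restricted branching (next_in_htn and applicable are illustrative placeholders for the paper's position tracking and precondition tests):<br />

```python
from typing import NamedTuple

class ExtState(NamedTuple):
    s: frozenset  # the world state σ.s
    p: tuple      # the position σ.p in the HTN

def branch(sigma, next_in_htn, applicable):
    # Candidate sons are only the actions allowed next at position σ.p
    # whose preconditions hold in state σ.s.
    return [a for a in next_in_htn(sigma.p) if applicable(a, sigma.s)]
```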

We initialize the forward planner with a root node containing the initial state<br />

of the problem. In each state explored by the forward planning algorithm, we<br />

look at the position in the upper plan and the possible solutions proposed by the<br />

method decomposition of the upper task. Among all of these solutions, we keep<br />

only the applicable ones according to the current state and the preconditions.<br />

The possible sons are defined by the upper plan and the methods of the meta-actions.<br />

We keep track of the current task of the upper plan and the current<br />

position in the methods decomposition of the task. According to the position in<br />

the higher plan, we have different branching possibilities:<br />

– if just entered a primitive action: apply it in the new plan and go to the next<br />

task,<br />

– if just entered a meta-action: sons are the different acceptable methods according<br />

to their preconditions,<br />

– if ordered/parallel: apply in sequence/parallel the different sub-tasks, without<br />

choices,<br />

– if optional: develop two sons, one with the optional subtask inserted, one<br />

without,<br />



– if multseq/multpar: develop in sequence/parallel the subtask until the end<br />

condition is true<br />

– if pickone: develop all sons with all combination of variables accepted by the<br />

condition,<br />

– if face a subtask: check the preconditions, if true apply the effects, otherwise<br />

declare the branch as dead-end.<br />

Using an algorithm inspired by A* for this forward planner, we obtain<br />
<br />
Algorithm 1. As in A*, we maintain a planning tree for each level of hierarchy.<br />

This tree contains at each node:<br />

– a state (node.σ)<br />

– the lowest cost to reach it (node.cost),<br />

– an estimation of the cost to reach the goal (computed by a heuristic)<br />

(node.estim)<br />

– and the sons of this node (node.sons), that is, the reachable states according<br />

to the different applicable and acceptable actions.<br />

The algorithm walks recursively through this tree, choosing at each node<br />

its most promising son, that is, the one with the lowest sum of its cost and its<br />

estimated cost (line 22). Once a tip node is reached, that is, a node without sons,<br />

the algorithm applies all the applicable and acceptable actions from this state<br />

(line 8), and affects the resulting states as sons for the current node (line 14).<br />

The algorithm stops when the goal set has been reached or when it is established<br />

that the problem has no solution, that is, when no more nodes can be<br />

developed. It then extracts the plan from the A* tree (∅ if the problem has no<br />

solution) (line 5).<br />

3.2 Links between levels<br />

We now present (see Algorithm 2) how we construct the complete hierarchical<br />

plan (defined at all levels of the hierarchy) by refining or backtracking between<br />

plans iteratively constructed by the by-level planner.<br />

We initialize the planner with the init task of the problem (line 2), which is<br />

used by the by-level algorithm as a guide to compute the first plan. Then we<br />

keep an instance of the by-level planner for each level. The by-level planner is<br />

launched using the plan extracted from the upper level (line 5).<br />

Then, once the currently lowest level (let us call it n) by-level planner ends its<br />

work, we call a plan updater (line 10) on the higher level plans. This updater<br />

reports the actual best estimated cost to the final node of the upper by-level<br />

plan (at level k, k < n). By propagating this cost to the whole best branch of<br />

the planning structure, the updater will be able to determine whether the best plan<br />
<br />
is still the same or not (line 10). This updater is called on each plan, from level<br />

n − 1 to level 0 (lines 8 to 13). At each step, the current evaluation of the best<br />

possible plan is reevaluated and used for the directly upper plan. The planning<br />

sequence starts again at the coarser level where the best plan has changed.<br />

We continue propagating the new cost estimation towards the coarsest plan.<br />

For each level, we note whether the plan has been questioned or not. We then restart<br />

the computation at the coarsest level which has been questioned.<br />
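The level-linking loop can be sketched schematically as follows (Algorithm 2 in the paper is richer; plan_level and update_estimate are illustrative placeholders for the by-level planner and the plan updater):<br />

```python
def hds(levels, plan_level, update_estimate):
    """plan_level(n, guide) plans level n constrained by the upper-level
    plan `guide`; update_estimate(k, plans) propagates the refined cost to
    level k and returns True if the best plan at that level changed."""
    plans = [None] * levels
    n = 0
    while n < levels:
        guide = plans[n - 1] if n > 0 else None
        plans[n] = plan_level(n, guide)
        # Propagate the new cost estimation upwards; restart at the
        # coarsest level whose best plan has been questioned.
        restart = None
        for k in range(n - 1, -1, -1):
            if update_estimate(k, plans):
                restart = k
        n = restart if restart is not None else n + 1
    return plans[-1]
```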



Algorithm 1: Astar by-level planner:<br />

1 begin runAStar(root,Pn)<br />

2 goalReached ← false;<br />

3 while root.cost < ∞ ∧ ¬goalReached do<br />

4 goalReached ← aStarRecPlanner(root, Pn);<br />

5 return extractPlan(root);<br />

6 begin aStarRecPlanner(node,Pn)<br />

7 if node.sons = ∅ then<br />

8 Ast ← next(node.σ.p) ∩ acceptable(node.σ.s);<br />

9 forall the a ∈ Ast do<br />

10 node’.σ.s ← apply(a.effects, node.σ.s);<br />

11 node’.σ.p ← track(node.σ.p, a, Pn+1);<br />

12 node’.cost ← node.cost + cost(node.σ.s, a, node’.σ.s);<br />

13 node’.estim ← heurist(node’.σ);<br />

14 node.sons ← node.sons ∪ {node’};<br />

15 if node.sons = ∅ then node.estim = ∞;<br />

16 goalReached ← false;<br />

17 else<br />

18 if satisfies(node.σ.s,goal) then<br />

19 goalReached ← true;<br />

20 else<br />

21 node’ ← argmin n’∈node.sons (n’.cost + n’.estim);<br />

22 goalReached ← aStarRecPlanner(node’, Pn);<br />

23 c ← min n’∈node.sons (n’.cost + n’.estim);<br />

24 node.estim ← c-node.cost;<br />

25 return goalReached;<br />
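Algorithm 1 can be mirrored in Python roughly as follows (a sketch with our own naming; expand, heurist and is_goal are injected, and, as in the pseudocode, the goal test is only performed on already expanded nodes):<br />

```python
import math

class Node:
    def __init__(self, sigma, cost=0.0, estim=0.0):
        self.sigma, self.cost, self.estim = sigma, cost, estim
        self.sons = []

def astar_rec(node, expand, heurist, is_goal):
    """One descent: expand a tip node, otherwise recurse into the most
    promising son and back up the refined cost estimate."""
    if not node.sons:
        for sigma2, step_cost in expand(node.sigma):
            son = Node(sigma2, node.cost + step_cost, heurist(sigma2))
            node.sons.append(son)
        if not node.sons:              # dead end: no applicable action
            node.estim = math.inf
        return False
    if is_goal(node.sigma):
        return True
    best = min(node.sons, key=lambda n: n.cost + n.estim)
    reached = astar_rec(best, expand, heurist, is_goal)
    node.estim = min(n.cost + n.estim for n in node.sons) - node.cost
    return reached

def run_astar(root, expand, heurist, is_goal):
    reached = False
    while root.cost + root.estim < math.inf and not reached:
        reached = astar_rec(root, expand, heurist, is_goal)
    return reached
```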

This algorithm terminates either when it finds a plan containing only elementary tasks<br />
<br />
that is not invalidated by the upper by-level planners, that is, when the final<br />

solution is found, or when the estimated cost for the highest level by-level planner<br />

reaches infinity, that is, when it is estimated that no plan can be computed to<br />

reach the goal with the given decomposition.<br />

3.3 HDS Properties<br />

HDS properties first rely on the HTN and its meta-effects. If the solution (resp.<br />

optimal solution) is not reachable through the HTN, HDS will not be able to find<br />

any solution (resp. the optimal solution) of the problem. Assuming the HTN is<br />

well written, i.e. the optimal solution is reachable, the planner may still consider<br />

an intermediate solution that does not allow the planner to reach the optimal<br />

solution if the meta-effects are not optimistic (i.e., their long-term costs are higher<br />
<br />
than the real costs); the backtracking process will not be able to detect it.<br />

Second, the properties of our algorithm depend on the properties of the by-level<br />

planner used:<br />



To implement our by-level planner, we chose the Dijkstra algorithm, modified<br />
<br />
to use macro-tasks and information from the upper plan. Even though far more<br />
<br />
efficient algorithms exist in the literature, we chose to implement a very simple and<br />

quite naive by-level planner in order to highlight the relevance of our global<br />

Hierarchical Deepening Search approach (efficiency does not come from the by-level<br />

planner but from our general framework). Along the same line, we do not<br />

use generic heuristics, such as Hmax or Hadd [7]. Without these heuristics, we<br />

have much less constraints on the formalism, and our planner accepts object<br />

functions, i.e. functions that return an object instead of just a number. We can<br />

also use non-linear functions or effects.<br />

4.2 Comparisons with other planners<br />

Our planner HDS is optimal given an HTN decomposition of the problem. We<br />

compared HDS with TLPlan [1], a non optimal domain-dependent planner based<br />

on LTL temporal logic; with dynDFS [11], a domain-dependent optimal temporal<br />

planner based on the Timelines formalism; and with SHOP2 configured to<br />

find the optimal solution, which is a successful HTN planner (without meta-effects).<br />
<br />
The first three tracks presented next are from the IPC3 planning<br />

competition. All planners were allowed 2 GB of RAM and 10 minutes to plan<br />

each problem on a 3GHz Intel processor.<br />

Satellite. The Satellite STRIPS domain comes from IPC3, where a fleet of satellites<br />
<br />
has to take pictures of various events with various instruments. In the<br />

STRIPS version, each action has a time cost of one. The aim is to minimize<br />

the total time of the mission. Parallelism between satellites is authorized. In<br />

this domain, we compared HDS with dynDFS and with TLPlan. SHOP2 is not<br />

presented here as we did not have HTNs for SHOP2 on this track.<br />

Figure 5 presents planning times and costs for the different planners. Since<br />

parallelism of tasks is not yet available on HDS, our HTN decomposition is<br />

quite weak, including only tasks to initialize a sensor (turn towards an acceptable<br />

ground station then switch on the sensor and calibrate it) and to take a<br />

picture (turn to event then take picture), and cannot really take advantage of<br />

the hierarchical framework.<br />

HDS performance is similar to that of dynDFS, which is specialized in parallelism,<br />
<br />
but can solve fewer problems. In particular, HDS finds optimal plans for the problems<br />
<br />
that it could solve. As a reference, we report also the costs found by the<br />

domain-independent planner Lama [12] which, like TLPlan, cannot take advantage<br />
<br />
of parallelism in terms of cost. Lama was set in the optimizing mode, and<br />

even if it does not always get the optimal cost (in a non parallelized plan), it<br />

can often find a much better solution than the one found by TLPlan, showing<br />

that TLPlan solutions are far from the optimal ones (TLPlan did not take advantage<br />

of parallelism and got high costs as soon as the problem had more than<br />

one satellite).<br />



Fig. 5. Satellite<br />

Fig. 6. HDS vs SHOP2<br />

Freecell. This domain also comes from IPC3 and is inspired by the famous<br />

Microsoft Windows game. We compared ourselves with SHOP2, giving SHOP2<br />

exactly the same HTNs as the ones given to HDS, except meta-effects and macro-tasks,<br />
<br />
which cannot be handled by SHOP2. We configured these HTNs to find the<br />

optimal solution at the finest level of the hierarchy. We forced the planners to<br />

send to the home location all unneeded cards, as automatically done in the<br />

Windows game. We provided another method to move a block of cards of the<br />



same column if enough free cells are available. Figure 6 presents planning times<br />

for both planners. Costs are not plotted since HDS is optimal and SHOP2 is<br />

configured here to be optimal. SHOP2 is able to solve only the first problem,<br />

whereas HDS can solve the first seven. In addition to meta-effects and<br />

macro-tasks, this difference can be due to several other factors: Lisp is far less<br />

efficient than OCaml and HDS can deal with more abstract functions and denser<br />

problem descriptions than SHOP2, leading to more efficient computation.<br />

Zeno Traveler. In this domain, also taken from IPC3, the planner has to<br />

make people reach their destination by plane. The planes have two speed modes:<br />

slow and zoom. Slow consumes far less fuel than zoom, but it is far slower. In<br />

the numeric version, as each move only takes one time step, zoom is never the<br />

best solution. Like Satellite, the lack of parallelism between tasks in our model<br />

restricts the capabilities of HDS. We only gave as knowledge the information that<br />

the plane can only go to destinations where someone is waiting or where someone<br />

needs to go, and that someone already at their destination is not allowed to board<br />

the plane.<br />

We compared HDS with SHOP2 in its optimal mode, using its IPC3 HTN<br />

decomposition. We can see on Figure 6 the advantage of HDS: in the first two<br />

problems, with just one plane, HDS computation time is around 20 milliseconds,<br />

whereas SHOP2 computation time is around 200 milliseconds. For the other<br />

problems, the planner must choose among multiple planes, and even if parallelism<br />

is not really taken into account, SHOP2 cannot solve any of these problems,<br />

whereas HDS can solve the third problem in 14 seconds.<br />

Explore and Guide. This domain particularly highlights the advantages<br />
<br />
of our approach. The goal is, for a helicopter, to drive intruders back to the<br />
<br />
border, having explored their known exit path in order to ensure that no trap<br />

is present. In this problem, non-concurrent high-level tasks are easy to identify<br />

and their effects and costs are well approximated. Once the highest-level plan<br />

is computed, it is very helpful for the computation of the exploration strategy,<br />

that is split into sub-zones by an expert.<br />

We gave to SHOP2 exactly the same HTNs as to HDS, except meta-effects<br />

and macro-tasks. The main algorithmic difference is that HDS first explores<br />

at low precision and then refines this plan (with backtracks), whereas SHOP2<br />

directly explores at the finest precision level. Both planners return the same<br />

optimal solution for each problem. The results are presented in Figure 6, where<br />

we can see that HDS is still between one and two orders of magnitude quicker<br />

than SHOP2.<br />

5 CONCLUSION AND FUTURE WORK<br />

In this paper, we proposed to use both macro-operator techniques and procedural<br />

control knowledge within the same informed planning framework. We<br />

introduced the meta-effects and macro-tasks extensions to the HTN formalism,<br />



allowing us to jump forward in the state graph. We also proposed an algorithm<br />

named HDS that explores, level by level, such a structure, thus detecting traps<br />

and optimizing an abstract plan before refining it into a precise executable plan,<br />

backtracking to another high-level solution if necessary. We furthermore showed<br />
<br />
that HDS, thanks to the proposed extensions to the HTN formalism, is very<br />
<br />
efficient on structured problems and optimal given the decomposition. This contribution<br />

provides assistance in writing large planning problems using domain<br />

expertise, and to reduce the complexity of the underlying planning algorithm.<br />

The required domain expertise can also be automatically extracted from the<br />

model, and used in our approach.<br />

In the near future, we plan to implement real parallelism, not only in our model<br />

but also in our planner. We expect gains by introducing more human expertise<br />

in the domains and better performance on some problems. Since our algorithmic<br />

approach is quite generic, especially concerning the by-level planner, we<br />

plan to extend our contribution to other planning schemes, such as probabilistic<br />

planning, using a forward MDP by-level planner.<br />

References<br />

1. F. Bacchus and F. Kabanza. Using temporal logics to express search control knowledge<br />

for planning. Artificial Intelligence, 2000.<br />

2. A. Botea, M. Enzenberger, M. Müller, and J. Schaeffer. Macro-FF: Improving AI<br />

planning with automatically learned macro-operators. Journal of Artificial Intelligence<br />

Research, 24:581–621, 2005.<br />

3. L. de Silva, S. Sardina, and L. Padgham. First principles planning in BDI systems.<br />

In Autonomous Agents and Multiagent Systems (AAMAS-09), 2009.<br />

4. K. Erol, J. Hendler, and D. S. Nau. HTN planning: complexity and expressivity. In<br />

AAAI’94: Proceedings of the Twelfth National Conference on Artificial Intelligence<br />
<br />
(vol. 2), pages 1123–1128, 1994.<br />

5. M. Fox and D. Long. PDDL2.1: An extension to PDDL for expressing temporal<br />
<br />
planning domains. Journal of Artificial Intelligence Research, 20, 2003.<br />

6. M. Ghallab, D. Nau, and P. Traverso. Automated Planning. Morgan Kaufmann,<br />

San Francisco, CA, USA, 2004.<br />

7. P. Haslum and H. Geffner. Admissible heuristics for optimal planning. pages<br />

140–149. AAAI Press, 2000.<br />

8. C. Hogg, H. Muñoz-Avila, and U. Kuter. HTN-MAKER: Learning HTNs with minimal<br />

additional knowledge engineering required. In Association for the Advancement of<br />

Artificial Intelligence (AAAI-08), 2008.<br />

9. U. Kuter and D. S. Nau. Using domain-configurable search control for probabilistic<br />

planning. In AAAI, Pittsburgh, Pennsylvania, USA, July 2005.<br />

10. B. Marthi, S. Russell, and J. Wolfe. Angelic hierarchical planning: Optimal and online<br />

algorithms. In International Conference on Automated Planning and Scheduling<br />

(ICAPS-08), 2008.<br />

11. C. Pralet and G. Verfaillie. Using constraint networks on timelines to model and<br />

solve planning and scheduling problems. In Proc. ICAPS, 2008.<br />

12. S. Richter, M. Helmert, and M. Westphal. Landmarks revisited. In 23rd AAAI<br />

Conference on Artificial Intelligence (AAAI-08), 2008.<br />



Solving Disjunctive Temporal Problems with<br />

Preferences using Boolean Optimization solvers<br />

Marco Maratea 1, Maurizio Pianfetti 1, and Luca Pulina 2<br />

1 DIST, University of Genova, Viale F. Causa 15, Genova, Italy.<br />

marco@dist.unige.it,maurizio.pianfetti@studenti.ingegneria.unige.it<br />

2 DEIS, University of Sassari, Piazza Università 11, Sassari, Italy.<br />

lpulina@uniss.it<br />

Abstract. The Disjunctive Temporal Problem (DTP), which involves Boolean<br />

combination of difference constraints of the form x − y ≤ c, is an expressive<br />

framework for constraints modeling and processing. When a DTP is unfeasible<br />

we may want to select a feasible subset of its DTP constraints (i.e., disjunctions<br />

of difference constraints), possibly subject to some degree of satisfaction: The<br />

Max-DTP extends DTP by associating a preference, in the form of weight, to<br />

each DTP constraint for its satisfaction, and the goal is to find an assignment to<br />

its variables that maximizes the sum of weights of satisfied DTP constraints. In<br />

this paper we first present an approach based on Boolean optimization solvers to<br />

solve Max-DTPs. Then, we implement our ideas in TSAT++, an efficient DTP<br />

solver, and evaluate its performance on randomly generated Max-DTPs, using<br />

both different Boolean optimization solvers and two optimization techniques.<br />

1 Introduction<br />

The Disjunctive Temporal Problem (DTP), introduced in [8], is defined as the finite<br />

conjunction of DTP constraints, each DTP constraint being a finite disjunction of difference<br />

constraints of the form x−y ≤ c, where x and y are arithmetic variables ranging<br />

over a domain of interpretation (the set of real numbers R or the set of integers Z), and<br />

c is a numeric constant. The goal is to find an assignment to the variables of the problem<br />

that satisfies all DTP constraints. The DTP is recognized to be a good compromise<br />

between expressivity and efficiency, given that the arithmetic consistency of a set of<br />

difference constraints can be checked in polynomial time, and has found applications<br />

in many areas such as planning, scheduling, hardware and software verification, see,<br />

e.g., [19, 5]. Along the years several systems that can solve DTPs have been developed,<br />

e.g., SK [21], TSAT [2], CSPI [18], EPILITIS [23], TSAT++ [4], and MATHSAT [5].<br />

Moreover, the competition of solvers for Satisfiability Modulo Theories (SMT-COMP) 3<br />

has two logics that include DTPs (called QF RDL and QF IDL, respectively).<br />
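The polynomial-time consistency check mentioned above is standardly done by negative-cycle detection on the constraint graph (an edge y → x with weight c for each constraint x − y ≤ c); here is a Bellman-Ford sketch of that standard technique, not code from any of the cited solvers:<br />

```python
def consistent(constraints):
    """constraints: iterable of (x, y, c) triples meaning x - y <= c."""
    nodes = {v for x, y, _ in constraints for v in (x, y)}
    dist = {v: 0 for v in nodes}       # as if a 0-weight source fed every node
    edges = [(y, x, c) for x, y, c in constraints]
    for _ in range(len(nodes) + 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return True                # fixed point reached: satisfiable
    return False                       # still relaxing: negative cycle
```

For instance, {x − y ≤ 1, y − x ≤ −2} is inconsistent (adding the two gives 0 ≤ −1), and the corresponding cycle has weight −1.<br />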

When a DTP is unfeasible, i.e., unsatisfiable, we may want to select a feasible subset<br />

of its DTP constraints, which can be possibly subject to some degree of satisfaction:<br />

The maximum satisfiability problem on a DTP (i.e., Max-DTP) extends DTP by associating<br />

a preference, in the form of cost, or weight, to each DTP constraint for taking<br />

<strong>Proceedings</strong> of the 18 th RCRA workshop on Experimental Evaluation of Algorithms for Solving<br />

Problems with Combinatorial Explosion (RCRA 20<strong>11</strong>).<br />

In conjunction with IJCAI 20<strong>11</strong>, Barcelona, Spain, July 17-18, 20<strong>11</strong>.<br />

3 http://www.smtcomp.org/.<br />



into account the reward for the DTP constraint’s satisfaction. The goal is to find an<br />
<br />
assignment to the variables of the problem that maximizes the sum of the rewards of satisfied<br />

DTP constraints. The introduction of preferences in DTPs has been first presented<br />

in [20], where complex preferences can be assigned to each difference constraint.<br />

In this paper we present an approach which extends the lazy SAT-based approach<br />

implemented in solvers for DTPs. The idea is to (i) abstract a Max-DTP P into a Conjunctive<br />

Normal Form (CNF) formula φ and an optimization function f; (ii) find a<br />

solution for φ under f with a Boolean optimization solver; and (iii) verify if the solution<br />

returned is consistent. Step (ii) can be implemented with a variety of approaches<br />

and solvers, ranging from Max-SAT 4 and Pseudo-Boolean (PB) 5 ,toAnswerSetProgramming<br />

(ASP) [12, 13]. Then, we implement our ideas by modifying the DTP solver<br />

TSAT++, a well-known and efficient solver for solving DTPs, and call the resulting<br />

system TSAT#. We finally evaluate its performance on randomly generatedDTPs,<br />

using a well-known generation method from [21] extended with randomlygenerated<br />

weights. We focus our analysis on TSAT#, as representative of thesolversimplementing<br />

the lazy SAT-based approach to DTPs, and consider Max-SAT andPBsolversas<br />

back-engines, as well as two optimization techniques that proved effective for solving<br />

DTPs. Our preliminary results show that the Max-SAT solver AKMAXSAT performs<br />

well on these benchmarks, and that the employed optimization techniqueshelptoreducing<br />

the search time.<br />
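The three steps above form a lazy loop; a minimal sketch (our own illustrative code, not the TSAT# implementation; the Boolean optimization back-engine `optimize` and the arithmetic consistency check `theory_check` are assumed to be supplied):<br />

```python
def solve_max_dtp(clauses, weights, optimize, theory_check):
    """Lazy SAT-based loop for Max-DTP (sketch).

    clauses: soft CNF clauses abstracting the DTP constraints
    weights: reward of each soft clause
    optimize: Boolean optimization back-engine (e.g. a Max-SAT solver);
              returns an assignment maximizing the weighted sum of
              satisfied soft clauses subject to the hard clauses, or None
    theory_check: checks arithmetic consistency of the difference
                  constraints selected by the assignment; returns
                  (True, None) or (False, reason_clause)
    """
    hard = []                                   # reasons learned from failed checks
    while True:
        mu = optimize(clauses, weights, hard)   # step (ii)
        if mu is None:
            return None                         # no feasible abstraction left
        ok, reason = theory_check(mu)           # step (iii)
        if ok:
            return mu
        hard.append(reason)     # block assignments repeating the conflict
```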

2 Formal Background<br />

Disjunctive Temporal Problems. Temporal constraints have been introduced in [8], as<br />
an extension of the Simple Temporal Problem (STP), which consists of a conjunction<br />
of difference constraints. Let V be a set of symbols, called variables. A difference constraint,<br />
or simply constraint, is an expression of the form x − y ≤ c, where x, y ∈ V,<br />
and c is a numeric constant. A DTP formula, or simply formula, is a combination of<br />
constraints via the unary connective “¬” for negation and the n-ary connectives “∧”<br />
and “∨” (n ≥ 0) for conjunction and disjunction, respectively. A constraint literal, or<br />
simply literal, is either a constraint or its negation. If a is a constraint, then ā abbreviates<br />
¬a and ¬ā stands for a. Let the set D (domain of interpretation) be either the set of<br />
the real numbers R, or the set of integers Z. An assignment is a total function mapping<br />
variables to D. Let σ be an assignment and φ be a formula. Then σ |= φ (σ satisfies the<br />
formula φ) is defined as follows.<br />

σ |= x − y ≤ c if and only if σ(x) − σ(y) ≤ c,<br />
σ |= ¬φ if and only if it is not the case that σ |= φ,<br />
σ |= (φ1 ∧ · · · ∧ φn) if and only if for each i ∈ [1, n], σ |= φi, and<br />
σ |= (φ1 ∨ · · · ∨ φn) if and only if for some i ∈ [1, n], σ |= φi.<br />

If σ |= φ then σ will also be called a model of φ. We also say that a formula φ is<br />
satisfiable if and only if there exists a model for it.<br />
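The satisfaction relation can be transcribed directly; a minimal sketch (our own representation: a formula is a constraint triple ('leq', x, y, c), a negation ('not', f), or an n-ary conjunction/disjunction):<br />

```python
def satisfies(sigma, phi):
    """Evaluate sigma |= phi for DTP formulas.

    sigma: dict mapping variables to numbers (the assignment)
    phi:   ('leq', x, y, c)       for the constraint x - y <= c
           ('not', f)             for negation
           ('and', f1, ..., fn) / ('or', f1, ..., fn)
    """
    op = phi[0]
    if op == 'leq':
        _, x, y, c = phi
        return sigma[x] - sigma[y] <= c
    if op == 'not':
        return not satisfies(sigma, phi[1])
    if op == 'and':
        return all(satisfies(sigma, f) for f in phi[1:])
    if op == 'or':
        return any(satisfies(sigma, f) for f in phi[1:])
    raise ValueError("unknown connective: %r" % (op,))
```

For example, σ = {x: 3, y: 1} satisfies x − y ≤ 2 but not x − y ≤ 1; the empty conjunction (n = 0) is true, matching the definition above.<br />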

4 http://www.maxsat.udl.cat/.<br />

5 See, e.g., http://www.cril.univ-artois.fr/PB10/.<br />



A DTP is the problem of deciding whether a formula is satisfiable or not in the<br />
given domain of interpretation D. Notice that the satisfiability of a formula depends on<br />
D: e.g., the formula x − y &gt; 0 ∧ x − y &lt; 1 is satisfiable if D = R, but unsatisfiable if D = Z.<br />


It is a well-known fact that Bellman-Ford (BF) can be used in step 3 to check the satisfiability of a<br />
finite set Q of constraints of the form x − y ≤ c. This is done by first building a constraint<br />
graph for Q, see, e.g., [6]. The soundness and completeness of the algorithm is<br />
guaranteed by the soundness and completeness of the underlying solving procedure for<br />
solving DTPs, i.e., the solving procedure for φ (shown in [4]), and by the soundness<br />
and completeness of the Boolean optimization procedures employed. For solving step<br />
2, a wide range of formulations, solving procedures and techniques can be employed,<br />
e.g., Weighted Max-SAT, Pseudo-Boolean, and ASP.<br />
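A sketch of this standard construction (our own code, following [6], not TSAT#'s): each constraint x − y ≤ c becomes an edge from y to x with weight c, and Q is consistent if and only if the constraint graph has no negative cycle, which Bellman-Ford detects in polynomial time; the constraints on a detected negative cycle also yield a “reason” for the inconsistency:<br />

```python
def check_difference_constraints(Q):
    """Check arithmetic consistency of constraints (x, y, c) meaning x - y <= c.

    Builds the constraint graph (edge y -> x with weight c), runs
    Bellman-Ford with an implicit virtual source (all distances 0), and
    returns (True, None) if consistent, or (False, cycle_constraints)
    listing the constraints on one negative cycle otherwise.
    """
    nodes = {v for (x, y, _) in Q for v in (x, y)}
    dist = {v: 0 for v in nodes}      # virtual source: all distances start at 0
    pred = {v: None for v in nodes}   # relaxing *constraint* that set dist[v]
    for _ in range(len(nodes)):
        changed = False
        for con in Q:
            x, y, c = con
            if dist[y] + c < dist[x]:
                dist[x] = dist[y] + c
                pred[x] = con
                changed = True
        if not changed:
            return True, None         # fixpoint reached: consistent
    # an edge still relaxes after |V| passes: a negative cycle exists
    x = next(x for (x, y, c) in Q if dist[y] + c < dist[x])
    for _ in range(len(nodes)):       # walk predecessors to land inside the cycle
        x = pred[x][1]
    cycle, v = [], x
    while True:
        con = pred[v]
        cycle.append(con)
        v = con[1]
        if v == x:
            return False, cycle
```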

In the next paragraphs we present two optimization techniques that can help to improve<br />
the performance of the basic algorithm presented in this section. However, there<br />
is one optimization of the basic procedure that is enabled by default in TSAT#: If the consistency<br />
check at step 3 does not succeed, we add a “reason” to the abstraction formula<br />
φ, i.e., a clause that prevents the employed solver from re-computing an assignment µ having<br />
the literals corresponding to the constraint literals that caused the arithmetic inconsistency<br />
assigned in the same way. Given the BF run, computing such a reason can be done efficiently,<br />
by considering the difference constraints involved in (one of the) negative cycles. Of<br />
course, methods for limiting the number of added reasons are needed, in order to keep<br />
the procedure working in polynomial space, e.g., given a positive integer b, by<br />
adding only the reasons that contain at most b literals.<br />

Optimizations. We herewith highlight two optimization techniques, one theory-dependent<br />
and one theory-independent, that proved to be effective for solving DTPs, and that<br />
can be fruitfully used with black-box engines. Their general idea is to reduce the enumeration<br />
of unfruitful assignments at a reasonable price. The first one, denoted by<br />
IS2, is a preprocessing step: For each unordered pair ⟨ci, cj⟩ of distinct difference constraints<br />
appearing in the formula φ and involving the same variables, all possible pairs<br />
of literals built out of them are checked for consistency. Whenever a pair of literals ⟨li, lj⟩ is inconsistent,<br />
the clause ¬li ∨ ¬lj is added to the input formula before calling TSAT#. The<br />
second technique, called “model reduction”, is based on the observation that an assignment<br />
µ generated by TSAT# can be redundant, that is, there might exist an assignment<br />
µ′ ⊂ µ that propositionally entails the input formula. When this is the case, we can<br />
check the consistency of µ′ instead of µ. Details for both techniques can be found in [4].<br />
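The IS2 consistency check for a pair of literals over the same variable pair reduces to intersecting intervals on the single quantity d = x − y (negating x − y ≤ c yields x − y &gt; c). A sketch with our own representation (over Z the strict bounds could be tightened to non-strict bounds at c + 1; here we keep strictness flags, as over R):<br />

```python
INF = float('inf')

def interval_of(literal):
    """Interval of d = x - y allowed by one literal over the pair (x, y).

    literal: (kind, c) with kind in {'le', 'gt', 'ge', 'lt'}:
      'le': x - y <= c    'gt': x - y > c   (negation of 'le')
      'ge': x - y >= c    'lt': x - y < c   (negation of 'ge')
    Returns (lo, lo_strict, hi, hi_strict).
    """
    kind, c = literal
    return {'le': (-INF, False, c, False),
            'gt': (c, True, INF, False),
            'ge': (c, False, INF, False),
            'lt': (-INF, False, c, True)}[kind]

def pair_inconsistent(l1, l2):
    """True iff two literals over the same variable pair cannot both hold."""
    lo1, ls1, hi1, hs1 = interval_of(l1)
    lo2, ls2, hi2, hs2 = interval_of(l2)
    # tightest bounds of the intersection; on ties a strict bound is tighter
    lo, lstrict = max((lo1, ls1), (lo2, ls2))
    hi, hstrict = min((hi1, not hs1), (hi2, not hs2))
    hstrict = not hstrict
    return lo > hi or (lo == hi and (lstrict or hstrict))
```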

4 Implementation and Experimental Analysis<br />

We have implemented TSAT# as an extension of the TSAT++ solver [3], by integrating<br />
some Max-SAT and PB solvers as back-engines for reasoning about Boolean optimization<br />
problems. Specifically, the employed solvers are: MINIMAXSAT ver. 1.0 [14],<br />
MINISAT+ ver. 1.14 [9], and AKMAXSAT [15], the version submitted to the last Max-<br />
SAT 2010 Competition. These are well-known solvers for Boolean optimization, and<br />
among the best Partial Weighted Max-SAT 6 and Pseudo-Boolean (focusing on the OPT-<br />
SMALL-INT 7 category) solvers in various Max-SAT and PB Evaluations and Competitions.<br />

6 The “partial” version of the problem, where both hard and soft clauses are present, is needed<br />
because the original clauses of the abstracted problem are soft, while the added ones are hard.<br />
7 We recall that this is a category of PB Evaluations and Competitions where (i) no constraint<br />
has a sum of coefficients greater than 2^20 (20 bits), and (ii) the objective function is linear.<br />

We remind that MINISAT+ is a PB solver, AKMAXSAT is a Max-SAT solver,<br />

while MINIMAXSAT accepts problems in both formalisms: Since it has been mainly<br />
evaluated on Max-SAT formulations, we rely on that format in our analysis. Given a<br />
CNF formula φ, and a function w 8 mapping each clause to a positive integer<br />
that represents its weight, the main implementation effort has been devoted to introducing<br />
weights and formulating the optimization problems in the Max-SAT and PB formats. For<br />
Max-SAT problems there is an immediate formulation, obtained by directly assigning weights<br />
to clauses, while PB problems need a “clause selector” to be added to each soft clause,<br />
and the optimization function to be defined over the clause selectors. In the following,<br />

given a Boolean optimization solver X,<br />

1. TSAT#(X) is TSAT# in plain configuration employing X for step 2;<br />

2. TSAT#+p(X) is TSAT# with model reduction enabled employing X for step 2;<br />

3. TSAT#+is(X) is TSAT# with IS2 preprocessing enabled employing X for step 2;<br />

4. TSAT#+is+p(X) is TSAT# with both model reduction and IS2 preprocessing enabled<br />

employing X for step 2.<br />
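The clause-selector construction described above can be sketched as follows (a hypothetical DIMACS-style encoding of our own, not the exact TSAT# output): each soft clause Ci of weight wi gets a fresh selector variable si, the hard clause Ci ∨ si is emitted, and the objective minimizes Σ wi·si, i.e., the weight of violated soft clauses:<br />

```python
def to_pseudo_boolean(soft_clauses, weights, hard_clauses, n_vars):
    """Encode weighted soft clauses as a PB problem via clause selectors.

    Literals are nonzero ints (DIMACS style: -3 means "not x3").
    Returns (constraints, objective): each constraint is a clause (a list
    of literals, read as "at least one true"); the objective is a list of
    (weight, selector_var) terms to minimize.
    """
    constraints = [list(c) for c in hard_clauses]   # hard clauses unchanged
    objective = []
    next_var = n_vars + 1
    for clause, w in zip(soft_clauses, weights):
        s = next_var                        # fresh selector variable s_i
        next_var += 1
        constraints.append(list(clause) + [s])  # C_i or s_i, now hard
        objective.append((w, s))            # pay w_i iff C_i is violated
    return constraints, objective
```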

About the benchmarks, we randomly generated Max-DTPs, using a well-known<br />
generation method from [21] extended with random weights. In particular, in our model<br />
Max-DTPs are randomly generated by fixing the number k of disjuncts per clause, the<br />
number n of arithmetic variables, a positive integer L such that all the constants are<br />
taken in [−L, L], and a positive integer w such that all the weights are taken in [1, w].<br />
Then, (i) the number of clauses m is increased to create bigger problems, (ii) for each<br />
tuple of values of the parameters, 10 instances are generated and then fed to the solvers,<br />
and (iii) the median of the CPU times is plotted against the m/n ratio. We fix k = 2,<br />
L = 100, w = 100, n = 5, 10, and the ratio m/n varying from 6 to 10. The lower<br />
bound of m/n has been fixed as the lowest positive integer for which there is a majority<br />
of unsatisfiable underlying DTPs. Further note that the DTP is already a “difficult”<br />
problem, and the analyses in the literature on DTPs have been performed on problems with<br />
a few tens of variables for the setting used in this paper 9: adding preferences further<br />
increases the difficulty. The timeout for each problem has been set to 1800s on a Linux<br />
box equipped with a Pentium IV 3.2GHz processor and 1GB of RAM.<br />
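The generation model just described can be sketched as follows (our own function and parameter names, following the description above):<br />

```python
import random

def random_max_dtp(m, n, k=2, L=100, w=100, seed=None):
    """Generate a random Max-DTP in the model of [21] plus random weights.

    m: number of clauses, n: arithmetic variables, k: disjuncts per clause;
    constants are drawn from [-L, L], weights from [1, w].
    Returns (clauses, weights): each clause is a list of k difference
    constraints (x, y, c) read as "x - y <= c" with x != y.
    """
    rng = random.Random(seed)
    clauses, costs = [], []
    for _ in range(m):
        clause = []
        for _ in range(k):
            x, y = rng.sample(range(n), 2)      # two distinct variables
            clause.append((x, y, rng.randint(-L, L)))
        clauses.append(clause)
        costs.append(rng.randint(1, w))         # random clause weight
    return clauses, costs
```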

Fig. 1 shows the results for TSAT# employing MINIMAXSAT (top), MINISAT+<br />
(middle) and AKMAXSAT (bottom), respectively, on randomly generated Max-DTPs<br />
with n = 5. For each plot, the left one considers real-valued variables, while the<br />
right one considers integer-valued variables. First, we can note that the optimization<br />
techniques described help to significantly improve the efficiency: Enhancing the plain<br />
TSAT# version with one of the techniques helps reduce the overall CPU time by<br />
around a factor of 3, while enabling both techniques in conjunction improves the performance<br />
by around one order of magnitude. Comparing the performance of the various<br />
Boolean optimization solvers employed, AKMAXSAT is clearly the best underlying<br />

8 w has been originally defined on DTPs. After the abstraction, there is a one-to-one correspondence<br />
between each DTP constraint and the related clause in φ. Thus, with a slight abuse of<br />
notation, we consider the optimization function to be defined on the clauses of φ.<br />
9 In [16], DTPs with many variables n are used, but the analysis is focused on problems with<br />
k &gt; 2 having ratios m/n such that a vast majority of instances are satisfiable.<br />




Fig. 1. Results of TSAT# employing MINIMAXSAT (top), MINISAT+ (middle), and AKMAXSAT<br />

(bottom) on random Max-DTPs with 5 real-valued (left) and integer-valued (right) variables.<br />

solver on these benchmarks among the ones analyzed: On the biggest instances, it gains<br />
around one order of magnitude over MINIMAXSAT, and more than a factor of 20 over<br />
MINISAT+. All considerations made hold with both real- and integer-valued variables.<br />
The intuition for the superior performance of AKMAXSAT is that, given the results<br />
of the 2010 Max-SAT Competition, AKMAXSAT seems to be very effective on randomly<br />
generated and synthetic benchmarks. Given our approach, the starting abstraction formula<br />
has the following structure: It is a fixed-clause-length formula where each variable occurs<br />
(with high probability) once in the formula. But this was not fully expected: During the<br />




Fig. 2. Results of TSAT# employing AKMAXSAT on random Max-DTPs with 10 real-valued<br />

(Left) and integer-valued (Right) variables.<br />

search, adding the constraints corresponding to reasons, the number of occurrences of literals<br />
increases, giving the formula a less “synthetic” structure.<br />

We increase the number of variables n to 10, and focus the analysis on TSAT#<br />
employing AKMAXSAT, i.e., our best Boolean optimization solver on these benchmarks.<br />
From Fig. 2 we can note that the impact of the optimization techniques is now different:<br />
Model reduction dramatically improves the performance of TSAT#(AKMAXSAT), while<br />
the impact of the preprocessing is limited in this setting.<br />

5 Conclusions and Future Work<br />

In this paper we have presented an approach to solving weighted maximum satisfiability<br />
on DTPs. The approach extends the one implemented in TSAT++, by employing<br />
Boolean optimization solvers as reasoning engines. The performance of the resulting<br />
system, TSAT#, employing some Max-SAT and PB solvers, is analyzed on randomly<br />
generated benchmarks, together with the impact that both theory-dependent and theory-independent<br />
optimization techniques have on its performance. The AKMAXSAT Max-SAT<br />
solver is the best option among the solvers analyzed, and both optimization techniques<br />
help to improve the overall performance. Current research includes (i) the integration<br />
of other solvers in TSAT#, e.g., the ASP solver CLASP [11], which proved to be<br />
very competitive at the 2009 PB Competition, (ii) the extension of our algorithm to<br />
deal with other forms of preferences, e.g., where weights can be associated to each difference<br />
constraint, for which it is still possible to rely on PB and ASP formalisms and<br />
systems, and (iii) a comparative analysis with rival tools in which such more complex<br />
preferences can be easily specified, e.g., MAXILITIS and HYSAT [10].<br />

Acknowledgments. We would like to thank Adrian Kügel for providing, and giving<br />
support for, AKMAXSAT, and Michael D. Moffitt and Bart Peintner for discussions<br />
about MAXILITIS and WEIGHTWATCHER.<br />



References<br />

1. J. Argelich, C. M. Li, F. Manyà, and J. Planes. The first and second Max-SAT evaluations.<br />

Journal of Satisfiability, Boolean Modelling and Computation, 4(2-4):251–278, 2008.<br />

2. A. Armando, C. Castellini, and E. Giunchiglia. SAT-based procedures for temporal reasoning.<br />
In Proc. of ECP 1999, volume 1809 of LNCS, pages 97–108. Springer, 1999.<br />

3. A. Armando, C. Castellini, E. Giunchiglia, M. Idini, and M. Maratea. TSAT++: An open<br />

platform for satisfiability modulo theories. ENTCS, 125(3):25–36, 2005.<br />

4. A. Armando, C. Castellini, E. Giunchiglia, and M. Maratea. The SAT-based approach to<br />

Separation Logic. Journal of Automated Reasoning, 35(1-3):237–263, 2005.<br />

5. M. Bozzano, R. Bruttomesso, A. Cimatti, T. A. Junttila, P. van Rossum, S. Schulz, and R. Sebastiani.<br />

MathSAT: Tight integration of SAT and mathematical decision procedures. Journal<br />

of Automated Reasoning, 35(1-3):265–293, 2005.<br />

6. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT<br />

Press, 2001.<br />

7. L. de Moura. http://yices.csl.sri.com/.<br />

8. R. Dechter, I. Meiri, and J. Pearl. Temporal constraint networks. Artificial Intelligence,<br />

49(1-3):61–95, Jan. 1991.<br />

9. N. Eén and N. Sörensson. Translating pseudo-Boolean constraints into SAT. Journal on<br />

Satisfiability, Boolean Modeling and Computation, 2:1–26, 2006.<br />

10. M. Fränzle, C. Herde, T. Teige, S. Ratschan, and T. Schubert. Efficient solving of large<br />
non-linear arithmetic constraint systems with complex Boolean structure. Journal on Satisfiability,<br />

Boolean Modeling and Computation, 1:209–236, 2007.<br />

11. M. Gebser, B. Kaufmann, A. Neumann, and T. Schaub. Conflict-driven answer set solving.<br />
In Proc. of IJCAI 2007, pages 386–392. Morgan Kaufmann Publishers, 2007.<br />
12. M. Gelfond and V. Lifschitz. The stable model semantics for logic programming. In Proc.<br />
of ICLP/SLP 1988, pages 1070–1080, 1988.<br />

13. M. Gelfond and V. Lifschitz. Classical negation in logic programs and disjunctive databases.<br />

New Generation Computing, 9:365–385, 1991.<br />

14. F. Heras, J. Larrosa, and A. Oliveras. MiniMaxSat: A new weighted Max-SAT solver. Journal<br />

of Artificial Intelligence Research (JAIR), 31:1–32, 2008.<br />

15. A. Kügel. Improved exact solver for the weighted Max-SAT problem. In Proc. of the 2nd<br />
Pragmatics of SAT (PoS-10) workshop, 2010.<br />
16. B. Nelson and T. K. S. Kumar. CircuitTSAT: A solver for large instances of the disjunctive<br />
temporal problem. In Proc. of ICAPS 2008, pages 232–239. AAAI Press, 2008.<br />
17. R. Nieuwenhuis and A. Oliveras. On SAT modulo theories and optimization problems. In<br />
Proc. of SAT 2006, volume 4121 of LNCS, pages 156–169. Springer, 2006.<br />
18. A. Oddi and A. Cesta. Incremental forward checking for the disjunctive temporal problem.<br />
In Proc. of ECAI-2000, pages 108–112, Berlin, 2000.<br />

19. A. Oddi, R. Rasconi, and A. Cesta. Project scheduling as a disjunctive temporal problem.<br />

In Proc. of ECAI 2010, volume 215 of Frontiers in Artificial Intelligence and Applications,<br />

pages 967–968. IOS Press, 2010.<br />

20. B. Peintner and M. E. Pollack. Low-cost addition of preferences to DTPs and TCSPs. In<br />
Proc. of AAAI 2004, pages 723–728. AAAI Press / The MIT Press, 2004.<br />
21. K. Stergiou and M. Koubarakis. Backtracking algorithms for disjunctions of temporal constraints.<br />
Artificial Intelligence, 120(1):81–117, 2000.<br />
22. O. Strichman, S. A. Seshia, and R. E. Bryant. Deciding Separation formulas with SAT. In<br />
Proc. of CAV 2002, volume 2404 of LNCS, pages 209–222. Springer, 2002.<br />

23. I. Tsamardinos and M. Pollack. Efficient solution techniques for disjunctive temporal reasoning<br />

problems. Artificial Intelligence, 151:43–89, 2003.<br />



Visualizing Learning Dynamics in Large-Scale<br />

Networks<br />

Manal Rayess 1 and Sherief Abdallah 1,2<br />

1 Faculty of Informatics<br />

The British University in Dubai<br />

P.O.Box 502216, Dubai,<br />

United Arab Emirates<br />

manal.rayess@gmail.com<br />

2 (Fellow) School of Informatics,<br />

University of Edinburgh<br />

Edinburgh, EH8 9LE, UK<br />

sherief.abdallah@buid.ac.ae<br />

Abstract. Learning in multiagent systems requires that agents change<br />

their behavior in an attempt to maximize their payoffs. This can result in<br />

the system having complex dynamics. Being able to visualize these complex<br />

dynamics is an important step toward understanding learning in<br />

multiagent systems. Previous work in this area either focused on small-scale<br />

theoretical analysis that is difficult to extend to large-scale networks,<br />

or used global performance metrics (such as the average payoff)<br />

as a rough approximation to the dynamics.<br />

In this paper we propose a new visualization methodology that combines<br />

network analysis with dimensionality reduction to visualize learning dynamics<br />

in large-scale networks of agents. First, the dynamics over the<br />

network are summarized using network measures, then we use dimensionality<br />

reduction to reduce the dimensions even further. We conduct a<br />

comparative study to investigate different network analysis measures and<br />

different dimensionality reduction techniques over different settings. The<br />

results confirm that using network analysis is beneficial for visualizing<br />

dynamics.<br />

1 Introduction<br />

Multiagent systems (MAS) are systems composed of multiple interacting intelligent<br />

agents, where an intelligent agent is a computational element that is capable<br />

of interacting with its environment (as well as with other agents). Learning in<br />

multiagent systems requires that agents change their behavior in an attempt to<br />

maximize their payoffs. This can result in the system having complex dynamics.<br />

Being able to visualize these complex dynamics is an important step toward<br />



understanding learning in multiagent systems. Previous work in this area either<br />

focused on small-scale theoretical analysis that is difficult to extend to large-scale<br />

networks, or used global performance metrics (such as the average payoff)<br />

as a rough approximation to the dynamics.<br />

Traditional techniques have relied heavily on the global performance metrics that<br />
the system is trying to optimize (such as the social welfare), or on other summarizing<br />
statistics of the local performance parameters (such as the average number<br />
of wins), to visualize the performance of multiagent systems. However,<br />
these techniques can overlook important information pertinent to the performance<br />
at the micro level, such as the malfunction of some agent (e.g., due to<br />
a disruption in its learning functionality). Moreover, experiments have shown<br />
that depending on the global performance alone can overlook hidden instability<br />
[1]. Another limitation of existing visualization techniques is that they<br />
are better suited to plotting the learning dynamics of few players<br />
(typically two) in games with a low-dimensional payoff matrix (typically 2×2<br />
or 3×3). A common way to compare the performance of different learning algorithms<br />
in higher-dimensional games is to list the results in a table, which<br />
is a non-visual technique. Examples of these existing techniques are surveyed<br />
in [10]. Figure 1 illustrates some of the traditional visualization techniques.<br />

The long-term goal of our research is to find a technique for visualizing the<br />
learning dynamics of adaptive agents in large-scale networks in a way that<br />
summarizes the interaction of agents over the network with as few<br />
parameters as possible, while remaining sensitive to the learning dynamics<br />
of individual agents. The work we present here provides the first step toward<br />
the above goal. We propose a visualization methodology that consists of two<br />
steps. In the first step we use network analysis measures to summarize agent<br />
interactions over the network into a smaller number of parameters that capture the<br />
network structure. In the second step we use dimensionality reduction to reduce<br />
the number of parameters to one or two dimensions that can be visualized. We<br />
summarize the contributions of this paper in the following points.<br />

1. A new visualization methodology that combines network analysis with dimensionality<br />

reduction.<br />

2. A comparative study of different combinations of dimensionality reduction<br />

techniques and social network measures.<br />

3. Extensive evaluation of our methodology to answer the following questions:<br />

RQ1. Can our visualization technique differentiate between different learning<br />

algorithms in a networked system?<br />

RQ2. Can our visualization technique capture disruptions in the learning of<br />
agents?<br />

RQ3. Can our visualization technique capture disruptions in the network<br />

structure?<br />



Fig. 1. Techniques used for visualizing the performance of learning algorithms in MAS.<br />
1) The policy trajectory plot is used to plot the performance of some learning algorithm<br />
for two players in a 2x2 game. Grayscales are used to indicate the direction of convergence.<br />
2) The simplex plot extends the trajectory plot to a 3x3 game matrix. 3) In<br />
the directional field plot a velocity field is computed to represent the derivative of the<br />
strategies of both players with respect to time. Velocities are displayed by arrows<br />
pointing in the direction of the derivative, and the length of the arrow indicates the absolute<br />
value of the derivative. 4) The cumulative reward plot indicates whether agents<br />
converge to the maximum social welfare profile in coordination (and anti-coordination)<br />
games of n players.<br />

2 Methodology<br />

We propose a two-steps visualization methodology: summarizing interactions<br />

over the network using network analysis measures then use dimensionality reduction<br />

to limit number of parameters to one or two. Dimensionality reduction<br />

is the process of reducing the number of features or parameters of a data set<br />

consisting of a large number of interrelated variables, called latent variables,<br />

while retaining as much as possible of the variation. The most common linear<br />

dimensionality reduction method is the Principal Component Analysis (PCA).<br />

PCA [8] works by transforming the original data set into a new set of uncorrelated<br />

variables (PCs) which are ordered so that the first few retain most of the<br />

variation in all of the original variables (PC1, PC2,...,PCn).<br />
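PCA can be sketched in a few lines via the eigendecomposition of the covariance matrix (our own illustrative numpy code, equivalent in spirit to what the MATLAB toolbox computes):<br />

```python
import numpy as np

def pca(X, n_components=2):
    """Project the rows of X onto the top principal components.

    X: (n_samples, n_features) data matrix.
    Returns (scores, components): scores are the low-dimensional
    coordinates, components the orthonormal PC directions.
    """
    Xc = X - X.mean(axis=0)                   # center each feature
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # sort PCs by explained variance
    components = eigvecs[:, order[:n_components]]
    return Xc @ components, components
```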

Network measures are functions that summarize a graph into numeric values<br />
to simplify the analysis of the network. A key network metric is the centrality<br />
of a node, which measures the extent to which a node is<br />
“central” in the network. A node is said to be more central than others if it has<br />
more ties, it can reach all others more quickly, or it controls the flow between<br />
the others. These three properties give rise to the three measures of node centrality:<br />
degree, closeness, and betweenness [3].<br />



These measures were generalized by later works to weighted networks [2,<br />
5, 7]. In [7] the number of ties, i.e., the degree, is also taken into consideration<br />
(besides the weights of the links) in computing weighted centrality. The relative<br />
importance of the number of links versus their weights can be tuned by a<br />
parameter α. Thus, the weighted degree combines both the degree and<br />
the strength of the node.<br />
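One common formulation of this tuned weighted degree, following [7], combines a node's degree k and strength s (sum of its tie weights) as k^(1−α) · s^α; a sketch of our own:<br />

```python
def weighted_degree(edges, alpha=0.5):
    """Tuned weighted degree centrality (degree-strength combination of [7]).

    edges: iterable of (u, v, weight) for an undirected weighted network.
    Returns {node: k**(1 - alpha) * s**alpha}, where k is the number of
    ties of the node and s the sum of its tie weights. alpha = 0 gives
    the plain degree, alpha = 1 the strength.
    """
    degree, strength = {}, {}
    for u, v, w in edges:
        for a in (u, v):
            degree[a] = degree.get(a, 0) + 1
            strength[a] = strength.get(a, 0.0) + w
    return {a: degree[a] ** (1 - alpha) * strength[a] ** alpha
            for a in degree}
```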

The remainder of this paper shows the experiments we conducted to test<br />
this visualization technique and discusses the results, before concluding by<br />
summarizing the achievements of this work.<br />

3 Experiments and Results<br />

In order to apply the techniques that make up our visualization<br />
method, we made use of several available tools:<br />
the MATLAB dimensionality reduction toolbox for applying different dimensionality<br />
reduction techniques [4], tnet [6] for computing the weighted network measures,<br />
and NetLogo [12], a multiagent programmable modeling environment, for building<br />
a platform for testing the proposed visualization technique on<br />
networks of adaptive agents.<br />

Before testing our visualization technique on networks of intelligent agents,<br />
we carried out several pilot experiments to compare and identify suitable combinations<br />
of a dimensionality reduction technique and social network measure(s). 3<br />
We concluded the following from the pilot experiments: a) PCA gives good<br />
results in the least computational time; b) using PCA with one or two target dimensions<br />
on a combination of social network measures that summarize the interactions<br />
in some network reveals patterns that could not be revealed using the raw<br />
network dynamics, i.e., without using network measures; and c) using weighted<br />
degree centrality (in-degree and/or out-degree) is sufficient to produce good<br />
visualization results.<br />

The remainder of this section shows how we used this visualization technique<br />

on networks of adaptive agents.<br />

3.1 Experimental Settings<br />

We test the proposed visualization technique on large networks with nodes representing<br />

agents that involve in playing some game and learn their actions. We<br />

pick the ”battle of the sexes” game and we implement two learning algorithms,<br />

namely the Q-learning [<strong>11</strong>] and the Infinitesimal Gradient Ascent (IGA) [9].<br />

The Battle of the Sexes is a sample of a coordination game. The game can<br />

be defined by the following payoff matrix:<br />

        I     F<br />
I     4,7   0,0<br />
F     3,3   7,4<br />

3 The details of the pilot experiments are described in a technical report.<br />



For the sake of testing the visualization technique on networks of adaptive<br />
agents, we developed a platform using NetLogo. In this platform, agents are<br />
situated in a network with a pre-specified number of nodes and average node<br />
degree. The network is weighted in a manner similar to weighted social networks,<br />
where weights typically represent the amount of communication between the<br />
corresponding nodes. In every time step each player has to learn two things:<br />
what action to play (one action among all players) and which neighbor(s) to<br />
play with. Learning the action uses the learning algorithm set by the<br />
user (Q-learning or IGA), whereas learning which neighbor to play with always uses<br />
Q-learning. The output of each run is the edge list corresponding to the evolving<br />
weighted network.<br />
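The per-step learning of each player can be sketched as follows (a simplified sketch with our own names, using the ε-greedy Q-learning parameters reported in the caption of Fig. 2; this is not the authors' NetLogo code):<br />

```python
import random

class QLearner:
    """Tabular epsilon-greedy Q-learning over a fixed set of choices."""

    def __init__(self, choices, alpha=0.1, gamma=0.9,
                 epsilon=1.0, decay=0.98, rng=None):
        self.q = {c: 0.0 for c in choices}
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.decay = epsilon, decay
        self.rng = rng or random.Random()

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.q))      # explore
        return max(self.q, key=self.q.get)            # exploit

    def update(self, choice, reward):
        # stateless update: target is reward + gamma * best estimated value
        best = max(self.q.values())
        self.q[choice] += self.alpha * (reward + self.gamma * best
                                        - self.q[choice])
        self.epsilon *= self.decay                    # decay exploration

# each agent would keep one learner for its action ('I'/'F') and, as
# described above, a second Q-learner for picking a neighbor to play with:
action_learner = QLearner(['I', 'F'])
```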

3.2 Results for Question 1: Can it distinguish different learning<br />
algorithms?<br />

We conducted different runs using both learning algorithms (Q-learning and IGA),<br />
and then applied the dimensionality-reduced network-metric visualization<br />
technique to these runs, as shown in Figure 2.<br />
We can clearly observe the distinction made by this visualization technique<br />
between the two learning algorithms.<br />

Fig. 2. Result of PCA (with target dimension 1) on weighted closeness, for weights corresponding to values of the iterator (the Q vector in Q-learning and the policy in IGA), in 10 different runs on a network of 5 nodes and average node degree 2. The game played is Battle of the Sexes. Parameters of the learning algorithms: Q-learning (ε starts at 1 and decays iteratively by a factor of 0.98, α = 0.1, γ = 0.9); IGA (η starts at 0.03 and decays iteratively by a factor of 0.99).

3.3 Results for Question 2: Can it capture disruptions in learning?

When learning is disrupted at some stage for some agent, that agent simply stops learning. This is simulated by having it play a random action in each forthcoming stage; it thus stops maintaining the values of its learning parameters. We examined different runs for adaptive agents situated in random networks of different sizes (number of nodes × average node degree), ranging from 5 × 3 to 100 × 4, where the learning of random node(s) is disrupted at different times after convergence. We then computed the degree centrality for the resulting edge list, on which we used PCA to reduce the dimensionality. We plot the results of the dimensionality reduction in 2D and mark the point of disruption in each run, as shown in Figure 3.
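The pipeline just described, compute a weighted degree centrality for every snapshot of the evolving network and then project the resulting high-dimensional vectors with PCA [8], can be sketched as below. We use Opsahl et al.'s generalized degree centrality k^(1−α)·s^α [7] with α = 0.5, as in Figures 3 and 4; the toy network and all names are our own illustration.

```python
import numpy as np

def weighted_degree(edges, n, alpha=0.5):
    """Opsahl-style out-degree centrality k**(1-alpha) * s**alpha
    for nodes 0..n-1, given (src, dst, weight) triples [7]."""
    k = np.zeros(n)  # number of outgoing links
    s = np.zeros(n)  # total outgoing weight (strength)
    for src, dst, w in edges:
        k[src] += 1
        s[src] += w
    return k ** (1 - alpha) * s ** alpha

def pca(snapshots, dims=2):
    """Project one centrality vector per time step onto `dims` axes."""
    X = np.asarray(snapshots, dtype=float)
    X -= X.mean(axis=0)                     # center each node's series
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:dims].T                  # principal-component scores

# toy evolving 5-node network: weights grow over 10 time steps
rng = np.random.default_rng(0)
base = [(i, (i + 1) % 5, 1.0) for i in range(5)]
snaps = [weighted_degree([(a, b, w + 0.1 * t + rng.random())
                          for a, b, w in base], n=5) for t in range(10)]
coords = pca(snaps, dims=2)
print(coords.shape)  # one 2-D point per time step
```

A disruption shows up in such a plot as a break in the otherwise smooth trajectory of the per-time-step points.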

Fig. 3. Results of PCA (2 dimensions) on weighted degree (in-degree and out-degree with α = 0.5) corresponding to weighted networks of different node counts and average link degrees. The learning of 20% of the nodes, chosen at random, is disrupted at different times after convergence. The learning algorithm is Q-learning.

3.4 Results for Question 3: Can it capture disruptions in network structure?

When a link is disrupted, i.e. disconnected, at some stage, it no longer takes part in games between neighbors and, hence, carries a zero payoff. As in the previous section, we plot the results of the weighted degree centrality of different runs, where in each run one or more random links are disconnected after convergence. Figure 4 illustrates these plots in 2D, where each plot corresponds to a different run and the point of disruption is circled in red.

3.5 Results for Question 4: Can it be used to identify the type and source of disruption?

If we can distinguish between different types of disruption and identify the source of a disruption, then our visualization technique can be used as a means of explanatory data analysis for multiagent systems.



Fig. 4. Results of PCA (2 dimensions) on weighted degree (in-degree and out-degree with α = 0.5) corresponding to different weighted networks. Random links are disrupted at different times after convergence. The learning algorithm is IGA.

As an experiment, we plotted over time the weighted node degree centrality values computed for the previous runs, and we could observe that the network metrics of disrupted nodes show different trends from those of undisrupted nodes (see, for example, Figure 5). However, in the experiments we conducted it was not clear how to distinguish between different types of disruption.

Fig. 5. Weighted degree centrality of six nodes plotted over time; three of them (in red) have their learning disrupted at time 20.

4 Conclusion and Future Work

In this research we aimed at finding a technique for visualizing the learning dynamics in large-scale networks of multiagent systems, one capable of summarizing the global performance of the whole system (the macro level) by as few parameters as possible, while remaining sensitive to the learning of individual agents (the micro level). To this end we proposed combining dimensionality reduction with weighted network metrics as a means of visualizing the performance of networks of adaptive agents. The results of the experiments have confirmed several claims made in this study.

Much can be done in future work to consolidate these findings and build on them. For example, we can extend the testing to other games (such as the Prisoner's Dilemma and Matching Pennies) and other learning algorithms. Besides trying different tunings and parameters, one important extension of this work is to test whether the technique can be used to explain the performance of the system, for example by unambiguously identifying the type and source of a disruption, should one occur.

References

1. S. Abdallah. Using graph analysis to study networks of adaptive agent. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1, AAMAS '10, pages 517–524, Richland, SC, 2010. International Foundation for Autonomous Agents and Multiagent Systems.
2. A. Barrat, M. Barthélemy, R. Pastor-Satorras, and A. Vespignani. The architecture of complex weighted networks. Proceedings of the National Academy of Sciences of the United States of America, 101(11):3747–3752, March 2004.
3. L. C. Freeman. Centrality in social networks: conceptual clarification. Social Networks, 1(3):215–239, 1978–1979.
4. MathWorks. Matlab Toolbox for Dimensionality Reduction. http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html, November 2010.
5. M. E. J. Newman. Analysis of weighted networks. Physical Review E, 70:056131, 2004.
6. T. Opsahl. Structure and Evolution of Weighted Networks. PhD thesis, University of London, London, UK, 2009. pp. 104–122.
7. T. Opsahl, F. Agneessens, and J. Skvoretz. Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks, 32(3):245–251, July 2010.
8. K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(6):559–572, 1901.
9. S. P. Singh, M. J. Kearns, and Y. Mansour. Nash convergence of gradient dynamics in general-sum games. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, UAI '00, pages 541–548, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc.
10. H. van den Herik, D. Hennes, M. Kaisers, K. Tuyls, and K. Verbeeck. Multi-agent learning dynamics: A survey. In Cooperative Information Agents XI, Lecture Notes in Computer Science, pages 36–56. Springer Berlin / Heidelberg, September 2007.
11. C. Watkins. Learning from Delayed Rewards. PhD thesis, University of Cambridge, England, 1989.
12. U. Wilensky. NetLogo. http://ccl.northwestern.edu/netlogo/. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL, 1999.



ACO algorithms for solving a new Fleet Assignment Problem

Javier Diego Martín 1, Ignacio Rubio Sanz 1, Miguel Ortega-Mier 1, and Álvaro García-Sánchez 1

1 Unidad Docente de Organización de la Producción, Escuela Técnica Superior de Ingenieros Industriales, Universidad Politécnica de Madrid, Spain
jdiego@gesfor.es, ignacio.rubio.sanz@gmail.com, miguel.ortega.mier@upm.es, alvaro.garcia@upm.es

Abstract. The Fleet Assignment Problem (FAP), which transportation companies have to deal with, consists in deciding the fleet size and assigning a type of vehicle to a set of scheduled trips in order to minimize the total operational costs. In this paper we propose a new model for the FAP, referred to as the new flexible model for the fleet assignment problem. We have developed an Ant Colony Optimization (ACO) method, analyzed its performance, and compared it with Branch & Bound on a wide range of instances.

1 Introduction

In passenger transportation, how vehicles are assigned to trips is very important. Effective and efficient management of the vehicle fleet has a significant impact on a company's costs and thus heavily influences the profit achieved. The Fleet Assignment Problem (FAP) addresses this issue: the goal is to find the optimal fleet dimension in order to meet the transport demand. Although the FAP was initially developed to solve problems within the aeronautical industry, we propose a new model that may be applied in different transportation contexts.

Most of the problems presented in the literature are Mixed Integer Programming (MIP) models with several compromises, as can be found in [4], [1] or [5]. We have developed a new FAP model from those models. For large instances, solving times for the FAP model using Branch & Bound are too long; we therefore developed an ACO method. The FAP can be easily formulated on a graph, and ACO is a well-known constructive metaheuristic suitable in that context. We have checked the solution quality obtained and analysed the execution time of ACO in some particular cases.

The rest of the paper is organized as follows. In section 2 the problem is stated in detail. In section 3, we describe the ACO algorithms for addressing the

Proceedings of the 18th RCRA workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion (RCRA 2011).
In conjunction with IJCAI 2011, Barcelona, Spain, July 17-18, 2011.



problem. The results of the experiments, using both a general solver and the ACO framework, are given in section 4, along with the main conclusions.

2 Problem description

A trip is a journey from an origin to a destination with a unique code. For example, a trip could be a flight from JFK to LAX with code AA1203.

Each trip can be scheduled in one of several departure time windows. This means that trip AA1203 might depart on Monday between 9am and 11am, or on Monday between 1pm and 3pm. Here we have considered a uniqueness constraint, i.e., one and only one of the eligible time windows must be assigned. Additionally, a duty is a tuple defined by a trip and a time window. In the previous example, two duties (AA1203, 9–11am) and (AA1203, 1–3pm) exist, and one and only one must be active.

There may exist time-precedence relationships among trips. For example, it might be necessary to impose that a trip cannot depart later than a certain amount of time after another trip's departure. For this requirement to be met, a constraint set 'earlier than' was defined. Finally, two duties can be scheduled so that the time elapsed between their respective departure times must equal some particular value.

Given a set of trips, a set of windows, and the eligible duties, the problem consists in: 1) defining the departure time for every trip (which implies using a particular window) and 2) assigning a type of vehicle to each of them, so that the total cost is minimized.

A solution to the problem consists of a set of rotations over a predefined time horizon, where a rotation is the sequence of duties that a particular vehicle will perform. Moreover, rotations must be cyclic, meaning that the number of vehicles of every type at the beginning of the time horizon must be the same as at its end. For example, suppose that at the beginning of the time horizon there are three MD80s and two A320s at JFK; the rotations are cyclic if at the end of the time horizon there are again three MD80s and two A320s at JFK, and not cyclic otherwise.
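The cyclicity requirement above amounts to comparing two multisets of (location, vehicle type) pairs, one at the start and one at the end of the horizon. A minimal sketch, with data and names of our own invention based on the JFK example:

```python
from collections import Counter

def is_cyclic(start_fleet, end_fleet):
    """A set of rotations is cyclic iff, per (location, vehicle type) pair,
    the vehicle counts at the start and end of the horizon match."""
    return Counter(start_fleet) == Counter(end_fleet)

# the example from the text: three MD80s and two A320s at JFK
start = [("JFK", "MD80")] * 3 + [("JFK", "A320")] * 2
end_ok = [("JFK", "A320")] * 2 + [("JFK", "MD80")] * 3
end_bad = [("JFK", "MD80")] * 2 + [("LAX", "MD80")] + [("JFK", "A320")] * 2

print(is_cyclic(start, end_ok))   # True
print(is_cyclic(start, end_bad))  # False
```

Using a `Counter` makes the check independent of the order in which vehicles finish their rotations.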

3 ACO for solving the FAP

A graph associated with the problem must be defined (nodes and edges). Every node represents a duty assigned to a type of vehicle. There are as many nodes as assignment possibilities, so every node is associated with exactly one time window and one vehicle type. Every edge linking two nodes represents two possible subsequent duties performed by a particular type of vehicle. For an edge to exist linking nodes i and i + 1, the following conditions must be met:

– The vehicle that performs the duty referred to by node i + 1 is the same as that for node i.



whose goal is to adjust and propagate the time domains of the duties. The constraint propagation engine recalculates all the time windows (minimum and maximum departure times for each duty) and, if one of these values changes, we make sure the candidate node can be assigned without affecting the pre-existing duties of the rotation.
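The construction step shared by the ACO algorithms of this paper, an ant extending its rotation by probabilistically choosing the next feasible node from pheromone and heuristic information, follows the standard Ant System random-proportional rule [3]. The sketch below is our own illustration, not the authors' implementation; `feasible`, `tau` and `eta` are assumed inputs, and we note that the paper fixes α to 1 to avoid stagnation.

```python
import random

def select_node(feasible, tau, eta, alpha=1.0, beta=3.0):
    """AS random-proportional rule: pick node j with probability
    proportional to tau[j]**alpha * eta[j]**beta."""
    weights = [tau[j] ** alpha * eta[j] ** beta for j in feasible]
    total = sum(weights)
    r = random.random() * total
    acc = 0.0
    for j, w in zip(feasible, weights):
        acc += w
        if r <= acc:
            return j
    return feasible[-1]  # numerical safety net

# toy example: three candidate duties with equal pheromone;
# node 2 is much cheaper (higher heuristic value eta = 1/cost)
random.seed(1)
tau = {0: 0.5, 1: 0.5, 2: 0.5}
eta = {0: 0.1, 1: 0.1, 2: 1.0}
picks = [select_node([0, 1, 2], tau, eta) for _ in range(1000)]
print(picks.count(2) / 1000)  # the cheap node dominates
```

In the full algorithm, the feasibility of each candidate node would first be checked by the constraint propagation engine described above.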

4 Experimental results

Although we have developed a new flexible model for the FAP, in our experiments we have only studied airline instances, where trip durations are not variable. In addition, we have established the minimum time aircraft must remain in airports, the cost of using an aircraft, and the costs of the duties, all set according to the customer's experience. We present the parameters used for the experimentation in Table 1.

Table 1. Vehicle parameters. In homogeneous instances, the vehicle used is the MD80.

Name  Minimal Scale (min)  Fixed Cost ($)  Cost/min
MD80  45                   21,000          35
A320  60                   32,000          53.3
B747  90                   66,000          110

The computational study was carried out on an Intel Core Duo, 1.83 GHz, 2 GB RAM computer. The MIP problems were modeled using AIMMS and solved with CPLEX v12.2. We created a new program that generates instances for the problem, since we did not find benchmark instances reflecting the reality modeled by the new FAP model. We studied instances with 50, 250, 500, 1000 and 2000 trips. The ACO parameters were defined following [3] and [2]; a summary is shown in Table 2. The termination condition is the number of iterations, fixed according to the instance size and the execution time of the algorithm: we considered 500 iterations appropriate for small instances (up to 500 trips) and 750 iterations for large ones (over 500 trips). We ran each algorithm 10 times. Finally, the ACO parameter α is always 1, regardless of the algorithm, in order to avoid a stagnation state.

The first analysis consists in ranking the algorithms by how far they deviate from the value of the optimal solution. For every algorithm, we calculated the average error over all instances and all ACO parameters. The results are shown in Table 3. The best algorithms for solving the FAP, regardless of the nature of the problem, are ASRank, MMAS and EAS.

The instance tables display the following information: N.Trips is the number of trips to be assigned, tLP is the execution time (in seconds) to obtain either the optimal solution of the instance or a feasible solution, β is an ACO parameter,



Table 2. ACO parameters: Algorithm is the ACO algorithm implemented, β is a parameter, ρ is the evaporation rate, τ0 is the initial pheromone value, na is the number of ants, SelectNode is the rule used for assigning nodes, and q0 is a constant value.

Algorithm       β        ρ     τ0            na         SelectNode (q0)
EAS             1, 3, 5  0.5   0.5           5, 10, 20  AS
ASRank          1, 3, 5  0.5   0.5           5, 10, 20  AS
MMAS            1, 3, 5  0.8   τmax = 10^20  5, 10, 20  AS
ACS             1, 2, 5  0.1   0.5           5, 10      ACS (0.9, 0.8)
HyperCube (HC)  1, 2, 5  0.5   0.5           5, 10      ACS (0.9)
HCMMAS          1, 2, 5  0.05  0.5           5, 10      ACS (0.9)

Table 3. Performance of the ACO algorithms solving the FAP

Algorithm  Average solution error (%)
EAS        10.74
ASRank     9.53
MMAS       10.05
ACS        14.76
HC         17.99
HCMMAS     11.63

Algorithm is the ACO algorithm used; x is the average of the best solutions over the 10 runs; (%) is the error of the ACO solution relative to the solution obtained with the MIP model; σx is its deviation; s_bs and s_ws are the best and the worst solutions found among the best solutions; it is the average iteration at which the best solution was found in each execution; and t is the average execution time of the algorithm in seconds.

For instances with homogeneous fleets and a single time window per trip, the results do not depend on the ACO algorithm used: ACO outperformed Branch and Bound, since we usually obtained the optimal solution with short execution times. Moreover, we noticed that the performance of the ACO algorithms improves when β > 1, regardless of the number of ants. For the other cases studied, although the experimentation was very extensive, we only show the most remarkable experimental results, in Tables 4, 5 and 6.

From the experimental results obtained, we can conclude:

– Low-medium size instances:

1. For heterogeneous fleet problems with single time windows per trip, ACO is more time consuming than Branch and Bound. Moreover, ACO does not attain the optimal solution, deviating on average 6% from the optimum.

2. For homogeneous fleet problems with multiple time windows, ACO is 80 times faster than Branch and Bound, with an approximate error of 10% with respect to the linear programming solutions. This makes ACO the first recommended method for solving the problem.



Table 4. Heterogeneous fleet and a single time window for each trip. When the number of trips is 2000, the number of ants is 5. The symbol ∗ means that, for this trip size and type of instance, Branch and Bound returns a feasible solution, not the optimal one. The symbol - indicates that Branch and Bound could not solve the instance.

N.Trips  tLP   β  Algorithm  x (%)            σx         s_bs      s_ws      it      t
50       1.42  3  EAS        634565 (0.00)    0          634565    634565    6.5     5.3
                  ASRank     635503 (0.15)    2812.5     634565    643940    4.8     5.3
                  MMAS       634565 (0.00)    0          634565    634565    46.6    5.2
               5  EAS        634565 (0.00)    0          634565    634565    7.5     5.4
                  ASRank     636440 (0.30)    3750       634565    643940    7.3     5.2
                  MMAS       635503 (0.15)    2812.5     634565    643940    26.2    5.3
250      3.35  3  EAS        3165630 (4.23)   23118.5    3135360   3211500   341.5   45.7
                  ASRank     3117160 (2.63)   24714.2    3087500   3164300   210.8   44.9
                  MMAS       3147020 (3.62)   47031.2    3092440   3271800   368     44.4
               5  EAS        3194600 (5.18)   21904.8    3166820   3238760   295.9   44.3
                  ASRank     3116290 (2.60)   20152      3099630   3171840   364     42.8
                  MMAS       3113990 (2.53)   16366      3078910   3142250   376.5   42.3
500      10.6  3  EAS        6396950 (9.37)   41116.3    6336570   6461500   377.8   224.7
                  ASRank     6218530 (6.32)   59074.7    6127820   6306470   441.1   217.8
                  MMAS       6260250 (7.04)   65356.8    6187330   6375870   534.6   224.1
               5  EAS        6411280 (9.61)   47359.2    6330340   6503270   572.3   223.3
                  ASRank     6246110 (6.79)   41420.8    6187990   6311440   448.8   215
                  MMAS       6201050 (6.02)   65617.7    6088480   6305690   436.8   213.5
1000     38    3  EAS        13117700 (8.71)  54935.5    13043400  13182200  552.5   662.6
                  ASRank     12760100 (5.75)  89145.5    12666700  12939500  486.9   655.4
                  MMAS       12953300 (7.35)  30180.2    12888800  12976900  467.7   723.8
               5  EAS        13103800 (8.60)  55938.8    13018500  13207400  351.3   665.3
                  ASRank     12782100 (5.93)  103109     12643000  1297400   613.7   728.2
                  MMAS       12703000 (5.27)  32569.3    12600500  12810800  558.7   716.8
2000∗    -     3  EAS        25913560         106155.8   25752500  26029400  456.6   1382.4
                  ASRank     25729980         40853.8    25673200  25770000  534.8   1278
                  MMAS       25579380         64266.7    25472100  25660100  440.4   1056.8
               5  EAS        25931160         156627.37  25695300  26116100  495     1337.4
                  ASRank     25767350         112500.7   25594900  25890800  471.75  1224.5
                  MMAS       24967760         56325.3    24896200  25032200  560.8   1488.2



Table 5. Homogeneous fleet and multiple time windows for each trip. The symbol ∗ means that, for this number of trips and type of instance, B&B returns a feasible solution, not the optimal one. If the % is negative, the solution obtained by ACO is that % better than the solution obtained with the MIP model.

N.Trips  tLP       β  Algorithm  x (%)           σx    s_bs  s_ws  it     t
50       2.17      3  EAS        8.2 (2.50)      0.60  8     9     18.9   5.3
                      ASRank     8.1 (1.25)      0.54  8     9     47.2   5
                      MMAS       8.2 (2.50)      0.40  8     9     51.3   6.2
                   5  EAS        8.7 (8.7)       0.64  8     10    15.7   5.6
                      ASRank     8.4 (5.00)      0.30  8     9     2.3    5.7
                      MMAS       8.2 (2.50)      0.75  8     9     22.6   6
250      4504.8    3  EAS        36.9 (11.82)    1.14  35    39    154    66.7
                      ASRank     35.6 (7.88)     0.92  34    37    80.6   65.4
                      MMAS       35.4 (7.27)     0.92  34    36    80.2   71
                   5  EAS        37.1 (12.42)    1.04  35    39    64.4   64.9
                      ASRank     36.1 (9.39)     1.04  34    37    43     63.6
                      MMAS       36.2 (9.70)     1.47  34    39    166.3  64.8
500∗     23631.41  3  EAS        74.2 (12.42)    1.33  72    77    144.2  231.4
                      ASRank     71.1 (7.73)     1.51  69    74    125.3  227.9
                      MMAS       71.3 (8.03)     1.62  69    75    146.3  231.1
                   5  EAS        74.8 (13.33)    1.60  71    77    146.7  232.9
                      ASRank     71.8 (8.79)     0.87  70    73    203.7  225.7
                      MMAS       71.6 (8.48)     1.50  69    74    182.6  220
1000∗    4522.21   3  EAS        143.1 (-85.68)  1.45  140   145   147.9  619
                      ASRank     137 (-86.29)    0.89  135   138   353.5  833.1
                      MMAS       138.7 (-86.12)  1.42  136   141   291.6  846.4
                   5  EAS        144 (-85.59)    2.10  142   149   239.2  634.7
                      ASRank     139.8 (-86.01)  1.89  135   142   162.5  838.5
                      MMAS       140 (-85.99)    3.10  135   145   197.4  811.6

3. Finally, for heterogeneous fleet problems with multiple time windows, ACO is 10 times faster than Branch and Bound, but the solution cost is 11% worse with ACO.

– Large size instances:
ACO should be the technique of choice, because it finds reasonably good solutions in acceptable execution times, whereas B&B usually could not attain any feasible solution, as the computer ran out of RAM.

Finally, we have noted that the ACO metaheuristic is very dependent on its parameters, especially the evaporation rate ρ, the β parameter, and the number of ants na; performance improves as the last two grow.

Acknowledgements

This work partly stems from the participation of the authors in a research project funded by the "Ministerio de Industria, Comercio y Turismo": Proyecto Avanza, reference TSI-020100-2009-534, titled OPTILOGI.



Table 6. Heterogeneous fleet and multiple time windows for each trip. The symbol ∗ means that, for this trip size and type of instance, the MIP model returns a feasible solution, not the optimal one. If the % is negative, the solution obtained by ACO is that % better than the solution obtained with the MIP model.

N.Trips  tLP     β  Algorithm  x (%)              σx       s_bs      s_ws      it     t
50       1.54    3  EAS        830515 (12.35)     23171.5  805660    877277    245    10.8
                    ASRank     816587 (10.47)     28662.9  775218    868343    238    9.4
                    MMAS       798368 (8.00)      21806.7  760218    843843    199.1  9.6
                 5  EAS        912806 (23.48)     22786.3  887259    948785    153.6  13.5
                    ASRank     875240 (18.40)     32651.7  807218    947402    183.8  10.8
                    MMAS       842967 (14.03)     18483.6  804652    877343    158.4  11.8
250      2116.7  3  EAS        3154080 (15.91)    40072    3097800   3250180   206.7  107.4
                    ASRank     3075030 (13.01)    41576.6  3009560   3150540   344.5  85.2
                    MMAS       3053370 (12.21)    44938.6  2973730   3106340   369.2  84.6
                 5  EAS        3160620 (16.15)    33075.5  3111770   3211120   291.1  107
                    ASRank     3063880 (12.60)    46758.8  3015690   3140410   330.1  83.7
                    MMAS       3015740 (10.83)    50937.8  2939180   3101220   281.2  81.1
500∗     5656.8  3  EAS        6609470 (15.56)    63928.2  6520420   6698540   318.1  534.4
                    ASRank     6400480 (11.90)    83969.8  6291810   6554270   484.8  527.2
                    MMAS       6407610 (12.03)    57190.3  6302740   6516430   418.1  535.1
                 5  EAS        6535640 (14.27)    83706.2  6386760   6720470   416.6  533.5
                    ASRank     6375490 (11.47)    45306.3  6313620   6477680   614.8  527.7
                    MMAS       6397730 (11.85)    69973.4  6262400   6486810   561.4  531.7
1000∗    9455.7  3  EAS        12616700 (-60.38)  45839.8  12555600  12677200  359.6  794.4
                    ASRank     12332500 (-61.28)  137824   12195300  12579700  234.2  863.8
                    MMAS       12412500 (-61.02)  136277   12244400  12632600  396    789.6
                 5  EAS        12511200 (-60.72)  106052   12330900  12629800  345.8  858
                    ASRank     12299100 (-61.38)  114649   12093300  12443900  506.2  1041.6
                    MMAS       12216300 (-61.64)  176431   11935100  12384000  448.8  797.4

References

1. N. Belanger, G. Desaulniers, F. Soumis, and J. Desrosiers. Periodic airline fleet assignment with time windows, spacing constraints, and time dependent revenues. European Journal of Operational Research, 175:1754–1766, 2006.
2. C. Blum and M. Dorigo. The hyper-cube framework for ant colony optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 34:1161–1172, 2004.
3. M. Dorigo and T. Stützle. Ant Colony Optimization. MIT Press, 2004.
4. I. Ioachim, J. Desrosiers, F. Soumis, and N. Belanger. Fleet assignment and routing with schedule synchronization constraints. European Journal of Operational Research, 2:75–90, 1999.
5. H. D. Sherali, E. K. Bish, and X. Zhu. Airline fleet assignment concepts, models, and algorithms. European Journal of Operational Research, 172:1–30, 2006.



A New Guillotine Placement Heuristic for the Orthogonal Cutting Problem *

Slimane Abou Msabah 1 — Ahmed Riadh Baba-Ali 2

1 Department of Computer Science, University of Science and Technology Houari Boumedienne, USTHB, Bab Ezzouar, Algiers, Algeria, slmalg@yahoo.com
2 Department of Electronics, University of Science and Technology Houari Boumedienne, USTHB, Bab Ezzouar, Algiers, Algeria, riadhbabaali@yahoo.fr

Abstract. The orthogonal cutting problem consists in finding an optimal arrangement of n items on bins of identical dimensions. Several placement heuristics are used for this task, and the guillotine constraint further complicates the problem. In this article we are interested in the orthogonal cutting problem under the guillotine constraint. To this end, we propose a new placement heuristic inspired by the BLF routine, which places the items on levels so as to satisfy the guillotine constraint, while exploiting intra-level residues in two directions, vertically then horizontally. Our heuristic, named BLF2G, is combined with a guided genetic algorithm and compared with the other heuristics and metaheuristics found in the literature, on both generated test sets and known test sets.

KEYWORDS: orthogonal cutting, guillotine constraint, combinatorial optimization, heuristics, genetic algorithm.

1 Introduction

The cutting (or placement) problem is an optimization problem whose objective is to determine a suitable arrangement of various items inside wider ones. The main objective is to maximize the use of the raw material and thus minimize the losses. The orthogonal cutting problem draws its interest from the fact that it is applicable in several fields, such as the cutting of sheet steel, paper, fabrics, etc. This is important for mass-production industries, where the optimization of the material plays an important role in the cost of manufacturing.

In our work, we propose a new guillotine placement routine that aims at maximizing the exploitation of the raw material. The development of such a routine has to consider several parameters, such as the shape of the treated objects and the constraints imposed by the production system.

In our case, we consider an orthogonal cutting problem that treats a strip of fixed width and supposedly infinite height, from which items of rectangular shape are generated. The material can be a steel sheet, and the production machines are typically guillotine shears, which impose edge-to-edge cuts (the guillotine constraint). Items keep their original orientation, so that they can be cut from decorated or textured plates.

* Full paper can be found at: http://slmalg.unblog.fr/



The routine we propose derives from the BLF (Bottom Left Fill) routine; the layout of items is made in levels, to ensure the guillotine constraint. Two other mechanisms are set up to exploit intra-level residues, vertically and horizontally.
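To illustrate the level idea that such a routine builds on, here is a minimal shelf-style sketch of our own: items sorted by decreasing height are placed left to right on levels, so that a horizontal edge-to-edge cut separates consecutive levels and vertical cuts separate the items on a level, which satisfies the guillotine constraint. This reproduces only the basic level scheme, not the intra-level residue exploitation of BLF2G, and all names are ours.

```python
def shelf_guillotine(items, strip_width):
    """Place (w, h) rectangles on levels of a strip of fixed width.
    Returns (total_height, placements), each placement being (x, y, w, h).
    Levels guarantee that the layout is achievable with guillotine cuts."""
    placements = []
    y = 0            # bottom of the current level
    level_h = 0      # height of the tallest item on the level
    x = 0            # next free x position on the level
    for w, h in sorted(items, key=lambda it: it[1], reverse=True):
        if x + w > strip_width:   # item does not fit: open a new level
            y += level_h
            x, level_h = 0, 0
        placements.append((x, y, w, h))
        x += w
        level_h = max(level_h, h)
    return y + level_h, placements

# toy instance on a strip of width 10
height, layout = shelf_guillotine([(4, 5), (5, 4), (3, 4), (6, 2), (2, 2)], 10)
print(height, layout)
```

Sorting by decreasing height keeps the wasted area above the shorter items on each level small; BLF2G further recovers those intra-level residues.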

After this introduction to the problem, section 2 of this article lists the placement routines found in the literature. In section 3 we expose our routine and show its utility in exploiting residues, on test sets built to fit the bin. The results of our method, compared with the other heuristics on test sets found in the literature, are explained in section 4. We end the article with a conclusion, presented in section 5.

2 State of the art

In this section we investigate the placement heuristics found in the literature, in order to take advantage of existing methods and propose a routine that fits our case.

Baker, Coffman and Rivest present the BL (Bottom Left) heuristic, which places the items in the lowest, left-most position. They test several sequences of appearance of the items and find that sorting the list of items by decreasing width gives better results [1].

Jakobs uses a BL heuristic that first places the item in the highest position to the right, then slides it successively to the lowest, then the left-most, possible position. He then uses a genetic algorithm to find the sequence that gives the best result [9].

Liu and Teng perfected the BL heuristic by favouring the downward sliding of the item; the item slides from right to left only when no downward movement is possible [10].

The BLF routine tries to place items in the lowest, left-most position by exploiting the internal residues. Ramesh Babu and Ramesh Babu store in a list the points that form the lower-left corners of residues. For every item to be placed, an algorithm goes through the list, respecting the bottom-left order, and places the item in the first suitable position; the list is then updated according to the dimensions of the placed item [13].

Lodi et al. presented an approach named "Floor-Ceiling" (FC), which extends the<br />
way items are placed on levels: the FC approach places items from left to right<br />
at the bottom (floor) of a level and from right to left at its top (ceiling) [11].<br />
Lodi et al. also present a variant of the FC routine that satisfies the guillotine<br />
constraint by performing cuts from edge to edge.<br />

Ben Messaoud et al. present a new level-based placement heuristic (SHF) that<br />
applies the FC algorithm while re-injecting items placed on the ceiling below [2].<br />

Burke et al. present their Best-Fit heuristic, which introduces a notion of<br />
neighborhood and builds the cutting pattern incrementally. The list of items is<br />
first sorted by decreasing height. They try to fill the lowest residue with the<br />
best-fitting item; if no item fits, that space is marked as irrecoverable scrap<br />
up to the next level, and so on, like a bricklayer building a wall. Each item is<br />
shifted to the left or to the right according to the height of the neighboring<br />
items. They also introduce an orientation mechanism for long items to reduce<br />
losses [4].<br />

3 Our contribution<br />

The placement policy is the crucial point of the laying-out process, since it<br />
determines how well the residues are exploited. Placement heuristics can be<br />
divided into two categories:<br />



a) Direct heuristics, which place the current item directly in the first suitable<br />
position according to the applied policy; examples are Finite First Fit, Finite<br />
Next Fit, Bottom Left, and Bottom Left Fill [3]. b) Selective heuristics, which<br />
choose a suitable position among several candidates according to their laying-out<br />
policy; examples are Finite Best Fit and Finite Worst Fit.<br />

Heuristics of the first category lay items out quickly, and the result depends<br />
entirely on the order of appearance of the items; they are therefore well suited<br />
to being combined with stochastic algorithms and metaheuristics. The second<br />
category requires additional time but generally gives better results than the<br />
first.<br />
Our placement routine places each item directly in the first suitable position,<br />
according to a laying-out policy that satisfies the guillotine constraint; it<br />
therefore belongs to the first category.<br />

3.1 Our Placement policy<br />

Using an adequate placement policy gives better results, and the guillotine<br />
constraint makes the problem harder. To find a placement policy well suited to<br />
this setting, we took advantage of the placement methods existing in the<br />
literature. Level-based routines are the best adapted to our scenario: the<br />
edge-to-edge cut required by guillotine shears fits naturally with a layout in<br />
levels.<br />
An item can be laid out on the strip in one of three ways:<br />

Placement in levels. The strip is structured in levels; every level is<br />
characterized by a height and an available width. If the width of the item is at<br />
most the available width, the item is placed and the available width is updated;<br />
if the height of the item exceeds the height of the level, the height of the<br />
level is redefined as the height of the item. We name this phase BLG, since it<br />
applies the Bottom Left placement routine to levels so as to satisfy the<br />
Guillotine constraint. Applying this routine creates BLG sub-levels (Fig. 1).<br />

Fig. 1. The layout on the levels.<br />

Placement in a BLG sub-level. A BLG sub-level is characterized by a width, equal<br />
to the width of the item placed below it, and by an available height. Items are<br />
stacked vertically on these residues. If the width of the item is at most the<br />
width of the BLG sub-level and the height of the item is at most the available<br />
height, the item is placed in the BLG sub-level and the available height is<br />
updated. This phase is named BLFG (Bottom Left Fill Guillotine). Applying this<br />
routine creates BLFG sub-levels (Fig. 2).<br />

Fig. 2. The layout on BLG sub-levels.<br />

Placement in a BLFG sub-level. A BLFG sub-level is characterized by a height and<br />
an available width. If the height of the current item is at most the height of<br />
the sub-level and the width of the current item is at most the available width,<br />
the item is placed in the BLFG sub-level and the available width is updated.<br />
This phase is named BLF2G (Bottom Left Fill 2 (second exploitation of residues)<br />
Guillotine) (Fig. 3).<br />

Fig. 3. The layout on BLFG sub-levels.<br />

After this stage the residues are too small to be exploited; however, given a<br />
data set with smaller items, the exploitation could continue with BLF3G, BLF4G,<br />
and so on.<br />
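To make the three phases concrete, here is a small Python sketch of the policy. This is our own illustration, not the authors' implementation: all class and method names are ours, and residues are sized once at placement time, a simplification of the scheme described above.

```python
class Item:
    def __init__(self, w, h):
        self.w, self.h = w, h

class BLFGSubLevel:
    """Horizontal residue (BLF2G phase): fixed height, shrinking width."""
    def __init__(self, height, width):
        self.height, self.avail_w = height, width

    def try_place(self, item):
        if item.h <= self.height and item.w <= self.avail_w:
            self.avail_w -= item.w
            return True
        return False

class BLGSubLevel:
    """Vertical residue above a placed item (BLFG phase)."""
    def __init__(self, width, height):
        self.width, self.avail_h = width, height
        self.blfg = []  # horizontal residues created inside this sub-level

    def try_place(self, item):
        if item.w <= self.width and item.h <= self.avail_h:
            # the space left beside the item becomes a BLFG sub-level
            self.blfg.append(BLFGSubLevel(item.h, self.width - item.w))
            self.avail_h -= item.h
            return True
        return any(s.try_place(item) for s in self.blfg)

class Level:
    """A level of the strip (BLG phase)."""
    def __init__(self, strip_width):
        self.height, self.avail_w = 0, strip_width
        self.blg = []  # vertical residues above placed items

    def try_place(self, item):
        if item.w <= self.avail_w:
            self.avail_w -= item.w
            self.height = max(self.height, item.h)
            # the space above the item, up to the level height, is a BLG sub-level
            self.blg.append(BLGSubLevel(item.w, self.height - item.h))
            return True
        return any(s.try_place(item) for s in self.blg)
```

A level first tries its own floor (BLG), then the vertical residues above placed items (BLFG), then the horizontal residues inside those (BLF2G), which mirrors the three cases above.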

3.2 The placement algorithm<br />

Read the dimensions of the plates<br />
Load the list of items<br />
For all items<br />
&nbsp;&nbsp;For all levels<br />
&nbsp;&nbsp;&nbsp;&nbsp;If the current item can be placed in the level<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;then place the item; update the level; break<br />
&nbsp;&nbsp;&nbsp;&nbsp;End if<br />
&nbsp;&nbsp;&nbsp;&nbsp;For all BLG sub-levels<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;If the current item can be placed in the BLG sub-level<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;then place the item; update the BLG sub-level; break<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;End if<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;For all BLFG sub-levels<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;If the current item can be placed in the BLFG sub-level<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;then place the item; update the BLFG sub-level; break<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;End if<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Pass to the following BLFG sub-level<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;End for<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Pass to the following BLG sub-level<br />
&nbsp;&nbsp;&nbsp;&nbsp;End for<br />
&nbsp;&nbsp;&nbsp;&nbsp;Pass to the following level<br />
&nbsp;&nbsp;End for<br />
&nbsp;&nbsp;If the item is not placed<br />
&nbsp;&nbsp;&nbsp;&nbsp;then place the item in a new level; update the new level<br />
&nbsp;&nbsp;End if<br />
&nbsp;&nbsp;Pass to the following item<br />
End for<br />
End<br />

3.3 The genetic algorithm<br />

To show the power of our BLF2G placement policy, we combine it with a classic<br />
GA. We use a real codification [7] in which the chromosome is defined by the<br />
order of the items. The order in which the items appear in the laying-out<br />
process, according to our BLF2G policy, determines the quality of each<br />
individual. We implemented a genetic algorithm with a population size of 100 and<br />
fixed the number of generations at 20 times the number of items. Initially we<br />
generate a random population, with a random item ordering in each individual. At<br />
each generation, our BLF2G policy gives the quality of each individual. The<br />
genetic operators are defined as follows:<br />
The crossover operator. We use the partially matched crossover with one cut<br />
point (PMX1), followed by a correction that makes the children valid: child 1 is<br />
corrected by replacing its duplicated genes with its missing genes, in their<br />
order of appearance in parent 2, and child 2 by replacing its duplicated genes<br />
with its missing genes, in their order of appearance in parent 1.<br />

Parent 1: 123|456&nbsp;&nbsp;--PMX1-->&nbsp;&nbsp;123321&nbsp;&nbsp;--correction-->&nbsp;&nbsp;Child 1: 123654<br />
Parent 2: 654|321&nbsp;&nbsp;--PMX1-->&nbsp;&nbsp;654456&nbsp;&nbsp;--correction-->&nbsp;&nbsp;Child 2: 654123<br />

The mutation operator. It is a permutation of the genes at two randomly chosen<br />
sites (here positions 2 and 6): Child: 1|2365|4 becomes 143652.<br />
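The crossover-with-correction and the mutation can be sketched in Python as follows (our own illustration, not the authors' code; all function names are ours). `pmx1` swaps the tails after the cut point, then repairs each child by replacing duplicated genes with the missing genes taken in their order of appearance in the other parent:

```python
def repair(child, other):
    """Replace duplicated genes by the missing genes, in their order of
    appearance in the other parent (the correction step described above)."""
    missing = iter(g for g in other if g not in child)
    seen, out = set(), []
    for g in child:
        if g in seen:
            g = next(missing)
        seen.add(g)
        out.append(g)
    return out

def pmx1(p1, p2, cut):
    """Partially matched crossover with one cut point, then correction."""
    c1 = repair(p1[:cut] + p2[cut:], p2)
    c2 = repair(p2[:cut] + p1[cut:], p1)
    return c1, c2

def mutate(chrom, i, j):
    """Swap the genes at two (normally randomly chosen) sites."""
    chrom = chrom[:]
    chrom[i], chrom[j] = chrom[j], chrom[i]
    return chrom
```

On the example above, `pmx1([1,2,3,4,5,6], [6,5,4,3,2,1], 3)` reproduces the children 123654 and 654123, and swapping positions 2 and 6 of 123654 gives 143652.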

4 Experimental results<br />

To estimate the performance of our heuristic, we built test sets that admit a<br />
maximal exploitation of the material (0% scrap) under the BLF2G policy.<br />

Name                        # of items   Plate dimensions   Optimal height<br />
Msa17a, Msa17b, Msa17c      17           200 x 200          200<br />
Msa35a, Msa35b, Msa35c      35           200 x 200          200<br />
Msa75a, Msa75b, Msa75c      75           200 x 200          200<br />
Msa150a, Msa150b, Msa150c   150          200 x 200          200<br />
Table 1. Our generated test sets.<br />

To evaluate our heuristic we combined it with a genetic algorithm. The results<br />
obtained are compared with those of our BLF2G policy applied to the list of<br />
items sorted by Decreasing Heights (DH).<br />

          Msa17           Msa35           Msa75           Msa150<br />
          a    b    c     a    b    c     a    b    c     a    b    c<br />
BLF2G+DH  240  245  263   220  225  229   214  210  210   205  205  218<br />
BLF2G+GA  200  200  200   220  215  219   215  210  218   205  205  219<br />
Table 2. Results of the BLF2G+DH and BLF2G+GA routines.<br />

We notice that the GA combined with our BLF2G placement policy reaches the<br />
optimum on the 17-item (small) test sets and gives results comparable to the<br />
BLF2G+DH heuristic on the medium-size test sets; on the large test sets,<br />
however, the BLF2G+DH heuristic is better.<br />

4.1 Improvement<br />

The GA loses to the DH heuristic on the large test sets. To remedy this, we<br />
inject the individual sorted according to the DH policy into the initial<br />
population of the evolutionary process; we name the result GAguided. The<br />
following table shows the results.<br />



                Msa17           Msa35           Msa75           Msa150<br />
                a    b    c     a    b    c     a    b    c     a    b    c<br />
BLF2G+DH        240  245  263   220  225  229   214  210  210   205  205  218<br />
BLF2G+GA        200  200  200   220  215  219   215  210  218   205  205  219<br />
BLF2G+GAguided  200  200  200   215  210  219   207  205  210   205  205  212<br />
Table 3. Results of the BLF2G+DH, BLF2G+GA, and BLF2G+GAguided routines.<br />

With this improvement, GAguided keeps its advantage over the DH heuristic and<br />
gives results equal to, or even better than, the DH heuristic on test sets of<br />
every size.<br />

4.2 Test sets found in the literature<br />

To evaluate our method further, we use the test sets of Hopper and Turton<br />
(2001) [8] and Burke et al. (2004) [4], which are the most widely used:<br />

Test set               Name          # of items   Optimal height   Box size<br />
Hopper & Turton 2001   C1:P1,P2,P3   16/17        20               20x20<br />
                       C2:P1,P2,P3   25           15               40x15<br />
                       C3:P1,P2,P3   28/29        30               60x30<br />
                       C4:P1,P2,P3   49           60               60x60<br />
                       C5:P1,P2,P3   73           90               60x90<br />
                       C6:P1,P2,P3   97           120              80x120<br />
                       C7:P1,P2,P3   196/197      240              160x240<br />
Burke et al. 2004      N1            10           40               40x40<br />
                       N2            20           50               30x50<br />
                       N3            30           50               30x50<br />
                       N4            40           80               80x80<br />
                       N5            50           100              100x100<br />
                       N6            60           100              50x100<br />
                       N7            70           100              80x100<br />
                       N8            80           80               100x80<br />
                       N9            100          150              50x150<br />
                       N10           200          150              70x200<br />
                       N11           300          150              70x200<br />
                       N12           500          300              100x300<br />
                       N13           3152         960              640x960<br />
Table 4. Test sets found in the literature.<br />

The following table presents the results obtained by our GAguided+BLF2G policy<br />
compared with the GA+BLF, SA+BLF, and New Best-Fit results reported in Burke et<br />
al. (2004).<br />

Name   GA+BLF   SA+BLF   New Best-Fit   GAguided+BLF2G<br />
C1P1   20       20       21             20<br />
C1P2   21       21       22             22<br />
C1P3   20       20       24             21<br />
C2P1   16       16       16             16<br />
C2P2   16       16       16             16<br />
C2P3   16       16       16             15<br />
C3P1   32       32       32             32<br />
C3P2   32       32       34             33<br />
C3P3   32       32       33             31<br />
C4P1   64       64       63             66<br />
C4P2   63       64       62             65<br />
C4P3   62       63       62             62<br />
C5P1   95       94       93             95<br />
C5P2   95       95       92             97<br />
C5P3   95       95       92             95<br />
C6P1   127      127      123            125<br />
C6P2   126      126      122            128<br />
C6P3   126      126      124            126<br />
C7P1   255      255      247            250<br />
C7P2   251      253      244            248<br />
C7P3   254      255      245            249<br />
N1     40       40       45             40<br />
N2     51       52       53             50<br />
N3     52       52       52             53<br />
N4     83       83       83             87<br />
N5     106      106      105            105<br />
N6     103      103      103            106<br />
N7     106      106      107            116<br />
N8     85       85       84             85<br />
N9     155      155      152            153<br />
N10    154      154      152            154<br />
N11    155      155      152            153<br />
N12    313      312      306            309<br />
N13    -        -        964            Out of service<br />
Table 5. Comparison of the GAguided+BLF2G heuristic with the GA+BLF, SA+BLF,<br />
and New Best-Fit heuristics.<br />

We notice that on the small test sets the GA+BLF and SA+BLF methods give better<br />
results than the New Best-Fit heuristic. Our method gives comparable, and<br />
sometimes better, results than the other methods, which is explained by its care<br />
for the guillotine constraint. On the test sets C2P3 and N2 our method reaches<br />
the optimum, which the other methods fail to reach even though they are free of<br />
the guillotine constraint; this shows the strength of our method in exploiting<br />
the material.<br />

On the medium test sets our method holds its position with regard to the other<br />
methods. On the large test sets our method scores better than GA+BLF and<br />
SA+BLF, which confirms that the adopted laying-out policy is sound; however,<br />
the New Best-Fit heuristic scores better than our method there, which confirms<br />
the weakness of the GA on the large test sets.<br />

The following graph compares the four methods by their percentage over the<br />
optimal height.<br />

Fig. 4. Comparison of the GAguided+BLF2G heuristic with the GA+BLF, SA+BLF, and<br />
New Best-Fit heuristics (% over optimal).<br />

According to the graph, the New Best-Fit heuristic starts badly on the small<br />
test sets, but on the medium and large test sets it takes over and shows better<br />
results. The GA+BLF and SA+BLF heuristics give comparable results, with a<br />
slight edge to the genetic algorithm, whatever the size of the problem.<br />
On the small test sets our method gives results comparable to, and sometimes<br />
better than, the GA+BLF and SA+BLF heuristics, though it performs poorly on<br />
C1P2 and C1P3. On the medium test sets our method diverges from the other<br />
heuristics, with some comparable results. On the large test sets our method<br />
gives better results than GA+BLF and SA+BLF, but it stays below the New<br />
Best-Fit heuristic.<br />

5 Conclusion<br />

Our contribution to the problem of rectangular cutting has shown its<br />
efficiency. First, we developed a routine, named BLF2G, that is powerful in<br />
exploiting residues while taking into account the constraint of cutting from<br />
edge to edge. Second, we guided the genetic algorithm with the greedy DH<br />
heuristic by introducing the DH individual into the initial population.<br />
Using GAguided combined with our BLF2G routine allowed us to reach the optimum<br />
on the small Msa17 test sets (a, b, and c). On the medium and large test sets,<br />
however, the GA failed to explore the search space well enough to find the item<br />
sequence giving the optimal solution.<br />
The comparisons with methods that do not take the guillotine constraint into<br />
account, on test sets built to admit the optimum without that constraint, are<br />
very encouraging. Our method repeatedly reaches the optimum, especially on C2P3<br />
and N2, a result the other methods do not achieve; this supports the soundness<br />
of our placement heuristic in exploiting residues.<br />

The guided GA combined with our BLF2G heuristic allowed us to show the<br />
qualities of our placement method on the small and medium test sets. On the<br />
large test sets, our BLF2G+GAguided method was out of service on the test set<br />
N13, whose size is extreme; almost the same happened on the test sets MT01, …,<br />
MT10 of Burke et al. [5].<br />
Our BLF2G routine depends entirely on the GA to find the optimal order of the<br />
items. In the near future we intend to give our BLF2G heuristic more<br />
intelligence by applying new placement policies of the Best-Fit type, so as to<br />
escape the shortcomings encountered by the GA.<br />

References<br />

1. B. S. Baker, E. G. Coffman, and R. L. Rivest, “Orthogonal packings in two dimensions,”<br />
SIAM Journal on Computing, vol. 9, no. 4, pp. 846-855, 1980.<br />
2. S. Ben Messaoud, C. Chu, and M. L. Espinouse, “An approach to solve cutting stock<br />
sheets,” IEEE International Conference on Systems, Man and Cybernetics, pp. 5109-5113, 2004.<br />
3. J. O. Berkey and P. Y. Wang, “Two-dimensional finite bin packing algorithms,” Journal<br />
of the Operational Research Society, vol. 38, pp. 423-429, 1987.<br />
4. E. K. Burke, G. Kendall, and G. Whitwell, “A new placement heuristic for the orthogonal<br />
stock cutting problem,” Operations Research, vol. 52, no. 4, pp. 655-671, 2004.<br />
5. E. K. Burke, G. Kendall, and G. Whitwell, “A simulated annealing enhancement of the<br />
best-fit heuristic for the orthogonal stock-cutting problem,” INFORMS Journal on<br />
Computing, vol. 21, no. 3, pp. 505-516, 2009.<br />
6. D. E. Goldberg, “Genetic Algorithms in Search, Optimization, and Machine Learning,”<br />
Reading, MA: Addison-Wesley, 1989.<br />
7. E. Hopper and B. Turton, “A genetic algorithm for a 2D industrial packing problem,”<br />
Computers and Industrial Engineering, vol. 37, no. 1-2, pp. 375-378, 1999.<br />
8. E. Hopper and B. Turton, “An empirical investigation of metaheuristic and heuristic<br />
algorithms for a 2D packing problem,” European Journal of Operational Research,<br />
vol. 128, pp. 34-57, 2001.<br />
9. S. Jakobs, “On genetic algorithms for the packing of polygons,” European Journal of<br />
Operational Research, vol. 88, pp. 165-181, 1996.<br />
10. D. Liu and H. Teng, “An improved BL-algorithm for genetic algorithm of the orthogonal<br />
packing of rectangles,” European Journal of Operational Research, vol. 112, pp. 413-420, 1999.<br />
11. A. Lodi, S. Martello, and D. Vigo, “Heuristic and metaheuristic approaches for a class of<br />
two-dimensional bin packing problems,” INFORMS Journal on Computing, vol. 11,<br />
pp. 345-357, 1999.<br />
12. Z. Michalewicz, “Genetic Algorithms + Data Structures = Evolution Programs,” 3rd<br />
revised and extended edition, Springer, 1996.<br />
13. A. Ramesh Babu and N. Ramesh Babu, “Effective nesting of rectangular parts in multiple<br />
rectangular sheets using genetic and heuristic algorithms,” International Journal of<br />
Production Research, vol. 37, no. 7, pp. 1625-1643, 1999.<br />



Solving Distributed FCSPs with Naming Games ⋆<br />

Stefano Bistarelli 1,2, Giorgio Gosti 3, and Francesco Santini 1<br />

1 Dipartimento di Matematica e Informatica, Università degli Studi di Perugia, Italy<br />

[bista,francesco.santini]@dipmat.unipg.it<br />

2 Istituto di Informatica e Telematica (IIT-CNR), Pisa, Italy<br />

stefano.bistarelli@iit.cnr.it<br />

3 Institute for Mathematical Behavioral Sciences, University Of California, Irvine, USA<br />

ggosti@uci.edu<br />

Abstract. Constraint Satisfaction Problems (CSPs) are the formalization of a<br />
large range of problems that emerge from computer science. The solving<br />
methodology described here is based on the Naming Game (NG). The NG was<br />
introduced to represent N agents that have to bootstrap an agreement on a name<br />
(i.e. a word) to give to an object. In this paper we focus on solving<br />
Distributed FCSPs with an algorithm for NGs: each word on which the agents have<br />
to agree is associated with a preference represented as a fuzzy score. The<br />
solution is the agreed word associated with the highest preference value. The<br />
two main features that distinguish this methodology from other DisFCSP solving<br />
methods are that the system can react to small changes in the instance and that<br />
it does not require a pre-agreed agent/variable ordering.<br />

1 Introduction<br />

This paper presents a distributed method to solve Distributed Fuzzy Constraint<br />
Satisfaction Problems (DisFCSPs) [11,15,8,9,14] that comes from a<br />
generalization of the Naming Game (NG) model [12,1,10,7]. DisFCSPs can be<br />
applied to deal with resource allocation, collaborative scheduling, and<br />
distributed negotiation [8].<br />
In DisFCSP protocols, the aim is to design a distributed architecture of<br />
processors, or more generally a group of agents, who cooperate to solve a fuzzy<br />
CSP instance. In this framework, we see the problem as a dynamic system and we<br />
select the stable states of the system as the solutions to our CSP. To do this,<br />
we design each agent so that it will move towards a stable local state. This<br />
system may be called “self-stabilizing” whenever the global stable state is<br />
obtained through the reinforcement of the local stable states [6]. When the<br />
system finds the stable state, the DisFCSP instance is solved. A protocol<br />
designed in this way is resistant to damage and external threats because it can<br />
react to changes in the problem instance. Moreover, in our approach all agents<br />
have an equal chance to reveal private information.<br />

⋆ Research partially supported by the MIUR PRIN 20089M932N: “Innovative and<br />
multidisciplinary approaches for constraint and preference reasoning”, by the<br />
CCOS FLOSS project “Software open source per la gestione dell’epigrafia dei<br />
corpus di lingue antiche”, and by the INDAM GNCS project “Fairness, Equità e<br />
Linguaggi”.<br />



The NGs describe a set of problems in which a number of agents bootstrap a<br />
commonly agreed name for one or more objects. In this paper we discuss an NG<br />
generalization in which agents have individual fuzzy preferences over words.<br />
This is a natural generalization of the NG, because it models the agents’<br />
endogenous preferences and attitudes towards a certain object-naming system.<br />
Moreover, we add binary fuzzy constraints that represent exogenous causes<br />
affecting the agents’ preferences. As shown in [3,4], an NG can be viewed as a<br />
particular crisp CSP instance. But if we add preference levels and constraints,<br />
the NG is no longer a crisp combinatorial problem: this new game may be<br />
interpreted as an optimization problem.<br />
This paper extends the results of [3,4], in which non-fuzzy DCSPs are solved<br />
with NGs. The paper is organized as follows: Section 2 presents the background<br />
on NGs; Section 3 presents the algorithm for solving DisFCSPs; Section 4<br />
presents the tests and results for the fuzzy NG algorithm; Section 5 summarizes<br />
the related work; and Section 6 reports the conclusions and ideas about future<br />
work.<br />

2 Background on Naming Games<br />

The NGs [12,1,10,7] describe a set of problems in which a number of agents<br />
bootstrap a commonly agreed name for one or more objects. The game is played by<br />
a population of N agents which engage in pairwise interactions in order to<br />
negotiate conventions, i.e. associations between forms and meanings, and it is<br />
able to describe the emergence of a global consensus among them. For the sake<br />
of simplicity, the model does not take into account the possibility of<br />
homonyms, so that all meanings are independent and one can work with only one<br />
of them without loss of generality. An example of such a game is that of a<br />
population that has to reach consensus on the name (i.e. the form) to assign to<br />
an object (i.e. the meaning) exploiting only local interactions. However, as<br />
will become clear, the model is appropriate to address all those situations in<br />
which negotiation rules a decision process (e.g. opinion dynamics) [1].<br />

Each NG is defined by an interaction protocol. There are two important aspects<br />
of the NG: the agents interact randomly and use a simple set of rules to update<br />
their state; and, by using a distributed social strategy, the agents converge<br />
to a consistent state in which the object is assigned a unique name.<br />
Generally, at each turn two agents are randomly extracted to perform the roles<br />
of the speaker and the listener (or hearer, as used in [12,1]). The interaction<br />
between the speaker and the listener determines how the agents update their<br />
internal states. DCSPs and NGs share a variety of common features [3,4].<br />

The definition of a self-stabilizing algorithm in distributed computing was<br />
first introduced by [6]. A system is self-stabilizing whenever each system<br />
configuration associated with a solution is an absorbing state (a globally<br />
stable state), and any initial state of the system is in the basin of<br />
attraction of at least one solution. In a self-stabilizing algorithm, we<br />
program the agents of our distributed system to interact with their neighbors.<br />
The agents update their state through these interactions, trying to find a<br />
stable state in their neighborhood. Since the algorithm is distributed, many<br />
legal configurations of the agents’ states and their neighbors’ states start<br />
arising sparsely. Not all of these configurations are mutually compatible, and<br />
so they form mutually inconsistent potential cliques. The self-stabilizing<br />
algorithm must find a way to make the globally legal state emerge from the<br />
competition between these potential cliques. Dijkstra [6] and Collin [5]<br />
suggest that an algorithm designed in this way cannot always converge, and a<br />
special agent is needed to break the system symmetry. More precisely, Dijkstra<br />
[6] and Collin [5] show that we cannot guarantee that a system of uniform<br />
finite-state machines can always solve the ring-ordering problem. However, in<br />
[4] the authors show that a naming-game-based algorithm with homogeneous agents<br />
can find the ring-ordering solution with probability 1.<br />

3 Solving DisFCSPs with Naming Games<br />

As in [15], we assign to each variable xi ∈ X of the DisFCSP P = ⟨X, D, C, A⟩<br />
an agent ai ∈ A. We assume that each agent knows all the constraints that act<br />
over its X variables [15]. Each agent i = 1, 2, …, N (where |A| = N) searches<br />
its own variable domain di ∈ D for the variable assignment that optimizes P.<br />
The degree of satisfaction of a fuzzy constraint tells us to what extent it is<br />
satisfied. Otherwise stated, the goal of the game is to make the agents find an<br />
assignment of their variables that maximizes the overall fuzzy score of the<br />
problem; the fuzzy preferences of the constraints are combined with the min<br />
function.<br />
We restrict ourselves to unary and binary constraints. Each agent has a unary<br />
constraint ci with support defined over its variable xi ∈ X; this unary<br />
constraint represents the local preference of the agent for each variable<br />
assignment di ∈ D. A binary constraint ci,j returns a preference value<br />
p ∈ [0, 1] which states the combined preference over the assignments of xi and<br />
xj together. η[ai := b] is the set of all possible assignments of the variables<br />
in X such that variable ai is assigned b. c_ai η[ai := b] represents the<br />
preference level of agent ai for assignment b, and c_ai,aj η[ai := b, aj := d]<br />
represents the combined preference level of agents ai and aj for the respective<br />
assignments b and d. In the following, we use the symbol ⊗ to directly perform<br />
the composition of fuzzy constraints, and c{s} to denote the set of constraints<br />
that act over s. Thus, max_{b∈Ds}(⊗ c{s} η[s := b]) defines the best fuzzy<br />
level that an agent can take, given its knowledge of the surrounding<br />
constraints and its assignment b. Respectively, top is the set of domain<br />
assignments with the maximum fuzzy value of ⊗ c{s} η[s := b], i.e.<br />
top = {b ∈ Ds | b = argmax_{b∈Ds}(⊗ c{s} η[s := b])}. We may say that the<br />
communication network is determined by the network of binary constraints, since<br />
we suppose an agent ai ∈ A can communicate only with the agents aj ∈ A sharing<br />
a binary constraint, i.e. ci,j ∈ C.<br />
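As a concrete reading of these definitions, the following Python sketch (ours, not the authors' code) computes the min-combination over the constraints acting on one agent and the resulting top set; representing constraints as dictionaries is an assumption made for illustration.

```python
def combined_pref(b, unary, binaries, neighbor_vals):
    """min-combination of the constraints acting over this agent, for its
    candidate value b, given the neighbors' current assignments."""
    prefs = [unary[b]]
    for j, c in binaries.items():   # c maps (our value, neighbor j's value)
        prefs.append(c[(b, neighbor_vals[j])])
    return min(prefs)               # fuzzy composition = min

def top_set(domain, unary, binaries, neighbor_vals):
    """Best reachable fuzzy level and the set of values attaining it."""
    levels = {b: combined_pref(b, unary, binaries, neighbor_vals)
              for b in domain}
    best = max(levels.values())
    return best, {b for b, v in levels.items() if v == best}
```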

At the beginning, each agent marks an element b that maximizes<br />
⊗ c{s} η[s := b]; this is the element that the agent prefers to be in the final<br />
solution. At each turn, the algorithm is based on two entities: a single<br />
speaker, which broadcasts its choice of word together with the related fuzzy<br />
preference, and a set of listeners, namely all the agents that share a<br />
constraint with the speaker. At each turn t, an agent is drawn with uniform<br />
probability to be the speaker. In the following we describe in detail each step<br />
of the interaction scheme that defines the behavior of the speaker and the<br />
listeners; we consider three phases: i) broadcast, ii) feedback, and<br />
iii) update.<br />



3.1 Interaction Protocol<br />

Broadcast. The speaker s executes the broadcast protocol. The speaker checks<br />
whether the marked variable assignment b is in top. If the marked variable<br />
assignment is not in top, it selects a new variable assignment b with uniform<br />
probability from top, and marks it. Then it sends the pair<br />
(b, max_{b∈Ds}(⊗ c{s} η[s := b])) to all its neighboring listeners.<br />

Feedback. All the listeners receive the broadcast message (b, u) from the<br />
speaker. Each listener l computes, for every dk, ⊗ c{s,l} η[s := b][l := dk]<br />
(let us call this value vk for any chosen dk); that is, it computes the<br />
combination of the fuzzy preferences (i.e. vk) for each assignment dk,<br />
supposing that s chooses word b. Each listener sends back to s a feedback<br />
message according to the following two cases:<br />
– Failure. If u > max_k(vk) there is a failure, and the listener sends back a<br />
failure message containing its maximum value and the corresponding assignment<br />
for l: Fail(max_k(vk), l = dk).<br />
– Success. If u ≤ max_k(vk) there is a success, and the listener sends back<br />
Succ.<br />

Update The listeners' feedback determines the update of the listeners and of the speaker. When a listener sends back a Succ, that listener also lowers the preference level of all the v_k with a higher preference value: for all v_k such that v_k > u, it sets v_k = u. If the speaker receives only Succ feedback messages from all its listeners, then it does not need to update.

Otherwise, that is, if the speaker receives a number of Fail(v_j, l_j = d_j) feedback messages from h listeners (with h ≥ 1 and 1 ≤ j ≤ h), then it selects the worst fuzzy preference v_w such that ∀j, v_w ≤ v_j. Then it sends to all listeners a FailUpdate(c_{{l_w}}η[l_w := b_w]). Thus, the speaker sets its assignment to b with the worst fuzzy preference level among the failure feedback messages of the listeners, i.e. c_{{s}}η[s := b] = v_w. In addition, each listener l sets v_l = v_w, i.e. c_{{s,l}}η[s := b][l := d_l] = v_w.
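The three phases can be sketched as a minimal single-turn simulation. The `Agent` shape (a preference map per word, a candidate set `top`, and a marked word) is an illustrative assumption rather than the authors' implementation, and for brevity the failure update is applied per word rather than per listener assignment:

```python
import random

class Agent:
    """Toy agent: a fuzzy preference level in [0, 1] for each word,
    a set of candidate words `top`, and the currently marked word."""
    def __init__(self, pref):
        self.pref = dict(pref)
        self.top = set(pref)
        self.marked = None

def broadcast(speaker):
    """Broadcast: keep the marked word if it is still in top, otherwise
    mark a fresh one drawn uniformly from top; return it with its level."""
    if speaker.marked not in speaker.top:
        speaker.marked = random.choice(sorted(speaker.top))
    b = speaker.marked
    return b, speaker.pref[b]                    # the pair (b, u)

def feedback(listener, b, u):
    """Feedback: compare u with max_k(v_k), the best preference the
    listener can reach; reply Fail(max_k(v_k), d_k) or Succ."""
    d_k = max(listener.pref, key=listener.pref.get)
    v_max = listener.pref[d_k]
    if u > v_max:
        return ("Fail", v_max, d_k)
    return ("Succ",)

def update(speaker, listeners, b, u, replies):
    """Update: on all-Succ, clip every v_k > u down to u; on failure,
    align the speaker and the listeners on the worst failing level v_w."""
    fails = [r for r in replies if r[0] == "Fail"]
    if not fails:
        for l in listeners:
            for w in l.pref:
                if l.pref[w] > u:
                    l.pref[w] = u
        return
    v_w = min(v for _, v, _ in fails)            # worst failure level
    speaker.pref[b] = v_w                        # c_{s}η[s := b] = v_w
    for l in listeners:
        l.pref[b] = min(l.pref[b], v_w)          # listener aligns on v_w
```

Running one turn with a single listener exercises both branches: a broadcast level above the listener's best triggers a Fail and pulls both agents down to v_w, while a level at or below it triggers Succ and clips only the listener's over-optimistic entries.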

3.2 Theorems<br />

With Lemma 1 we state that a subset of constraints C′ ⊆ C has a higher fuzzy preference w.r.t. C. We say that a fuzzy constraint problem is α-consistent if it can be solved with a level of satisfiability of at least α (see also [2]).

Lemma 1 ([2]). Consider a set of constraints C and any subset C′ of C. Then we have C ≤ C′.

The speaker selection rule defines a probability distribution function F that tells us the probability that a certain domain assignment is selected. c_{{s}}η[s := b] and the marked word determine F. In Lemma 2 we relate F to the convergence of the algorithm with probability 1, in relation to the level of satisfiability of the problem.



Lemma 2. If the F function selects only the domain elements with preference level larger than α, then the algorithm converges with probability 1 only if Sol(P) ≥ α.

From [3, 4], if the F function chooses a random element in the word domain, then the algorithm converges to the same word, but this word might not be the optimal one, i.e. the word with the highest fuzzy preference. If we choose F so as to select only words with a preference greater than α, then the algorithm converges to a solution with a global preference greater than α.

With Prop. 1 and Prop. 2 we prepare the background for the main theorem of this section, i.e. Th. 1. Proposition 1 shows the stabilization of the algorithm after some time, while Prop. 2 states that the algorithm converges with a probability of 1.

Proposition 1. For time t → +∞, the weight associated to the optimal solution is equal for all the agents, and it is equal to the minimum preference level of that word.

Proposition 2. For any probability distribution F the algorithm converges with a probability of 1.

At last, we state that the presented algorithm always converges to the best solution of the DisFCSP.

Theorem 1. Since i) the algorithm always converges (see Prop. 2) and ii) we choose a function F according to Lem. 2, the algorithm of Sect. 3.1 always converges to the best fuzzy solution, i.e. to the solution with the highest preference possible.

4 Experimental results<br />

To evaluate the runs we define the probability of a successful interaction at time t, P_t(succ), given the state of the system at that time. P_t(succ) is determined by the probability P(s = a_i) that agent a_i is the speaker at time t, and by the probability P_t(succ|s = a_i) that that agent's interaction is a success: P_t(succ) = Σ_i P_t(succ|s = a_i) P(s = a_i). The P_t(succ|s = a_i) depend on the state of the agents at time t; in particular, they depend on the variable assignment (or word) b selected by F, and on whether c_{{s}}η[s := b] ≤ c_{{l}}η[l := b]. Given an algorithm run, at each time t we can compute P_t(succ|s = a_i) over the states of all agents before the interaction is performed. Since P(s = a_i) = 1/N, we can compute P_t(succ) = Σ_i P_t(succ|s = a_i)/N.
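With the uniform speaker-selection rule this reduces to a plain average over the agents; a minimal sketch, where the per-agent conditional probability is supplied by a caller-provided function (an assumed interface, since how it is derived from the agent state is described only informally above):

```python
def p_success(agents, cond_prob):
    """P_t(succ) = sum_i P_t(succ | s = a_i) * P(s = a_i); under uniform
    speaker selection P(s = a_i) = 1/N, so this is simply the mean of
    the conditional success probabilities over all agents.
    cond_prob(a) returns P_t(succ | s = a) from agent a's current state."""
    return sum(cond_prob(a) for a in agents) / len(agents)
```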

For our benchmark, let us define a Random Fuzzy NG instance (RFNG). To generate such an instance, we assign to each agent the same domain of names D, and for each agent and each agent's name we draw a preference level in [0, 1] from a uniform distribution. Moreover, an RFNG can only have crisp binary equality constraints. We also define the Path RFNG instance [4], which is an RFNG instance in which the constraint network is a path graph. A path graph (or linear graph) is a particularly simple example of a tree: it has two terminal vertices (vertices that have degree 1), while all the others (if any) have degree 2.
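A Path RFNG instance can be generated in a few lines. The encoding below (one preference dictionary per agent plus an explicit edge list for the path of equality constraints) is an illustrative assumption, since the paper does not fix a data structure:

```python
import random

def make_path_rfng(n_agents, words, seed=None):
    """Generate a Path RFNG instance: every agent shares the same word
    domain, each preference level is drawn uniformly from [0, 1], and
    the crisp binary equality constraints form the path
    a_0 - a_1 - ... - a_{n-1}."""
    rng = random.Random(seed)
    prefs = [{w: rng.random() for w in words} for _ in range(n_agents)]
    edges = [(i, i + 1) for i in range(n_agents - 1)]  # path graph
    return prefs, edges
```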

We generated 5 such random instances, with 10 agents and 10 words each. For each of these instances, we computed with a brute-force algorithm the best preference level and the word associated to this solution. Then, we ran this algorithm 10 times on



each instance. To decide when the algorithm finds the solution, a graph crawler checks the agents' marked words and their marked words' preferences. If all the agents agree on the marked variable, this means they have found an agreement on the name. Then, the graph crawler checks whether the shared word has a preference level equal to the best preference; in that case we conclude that the algorithm has found the optimal solution.
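This stopping test can be sketched as follows. Representing the crawler's view as a list of marked words and a list of per-agent preference maps is an assumed encoding, and the tolerance `eps` is an illustrative detail:

```python
def found_optimum(marked_words, prefs, best_pref, eps=1e-12):
    """Graph-crawler check: all agents agree on the same marked word,
    and that word's preference level equals the optimum best_pref
    computed offline by brute force. marked_words[i] is agent i's
    marked word; prefs[i] maps agent i's words to preference levels."""
    if len(set(marked_words)) != 1:
        return False                      # no agreement on the name yet
    w = marked_words[0]
    return all(abs(p[w] - best_pref) <= eps for p in prefs)
```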

In Fig. 1 we measure the evolution in time of P_t(succ) for the path RFNG instances. When P_t(succ) = 1, all interactions are going to be successful; thus we are in an absorbing state, which, from Th. 1, we know is also a solution.

[Figure 1: plot of P_t(succ) versus t ∈ [0, 300] for Runs 1–5.]

Fig. 1. Evolution of the mean P_t(succ) over 5 different path RFNG instances. For each instance, we computed the mean P_t(succ) over 10 different runs. We set N = 10, and the number of words to 10.

In Fig. 2 we show the scaling of the mean number of messages MNM needed by the system to find a solution, for different numbers N of variables in the path RFNG instances. For each N, the MNM was measured over 5 different path RFNG instances. We notice that the points approximately overlap the function cN^{1.8}.

5 Related Work<br />

Whilst a number of approaches have been proposed to solve DCSPs [11, 15] or centralized FCSPs [11] alone, only a few works are related to the combination of DCSPs and fuzzy CSPs. It is important to notice the fundamental difference with the DCSP algorithms designed by Yokoo [15]. Yokoo addresses three fundamental kinds of DCSP algorithms: Asynchronous Backtracking, Asynchronous Weak-Commitment Search and the Distributed Breakout Algorithm [15]. Although these algorithms share the property of being asynchronous, they require a pre-agreed agent/variable ordering. The algorithm presented in this paper does not need this initial condition.

DisFCSPs have been of interest to the Multi-Agent Systems community, especially in the context of distributed resource allocation, collaborative scheduling, and negotiation


[Figure 2: log–log plot of MNM (10^2 to 10^8) versus N (10 to 1000), showing the Path RFNG data points and the fitted curve cN^a.]

Fig. 2. Scaling of the mean number of messages MNM needed by the system to find a solution for different numbers of variables N in path RFNG instances. For each N, the MNM was measured over 5 different path RFNG instances. We notice that the points approximately overlap the function cN^{1.8}.

(e.g. [8]). Those works focus on bilateral negotiations, and when many agents take part, a central coordinating agent may be required. For example, the work in [8] promotes a rotating coordinating agent which acts as a central point to evaluate the different proposals sent by the other agents. Hence the network model employed in those works is not totally distributed.

In [13, 14] the authors define fuzzy GENET, a neural network model for solving binary FCSPs. By transforming FCSPs into [0, 1] integer programming problems, they display the equivalence between the underlying working mechanism of fuzzy GENET and the discrete Lagrangian method. Benchmarking results confirm its feasibility in tackling CSPs and its flexibility in dealing with over-constrained problems.

In [9] the authors propose two approaches to solve these problems: an iterative method and an adaptation of the Asynchronous Distributed constraint OPTimization algorithm (ADOPT) for solving DisFCSPs. They also present experiments comparing the performance of the two approaches, showing that ADOPT is more suitable for low-density problems (density = number of links / number of agents).

6 Conclusions and Future Work<br />

In this paper we have shown how NG problems [12, 1, 10, 7] can be extended with fuzzy preferences over words in order to solve a generic instance of a DisFCSP [11, 15, 8, 9, 14]. In the study of such an algorithm we try to fully exploit the power of distributed computation. Our algorithm is based on the random exploration of the system state space: it travels through the possible states until it finds the absorbing state, where it stabilizes. These goals are achieved through the union of new topics addressed in statistical physics (the NG) and the abstract framework posed by constraint solving.



In other words, we show that a DisFCSP algorithm may work without a predetermined agent ordering, and can probabilistically solve instances that were not thought to be solvable by such algorithms. Moreover, in the real world, a predetermined agent ordering may be a quite restrictive assumption. Hence, it is very important to explore and understand how such distributed systems may work and what problems may exist.

In future work, we intend to evaluate an asynchronous version of this algorithm in depth, and to test it using other comparison metrics, such as communication cost (number of messages sent) and NCCCs (number of non-concurrent constraint checks). Moreover, we would like to compare our algorithm against other distributed and asynchronous algorithms, such as fuzzy GENET and fuzzy ADOPT. Furthermore, we will try to generalize it to generic semiring-based CSP instances [2], and not only fuzzy CSPs.

References<br />

1. A. Baronchelli, M. Felici, E. Caglioti, V. Loreto, and L. Steels. Sharp transition towards shared vocabularies in multi-agent systems. CoRR, abs/physics/0509075, 2005.
2. S. Bistarelli. Semirings for Soft Constraint Solving and Programming, volume 2962 of LNCS. Springer, 2004.
3. S. Bistarelli and G. Gosti. Solving CSPs with naming games. In A. Oddi, F. Fages, and F. Rossi, editors, CSCLP, volume 5655 of LNCS, pages 16–32. Springer, 2008.
4. S. Bistarelli and G. Gosti. Solving distributed CSPs probabilistically. Fundam. Inform., 105(1-2):57–78, 2010.
5. Z. Collin, R. Dechter, and S. Katz. On the feasibility of distributed constraint satisfaction. In IJCAI, pages 318–324, 1991.
6. E. W. Dijkstra. Self-stabilizing systems in spite of distributed control. Commun. ACM, 17:643–644, November 1974.
7. N. L. Komarova, K. A. Jameson, and L. Narens. Evolutionary models of color categorization based on discrimination. Journal of Mathematical Psychology, 51(6):359–382, 2007.
8. X. Luo, N. R. Jennings, N. Shadbolt, H. Leung, and J. H. Lee. A fuzzy constraint based model for bilateral, multi-issue negotiations in semi-competitive environments. Artif. Intell., 148:53–102, August 2003.
9. X. T. Nguyen and R. Kowalczyk. On solving distributed fuzzy constraint satisfaction problems with agents. In Proceedings of the 2007 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IAT '07, pages 387–390. IEEE Computer Society, 2007.
10. M. A. Nowak, J. B. Plotkin, and D. C. Krakauer. The evolutionary language game. Journal of Theoretical Biology, 200(2):147–162, September 1999.
11. F. Rossi, P. van Beek, and T. Walsh. Handbook of Constraint Programming (Foundations of Artificial Intelligence). Elsevier Science Inc., New York, NY, USA, 2006.
12. L. Steels. A self-organizing spatial vocabulary. Artificial Life, 2(3):319–332, 1995.
13. J. Wong, K. Ng, and H. Leung. A stochastic approach to solving fuzzy constraint satisfaction problems. In E. Freuder, editor, Principles and Practice of Constraint Programming, volume 1118 of LNCS, pages 568–569. Springer, 1996. doi:10.1007/3-540-61551-2_119.
14. J. H. Y. Wong and H. Leung. Extending GENET to solve fuzzy constraint satisfaction problems. In AAAI '98/IAAI '98, pages 380–385, Menlo Park, CA, USA, 1998. AAAI.
15. M. Yokoo and K. Hirayama. Algorithms for distributed constraint satisfaction: A review. Autonomous Agents and Multi-Agent Systems, 3:185–207, June 2000.



On Improving MUS Extraction Algorithms<br />

Joao Marques-Silva 1,2 and Inês Lynce 2<br />

1 University College Dublin<br />

jpms@ucd.ie<br />

2 INESC-ID/IST, TU Lisbon<br />

ines@sat.inesc-id.pt<br />

Abstract. Minimally Unsatisfiable Subformulas (MUS) find a wide<br />

range of practical applications, including product configuration,<br />

knowledge-based validation, and hardware and software design and verification.<br />

MUSes also find application in recent Maximum Satisfiability<br />

algorithms and in CNF formula redundancy removal. Besides direct applications<br />

in Propositional Logic, algorithms for MUS extraction have<br />

been applied to more expressive logics. This paper proposes two algorithms<br />

for MUS extraction. The first algorithm is optimal in its class,<br />

meaning that it requires the smallest number of calls to a SAT solver.<br />

The second algorithm extends earlier work, but implements a number of<br />

new techniques. The resulting algorithms achieve significant performance gains with respect to state-of-the-art MUS extraction algorithms.

This paper appears in:
Karem A. Sakallah and Laurent Simon (eds.)
Proceedings of the 14th International Conference on Theory and Applications of Satisfiability Testing (SAT 2011).
Lecture Notes in Computer Science, volume 6695, pages 159–173.
Springer, 2011.
The full paper is available at:
http://dx.doi.org/10.1007/978-3-642-21581-0_14
Proceedings of the 18th RCRA workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion (RCRA 2011).
In conjunction with IJCAI 2011, Barcelona, Spain, July 17-18, 2011.



Applying UCT to Boolean Satisfiability<br />

Alessandro Previti 1 , Raghuram Ramanujan 2 , Marco Schaerf 1 , and Bart Selman 2<br />

1 Dipartimento di Informatica e Sistemistica Antonio Ruberti<br />

Sapienza, Università di Roma<br />

Roma, Italy<br />

elsandro84@gmail.com, marco.schaerf@uniroma1.it<br />

2 Department of Computer Science<br />

Cornell University<br />

Ithaca, New York<br />

{raghu, selman}@cs.cornell.edu<br />

Abstract. In this paper, we investigate the feasibility of applying UCT-style techniques to the satisfiability of CNF formulae. We develop a new family of algorithms based on the idea of balancing exploitation (depth-first search) and exploration (breadth-first search), combined with a

simple heuristic evaluation of nodes. We compare our algorithm with<br />

a DPLL-based algorithm and WalkSAT, using the size of the tree and<br />

the number of flips as the performance measure. While our approach performs<br />

on par with DPLL on instances with little structure, it does quite<br />

well on structured instances, where it can effectively reuse information gathered from one iteration to the next. We conclude with a discussion

of a number of avenues for future work.<br />

This paper appears in:
Karem A. Sakallah and Laurent Simon (eds.)
Proceedings of the 14th International Conference on Theory and Applications of Satisfiability Testing (SAT 2011).
Lecture Notes in Computer Science, volume 6695, pages 373–374.
Springer, 2011.
The full paper is available at:
http://dx.doi.org/10.1007/978-3-642-21581-0_35
Proceedings of the 18th RCRA workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion (RCRA 2011).
In conjunction with IJCAI 2011, Barcelona, Spain, July 17-18, 2011.



An Efficient Hierarchical Parallel Genetic<br />

Algorithm for Graph Coloring Problem<br />

Reza Abbasian and Malek Mouhoub<br />

Department of Computer Science<br />

University of Regina<br />

Regina, Canada<br />

{abbasiar, mouhoubm}@cs.uregina.ca

Abstract. Graph coloring problems (GCPs) are constraint optimization problems with various applications including scheduling, timetabling, and frequency allocation. The GCP consists in finding the minimum number of colors for coloring the graph vertices such that adjacent vertices have distinct colors. We propose a parallel approach based on Hierarchical Parallel Genetic Algorithms (HPGAs) to solve the GCP. We also propose a new extension to PGA, that is, a Genetic Modification (GM) operator designed for solving constraint optimization problems by taking advantage of the properties between variables and their relations. Our proposed GM for solving the GCP is based on a novel Variable Ordering Algorithm (VOA). In order to evaluate the performance of our new approach, we have conducted several experiments on GCP instances taken from the well-known DIMACS website. The results show that the proposed approach achieves high performance, in both run time and quality of the returned solution, on these graph coloring instances. The quality of the solution is measured here by comparing the returned solution with the optimal one.

This paper appears in:
Natalio Krasnogor (ed.)
Proceedings of the 13th annual conference on Genetic and evolutionary computation (GECCO 2011), pages 521–528.
ACM, 2011.
The full paper is available at:
http://dx.doi.org/10.1145/2001576.2001648
Proceedings of the 18th RCRA workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion (RCRA 2011).
In conjunction with IJCAI 2011, Barcelona, Spain, July 17-18, 2011.



Checking Safety of Neural Networks with SMT<br />

Solvers: a Comparative Evaluation<br />

Luca Pulina 1 and Armando Tacchella 2<br />

1 DEIS, Università di Sassari, Italy<br />

lpulina@uniss.it<br />

2 DIST, Università di Genova, Italy<br />

Armando.Tacchella@unige.it<br />

Abstract. In this paper we evaluate state-of-the-art SMT solvers on<br />

encodings of verification problems involving Multi-Layer Perceptrons<br />

(MLPs), a widely used type of neural network. Verification is a key technology<br />

to foster adoption of MLPs in safety-related applications, where<br />

stringent requirements about performance and robustness must be ensured<br />

and demonstrated. In previous contributions, we have shown that<br />

safety problems for MLPs can be attacked by solving Boolean combinations<br />

of linear arithmetic constraints. However, the generated encodings<br />

are hard for current state-of-the-art SMT solvers, limiting our ability to<br />

verify MLPs in practice. The experimental results herewith presented<br />

are meant to provide the community with a precise picture of current<br />

achievements and standing open challenges in this intriguing application<br />

domain.<br />

This paper appears in:
R. Pirrone and F. Sorbello (eds.)
Proceedings of the 12th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2011).
Lecture Notes in Computer Science, volume 6934.
Springer, 2011.
Proceedings of the 18th RCRA workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion (RCRA 2011).
In conjunction with IJCAI 2011, Barcelona, Spain, July 17-18, 2011.



Plan Stability: Replanning versus Plan Repair<br />

Maria Fox 1 , Alfonso Gerevini 2 , Derek Long 1 , and Ivan Serina 2<br />

1 Department of Computer and Information Sciences<br />

University of Strathclyde, Glasgow, UK<br />

firstname.lastname@cis.strath.ac.uk

2 Department of Electronics for Automation<br />

University of Brescia, Italy<br />

lastname@ing.unibs.it<br />

Abstract. The ultimate objective in planning is to construct plans for<br />

execution. However, when a plan is executed in a real environment it<br />

can encounter differences between the expected and actual context of<br />

execution. These differences can manifest as divergences between the<br />

expected and observed states of the world, or as a change in the goals<br />

to be achieved by the plan. In both cases, the old plan must be replaced<br />

with a new one. In replacing the plan an important consideration is plan<br />

stability. We compare two alternative strategies for achieving the stable<br />

repair of a plan: one is simply to replan from scratch and the other is<br />

to adapt the existing plan to the new context. We present arguments<br />

to support the claim that plan stability is a valuable property. We then<br />

propose an implementation, based on LPG, of a plan repair strategy<br />

that adapts a plan to its new context. We demonstrate empirically that<br />

our plan repair strategy achieves more stability than replanning and can<br />

produce repaired plans more efficiently than replanning.<br />

This paper appears in:
Derek Long, Stephen F. Smith, Daniel Borrajo, Lee McCluskey (eds.)
Proceedings of the 16th International Conference on Automated Planning and Scheduling (ICAPS 2006), pages 212–221.
AAAI, 2006.
The full paper is available at:
http://www.aiconferences.org/ICAPS/2006/Papers/ICAPS06-022.pdf
Proceedings of the 18th RCRA workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion (RCRA 2011).
In conjunction with IJCAI 2011, Barcelona, Spain, July 17-18, 2011.

