10.07.2015 Views

Catalysis of Organic..

Rothenberg et al. 263

By dividing the problem this way, we translate it from an abstract problem in catalysis to one of relating one multi-dimensional space to another. This is still an abstract problem, but the advantage is that we can now quantify the relationship between spaces B and C using QSAR and QSPR models. Note that space B contains molecular descriptor values, rather than structures. These values, however, are directly related to the structures (8).

Of course, the real catalyst space is infinite, and it is not possible to study all of it. Instead, we generate a very large space A (10¹⁵–10¹⁷ catalysts) in silico, using a virtual synthesis platform, developed in our group and based on a ‘building block’ synthesis concept (9). One basic assumption that we make here is that the ‘good catalyst’ we are seeking is somewhere on this grid. Note that fewer than 50 building blocks are needed to create a space of 10¹⁷ catalysts, even when using only simple species that are joined selectively using a number of well-defined reactions.

To optimize the catalyst, we use an iterative approach (Figure 2), with consecutive modelling, synthesis, and analysis steps. First, we consider all of the available data (from earlier experiments or from literature), and build a regression model that connects the catalyst descriptors and the figures of merit (10). The screening is done in two stages. In the first, ‘rough screening’, we use 2D descriptors to examine relatively large areas of space A. We select random subsets from this space (typically 10,000–50,000 catalysts). The program calculates the 2D catalyst descriptor values and uses the above model to predict the figures of merit for these new catalysts. Depending on the data available, one can also apply genetic algorithms (GAs) at this stage to try to optimize the catalyst structure based on the 2D descriptors using meta-modelling (11). The best catalysts (typically 200–500 structures) are then selected for the next stage.

In the second, ‘fine screening’ stage, the program computes the 3D descriptors for this new subset, and again projects the results on the model and predicts the figures of merit. Basically, 3D descriptor models are more costly than 2D ones, but they give better results (12). As we showed earlier (10, 13), nonlinear models that combine chemical and topological descriptors are well suited for predicting activity/selectivity trends in homogeneous catalyst libraries, with typical correlation coefficients of R² = 0.8–0.9.

The result is a small subset of 20–50 new catalysts. These are then synthesized and tested experimentally. The model is then updated and the cycle repeats. In theory this process can repeat indefinitely, but our results on industrial data show that the figures of merit usually converge after 5–6 cycles. This means that in principle it is possible to indicate an optimal region in a space of a million catalysts after testing fewer than 300 ligand-metal complexes!
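
The iterative two-stage screening described above can be illustrated with a small toy sketch. It is not the authors' platform: every name in it (descriptors_2d, descriptors_3d, run_experiment, knn_predict, top, optimize) is hypothetical, the "experiment" is a synthetic function, the descriptors are invented numbers rather than real 2D or 3D molecular descriptors, and a k-nearest-neighbour regressor merely stands in for the nonlinear regression model. The space and subset sizes are scaled down so that the loop runs in about a second.

```python
import random

N_BLOCKS = 10   # building blocks available per position (toy value)
N_SITES = 4     # positions per catalyst, so the toy space holds 10**4 catalysts


def descriptors_2d(cat):
    """Hypothetical cheap descriptors: scaled building-block indices."""
    return [b / N_BLOCKS for b in cat]


def descriptors_3d(cat):
    """Hypothetical costlier descriptors: the 2D values plus one pairwise term."""
    d = descriptors_2d(cat)
    return d + [d[0] * d[1]]


def run_experiment(cat):
    """Synthetic stand-in for synthesis and testing; returns a figure of merit."""
    return -sum((x - 0.5) ** 2 for x in descriptors_3d(cat))


def knn_predict(train, query, k=3):
    """k-nearest-neighbour regression, an assumed stand-in for the regression model."""
    def dist(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], query))
    nearest = sorted(train, key=dist)[:k]
    return sum(y for _, y in nearest) / len(nearest)


def top(candidates, describe, tested, n):
    """Rank candidates by predicted figure of merit and keep the best n."""
    train = [(describe(c), y) for c, y in tested.items()]
    return sorted(candidates,
                  key=lambda c: knn_predict(train, describe(c)),
                  reverse=True)[:n]


def optimize(n_cycles=5, n_rough=2000, n_fine=100, n_test=20, seed=0):
    rng = random.Random(seed)

    def random_catalyst():
        return tuple(rng.randrange(N_BLOCKS) for _ in range(N_SITES))

    # Initial data set: a few randomly chosen, already "tested" catalysts.
    tested = {}
    for _ in range(n_test):
        cat = random_catalyst()
        tested[cat] = run_experiment(cat)

    history = []  # best measured figure of merit after each cycle
    for _ in range(n_cycles):
        # Rough screening: random subset of the space, ranked with 2D descriptors.
        draws = dict.fromkeys(random_catalyst() for _ in range(n_rough))
        subset = [c for c in draws if c not in tested]
        rough = top(subset, descriptors_2d, tested, n_fine)
        # Fine screening: re-rank the survivors with the 3D descriptors.
        fine = top(rough, descriptors_3d, tested, n_test)
        # "Synthesize and test" the final subset; the model data grow each cycle.
        for cat in fine:
            tested[cat] = run_experiment(cat)
        history.append(max(tested.values()))
    return tested, history


if __name__ == "__main__":
    tested, history = optimize()
    print(len(tested), "catalysts tested; best figure of merit per cycle:", history)
```

With the numbers quoted in the text, 20–50 catalysts per cycle over 5–6 cycles gives 100–300 tested complexes in total, which is where the figure of fewer than 300 comes from.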
