24.02.2014 Views

Partial Differential Equations - Modelling and ... - ResearchGate

Partial Differential Equations - Modelling and ... - ResearchGate

Partial Differential Equations - Modelling and ... - ResearchGate

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

156 G. Bencteux et al.<br />

1 2 3 4 5 6 7 8 9 10<br />

G 1<br />

G 2<br />

G 3<br />

Fig. 3. Collection of p = 10 blocks into r = 3 groups.<br />

Repeat until convergence:<br />

1a. Local step on blocks: 1, 3, ..., (2i +1), ...<br />

1b. Local step on blocks: 2, 4, ..., (2i), ...<br />

2a. Global step on groups: {1, 2}, {3, 4}, ..., {2i − 1, 2i}, ...<br />

2b. Global step on groups: {2, 3}, {4, 5}, ..., {2i, 2i +1}, ...<br />

Fig. 4. Schematic view of the algorithm in the case of 2-block groups (r = 2): tasks<br />

appearing on the same line are independent from one another. Order between the<br />

steps 1a <strong>and</strong> 1b is reversed from one iteration to the other. The same holds for the<br />

steps 2a <strong>and</strong> 2b.<br />

in r overlapping groups (G l ) 1≤l≤r as shown in Figure 3. As each group only<br />

overlaps with its first neighbors, the problem (16) can be solved first for the<br />

groups (G 2l+1 ), next for the groups (G 2l ). We have observed that the number<br />

of iterations of the outer loop (local step + global step) does not significantly<br />

increase when the ‘exact’ global step (16) is replaced by the approximate<br />

global step consisting in optimizing first the odd groups, then the even groups.<br />

The numerical results performed so far (see Section 5) tend to show that the<br />

resulting algorithm scales linearly with the system size.<br />

A schematic view of the algorithm is provided in Figure 4.<br />

One important point (not taken into account in [BCHL07]) is that the<br />

Hessian of f enjoys a very specific structure. It is a sum of tensor products of<br />

square matrices of size m i . For example, with two-block groups (r =2),we<br />

have<br />

4∑<br />

HU = A (i) UB (i) (20)<br />

i=1<br />

with A (i) ∈M m2,m2 (R) <strong>and</strong>B (i) ∈M m1,m1 (R). Consequently, it is possible<br />

to compute Hessian-vector products, without assembling the Hessian, in<br />

O(m 1 m 2 max(m 1 ,m 2 )) elementary operations, instead of O(m 2 1 m 2 2) with<br />

a naive implementation. An additional source of acceleration is the fact that<br />

this formulation uses only matrix-matrix products. Efficient implementations<br />

of matrix-matrix products, taking advantage of higher numbers of floating<br />

point operations per memory access, are available in the BLAS 3 library

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!