SimRisk: An Integrated Open-Source Tool for Agent-Based ...


Figure 4: An object-oriented type hierarchy for model reuse.

5.2 Specific aim 2: develop high-performance generative simulation technology

Simulation still plays an important and indispensable role in the practice of supply chain management. For this reason, we will retain simulation as part of our integrated stochastic analysis framework but will greatly improve its performance. Much of the existing research on supply-chain simulation focuses on simulation algorithms and software implementation (cf. [Terzi and Cavalieri, 2004, Kleijnen, 2005]). In this research, our interest in simulation is to improve its efficiency and scalability by providing a better integration between software and hardware. Our research is motivated by recent advances in multi-core architectures and peta-scale computing platforms. These advances provide extra computational power for machines ranging from desktops to supercomputers. A research question is how to harness this power to improve the speed and scalability of supply-chain simulation. Our solution is to develop a reconfigurable generative simulation approach for supply-chain analysis. For each supply-chain model, the proposed approach will generate, on the fly, simulation code that takes advantage of a targeted computer architecture. Specifically, we will develop a generative simulation engine that can be reconfigured for two different types of architectures: multi-core personal computers and high-performance clusters. For a desktop computer with a single multi-core processor, the engine will generate a multi-threaded program in which each thread represents an agent, using the IEEE POSIX thread model [IEEE and The Open Group, 2004]. For a cluster of multi-core processors, the generative simulation technology will generate a set of POSIX threads for each processor, and different sets of threads will communicate via the Message Passing Interface (MPI).

Figure 5: A cluster of quad-core servers. (Each of N servers holds a quad-core processor with per-core L2 caches and local memory; the servers are connected through PCI-e adapters over a fiber-optic channel.)

A central issue in developing the generative simulation engine is load balancing: given a hardware architecture and an agent-based model, how should threads and processes be distributed to cores and processors for better performance? This issue has a special meaning in a cluster environment: a multi-core cluster supports both shared-memory and message-passing communication, which have very different characteristics. We will study the optimal distribution of threads and processes for minimizing communication overhead in the context of agent-based supply-chain simulation. Specifically, we will study the following methods:

(a) Exploit model structure and data dependency to improve load balancing. For example, consider the supply chain in Figure 1(b), and assume that we will run the simulation on a cluster of quad-core processors whose architecture is shown in Figure 5. Figure 6 shows a distribution of threads and processes using heuristics derived from model structure and data dependency. Communication between supply-chain elements is through shipments and messages, and we assume that messages can only be passed along routes. As a general principle, threads for a sub-network of closely coupled elements will be placed on the cores of the same processor: closely coupled elements communicate more frequently, and shared memory implements this communication with less overhead. As an example, in Figure 6 the threads for the elements of the sub-networks of w^a_21 and w^b_21 are assigned to the same processor, while the processes for the sub-networks of s^a and s^b are allocated to different processors.
In general, the higher elements sit in the network hierarchy, the less they communicate with each other, since the operations of higher-level elements are planned over a much longer planning horizon. We therefore reserve shared memory for communication among closely coupled low-level elements and use message passing for communication among high-level elements.

(b) Profile threads and optimize thread scheduling. To further improve the performance of the generated parallel simulators, we will profile the execution time and the overhead of the threads, and use this information to optimize thread scheduling. Using the profiling results, the generative simulation engine will express the thread-scheduling problem as a linear program, and it will use the optimization result to define the scheduling policy for the threads.
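One way the thread-scheduling problem in (b) might be written as a linear program is the following sketch. This formulation is illustrative, not taken from the proposal: e_i denotes the profiled execution time of thread i, x_{ij} the fraction of thread i's work placed on core j (of m cores, n threads), and T the makespan to be minimized.

```latex
\begin{align*}
\min_{x,\,T} \quad & T \\
\text{s.t.} \quad & \sum_{j=1}^{m} x_{ij} = 1, && i = 1,\dots,n \\
& \sum_{i=1}^{n} e_i\, x_{ij} \le T, && j = 1,\dots,m \\
& x_{ij} \ge 0, && \text{all } i, j.
\end{align*}
```

An integral thread-to-core assignment can then be recovered from the LP relaxation by rounding; adding profiled communication costs between thread pairs would turn this into the placement problem discussed in (a).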

