CESNET: High-speed National Research Network and its New Applications 2002
In version 6.0 there were several new features that we wished to test in the extensive distributed environment of the MetaCentrum PC clusters. These include what is known as "dynamic load balancing", which should allow reasonable utilization of the available resources in a computational environment that is nonhomogeneous with respect to performance.

FLUENT loads a task onto the individual computing nodes, allocating to each node a partition of identical size. The problematic situation, typical for MetaCentrum clusters, arises when the performance of the allocated computing nodes differs: after each iteration of the convergence process, the entire computation waits for the slowest node, and the computation is therefore delayed.

With dynamic load balancing, FLUENT measures the relative capacity of the individual nodes and reorganizes the task allocation so that the computing time is identical across the nodes. Unfortunately, we found that this method does not work in FLUENT version 6.0; moreover, it requires the graphical user interface, which is impractical for computations of extensive tasks administered by the batch system.

Problems were also encountered with a test task of approximately 5 million cells, which we meshed with what are known as non-conformal mesh interfaces. For automatic parallel loading onto the individual nodes, the interfaces should cause no problems as long as no mesh adaptation is carried out. We found, however, that the automatic task distribution completed for only one metacomputer configuration and failed elsewhere, so no further testing with this task was carried out. This does not mean that it is impossible to compute this type of task on PC clusters; it is only necessary to partition the task for a particular number of computing nodes in advance.
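The rebalancing idea described above can be sketched as follows. This is only an illustrative model of the principle, not FLUENT's actual algorithm: each node's share of cells is made proportional to its measured speed, so that all nodes finish an iteration at roughly the same time instead of everyone waiting for the slowest node.

```python
# Illustrative sketch of dynamic load balancing: resize each node's
# partition in proportion to its measured speed (cells per second),
# so that all nodes finish an iteration at roughly the same time.
# The function names and the proportional rule are our assumptions.

def rebalance(total_cells, speeds):
    """speeds[i] = cells per second processed by node i."""
    total_speed = sum(speeds)
    # Allocate cells proportionally to node speed.
    return [total_cells * s / total_speed for s in speeds]

def iteration_time(cells_per_node, speeds):
    # A synchronous solver waits for the slowest node each iteration.
    return max(c / s for c, s in zip(cells_per_node, speeds))

total_cells = 5_000_000
speeds = [700_000, 700_000, 1_900_000, 1_900_000]  # heterogeneous nodes

# Naive equal split: every node gets the same number of cells.
equal = [total_cells / len(speeds)] * len(speeds)
balanced = rebalance(total_cells, speeds)

print(iteration_time(equal, speeds))     # slow nodes dominate
print(iteration_time(balanced, speeds))  # all nodes finish together
```

With an equal split the two slow nodes dictate the iteration time; after proportional rebalancing the iteration time drops to the ideal value `total_cells / sum(speeds)`.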
This is convenient for the computational engineers, but not for our testing of application performance scaling.

Within a single part of a cluster (up to 32 processors), FLUENT shows relatively good scaling (near-linear acceleration), even for extensive tasks. Beyond this limit the computing times shorten only slightly (for 40, 48 and 56 processors); with still higher numbers the time increases again, and with all 158 processors in use the computing time is worse than with 32 processors on one part of the cluster. The nonhomogeneity of a metacomputer configuration may also prove disadvantageous, as we used computers with 700 MHz Intel Pentium III processors as well as 1.9 GHz AMD Athlon processors (minos, the PC cluster at ZČU).

The use of high-speed networks (Myrinet, Gigabit Ethernet) is significant particularly for the start-up of FLUENT and for task loading. With the standard network communicator (sockets), the FLUENT start-up time for a higher number of linked computing nodes in a metacomputer (over 40 processors) is alarming; in some cases it reached several hours (approximately 7 hours with 158 processors). With the Network MPI, FLUENT started up in the same metacomputer configuration in several minutes. The negative aspect, however, is the longer duration of a single iteration: 49 seconds instead of 44 seconds with sockets.

The two tables show partial results for the extensive task with 13 million cells (airflow around a car).

Notes to Table 7.1: For 8–32 processors, measurements were carried out on the nympha cluster (Plzeň); for 8–15 processors with a single CPU per machine, and beyond that with both processors of each machine in use.
Note 1: For 32–158 processors, four parts of the cluster were used (nympha, minos, skurut, skirit), always with an evenly distributed number of allocated computing nodes.
Note 2: This configuration is 16 machines with two processors each on the nympha, minos and skurut clusters.
Note to Table 7.2: For 32–158 processors, four parts of the cluster were used (nympha, minos, skurut, skirit), always with an evenly distributed number of allocated computing nodes.

We see that the benefit of Myrinet on the nympha cluster is negligible for a small number of CPUs, unlike for a higher number of CPUs (the difference is obvious already with 20 CPUs), when communication during the computation is likely to increase. We saw the influence of Myrinet when monitoring the time of task loading onto a metacomputer, which is up to two times shorter; with respect to the overall computing time, however, this is a marginal aspect.

The Network MPI (i.e., the use of MPI across the entire distributed system, using the LAM implementation) is worse as regards both task loading and the computation itself. Its only positive aspect is the FLUENT start-up time, which is considerably better than with socket communication. This advantage is again rather negligible for a "reasonable" number of CPUs (up to approximately 40) and a typically demanding task (i.e., a task computed over a period of several days).

Due to the low number of measurements we are unable to formulate more precise conclusions and recommendations; this will be the goal of our efforts in the first half of 2003.

In conclusion, we would like to point out that even a part of a PC cluster, i.e., 16 dual-processor nodes with 16 GB of memory, can be used as a powerful tool for highly demanding CFD tasks. A further positive aspect is the possibility of using a supercomputer for the definition of extensive tasks; in some cases it is impossible to prepare tasks directly in the parallel run of FLUENT (this part of the code has not been parallelized).
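The scaling behaviour discussed above can be quantified with the usual speedup and efficiency measures, taking the smallest run as the reference. The per-iteration timings below are illustrative placeholders, not the measured values from Tables 7.1 and 7.2; they merely reproduce the qualitative pattern (good scaling to 32 CPUs, a slight gain to 56, degradation at 158).

```python
# Parallel speedup and efficiency relative to a reference run.
# S(p) = t_ref * p_ref / t(p),  E(p) = S(p) / p.
# Timings are hypothetical placeholders, not the measured data.

def speedup(t_ref, p_ref, t_p):
    """Speedup of a run with time t_p relative to a p_ref-CPU run."""
    return t_ref * p_ref / t_p

def efficiency(t_ref, p_ref, t_p, p):
    return speedup(t_ref, p_ref, t_p) / p

# Hypothetical per-iteration times (seconds) for increasing CPU counts,
# mimicking the observed trend: improvement up to ~56 CPUs, then worse.
timings = {8: 320.0, 16: 165.0, 32: 90.0, 56: 80.0, 158: 110.0}
t8 = timings[8]

for p, t in timings.items():
    print(p, round(speedup(t8, 8, t), 2), round(efficiency(t8, 8, t, p), 2))
```

With such numbers the efficiency is close to 1 up to 32 CPUs and collapses at 158, where the absolute time is worse than with 32 CPUs, matching the qualitative observation in the text.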

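The trade-off between socket communication (hours of start-up, 44 s per iteration) and the Network MPI (minutes of start-up, 49 s per iteration) reduces to a simple break-even calculation. The 7-hour figure comes from the text; the 3-minute value for "several minutes" is our assumption.

```python
# Break-even between sockets (slow start-up, faster iterations) and
# Network MPI (fast start-up, slower iterations) for a 158-CPU run.
# Start-up of 3 minutes for "several minutes" is an assumed value.

SOCKET_STARTUP = 7 * 3600.0   # seconds, ~7 hours (from the text)
SOCKET_ITER = 44.0            # seconds per iteration
MPI_STARTUP = 3 * 60.0        # seconds, assumed "several minutes"
MPI_ITER = 49.0               # seconds per iteration

def total_time(startup, per_iter, iterations):
    return startup + per_iter * iterations

# Number of iterations after which the faster socket iterations have
# paid back the enormous socket start-up cost:
break_even = (SOCKET_STARTUP - MPI_STARTUP) / (MPI_ITER - SOCKET_ITER)
print(round(break_even))  # 5004 iterations

# A task computed over several days runs far more iterations than that,
# so the Network MPI start-up advantage is negligible in practice.
print(total_time(SOCKET_STARTUP, SOCKET_ITER, 20_000) <
      total_time(MPI_STARTUP, MPI_ITER, 20_000))  # True
```

Roughly 5,000 iterations mark the break-even point; for short runs the Network MPI wins, but for the multi-day tasks considered here sockets come out ahead despite their start-up time, which is exactly why the start-up advantage is dismissed as negligible.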
