In version 6.0, there were several new features that we also wished to test in the extensive distributed environment of the MetaCentrum PC clusters. These include what is known as "dynamic load balancing", which should allow for reasonable utilization of the available resources in a computational environment that is nonhomogeneous in terms of performance.

FLUENT loads a task onto the individual computing nodes, with each node allocated a part of the task of identical volume. The problematic situation (typical for MetaCentrum clusters) arises when the performance of the allocated computing nodes differs: in each iteration of the convergence process, the entire computation waits for the slowest node, and the computation is therefore delayed.

With dynamic load balancing, FLUENT determines the relative capacity of the individual nodes and reorganizes the task allocation so that the computing time is identical across the nodes. Unfortunately, we found that this method is not functional in FLUENT version 6.0. Moreover, it requires the use of the graphical user interface, which is inconvenient for practical computations of extensive tasks administered by the batch system.
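Although we could not use dynamic load balancing in practice, the principle is easy to illustrate. The following minimal sketch is our own model, not FLUENT code; the node speeds and cell counts are invented for illustration. It assumes a synchronous solver in which every iteration lasts as long as the slowest node needs, and shows how allocating cells in proportion to node performance shortens the iteration:

```python
# Minimal model of synchronous iteration time on heterogeneous nodes.
# Hypothetical numbers; this only illustrates the balancing principle.

def iteration_time(cells_per_node, cells_per_second):
    """A synchronous solver waits for the slowest node in each iteration."""
    return max(c / s for c, s in zip(cells_per_node, cells_per_second))

total_cells = 13_000_000                  # e.g., the car-airflow task size
speeds = [400_000, 400_000, 1_000_000]    # cells/s per node (invented values)

# Static partitioning: every node receives an identical share.
equal = [total_cells / len(speeds)] * len(speeds)

# Dynamic load balancing: shares proportional to measured node performance.
balanced = [total_cells * s / sum(speeds) for s in speeds]

print(f"equal shares:    {iteration_time(equal, speeds):.1f} s per iteration")
print(f"balanced shares: {iteration_time(balanced, speeds):.1f} s per iteration")
```

With balanced shares, all nodes finish at the same moment, so the iteration time drops from the slowest node's time to the total cell count divided by the total performance of all nodes.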
Problems were also encountered when testing a task with approx. 5 million cells, which we built with what are known as nonconformal mesh interfaces. For automatic parallel loading onto the individual nodes, the interfaces should not cause any problems as long as no mesh adaptation is carried out. We found, however, that the automatic task distribution completed for only one metacomputer configuration and failed elsewhere, so no further testing with this task was carried out. This does not mean that it is impossible to compute this type of task on PC clusters; it is only necessary to divide the task among a particular number of computing nodes in advance. That is convenient for computational engineers, but not for our testing of application performance scaling.

Within a single part of a cluster (up to 32 processors), FLUENT shows relatively good scaling (nearly linear acceleration), even for extensive tasks. Beyond this limit, computing times shorten only slightly (for 40, 48 and 56 processors); with higher numbers the time increases again, and with all 158 processors in use the computing time is worse than with 32 processors on one part of the cluster. The nonhomogeneity of the metacomputer configurations may also prove disadvantageous here, as we used computers with 700 MHz Intel Pentium III processors as well as 1.9 GHz AMD Athlon processors (minos, the PC cluster at ZČU).

The use of high-speed networks (Myrinet, Gigabit Ethernet) makes a distinct difference, particularly for the start-up of FLUENT and for task loading. With the standard network communicator (sockets), the FLUENT start-up time for a larger number of linked computing nodes in a metacomputer (over 40 processors) is alarming; in some cases it reached several hours (approximately 7 hours with 158 processors). With the Network MPI, FLUENT started up in the same metacomputer configuration in several minutes. The negative aspect, however, is a longer single iteration: 49 seconds instead of the 44 seconds achieved with sockets.

Tables 7.1 and 7.2 show partial results for the extensive task with 13 million cells (car airflow).

Notes to Table 7.1: For 8-32 processors, the measurements were carried out on the nympha cluster (Plzeň); for 8-15 processors with a single CPU per machine, and above that with both processors of each machine in use. Note 1: for 32-158 processors, four parts of the clusters nympha, minos, skurut and skirit were used, always with an evenly distributed number of allocated computing nodes. Note 2: this configuration consists of 16 dual-processor machines at each of the nympha, minos and skurut clusters.

Note to Table 7.2: For 32-158 processors, four parts of the clusters nympha, minos, skurut and skirit were used, always with an evenly distributed number of allocated computing nodes.

We see that the benefit of Myrinet on the nympha cluster is negligible for a small number of CPUs, unlike with a higher number (the difference is obvious already at 20 CPUs), when communication during the computation presumably increases. We have also seen the influence of Myrinet when monitoring the time of task loading onto the metacomputer, which is up to two times shorter; as regards the overall computing time, however, this is a marginal aspect.

The Network MPI (i.e., the use of MPI throughout the entire distributed system, here the LAM implementation) is worse as regards both task loading and the computation itself. Its only positive aspect is the FLUENT start-up time, which is considerably better than with socket communication. For a "reasonable" number of CPUs (approx. up to 40) and a typically demanding task (i.e., one computed over several days), this is again a rather negligible advantage; a rough break-even estimate is sketched at the end of this section.

Due to the low number of measurements, we are unable to formulate more precise conclusions and recommendations; this will be the goal of our efforts in the first half of 2003.

In conclusion, we would like to point out that even a part of a PC cluster, i.e., 16 dual-processor nodes with 16 GB of memory, can be used as a powerful tool for highly demanding CFD tasks. Another positive aspect is the possibility of using a supercomputer for the definition of extensive tasks; in some cases it is impossible to prepare a task directly in the parallel run of FLUENT, because this part of the code has not been parallelized.
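The break-even point mentioned above can be estimated from the figures reported here. This is only a back-of-the-envelope sketch: it assumes the roughly 7-hour start-up saving and the 5-second-per-iteration penalty measured for the 158-processor configuration, and nothing else.

```python
# Rough break-even estimate: after how many iterations is the Network MPI
# start-up saving consumed by its slower iterations?
# The two input figures come from the measurements reported above.

startup_saving_s = 7 * 3600        # socket start-up took ~7 hours; MPI, minutes
iteration_penalty_s = 49 - 44      # one MPI iteration is 5 s slower

break_even = startup_saving_s / iteration_penalty_s
print(f"Network MPI pays off only for runs under ~{break_even:.0f} iterations")
# ~5040 iterations; a task computed over several days runs far more,
# which is why the start-up advantage is negligible in practice.
```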