Parallel Geant4 (ParGeant4)

Maintained by Gene Cooperman (gene@ccs.neu.edu)
and Viet Ha Nguyen (vietha@ccs.neu.edu)

What is ParGeant4?

ParGeant4 [1] is a parallel version of Geant4 that implements event-level
parallelism to simulate separate events on remote processors. Typical
simulations demonstrate a nearly linear speedup in running time as the
number of remote processors increases. The needed enhancements of Geant4
are included in the examples/extended/parallel directory of the Geant4
distribution.

Why is ParGeant4 useful?

When doing a large Geant4 simulation, one often wishes to run on many
processors to reduce the overall time. Traditionally, this has been done
by splitting the events into multiple groups and running Geant4
independently on each processor for its own group of events. This requires
restarting a run if a processor goes down. It also requires saving the
histogram files from each run and merging the files before using the
analysis tool. The human effort involved is considerable.

ParGeant4 provides a much simpler mechanism. After setting up ParGeant4, one
links and runs the sequential Geant4 application exactly as before, except
that one additionally links with some parallel libraries. Upon execution,
ParGeant4 on the console sends out events to slave processes, collects all
hits, and calls any analysis tool, exactly as one would do in the sequential
case.

There is no need to split events into separate groups, track whether one of
the processors crashed, merge histogram files, etc. If a slave processor
crashes, ParGeant4 automatically re-sends the events of that slave processor
to a new slave processor for re-execution.

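The re-sending of events after a crash can be pictured with a small piece of
master-side bookkeeping. The sketch below is only an illustration of the idea
and is not the TOP-C or ParGeant4 implementation: the class EventDispatcher
and all of its member names are hypothetical, and events are reduced to
integer ids.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <vector>

// Hypothetical master-side bookkeeping; NOT the TOP-C implementation.
class EventDispatcher {
public:
    explicit EventDispatcher(int numEvents) {
        for (int id = 0; id < numEvents; ++id) pending_.push_back(id);
    }

    // Hand the next pending event to a slave; returns -1 if none remain.
    int assign(int slaveId) {
        if (pending_.empty()) return -1;
        int eventId = pending_.front();
        pending_.pop_front();
        inFlight_[slaveId].push_back(eventId);
        return eventId;
    }

    // Record that a slave returned the hits for one of its events.
    void complete(int slaveId, int eventId) {
        std::vector<int>& events = inFlight_[slaveId];
        for (auto it = events.begin(); it != events.end(); ++it) {
            if (*it == eventId) {
                events.erase(it);
                ++completed_;
                return;
            }
        }
        assert(false && "event was not assigned to this slave");
    }

    // A slave crashed: put its unfinished events back at the front of the
    // queue so that another slave re-executes them.
    void slaveFailed(int slaveId) {
        auto it = inFlight_.find(slaveId);
        if (it == inFlight_.end()) return;
        pending_.insert(pending_.begin(), it->second.begin(), it->second.end());
        inFlight_.erase(it);
    }

    int completedCount() const { return completed_; }

    // True when no event is waiting and no event is still being simulated.
    bool allDone() const {
        if (!pending_.empty()) return false;
        for (const auto& entry : inFlight_) {
            if (!entry.second.empty()) return false;
        }
        return true;
    }

private:
    std::deque<int> pending_;                   // events not yet sent out
    std::map<int, std::vector<int>> inFlight_;  // slave id -> events sent to it
    int completed_ = 0;                         // events whose hits came back
};
```

In this sketch, an event that was assigned to a failed slave simply becomes
the next event handed out, so the user never has to track which events were
lost.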
What is the performance of ParGeant4?

As a rule of thumb, speedup will be nearly linear when each event simulation
lasts for at least several milliseconds. ParGeant4 has been tested
extensively on parallelizations of examples/novice/N02 and of
examples/advanced/underground_physics. On N02, we see a speedup of 27 for
50 nodes and a speedup of 33 for 100 nodes. When using the
--TOPC-aggregated-tasks=50 option (see below), the speedup improves to 35 for
50 nodes and 60 for 100 nodes.

In tests of underground_physics, events are longer and we see nearly linear
speedup (a speedup of 94 with 100 nodes).

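One way to compare these figures is parallel efficiency, the measured speedup
divided by the number of nodes, where 1.0 corresponds to perfectly linear
speedup. The helper below is a minimal sketch of that arithmetic; the function
name parallelEfficiency is our own and is not part of ParGeant4 or TOP-C.

```cpp
#include <cassert>

// Parallel efficiency: the fraction of ideal (linear) speedup achieved.
// A value of 1.0 means the speedup equals the number of nodes.
double parallelEfficiency(double speedup, int nodes) {
    assert(nodes > 0 && "node count must be positive");
    return speedup / static_cast<double>(nodes);
}
```

Applied to the numbers quoted above, N02 runs at an efficiency of 27/50 = 0.54
on 50 nodes and 33/100 = 0.33 on 100 nodes. With aggregated tasks this rises
to 35/50 = 0.70 and 60/100 = 0.60. For underground_physics, the efficiency on
100 nodes is 94/100 = 0.94.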
Getting started

Detailed information is under extended/parallel/ParN02/docs/000README. There
are four steps:

  1. Install TOP-C [2].
  2. Compile ParN02 by running gmake.
  3. Make sure the "procgroup" file is correct and copy it to the directory
     of the executable binary file (for example, $G4BIN/Linux-g++).
  4. Run the parallel binary program.

What is involved in setting up ParGeant4?

To set up ParGeant4, one needs TOP-C [2] and Marshalgen [4] (free, open
source software). If one is parallelizing a new Geant4 application, one
must then add or modify approximately 20 lines of annotations (C++ comments
that indicate shallow vs. deep copying of pointers, etc.) in the .h files
for each hit type defined by the application. For details of the
annotations, refer to the manual of the Marshalgen package. Finally, in
the main routine of the application, one replaces the call to the
G4RunManager constructor by a call to the ParRunManager constructor.
(ParRunManager is a derived class of G4RunManager.)

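Because ParRunManager derives from G4RunManager, swapping the constructor is
the only change the rest of the application sees. The sketch below illustrates
this with stand-in classes so that it compiles without Geant4. The two classes
here are NOT the real ones: the string return value of BeamOn, the choice of
BeamOn as the overridden method, and the helper runApplication are assumptions
made purely for demonstration.

```cpp
#include <cassert>
#include <string>

// Stand-in for Geant4's G4RunManager, reduced to what this sketch needs.
// The real class is declared in G4RunManager.hh and differs in detail.
class G4RunManager {
public:
    virtual ~G4RunManager() = default;
    // Pretend to simulate nEvents events and report how the run was executed.
    virtual std::string BeamOn(int nEvents) {
        assert(nEvents >= 0);
        return "sequential run of " + std::to_string(nEvents) + " events";
    }
};

// Stand-in for ParRunManager; the real class ships with the ParGeant4 examples.
class ParRunManager : public G4RunManager {
public:
    std::string BeamOn(int nEvents) override {
        assert(nEvents >= 0);
        return "parallel run of " + std::to_string(nEvents) + " events";
    }
};

// Hypothetical application code, written against the base class only.
// It works unchanged whichever run manager was constructed.
std::string runApplication(G4RunManager& runManager, int nEvents) {
    return runManager.BeamOn(nEvents);
}
```

With these stand-ins, passing a ParRunManager object to runApplication selects
the parallel behavior through the virtual call, while passing a G4RunManager
object keeps the sequential behavior. In a real application the corresponding
edit is the single constructor call in main.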
After this, one invokes the provided GNUmakefile (a slightly modified
version of the Geant4 example GNUmakefile) to create the parallel application.
One also writes a "procgroup" file, which declares the names of the remote
hosts to use in the parallel computation. Optionally, one may also specify
filenames (e.g. slave1.out, slave2.out, ...) to store the printout from each
slave process. Finally, one calls the ParGeant4 binary exactly as one would
call the Geant4 binary, and the results appear as normal, only faster.

Are there examples of using ParGeant4?

Yes. ParGeant4 includes parallelizations of several examples from the Geant4
distribution: novice/N02, novice/N04, and advanced/underground_physics.

What are some of the features of ParGeant4?

ParGeant4 includes all of the features of TOP-C. In particular, after
building a binary, "parMySimulation", one might call:

./parMySimulation --TOPC-help
  Display the command options, and then exit.

./parMySimulation --TOPC-trace=0
  By default, ParGeant4 traces each time a new event is sent to a task.
  This option turns the tracing off.

./parMySimulation --TOPC-verbose=0
  By default, ParGeant4 provides statistics indicating what process was run,
  when it was run, on what machine, the running times and elapsed times of
  the master and the average slave, and other information. This option turns
  off the statistics.

./parMySimulation --TOPC-aggregated-tasks=10
  By default, ParGeant4 sends one event to one remote process before turning
  to the next process. This option sends 10 events to a single remote process
  in one message. This is useful when events are relatively short and the
  network latency of sending a message starts to dominate the running time.

./parMySimulation --TOPC-slave-timeout=7200
  By default, if a remote process has not communicated with the master after
  1800 seconds (a half hour), the slave process will kill itself. This
  prevents runaway processes that may be in an infinite loop for some event,
  or may have lost their socket to communicate with the master process. In
  this example, we allow 7200 seconds (two hours) because we expect simulation
  of some events to last up to (but not more than) 7200 seconds.

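To see why the --TOPC-aggregated-tasks option described above helps with short
events, consider a deliberately simplified cost model. It assumes that every
message pays one fixed network latency and that every event takes the same
compute time. Both are assumptions for illustration, not measured TOP-C
behavior, and the function name overheadFraction is hypothetical.

```cpp
#include <cassert>

// Simplified model (an assumption, not measured TOP-C behavior): each message
// carrying a batch of events costs one fixed network latency, and each event
// costs a fixed compute time on the slave.
// Returns the fraction of the total time per message spent on latency.
double overheadFraction(double latencyMs, double eventMs, int eventsPerMessage) {
    assert(latencyMs >= 0.0 && eventMs > 0.0 && eventsPerMessage > 0);
    double computeMs = eventMs * eventsPerMessage;
    return latencyMs / (latencyMs + computeMs);
}
```

With a hypothetical latency of 1 ms and events of 1 ms each, sending one event
per message spends half of the time on latency (1 / (1 + 1) = 0.5). Sending 10
events per message reduces that share to 1 / (1 + 10), roughly 0.09.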
By default, ParGeant4 uses its own subset implementation of MPI (MPINU),
and it uses "ssh" to set up remote processes. ParGeant4 adds approximately
50 KB to the "footprint" of the binary executable. Those who wish to use
their own MPI (perhaps because a batch cluster requires a specific MPI) may
do so. (See "Configuring a Different MPI" in the TOP-C manual [3].)

References:

[1] ParGeant4: http://www.ccs.neu.edu/home/gene/pargeant4.html
[2] TOP-C: http://www.ccs.neu.edu/home/gene/topc.html
[3] TOP-C manual: http://www.ccs.neu.edu/home/gene/topc/topc_toc.html
[4] Marshalgen: http://www.ccs.neu.edu/home/gene/marshalgen.html