Parallel Geant4 (ParGeant4)

Maintained by Gene Cooperman (gene@ccs.neu.edu),
and Viet Ha Nguyen (vietha@ccs.neu.edu)

What is ParGeant4?

ParGeant4 [1] is a parallel version of Geant4 that implements event-level
parallelism to simulate separate events on remote processors. Typical
simulations demonstrate a nearly linear speedup in running time as the
number of remote processors increases. The needed enhancements of Geant4
are included in the examples/extended/parallel directory of the Geant4
distribution.

Why is ParGeant4 useful?

When doing a large Geant4 simulation, one often wishes to run on many
processors to reduce the overall time. Traditionally, this has been done
by splitting the events into multiple groups, and running Geant4
independently on each processor for its own group of events. This requires
restarting a run if a processor goes down. It also requires saving the
histogram files from each run, and merging the files prior to using the
analysis tool. The human effort in this is considerable.

ParGeant4 provides a much simpler mechanism. After setting up ParGeant4, one
links and runs the sequential Geant4 application exactly as before, except
that one additionally links with some parallel libraries. Upon execution,
ParGeant4 on the console sends out events to slave processes, collects all
hits, and calls any analysis tool -- exactly as one would do in the
sequential case.

There is no need to split events into separate groups, track whether one of
the processors crashed, merge histogram files, etc. If a slave processor
crashes, ParGeant4 automatically re-sends the events of that slave processor
to a new slave processor for re-execution.

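The re-sending of events described above can be pictured with a small
bookkeeping sketch. This is only an illustration under stated assumptions: the
Assignments type and the ReassignEvents function are hypothetical names, and
the code does not reproduce how TOP-C or ParGeant4 actually tracks events.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical bookkeeping: slave id -> ids of the events assigned to it.
using Assignments = std::map<int, std::vector<int>>;

// Move every event held by the crashed slave to a replacement slave,
// then forget the crashed slave. Events of other slaves are untouched.
void ReassignEvents(Assignments& work, int crashed, int replacement) {
    assert(crashed != replacement);
    auto it = work.find(crashed);
    if (it == work.end()) {
        return;  // the crashed slave held no events
    }
    std::vector<int> orphaned = std::move(it->second);
    work.erase(it);
    std::vector<int>& target = work[replacement];
    target.insert(target.end(), orphaned.begin(), orphaned.end());
}
```

For example, if slave 1 holds events 10 and 11 and then crashes, calling
ReassignEvents(work, 1, 3) leaves both events queued for slave 3.
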
What is the performance of ParGeant4?

As a rule of thumb, speedup will be nearly linear when each event simulation
lasts for at least several milliseconds. ParGeant4 has been tested
extensively on parallelizations of examples/novice/N02 and of
examples/advanced/underground_physics. On N02, we see a speedup of 27 for
50 nodes and a speedup of 33 for 100 nodes. When using
the --TOPC-aggregated-tasks=50 option (see below), the speedup improves to
35 for 50 nodes and 60 for 100 nodes.

In tests of underground_physics, events are longer and we see nearly linear
speedup (94 times speedup with 100 nodes).

Getting started

Detailed information is under extended/parallel/ParN02/docs/000README. There
are four steps:

1. Install TOP-C [2].
2. Compile ParN02 by running gmake.
3. Make sure the "procgroup" file is correct and copy it to the directory of
   the executable binary file (for example, $G4BIN/Linux-g++).
4. Run the parallel binary program.

What is involved in setting up ParGeant4?

To set up ParGeant4, one needs TOP-C [2] and Marshalgen [4] (free, open
source software). If one is parallelizing a new Geant4 application, one
must then add/modify approximately 20 lines of annotations (C++ comments
to indicate shallow vs. deep copying of pointers, etc.) in the .h files
for each hit type being defined by the application. For details of the
annotations, refer to the manual of the Marshalgen package. Finally, in
the main routine of the application, one replaces the call to the G4RunManager
constructor by a call to the ParRunManager constructor. (ParRunManager is
a derived class of G4RunManager.)

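The constructor swap in the main routine can be sketched as follows. To keep
the fragment compilable without Geant4, the two classes below are empty
stand-ins for the real G4RunManager and ParRunManager; the Describe method and
the MakeRunManager helper are hypothetical and exist only for illustration. A
real application would include the Geant4 and ParGeant4 headers instead of
defining these stubs.

```cpp
#include <memory>
#include <string>

// Stand-in for the real Geant4 class (assumption: stub for illustration).
class G4RunManager {
public:
    virtual ~G4RunManager() = default;
    virtual std::string Describe() const { return "sequential"; }
};

// Stand-in for the ParGeant4 class, which derives from G4RunManager.
class ParRunManager : public G4RunManager {
public:
    std::string Describe() const override { return "parallel"; }
};

// Hypothetical helper: the only change is which constructor is called.
// The rest of the application keeps using a G4RunManager pointer.
std::unique_ptr<G4RunManager> MakeRunManager(bool parallel) {
    if (parallel) {
        return std::unique_ptr<G4RunManager>(new ParRunManager);
    }
    return std::unique_ptr<G4RunManager>(new G4RunManager);
}
```

Because ParRunManager is a derived class, code that already works through a
G4RunManager pointer does not need to change after the swap.
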
After this, one invokes the already provided GNUmakefile (a slightly modified
version of the Geant4 example GNUmakefile) to create the parallel application.
Finally, one writes a "procgroup" file, which declares the names of the remote
hosts to use in the parallel computation. Optionally, one may also specify
filenames (e.g., slave1.out, slave2.out, ...) to store the printout from each
slave process. One then calls the ParGeant4 binary exactly as one would call
the Geant4 binary, and the results appear as normal, only faster.

Are there examples of using ParGeant4?

Yes. ParGeant4 includes parallelizations of other examples from the Geant4
distribution. Specifically, ParGeant4 includes example parallelizations of
novice/N02, novice/N04, and advanced/underground_physics.

What are some of the features of ParGeant4?

ParGeant4 includes all of the features of TOP-C. In particular, after
building a binary, "parMySimulation", one might call:

./parMySimulation --TOPC-help
  Display command options, and then exit.

./parMySimulation --TOPC-trace=0
  By default, ParGeant4 traces each time a new event is sent to a task.
  This turns it off.

./parMySimulation --TOPC-verbose=0
  By default, ParGeant4 provides statistics indicating which process was run,
  when it was run, on which machine, the running times and elapsed times of
  the master and the average slave, and other information. This turns off
  the statistics.

./parMySimulation --TOPC-aggregated-tasks=10
  By default, ParGeant4 sends one event to one remote process before turning
  to the next process. This option sends 10 events to a single remote process
  in one message. This is useful when events are relatively short and the
  network latency of sending a message starts to dominate the running time.

./parMySimulation --TOPC-slave-timeout=7200
  By default, if a remote process has not communicated with the master after
  1800 seconds (a half hour), the slave process will kill itself. This prevents
  runaway processes that may be in an infinite loop for some event, or may have
  lost their socket to communicate with the master process. In this example, we
  allow 7200 seconds (two hours) because we expect simulation of some events to
  last up to (but not more than) 7200 seconds.

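The benefit of --TOPC-aggregated-tasks described above can be illustrated with
a toy cost model: each message pays a fixed network latency, and a message
carries a batch of events, so the latency is shared across the batch. This
model and the function name PerEventCostMs are assumptions made for
illustration; they are not measurements or code from TOP-C.

```cpp
#include <cassert>

// Toy model (assumption): average cost per event when `batch` events
// share one message, so the message latency is divided among them.
double PerEventCostMs(double computeMs, double latencyMs, int batch) {
    assert(batch > 0);
    return computeMs + latencyMs / batch;
}
```

Under this model, with 2 ms of computation per event and 1 ms of latency per
message, the cost per event is 3 ms when events are sent one at a time and
2.25 ms when four events are aggregated into one message.
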
By default, ParGeant4 uses its own subset implementation of MPI (MPINU).
ParGeant4 adds approximately 50 KB to the "footprint" of the binary
executable. By default, ParGeant4 uses "ssh" to set up remote processes.
Those who wish to use their own MPI (perhaps because a batch cluster requires
a specific MPI) may do so.
(See "Configuring a Different MPI" in the TOP-C manual [3].)

References:

[1] ParGeant4: http://www.ccs.neu.edu/home/gene/pargeant4.html

[2] TOP-C: http://www.ccs.neu.edu/home/gene/topc.html

[3] TOP-C manual: http://www.ccs.neu.edu/home/gene/topc/topc_toc.html

[4] Marshalgen: http://www.ccs.neu.edu/home/gene/marshalgen.html