1 | |
---|
2 | ParGeant4: Geant4/TOP-C, a parallelization of Geant4 |
---|
3 | (event-level parallelism) |
---|
4 | |
---|
5 | Gene Cooperman |
---|
6 | Northeastern University |
---|
7 | gene@ccs.neu.edu |
---|
8 | |
---|
9 | For the latest information on ParGeant4, see: |
---|
10 | http://www.ccs.neu.edu/home/gene/pargeant4.html |
---|
11 | Note that a version now exists that runs Geant4 over the Grid. |
---|
12 | Please write to gene@ccs.neu.edu for further information. |
---|
13 | To port other applications to a parallel version, read the |
---|
14 | files ../../info/PAR_INSTALL and ../../info/PAR_README. |
---|
15 | |
---|
16 | See the beginning of GNUmakefile for reasonable `make' targets to run it. |
---|
17 | To run it: |
---|
18 | 0. a. Follow the standard Geant4 installation procedure. |
---|
19 | b. Download and install TOP-C |
---|
20 | The TOP-C home page is at http://www.ccs.neu.edu/home/gene/topc.html |
---|
21 | cd <TOPC_INSTALL_DIR> |
---|
22 | gzip -dc topc.tar.gz | tar -xvf - |
---|
23 | cd topc |
---|
24 | ./configure |
---|
25 | make |
---|
26 | make check |
---|
27 | [ Copy bin/topc-config to your path ] |
---|
28 | c. Verify that the Geant4 example installs: |
---|
29 | cd $G4INSTALL/examples/extended/parallel/ParN04 |
---|
30 | make |
---|
31 | $G4WORKDIR/bin/$G4SYSTEM/ParN04 ParN04.in |
---|
32 | 2. make run |
---|
33 | [ By default, the included `procgroup' file creates two slave processes |
---|
34 | on localhost. ] |
---|
35 | [ Note that in addition to output on master, |
---|
36 | $G4WORKDIR/bin/$G4SYSTEM/slave*.out contains slave output. ] |
---|
37 | [ To remove intermediate files and start over: make parclean ] |
---|
38 | 3. Try running it with slave processes on remote hosts. |
---|
39 | First, test that your local environment is set up correctly. |
---|
40 | Try: |
---|
41 | ssh <REMOTE_HOSTNAME> $G4WORKDIR/bin/$G4SYSTEM/ParN04 `pwd`/ParN04.in |
---|
42 | The above command needs to work without asking for a password. |
---|
43 | [ If you use dynamic libraries (*.so), make sure the LD_LIBRARY_PATH |
---|
44 | in your shell startup file (e.g. ~/.tcshrc) includes both: |
---|
45 | $G4INSTALL/lib/$G4SYSTEM and $CLHEP_BASE_DIR/lib |
---|
46 | If you use AFS, you may need to type 'klog' to renew your AFS token. ] |
---|
47 | In `procgroup' file, replace `localhost' by desired remote hosts; |
---|
48 | Add additional remote hosts (additional slaves) if you like. |
---|
49 | Then: make run |
---|
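The procgroup edit in step 3 can be scripted when there are many hosts. The sketch below is a hypothetical helper, not part of ParGeant4 or TOP-C. It only reproduces the line pattern of this README's own example (`localhost 1 - > slave1.out'), one line per host, and assumes that any other lines in the shipped procgroup file are left as they are.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds one procgroup slave line per host, following
// the pattern of this README's example "localhost 1 - > slave1.out".
// Output files are numbered from 1 (slave1.out, slave2.out, ...).
std::string makeSlaveLines(const std::vector<std::string>& hosts) {
    std::string text;
    for (std::size_t i = 0; i < hosts.size(); ++i) {
        text += hosts[i] + " 1 - > slave" + std::to_string(i + 1) + ".out\n";
    }
    return text;
}
```

Calling makeSlaveLines with two host names yields two lines that differ only in the host and in the slaveN.out file name; the result can be pasted in place of the `localhost' slave lines.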
50 | |
---|
51 | ============================================================================ |
---|
52 | If you read ParGNUmakefile, you'll find other things that you can |
---|
53 | modify. For example, all TOP-C additions are in conditionals: |
---|
54 | remove -DG4USE_TOPC from ParGNUmakefile and: |
---|
55 | make parclean; make run |
---|
56 | in order to re-compile and rerun without TOP-C. |
---|
57 | Define REMOTE_SHELL differently if you don't use `ssh' for a remote shell. |
---|
58 | (If undefined, ParGNUmakefile defines it to be `ssh') |
---|
59 | Define MACROFILE differently to use a different set of input commands. |
---|
60 | Define MEM_MODEL=--seq |
---|
61 | to run with TOP-C, but using a single (sequential) process, suitable |
---|
62 | for easy debugging (via gdb, for example). |
---|
63 | Try: pushd $G4WORKDIR/bin/$G4SYSTEM/; ./ParN04 --TOPC-help |
---|
64 | to see TOP-C run-time options that can be invoked, such as |
---|
65 | pushd $G4WORKDIR/bin/$G4SYSTEM/; ./ParN04 --TOPC-num-slaves=5 ParN04.in |
---|
66 | Alternatively, modify TOPC_OPTIONS in ParGNUmakefile for the same effect. |
---|
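To illustrate what "all TOP-C additions are in conditionals" means in practice, here is a minimal sketch of code guarded by the G4USE_TOPC macro. DemoRunManager and DemoParRunManager are stand-ins invented for this example (the real G4RunManager and ParRunManager need Geant4 and TOP-C), and makeRunManager is a hypothetical function, not one found in ParN04.

```cpp
#include <memory>
#include <string>

// Stand-in for G4RunManager (assumption: the real class needs Geant4).
struct DemoRunManager {
    virtual ~DemoRunManager() = default;
    virtual std::string Name() const { return "G4RunManager"; }
};

// Stand-in for ParRunManager.
struct DemoParRunManager : DemoRunManager {
    std::string Name() const override { return "ParRunManager"; }
};

// Compiled with -DG4USE_TOPC: the parallel manager is created.
// Compiled without it: the TOP-C branch is not compiled at all and the
// plain manager is created, so the program builds and runs without TOP-C.
std::unique_ptr<DemoRunManager> makeRunManager() {
#ifdef G4USE_TOPC
    return std::make_unique<DemoParRunManager>();
#else
    return std::make_unique<DemoRunManager>();
#endif
}
```

Because the choice is made by the preprocessor, switching between the two builds requires recompiling, which is why the text above pairs the flag change with `make parclean; make run'.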
67 | |
---|
68 | You can also try other targets: make run-debug |
---|
69 | This will run it under gdb, so you can single step to see what happens. |
---|
70 | make parclean - Start over with clean set of files. |
---|
71 | |
---|
72 | ============================================================================ |
---|
73 | New or modified files: |
---|
74 | ParN04.cc - Adds one line: #include "ParN04.icc" |
---|
75 | ParExample.icc inserts: #include "topc.h" |
---|
76 | and causes main to call TOPC_init, TOPC_finalize, |
---|
77 | and to use: `new ParRunManager' instead of `new G4RunManager' |
---|
78 | GNUmakefile - Adds one line at beginning: include ParGNUmakefile |
---|
79 | ParGNUmakefile defines EXTRALIBS and CPPFLAGS so as to |
---|
80 | modify behavior of config/binmake.gmk |
---|
81 | in order to use TOP-C libraries and includes |
---|
82 | procgroup - Specifies which slave hosts to use, and where to put output |
---|
83 | For example: localhost 1 - > slave1.out |
---|
84 | host=`localhost', executable=`same as master', |
---|
85 | params of slave=`> slave1.out' (redirect output) |
---|
86 | If output not redirected, it goes to stdout on master. |
---|
87 | src/ParRunManager.cc - ParRunManager derived from G4RunManager |
---|
88 | replaces G4RunManager::DoEventLoop w/ TOP-C parallel loop, |
---|
89 | Adds certain local vars of DoEventLoop as ParRunManager members |
---|
90 | include/MarshaledObj.h - run-time utilities for marshaling |
---|
91 | include/MarshaledEx*Hit.h - marshals N04 hits (calorimeter hits) |
---|
92 | include/MarshaledG4*.h - marshaling routines for Geant4 data structures |
---|
93 | |
---|
94 | ~/slave*.out - Contains outputs of slave1, slave2, etc. |
---|
95 | Generated each time parallel ParN04 is executed. |
---|
96 | These files are specified in the file procgroup. |
---|
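For readers new to marshaling, the sketch below shows the general idea behind the Marshaled*.h files listed above: a slave flattens its hits into one contiguous byte buffer, and the master rebuilds them. It does not reproduce the actual MarshaledObj interface. DemoHit and all function names are hypothetical, and the raw byte copy assumes that master and slaves share the same architecture (no byte-order conversion).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical calorimeter hit; the real N04 hit classes hold more fields.
struct DemoHit {
    std::int32_t cellId;
    double energyDeposit;
};

// Append the raw bytes of a trivially copyable value to the buffer.
template <typename T>
void putValue(std::vector<unsigned char>& buf, const T& value) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    buf.insert(buf.end(), p, p + sizeof(T));
}

// Read a value back from the buffer and advance the read offset.
template <typename T>
T getValue(const std::vector<unsigned char>& buf, std::size_t& offset) {
    T value;
    std::memcpy(&value, buf.data() + offset, sizeof(T));
    offset += sizeof(T);
    return value;
}

// Slave side: flatten the hits into one message (count, then each field).
std::vector<unsigned char> marshalHits(const std::vector<DemoHit>& hits) {
    std::vector<unsigned char> buf;
    putValue(buf, static_cast<std::uint32_t>(hits.size()));
    for (const DemoHit& h : hits) {
        putValue(buf, h.cellId);
        putValue(buf, h.energyDeposit);
    }
    return buf;
}

// Master side: rebuild the hits from the received bytes.
std::vector<DemoHit> unmarshalHits(const std::vector<unsigned char>& buf) {
    std::size_t offset = 0;
    const std::uint32_t count = getValue<std::uint32_t>(buf, offset);
    std::vector<DemoHit> hits;
    hits.reserve(count);
    for (std::uint32_t i = 0; i < count; ++i) {
        DemoHit h;
        h.cellId = getValue<std::int32_t>(buf, offset);
        h.energyDeposit = getValue<double>(buf, offset);
        hits.push_back(h);
    }
    return hits;
}
```

Writing the fields one by one, rather than copying the whole struct, keeps compiler padding out of the message, so the buffer size is exactly the count plus the sum of the field sizes.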
97 | |
---|
98 | ==================================================================== |
---|
99 | This version passes an event number to the slave and lets the |
---|
100 | slave generate the event. The slave passes back marshaled hits to |
---|
101 | the master. |
---|
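The data flow just described (event number out, hits back) can be sketched without Geant4 or TOP-C. In the example below, threads started by std::async stand in for slave processes. simulateEvent and runEvents are hypothetical names, and seeding a generator from the event number is an assumption made here so that each event is reproducible; it is not a statement about how ParGeant4 seeds its generators.

```cpp
#include <cstdint>
#include <future>
#include <random>
#include <vector>

// Stand-in for a slave handling one task: its only input is the event
// number, which seeds a local generator (an assumption of this sketch).
std::vector<double> simulateEvent(std::uint32_t eventId) {
    std::mt19937 rng(eventId);
    std::vector<double> hits;
    const int nHits = 3;
    for (int i = 0; i < nHits; ++i) {
        // Fake energy deposits in the range [0, 100).
        hits.push_back(static_cast<double>(rng() % 1000) / 10.0);
    }
    return hits;
}

// Stand-in for the master: hands out event numbers, then collects the
// hits that come back, stored by event number.
std::vector<std::vector<double>> runEvents(std::uint32_t nEvents) {
    std::vector<std::future<std::vector<double>>> pending;
    for (std::uint32_t id = 0; id < nEvents; ++id) {
        pending.push_back(std::async(std::launch::async, simulateEvent, id));
    }
    std::vector<std::vector<double>> results;
    for (auto& f : pending) {
        results.push_back(f.get());
    }
    return results;
}
```

Since only the event number crosses from master to worker, the message going out is tiny (compare count=4 in the MPI_Send frame of the stack trace later in this file), and the bulk of the traffic is the hits coming back. On some toolchains this sketch must be built with thread support enabled (for example -pthread).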
102 | |
---|
103 | I will integrate the track level parallelism into this scenario at |
---|
104 | a later date. For the track level, I will generate several |
---|
105 | secondary tracks on the master, and then convert the secondary tracks |
---|
106 | to new events that can be passed to slaves. I will do this only if |
---|
107 | I detect that there are not enough initial events to fully occupy all |
---|
108 | the slaves. This scheme has the drawback that we are splitting an event |
---|
109 | into many events, which may make the summarization, histogram, and so |
---|
110 | on more difficult. However, track level parallelism will be triggered |
---|
111 | only when a very small number of events are generated. |
---|
112 | |
---|
113 | I also want to support postponing |
---|
114 | a track to the next event (G4ClassificationOfNewTrack::fPostpone). |
---|
115 | To do this, each slave will wait to retire an event until it knows that |
---|
116 | the previous event has been retired. |
---|
117 | |
---|
118 | In addition, I plan to have only the master read commands and pass |
---|
119 | them to the slaves. Currently, the master and slaves each read |
---|
120 | identical commands. |
---|
121 | |
---|
122 | ==================================================================== |
---|
123 | If you are curious about some of the layers, the following |
---|
124 | stack trace [somewhat out of date now] gives some idea. |
---|
125 | [ This stack trace is from a run based on ParN02.] |
---|
126 | |
---|
127 | G4RunManager::BeamOn calls ParRunManager::DoEventLoop |
---|
128 | (since G4RunManager::DoEventLoop is virtual) |
---|
129 | ParRunManager::DoEventLoop calls TOPC_master_slave |
---|
130 | TOPC_master_slave calls submit_task_input |
---|
131 | submit_task_input eventually calls COMM_send_msg which calls MPI_Send |
---|
132 | (COMM_send_msg is the communication layer of TOPC; |
---|
133 | ParN04.cc was linked with the TOP-C MPI communication layer. |
---|
134 | The same source could have been linked with a POSIX threads layer |
---|
135 | or some other communication layer. |
---|
136 | ) |
---|
137 | MPI_Send calls send |
---|
138 | (where send is the socket system call of libc.so) |
---|
139 | |
---|
140 | (gdb) where |
---|
141 | #0 0x41946c62 in send () from /lib/libc.so.6 |
---|
142 | #1 0x400839c1 in send () at wrapsyscall.c:186 |
---|
143 | #2 0x805c547 in MPI_Send (buf=0x82690fc, count=4, datatype=3, dest=2, tag=1, comm=0) at sendrecv.c:236 |
---|
144 | #3 0x805a0b5 in COMM_send_msg (msg=0x82690fc, msg_size=4, dst=2, tag=TASK_INPUT_TAG) at comm-mpi.c:224 |
---|
145 | #4 0x805774e in send_task_input (slave=2, input={data = 0x82690fc, data_size = 4}, tag=TASK_INPUT_TAG) at topc.c:560 |
---|
146 | #5 0x8057aa8 in submit_task_input (input={data = 0x82690fc, data_size = 4}) at topc.c:659 |
---|
147 | #6 0x805813c in TOPC_master_slave (generate_task_input_=0x4003d2e4 <ParRunManager::GenerateEventInput(void)>, |
---|
148 | do_task_=0x4003d350 <ParRunManager::DoEvent(int *)>, check_task_result_=0x4003d420 <ParRunManager::CheckEventResult(int *, void *)>, |
---|
149 | update_shared_data_=0) at topc.c:922 |
---|
150 | #7 0x4003d18c in ParRunManager::DoEventLoop (this=0x80c0bf0, n_event=1, macroFile=0x0, n_select=-1) at src/ParRunManager.cc:51 |
---|
151 | #8 0x400b14d1 in G4RunManager::BeamOn () from /afs/cern.ch/user/c/cooperma/scratch-pcitapi07/geant4/lib/libG4run.so |
---|
152 | #9 0x400b870a in G4RunMessenger::SetNewValue () from /afs/cern.ch/user/c/cooperma/scratch-pcitapi07/geant4/lib/libG4run.so |
---|
153 | #10 0x4167157b in G4UIcommand::DoIt () from /afs/cern.ch/user/c/cooperma/scratch-pcitapi07/geant4/lib/libG4intercoms.so |
---|
154 | #11 0x416810a3 in G4UImanager::ApplyCommand () from /afs/cern.ch/user/c/cooperma/scratch-pcitapi07/geant4/lib/libG4intercoms.so |
---|
155 | #12 0x805db7e in G4UIterminal::ExecuteCommand () at /afs/cern.ch/sw/lhcxx/specific/redhat61/3.2.0/include/CLHEP/Random/Randomize.h:64 |
---|
156 | #13 0x805d42d in G4UIterminal::SessionStart () at /afs/cern.ch/sw/lhcxx/specific/redhat61/3.2.0/include/CLHEP/Random/Randomize.h:64 |
---|
157 | #14 0x8056a3d in main (argc=1, argv=0x80bfa00) at ParN02.cc:98 |
---|
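The upper frames of the trace above (BeamOn reaching ParRunManager::DoEventLoop through a virtual call, which then hands three callbacks to TOPC_master_slave) can be mimicked in a few lines. Everything in this sketch is a stand-in: MockRunManager, MockParRunManager and mockMasterSlave are invented names, the mock loop runs in a single process, and the real TOPC_master_slave signature is not reproduced.

```cpp
#include <functional>
#include <vector>

// Stand-in for TOPC_master_slave: drives the three callbacks in one process.
// The real TOP-C routine sends each task input to a slave instead.
void mockMasterSlave(const std::function<bool(int&)>& generateInput,
                     const std::function<int(int)>& doTask,
                     const std::function<void(int, int)>& checkResult) {
    int input = 0;
    while (generateInput(input)) {
        checkResult(input, doTask(input));
    }
}

// Stand-in for G4RunManager: BeamOn calls the virtual DoEventLoop.
class MockRunManager {
public:
    virtual ~MockRunManager() = default;
    void BeamOn(int nEvents) { DoEventLoop(nEvents); }
    std::vector<int> results;

protected:
    virtual void DoEventLoop(int nEvents) {
        for (int i = 0; i < nEvents; ++i) results.push_back(i);  // plain serial loop
    }
};

// Stand-in for ParRunManager: replaces the loop with a task-based one.
class MockParRunManager : public MockRunManager {
protected:
    void DoEventLoop(int nEvents) override {
        int next = 0;
        mockMasterSlave(
            [&](int& input) {  // plays the role of GenerateEventInput
                if (next >= nEvents) return false;
                input = next++;
                return true;
            },
            [](int eventId) { return eventId * 10; },              // role of DoEvent
            [&](int, int result) { results.push_back(result); });  // role of CheckEventResult
    }
};
```

Because DoEventLoop is virtual, the unchanged BeamOn code in the base class ends up in the derived loop whenever the object is a MockParRunManager. The same mechanism is what lets `new ParRunManager' replace `new G4RunManager' in main without touching the rest of the run-manager code.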