ParGeant4: Geant4/TOP-C, a parallelization of Geant4
(event-level parallelism)

Gene Cooperman
Northeastern University
gene@ccs.neu.edu

For the latest information on ParGeant4, see:
  http://www.ccs.neu.edu/home/gene/pargeant4.html
Note that a version now exists that runs Geant4 over the Grid.
Please write to gene@ccs.neu.edu for further information.
To port other applications to a parallel version, read the
files ../../info/PAR_INSTALL and ../../info/PAR_README.

See the beginning of GNUmakefile for reasonable `make' targets to run it.
To run it:
1. a. Follow the standard Geant4 installation procedure.
   b. Download and install TOP-C.
      The TOP-C home page is at http://www.ccs.neu.edu/home/gene/topc.html
        cd <TOPC_INSTALL_DIR>
        gzip -dc topc.tar.gz | tar -xvf -
        cd topc
        ./configure
        make
        make check
        [ Copy bin/topc-config to a directory in your PATH. ]
   c. Verify that the Geant4 example builds and runs:
        cd $G4INSTALL/examples/extended/parallel/ParN04
        make
        $G4WORKDIR/bin/$G4SYSTEM/ParN04 ParN04.in
2. make run
   [ By default, the included `procgroup' file creates two slave processes
     on localhost. ]
   [ Note that in addition to the output on the master,
     $G4WORKDIR/bin/$G4SYSTEM/slave*.out contains the slave output. ]
   [ To remove intermediate files and start over: make parclean ]
3. Try running it with slave processes on remote hosts.
   First, test that your local environment is set up correctly.
   Try:
     ssh <REMOTE_HOSTNAME> $G4WORKDIR/bin/$G4SYSTEM/ParN04 `pwd`/ParN04.in
   The above command must work without asking for a password.
   [ If you use dynamic libraries (*.so), make sure that the LD_LIBRARY_PATH
     set in your shell startup file (e.g. ~/.tcshrc) includes both
     $G4INSTALL/lib/$G4SYSTEM and $CLHEP_BASE_DIR/lib.
     If you use AFS, you may need to type `klog' to renew your AFS token. ]
   In the `procgroup' file, replace `localhost' by the desired remote hosts;
   add additional remote hosts (additional slaves) if you like.
   Then: make run

============================================================================
If you read ParGNUmakefile, you'll find other things that you can
modify. For example, all TOP-C additions are inside conditionals:
remove -DG4USE_TOPC from ParGNUmakefile and run:
  make parclean; make run
to recompile and rerun without TOP-C.
Define REMOTE_SHELL differently if you don't use `ssh' as your remote shell.
(If it is undefined, ParGNUmakefile defines it to be `ssh'.)
Define MACROFILE differently to use a different set of input commands.
Define MEM_MODEL=--seq
  to run with TOP-C, but using a single (sequential) process, suitable
  for easy debugging (via gdb, for example).
Try:  pushd $G4WORKDIR/bin/$G4SYSTEM/; ./ParN04 --TOPC-help
  to see the TOP-C run-time options that can be invoked, such as
  pushd $G4WORKDIR/bin/$G4SYSTEM/; ./ParN04 --TOPC-num-slaves=5 ParN04.in
Alternatively, modify TOPC_OPTIONS in ParGNUmakefile for the same effect.

You can also try other targets:
  make run-debug - Runs it under gdb, so you can single-step to see what happens.
  make parclean  - Starts over with a clean set of files.

============================================================================
New or modified files:
ParN04.cc - Adds one line: #include "ParN04.icc"
ParExample.icc - Inserts: #include "topc.h"
        and causes main to call TOPC_init and TOPC_finalize,
        and to use `new ParRunManager' instead of `new G4RunManager'
GNUmakefile - Adds one line at the beginning: include ParGNUmakefile
ParGNUmakefile - Defines EXTRALIBS and CPPFLAGS so as to
        modify the behavior of config/binmake.gmk
        in order to use the TOP-C libraries and includes
procgroup - Specifies which slave hosts to use, and where to put their output.
        For example:  localhost 1 - > slave1.out
        means host=`localhost', executable=`same as master',
        params of slave=`> slave1.out' (redirect output).
        If the output is not redirected, it goes to stdout on the master.
src/ParRunManager.cc - ParRunManager, derived from G4RunManager,
        replaces G4RunManager::DoEventLoop with a TOP-C parallel loop
        and adds certain local variables of DoEventLoop as ParRunManager members
include/MarshaledObj.h - run-time utilities for marshaling
include/MarshaledEx*Hit.h - marshals N04 hits (calorimeter hits)
include/MarshaledG4*.h - marshaling routines for Geant4 data structures

~/slave*.out - Contains the outputs of slave1, slave2, etc.
        Generated each time parallel ParN04 is executed.
        These files are specified in the file procgroup.

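The marshaling headers listed above (include/MarshaledObj.h, include/MarshaledEx*Hit.h) exist because a slave's hits must travel to the master as a flat byte message. The following is only a minimal sketch of that idea, not the code in those headers: the CaloHit struct, its fields, and the functions marshalHits/unmarshalHits are hypothetical, and the byte layout assumes that master and slaves share the same byte order and type sizes (a homogeneous cluster).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an N04 calorimeter hit (NOT the real ExN04 class).
struct CaloHit {
  std::int32_t zCellID;
  std::int32_t phiCellID;
  double edep;  // energy deposited in the cell
};

// Append the raw bytes of a trivially copyable value to the message.
template <typename T>
void putValue(std::vector<char>& buf, const T& value) {
  const char* p = reinterpret_cast<const char*>(&value);
  buf.insert(buf.end(), p, p + sizeof(T));
}

// Read a trivially copyable value at `offset` and advance the offset.
template <typename T>
T getValue(const std::vector<char>& buf, std::size_t& offset) {
  if (offset + sizeof(T) > buf.size()) throw std::runtime_error("truncated message");
  T value;
  std::memcpy(&value, buf.data() + offset, sizeof(T));
  offset += sizeof(T);
  return value;
}

// Slave side: flatten the hits of one event into one contiguous message.
std::vector<char> marshalHits(const std::vector<CaloHit>& hits) {
  std::vector<char> buf;
  putValue(buf, static_cast<std::uint32_t>(hits.size()));  // header: number of hits
  for (const CaloHit& h : hits) {
    putValue(buf, h.zCellID);  // fields written one by one: no struct padding is sent
    putValue(buf, h.phiCellID);
    putValue(buf, h.edep);
  }
  return buf;
}

// Master side: rebuild the hit collection from a received message.
std::vector<CaloHit> unmarshalHits(const std::vector<char>& buf) {
  std::size_t offset = 0;
  const std::uint32_t n = getValue<std::uint32_t>(buf, offset);
  std::vector<CaloHit> hits;
  for (std::uint32_t i = 0; i < n; ++i) {
    CaloHit h;
    h.zCellID = getValue<std::int32_t>(buf, offset);
    h.phiCellID = getValue<std::int32_t>(buf, offset);
    h.edep = getValue<double>(buf, offset);
    hits.push_back(h);
  }
  return hits;
}
```

Writing each field separately, rather than copying the whole struct at once, keeps compiler-inserted padding out of the message. An object that holds pointers would instead need the pointed-to data copied explicitly, since a pointer is meaningless in another process.
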
====================================================================
This version passes an event number to the slave and lets the
slave generate the event. The slave passes back marshaled hits to
the master.

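The event-level scheme just described (the master hands out an event number, a slave simulates that event, and the result goes back to the master) can be sketched as a sequential emulation of a master-slave loop. The callback names GenerateEventInput, DoEvent and CheckEventResult are borrowed from the stack trace later in this file, but everything else (TaskBuf, kNoTask, masterSlaveSequential, ParRunManagerSketch, runEvents and the fake energy deposit) is hypothetical; this is neither the TOP-C API nor ParRunManager.cc. To keep it short, each event returns a single number instead of marshaled hits.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical task buffer and "no more tasks" marker; they only mimic the
// shape of the callbacks seen in the stack trace, not the real TOP-C types.
struct TaskBuf {
  void* data;
  std::size_t size;
};
const TaskBuf kNoTask = {nullptr, 0};

using GenerateFn = TaskBuf (*)();
using DoTaskFn = TaskBuf (*)(void* input);
using CheckFn = void (*)(void* input, void* output);

// Sequential emulation: generate an input on the "master", run it as the
// "slave" would, then check the result on the "master". A parallel version
// would send each input to an idle slave and collect results as they arrive.
void masterSlaveSequential(GenerateFn generate, DoTaskFn doTask, CheckFn check) {
  for (TaskBuf in = generate(); in.data != nullptr; in = generate()) {
    TaskBuf out = doTask(in.data);
    check(in.data, out.data);
  }
}

// Hypothetical run manager. Its state is static because plain C function
// pointers carry no `this` (compare "local vars of DoEventLoop as members").
struct ParRunManagerSketch {
  static int nEvents;        // number of events requested
  static int nextEvent;      // next event number to hand out (master)
  static double totalEdep;   // run summary accumulated on the master
  static int inputSlot;      // message "sent" to the slave
  static double outputSlot;  // reply "sent" back to the master

  // Master: hand out the next event number, or signal that we are done.
  static TaskBuf GenerateEventInput() {
    if (nextEvent >= nEvents) return kNoTask;
    inputSlot = nextEvent++;
    return {&inputSlot, sizeof inputSlot};
  }
  // Slave: generate and simulate event *input (here: a fake energy deposit).
  static TaskBuf DoEvent(void* input) {
    const int eventID = *static_cast<int*>(input);
    outputSlot = 0.5 * (eventID + 1);
    return {&outputSlot, sizeof outputSlot};
  }
  // Master: fold the slave's result into the run summary.
  static void CheckEventResult(void* /*input*/, void* output) {
    totalEdep += *static_cast<double*>(output);
  }
};
int ParRunManagerSketch::nEvents = 0;
int ParRunManagerSketch::nextEvent = 0;
double ParRunManagerSketch::totalEdep = 0.0;
int ParRunManagerSketch::inputSlot = 0;
double ParRunManagerSketch::outputSlot = 0.0;

// Roughly the role of an overriding DoEventLoop: reset state, run the loop.
double runEvents(int n) {
  ParRunManagerSketch::nEvents = n;
  ParRunManagerSketch::nextEvent = 0;
  ParRunManagerSketch::totalEdep = 0.0;
  masterSlaveSequential(&ParRunManagerSketch::GenerateEventInput,
                        &ParRunManagerSketch::DoEvent,
                        &ParRunManagerSketch::CheckEventResult);
  return ParRunManagerSketch::totalEdep;
}
```

The callbacks are static member functions because a C-style interface takes plain function pointers; that is one plausible reason why state that was local to DoEventLoop has to live in class members, as noted for src/ParRunManager.cc.
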
I will integrate track-level parallelism into this scenario at
a later date. For the track level, I will generate several
secondary tracks on the master, and then convert the secondary tracks
into new events that can be passed to slaves. I will do this only if
I detect that there are not enough initial events to fully occupy all
the slaves. This scheme has the drawback that we are splitting an event
into many events, which may make the summarization, histograms, and so
on more difficult. However, track-level parallelism will be triggered
only when a very small number of events is generated.

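The planned track-level fallback can be pictured as a task planner. This is a hypothetical illustration of the scheme in the paragraph above, not code from ParGeant4: Task, planTasks and mergeByEvent are invented names, and secondariesPerEvent stands for the number of secondary tracks the master is assumed to have already produced for each event. Keeping the parent event number in every task is one way to soften the summarization drawback, since per-event results can be regrouped on the master.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// A unit of work for a slave. trackIndex < 0 means "simulate the whole event";
// otherwise "simulate only secondary number trackIndex of parentEvent".
struct Task {
  int parentEvent;
  int trackIndex;
};

// Hypothetical planner: event-level tasks when there are at least as many
// events as slaves; otherwise one task per secondary track. Assumes the master
// has already produced secondariesPerEvent[e] secondaries for event e.
std::vector<Task> planTasks(int numSlaves, const std::vector<int>& secondariesPerEvent) {
  const int numEvents = static_cast<int>(secondariesPerEvent.size());
  std::vector<Task> tasks;
  if (numEvents >= numSlaves) {
    for (int e = 0; e < numEvents; ++e) tasks.push_back({e, -1});
    return tasks;
  }
  for (int e = 0; e < numEvents; ++e)
    for (int t = 0; t < secondariesPerEvent[e]; ++t) tasks.push_back({e, t});
  return tasks;
}

// Master side: regroup per-task results by parent event, so per-event
// summaries (e.g. histogram entries) are still possible after the split.
std::map<int, double> mergeByEvent(const std::vector<Task>& tasks,
                                   const std::vector<double>& results) {
  assert(tasks.size() == results.size());
  std::map<int, double> perEvent;
  for (std::size_t i = 0; i < tasks.size(); ++i) perEvent[tasks[i].parentEvent] += results[i];
  return perEvent;
}
```
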
I also want to support postponing a track to the next event
(G4ClassificationOfNewTrack::fPostpone). To do this, each slave will
wait to retire an event until it knows that the previous event has
been retired.

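The ordering rule for postponed tracks (retire event k only after event k-1 has been retired) amounts to a small reorder buffer. A minimal sketch, assuming that events are numbered consecutively from 0 and that finished() is called once per event when its simulation completes; the class RetirementOrder is hypothetical and tracks only the order, not the transfer of the postponed tracks themselves. It keeps the bookkeeping in one place for simplicity, whereas the plan above has each slave wait until it learns that the previous event has been retired.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical bookkeeping for in-order retirement: an event may be retired
// only after every earlier event has been retired.
class RetirementOrder {
 public:
  // Called when event `eventID` has finished simulating (in any order).
  // Returns the events that can now be retired, in increasing order.
  std::vector<int> finished(int eventID) {
    waiting_.insert(eventID);
    std::vector<int> retired;
    while (!waiting_.empty() && *waiting_.begin() == nextToRetire_) {
      waiting_.erase(waiting_.begin());
      retired.push_back(nextToRetire_++);
    }
    return retired;
  }
  int nextToRetire() const { return nextToRetire_; }

 private:
  std::set<int> waiting_;  // finished, but an earlier event is still open
  int nextToRetire_ = 0;   // all events below this number are retired
};
```
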
In addition, I plan to have only the master read commands and pass
them to the slaves. Currently, the master and the slaves each read
identical commands.

====================================================================
If you are curious about some of the layers, the following
stack trace [somewhat out of date now] gives some idea.
[ This stack trace is from a run based on ParN02. ]

G4RunManager::BeamOn calls ParRunManager::DoEventLoop
  (since G4RunManager::DoEventLoop is virtual)
 ParRunManager::DoEventLoop calls TOPC_master_slave
  TOPC_master_slave calls submit_task_input
   submit_task_input eventually calls COMM_send_msg, which calls MPI_Send
     (COMM_send_msg is the communication layer of TOP-C;
      ParN04.cc was linked with the TOP-C MPI communication layer.
      The same source could have been linked with a POSIX threads layer,
      a sequential layer, or some other communication layer.)
    MPI_Send calls send
     (where send is the socket system call of libc.so)

(gdb) where
#0 0x41946c62 in send () from /lib/libc.so.6
#1 0x400839c1 in send () at wrapsyscall.c:186
#2 0x805c547 in MPI_Send (buf=0x82690fc, count=4, datatype=3, dest=2, tag=1, comm=0) at sendrecv.c:236
#3 0x805a0b5 in COMM_send_msg (msg=0x82690fc, msg_size=4, dst=2, tag=TASK_INPUT_TAG) at comm-mpi.c:224
#4 0x805774e in send_task_input (slave=2, input={data = 0x82690fc, data_size = 4}, tag=TASK_INPUT_TAG) at topc.c:560
#5 0x8057aa8 in submit_task_input (input={data = 0x82690fc, data_size = 4}) at topc.c:659
#6 0x805813c in TOPC_master_slave (generate_task_input_=0x4003d2e4 <ParRunManager::GenerateEventInput(void)>,
    do_task_=0x4003d350 <ParRunManager::DoEvent(int *)>, check_task_result_=0x4003d420 <ParRunManager::CheckEventResult(int *, void *)>,
    update_shared_data_=0) at topc.c:922
#7 0x4003d18c in ParRunManager::DoEventLoop (this=0x80c0bf0, n_event=1, macroFile=0x0, n_select=-1) at src/ParRunManager.cc:51
#8 0x400b14d1 in G4RunManager::BeamOn () from /afs/cern.ch/user/c/cooperma/scratch-pcitapi07/geant4/lib/libG4run.so
#9 0x400b870a in G4RunMessenger::SetNewValue () from /afs/cern.ch/user/c/cooperma/scratch-pcitapi07/geant4/lib/libG4run.so
#10 0x4167157b in G4UIcommand::DoIt () from /afs/cern.ch/user/c/cooperma/scratch-pcitapi07/geant4/lib/libG4intercoms.so
#11 0x416810a3 in G4UImanager::ApplyCommand () from /afs/cern.ch/user/c/cooperma/scratch-pcitapi07/geant4/lib/libG4intercoms.so
#12 0x805db7e in G4UIterminal::ExecuteCommand () at /afs/cern.ch/sw/lhcxx/specific/redhat61/3.2.0/include/CLHEP/Random/Randomize.h:64
#13 0x805d42d in G4UIterminal::SessionStart () at /afs/cern.ch/sw/lhcxx/specific/redhat61/3.2.0/include/CLHEP/Random/Randomize.h:64
#14 0x8056a3d in main (argc=1, argv=0x80bfa00) at ParN02.cc:98
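
The first link in the call chain above (G4RunManager::BeamOn reaching ParRunManager::DoEventLoop only because DoEventLoop is virtual), together with the change in main to use `new ParRunManager' instead of `new G4RunManager', can be illustrated with stand-in classes. RunManagerStub, ParRunManagerStub and makeRunManager are hypothetical and far simpler than the Geant4 classes; the derived loop only records that it was reached instead of calling TOPC_master_slave.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for G4RunManager and ParRunManager (not Geant4 code).
class RunManagerStub {
 public:
  virtual ~RunManagerStub() = default;
  // Like BeamOn: common work first, then the virtual event loop.
  void BeamOn(int nEvents) {
    log.push_back("BeamOn");
    DoEventLoop(nEvents);  // dispatched to the most-derived override
  }
  std::vector<std::string> log;  // records which code paths ran

 protected:
  virtual void DoEventLoop(int nEvents) {
    for (int i = 0; i < nEvents; ++i) log.push_back("sequential event " + std::to_string(i));
  }
};

class ParRunManagerStub : public RunManagerStub {
 protected:
  // Where a parallel master-slave loop would start (TOPC_master_slave in
  // the trace); here it only records that the override was reached.
  void DoEventLoop(int nEvents) override {
    log.push_back("parallel loop over " + std::to_string(nEvents) + " events");
  }
};

// Mirrors the change in main: pick the derived class when constructing.
std::unique_ptr<RunManagerStub> makeRunManager(bool parallel) {
  if (parallel) return std::make_unique<ParRunManagerStub>();
  return std::make_unique<RunManagerStub>();
}
```

Because BeamOn itself is unchanged, the only edit the application needs is which class it constructs; every other caller of BeamOn picks up the parallel loop automatically.
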