| 1 | -------------------------------------------------------------------------------------
 | 
|---|
| 2 |  Performance comparison of different machines for compilation / execution (computation)
 | 
|---|
| 3 |                                    -----------------------
 | 
|---|
| 4 |     Measurements made in January 2007,       R. Ansari / C. Magneville
 | 
|---|
| 5 | -------------------------------------------------------------------------------------
 | 
|---|
| 6 | 
 | 
|---|
| 7 | (a) eros3 : dual-processor, dual-core Xeon @ 2.4 GHz, Linux (xeon-lx-2.4GHz), gcc 3.2
 | 
|---|
| 8 | (b) ccali : dual-processor, dual-core Xeon @ 2.8 GHz, Linux (xeon-lx-2.8GHz), icc 8.0 or 9.0
 | 
|---|
| 9 |      Compilation flags: [-O -g]      
 | 
|---|
| 10 | (bb) grid49 : dual-processor, dual-core new Xeon 5140 @ 2.33 GHz
 | 
|---|
| 11 | (bb4) grid50 : dual-processor, quad-core new Xeon E5345 @ 2.33 GHz --> 8 cores
 | 
|---|
| 12 | (c) sgsda : dual-processor AMD Opteron 248 @ 2.2 GHz (amd-lx-
 | 
|---|
| 13 |      Compilation flags: [-O -g]      
 | 
|---|
| 14 | (cc) grid-saclay : dual-processor, dual-core AMD Opteron 275 @ 2.2 GHz (amd275-lx)
 | 
|---|
| 15 |      Compilation flags: [-O -g]      
 | 
|---|
| 16 | 
 | 
|---|
| 17 | (d) asc : dual-processor Alpha (@ ~1 GHz) DS20 server, OSF (osf1), cxx 6.5 (osf-asc)
 | 
|---|
| 18 | (the new asc, at 420 MFLOPS, is less powerful than the old asc at 800 MFLOPS)
 | 
|---|
| 19 | (e) xp1000-dapnia : Alpha XP1000 @ ~600 MHz ? OSF1, cxx ? (osf-xp1000)
 | 
|---|
| 20 | (e') cool : Alpha XP1000 @ ~667 MHz ? OSF1 5.1, cxx 6.3 
 | 
|---|
| 21 | (f) superosf-dapnia : multiprocessor AlphaServer ES80, 6 EV7 processors @ 1 GHz (super-osf)
 | 
|---|
| 22 | 
 | 
|---|
| 23 | (g) ccsvx01 : dual-processor XServe G5 @ ~1.8-2 GHz (Darwin/OSX) (G5-osx-2GHz), gcc 3.3
 | 
|---|
| 24 | (h) PowerBook-Reza : Apple G4 @ 1.25 GHz (G4-osx-1.25GHz), gcc 3.3
 | 
|---|
| 25 | (i) MacBook-Reza : Apple / dual-core Intel Core @ 1.83 GHz (core-osx-1.83GHz), gcc 4
 | 
|---|
| 26 | (j) MacPro-Grosdidier : Apple / 2 dual-core Xeon @ 3 GHz, gcc 4.0.1, SOPHYA compiled with -O2 -g
 | 
|---|
| 27 | 
 | 
|---|
| 28 | (p) IBM-AIX regatta : xlC, IBM eServer pSeries 655, 8 Power4 processors @ 1.1 GHz
 | 
|---|
| 29 | (q) IBM-AIX meso : AIX 5.3, xlC V8, IBM Power5, 8 dual-core P575 processors @ 1.9 GHz 
 | 
|---|
| 30 | 
 | 
|---|
| 31 | (s) SGI-IRIX64 magique, CC 
 | 
|---|
| 32 | 
 | 
|---|
| 33 | NOTES : 
 | 
|---|
| 34 | - On the Xeon machines, processes / threads interact with respect to 
 | 
|---|
| 35 | CPU occupancy. Multi-thread / multi-task performance drops by a factor of 3.
 | 
|---|
| 36 | The MacPro machine running OSX nevertheless copes better.
 | 
|---|
| 37 | - An effect of the operating system or of the motherboard ??? 
 | 
|---|
| 38 | 
 | 
|---|
| 39 | Compilation flags 
 | 
|---|
| 40 | - Default compilation flags: [-O -g] in general
 | 
|---|
| 41 | - On eros3 (xeon-linux gcc 3.3): [-O -g] OR [-O3 -g]
 | 
|---|
| 42 | - On Darwin: [-g] or [-O2 -g] (or [-tune G5] on the XServe G5)
 | 
|---|
| 43 |    On the Macs (G4/G5 in particular), there is a large difference between -g and -Ox -g
 | 
|---|
| 44 |    but little difference between -O, -O2 and -O3  
 | 
|---|
| 45 | - On the aix-meso machine: [-O -g] or [-O3 -g]
 | 
|---|
| 46 | 
 | 
|---|
| 47 | X/ Raw cpupower performance and SPEC data (http://www.spec.org) 
 | 
|---|
| 48 | ----------------------------------------------------------------------
 | 
|---|
| 49 | 
 | 
|---|
| 50 | (1) MFLOPS  -> cpupower 2   (x/y : -O -g / -O3) 
 | 
|---|
| 51 | SPECint2000 (3) / SPECfp2000 (2) (http://www.spec.org) 
 | 
|---|
| 52 | 
 | 
|---|
| 53 | X.1/ Double-precision computing performance
 | 
|---|
| 54 | csh> cpupower 0 3000000  5 
 | 
|---|
| 55 |      3 10^6 double operations - on memory of 3x3 10^6 doubles (~50 MB)
 | 
|---|
| 56 |       ===> ~ 24 MB/s per MFLOPS
 | 
|---|
| 57 | csh> cpupower 2
 | 
|---|
| 58 |      1.6 10^9 double operations - on 3x20000 doubles (~0.5 MB)
 | 
|---|
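
The "~ 24 MB/s per MFLOPS" relation can be checked with a few lines of Python. This is only a sketch: it assumes (the notes do not say so explicitly) that each double operation in `cpupower 0` touches three 8-byte doubles, two operands read and one result written. The function names are illustrative and are not part of cpupower.

```python
# Sketch: memory traffic implied by a given MFLOPS rate for "cpupower 0".
# Assumption (not stated in the notes): one operation touches three
# 8-byte doubles (two reads, one write), i.e. 24 bytes per operation.

BYTES_PER_DOUBLE = 8
DOUBLES_PER_OP = 3  # assumed access pattern: a = b op c


def bytes_per_operation() -> int:
    """Bytes moved per double operation under the assumption above."""
    return BYTES_PER_DOUBLE * DOUBLES_PER_OP


def memory_throughput_mb_s(mflops: float) -> float:
    """Memory throughput in MB/s for a rate given in MFLOPS."""
    return mflops * bytes_per_operation()


if __name__ == "__main__":
    # MFLOPS values taken from column (2) of the X.1 table.
    for mflops in (85, 125, 32):
        print(mflops, "MFLOPS ->", memory_throughput_mb_s(mflops), "MB/s")
```

With this assumption, 85 MFLOPS gives 2040 MB/s and 125 MFLOPS gives 3000 MB/s, which matches column (1) of the X.1 table for machines (b) and (bb).
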
| 59 | 
 | 
|---|
| 60 | 
 | 
|---|
| 61 | Compiled with -O  (optimization)
 | 
|---|
| 62 |   (1) cpupower 0 : memory throughput in MB/s
 | 
|---|
| 63 |   (2) cpupower 0  , MFLOPS   
 | 
|---|
| 64 |   (5) cpupower 2 ,  MFLOPS 
 | 
|---|
| 65 | 
 | 
|---|
| 66 | Compiled with -g (debug / no optimization)
 | 
|---|
| 67 |   (3) cpupower 0  , MFLOPS  
 | 
|---|
| 68 |   (6) cpupower 2  , MFLOPS 
 | 
|---|
| 69 | 
 | 
|---|
| 70 | Compiled with -O3 or -fast ... (aggressive optimization) 
 | 
|---|
| 71 |   (4) cpupower 0  , MFLOPS  
 | 
|---|
| 72 |   (7) cpupower 2  , MFLOPS 
 | 
|---|
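
For reference, an MFLOPS figure is simply an operation count divided by a run time. The sketch below uses that standard definition; how cpupower itself counts operations is not described in these notes, so the operation count passed in is an assumption taken from the "1.6 10^9" figure quoted for `cpupower 2`.

```python
# Sketch: convert between operation count, run time and MFLOPS.
# The helper names are hypothetical; they are not cpupower functions.


def mflops(n_operations: float, seconds: float) -> float:
    """Millions of floating-point operations per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return n_operations / seconds / 1e6


def run_time_s(n_operations: float, rate_mflops: float) -> float:
    """Run time in seconds needed for n_operations at a given MFLOPS rate."""
    if rate_mflops <= 0:
        raise ValueError("rate_mflops must be positive")
    return n_operations / (rate_mflops * 1e6)


if __name__ == "__main__":
    n_ops = 1.6e9  # operation count quoted for "cpupower 2"
    print(mflops(n_ops, 4.0), "MFLOPS for a 4 s run")
    print(run_time_s(n_ops, 914), "s at 914 MFLOPS")
```
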
| 73 | 
 | 
|---|
| 74 | 
 | 
|---|
| 75 | ----------------------------------------------------------------------------------------------
 | 
|---|
| 76 |         MFLOPS       |(1) MB/s|   (2)     (3)      (4)   |    (5)       (6)       (7)      
 | 
|---|
| 77 | ----------------------------------------------------------------------------------------------
 | 
|---|
| 78 | (a)xeon-lx-2.4GHz    | 1290   |   53      53       55    |    338       340       320
 | 
|---|
| 79 | (b)xeon-lx-2.8GHzicc | 2040   |   85      80       83    |    914       409       914
 | 
|---|
| 80 | (bb)n-xeon-grid49    | 3000   |   125     85       130   |    660       500       660
 | 
|---|
| 81 | (bb4)n-xeon4c-grid50 | 2568   |   107     103      109   |    655       500       655
 | 
|---|
| 82 | (c)amd-lx            | 1560   |   65      77       68    |    666       314       686 
 | 
|---|
| 83 | (cc)amd2-lx          |        |
 | 
|---|
| 84 | 
 | 
|---|
| 85 | (e')osf-cool         |  768   |   32      15       32    |    630       150       660     
 | 
|---|
| 86 | (f)superosf               
 | 
|---|
| 87 | 
 | 
|---|
| 88 | (g)G5-osx-2GHz       | 2100   |   88      68       88    |   1000       255      1073
 | 
|---|
| 89 | (h)G4-osx-1.25GHz    |  600   |   25      16       25    |    417        93       430 
 | 
|---|
| 90 | (i)core-osx-1.83GHz  | 2500   |  107      75      107    |    855       309       884
 | 
|---|
| 91 | (j)xeon-osx            
 | 
|---|
| 92 | 
 | 
|---|
| 93 | (p)ibm-aix-regatta   | 3100   |  130      55      133    |    730       115      1750  (32 bits)
 | 
|---|
| 94 | (q)ibm-aix-meso      | 5700   |  240      70      320    |   1500       220      3400  (32 bits)
 | 
|---|
| 95 | 
 | 
|---|
| 96 | (s)sgi-magique       |  336   |  14       7       15     |    340        40       460  (32 bits)  
 | 
|---|
| 97 | ----------------------------------------------------------------------------------------------
 | 
|---|
| 98 | 
 | 
|---|
| 99 | X.2/  Comparison of int, float and double performance 
 | 
|---|
| 100 |   cpupower compiled with -O 
 | 
|---|
| 101 | 
 | 
|---|
| 102 | (1) float , cpupowerF 0 3000000 5 / cpupowerF 2
 | 
|---|
| 103 |     -> MFLOPS (computing power on float)
 | 
|---|
| 104 | (2) double, cpupowerD 0 3000000 5 / cpupowerD 2  (same as table X.1)
 | 
|---|
| 105 |     -> MDBLOPS (computing power on double)
 | 
|---|
| 106 | (3) int, cpupowerI 0 3000000 5 / cpupowerI 2
 | 
|---|
| 107 |     -> MINTOPS  (computing power on int = 4 bytes)
 | 
|---|
| 108 | (4) long (or long long (*)) cpupowerL 0 3000000 5 / cpupowerL 2
 | 
|---|
| 109 |     -> MLONOPS  (computing power on long = 8 bytes)
 | 
|---|
| 110 | ----------------------------------------------------------------------------------------------
 | 
|---|
| 111 |         MFLOPS       |   (1)MFLOPS       (2)MDBLOPS       (3)MINTOPS       (4)MLONOPS 
 | 
|---|
| 112 | ----------------------------------------------------------------------------------------------
 | 
|---|
| 113 | (a)xeon-lx-2.4GHz    | 
 | 
|---|
| 114 | (b)xeon-lx-2.8GHzicc |    166/905         90/900           166/1500         88/522    (*)
 | 
|---|
| 115 | (bb)nxeon-grid49     |    250/1030        125/660          250/2500         125/2280
 | 
|---|
| 116 | (bb4)xeon-4c-grid50  |    207/1019        110/660          207/2460         107/2285
 | 
|---|
| 117 | (c)amd-lx            |    125/695         65/675           125/1570         65/1045
 | 
|---|
| 118 | (cc)amd2-lx          | 
 | 
|---|
| 119 | 
 | 
|---|
| 120 | (e')osf-cool         |    60/635          32/631            62/640          31/630
 | 
|---|
| 121 | (f)superosf               
 | 
|---|
| 122 | 
 | 
|---|
| 123 | (g)G5-osx-2GHz       |    180/1260        90/1150           165/940         81/280    (*)
 | 
|---|
| 124 | (h)G4-osx-1.25GHz    |    45/430          25/410            45/710          24/190    (*)
 | 
|---|
| 125 | (i)core-osx-1.83GHz  |    185/919         105/855           187/935         62/246    (*)
 | 
|---|
| 126 | (j)xeon-osx            
 | 
|---|
| 127 | 
 | 
|---|
| 128 | (p)ibm-aix-regatta   | 
 | 
|---|
| 129 | (q)ibm-aix-meso      |    250/1150        250/1500          250/1200        50/200     (32 bits)
 | 
|---|
| 130 |                      |    280/1500        250/1600          250/1100        210/1000   (64 bits -q64)
 | 
|---|
| 131 | 
 | 
|---|
| 132 | (s)sgi-magique       |  
 | 
|---|
| 133 | ----------------------------------------------------------------------------------------------
 | 
|---|
| 134 | 
 | 
|---|
| 135 | X.3/  Comparison with SPEC 
 | 
|---|
| 136 | csh>  cpupower 0 / cpupower 2 
 | 
|---|
| 137 | ----------------------------------------------------------------------
 | 
|---|
| 138 |                          MFLOPS(1)      SPECfp      SPECint 
 | 
|---|
| 139 | ----------------------------------------------------------------------
 | 
|---|
| 140 | (b)xeon-lx-2.8GHz         166/900        1400        1400
 | 
|---|
| 141 | (c)amd-lx                 125/690        1600        1300
 | 
|---|
| 142 | (cc)amd2-lx               675            1600        1300
 | 
|---|
| 143 | 
 | 
|---|
| 144 | (e)osf-xp1000             32/650         500         400
 | 
|---|
| 145 | (f)superosf               842            1100        700
 | 
|---|
| 146 | 
 | 
|---|
| 147 | (i)core-osx-1.83GHz       110/850        1400        1500    
 | 
|---|
| 148 | (j)xeon-osx               2600           2900          -
 | 
|---|
| 149 | 
 | 
|---|
| 150 | (p)ibm-aix-regatta        130/700        1050        700     
 | 
|---|
| 151 | ----------------------------------------------------------------------
 | 
|---|
| 152 | 
 | 
|---|
| 153 | 
 | 
|---|
| 154 | A/ Compiling all of SOPHYA 
 | 
|---|
| 155 | ----------------------------
 | 
|---|
| 156 | csh> time make all   (1)
 | 
|---|
| 157 | or 
 | 
|---|
| 158 | csh> time make -j 2 all  (2)
 | 
|---|
| 159 |   CPU time (TCPU)
 | 
|---|
| 160 |   Performance index 100*(1000/TCPU) 
 | 
|---|
| 161 |   Elapsed (wall-clock) time
 | 
|---|
| 162 |   TCPU / elapsed time (%)
 | 
|---|
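
The two derived columns of the compilation table can be recomputed from the raw `time` output. The sketch below applies the index formula quoted above, 100*(1000/TCPU), and the CPU-to-elapsed ratio; the function names are illustrative only. The table values appear to be sometimes truncated and sometimes rounded, so the sketch makes no claim about which convention was used.

```python
# Sketch: derived quantities of the SOPHYA compilation table.
# Function names are hypothetical helpers, not part of any tool.


def performance_index(tcpu_s: float) -> float:
    """Performance index 100 * (1000 / TCPU), with TCPU in seconds."""
    return 100.0 * (1000.0 / tcpu_s)


def cpu_over_elapsed_percent(tcpu_s: float, elapsed_s: float) -> float:
    """CPU time as a percentage of elapsed time (>100% means several CPUs busy)."""
    return 100.0 * tcpu_s / elapsed_s


if __name__ == "__main__":
    # Row (a) xeon-lx-2.4GHz with "make -j 2": 615 s CPU, 410 s elapsed.
    print(performance_index(615))              # about 162.6
    print(cpu_over_elapsed_percent(615, 410))  # 150.0
```
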
| 163 | 
 | 
|---|
| 164 | 
 | 
|---|
| 165 | ----------------------------------------------------------------------
 | 
|---|
| 166 |                          CPU(s)  IndPerf   TElapsed , TCPU/Elapsed %
 | 
|---|
| 167 | ----------------------------------------------------------------------
 | 
|---|
| 168 | (a)xeon-lx-2.4GHz (2)    615 s      162       410 s        150%  
 | 
|---|
| 169 |       with -O3 -g (2)   1300 s       77       760 s        172%  
 | 
|---|
| 170 | (b)xeon-lx-2.8GHz (2)    755 s      132       540 s        140%
 | 
|---|
| 171 | (c)amd-lx         (2)    336 s      297       175 s        192%
 | 
|---|
| 172 | 
 | 
|---|
| 173 | (d)osf-asc (1)          1920 s       52      2340 s        83%   (??)
 | 
|---|
| 174 | (e)osf-xp1000 (1)        533 s      187       660 s        80%
 | 
|---|
| 175 | (f)superosf (1)          895 s      112       910 s        98%
 | 
|---|
| 176 | 
 | 
|---|
| 177 | (g)G5-osx-2GHz (2)       453 s      221       250 s        182%
 | 
|---|
| 178 |     -tune=G5            1100 s       90
 | 
|---|
| 179 |     -g -O                740 s                380 s        195%
 | 
|---|
| 180 | (h)G4-osx-1.25GHz (1)    660 s      151       710 s        93%   [-g]
 | 
|---|
| 181 |                         1500 s                             94%   [-O2 -g]
 | 
|---|
| 182 | (i)core-osx-1.83GHz (2)  209 s      478       116 s        180%
 | 
|---|
| 183 |               -O2   (1)  367 s      272       381          96%    
 | 
|---|
| 184 | (j)xeon-osx
 | 
|---|
| 185 | 
 | 
|---|
| 186 | (p)ibm-aix
 | 
|---|
| 187 | ----------------------------------------------------------------------
 | 
|---|
| 188 | 
 | 
|---|
| 189 | Shared library sizes : 
 | 
|---|
| 190 | (a)
 | 
|---|
| 191 | (c) 33 MB   
 | 
|---|
| 192 | (f) = (e) = 57 MB 
 | 
|---|
| 193 | (g) 80 MB
 | 
|---|
| 194 | (i) 83 MB
 | 
|---|
| 195 | 
 | 
|---|
| 196 | B/ Raw computation (SOPHYA arrays) with / without threads
 | 
|---|
| 197 | --------------------------------------------------------
 | 
|---|
| 198 | B.1.a/   Vector computation 10 * V2 ~= DLO4 (V1) 
 | 
|---|
| 199 |          ~ 10 x 10 x 9. 10^6 double operations on 2 x  9 10^6 doubles    
 | 
|---|
| 200 |          900 M.Ops r_8 / ~ 1500 MB 
 | 
|---|
| 201 | 
 | 
|---|
| 202 | (1) time cpupower 0     # compiled with -O  (/ -O -g)
 | 
|---|
| 203 | (2) time zthr arrdl 1 3000   1 thread
 | 
|---|
| 204 | (3) time zthr arrdl 2 3000   2 threads
 | 
|---|
| 205 | (4) time zthr arrdl 4 3000   4 threads
 | 
|---|
| 206 | (5) time zthr arrdl 6 3000   6 threads
 | 
|---|
| 207 | (6) time zthr arrdl 8 3000   8 threads
 | 
|---|
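
The thread tables report each run as a `CPU/Elap/%` triple such as `5.3/2.9/180%`. A small parser makes it easy to cross-check the quoted percentage against CPU time divided by elapsed time. This is a sketch only: the `Timing` type and helper names are hypothetical, and the optional trailing `s` handling is there because a few entries in these notes are written like `6.3s/6.5s/99%`.

```python
# Sketch: parse "CPU/Elap/%" entries and recompute the CPU/elapsed ratio.
from typing import NamedTuple


class Timing(NamedTuple):
    cpu_s: float      # CPU time in seconds
    elapsed_s: float  # elapsed (wall-clock) time in seconds
    percent: float    # CPU/elapsed percentage as quoted in the table


def parse_timing(entry: str) -> Timing:
    """Parse an entry such as '5.3/2.9/180%' or '6.3s/6.5s/99%'."""
    cpu, elapsed, percent = entry.strip().split("/")
    return Timing(
        float(cpu.rstrip("s")),
        float(elapsed.rstrip("s")),
        float(percent.rstrip("%")),
    )


def recomputed_percent(timing: Timing) -> float:
    """CPU time as a percentage of elapsed time, from the first two fields."""
    return 100.0 * timing.cpu_s / timing.elapsed_s


if __name__ == "__main__":
    t = parse_timing("5.3/2.9/180%")
    print(t, recomputed_percent(t))
```

Small differences between the quoted and recomputed percentages are expected, since the times in the tables are rounded.
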
| 208 | 
 | 
|---|
| 209 | -----------------------------------------------------------------------------------
 | 
|---|
| 210 |                      (1)MFLOPS  (2)CPU/Elap/%   (3)CPU/Elap/%   (4)CPU/Elap/%
 | 
|---|
| 211 | -----------------------------------------------------------------------------------
 | 
|---|
| 212 | (a)xeon-lx-2.4GHz      53        
 | 
|---|
| 213 | (b)xeon-lx-2.8GHz      85       2.6/2.6/100%    5.3/2.9/180%   14.3/4.86/310% 
 | 
|---|
| 214 |                                    (5) 23/7.4/314%
 | 
|---|
| 215 | (bb)nxeon-grid49       125      2/2/99%         4/2/186%       8/2.45/326%
 | 
|---|
| 216 |                                    (5) 12/3.6/330%   (6) 16/4.7/340%
 | 
|---|
| 217 | (bb4)xeon-4c-grid50    110      2.2/2.2/99%     4.2/2.3/185%   8/2.5/321%
 | 
|---|
| 218 |                                    (5) 13/3.1/470%   (6) 21/3.8/544%
 | 
|---|
| 219 | (c)amd-lx              95        
 | 
|---|
| 220 |                                  
 | 
|---|
| 221 |                                   
 | 
|---|
| 222 | (e')osf-cool           32       5.7/5.8/98%     11.1/11.3/98%   22.3/22.5/98% 
 | 
|---|
| 223 | (f)superosf                    
 | 
|---|
| 224 | 
 | 
|---|
| 225 | (g)G5-osx-2GHz         88       2.5/2.6/99%     5.9/3.38/184%    11/6.45/173%    [-O2 -g]
 | 
|---|
| 226 | (h)G4-osx-1.25GHz      25       6.6/7/95%       13.4/13.8/97%                    [-O2 -g]
 | 
|---|
| 227 | (i)core-osx-1.83GHz   107       2.1/2.1/98%     4.3/2.9/150%     8.3/30/31%      [-O2 -g]
 | 
|---|
| 228 | (j)xeon-osx           
 | 
|---|
| 229 | 
 | 
|---|
| 230 | (p)ibm-aix-regatta    130        
 | 
|---|
| 231 | (q)ibm-aix-meso       150       0.7/1/70%       1.2/2./60%       3.2/2/150%   [-O3]
 | 
|---|
| 232 |                                   (5) 5.4/3/180%    (6) 6.4/3/210% 
 | 
|---|
| 233 | 
 | 
|---|
| 234 | (s)sgi-magique          7       78/78/99%       167/95/175%      339/96/352%     [-O -g: NON-OPT]
 | 
|---|
| 235 |                        14       16.4/16.5/99%   33.8/22.4/150%   79/32/250%      [-O -g2 OPT]
 | 
|---|
| 236 |  -----------------------------------------------------------------------------------
 | 
|---|
| 237 | 
 | 
|---|
| 238 | B.1.b/   Vector computation V2 = Sin(V1) + Cos(V1) 
 | 
|---|
| 239 |          ~ 50 x 9. 10^6 double operations on 2 x  9 10^6 doubles, memory ~ 150 MB 
 | 
|---|
| 240 |          ~500 M.Ops r_8 / ~ 600 MB I/O
 | 
|---|
| 241 | 
 | 
|---|
| 242 | (1) time cpupower 0     # compiled with -O  (/ -O -g)
 | 
|---|
| 243 | (2) time zthr arrmf 1 3000   1 thread
 | 
|---|
| 244 | (3) time zthr arrmf 2 3000   2 threads
 | 
|---|
| 245 | (4) time zthr arrmf 4 3000   4 threads
 | 
|---|
| 246 | (5) time zthr arrmf 6 3000   6 threads
 | 
|---|
| 247 | (6) time zthr arrmf 8 3000   8 threads
 | 
|---|
| 248 | 
 | 
|---|
| 249 | -----------------------------------------------------------------------------------
 | 
|---|
| 250 |                      (1)MFLOPS  (2)CPU/Elap/%   (3)CPU/Elap/%   (4)CPU/Elap/%
 | 
|---|
| 251 | -----------------------------------------------------------------------------------
 | 
|---|
| 252 | (a)xeon-lx-2.4GHz      53        
 | 
|---|
| 253 | (b)xeon-lx-2.8GHz      85       1.7/1.7/100%    3.5/2.1/173%     9.8/3.6/275%
 | 
|---|
| 254 |                                    (5) 12/3.6/330%    (6) 16/4.7/340%
 | 
|---|
| 255 | (bb)nxeon-grid49       125      1.6/1.6/100%    3.2/1.7/183%     6.7/2.1/314% 
 | 
|---|
| 256 |                                    (5) 10.1/3.2/320%  (6) 14.4/4.05/330%
 | 
|---|
| 257 | (c)amd-lx              95        
 | 
|---|
| 258 |                                  
 | 
|---|
| 259 | (e')osf-cool           32       4.2/4.3/98%     8.2/8.4/98%      16.1/16.2/98%
 | 
|---|
| 260 | (f)superosf                    
 | 
|---|
| 261 | 
 | 
|---|
| 262 | (g)G5-osx-2GHz         88       2.3/2.3/100%      5/3/165%        9.6/5.8/167%  [-O2 -g]
 | 
|---|
| 263 | (h)G4-osx-1.25GHz      25       4.5/4.8/95%       10.9/14.6/72%        [-O2 -g]
 | 
|---|
| 264 | (i)core-osx-1.83GHz   107       2.3/2.3/98%       4.8/3.1/158%                [-O2 -g]
 | 
|---|
| 265 | (j)xeon-osx           
 | 
|---|
| 266 | 
 | 
|---|
| 267 | (p)ibm-aix-regatta    130        
 | 
|---|
| 268 | (q)ibm-aix-meso       150       1./2/50%         2.8/3/86%       5.4/4/130%   [-O3]
 | 
|---|
| 269 |                                      (5) 10/4/250%    (6) 11.2/5/220% 
 | 
|---|
| 270 | 
 | 
|---|
| 271 | (s)sgi-magique         7       11.5/11.7/99%     24/17/140%      51.5/18.4/280%  [-O -g NON-OPT]
 | 
|---|
| 272 |                       14       6.5/6.6/99%       13.3/12/110%    34.5/17.3/200%  [-O -g3 OPT]
 | 
|---|
| 273 |  -----------------------------------------------------------------------------------
 | 
|---|
| 274 | 
 | 
|---|
| 275 | 
 | 
|---|
| 276 | B.1.c/ Corrected version of zthr.cc (after 23/05/07) 
 | 
|---|
| 277 |          arr = (c1*a1) + (c2*a2) 
 | 
|---|
| 278 |          ~ 3 x 4. 10^6 int_4 operations on 3 x 4 10^6 int_4    
 | 
|---|
| 279 |          12 M.Ops int_4 / ~ 50 MB 
 | 
|---|
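
The operation and memory figures for `arr = (c1*a1) + (c2*a2)` follow from simple counting. The sketch below assumes three operations per element (two multiplications and one addition) and three arrays of 4-byte integers (`a1`, `a2` and `arr`); this reading is an interpretation of the figures above, not something taken from zthr.cc.

```python
# Sketch: operation count and memory footprint for arr = (c1*a1) + (c2*a2).
# Assumed: 3 operations per element and 3 arrays of 4-byte integers.
from typing import Tuple

INT4_BYTES = 4


def workload(
    n_elements: int,
    n_arrays: int = 3,
    ops_per_element: int = 3,
    bytes_per_element: int = INT4_BYTES,
) -> Tuple[int, int]:
    """Return (number of operations, bytes of array storage)."""
    n_ops = ops_per_element * n_elements
    n_bytes = n_arrays * n_elements * bytes_per_element
    return n_ops, n_bytes


if __name__ == "__main__":
    n_ops, n_bytes = workload(4_000_000)
    print(n_ops / 1e6, "M operations,", n_bytes / 1e6, "MB")
```

For 4 million elements this gives 12 million operations and 48 MB of storage, consistent with the "12 M.Ops" and "~ 50" megabyte figures quoted for this test.
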
| 280 | 
 | 
|---|
| 281 | (1) time cpupower 0     # compiled with -O  (/ -O -g)
 | 
|---|
| 282 | (2) time zthr arr 1 2000   1 thread
 | 
|---|
| 283 | (3) time zthr arr 2 2000   2 threads
 | 
|---|
| 284 | (4) time zthr arr 4 2000   4 threads
 | 
|---|
| 285 | (5) time zthr arr 6 2000   6 threads
 | 
|---|
| 286 | (6) time zthr arr 8 2000   8 threads
 | 
|---|
| 287 | 
 | 
|---|
| 288 | -----------------------------------------------------------------------------------
 | 
|---|
| 289 |                      (1)MFLOPS  (2)CPU/Elap/%   (3)CPU/Elap/%   (4)CPU/Elap/%
 | 
|---|
| 290 | -----------------------------------------------------------------------------------
 | 
|---|
| 291 | (a)xeon-lx-2.4GHz      53        0.5/1/43%      1/1.1/88%      2.8/1/262%
 | 
|---|
| 292 |                                     (5) 4.5/1.8/246%      (6) 6.1/2.1/310% 
 | 
|---|
| 293 |                                   
 | 
|---|
| 294 |         
 | 
|---|
| 295 | (b)xeon-lx-2.8GHz      65        
 | 
|---|
| 296 |                                   
 | 
|---|
| 297 | (c)amd-lx              95        0.23/1/22%     0.44/1/51%       1/1/102%     [-O -g]
 | 
|---|
| 298 |                                      (5) 1.6/1/106%   (6) 2.2/1.2/100% 
 | 
|---|
| 299 | 
 | 
|---|
| 300 | 
 | 
|---|
| 301 |                                   
 | 
|---|
| 302 | (e')osf-cool           32        0.43/1.2/35%   0.6/1.33/44%     1.1/1.3/82%      [-O -g]
 | 
|---|
| 303 |                                      (5) 1.45/1.7/85%   (6) 1.83/2.16/84%         
 | 
|---|
| 304 | (f)superosf                    
 | 
|---|
| 305 | 
 | 
|---|
| 306 | (g)G5-osx-2GHz         88       1.5/1.5/100%    3.2/1.7/185%      6.6/3.5/188%    [-O -g]
 | 
|---|
| 307 | (g)G5-osx-2GHz         88       0.4/1/40%       0.9/1.0/90%       2/1.2/169%      [-tune=G5 -g]
 | 
|---|
| 308 |                                      (5) 3.3/2/164%    (6) 4.3/2.6/165%
 | 
|---|
| 309 | (h)G4-osx-1.25GHz      25       3/3/95%                                           [-O2 -g]
 | 
|---|
| 310 |                                   
 | 
|---|
| 311 | (i)core-osx-1.83GHz               [-O2 -g]
 | 
|---|
| 312 | 
 | 
|---|
| 313 | (j)xeon-osx           
 | 
|---|
| 314 | 
 | 
|---|
| 315 | 
 | 
|---|
| 316 | (p)ibm-aix-regatta   130        
 | 
|---|
| 317 | 
 | 
|---|
| 318 | (q)ibm-aix-meso      150        0.6/1/58%       1/1/91%           1.7/1.2/132%    [-O3]
 | 
|---|
| 319 |                                      (5) 2.4/1.2/193%   (6) 4.25/1.6/265%      
 | 
|---|
| 320 | 
 | 
|---|
| 321 |  -----------------------------------------------------------------------------------
 | 
|---|
| 322 | 
 | 
|---|
| 323 | B.1.x/ Old version of zthr (before 23/05/07) 
 | 
|---|
| 324 |          It performed 2 multiplications by a constant followed by a matrix product!
 | 
|---|
| 325 |          arr = c1*a1*c2*a2   ( ~ 3 10^6 double op.)
 | 
|---|
| 326 | (1) time cpupower 2     # compiled with -O3  (/ -O -g)
 | 
|---|
| 327 | (2) time zthr arr 1 1000   1 thread
 | 
|---|
| 328 | (3) time zthr arr 2 1000   2 threads
 | 
|---|
| 329 | (4) time zthr arr 4 1000   4 threads
 | 
|---|
| 330 | (5) time zthr arr 6 1000   6 threads
 | 
|---|
| 331 | (6) time zthr arr 8 1000   8 threads
 | 
|---|
| 332 | 
 | 
|---|
| 333 | -----------------------------------------------------------------------------------
 | 
|---|
| 334 |                      (1)MFLOPS  (2)CPU/Elap/% (2)IndPerf  (3)CPU/Elap/% (3)IndPerf
 | 
|---|
| 335 | -----------------------------------------------------------------------------------
 | 
|---|
| 336 | (a)xeon-lx-2.4GHz     1167        5.15/5.2/99%              11.4/5.8/196%
 | 
|---|
| 337 |                                            (4)36.6/9.28/394%
 | 
|---|
| 338 |         -O3 -g                    4.9/5./99%
 | 
|---|
| 339 | (b)xeon-lx-2.8GHz      920        2.3/2.3/100%              6.2/3.1/198%
 | 
|---|
| 340 |                                            (4)26/6.6/396%
 | 
|---|
| 341 | (c)amd-lx              690        3.6/3.6/99%               6.8/4/171%
 | 
|---|
| 342 |                                            (4)13.5/7/193%
 | 
|---|
| 343 |                                            (5)20.3/10.23/198%
 | 
|---|
| 344 | (cc)amd2-lx            675        2/2/99%                   4.15/2.1/197%
 | 
|---|
| 345 |                                            (4)8.25/4.15/198%
 | 
|---|
| 346 |                                            (5)13.6/4.6/292%
 | 
|---|
| 347 |                                            (6)19.8/6.5/300%
 | 
|---|
| 348 | 
 | 
|---|
| 349 | (d)osf-asc             420        6.3s/6.5s/99%            16.9/8.8/192%   
 | 
|---|
| 350 |                                            (4)29.9/15.7/191%
 | 
|---|
| 351 | (e)osf-xp1000          648        5.1/5.3/96.6%            11.4/11.4/99%       
 | 
|---|
| 352 |                                            (4)25.2/25.5/99%
 | 
|---|
| 353 | (f)superosf            842        2.87/2.88/99.6%          6.25/4.1/153%
 | 
|---|
| 354 |                                            (4)11.6/3.06/379%          
 | 
|---|
| 355 | 
 | 
|---|
| 356 | (h)G4-osx-1.25GHz       92        44s/48s/91%              86.7/99.8/92%  [-g]
 | 
|---|
| 357 |                        380        12.2/12.9/95%            24/25.3/95%    [-O2 -g]                   
 | 
|---|
| 358 | (g)G5-osx-2GHz        1151        20s/20s/99%              40s/23s/170%
 | 
|---|
| 359 |                                            (4) 80.8/45/180%
 | 
|---|
| 360 |    -O -g                          4.5/4.9/91%              9.3/4.7/197%
 | 
|---|
| 361 |                                            (4) 18.3/9.4/197%
 | 
|---|
| 362 |    -tune=G5                       3.35/3.8/88%             7.1/3.6/196%
 | 
|---|
| 365 |                                            (4) 14/7.5/187%
 | 
|---|
| 366 | (i)core-osx-1.83GHz    855        11.5/11.5/100%           23/11.6/192%   [-g]
 | 
|---|
| 367 |                                            (4) 46/23/199%
 | 
|---|
| 368 |               -O2                 3.85/3.89/99%            7.7/3.9/198%   [-O2 -g]
 | 
|---|
| 369 |                                            (4) 15.4/7.77/198%
 | 
|---|
| 370 | 
 | 
|---|
| 371 | (j)xeon-osx           2600        2.5/2.5/100%             5.1/2.6/199%
 | 
|---|
| 372 |                                            (4) 11.5/3.2/362%
 | 
|---|
| 373 |                                            (5) 17.4/4.77/365%
 | 
|---|
| 374 | 
 | 
|---|
| 375 | (p)ibm-aix-regatta  1750/730      6.8/6.9/98%              13.1/6.75/195%
 | 
|---|
| 376 |                                            (4) 26.3/11.7/225%
 | 
|---|
| 377 | (q)ibm-aix-meso     3600/1250     3.6/3.75/96%             7.35/3.7/197%
 | 
|---|
| 378 |                                            (4) 12.46/4.2/298%
 | 
|---|
| 379 |                                            (5) 219/6.7/280%
 | 
|---|
| 380 |                                            (6) 24/4.5/530%
 | 
|---|
| 381 | 
 | 
|---|
| 382 | 
 | 
|---|
| 383 | (s)sgi-magique         460        60/60/99%       
 | 
|---|
| 384 |  -----------------------------------------------------------------------------------
 | 
|---|
| 385 | 
 | 
|---|
| 386 | 
 | 
|---|
| 387 | B.2/ Matrix multiplication mtx = mtx1 * mtx2 
 | 
|---|
| 388 |      ~ 2  10^9 double op. / thread
 | 
|---|
| 389 | (1) time cpupower 2  (-O3 / -O -g)
 | 
|---|
| 390 | (2) time zthr mtx 1 1000   1 thread
 | 
|---|
| 391 | (3) time zthr mtx 2 1000   2 threads
 | 
|---|
| 392 | (4) time zthr mtx 4 1000   4 threads
 | 
|---|
| 393 | (5) time zthr mtx 6 1000   6 threads
 | 
|---|
| 394 | (6) time zthr mtx 8 1000   8 threads
 | 
|---|
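
Since the workload is quoted per thread, one way to read the elapsed times is as a throughput scaling factor: with N threads each doing the same work, ideal scaling keeps the elapsed time constant up to the number of cores. The sketch below assumes exactly that (identical work per thread); the function name is illustrative.

```python
# Sketch: throughput scaling when every thread performs the same workload.
# Assumption: total work grows linearly with the number of threads.


def throughput_scaling(n_threads: int, elapsed_1: float, elapsed_n: float) -> float:
    """Work done per unit time with n_threads, relative to a single thread.

    A value equal to n_threads means perfect scaling; a value close to 1
    means the extra threads bring no gain.
    """
    if n_threads < 1 or elapsed_1 <= 0 or elapsed_n <= 0:
        raise ValueError("n_threads must be >= 1 and times must be positive")
    return n_threads * elapsed_1 / elapsed_n


if __name__ == "__main__":
    # Elapsed times for (a) xeon-lx-2.4GHz in the matrix multiplication test:
    # 6.5 s with 1 thread, 20.3 s with 4 threads.
    print(throughput_scaling(4, 6.5, 20.3))
    # Same test on (bb) nxeon-grid49: 4.3 s with 1 thread, 7.3 s with 4 threads.
    print(throughput_scaling(4, 4.3, 7.3))
```

Under this reading, machine (a) gains only about a factor 1.3 from 4 threads, while (bb) gains about 2.4, which is in line with the note on Xeon multi-thread behaviour at the top of the document.
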
| 395 | 
 | 
|---|
| 396 | -----------------------------------------------------------------------------------
 | 
|---|
| 397 |                      (1)MFLOPS  (2)CPU/Elap/% (2)IndPerf  (3)CPU/Elap/% (3)IndPerf
 | 
|---|
| 398 | -----------------------------------------------------------------------------------
 | 
|---|
| 399 | (a)xeon-lx-2.4GHz     1167        6.5/6.5/100%               17.4/8.8/198%
 | 
|---|
| 400 |                                            (4) 80.5/20.3/397%
 | 
|---|
| 401 |                                            (5) 114.5/29.6/387%
 | 
|---|
| 402 |                                            (6) 160/40.3/387%
 | 
|---|
| 403 |                                            
 | 
|---|
| 404 | (b)xeon-lx-2.8GHz      920        3.4/3.4/100%               12/6.1/199%
 | 
|---|
| 405 |                                            (4) 55.8/14/400%
 | 
|---|
| 406 |                                            (5) 79.5/20.3/392%
 | 
|---|
| 407 |                                            (6) 102/25.8/396%
 | 
|---|
| 408 | (bb)nxeon-grid49       660        4.3/4.3/100%               9.3/4.7/200%
 | 
|---|
| 409 |                                            (4) 28.8/7.3/390%
 | 
|---|
| 410 |                                            (5) 41/10.4/393%
 | 
|---|
| 411 |                                            (6) 57.5/14.45/397%
 | 
|---|
| 412 | (bb4)xeon-4c-grid50    660        4.7/4.7/100%               10.7/5.15/199%
 | 
|---|
| 413 |                                            (4) 27.8/7.1/391%
 | 
|---|
| 414 |                                            (5) 57.4/10.6/540%
 | 
|---|
| 415 |                                            (6) 129/16.8/776%
 | 
|---|
| 416 | (c)amd-lx              690        6.98/6.98/100%             14.1/8.15/173%
 | 
|---|
| 417 |                                            (4) 27.7/14.23/194%
 | 
|---|
| 418 |                                            (5) 41.4/21.07/196%
 | 
|---|
| 419 |                                            (6) 55.4/27.9/198.7%  
 | 
|---|
| 420 | (cc)amd2-lx            675        4.1/4.1/100%               9.55/4.8/198%
 | 
|---|
| 421 |                                            (4) 20/10.27/195%
 | 
|---|
| 422 |                                            (5) 32.8/11.16/294%
 | 
|---|
| 423 |                                            (6) 42.75/13.8/309%
 | 
|---|
| 424 | 
 | 
|---|
| 425 | 
 | 
|---|
| 426 | (d)osf-asc             420        13.5s/13.7s/98%            32/16.5/194%   
 | 
|---|
| 427 |                                            (4) 67.5/34.4/196%
 | 
|---|
| 428 | (e)osf-xp1000          648        13/14.1/92%                27.1/27.4/99%
 | 
|---|
| 429 |                                            (4) 54/54.7/99.6%
 | 
|---|
| 430 |                                            (5) 80.6/81/99.6%
 | 
|---|
| 431 |                                            (6) 107.8/108.3/99.5%
 | 
|---|
| 432 | (e')osf-cool                      13/13.22/98%               26/26.1/99%
 | 
|---|
| 433 |                                            (4) 51.8/51.9/99%
 | 
|---|
| 434 | (f)superosf            842        6.1/7.24/84%               12.35/6.29/196%
 | 
|---|
| 435 |                                            (4) 24.3/6.31/385%
 | 
|---|
| 436 |                                            (5) 36.5/10.9/335%
 | 
|---|
| 437 |                                            (6) 50.1/18.15/276%
 | 
|---|
| 438 | 
 | 
|---|
| 439 | (g)G5-osx-2GHz        1151        23/23.7/97%                46.5/27.5/170%
 | 
|---|
| 440 |                                            (4) 93.4/49.4/189%
 | 
|---|
| 441 |   -O -g                           6.2/6.2/100%                14.2/7.2/197%
 | 
|---|
| 442 |                                            (4) 28.3/14.36/197%
 | 
|---|
| 443 |   -tune=G5                        5.7/5.8/98%                13.3/6.8/197%
 | 
|---|
| 444 |                                            (4) 26.8/13.56/197%
 | 
|---|
| 445 |                                            (6) 53.8/27.25/197%
 | 
|---|
| 446 | (h)G4-osx-1.25GHz      333        23.5/24.5/96%                              [-O2]
 | 
|---|
| 447 | (i)core-osx-1.83GHz    855        12.6/12.7/100%             25.8/13.4/194% 
 | 
|---|
| 448 |                                            (4) 51.6/26/199%
 | 
|---|
| 449 |             -O2                   4.25/4.5/94%               10.6/5.36/198%
 | 
|---|
| 450 |                                            (4) 20.87/10.68/198%
 | 
|---|
| 451 |       -O2 2 jobs //           2 x 5/5.4/92%
 | 
|---|
| 452 | (j)xeon-osx           2600        2.8/2.8/99%                9.3/4.66/199%
 | 
|---|
| 453 |                                            (4) 31.4/8.6/364%
 | 
|---|
| 454 |                                            (5) 47.1/12.96/364%
 | 
|---|
| 455 |                                            (6) 62.8/17.38/362%
 | 
|---|
| 456 | 
 | 
|---|
| 457 | (p)ibm-aix-regatta  1750/730      9.5/9.7/98%                18.3/16.0/114%
 | 
|---|
| 458 |                                            (4) 38.3/24.7/155%
 | 
|---|
| 459 | (q)ibm-aix-meso     3600/1250     2.3/2.3/99%                5.1/2.64/194%   (compiled with -O3)
 | 
|---|
| 460 |                                            (4) 11.4/4.16/272%
 | 
|---|
| 461 |                                            (5) 20.2/5.85/344%
 | 
|---|
| 462 |                                            (6) 29.9/6.74/442%
 | 
|---|
| 463 | 
 | 
|---|
| 464 | (s)sgi-magique         400        44/44.3/99%                96.5/55/176%
 | 
|---|
| 465 | 
 | 
|---|
| 466 |  -----------------------------------------------------------------------------------
 | 
|---|
| 467 | 
 | 
|---|
| 468 | 
 | 
|---|
| 469 | B.4/ Operations on double arrays - measurements with spar 
 | 
|---|
| 470 |   csh> time spar 2 1 2000 2000
 | 
|---|
| 471 |   (1) cpupower 2  MFLOPS
 | 
|---|
| 472 |   (2) MFLOPS (double) spar 
 | 
|---|
| 473 |   (3) time spar 2 5 1000 2000 CPU/Elap/% 
 | 
|---|
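The timing columns in these tables are triples of the form CPU/Elap/%, as printed by csh `time`. As a minimal sketch, assuming the percentage is simply CPU time divided by elapsed time, the hypothetical helpers `parse_timing` and `cpu_usage` below split such a triple and recompute the ratio. The printed times are rounded, so the recomputed value can differ slightly from the tabulated one.

```python
def parse_timing(triple):
    """Split a 'CPU/Elap/%' triple such as '37.6/41.2/91%' into three floats."""
    cpu, elapsed, percent = triple.strip().split("/")
    return float(cpu), float(elapsed), float(percent.rstrip("%"))


def cpu_usage(cpu, elapsed):
    """CPU usage in percent, assuming % = 100 * CPU / Elapsed."""
    return 100.0 * cpu / elapsed


# Entry taken from the (e)osf-xp1000 row of the B.4 table
cpu, elapsed, quoted = parse_timing("37.6/41.2/91%")
print(f"recomputed {cpu_usage(cpu, elapsed):.1f}% vs quoted {quoted:.0f}%")
```

Values well above 100% in the other tables are consistent with a run that keeps more than one core busy.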
| 474 | -----------------------------------------------------------------------------------
 | 
|---|
| 475 |                      (1)MFLOPS      (2)CPU / %         (3)CPU/Elap/%
 | 
|---|
| 476 | -----------------------------------------------------------------------------------
 | 
|---|
| 477 | (a)xeon-lx-2.4GHz      53       ~ 20-35 MFLOPS , 90%     20/20.2/99%       [-g -O] 
 | 
|---|
| 478 |                                   
 | 
|---|
| 479 |         
 | 
|---|
| 480 | (b)xeon-lx-2.8GHz      65        
 | 
|---|
| 481 |                                   
 | 
|---|
| 482 | (c)amd-lx              95       ~ 20-40 MFLOPS , 99%     17.2/17.2/100%    [-g -O] 
 | 
|---|
| 483 | 
 | 
|---|
| 484 | 
 | 
|---|
| 485 | (d)osf-asc                     
 | 
|---|
| 486 |                                   
 | 
|---|
| 487 | (e)osf-xp1000          32       ~ 15-25 MFLOPS , 90%     37.6/41.2/91%      [-g -O]  
 | 
|---|
| 488 | (f)superosf                    
 | 
|---|
| 489 | 
 | 
|---|
| 490 | (g)G5-osx-2GHz         88       ~ 10-25 MFLOPS , 99%     45/45/100%         [-g -O] or [-g -O2]
 | 
|---|
| 491 | (h)G4-osx-1.25GHz      25       ~ 8-16  MFLOPS , 92%     45.5/52/90%        [-g -O2]
 | 
|---|
| 492 |                                   
 | 
|---|
| 493 | (i)core-osx-1.83GHz             
 | 
|---|
| 494 | 
 | 
|---|
| 495 | (j)xeon-osx           
 | 
|---|
| 496 | 
 | 
|---|
| 497 | 
 | 
|---|
| 498 | (p)ibm-aix-regatta   130        
 | 
|---|
| 499 | 
 | 
|---|
| 500 | (q)ibm-aix-meso      150        ~ 80-100 MFLOPS , 90%   5./23/22%     [-O3]    
 | 
|---|
| 501 | 
 | 
|---|
| 502 | 
 | 
|---|
| 503 | 
 | 
|---|
| 504 | (s)sgi-magique         460        
 | 
|---|
| 505 |  -----------------------------------------------------------------------------------
 | 
|---|
| 506 | 
 | 
|---|
| 507 | B.5/  Computation/comparison with JET/tjet
 | 
|---|
| 508 | csh> time tjet 10 2000 2000   OR tjet 10 2000 1000
 | 
|---|
| 509 |  (1) TCPU EltAccess C/pointers
 | 
|---|
| 510 |  (2) TCPU m1*c1+m2*c2+m3*c3 C/pointers
 | 
|---|
| 511 |  (3) TCPU EltAccess  SOPHYA
 | 
|---|
| 512 |  (4) TCPU m1*c1+m2*c2+m3*c3 SOPHYA  / Methods (MultCst, AddArr ...)
 | 
|---|
| 513 |  (5) TCPU m1*c1+m2*c2+m3*c3 SOPHYA JET
 | 
|---|
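The expression m1*c1+m2*c2+m3*c3 timed above is a linear combination of three arrays with scalar coefficients. The sketch below is not SOPHYA code; it only illustrates, in plain Python, two ways such a combination can be evaluated. The helpers `mult_cst` and `add_arr` are hypothetical stand-ins named after the MultCst and AddArr methods listed in (4), and the idea that the method-based variant pays for intermediate arrays is an assumption, not something stated in these notes.

```python
def mult_cst(arr, c):
    """Return a new array with every element multiplied by the constant c."""
    return [x * c for x in arr]


def add_arr(a, b):
    """Return a new array holding the element-wise sum of a and b."""
    return [x + y for x, y in zip(a, b)]


def combine_with_methods(m1, m2, m3, c1, c2, c3):
    # Method style: each call builds a temporary array (assumed cost model).
    partial = add_arr(mult_cst(m1, c1), mult_cst(m2, c2))
    return add_arr(partial, mult_cst(m3, c3))


def combine_fused(m1, m2, m3, c1, c2, c3):
    # Single pass over the data, one result array, no temporaries.
    return [a * c1 + b * c2 + c * c3 for a, b, c in zip(m1, m2, m3)]


m1, m2, m3 = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
print(combine_with_methods(m1, m2, m3, 2.0, 3.0, 4.0))
print(combine_fused(m1, m2, m3, 2.0, 3.0, 4.0))
```

Both functions return the same values; only the number of passes and allocations differs.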
| 514 | 
 | 
|---|
| 515 |  -----------------------------------------------------------------------------------
 | 
|---|
| 516 |                             (1)        (2)         (3)         (4)           (5)             
 | 
|---|
| 517 |  -----------------------------------------------------------------------------------
 | 
|---|
| 518 | (b)xeon-lx-2.8GHz-icc       0.87       0.63        1.55      2.7/1.6         0.57
 | 
|---|
| 519 | (c)amd-lx                   0.94       0.79        1.85      3.4/2.1         0.76
 | 
|---|
| 520 | 
 | 
|---|
| 521 | (e')osf-cool                2.85       2.45        3.1       6.5/5.5         4.1 
 | 
|---|
| 522 | 
 | 
|---|
| 523 | (g)G5-osx-2GHz              1.5        0.61        2.1       4.1/1.6         0.6   (-g -O2)
 | 
|---|
| 524 |     -tune=G5 -fast          1.1        0.62        1.3       4.1/1.6         0.58  
 | 
|---|
| 525 | (h)G4-osx-1.25GHz           3.86       2.2         5         9.4/6.2         3     (-g -O2)
 | 
|---|
| 526 | (i)core-osx-1.83GHz         1.1        0.49        1.6       2.8/1.7         0.68
 | 
|---|
| 527 | 
 | 
|---|
| 528 | (q)ibm-aix-meso             0.43       0.27        0.52      1.12/0.75       0.35  
 | 
|---|
| 529 | 
 | 
|---|
| 530 | (s)sgi-magique              2.45       1.9         5.65      7.45/6.3        2.8  (-O -g3)
 | 
|---|
| 531 | -----------------------------------------------------------------------------------
 | 
|---|
| 532 | 
 | 
|---|
| 533 | 
 | 
|---|
| 534 | C/ FFT computation (FFTW, FFTPack)
 | 
|---|
| 535 | -------------------------------
 | 
|---|
| 536 | 
 | 
|---|
| 537 | (1) time cpupower 2 
 | 
|---|
| 538 | (2) time tfft 2000000 W d 0 0 (with FFTW)
 | 
|---|
| 539 | (3) time tfft 2000000 P d 0 0 (with FFTPack_Sophya)
 | 
|---|
| 540 | 
 | 
|---|
| 541 | IndPerf=1000/TCPU  
 | 
|---|
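The performance index defined above is the inverse of the CPU time, scaled by 1000. A minimal sketch of the computation, assuming TCPU is the CPU time in seconds taken from the CPU/Elap/% columns (the function name `ind_perf` is illustrative):

```python
def ind_perf(tcpu_seconds):
    """Performance index as defined in the notes: IndPerf = 1000 / TCPU."""
    return 1000.0 / tcpu_seconds


# CPU times taken from the (a)xeon-lx-2.4GHz and (d)osf-asc rows
for tcpu in (5.5, 7.4, 13.3):
    print(f"TCPU = {tcpu:5.1f} s  ->  IndPerf = {ind_perf(tcpu):.0f}")
```

The tabulated IndPerf values appear to be rounded by hand, so small differences from the computed figures are expected (for instance 180 is quoted for a TCPU of 5.5 s).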
| 542 | 
 | 
|---|
| 543 | 
 | 
|---|
| 544 | -----------------------------------------------------------------------------------
 | 
|---|
| 545 |                      (1)MFLOPS     (2)CPU/Elap/%   (2)IndPerf   (3)CPU/Elap/%  (3)IndPerf
 | 
|---|
| 546 | -----------------------------------------------------------------------------------
 | 
|---|
| 547 | (a)xeon-lx-2.4GHz     1167        5.5/5.6/97%        180         7.4/7.4/100%     135    
 | 
|---|
| 548 |        -O3 -g                                                    6.8/7.1/96%      147    
 | 
|---|
| 549 | 
 | 
|---|
| 550 | (b)xeon-lx-2.8GHz      920        3.6/3.7/98%                    3.7/3.8/99%
 | 
|---|
| 551 |                                                           ~2x    6.5/8.8/73%
 | 
|---|
| 552 |                                                           ~4x    8/14/55%   (15 sec elapsed)
 | 
|---|
| 553 | (bb)nxeon-grid49       660                                       3.3/3.3/99% 
 | 
|---|
| 554 |                                                           ~2x    3.3/3.4/100%  (4 sec elapsed)
 | 
|---|
| 555 |                                                           ~4x    4.6/4.6/99%   (5 sec elapsed)
 | 
|---|
| 556 |                                                           ~8x    4.5/4.5/50%   (9 sec elapsed)
 | 
|---|
| 557 | (bb4)xeon-4c-grid50    660                                       3.6/3.6/99% 
 | 
|---|
| 558 |                                                           ~2x    3.6/3.6/100%  (4 sec elapsed)
 | 
|---|
| 559 |                                                           ~4x    4.4/4.4/99%   (5 sec elapsed)
 | 
|---|
| 560 |                                                           ~8x    7.1/7.1/99%   (7  sec elapsed)
 | 
|---|
| 561 |                                                           ~16x   7.1/14.4/50%   (15 sec elapsed)
 | 
|---|
| 562 |          
 | 
|---|
| 563 | (c)amd-lx              690        3.2/3.75/86%                   4.2/4.2/100%     238
 | 
|---|
| 564 |                                                             ~2x  4.7/4.7/99%
 | 
|---|
| 565 | (cc)amd2-lx            675        2.8/2.8/99%                    3.56/3.58/99%
 | 
|---|
| 566 | 
 | 
|---|
| 567 | (d)osf-asc             420        13.3/13.9/95.7%     75        12.2/17.6/70%     82
 | 
|---|
| 568 | (e)osf-xp1000          648        9.9/10.2/97%       101        9.3/9.46/98.5%    107     
 | 
|---|
| 569 | (e')cool                          9.2/10.1/91%                  9.1/9.3/98%
 | 
|---|
| 570 | (f)superosf            842        6./6.22/96.7%      166        5.1/5.18/97.4%    196     
 | 
|---|
| 571 | 
 | 
|---|
| 572 | (g)G5-osx-2GHz        1151        8.5/8.7/97.5%      120        11.5/11.6/99%     87
 | 
|---|
| 573 |     -O -g                         5.1/5.2/98%        190        5.4/5.5/99%       180
 | 
|---|
| 574 |     -tune=G5                      5/5.1/98%          200        5.5/5.6/97%       180
 | 
|---|
| 575 | (h)G4-osx-1.25GHz       92        15.2/15.9/96%       66        23.8/34.1/70%     42    [-g]
 | 
|---|
| 576 |                        380        8.8/10.3/86%                  14.7/20.9/65%           [-O2 -g]
 | 
|---|
| 577 | (i)core-osx-1.83GHz    855        4.6/4.75/97%       217        7/7.08/99%        142
 | 
|---|
| 578 |          -O2                      3.2/3.2/99%        312        3.6/3.6/99%       277
 | 
|---|
| 579 |    -O2 2 jobs //              2 x 3.9/4.3/90%                 2x  4.7/5/91%       200     
 | 
|---|
| 580 | (j)xeon-osx           2600                                      2.6/2.6/98%       384
 | 
|---|
| 581 |        2 jobs //                                           ~2x  3.6/4.1/87%       250
 | 
|---|
| 582 |        4 jobs //                                           
 | 
|---|
| 583 | 
 | 
|---|
| 584 | (p)ibm-aix-regatta 1750/730       6.25/18.9/33%      160        5.25/15.7/33%     190
 | 
|---|
| 585 | (q)ibm-aix-meso    3600/1250      3.95/4.3/91%       250        3.82/4./94%       260
 | 
|---|
| 586 |        2 jobs //                                           ~2x  3.88/4.2/92%      250
 | 
|---|
| 587 |                                   2.8/4.76/59%                  2.1/4.45/47%     [-O3]
 | 
|---|
| 588 |                                                           ~2x   2.5/4.7/54%
 | 
|---|
| 589 |                                                           ~4x   2.5/4.8/55% 
 | 
|---|
| 590 | 
 | 
|---|
| 591 | 
 | 
|---|
| 592 | (s)sgi-magique         460        22/22/98%           42         24.5/25/99%       40           
 | 
|---|
| 593 |  -----------------------------------------------------------------------------------
 | 
|---|
| 594 | 
 | 
|---|
| 595 | 
 | 
|---|
| 596 | D/ Inversion computation with lapack
 | 
|---|
| 597 | -------------------------------
 | 
|---|
| 598 | 
 | 
|---|
| 599 | lpk inverse 1000,1000 0 
 | 
|---|
| 600 | ---> computation time for inversion with lapack
 | 
|---|
| 601 | -------------------------------------------------------------------------------------------
 | 
|---|
| 602 |              CPU/Elap/%       (1)         
 | 
|---|
| 603 | -------------------------------------------------------------------------------------------
 | 
|---|
| 604 | (a)xeon-lx-2.4GHz          5.6/~100%    
 | 
|---|
| 605 | (b)xeon-lx-2.8GHz          5.34/~90%
 | 
|---|
| 606 | (c)amd-lx                  5.5/5.5/99%      
 | 
|---|
| 607 | 
 | 
|---|
| 608 | (d)osf-asc 
 | 
|---|
| 609 | (e')cool                   2.8/2.9/95% 
 | 
|---|
| 610 | (f)superosf           
 | 
|---|
| 611 | 
 | 
|---|
| 612 | (h)G4-osx-1.25GHz          2.3/~100%   [-O2 -g]
 | 
|---|
| 613 | (g)G5-osx-2GHz             0.8/~100%
 | 
|---|
| 614 |     -O -g                  0.86/~100%            
 | 
|---|
| 615 | (i)core-osx-1.83GHz        1.93/~100%
 | 
|---|
| 616 |               -O2   
 | 
|---|
| 617 | (p)ibm-aix-regatta        
 | 
|---|
| 618 | (q)ibm-aix-meso            0.55/~100%
 | 
|---|
| 619 | 
 | 
|---|
| 620 | (s)sgi-magique             5.3/~90%
 | 
|---|
| 621 | --------------------------------------------------------------------------------------------
 | 
|---|
| 622 | 
 | 
|---|
| 623 | 
 | 
|---|
| 624 | K/ Efficiency of lock (mutex) handling with threads and arrays
 | 
|---|
| 625 | ----------------------------------------------------------------------
 | 
|---|
| 626 | (32 threads operating on 2000 vectors ~ 64000 lock/unlock/wait/broadcast)
 | 
|---|
| 627 | 
 | 
|---|
| 628 | (1) time zthr syncp 32 2000 4 
 | 
|---|
| 629 | (2) time zthr sync 32 2000 4 
 | 
|---|
| 630 | (3) time zthr syncp 4 15000 130 
 | 
|---|
| 631 | (4) time zthr sync  4 15000 130 
 | 
|---|
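The figure of about 64000 lock operations quoted above follows from 32 threads each touching 2000 vectors once. The sketch below is not the zthr program; it is a minimal Python illustration, assuming one lock acquisition per thread per vector, and it uses a plain mutex rather than the wait/broadcast signalling mentioned in the notes. The function name `run_lock_test` is hypothetical.

```python
import threading


def run_lock_test(n_threads, n_vectors):
    """Each thread visits every vector once, taking a shared lock each time."""
    lock = threading.Lock()
    counts = {"locks": 0}
    visits = [0] * n_vectors  # stands in for the per-vector work

    def worker():
        for i in range(n_vectors):
            with lock:  # lock on entry, unlock on exit
                visits[i] += 1
                counts["locks"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts["locks"], visits


total, visits = run_lock_test(32, 2000)
print(total)  # 32 * 2000 = 64000 lock acquisitions
```

Timing such a loop mostly measures lock overhead, since the work done while holding the lock is negligible.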
| 632 | -------------------------------------------------------------------------------------------
 | 
|---|
| 633 |              CPU/Elap/%       (1)             (2)              (3)              (4) 
 | 
|---|
| 634 | -------------------------------------------------------------------------------------------
 | 
|---|
| 635 | (a)xeon-lx-2.4GHz         23.5/14/168%    4.3/1.2/365%     7.9/5.5/142%      4/2.15/190%
 | 
|---|
| 636 |       Before ThSafeOp     17/178%
 | 
|---|
| 637 |       with -O3 -g 
 | 
|---|
| 638 | (b)xeon-lx-2.8GHz (2)     
 | 
|---|
| 639 | (bb4)xeon-4c-grid50       1/1/96%         1.6/1.2/133%     4.9/4.05/120%     6/5/122%
 | 
|---|
| 640 | (c)amd-lx                 0.6/1/63%        0.6/1/60%       3.5/3.5/102%      2.6/2.7/98% 
 | 
|---|
| 641 | 
 | 
|---|
| 642 | (d)osf-asc               4.5/3.4/132%     3.35/2/170%     15.8/10.5/150%     13/8/163%
 | 
|---|
| 643 |      5.4/100%(NoThSafe)
 | 
|---|
| 644 | (e')cool                 1.3/1.37/95%     1.35/1.5/89%     5.3/5.3/99%       5.2/5.2/99%    
 | 
|---|
| 645 | (f)superosf (1)          
 | 
|---|
| 646 | 
 | 
|---|
| 647 | 
 | 
|---|
| 648 | (g)G5-osx-2GHz (2)     2.6/130% (NoThSafe)                   
 | 
|---|
| 649 |     -O -g                2.6/1.7/150%      6.8/3.7/187%    4/2.75/142%       4.7/3/155%
 | 
|---|
| 650 | (h)G4-osx-1.25GHz (1)    40.5/42.6/95%    42.2/43.5/97%      [-g]
 | 
|---|
| 651 |                           3.9/4.3/89%      3.5/4/89%       3.8/4/95%         4.3/4.6/93%
 | 
|---|
| 652 | 
 | 
|---|
| 653 | (i)core-osx-1.83GHz      7.7/7.1/108%     7.8/6.7/116%    30.2/29.6/102%    30.3/29.2/104%   [-g]
 | 
|---|
| 654 |               -O2        2.7/1.8/152%     6/3.16/190%     3.4/2.4/142%      3.2/2.5/164%     [-O2 -g]
 | 
|---|
| 655 | (j)xeon-osx               
 | 
|---|
| 656 |       Before ThSafeOp     2.55/143%
 | 
|---|
| 657 | 
 | 
|---|
| 658 | (p)ibm-aix-regatta        4.7/111%
 | 
|---|
| 659 | (q)ibm-aix-meso           7.5/2.8/300%     17/3.8/450%    8.2/3.05/270%      4.85/2.43/200%    
 | 
|---|
| 660 | 
 | 
|---|
| 661 | --------------------------------------------------------------------------------------------
 | 
|---|
| 662 | 
 | 
|---|
| 663 | 
 | 
|---|
| 664 | 
 | 
|---|
| 665 | L/ I/O and PPF 
 | 
|---|
| 666 | -----------------
 | 
|---|
| 667 | Writing/reading n=10^7 rows of int+6 doubles, total ~ 500 MB 
 | 
|---|
| 668 | (1) time tstdtable w xx.ppf swap 10000000 1024 0
 | 
|---|
| 669 | (2) time tstdtable r xx.ppf swap 10000000 1024 0
 | 
|---|
| 670 | (3) time tstdtable w xx.ppf swap 50000000 1024 0
 | 
|---|
| 671 | (4) time tstdtable r xx.ppf swap 50000000 1024 0
 | 
|---|
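The data volume quoted above can be checked with a short calculation. This is a sketch under stated assumptions: a 4-byte int and six 8-byte doubles per row, with no per-row or per-block overhead from the PPF format (the actual file layout is not described in these notes).

```python
import struct

# Assumed row layout: one 32-bit int followed by six 64-bit doubles, unpadded
ROW_FORMAT = "<i6d"
ROW_BYTES = struct.calcsize(ROW_FORMAT)  # 4 + 6 * 8 = 52 bytes


def table_size_mb(n_rows):
    """Raw payload size in MB (10^6 bytes) for n_rows rows."""
    return n_rows * ROW_BYTES / 1e6


print(table_size_mb(10_000_000))  # tests (1) and (2): 520.0 MB
print(table_size_mb(50_000_000))  # tests (3) and (4): 2600.0 MB
```

The first figure matches the ~500 MB stated in the notes; the 5x10^7-row tests would then move roughly 2.6 GB.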
| 672 | 
 | 
|---|
| 673 | -------------------------------------------------------------------------------------------
 | 
|---|
| 674 |              CPU/Elap/%       (1)              (2)              (3)                (4)
 | 
|---|
| 675 | -------------------------------------------------------------------------------------------
 | 
|---|
| 676 | (a)xeon-lx-2.4GHz          17/26/63%       5.5/5.6/94%        
 | 
|---|
| 677 | 
 | 
|---|
| 678 | (b)xeon-lx-2.8GHz          7/18.5/40%      2.7/2.8/100%                                   7000000
 | 
|---|
| 679 | 
 | 
|---|
| 680 | (c)amd-lx                  5.9/6./97%      3.4/3.4/100%      30/32/93%      24/165/13% ?
 | 
|---|
| 681 | 
 | 
|---|
| 682 | (d)osf-asc 
 | 
|---|
| 683 | (e')cool                   15/30/50%       13/13/99%
 | 
|---|
| 684 | (f)superosf           
 | 
|---|
| 685 | 
 | 
|---|
| 686 | (g)G5-osx-2GHz             14/14.2/98%     6/6.14/99% 
 | 
|---|
| 687 | 
 | 
|---|
| 688 | (h)G4-osx-1.25GHz          26/29.7/87%     15.7/38.6/41%               [-O2 -g]
 | 
|---|
| 689 | (i)core-osx-1.83GHz        37/37.8/98%     22/48.4/45%                                 [-g]
 | 
|---|
| 690 |               -O2          10.5/17.4/55%   11.2/41.2/27%                               [-O2 -g]
 | 
|---|
| 691 | (p)ibm-aix-regatta        
 | 
|---|
| 692 | (q)ibm-aix-meso           5.5/16.8/38%    5.7/13.2/43%      32.7/85/39%    29/60/49%
 | 
|---|
| 693 |                           6/11/55%        4/9/44%
 | 
|---|
| 694 |                           6.3/10.6/60%    4/6/64%
 | 
|---|
| 695 |    2 reads //                      ~2x  4/6/60% Elapsed 7 sec
 | 
|---|
| 696 |    1 write + 2 read //    6.3/10.4/60%  ~2 3.8/6.4/59%  Elapsed < 11 sec                
 | 
|---|
| 697 |    1 write + 3 read //                                 19 sec
 | 
|---|
| 698 |    4 read //                                           6 sec
 | 
|---|
| 699 | --------------------------------------------------------------------------------------------
 | 
|---|
| 700 | 
 | 
|---|
| 701 | 
 | 
|---|
| 702 | 
 | 
|---|