-------------------------------------------------------------------------------------
 Performance comparison of different machines: compilation / execution (computation)
                                   -----------------------
    Measurements made in January 2007,       R. Ansari / C. Magneville
-------------------------------------------------------------------------------------

(a) eros3: dual-processor, dual-core Xeon @ 2.4 GHz, Linux (xeon-lx-2.4GHz), gcc 3.2
(b) ccali: dual-processor, dual-core Xeon @ 2.8 GHz, Linux (xeon-lx-2.8GHz), icc 8.0 or 9.0
     Compilation flags: [-O -g]
(c) sgsda: dual-processor AMD Opteron 248 @ 2.2 GHz (amd-lx-
     Compilation flags: [-O -g]
(cc) grid-saclay: dual-processor, dual-core AMD Opteron 275 @ 2.2 GHz (amd275-lx)
     Compilation flags: [-O -g]

(d) asc: dual-processor Alpha (@ ~1 GHz), DS20 server, OSF (osf1), cxx 6.5 (osf-asc)
    (the new asc, at 420 MFLOPS, is less powerful than the old asc at 800 MFLOPS)
(e) xp1000-dapnia: Alpha XP1000 @ ~600 MHz (?), OSF1, cxx ? (osf-xp1000)
(e') cool: Alpha XP1000 @ ~667 MHz (?), OSF1 5.1, cxx 6.3
(f) superosf-dapnia: multiprocessor AlphaServer ES80, 6 EV7 processors @ 1 GHz (super-osf)

(g) ccsvx01: dual-processor Xserve G5 @ ~1.8-2 GHz (Darwin/OSX) (G5-osx-2GHz), gcc 3.3
(h) PowerBook-Reza: Apple G4 @ 1.25 GHz (G4-osx-1.25GHz), gcc 3.3
(i) MacBook-Reza: Apple / Intel Core dual-core @ 1.83 GHz (core-osx-1.83GHz), gcc 4
(j) MacPro-Grosdidier: Apple / 2 dual-core Xeons @ 3 GHz, gcc 4.0.1, SOPHYA compiled with -O2 -g

(p) IBM-AIX regatta: xlC, IBM eServer pSeries 655, 8 Power4 processors @ 1.1 GHz
(q) IBM-AIX meso: AIX 5.3, xlC V8, IBM Power5, 8 dual-core P575 processors @ 1.9 GHz

(s) SGI-IRIX64 magique, CC

NOTES:
- On the Xeon machines, processes and threads interfere with each other in the way
  they occupy the CPUs: multi-thread / multi-task performance drops by a factor of 3.
  The MacPro machine running OSX copes noticeably better.
- Is this an effect of the operating system or of the motherboard?

Compilation flags
- Default compilation flags: [-O -g] in general
- On eros3 (xeon-linux, gcc 3.3): [-O -g] OR [-O3 -g]
- On Darwin: [-g] or [-O2 -g] (or [-tune=G5] on the Xserve G5)
   On the Macs (in particular G4/G5) there is a large difference between -g and -Ox -g,
   but little difference between -O, -O2 and -O3
- On the aix-meso machine: [-O -g] or [-O3 -g]

X/ Raw performance: cpupower and SPEC data (http://www.spec.org)
----------------------------------------------------------------------

(1) MFLOPS -> cpupower 2   (x/y: -O -g / -O3)
(2) SPECfp2000, (3) SPECint2000 (http://www.spec.org)

X.1/ Double-precision computing performance
csh> cpupower 0 3000000  5
     3 10^6 double operations - on 3x3 10^6 doubles in memory (~50 MB)
      ===> ~24 MB/s of memory throughput per MFLOPS (each operation touches 3 doubles = 24 bytes)
csh> cpupower 2
     1.6 10^9 double operations - on 3x20000 doubles (~0.5 MB)


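The ~24 MB/s per MFLOPS ratio follows from the cpupower 0 setup: 3 10^6 operations over three arrays of 3 10^6 doubles means each operation touches 3 doubles, i.e. 24 bytes. The small C++ sketch below only encodes that arithmetic. The helper names and the example kernel c[i] = a[i] * b[i] are hypothetical, not taken from cpupower or SOPHYA. It assumes 8-byte doubles and 1 MB = 10^6 bytes.

```cpp
#include <cassert>

// Bytes moved per operation when each operation touches 3 doubles
// (e.g. two loads and one store, as in c[i] = a[i] * b[i]).
constexpr double kBytesPerOp = 3.0 * sizeof(double);  // 24 bytes with 8-byte doubles

// Hypothetical helper: memory throughput in MB/s (1 MB = 10^6 bytes)
// implied by a MFLOPS figure for such a streaming loop.
double mflopsToMBps(double mflops) {
    assert(mflops >= 0.0);
    return mflops * kBytesPerOp;
}

// Hypothetical helper: MFLOPS sustainable for a given memory throughput.
double mbpsToMflops(double mbps) {
    assert(mbps >= 0.0);
    return mbps / kBytesPerOp;
}
```

For machine (a) in the X.1 table, mbpsToMflops(1290.0) gives 53.75, close to the 53 MFLOPS measured with cpupower 0.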
Compiled with -O (optimization)
  (1) cpupower 0: memory throughput in MB/s
  (2) cpupower 0, MFLOPS
  (5) cpupower 2, MFLOPS

Compiled with -g (debug / no optimization)
  (3) cpupower 0, MFLOPS
  (6) cpupower 2, MFLOPS

Compiled with -O3 or -fast ... (aggressive optimization)
  (4) cpupower 0, MFLOPS
  (7) cpupower 2, MFLOPS


----------------------------------------------------------------------------------------------
        Machine      |(1) MB/s|   (2)     (3)      (4)   |    (5)       (6)       (7)      
----------------------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz    | 1290   |   53      53       55    |    338       340       320
(b)xeon-lx-2.8GHzicc | 2040   |   85      80       83    |    914       409       914
(c)amd-lx            | 1560   |   65      77       68    |    666       314       686 
(cc)amd2-lx          |        |

(e')osf-cool         |  768   |   32      15       32    |    630       150       660     
(f)superosf               

(g)G5-osx-2GHz       | 2100   |   88      68       88    |   1000       255      1073
(h)G4-osx-1.25GHz    |  600   |   25      16       25    |    417        93       430 
(i)core-osx-1.83GHz  | 2500   |  107      75      107    |    855       309       884
(j)xeon-osx            

(p)ibm-aix-regatta   | 3100   |  130      55      133    |    730       115      1750
(q)ibm-aix-meso      | 3600   |  150      75      150    |   1480       203      3600
----------------------------------------------------------------------------------------------

X.2/ Comparison of int, float and double performance
  cpupower compiled with -O

(1) float, cpupowerF 0 3000000 5 / cpupowerF 2
    -> MFLOPS (computing power on float)
(2) double, cpupowerD 0 3000000 5 / cpupowerD 2 (same as table X.1)
    -> MDBLOPS (computing power on double)
(3) int, cpupowerI 0 3000000 5 / cpupowerI 2
    -> MINTOPS (computing power on int)
(4) long (or long long (*)), cpupowerL 0 3000000 5 / cpupowerL 2
    -> MLONOPS (computing power on long)
Each entry x/y gives the cpupowerX 0 / cpupowerX 2 result; (*) marks the machines where long long was used.
----------------------------------------------------------------------------------------------
        Machine      |   (1)MFLOPS       (2)MDBLOPS       (3)MINTOPS       (4)MLONOPS 
----------------------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz    | 
(b)xeon-lx-2.8GHzicc |    166/905         90/900           166/1500         88/522    (*)
(c)amd-lx            |    125/695         65/675           125/1570         65/1045
(cc)amd2-lx          | 

(e')osf-cool         |    60/635          32/631            62/640          31/630
(f)superosf               

(g)G5-osx-2GHz       |    180/1260        90/1150           165/940         81/280    (*)
(h)G4-osx-1.25GHz    |    45/430          25/410            45/710          24/190    (*)
(i)core-osx-1.83GHz  |    185/919         105/855           187/935         62/246    (*)
(j)xeon-osx            

(p)ibm-aix-regatta   | 
(q)ibm-aix-meso      |    250/628         160/1260          240/1120        
----------------------------------------------------------------------------------------------
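The four columns come from variants of one program (cpupowerF, cpupowerD, cpupowerI, cpupowerL) that differ only in element type. One plausible way to write such a family in C++ is a single function template. The sketch below illustrates that idea under this assumption; it is not cpupower's source, and the name mulAddSum is hypothetical. The static_asserts explain why the (*) entries matter. C++ only guarantees that long has at least 32 bits, and it is exactly 32 bits on 32-bit (ILP32) systems. long long is guaranteed at least 64 bits.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical kernel: one multiply and one add per element, so the same
// template can be instantiated for float, double, int and long long.
template <typename T>
T mulAddSum(const std::vector<T>& a, const std::vector<T>& b) {
    assert(a.size() == b.size());
    T acc = T(0);
    for (std::size_t i = 0; i < a.size(); ++i)
        acc += a[i] * b[i];  // 2 operations of type T per element
    return acc;
}

// Minimum widths guaranteed by the language: on 32-bit systems "long"
// and "long long" therefore give two different benchmarks.
static_assert(sizeof(long) * CHAR_BIT >= 32, "long has at least 32 bits");
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long has at least 64 bits");
```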

X.3/ Comparison with SPEC
csh>  cpupower 2 
----------------------------------------------------------------------
                         MFLOPS(1)      SPECfp      SPECint 
----------------------------------------------------------------------
(b)xeon-lx-2.8GHz         900            1400        1400
(c)amd-lx                 690            1600        1300
(cc)amd2-lx               675            1600        1300

(e)osf-xp1000             648             500         400
(f)superosf               842            1100         700

(h)G4-osx-1.25GHz      92/380                                 (-g)/(-O2 -g)
(i)core-osx-1.83GHz    310/880            1400        1500    (-g)/(-O2 -g)
(j)xeon-osx              2600            2900          -

(p)ibm-aix-regatta    730/1750           1050         700     (-O -g)/(-O3)
(q)ibm-aix-meso      1250/3600                                (-O -g)/(-O3)
----------------------------------------------------------------------


A/ Compiling all of SOPHYA
----------------------------
csh> time make all   (1)
or 
csh> time make -j 2 all  (2)
  CPU time (TCPU)
  Performance index: 100*(1000/TCPU)
  Elapsed (real) time
  CPU time / elapsed time, in %


----------------------------------------------------------------------
                         CPU (s)  IndPerf   TElapsed   TCPU/TElapsed %
----------------------------------------------------------------------
(a)xeon-lx-2.4GHz (2)    615 s      162       410 s        150%  
      with -O3 -g (2)   1300 s       77       760 s        172%  
(b)xeon-lx-2.8GHz (2)    755 s      132       540 s        140%
(c)amd-lx         (2)    336 s      297       175 s        192%

(d)osf-asc (1)          1920 s       52      2340 s        83%   (??)
(e)osf-xp1000 (1)        533 s      187       660 s        80%
(f)superosf (1)          895 s      112       910 s        98%

(g)G5-osx-2GHz (2)       453 s      221       250 s        182%
    -tune=G5            1100 s       90
    -g -O                740 s                380 s        195%
(h)G4-osx-1.25GHz (1)    660 s      151       710 s        93%   [-g]
                        1500 s                             94%   [-O2 -g]
(i)core-osx-1.83GHz (2)  209 s      478       116 s        180%
              -O2   (1)  367 s      272       381 s        96%    
(j)xeon-osx

(p)ibm-aix
----------------------------------------------------------------------
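The two derived columns can be recomputed from the raw times. The minimal C++ sketch below does this with hypothetical function names and assumes times in seconds. IndPerf is 100*(1000/TCPU). A TCPU/TElapsed ratio above 100% means that more than one CPU was busy at the same time, as expected with make -j 2.

```cpp
#include <cassert>

// Performance index of table A: 100 * (1000 / TCPU), with TCPU in seconds.
double buildPerfIndex(double cpuSeconds) {
    assert(cpuSeconds > 0.0);
    return 100.0 * (1000.0 / cpuSeconds);
}

// TCPU / TElapsed in percent. Values above 100% mean that several CPUs
// were busy at the same time (e.g. with make -j 2).
double cpuOverElapsedPercent(double cpuSeconds, double elapsedSeconds) {
    assert(elapsedSeconds > 0.0);
    return 100.0 * cpuSeconds / elapsedSeconds;
}
```

For machine (c), cpuOverElapsedPercent(336.0, 175.0) gives 192, matching the table. Section C uses IndPerf = 1000/TCPU instead, without the factor 100.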

Shared library size:
(a)
(c) 33 MB
(f) = (e) = 57 MB
(g) 80 MB
(i) 83 MB

B/ Raw computation (SOPHYA arrays) with / without threads
--------------------------------------------------------

B.1/ Corrected version of zthr.cc (after 23/05/07)
         arr = (c1*a1) + (c2*a2)
         ~ 3 x 4 10^6 int_4 operations on 3 x 4 10^6 int_4 values
         12 M int_4 ops / ~50 MB

(1) time cpupower 0     # compiled with -O  (/ -O -g)
(2) time zthr arr 1 2000   1 thread
(3) time zthr arr 2 2000   2 threads
(4) time zthr arr 4 2000   4 threads
(5) time zthr arr 6 2000   6 threads
(6) time zthr arr 8 2000   8 threads
CPU/Elap/%: CPU time (s) / elapsed time (s) / CPU time as a percentage of elapsed time

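The CPU/Elap/% triplets are what csh's time reports: process CPU time, wall-clock time, and their ratio. A comparable measurement can be made from inside a C++ program, as in the sketch below. The measure helper and the Timing struct are hypothetical and not part of zthr. On POSIX systems such as Linux, std::clock returns processor time for the whole process, all threads included, so with several busy threads the ratio can exceed 100%. On Windows, std::clock measures wall time instead, so the ratio is not meaningful there.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// CPU time, wall-clock time and their ratio, as in the CPU/Elap/% columns.
struct Timing {
    double cpuSeconds;      // processor time used by the process (all threads on POSIX)
    double elapsedSeconds;  // wall-clock time
    double cpuPercent() const {
        return elapsedSeconds > 0.0 ? 100.0 * cpuSeconds / elapsedSeconds : 0.0;
    }
};

// Hypothetical helper: run `work` once and time it.
template <typename F>
Timing measure(F&& work) {
    const std::clock_t c0 = std::clock();
    assert(c0 != static_cast<std::clock_t>(-1));  // processor time unavailable
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    const std::clock_t c1 = std::clock();
    return Timing{static_cast<double>(c1 - c0) / CLOCKS_PER_SEC,
                  std::chrono::duration<double>(t1 - t0).count()};
}
```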
-----------------------------------------------------------------------------------
                     (1)MFLOPS  (2)CPU/Elap/%   (3)CPU/Elap/%   (4)CPU/Elap/%
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz      53        0.5/1/43%      1/1.1/88%      2.8/1/262%
                                    (5) 4.5/1.8/246%      (6) 6.1/2.1/310% 

(b)xeon-lx-2.8GHz      65        

(c)amd-lx              95        0.23/1/22%     0.44/1/51%       1/1/102%     [-O -g]
                                     (5) 1.6/1/106%   (6) 2.2/1.2/100% 

(d)osf-asc                     

(e')osf-cool           32        0.43/1.2/35%   0.6/1.33/44%     1.1/1.3/82%      [-O -g]
                                     (5) 1.45/1.7/85%   (6) 1.83/2.16/84%         
(f)superosf                    

(g)G5-osx-2GHz         88       1.5/1.5/100%    3.2/1.7/185%      6.6/3.5/188%    [-O -g]
(g)G5-osx-2GHz         88       0.4/1/40%       0.9/1.0/90%       2/1.2/169%      [-tune=G5 -g]
                                     (5) 3.3/2/164%    (6) 4.3/2.6/165%
(h)G4-osx-1.25GHz      25       3/3/95%                                           [-O2 -g]

(i)core-osx-1.83GHz               [-O2 -g]

(j)xeon-osx           

(p)ibm-aix-regatta   130        

(q)ibm-aix-meso      150        0.6/1/58%       1/1/91%           1.7/1.2/132%    [-O3]
                                     (5) 2.4/1.2/193%   (6) 4.25/1.6/265%      

(s)sgi-magique         460        
 -----------------------------------------------------------------------------------
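The benchmark spreads the same element-wise work over 1 to 8 threads. The C++ sketch below shows one straightforward way to parallelize arr = (c1*a1) + (c2*a2) on int_4 data, here std::int32_t, with std::thread. Each thread gets a contiguous slice of the arrays. This is an illustration under that assumption, not the zthr.cc implementation, and linCombThreaded is a hypothetical name. Each thread writes a disjoint range of the output, so no lock is needed. Each element costs 3 operations, which gives the 12 M operations quoted above for 4 10^6 elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// out[i] = c1*a1[i] + c2*a2[i], split into nThreads contiguous chunks.
void linCombThreaded(const std::vector<std::int32_t>& a1, std::int32_t c1,
                     const std::vector<std::int32_t>& a2, std::int32_t c2,
                     std::vector<std::int32_t>& out, unsigned nThreads) {
    assert(a1.size() == a2.size() && nThreads > 0);
    const std::size_t n = a1.size();
    out.assign(n, 0);
    const std::size_t chunk = (n + nThreads - 1) / nThreads;  // ceiling division
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nThreads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        // Each worker owns [begin, end): disjoint writes, no synchronization needed.
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                out[i] = c1 * a1[i] + c2 * a2[i];
        });
    }
    for (auto& w : workers) w.join();  // wait for all slices before returning
}
```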

B.1.bis/ Old version of zthr (before 23/05/07)
         It performed 2 multiplications by a constant followed by a matrix product!
         arr = c1*a1*c2*a2   (~3 10^6 double ops)
(1) time cpupower 2     # compiled with -O3  (/ -O -g)
(2) time zthr arr 1 1000   1 thread
(3) time zthr arr 2 1000   2 threads
(4) time zthr arr 4 1000   4 threads
(5) time zthr arr 6 1000   6 threads
(6) time zthr arr 8 1000   8 threads

-----------------------------------------------------------------------------------
                     (1)MFLOPS  (2)CPU/Elap/% (2)IndPerf  (3)CPU/Elap/% (3)IndPerf
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz     1167        5.15/5.2/99%              11.4/5.8/196%
                                           (4)36.6/9.28/394%
        -O3 -g                    4.9/5./99%
(b)xeon-lx-2.8GHz      920        2.3/2.3/100%              6.2/3.1/198%
                                           (4)26/6.6/396%
(c)amd-lx              690        3.6/3.6/99%               6.8/4/171%
                                           (4)13.5/7/193%
                                           (5)20.3/10.23/198%
(cc)amd2-lx            675        2/2/99%                   4.15/2.1/197%
                                           (4)8.25/4.15/198%
                                           (5)13.6/4.6/292%
                                           (6)19.8/6.5/300%

(d)osf-asc             420        6.3s/6.5s/99%            16.9/8.8/192%   
                                           (4)29.9/15.7/191%
(e)osf-xp1000          648        5.1/5.3/96.6%            11.4/11.4/99%       
                                           (4)25.2/25.5/99%
(f)superosf            842        2.87/2.88/99.6%          6.25/4.1/153%
                                           (4)11.6/3.06/379%          

(h)G4-osx-1.25GHz       92        44s/48s/91%              86.7/99.8/92%  [-g]
                       380        12.2/12.9/95%            24/25.3/95%    [-O2 -g]
(g)G5-osx-2GHz        1151        20s/20s/99%              40s/23s/170%
                                           (4) 80.8/45/180%
   -O -g                          4.5/4.9/91%              9.3/4.7/197%
                                           (4) 18.3/9.4/197%
   -tune=G5                       3.35/3.8/88%             7.1/3.6/196%
                                           (4) 14/7.5/187%
(i)core-osx-1.83GHz    855        11.5/11.5/100%           23/11.6/192%   [-g]
                                           (4) 46/23/199%
              -O2                 3.85/3.89/99%            7.7/3.9/198%   [-O2 -g]
                                           (4) 15.4/7.77/198%

(j)xeon-osx           2600        2.5/2.5/100%             5.1/2.6/199%
                                           (4) 11.5/3.2/362%
                                           (5) 17.4/4.77/365%

(p)ibm-aix-regatta  1750/730      6.8/6.9/98%              13.1/6.75/195%
                                           (4) 26.3/11.7/225%
(q)ibm-aix-meso     3600/1250     3.6/3.75/96%             7.35/3.7/197%
                                           (4) 12.46/4.2/298%
                                           (5) 219/6.7/280%
                                           (6) 24/4.5/530%

(s)sgi-magique         460        60/60/99%       
 -----------------------------------------------------------------------------------


B.2/ Matrix multiplication mtx = mtx1 * mtx2 
     ~2 10^9 double ops / thread
(1) time cpupower 2  (-O3 / -O -g)
(2) time zthr mtx 1 1000   1 thread
(3) time zthr mtx 2 1000   2 threads
(4) time zthr mtx 4 1000   4 threads
(5) time zthr mtx 6 1000   6 threads
(6) time zthr mtx 8 1000   8 threads

-----------------------------------------------------------------------------------
                     (1)MFLOPS  (2)CPU/Elap/% (2)IndPerf  (3)CPU/Elap/% (3)IndPerf
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz     1167        6.5/6.5/100%               17.4/8.8/198%
                                           (4) 80.5/20.3/397%
                                           (5) 114.5/29.6/387%
                                           (6) 160/40.3/387%

(b)xeon-lx-2.8GHz      920        3.4/3.4/100%               12/6.1/199%
                                           (4) 55.8/14/400%
(c)amd-lx              690        6.98/6.98/100%             14.1/8.15/173%
                                           (4) 27.7/14.23/194%
                                           (5) 41.4/21.07/196%
                                           (6) 55.4/27.9/198.7%  
(cc)amd2-lx            675        4.1/4.1/100%               9.55/4.8/198%
                                           (4) 20/10.27/195%
                                           (5) 32.8/11.16/294%
                                           (6) 42.75/13.8/309%

(d)osf-asc             420        13.5s/13.7s/98%            32/16.5/194%   
                                           (4) 67.5/34.4/196%
(e)osf-xp1000          648        13/14.1/92%                27.1/27.4/99%
                                           (4) 54/54.7/99.6%
                                           (5) 80.6/81/99.6%
                                           (6) 107.8/108.3/99.5%
(f)superosf            842        6.1/7.24/84%               12.35/6.29/196%
                                           (4) 24.3/6.31/385%
                                           (5) 36.5/10.9/335%
                                           (6) 50.1/18.15/276%

(g)G5-osx-2GHz        1151        23/23.7/97%                46.5/27.5/170%
                                           (4) 93.4/49.4/189%
  -O -g                           6.2/6.2/100%                14.2/7.2/197%
                                           (4) 28.3/14.36/197%
  -tune=G5                        5.7/5.8/98%                13.3/6.8/197%
                                           (4) 26.8/13.56/197%
                                           (6) 53.8/27.25/197%
(h)G4-osx-1.25GHz      333        23.5/24.5/96%                              [-O2]
(i)core-osx-1.83GHz    855        12.6/12.7/100%             25.8/13.4/194% 
                                           (4) 51.6/26/199%
            -O2                   4.25/4.5/94%               10.6/5.36/198%
                                           (4) 20.87/10.68/198%
      -O2, 2 parallel jobs    2 x 5/5.4/92%
(j)xeon-osx           2600        2.8/2.8/99%                9.3/4.66/199%
                                           (4) 31.4/8.6/364%
                                           (5) 47.1/12.96/364%
                                           (6) 62.8/17.38/362%

(p)ibm-aix-regatta  1750/730      9.5/9.7/98%                18.3/16.0/114%
                                           (4) 38.3/24.7/155%
(q)ibm-aix-meso     3600/1250     2.3/2.3/99%                5.1/2.64/194%   (compiled with -O3)
                                           (4) 11.4/4.16/272%
                                           (5) 20.2/5.85/344%
                                           (6) 29.9/6.74/442%

(s)sgi-magique         460        49/49/99%                  101/56/181%

 -----------------------------------------------------------------------------------
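The ~2 10^9 operations per thread follow from the cost of a dense product. Assuming the last zthr argument (1000) is the matrix dimension n, a naive product performs n^3 multiply-adds, i.e. 2 n^3 = 2 10^9 floating-point operations. The C++ sketch below shows that loop. It is hypothetical and not SOPHYA's matrix product. The i-k-j loop order keeps the innermost accesses to b and c contiguous in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // n x n matrix, row-major

// Naive dense product c = a * b: n^3 multiply-adds = 2 n^3 flops.
Matrix matMul(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            // i-k-j order: the inner loop walks rows of b and c contiguously.
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
    return c;
}

// Floating-point operation count of the product above.
constexpr double matMulFlops(double n) { return 2.0 * n * n * n; }
```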


B.4/ Operations on double arrays - measurements with spar 
  csh> time spar 2 1 2000 2000
  (1) cpupower 0, MFLOPS
  (2) MFLOPS (double) reported by spar, CPU %
  (3) time spar 2 5 1000 2000: CPU/Elap/% 
-----------------------------------------------------------------------------------
                     (1)MFLOPS      (2)MFLOPS / CPU %      (3)CPU/Elap/%
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz      53       ~ 20-35 MFLOPS , 90%     20/20.2/99%       [-g -O] 

(b)xeon-lx-2.8GHz      65        

(c)amd-lx              95       ~ 20-40 MFLOPS , 99%     17.2/17.2/100%    [-g -O] 

(d)osf-asc                     

(e)osf-xp1000          32       ~ 15-25 MFLOPS , 90%     37.6/41.2/91%      [-g -O]  
(f)superosf                    

(g)G5-osx-2GHz         88       ~ 10-25 MFLOPS , 99%     45/45/100%         [-g -O] or [-g -O2]
(h)G4-osx-1.25GHz      25       ~ 8-16  MFLOPS , 92%     45.5/52/90%        [-g -O2]

(i)core-osx-1.83GHz             

(j)xeon-osx           

(p)ibm-aix-regatta   130        

(q)ibm-aix-meso      150        ~ 80-100 MFLOPS , 90%   5./23/22%     [-O3]    

(s)sgi-magique         460        
 -----------------------------------------------------------------------------------

B.5/ Computation / comparison with JET (tjet)
csh> time tjet 10 2000 2000   OR   tjet 10 2000 1000
 (1) TCPU, element access, C pointers
 (2) TCPU, m1*c1+m2*c2+m3*c3, C pointers
 (3) TCPU, element access, SOPHYA
 (4) TCPU, m1*c1+m2*c2+m3*c3, SOPHYA / methods (MultCst, AddArr ...)
 (5) TCPU, m1*c1+m2*c2+m3*c3, SOPHYA JET

 -----------------------------------------------------------------------------------
                            (1)        (2)         (3)         (4)           (5)             
 -----------------------------------------------------------------------------------
(b)xeon-lx-2.8GHz-icc       0.87       0.63        1.55      2.7/1.6         0.57
(c)amd-lx                   0.94       0.79        1.85      3.4/2.1         0.76

(e')osf-cool                2.85       2.45        3.1       6.5/5.5         4.1 

(g)G5-osx-2GHz              1.5        0.61        2.1       4.1/1.6         0.6   (-g -O2)
    -tune=G5 -fast          1.1        0.62        1.3       4.1/1.6         0.58  
(h)G4-osx-1.25GHz           3.86       2.2         5         9.4/6.2         3     (-g -O2)
(i)core-osx-1.83GHz         1.1        0.49        1.6       2.8/1.7         0.68

(q)ibm-aix-meso             0.43       0.27        0.52      1.12/0.75       0.35  
-----------------------------------------------------------------------------------
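Columns (2) and (4) compute the same expression m1*c1+m2*c2+m3*c3 in two styles. The C++ sketch below contrasts them. One version makes a single fused pass over raw pointers. The other materializes a temporary array at each step. The helpers multConst and addArrays are hypothetical free functions that only mimic the idea of methods such as MultCst and AddArr. They are not SOPHYA's API, and the JET variant of column (5) is not modeled. The temporary-based form reads and writes the data several times and allocates four intermediate arrays. That is one plausible reason why (4) is several times slower than (2).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Style (2): one fused loop over raw pointers, no temporary arrays.
void fusedLinComb(const Vec& m1, double c1, const Vec& m2, double c2,
                  const Vec& m3, double c3, Vec& out) {
    assert(m1.size() == m2.size() && m2.size() == m3.size());
    out.resize(m1.size());
    const double* p1 = m1.data();
    const double* p2 = m2.data();
    const double* p3 = m3.data();
    double* po = out.data();
    for (std::size_t i = 0, n = m1.size(); i < n; ++i)
        po[i] = p1[i] * c1 + p2[i] * c2 + p3[i] * c3;
}

// Style (4): hypothetical method-like helpers, each returning a new array.
Vec multConst(const Vec& a, double c) {
    Vec r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] * c;
    return r;
}

Vec addArrays(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] + b[i];
    return r;
}

Vec temporariesLinComb(const Vec& m1, double c1, const Vec& m2, double c2,
                       const Vec& m3, double c3) {
    // Four intermediate arrays are created before the result is returned.
    return addArrays(addArrays(multConst(m1, c1), multConst(m2, c2)),
                     multConst(m3, c3));
}
```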
| 435 | 
 | 
|---|
| 436 | 
 | 
|---|
| 437 | C/ Calcul fft (FFTW , FFTPack )
 | 
|---|
| 438 | -------------------------------
 | 
|---|
| 439 | 
 | 
|---|
| 440 | (1) time cpupower 2 
 | 
|---|
| 441 | (2) time tfft 2000000 W d 0 0 (avec FFTW)
 | 
|---|
| 442 | (3) time tfft 2000000 P d 0 0 (avec FFTPack_Sophya)
 | 
|---|
| 443 | 
 | 
|---|
| 444 | IndPerf=1000/TCPU  
 | 
|---|
| 445 | 
 | 
|---|
| 446 | 
 | 
|---|
| 447 | -----------------------------------------------------------------------------------
 | 
|---|
| 448 |                      (1)MFLOPS     (2)CPU/Elap/%   (2)IndPerf   (3)CPU/Elap/%  (3)IndPerf
 | 
|---|
| 449 | -----------------------------------------------------------------------------------
 | 
|---|
| 450 | (a)xeon-lx-2.4GHz     1167        5.5/5.6/97%        180         7.4/7.4/100%     135    
 | 
|---|
| 451 |        -O3 -g                                                    6.8/7.1/96%      147    
 | 
|---|
| 452 | (b)xeon-lx-2.8GHz      920     
 | 
|---|
| 453 | (c)amd-lx              690        3.2/3.75/86%                   4.2/4.2/100%     238
 | 
|---|
| 454 |                                                             ~2x  4.7/4.7/99%
 | 
|---|
| 455 | (cc)amd2-lx            675        2.8/2.8/99%                    3.56/3.58/99%
 | 
|---|
| 456 | 
 | 
|---|
(d)osf-asc             420        13.3/13.9/95.7%     75        12.2/17.6/70%     82
(e)osf-xp1000          648        9.9/10.2/97%       101        9.3/9.46/98.5%    107
(e')cool                                                        9.1/9.3/98%
(f)superosf            842        6.0/6.22/96.7%     166        5.1/5.18/97.4%    196

(g)G5-osx-2GHz        1151        8.5/8.7/97.5%      120        11.5/11.6/99%     87
    -O -g                         5.1/5.2/98%        190        5.4/5.5/99%       180
    -tune=G5                      5/5.1/98%          200        5.5/5.6/97%       180
(h)G4-osx-1.25GHz       92        15.2/15.9/96%       66        23.8/34.1/70%     42    [-g]
                       380        8.8/10.3/86%                  14.7/20.9/65%           [-O2 -g]
(i)core-osx-1.83GHz    855        4.6/4.75/97%       217        7/7.08/99%        142
         -O2                      3.2/3.2/99%        312        3.6/3.6/99%       277
    -O2, 2 parallel jobs      2x  3.9/4.3/90%                 2x  4.7/5/91%       200
(j)xeon-osx           2600                                      2.6/2.6/98%       384
    2 parallel jobs                                        ~2x  3.6/4.1/87%       250
    4 parallel jobs

(p)ibm-aix-regatta 1750/730       6.25/18.9/33%      160        5.25/15.7/33%     190
(q)ibm-aix-meso    3600/1250      3.95/4.3/91%       250        3.82/4.0/94%      260
    2 parallel jobs                                        ~2x  3.88/4.2/92%      250


(s)sgi-magique         460        22/22/98%           42         24.5/25/99%       40
 -----------------------------------------------------------------------------------
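Each CPU/Elap/% entry in these tables gives the process CPU time (s), the elapsed wall-clock time (s) and their ratio as a percentage. A value above 100% means more than one core was busy; a value well below 100% typically means the process spent time waiting (for I/O or for a CPU). The Python sketch below only illustrates how such a triple can be measured and read back. It is not the tool that produced these figures, and the names `cpu_elapsed` and `parse_entry` are hypothetical.

```python
import time


def cpu_elapsed(fn, *args):
    """Run fn(*args) once and return (cpu_s, elapsed_s, pct).

    time.process_time() sums user + system CPU time over all threads of the
    process, so pct can exceed 100% when several cores are busy.
    """
    c0, w0 = time.process_time(), time.perf_counter()
    fn(*args)
    cpu = time.process_time() - c0
    elapsed = time.perf_counter() - w0
    pct = 100.0 * cpu / elapsed if elapsed > 0 else float("nan")
    return cpu, elapsed, pct


def parse_entry(cell):
    """Parse a table cell such as '13.3/13.9/95.7%' into (cpu, elapsed, pct)."""
    cpu, elapsed, pct = cell.rstrip("%").split("/")
    return float(cpu), float(elapsed), float(pct)


# Time a CPU-bound call, then re-check one entry of the tables above.
cpu, elapsed, pct = cpu_elapsed(sum, range(10**6))
print(f"{cpu:.2f}/{elapsed:.2f}/{pct:.0f}%")
c, e, p = parse_entry("13.3/13.9/95.7%")
print(round(100 * c / e, 1), p)   # prints: 95.7 95.7
```

Entries printed as CPU/% only (e.g. `5.6/~100%`) do not follow the three-field form and are not handled by `parse_entry`.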


D/ Matrix inversion with LAPACK
-------------------------------

lpk inverse 1000,1000 0
---> computation time of the LAPACK inversion
-------------------------------------------------------------------------------------------
             CPU/Elap/%       (1)
-------------------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz          5.6/~100%
(b)xeon-lx-2.8GHz
(c)amd-lx                  5.5/5.5/99%

(d)osf-asc
(e')cool                   2.8/2.9/95%
(f)superosf

(g)G5-osx-2GHz             0.8/~100%
    -O -g                  0.86/~100%
(h)G4-osx-1.25GHz          2.3/~100%   [-O2 -g]
(i)core-osx-1.83GHz        1.93/~100%
              -O2
(p)ibm-aix-regatta
(q)ibm-aix-meso            0.55/~100%

--------------------------------------------------------------------------------------------
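For scale: a dense n x n inversion through an LU factorization (LAPACK `dgetrf` followed by `dgetri`) costs roughly 2n^3 floating-point operations, about 2x10^9 for n=1000. This assumes that `lpk inverse 1000,1000 0` inverts a 1000x1000 double-precision matrix through these routines, which these notes do not state. Under that assumption, 0.55 s on ibm-aix-meso corresponds to about 3.6 GFLOP/s. The sketch below swaps in plain Gauss-Jordan elimination with partial pivoting instead of LAPACK, only to show the kind of computation being timed. It is far too slow in pure Python for n=1000, and `invert` and `gflops` are hypothetical names.

```python
def invert(a):
    """Invert a square matrix (list of rows) by Gauss-Jordan elimination
    with partial pivoting; O(n^3) work."""
    n = len(a)
    # Build the augmented matrix [A | I].
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the row with the largest |entry| to the diagonal.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        # Normalize the pivot row, then eliminate this column from all other rows.
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            f = m[r][col]
            if r != col and f != 0.0:
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    # The right half of [I | A^-1] now holds the inverse.
    return [row[n:] for row in m]


def gflops(n, seconds, flops_per_n3=2.0):
    """GFLOP/s for an n x n inversion, assuming ~2 n^3 flops (LU + inverse)."""
    return flops_per_n3 * n ** 3 / seconds / 1e9


print(invert([[4, 7], [2, 6]]))       # close to [[0.6, -0.7], [-0.2, 0.4]]
print(round(gflops(1000, 0.55), 2))   # 3.64
```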


K/ Efficiency of lock (mutex) handling with threads and arrays
----------------------------------------------------------------------
(tests (1) and (2): 32 threads operating on 2000 vectors, ~64000 lock/unlock/wait/broadcast operations)

(1) time zthr syncp 32 2000 4
(2) time zthr sync 32 2000 4
(3) time zthr syncp 4 15000 130
(4) time zthr sync  4 15000 130
-------------------------------------------------------------------------------------------
             CPU/Elap/%       (1)             (2)              (3)              (4)
-------------------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz         23.5/14/168%    4.3/1.2/365%     7.9/5.5/142%      4/2.15/190%
      Before ThSafeOp     17/178%
      with -O3 -g
(b)xeon-lx-2.8GHz (2)
(c)amd-lx                 0.6/1/63%        0.6/1/60%       3.5/3.5/102%      2.6/2.7/98%

(d)osf-asc               4.5/3.4/132%     3.35/2/170%     15.8/10.5/150%     13/8/163%
     5.4/100% (NoThSafe)
(e')cool                 1.3/1.37/95%     1.35/1.5/89%     5.3/5.3/99%       5.2/5.2/99%
(f)superosf (1)


(g)G5-osx-2GHz (2)     2.6/130% (NoThSafe)
    -O -g                2.6/1.7/150%      6.8/3.7/187%    4/2.75/142%       4.7/3/155%
(h)G4-osx-1.25GHz (1)    40.5/42.6/95%    42.2/43.5/97%      [-g]
                          3.9/4.3/89%      3.5/4/89%       3.8/4/95%         4.3/4.6/93%

(i)core-osx-1.83GHz      7.7/7.1/108%     7.8/6.7/116%    30.2/29.6/102%    30.3/29.2/104%   [-g]
              -O2        2.7/1.8/152%     6/3.16/190%     3.4/2.4/142%      3.2/2.5/164%     [-O2 -g]
(j)xeon-osx
      Before ThSafeOp     2.55/143%

(p)ibm-aix-regatta        4.7/111%
(q)ibm-aix-meso           7.5/2.8/300%     17/3.8/450%    8.2/3.05/270%      4.85/2.43/200%

--------------------------------------------------------------------------------------------
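The operation count quoted for tests (1) and (2) is consistent with one lock/unlock/wait/broadcast cycle per thread and per vector (32 x 2000 = 64000). These notes do not describe how the `syncp` and `sync` modes of `zthr` differ. As a rough illustration of the pattern being timed, here is a Python sketch built on `threading.Condition` (a mutex plus a condition variable): each worker locks, updates a shared counter, broadcasts with `notify_all()` and unlocks once per vector, while the main thread waits on the same condition. It is an assumption-based sketch, not the `zthr` program; `lock_benchmark` is a hypothetical name, and because of Python's global interpreter lock its timings say nothing about the machines above.

```python
import threading
import time


def lock_benchmark(nthreads, nvec):
    """Run nthreads workers; each does one lock/update/broadcast/unlock cycle
    per vector while the main thread waits on the same condition.
    Returns (number of completed cycles, elapsed seconds)."""
    cond = threading.Condition()      # mutex + condition variable
    state = {"done": 0}               # shared counter, protected by cond
    total = nthreads * nvec

    def worker():
        for _ in range(nvec):
            with cond:                # lock (released on exit: unlock)
                state["done"] += 1
                cond.notify_all()     # broadcast to every waiter

    t0 = time.perf_counter()
    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    with cond:
        # wait: the predicate is re-checked after every broadcast
        cond.wait_for(lambda: state["done"] == total)
    for t in threads:
        t.join()
    return state["done"], time.perf_counter() - t0


done, elapsed = lock_benchmark(32, 2000)
print(done, f"{elapsed:.2f} s")       # 64000 cycles
```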



L/ I/O and PPF
-----------------
Writing/reading n=10^7 rows of one int + 6 doubles, total ~500 MB; tests (3) and (4) use n=5x10^7 rows (about 5 times more data).
(1) time tstdtable w xx.ppf swap 10000000 1024 0
(2) time tstdtable r xx.ppf swap 10000000 1024 0
(3) time tstdtable w xx.ppf swap 50000000 1024 0
(4) time tstdtable r xx.ppf swap 50000000 1024 0

-------------------------------------------------------------------------------------------
             CPU/Elap/%       (1)              (2)              (3)                (4)
-------------------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz          17/26/63%       5.5/5.6/94%
(b)xeon-lx-2.8GHz
(c)amd-lx                  5.9/6.0/97%     3.4/3.4/100%      30/32/93%      24/165/13% ?

(d)osf-asc
(e')cool                   15/30/50%       13/13/99%
(f)superosf

(g)G5-osx-2GHz             14/14.2/98%     6/6.14/99%

(h)G4-osx-1.25GHz          26/29.7/87%     15.7/38.6/41%               [-O2 -g]
(i)core-osx-1.83GHz        37/37.8/98%     22/48.4/45%                                 [-g]
              -O2          10.5/17.4/55%   11.2/41.2/27%                               [-O2 -g]
(p)ibm-aix-regatta
(q)ibm-aix-meso           5.5/16.8/38%    5.7/13.2/43%      32.7/85/39%    29/60/49%

--------------------------------------------------------------------------------------------
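Assuming a 4-byte int, one row of one int plus six doubles takes 4 + 6x8 = 52 bytes in raw binary form, so 10^7 rows come to about 520 MB, in line with the ~500 MB quoted for tests (1) and (2). The Python sketch below writes and reads such rows as plain little-endian binary with the `struct` module, to show the volume of data being moved. It does not produce the SOPHYA PPF format. The role of the `swap` and `1024` arguments of `tstdtable` is not documented here; the `chunk` parameter is a hypothetical buffering choice, and `write_rows`/`read_rows` are hypothetical names.

```python
import os
import struct
import tempfile

# One row = one 32-bit int + six 64-bit doubles, little-endian, no padding.
ROW = struct.Struct("<i6d")           # ROW.size == 52 bytes


def write_rows(path, n, chunk=1024):
    """Write n rows (i, i+0.0, ..., i+5.0), writing out every `chunk` rows."""
    with open(path, "wb") as f:
        buf = bytearray()
        for i in range(n):
            buf += ROW.pack(i, *(float(i + k) for k in range(6)))
            if (i + 1) % chunk == 0:
                f.write(buf)
                buf.clear()
        f.write(buf)                  # remaining partial chunk


def read_rows(path, chunk=1024):
    """Read rows back in chunks; return (row count, sum of the int column)."""
    count = total = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk * ROW.size)
            if not data:
                break
            for rec in ROW.iter_unpack(data):
                count += 1
                total += rec[0]
    return count, total


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "rows.bin")
    write_rows(path, 100_000)
    print(os.path.getsize(path), read_rows(path))  # 5200000 (100000, 4999950000)
```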