-------------------------------------------------------------------------------------
 Performance comparison of various machines: compilation / execution (computation)
                                   -----------------------
    Measurements made in January 2007,        R. Ansari / C. Magneville
-------------------------------------------------------------------------------------

(a) eros3 : dual-processor dual-core Xeon @ 2.4 GHz, Linux (xeon-lx-2.4GHz), gcc 3.2
(b) ccali : dual-processor dual-core Xeon @ 2.8 GHz, Linux (xeon-lx-2.8GHz), icc 8.0 or 9.0
     Compilation flags: [-O -g]
(c) sgsda: dual-processor AMD Opteron 248 @ 2.2 GHz (amd-lx-
     Compilation flags: [-O -g]
(cc) grid-saclay: dual-processor dual-core AMD Opteron 275 @ 2.2 GHz (amd275-lx)
     Compilation flags: [-O -g]

(d) asc: dual-processor Alpha (@ ~1 GHz), DS20 server, OSF (osf1), cxx 6.5 (osf-asc)
(new asc: 420 MFLOPS, less powerful than the old asc at 800 MFLOPS)
(e) xp1000-dapnia: Alpha XP1000 @ ~600 MHz ? OSF1, cxx ? (osf-xp1000)
(e') cool: Alpha XP1000 @ ~667 MHz ? OSF1 5.1, cxx 6.3
(f) superosf-dapnia: multiprocessor AlphaServer ES80, 6 EV7 processors @ 1 GHz (super-osf)

(g) ccsvx01: dual-processor XServe G5 @ ~1.8-2 GHz (Darwin/OSX) (G5-osx-2GHz), gcc 3.3
(h) PowerBook-Reza: Apple G4 @ 1.25 GHz (G4-osx-1.25GHz), gcc 3.3
(i) MacBook-Reza: Apple / Intel Core dual-core @ 1.83 GHz (core-osx-1.83GHz), gcc 4
(j) MacPro-Grosdidier: Apple / 2 dual-core Xeon @ 3 GHz, gcc 4.0.1, SOPHYA compiled with -O2 -g

(p) IBM-AIX regatta, xlC, IBM eServer pSeries 655, 8 POWER4 processors @ 1.1 GHz
(q) IBM-AIX meso, AIX 5.3, xlC V8, IBM POWER5, 8 dual-core processors, P575 @ 1.9 GHz

(s) SGI-IRIX64 magique, CC 

NOTES:
- On the Xeon machines, processes / threads interfere with each other in how the CPUs
are occupied: we lose a factor of 3 in multi-thread / multi-task performance.
The MacPro running OSX nevertheless does better.
- Effect of the operating system or of the motherboard???

Compilation flags
- Default compilation flags are generally [-O -g]
- On eros3 (xeon-linux gcc 3.3): [-O -g] OR [-O3 -g]
- On Darwin: [-g] or [-O2 -g] (or [-tune G5] on the XServe G5)
   On the Macs (especially G4/G5), there is a large difference between -g and -Ox -g,
   but little difference between -O, -O2 and -O3
- On the aix-meso machine: [-O -g] or [-O3 -g]

X/ Raw cpupower performance and SPEC data (http://www.spec.org)
----------------------------------------------------------------------

(1) MFLOPS  -> cpupower 2   (x/y : -O -g / -O3) 
SPECint2000 (3) / SPECfp2000 (2) (http://www.spec.org) 

X.1/ Double-precision computing performance
csh> cpupower 0 3000000  5 
     3 10^6 double operations - on 3x3 10^6 doubles in memory (~50 MB)
      ===> ~ 24 MB/s per MFLOPS (consistent with 3 doubles = 24 bytes moved per operation)
csh> cpupower 2
     1.6 10^9 double operations - on 3x20000 doubles (~0.5 MB)
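Below is a minimal C++ sketch of what the cpupower 0 measurement could look like. It is not the cpupower source. It assumes a loop of the form c[i] = a[i] * b[i], where each operation reads two doubles and writes one (24 bytes). Under that assumption the MB/s column comes out at about 24 times the MFLOPS column. The names run_kernel, Throughput and kBytesPerOp are hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical cpupower-0-like kernel (illustration only, not the real tool).
// Each element costs 1 double operation and 3 x 8 = 24 bytes of memory traffic.
constexpr double kBytesPerOp = 3 * sizeof(double);

struct Throughput {
    double mflops;    // millions of double operations per second
    double mb_per_s;  // implied memory traffic in MB/s (1 MB = 10^6 bytes)
};

Throughput run_kernel(std::vector<double>& c, const std::vector<double>& a,
                      const std::vector<double>& b, int repeat) {
    const std::size_t n = a.size();
    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < repeat; ++r)
        for (std::size_t i = 0; i < n; ++i) c[i] = a[i] * b[i];
    const auto t1 = std::chrono::steady_clock::now();
    double sec = std::chrono::duration<double>(t1 - t0).count();
    if (sec <= 0.0) sec = 1e-9;  // guard against a zero-length timing
    const double mflops = static_cast<double>(n) * repeat / sec / 1e6;
    return {mflops, mflops * kBytesPerOp};  // MB/s = 24 x MFLOPS
}
```

Under the same assumption, arrays of 3 10^6 elements with repeat = 5 roughly correspond to the cpupower 0 3000000 5 call. An optimizing compiler may shorten the repeated passes, so absolute numbers are only indicative.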


Compiled with -O (optimization)
  (1) cpupower 0: memory throughput in MB/s
  (2) cpupower 0, MFLOPS
  (5) cpupower 2, MFLOPS

Compiled with -g (debug / no optimization)
  (3) cpupower 0, MFLOPS
  (6) cpupower 2, MFLOPS

Compiled with -O3 or -fast ... (aggressive optimization)
  (4) cpupower 0, MFLOPS
  (7) cpupower 2, MFLOPS


----------------------------------------------------------------------------------------------
        MFLOPS       |(1) MB/s|   (2)     (3)      (4)   |    (5)       (6)       (7)      
----------------------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz    | 1290   |   53      53       55    |    338       340       320
(b)xeon-lx-2.8GHzicc | 2040   |   85      80       83    |    914       409       914
(c)amd-lx            | 1560   |   65      77       68    |    666       314       686 
(cc)amd2-lx          |        |

(e')osf-cool         |  768   |   32      15       32    |    630       150       660     
(f)superosf               

(g)G5-osx-2GHz       | 2100   |   88      68       88    |   1000       255      1073
(h)G4-osx-1.25GHz    |  600   |   25      16       25    |    417        93       430 
(i)core-osx-1.83GHz  | 2500   |  107      75      107    |    855       309       884
(j)xeon-osx            

(p)ibm-aix-regatta   | 3100   |  130      55      133    |    730       115      1750  (32 bits)
(q)ibm-aix-meso      | 5700   |  240      70      320    |   1500       220      3400  (32 bits)

(s)sgi-magique       |  336   |  14       7       15     |    340        40       460  (32 bits)  
----------------------------------------------------------------------------------------------

X.2/  Comparison of int, float and double performance
  cpupower compiled with -O 

(1) float, cpupowerF 0 3000000 5 / cpupowerF 2
    -> MFLOPS (computing power on float)
(2) double, cpupowerD 0 3000000 5 / cpupowerD 2  (same as table X.1)
    -> MDBLOPS (computing power on double)
(3) int, cpupowerI 0 3000000 5 / cpupowerI 2
    -> MINTOPS  (computing power on int = 4 bytes)
(4) long (or long long (*)), cpupowerL 0 3000000 5 / cpupowerL 2
    -> MLONOPS  (computing power on long = 8 bytes)
----------------------------------------------------------------------------------------------
        MFLOPS       |   (1)MFLOPS       (2)MDBLOPS       (3)MINTOPS       (4)MLONOPS 
----------------------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz    | 
(b)xeon-lx-2.8GHzicc |    166/905         90/900           166/1500         88/522    (*)
(c)amd-lx            |    125/695         65/675           125/1570         65/1045
(cc)amd2-lx          | 

(e')osf-cool         |    60/635          32/631            62/640          31/630
(f)superosf               

(g)G5-osx-2GHz       |    180/1260        90/1150           165/940         81/280    (*)
(h)G4-osx-1.25GHz    |    45/430          25/410            45/710          24/190    (*)
(i)core-osx-1.83GHz  |    185/919         105/855           187/935         62/246    (*)
(j)xeon-osx            

(p)ibm-aix-regatta   | 
(q)ibm-aix-meso      |    250/1150        250/1500          250/1200        50/200     (32 bits)
                     |    280/1500        250/1600          250/1100        210/1000   (64 bits -q64)

(s)sgi-magique       |  
----------------------------------------------------------------------------------------------
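The cpupowerF/D/I/L variants differ only in element type. The C++ sketch below assumes the same c[i] = a[i] * b[i] loop as in X.1, which the source does not confirm. With 4-byte types each operation then moves 12 bytes instead of 24. That fits the memory-bound cpupower 0 figures, where float and int reach roughly twice the double/long rate on most machines (e.g. 166 vs 90 on (b)). mul_kernel and bytes_per_op are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One element-wise kernel, instantiated per type:
// float (like cpupowerF), double (D), std::int32_t (I), std::int64_t (L).
template <typename T>
void mul_kernel(std::vector<T>& c, const std::vector<T>& a, const std::vector<T>& b) {
    for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i] * b[i];
}

// Memory traffic per operation: 2 reads + 1 write of sizeof(T) bytes.
template <typename T>
constexpr std::size_t bytes_per_op() { return 3 * sizeof(T); }
```

On the usual platforms (4-byte float, 8-byte double), bytes_per_op<float>() is 12 and bytes_per_op<double>() is 24.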

X.3/  Comparison with SPEC 
csh>  cpupower 0 / cpupower 2 
----------------------------------------------------------------------
                         MFLOPS(1)      SPECfp      SPECint 
----------------------------------------------------------------------
(b)xeon-lx-2.8GHz         166/900        1400        1400
(c)amd-lx                 125/690        1600        1300
(cc)amd2-lx               675            1600        1300

(e)osf-xp1000             32/650         500         400
(f)superosf               842            1100        700

(i)core-osx-1.83GHz       110/850        1400        1500    
(j)xeon-osx               2600           2900          -

(p)ibm-aix-regatta        130/700        1050        700     
----------------------------------------------------------------------


A/ Compiling all of SOPHYA
----------------------------
csh> time make all   (1)
or 
csh> time make -j 2 all  (2)
  CPU time (TCPU)
  Performance index 100*(1000/TCPU) 
  Elapsed (wall-clock) time
  TCPU / elapsed time, in %


----------------------------------------------------------------------
                         CPU(s)  IndPerf   TElapsed , TCPU/Elapsed %
----------------------------------------------------------------------
(a)xeon-lx-2.4GHz (2)    615 s      162       410 s        150%  
      with -O3 -g (2)   1300 s       77       760 s        172%  
(b)xeon-lx-2.8GHz (2)    755 s      132       540 s        140%
(c)amd-lx         (2)    336 s      297       175 s        192%

(d)osf-asc (1)          1920 s       52      2340 s        83%   (??)
(e)osf-xp1000 (1)        533 s      187       660 s        80%
(f)superosf (1)          895 s      112       910 s        98%

(g)G5-osx-2GHz (2)       453 s      221       250 s        182%
    -tune=G5            1100 s       90
    -g -O                740 s                380 s        195%
(h)G4-osx-1.25GHz (1)    660 s      151       710 s        93%   [-g]
                        1500 s                             94%   [-O2 -g]
(i)core-osx-1.83GHz (2)  209 s      478       116 s        180%
              -O2   (1)  367 s      272       381          96%    
(j)xeon-osx

(p)ibm-aix
----------------------------------------------------------------------
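The two derived columns follow directly from the definitions above. This small C++ sketch recomputes them; the function names are illustrative. For row (a): 100*(1000/615) ~ 162, and 615/410 = 150%, meaning about 1.5 CPUs were busy on average during make -j 2.

```cpp
// Performance index of section A: 100 * (1000 / TCPU), with TCPU in seconds.
double perf_index(double tcpu_s) { return 100.0 * (1000.0 / tcpu_s); }

// TCPU / elapsed in percent: values above 100% mean that several CPUs were
// busy at the same time (parallel make, threads).
double cpu_over_elapsed_pct(double tcpu_s, double elapsed_s) {
    return 100.0 * tcpu_s / elapsed_s;
}
```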

Shared library size:
(a)
(c) 33 MB
(f) = (e) = 57 MB
(g) 80 MB
(i) 83 MB

B/ Raw computation (SOPHYA arrays) with / without threads
--------------------------------------------------------
B.1.a/   Vector computation 10 * V2 ~= DLO4 (V1) 
         ~ 10 x 10 x 9. 10^6 double operations on 2 x 9 10^6 doubles
         900 M.Ops r_8 / ~ 1500 MB

(1) time cpupower 0     # compiled with -O  (/ -O -g)
(2) time zthr arrdl 1 3000   1 thread
(3) time zthr arrdl 2 3000   2 threads
(4) time zthr arrdl 4 3000   4 threads
(5) time zthr arrdl 6 3000   6 threads
(6) time zthr arrdl 8 3000   8 threads

-----------------------------------------------------------------------------------
                     (1)MFLOPS  (2)CPU/Elap/%   (3)CPU/Elap/%   (4)CPU/Elap/%
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz      53        
(b)xeon-lx-2.8GHz      65       2.6/2.6/100%    5.3/2.9/180%   14.3/4.86/310% 
                                   (5) 23/7.4/314%
(c)amd-lx              95        
                                 
                                  
(e')osf-cool           32       5.7/5.8/98%     11.1/11.3/98%   22.3/22.5/98% 
(f)superosf                    

(g)G5-osx-2GHz         88       2.5/2.6/99%     5.9/3.38/184%    11/6.45/173%    [-O2 -g]
(h)G4-osx-1.25GHz      25       6.6/7/95%       13.4/13.8/97%                    [-O2 -g]
(i)core-osx-1.83GHz   107       2.1/2.1/98%     4.3/2.9/150%     8.3/30/31%      [-O2 -g]
(j)xeon-osx           

(p)ibm-aix-regatta    130        
(q)ibm-aix-meso       150       0.7/1/70%       1.2/2./60%       3.2/2/150%   [-O3]
                                  (5) 5.4/3/180%    (6) 6.4/3/210% 

(s)sgi-magique          7       78/78/99%       167/95/175%      339/96/352%     [-O -g: NON-OPT]
                       14       16.4/16.5/99%   33.8/22.4/150%   79/32/250%      [-O -g2 OPT]
 -----------------------------------------------------------------------------------
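The timings suggest that zthr gives every thread the full workload rather than splitting it. On (b), CPU time doubles from 1 to 2 threads (2.6 s to 5.3 s) while elapsed time barely changes, and B.2 quotes its operation count per thread. Below is a minimal std::thread sketch of that pattern. It uses a simple V2 = 10 * V1 operation as a stand-in, since the DLO4 operation is not specified here, and run_copies is a hypothetical name. Ideally, with at least nthreads free cores, CPU/Elapsed reaches nthreads x 100%.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Runs nthreads independent copies of the same vector workload, one per thread.
// checksums[t] receives the sum of thread t's result vector.
void run_copies(unsigned nthreads, std::size_t n, std::vector<double>& checksums) {
    checksums.assign(nthreads, 0.0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t) {
        pool.emplace_back([t, n, &checksums] {
            std::vector<double> v1(n, 1.0 + t), v2(n);  // private arrays per thread
            for (std::size_t i = 0; i < n; ++i) v2[i] = 10.0 * v1[i];
            double sum = 0.0;
            for (double x : v2) sum += x;
            checksums[t] = sum;  // distinct slot per thread: no lock needed
        });
    }
    for (auto& th : pool) th.join();
}
```

Timing run_copies(1, n, cs) and run_copies(2, n, cs) with time should show CPU time doubling while elapsed time stays flat, as long as enough cores are free.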

B.1.b/   Vector computation V2 = Sin(V1) + Cos(V1) 
         ~ 50 x 9. 10^6 double operations on 2 x 9 10^6 doubles, mem ~ 150 MB
         ~500 M.Ops r_8 / ~ 600 MB I/O

(1) time cpupower 0     # compiled with -O  (/ -O -g)
(2) time zthr arrmf 1 3000   1 thread
(3) time zthr arrmf 2 3000   2 threads
(4) time zthr arrmf 4 3000   4 threads
(5) time zthr arrmf 6 3000   6 threads
(6) time zthr arrmf 8 3000   8 threads

-----------------------------------------------------------------------------------
                     (1)MFLOPS  (2)CPU/Elap/%   (3)CPU/Elap/%   (4)CPU/Elap/%
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz      53        
(b)xeon-lx-2.8GHz      65       1.7/1.7/100%    3.5/2.1/173%     9.8/3.6/275%
(c)amd-lx              95        
                                 
(e')osf-cool           32       4.2/4.3/98%     8.2/8.4/98%      16.1/16.2/98%
(f)superosf                    

(g)G5-osx-2GHz         88       2.3/2.3/100%      5/3/165%        9.6/5.8/167%  [-O2 -g]
(h)G4-osx-1.25GHz      25       4.5/4.8/95%       10.9/14.6/72%        [-O2 -g]
(i)core-osx-1.83GHz   107       2.3/2.3/98%       4.8/3.1/158%                [-O2 -g]
(j)xeon-osx           

(p)ibm-aix-regatta    130        
(q)ibm-aix-meso       150       1./2/50%         2.8/3/86%       5.4/4/130%   [-O3]
                                     (5) 10/4/250%    (6) 11.2/5/220% 

(s)sgi-magique         7       11.5/11.7/99%     24/17/140%      51.5/18.4/280%  [-O -g NON-OPT]
                      14       6.5/6.6/99%       13.3/12/110%    34.5/17.3/200%  [-O -g3 OPT]
 -----------------------------------------------------------------------------------
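The CPU/Elap/% columns compare process CPU time with wall-clock time. Here is a C++ sketch for the V2 = Sin(V1) + Cos(V1) kernel that reports both. It assumes a POSIX system, where std::clock reports CPU time summed over all threads of the process; on Windows it returns wall-clock time instead. sincos_timed and Timing are hypothetical names.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <ctime>
#include <vector>

struct Timing {
    double cpu_s;      // process CPU time (std::clock)
    double elapsed_s;  // wall-clock time (steady_clock)
    double pct;        // 100 * cpu / elapsed, as in the CPU/Elap/% columns
};

// Computes v2[i] = sin(v1[i]) + cos(v1[i]) and times the loop both ways.
Timing sincos_timed(std::vector<double>& v2, const std::vector<double>& v1) {
    const std::clock_t c0 = std::clock();
    const auto w0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < v1.size(); ++i)
        v2[i] = std::sin(v1[i]) + std::cos(v1[i]);
    const std::clock_t c1 = std::clock();
    const auto w1 = std::chrono::steady_clock::now();
    const double cpu = static_cast<double>(c1 - c0) / CLOCKS_PER_SEC;
    const double wall = std::chrono::duration<double>(w1 - w0).count();
    return {cpu, wall, wall > 0.0 ? 100.0 * cpu / wall : 0.0};
}
```

A single-threaded run should give a ratio near 100%, like the (2) column. Values well below 100% usually indicate that the process was waiting (I/O, paging, or other processes).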


B.1.c/ Corrected version of zthr.cc (after 23/05/07) 
         arr = (c1*a1) + (c2*a2) 
         ~ 3 x 4. 10^6 int_4 operations on 3 x 4 10^6 int_4
         12 M.Ops int_4 / ~ 50 MB

(1) time cpupower 0     # compiled with -O  (/ -O -g)
(2) time zthr arr 1 2000   1 thread
(3) time zthr arr 2 2000   2 threads
(4) time zthr arr 4 2000   4 threads
(5) time zthr arr 6 2000   6 threads
(6) time zthr arr 8 2000   8 threads

-----------------------------------------------------------------------------------
                     (1)MFLOPS  (2)CPU/Elap/%   (3)CPU/Elap/%   (4)CPU/Elap/%
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz      53        0.5/1/43%      1/1.1/88%      2.8/1/262%
                                    (5) 4.5/1.8/246%      (6) 6.1/2.1/310% 
                                  
        
(b)xeon-lx-2.8GHz      65        
                                  
(c)amd-lx              95        0.23/1/22%     0.44/1/51%       1/1/102%     [-O -g]
                                     (5) 1.6/1/106%   (6) 2.2/1.2/100% 


                                  
(e')osf-cool           32        0.43/1.2/35%   0.6/1.33/44%     1.1/1.3/82%      [-O -g]
                                     (5) 1.45/1.7/85%   (6) 1.83/2.16/84%         
(f)superosf                    

(g)G5-osx-2GHz         88       1.5/1.5/100%    3.2/1.7/185%      6.6/3.5/188%    [-O -g]
(g)G5-osx-2GHz         88       0.4/1/40%       0.9/1.0/90%       2/1.2/169%      [-tune=G5 -g]
                                     (5) 3.3/2/164%    (6) 4.3/2.6/165%
(h)G4-osx-1.25GHz      25       3/3/95%                                           [-O2 -g]
                                  
(i)core-osx-1.83GHz               [-O2 -g]

(j)xeon-osx           


(p)ibm-aix-regatta   130        

(q)ibm-aix-meso      150        0.6/1/58%       1/1/91%           1.7/1.2/132%    [-O3]
                                     (5) 2.4/1.2/193%   (6) 4.25/1.6/265%      

 -----------------------------------------------------------------------------------
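This short C++ sketch checks the operation and memory counts of B.1.c. It assumes that zthr arr 2000 works on 2000 x 2000 = 4 10^6 elements, which is consistent with the 4 10^6 quoted above. lincomb, ops_count and bytes_touched are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// arr = (c1*a1) + (c2*a2) on 4-byte integers:
// 2 multiplications + 1 addition = 3 integer operations per element.
std::vector<std::int32_t> lincomb(std::int32_t c1, const std::vector<std::int32_t>& a1,
                                  std::int32_t c2, const std::vector<std::int32_t>& a2) {
    std::vector<std::int32_t> arr(a1.size());
    for (std::size_t i = 0; i < a1.size(); ++i) arr[i] = c1 * a1[i] + c2 * a2[i];
    return arr;
}

// Operation count for n elements.
constexpr std::uint64_t ops_count(std::uint64_t n) { return 3 * n; }

// Total size of the three int_4 arrays (a1, a2, arr) for n elements, in bytes.
constexpr std::uint64_t bytes_touched(std::uint64_t n) { return 3 * n * sizeof(std::int32_t); }
```

ops_count(4000000) gives 12 10^6 operations, and bytes_touched(4000000) gives 48 10^6 bytes, i.e. the ~50 MB quoted.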

B.1.x/ Old version of zthr (before 23/05/07) 
         It performed 2 multiplications by a constant followed by a matrix product!
         arr = c1*a1*c2*a2   ( ~ 3 10^6 double ops)
(1) time cpupower 2     # compiled with -O3  (/ -O -g)
(2) time zthr arr 1 1000   1 thread
(3) time zthr arr 2 1000   2 threads
(4) time zthr arr 4 1000   4 threads
(5) time zthr arr 6 1000   6 threads
(6) time zthr arr 8 1000   8 threads

-----------------------------------------------------------------------------------
                     (1)MFLOPS  (2)CPU/Elap/% (2)IndPerf  (3)CPU/Elap/% (3)IndPerf
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz     1167        5.15/5.2/99%              11.4/5.8/196%
                                           (4)36.6/9.28/394%
        -O3 -g                    4.9/5./99%
(b)xeon-lx-2.8GHz      920        2.3/2.3/100%              6.2/3.1/198%
                                           (4)26/6.6/396%
(c)amd-lx              690        3.6/3.6/99%               6.8/4/171%
                                           (4)13.5/7/193%
                                           (5)20.3/10.23/198%
(cc)amd2-lx            675        2/2/99%                   4.15/2.1/197%
                                           (4)8.25/4.15/198%
                                           (5)13.6/4.6/292%
                                           (6)19.8/6.5/300%

(d)osf-asc             420        6.3s/6.5s/99%            16.9/8.8/192%   
                                           (4)29.9/15.7/191%
(e)osf-xp1000          648        5.1/5.3/96.6%            11.4/11.4/99%       
                                           (4)25.2/25.5/99%
(f)superosf            842        2.87/2.88/99.6%          6.25/4.1/153%
                                           (4)11.6/3.06/379%          

(h)G4-osx-1.25GHz       92        44s/48s/91%              86.7/99.8/92%  [-g]
                       380        12.2/12.9/95%            24/25.3/95%    [-O2 -g]                   
(g)G5-osx-2GHz        1151        20s/20s/99%              40s/23s/170%
                                           (4) 80.8/45/180%
   -O -g                          4.5/4.9/91%              9.3/4.7/197%
                                           (4) 18.3/9.4/197%
   -tune=G5                       3.35/3.8/88%             7.1/3.6/196%
                                           (4) 14/7.5/187%
(i)core-osx-1.83GHz    855        11.5/11.5/100%           23/11.6/192%   [-g]
                                           (4) 46/23/199%
              -O2                 3.85/3.89/99%            7.7/3.9/198%   [-O2 -g]
                                           (4) 15.4/7.77/198%

(j)xeon-osx           2600        2.5/2.5/100%             5.1/2.6/199%
                                           (4) 11.5/3.2/362%
                                           (5) 17.4/4.77/365%

(p)ibm-aix-regatta  1750/730      6.8/6.9/98%              13.1/6.75/195%
                                           (4) 26.3/11.7/225%
(q)ibm-aix-meso     3600/1250     3.6/3.75/96%             7.35/3.7/197%
                                           (4) 12.46/4.2/298%
                                           (5) 19/6.7/280%
                                           (6) 24/4.5/530%


(s)sgi-magique         460        60/60/99%       
 -----------------------------------------------------------------------------------


B.2/ Matrix multiplication mtx = mtx1 * mtx2 
     ~ 2 10^9 double ops per thread
(1) time cpupower 2  (-O3 / -O -g)
(2) time zthr mtx 1 1000   1 thread
(3) time zthr mtx 2 1000   2 threads
(4) time zthr mtx 4 1000   4 threads
(5) time zthr mtx 6 1000   6 threads
(6) time zthr mtx 8 1000   8 threads

-----------------------------------------------------------------------------------
                     (1)MFLOPS  (2)CPU/Elap/% (2)IndPerf  (3)CPU/Elap/% (3)IndPerf
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz     1167        6.5/6.5/100%               17.4/8.8/198%
                                           (4) 80.5/20.3/397%
                                           (5) 114.5/29.6/387%
                                           (6) 160/40.3/387%
                                           
(b)xeon-lx-2.8GHz      920        3.4/3.4/100%               12/6.1/199%
                                           (4) 55.8/14/400%
                                           (5) 79.5/20.3/392%
                                           (6) 102/25.8/396%
(c)amd-lx              690        6.98/6.98/100%             14.1/8.15/173%
                                           (4) 27.7/14.23/194%
                                           (5) 41.4/21.07/196%
                                           (6) 55.4/27.9/198.7%  
(cc)amd2-lx            675        4.1/4.1/100%               9.55/4.8/198%
                                           (4) 20/10.27/195%
                                           (5) 32.8/11.16/294%
                                           (6) 42.75/13.8/309%


(d)osf-asc             420        13.5s/13.7s/98%            32/16.5/194%   
                                           (4) 67.5/34.4/196%
(e)osf-xp1000          648        13/14.1/92%                27.1/27.4/99%
                                           (4) 54/54.7/99.6%
                                           (5) 80.6/81/99.6%
                                           (6) 107.8/108.3/99.5%
(e')osf-cool                      13/13.22/98%               26/26.1/99%
                                           (4) 51.8/51.9/99%
(f)superosf            842        6.1/7.24/84%               12.35/6.29/196%
                                           (4) 24.3/6.31/385%
                                           (5) 36.5/10.9/335%
                                           (6) 50.1/18.15/276%

(g)G5-osx-2GHz        1151        23/23.7/97%                46.5/27.5/170%
                                           (4) 93.4/49.4/189%
  -O -g                           6.2/6.2/100%                14.2/7.2/197%
                                           (4) 28.3/14.36/197%
  -tune=G5                        5.7/5.8/98%                13.3/6.8/197%
                                           (4) 26.8/13.56/197%
                                           (6) 53.8/27.25/197%
(h)G4-osx-1.25GHz      333        23.5/24.5/96%                              [-O2]
(i)core-osx-1.83GHz    855        12.6/12.7/100%             25.8/13.4/194% 
                                           (4) 51.6/26/199%
            -O2                   4.25/4.5/94%               10.6/5.36/198%
                                           (4) 20.87/10.68/198%
      -O2, 2 parallel jobs    2 x 5/5.4/92%
(j)xeon-osx           2600        2.8/2.8/99%                9.3/4.66/199%
                                           (4) 31.4/8.6/364%
                                           (5) 47.1/12.96/364%
                                           (6) 62.8/17.38/362%

(p)ibm-aix-regatta  1750/730      9.5/9.7/98%                18.3/16.0/114%
                                           (4) 38.3/24.7/155%
(q)ibm-aix-meso     3600/1250     2.3/2.3/99%                5.1/2.64/194%   (compiled with -O3)
                                           (4) 11.4/4.16/272%
                                           (5) 20.2/5.85/344%
                                           (6) 29.9/6.74/442%

(s)sgi-magique         400        44/44.3/99%                96.5/55/176%

 -----------------------------------------------------------------------------------
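The ~2 10^9 operations per thread match a product of two n x n matrices with n = 1000, assuming that is what zthr mtx 1000 computes. Each of the n^2 output elements needs n multiplications and n additions, i.e. 2 n^3 operations in total. The naive C++ sketch below is not SOPHYA's implementation. The i-k-j loop order is chosen here so that the innermost loop walks memory contiguously.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive n x n matrix product C = A * B, row-major storage.
std::vector<double> matmul(const std::vector<double>& A, const std::vector<double>& B,
                           std::size_t n) {
    std::vector<double> C(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = A[i * n + k];
            for (std::size_t j = 0; j < n; ++j) C[i * n + j] += aik * B[k * n + j];
        }
    return C;
}

// n multiplications + n additions for each of the n^2 outputs.
constexpr std::uint64_t matmul_flops(std::uint64_t n) { return 2 * n * n * n; }
```

matmul_flops(1000) is 2 10^9, the figure quoted per thread above.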


B.4/ Operations on double arrays - measured with spar 
  csh> time spar 2 1 2000 2000
  (1) cpupower 2  MFLOPS
  (2) spar MFLOPS (double), with % CPU
  (3) time spar 2 5 1000 2000 CPU/Elap/% 
-----------------------------------------------------------------------------------
                     (1)MFLOPS      (2)MFLOPS / %      (3)CPU/Elap/%
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz      53       ~ 20-35 MFLOPS , 90%     20/20.2/99%       [-g -O] 
                                  
        
(b)xeon-lx-2.8GHz      65        
                                  
(c)amd-lx              95       ~ 20-40 MFLOPS , 99%     17.2/17.2/100%    [-g -O] 


(d)osf-asc                     
                                  
(e)osf-xp1000          32       ~ 15-25 MFLOPS , 90%     37.6/41.2/91%      [-g -O]  
(f)superosf                    

(g)G5-osx-2GHz         88       ~ 10-25 MFLOPS , 99%     45/45/100%         [-g -O] or [-g -O2]
(h)G4-osx-1.25GHz      25       ~ 8-16  MFLOPS , 92%     45.5/52/90%        [-g -O2]
                                  
(i)core-osx-1.83GHz             

(j)xeon-osx           


(p)ibm-aix-regatta   130        

(q)ibm-aix-meso      150        ~ 80-100 MFLOPS , 90%   5./23/22%     [-O3]    



(s)sgi-magique         460        
 -----------------------------------------------------------------------------------

B.5/  Computation / comparison with JET/tjet 
csh> time tjet 10 2000 2000   OR tjet 10 2000 1000
 (1) TCPU EltAccess C/pointers
 (2) TCPU m1*c1+m2*c2+m3*c3 C/pointers
 (3) TCPU EltAccess SOPHYA
 (4) TCPU m1*c1+m2*c2+m3*c3 SOPHYA / methods (MultCst, AddArr ...)
 (5) TCPU m1*c1+m2*c2+m3*c3 SOPHYA JET

 -----------------------------------------------------------------------------------
                            (1)        (2)         (3)         (4)           (5)             
 -----------------------------------------------------------------------------------
(b)xeon-lx-2.8GHz-icc       0.87       0.63        1.55      2.7/1.6         0.57
(c)amd-lx                   0.94       0.79        1.85      3.4/2.1         0.76

(e')osf-cool                2.85       2.45        3.1       6.5/5.5         4.1 

(g)G5-osx-2GHz              1.5        0.61        2.1       4.1/1.6         0.6   (-g -O2)
    -tune=G5 -fast          1.1        0.62        1.3       4.1/1.6         0.58  
(h)G4-osx-1.25GHz           3.86       2.2         5         9.4/6.2         3     (-g -O2)
(i)core-osx-1.83GHz         1.1        0.49        1.6       2.8/1.7         0.68

(q)ibm-aix-meso             0.43       0.27        0.52      1.12/0.75       0.35  

(s)sgi-magique              2.45       1.9         5.65      7.45/6.3        2.8  (-O -g3)
-----------------------------------------------------------------------------------
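Columns (2), (4) and (5) contrast a single pointer loop with method-by-method evaluation. This C++ sketch shows both strategies for m1*c1+m2*c2+m3*c3. The helper names are illustrative, not the SOPHYA MultCst/AddArr signatures. It also assumes that JET is an expression-template layer that generates the fused loop. The fused version makes one pass over memory. The step-by-step version makes five passes and allocates four intermediate arrays. That is consistent with (4) being the slowest column and with (2) and (5) being close.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical helpers: each call allocates and fills a new array.
Vec mult_cst(const Vec& m, double c) {
    Vec r(m.size());
    for (std::size_t i = 0; i < m.size(); ++i) r[i] = m[i] * c;
    return r;
}

Vec add_arr(const Vec& x, const Vec& y) {
    Vec r(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) r[i] = x[i] + y[i];
    return r;
}

// Method-by-method evaluation: 3 scalings + 2 additions = 5 passes.
Vec combo_temporaries(const Vec& m1, double c1, const Vec& m2, double c2,
                      const Vec& m3, double c3) {
    return add_arr(add_arr(mult_cst(m1, c1), mult_cst(m2, c2)), mult_cst(m3, c3));
}

// Fused loop with raw pointers: a single pass, no intermediate arrays.
Vec combo_fused(const Vec& m1, double c1, const Vec& m2, double c2,
                const Vec& m3, double c3) {
    Vec r(m1.size());
    const double* p1 = m1.data();
    const double* p2 = m2.data();
    const double* p3 = m3.data();
    double* pr = r.data();
    for (std::size_t i = 0; i < r.size(); ++i)
        pr[i] = p1[i] * c1 + p2[i] * c2 + p3[i] * c3;
    return r;
}
```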


C/ FFT computation (FFTW, FFTPack)
-------------------------------

(1) time cpupower 2 
(2) time tfft 2000000 W d 0 0 (with FFTW)
(3) time tfft 2000000 P d 0 0 (with FFTPack_Sophya)

IndPerf=1000/TCPU  


-----------------------------------------------------------------------------------
                     (1)MFLOPS     (2)CPU/Elap/%   (2)IndPerf   (3)CPU/Elap/%  (3)IndPerf
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz     1167        5.5/5.6/97%        180         7.4/7.4/100%     135    
       -O3 -g                                                    6.8/7.1/96%      147    

(b)xeon-lx-2.8GHz      920        3.6/3.7/98%                    3.7/3.8/99%
                                                          ~2x    6.5/8.8/73%
                                                          ~4x    8/14/55%   (15 sec elapsed)
         
(c)amd-lx              690        3.2/3.75/86%                   4.2/4.2/100%     238
                                                            ~2x  4.7/4.7/99%
(cc)amd2-lx            675        2.8/2.8/99%                    3.56/3.58/99%

(d)osf-asc             420        13.3/13.9/95.7%     75        12.2/17.6/70%     82
(e)osf-xp1000          648        9.9/10.2/97%       101        9.3/9.46/98.5%    107     
(e')cool                          9.2/10.1/91%                  9.1/9.3/98%
(f)superosf            842        6./6.22/96.7%      166        5.1/5.18/97.4%    196     

(g)G5-osx-2GHz        1151        8.5/8.7/97.5%      120        11.5/11.6/99%     87
    -O -g                         5.1/5.2/98%        190        5.4/5.5/99%       180
    -tune=G5                      5/5.1/98%          200        5.5/5.6/97%       180
(h)G4-osx-1.25GHz       92        15.2/15.9/96%       66        23.8/34.1/70%     42    [-g]
                       380        8.8/10.3/86%                  14.7/20.9/65%           [-O2 -g]
(i)core-osx-1.83GHz    855        4.6/4.75/97%       217        7/7.08/99%        142
         -O2                      3.2/3.2/99%        312        3.6/3.6/99%       277
   -O2, 2 parallel jobs       2 x 3.9/4.3/90%                 2x  4.7/5/91%       200     
(j)xeon-osx           2600                                      2.6/2.6/98%       384
       2 parallel jobs                                     ~2x  3.6/4.1/87%       250
       4 parallel jobs

(p)ibm-aix-regatta 1750/730       6.25/18.9/33%      160        5.25/15.7/33%     190
(q)ibm-aix-meso    3600/1250      3.95/4.3/91%       250        3.82/4./94%       260
       2 parallel jobs                                     ~2x  3.88/4.2/92%      250
                                  2.8/4.76/59%                  2.1/4.45/47%     [-O3]
                                                          ~2x   2.5/4.7/54%
                                                          ~4x   2.5/4.8/55% 


(s)sgi-magique         460        22/22/98%           42         24.5/25/99%       40           
 -----------------------------------------------------------------------------------


D/ Matrix inversion with LAPACK
-------------------------------

lpk inverse 1000,1000 0 
---> LAPACK inversion computation time
-------------------------------------------------------------------------------------------
             CPU/Elap/%       (1)         
-------------------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz          5.6/~100%    
(b)xeon-lx-2.8GHz          5.34/~90%
(c)amd-lx                  5.5/5.5/99%      

(d)osf-asc 
(e')cool                   2.8/2.9/95% 
(f)superosf           

(h)G4-osx-1.25GHz          2.3/~100%   [-O2 -g]
(g)G5-osx-2GHz             0.8/~100%
    -O -g                  0.86/~100%            
(i)core-osx-1.83GHz        1.93/~100%
              -O2   
(p)ibm-aix-regatta        
(q)ibm-aix-meso            0.55/~100%

(s)sgi-magique             5.3/~90%
--------------------------------------------------------------------------------------------


K/ Efficiency of lock (mutex) handling with threads and arrays
----------------------------------------------------------------------
(32 threads - operating on 2000 vectors, ~ 64000 lock/unlock/wait/broadcast)

(1) time zthr syncp 32 2000 4 
(2) time zthr sync 32 2000 4 
(3) time zthr syncp 4 15000 130 
(4) time zthr sync  4 15000 130 
-------------------------------------------------------------------------------------------
             CPU/Elap/%       (1)             (2)              (3)              (4) 
-------------------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz         23.5/14/168%    4.3/1.2/365%     7.9/5.5/142%      4/2.15/190%
      Before ThSafeOp     17/178%
      with -O3 -g 
(b)xeon-lx-2.8GHz (2)     
(c)amd-lx                 0.6/1/63%        0.6/1/60%       3.5/3.5/102%      2.6/2.7/98% 

(d)osf-asc               4.5/3.4/132%     3.35/2/170%     15.8/10.5/150%     13/8/163%
     5.4/100%(NoThSafe)
(e')cool                 1.3/1.37/95%     1.35/1.5/89%     5.3/5.3/99%       5.2/5.2/99%    
(f)superosf (1)          


(g)G5-osx-2GHz (2)     2.6/130% (NoThSafe)                   
    -O -g                2.6/1.7/150%      6.8/3.7/187%    4/2.75/142%       4.7/3/155%
(h)G4-osx-1.25GHz (1)    40.5/42.6/95%    42.2/43.5/97%      [-g]
                          3.9/4.3/89%      3.5/4/89%       3.8/4/95%         4.3/4.6/93%

(i)core-osx-1.83GHz      7.7/7.1/108%     7.8/6.7/116%    30.2/29.6/102%    30.3/29.2/104%   [-g]
              -O2        2.7/1.8/152%     6/3.16/190%     3.4/2.4/142%      3.2/2.5/164%     [-O2 -g]
(j)xeon-osx               
      Before ThSafeOp     2.55/143%

(p)ibm-aix-regatta        4.7/111%
(q)ibm-aix-meso           7.5/2.8/300%     17/3.8/450%    8.2/3.05/270%      4.85/2.43/200%    

--------------------------------------------------------------------------------------------
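The zthr sync/syncp code is not reproduced here. As a generic illustration of the lock/unlock/wait/broadcast pattern being timed, here is a C++ sketch of a reusable barrier built on std::mutex and std::condition_variable::notify_all. nthreads threads meet rounds times, so 32 threads x 2000 rounds give 32 x 2000 = 64000 lock acquisitions, the count quoted above. RoundBarrier and run_sync_test are hypothetical names. The third zthr argument (4, 130) is not documented here and is not modeled.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Reusable barrier: the last thread to arrive broadcasts, the others wait.
struct RoundBarrier {
    std::mutex m;
    std::condition_variable cv;
    int nthreads;
    int arrived = 0;
    long generation = 0;  // incremented each time all threads have arrived
    long lock_count = 0;  // lock acquisitions made by arrive_and_wait()

    explicit RoundBarrier(int n) : nthreads(n) {}

    void arrive_and_wait() {
        std::unique_lock<std::mutex> lk(m);  // lock
        ++lock_count;
        const long gen = generation;
        if (++arrived == nthreads) {
            arrived = 0;
            ++generation;
            cv.notify_all();  // broadcast
        } else {
            cv.wait(lk, [&] { return generation != gen; });  // wait
        }
    }  // unlock
};

// Every thread performs the same number of rounds, so every barrier completes.
long run_sync_test(int nthreads, int rounds) {
    RoundBarrier barrier(nthreads);
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back([&barrier, rounds] {
            for (int r = 0; r < rounds; ++r) barrier.arrive_and_wait();
        });
    for (auto& th : pool) th.join();
    return barrier.lock_count;
}
```

run_sync_test(32, 2000) performs 64000 lock/unlock pairs. Most of the cost is the thread wake-ups triggered by notify_all, which is the kind of overhead that separates the machines in the table.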



L/ I/O and PPF 
-----------------
Write/read of n=10^7 rows of int + 6 doubles, total ~ 500 MB
(1) time tstdtable w xx.ppf swap 10000000 1024 0
(2) time tstdtable r xx.ppf swap 10000000 1024 0
(3) time tstdtable w xx.ppf swap 50000000 1024 0
(4) time tstdtable r xx.ppf swap 50000000 1024 0

-------------------------------------------------------------------------------------------
             CPU/Elap/%       (1)              (2)              (3)                (4)
-------------------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz          17/26/63%       5.5/5.6/94%        

(b)xeon-lx-2.8GHz          7/18.5/40%      2.7/2.8/100%                                   7000000

(c)amd-lx                  5.9/6./97%      3.4/3.4/100%      30/32/93%      24/165/13% ?

(d)osf-asc 
(e')cool                   15/30/50%       13/13/99%
(f)superosf           

(g)G5-osx-2GHz             14/14.2/98%     6/6.14/99% 

(h)G4-osx-1.25GHz          26/29.7/87%     15.7/38.6/41%               [-O2 -g]
(i)core-osx-1.83GHz        37/37.8/98%     22/48.4/45%                                 [-g]
              -O2          10.5/17.4/55%   11.2/41.2/27%                               [-O2 -g]
(p)ibm-aix-regatta        
(q)ibm-aix-meso           5.5/16.8/38%    5.7/13.2/43%      32.7/85/39%    29/60/49%
                          6/11/55%        4/9/44%
                          6.3/10.6/60%    4/6/64%
   2 parallel reads                ~2x  4/6/60%, elapsed 7 sec
   1 write + 2 parallel reads  6.3/10.4/60%  ~2x 3.8/6.4/59%, elapsed < 11 sec
   1 write + 3 parallel reads                          elapsed 19 sec
   4 parallel reads                                    elapsed 6 sec
--------------------------------------------------------------------------------------------
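PPF is SOPHYA's own persistence format and is not reproduced here. The sketch below uses a raw binary layout instead, which is an assumption and not PPF, to make the size arithmetic concrete. One row is a 4-byte int plus 6 x 8-byte doubles = 52 bytes. Ignoring any format overhead, 10^7 rows are ~520 MB (the ~500 MB quoted), and the 5 10^7 rows of (3)/(4) are ~2.6 GB. kRowBytes, total_bytes, write_rows and read_rows are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// One row on disk: a 4-byte int followed by 6 doubles, no padding.
constexpr std::size_t kRowBytes = sizeof(std::int32_t) + 6 * sizeof(double);  // 52

std::uint64_t total_bytes(std::uint64_t nrows) { return nrows * kRowBytes; }

// Writes nrows rows in raw binary (hypothetical layout, NOT PPF).
bool write_rows(const std::string& path, std::int32_t nrows) {
    std::ofstream out(path, std::ios::binary);
    for (std::int32_t i = 0; i < nrows; ++i) {
        out.write(reinterpret_cast<const char*>(&i), sizeof i);
        for (int k = 0; k < 6; ++k) {
            const double v = i + 0.1 * k;
            out.write(reinterpret_cast<const char*>(&v), sizeof v);
        }
    }
    return static_cast<bool>(out);
}

// Reads the rows back and returns the sum of the int column.
long long read_rows(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    long long sum = 0;
    std::int32_t id = 0;
    double vals[6];
    while (in.read(reinterpret_cast<char*>(&id), sizeof id) &&
           in.read(reinterpret_cast<char*>(vals), sizeof vals)) {
        sum += id;
    }
    return sum;
}
```

Timing write_rows and read_rows with 10^7 rows gives a rough baseline to compare with the (1)/(2) columns. A low CPU/Elap % then points at the disk rather than at the serialization code.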



