-------------------------------------------------------------------------------------
 Performance comparison of different machines for compilation / execution (computation)
                                   -----------------------
    Measurements made in January 2007 ,       R. Ansari / C. Magneville
-------------------------------------------------------------------------------------

(a) eros3 : Dual-processor dual-core Xeon@2.4 GHz Linux (xeon-lx-2.4GHz)  , gcc 3.2
(b) ccali : Dual-processor dual-core Xeon@2.8 GHz Linux (xeon-lx-2.8GHz)  , icc 8.0 or 9.0
(c) sgsda: Dual-processor AMD Opteron 248 @ 2.2 GHz (amd-lx-
(cc) grid-saclay: Dual-processor dual-core AMD Opteron 275 @ 2.2 GHz (amd275-lx)

(d) asc: dual-processor Alpha (@ ~1 GHz) DS20 server, OSF (osf1)  , cxx 6.5 (osf-asc)
(new asc 420 MFLOPS, less powerful than the old asc 800 MFLOPS)
(e) xp1000-dapnia: Alpha XP1000 @ ~ 600 MHz ? OSF1 , cxx ? (osf-xp1000)
(e') cool: Alpha XP1000 @ ~ 667 MHz ? OSF1 5.1 , cxx 6.3
(f) superosf-dapnia: multi-processor AlphaServer ES80, 6 EV7 procs @ 1 GHz (super-osf)

(g) ccsvx01: dual-processor XServe G5 @~1GHz (Darwin/OSX) (G5-osx-1GHz) , gcc 3.3
(h) PowerBook-Reza : Apple G4 @ 1.25 GHz (G4-osx-1.25GHz) , gcc 3.3
(i) MacBook-Reza: Apple / Intel Core dual-core @ 1.83 GHz (core-osx-1.83GHz) gcc 4
(j) MacPro-Grosdidier : Apple / 2 dual-core Xeon @ 3 GHz gcc 4.0.1 , SOPHYA compiled with -O2 -g

(p) IBM-AIX regatta , xlC , IBM eServer pSeries 655 , 8 Power4 procs @ 1.1 GHz
(q) IBM-AIX meso , AIX 5.3, xlC V8 , IBM Power5 , 8 dual-core procs P575 @ 1.9 GHz

(s) SGI-IRIX64 magique, CC 

NOTES : 
- On the Xeon machines, processes / threads interact with respect to CPU
occupancy. A factor of 3 in performance is lost in multi-thread/multi-task mode.
The MacPro machine under OSX nevertheless copes better.
- Effect of the operating system or of the motherboard ???
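
The factor quoted in the note above can be cross-checked against the zthr mtx
measurements of section B.2. The sketch below is a rough illustration, not
the authors' method: it assumes that every thread performs the same work as
the single-thread run, so that the CPU seconds per thread should stay constant
on a machine that scales well. The function name cpu_inflation is hypothetical.

```python
# Per-thread CPU cost relative to the single-thread run.
# ASSUMPTION: each thread does the same work as the 1-thread run.

def cpu_inflation(cpu_1thread, cpu_total, nthreads):
    """CPU seconds per thread divided by the single-thread CPU seconds."""
    return (cpu_total / nthreads) / cpu_1thread

# (machine, 1-thread CPU s, 4-thread total CPU s), copied from the B.2 table
rows = [
    ("xeon-lx-2.4GHz", 6.5, 80.5),
    ("xeon-lx-2.8GHz", 3.4, 55.8),
    ("xeon-osx", 2.8, 31.4),
    ("amd2-lx", 4.1, 20.0),
]

for name, cpu1, cpu4 in rows:
    print(f"{name:16s} {cpu_inflation(cpu1, cpu4, 4):.2f}x")
```

A ratio close to 1 means the threads do not slow each other down; a ratio
near 3 corresponds to the loss described in the note.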


SPECint2000 (3) / SPECfp2000 (2) data (http://www.spec.org)
(1) MFLOPS  -> cpupower 2   (x/y : -O -g / -O3)
----------------------------------------------------------------------
                         MFLOPS(1)      SPECfp      SPECint 
----------------------------------------------------------------------
(b)xeon-lx-2.8GHz         900            1400        1400
(c)amd-lx                 690            1600        1300
(cc)amd2-lx               675            1600        1300

(e)osf-xp1000             648             500         400
(f)superosf               842            1100         700

(i)core-osx-1.83GHz       855            1400        1500
(j)xeon-osx              2600            2900          -

(p)ibm-aix-regatta    730/1750           1050         700
(q)ibm-aix-meso      1250/3600         
----------------------------------------------------------------------


A/ Compilation of all of SOPHYA
-------------------------------
csh> time make all   (1)
or
csh> time make -j 2 all  (2)
  CPU time (TCPU)
  Performance index 100*(1000/TCPU)
  Elapsed (wall-clock) time
  TCPU / elapsed time
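
As a small worked example of the two derived columns, the sketch below
recomputes them from the CPU and elapsed times. The function names are
hypothetical; the formulas are the ones stated above, and truncating the index
to an integer is an assumption that happens to match the tabulated values.

```python
def perf_index(tcpu_s):
    """Compilation performance index: 100 * (1000 / TCPU), TCPU in seconds."""
    return 100.0 * 1000.0 / tcpu_s

def cpu_over_elapsed_pct(tcpu_s, elapsed_s):
    """CPU time as a percentage of elapsed time (above 100% with make -j 2)."""
    return 100.0 * tcpu_s / elapsed_s

# Row (a) xeon-lx-2.4GHz: 615 s CPU, 410 s elapsed
print(int(perf_index(615)), round(cpu_over_elapsed_pct(615, 410)))  # prints "162 150"
```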


----------------------------------------------------------------------
                         CPU(s)  IndPerf   TElapsed , TCPU/Elapsed %
----------------------------------------------------------------------
(a)xeon-lx-2.4GHz (2)    615 s      162       410 s        150%  
      with -O3 -g (2)   1300 s       77       760 s        172%  
(b)xeon-lx-2.8GHz (2)    755 s      132       540 s        140%
(c)amd-lx         (2)    336 s      297       175 s        192%

(d)osf-asc (1)          1920 s       52      2340 s        83%   (??)
(e)osf-xp1000 (1)        533 s      187       660 s        80%
(f)superosf (1)          895 s      112       910 s        98%

(h)G4-osx-1.25GHz (1)    660 s      151       710 s        93%
(g)G5-osx-1GHz (2)       453 s      221       250 s        182%
    -tune=G5            1100 s       90
(i)core-osx-1.83GHz (2)  209 s      478       116 s        180%
              -O2   (1)  367 s      272       381          96%    
(j)xeon-osx

(p)ibm-aix
----------------------------------------------------------------------

Shared library sizes:
(a) 
(f) = (e) = 57 MB
(h) 80 MB
(i) 83 MB

B/ Raw computation (SOPHYA arrays) with / without threads
---------------------------------------------------------

B.1/ arr = c1*a1+c2*a2
(1) time cpupower 2     # compiled with -O3  (/ -O -g)
(2) time zthr arr 1 1000   1 thread
(3) time zthr arr 2 1000   2 threads
(4) time zthr arr 4 1000   4 threads
(5) time zthr arr 6 1000   6 threads
(6) time zthr arr 8 1000   8 threads
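
To make the CPU/Elap/% columns concrete, here is a minimal sketch of the same
kind of measurement: each thread evaluates arr = c1*a1 + c2*a2 several times
while CPU time and elapsed time are recorded. This is not the zthr program;
the names lincomb and run, the array size and the loop count are all
hypothetical. Note also that CPython threads share an interpreter lock, so the
CPU/elapsed ratio of this sketch is not expected to reach N x 100% the way a
C++ program can.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def lincomb(c1, a1, c2, a2):
    """Element-wise linear combination c1*a1 + c2*a2."""
    return [c1 * x + c2 * y for x, y in zip(a1, a2)]

def run(nthreads, size=20_000, nloop=5):
    """Run the same workload in nthreads threads; return (cpu_s, elapsed_s, results)."""
    a1 = [float(i) for i in range(size)]
    a2 = [2.0 * i for i in range(size)]

    def work(_thread_index):
        res = []
        for _ in range(nloop):
            res = lincomb(3.0, a1, 0.5, a2)
        return res[-1]

    cpu0, wall0 = time.process_time(), time.perf_counter()
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        results = list(pool.map(work, range(nthreads)))
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    return cpu, wall, results

for n in (1, 2):
    cpu, wall, _ = run(n)
    print(f"{n} thread(s): CPU {cpu:.3f} s / elapsed {wall:.3f} s")
```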

-----------------------------------------------------------------------------------
                     (1)MFLOPS  (2)CPU/Elap/% (2)IndPerf  (3)CPU/Elap/% (3)IndPerf
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz     1167        5.15/5.2/99%              11.4/5.8/196%
                                           (4)36.6/9.28/394%
        -O3 -g                    4.9/5./99%
(b)xeon-lx-2.8GHz      920        2.3/2.3/100%              6.2/3.1/198%
                                           (4)26/6.6/396%
(c)amd-lx              690        3.6/3.6/99%               6.8/4/171%
                                           (4)13.5/7/193%
                                           (5)20.3/10.23/198%
(cc)amd2-lx            675        2/2/99%                   4.15/2.1/197%
                                           (4)8.25/4.15/198%
                                           (5)13.6/4.6/292%
                                           (6)19.8/6.5/300%

(d)osf-asc             420        6.3s/6.5s/99%            16.9/8.8/192%   
                                           (4)29.9/15.7/191%
(e)osf-xp1000          648        5.1/5.3/96.6%            11.4/11.4/99%       
                                           (4)25.2/25.5/99%
(f)superosf            842        2.87/2.88/99.6%          6.25/4.1/153%
                                           (4)11.6/3.06/379%          

(h)G4-osx-1.25GHz      333        44s/48s/91%              86.7/99.8/92%
(g)G5-osx-1GHz        1151        20s/20s/99%              40s/23s/170%
                                           (4) 80.8/45/180%
   -tune=G5                       3.35/3.8/88%             7.1/3.6/196%
                                           (4) 14/7.5/187%
(i)core-osx-1.83GHz    855        11.5/11.5/100%           23/11.6/192%
                                           (4) 46/23/199%
              -O2                 3.85/3.89/99%            7.7/3.9/198%
                                           (4) 15.4/7.77/198%

(j)xeon-osx           2600        2.5/2.5/100%             5.1/2.6/199%
                                           (4) 11.5/3.2/362%
                                           (5) 17.4/4.77/365%

(p)ibm-aix-regatta  1750/730      6.8/6.9/98%              13.1/6.75/195%
                                           (4) 26.3/11.7/225%
(q)ibm-aix-meso     3600/1250     3.6/3.75/96%             7.35/3.7/197%
                                           (4) 12.46/4.2/298%
                                           (5) 219/6.7/280%
                                           (6) 24/4.5/530%


(s)sgi-magique         460        60/60/99%       
 -----------------------------------------------------------------------------------


B.2/ Matrix multiplication mtx = mtx1 * mtx2 

(1) time cpupower 2  (-O3 / -O -g)
(2) time zthr mtx 1 1000   1 thread
(3) time zthr mtx 2 1000   2 threads
(4) time zthr mtx 4 1000   4 threads
(5) time zthr mtx 6 1000   6 threads
(6) time zthr mtx 8 1000   8 threads
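
For orientation, the sketch below shows one common way to turn a matrix
product timing into a MFLOPS figure. It is an assumption-laden illustration:
the way cpupower derives its MFLOPS number is not described in these notes,
and the 2*n^3 operation count (one multiply and one add per inner-loop step)
is simply the usual convention for a dense n x n product. The names matmul and
mflops are hypothetical.

```python
import time

def matmul(a, b):
    """Naive dense product of two matrices given as lists of rows."""
    b_cols = list(zip(*b))  # transpose once so columns can be iterated directly
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def mflops(n, seconds):
    """MFLOPS for an n x n product, ASSUMING 2*n**3 floating-point operations."""
    return 2.0 * n ** 3 / seconds / 1.0e6

n = 60
m1 = [[float(i + j) for j in range(n)] for i in range(n)]
m2 = [[float(i - j) for j in range(n)] for i in range(n)]

t0 = time.perf_counter()
prod = matmul(m1, m2)
elapsed = time.perf_counter() - t0
print(f"n={n}: {elapsed:.4f} s, about {mflops(n, elapsed):.1f} MFLOPS")
```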

-----------------------------------------------------------------------------------
                     (1)MFLOPS  (2)CPU/Elap/% (2)IndPerf  (3)CPU/Elap/% (3)IndPerf
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz     1167        6.5/6.5/100%               17.4/8.8/198%
                                           (4) 80.5/20.3/397%
                                           (5) 114.5/29.6/387%
                                           (6) 160/40.3/387%
                                           
(b)xeon-lx-2.8GHz      920        3.4/3.4/100%               12/6.1/199%
                                           (4) 55.8/14/400%
(c)amd-lx              690        6.98/6.98/100%             14.1/8.15/173%
                                           (4) 27.7/14.23/194%
                                           (5) 41.4/21.07/196%
                                           (6) 55.4/27.9/198.7%  
(cc)amd2-lx            675        4.1/4.1/100%               9.55/4.8/198%
                                           (4) 20/10.27/195%
                                           (5) 32.8/11.16/294%
                                           (6) 42.75/13.8/309%


(d)osf-asc             420        13.5s/13.7s/98%            32/16.5/194%   
                                           (4) 67.5/34.4/196%
(e)osf-xp1000          648        13/14.1/92%                27.1/27.4/99%
                                           (4) 54/54.7/99.6%
                                           (5) 80.6/81/99.6%
                                           (6) 107.8/108.3/99.5%
(f)superosf            842        6.1/7.24/84%               12.35/6.29/196%
                                           (4) 24.3/6.31/385%
                                           (5) 36.5/10.9/335%
                                           (6) 50.1/18.15/276%

(h)G4-osx-1.25GHz      333        
(g)G5-osx-1GHz        1151        23/23.7/97%                46.5/27.5/170%
                                           (4) 93.4/49.4/189%
  -tune=G5                        5.7/5.8/98%                13.3/6.8/197%
                                           (4) 26.8/13.56/197%
                                           (6) 53.8/27.25/197%
(i)core-osx-1.83GHz    855        12.6/12.7/100%             25.8/13.4/194% 
                                           (4) 51.6/26/199%
            -O2                   4.25/4.5/94%               10.6/5.36/198%
                                           (4) 20.87/10.68/198%
      -O2 2 parallel jobs     2 x 5/5.4/92%
(j)xeon-osx           2600        2.8/2.8/99%                9.3/4.66/199%
                                           (4) 31.4/8.6/364%
                                           (5) 47.1/12.96/364%
                                           (6) 62.8/17.38/362%

(p)ibm-aix-regatta  1750/730      9.5/9.7/98%                18.3/16.0/114%
                                           (4) 38.3/24.7/155%
(q)ibm-aix-meso     3600/1250     2.3/2.3/99%                5.1/2.64/194%   (compiled with -O3)
                                           (4) 11.4/4.16/272%
                                           (5) 20.2/5.85/344%
                                           (6) 29.9/6.74/442%

(s)sgi-magique         460        49/49/99%                  101/56/181%

 -----------------------------------------------------------------------------------

C/ FFT computation (FFTW , FFTPack)
-----------------------------------

(1) time cpupower 2 
(2) time tfft 2000000 W d 0 0 (with FFTW)
(3) time tfft 2000000 P d 0 0 (with FFTPack_Sophya)

IndPerf=1000/TCPU  
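
The table cells use the compact form CPU/Elap/%. The sketch below parses such
a cell, recomputes IndPerf = 1000/TCPU, and checks that the quoted percentage
agrees with CPU/elapsed. The helper names and the 5-point tolerance are
hypothetical choices; two-field cells such as "5.6/~100%" are deliberately not
handled. A check of this kind flags, for instance, the B.1 cell
"219/6.7/280%", whose CPU value does not match its percentage.

```python
def parse_cell(cell):
    """Split 'CPU/Elap/%' (e.g. '6.3s/6.5s/99%') into three floats."""
    cpu, elapsed, pct = cell.strip().split("/")
    return float(cpu.rstrip("s")), float(elapsed.rstrip("s")), float(pct.rstrip("%"))

def fft_perf_index(tcpu_s):
    """FFT performance index as defined above: 1000 / TCPU."""
    return 1000.0 / tcpu_s

def is_consistent(cell, tol_pct=5.0):
    """True when the quoted % is within tol_pct points of 100 * CPU / elapsed."""
    cpu, elapsed, pct = parse_cell(cell)
    return abs(100.0 * cpu / elapsed - pct) <= tol_pct

for cell in ("5.5/5.6/97%", "7.4/7.4/100%", "219/6.7/280%"):
    cpu, _, _ = parse_cell(cell)
    status = "ok" if is_consistent(cell) else "inconsistent"
    print(f"{cell:14s} IndPerf={fft_perf_index(cpu):6.1f}  {status}")
```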


-----------------------------------------------------------------------------------
                     (1)MFLOPS     (2)CPU/Elap/%   (2)IndPerf   (3)CPU/Elap/%  (3)IndPerf
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz     1167        5.5/5.6/97%        180         7.4/7.4/100%     135    
       -O3 -g                                                    6.8/7.1/96%      147    
(b)xeon-lx-2.8GHz      920     
(c)amd-lx              690        3.2/3.75/86%                   4.2/4.2/100%     238
                                                            ~2x  4.7/4.7/99%
(cc)amd2-lx            675        2.8/2.8/99%                    3.56/3.58/99%

(d)osf-asc             420        13.3/13.9/95.7%     75        12.2/17.6/70%     82
(e)osf-xp1000          648        9.9/10.2/97%       101        9.3/9.46/98.5%    107     
(e')cool                                                        9.1/9.3/98%
(f)superosf            842        6./6.22/96.7%      166        5.1/5.18/97.4%    196     

(h)G4-osx-1.25GHz      333        15.2/15.9/96%       66        23.8/34.1/70%     42
(g)G5-osx-1GHz        1151        8.5/8.7/97.5%      120        11.5/11.6/99%     87
    -tune=G5                      5/5.1/98%          200        5.5/5.6/97%       180
(i)core-osx-1.83GHz    855        4.6/4.75/97%       217        7/7.08/99%        142
         -O2                      3.2/3.2/99%        312        3.6/3.6/99%       277
   -O2 2 parallel jobs        2 x 3.9/4.3/90%                 2x  4.7/5/91%       200
(j)xeon-osx           2600                                      2.6/2.6/98%       384
       2 parallel jobs                                     ~2x  3.6/4.1/87%       250
       4 parallel jobs

(p)ibm-aix-regatta 1750/730       6.25/18.9/33%      160        5.25/15.7/33%     190
(q)ibm-aix-meso    3600/1250      3.95/4.3/91%       250        3.82/4./94%       260
       2 parallel jobs                                     ~2x  3.88/4.2/92%      250


(s)sgi-magique         460        22/22/98%           42         24.5/25/99%       40           
 -----------------------------------------------------------------------------------


D/ Matrix inversion with LAPACK
-------------------------------

lpk inverse 1000,1000 0 
---> computation time for inversion with LAPACK 
-------------------------------------------------------------------------------------------
             CPU/Elap/%       (1)         
-------------------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz          5.6/~100%    
(b)xeon-lx-2.8GHz 
(c)amd-lx                  5.5/5.5/99%      

(d)osf-asc 
(e')cool                   2.8/2.9/95% 
(f)superosf           

(h)G4-osx-1.25GHz 
(g)G5-osx-1GHz             0.8/~100%
    -tune=G5            
(i)core-osx-1.83GHz 
              -O2   
(p)ibm-aix-regatta        
(q)ibm-aix-meso            0.55/~100%

--------------------------------------------------------------------------------------------


K/ Efficiency of lock (mutex) handling with threads and arrays
--------------------------------------------------------------
(32 threads operating on 2000 vectors ~ 64000 lock/unlock/wait/broadcast)

(1) time zthr syncp 32 2000 4 
(2) time zthr sync 32 2000 4 
(3) time zthr syncp 4 15000 130 
(4) time zthr sync  4 15000 130 
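
The lock/unlock/wait/broadcast pattern being timed can be pictured with the
sketch below. It is only a guess at the general structure, not the zthr code:
each of N threads visits every vector in turn, and thread k may only touch a
vector once thread k-1 has finished with it, so that N threads on M vectors
give N x M critical sections (32 x 2000 = 64000 for the first command). The
class Slot and the functions worker and run are hypothetical names.

```python
import threading

class Slot:
    """One shared vector guarded by a condition variable (mutex + wait/broadcast)."""
    def __init__(self, length=4):
        self.cond = threading.Condition()
        self.stage = 0                  # index of the thread allowed to work next
        self.data = [0.0] * length

def worker(stage, slots):
    for slot in slots:
        with slot.cond:                         # lock (released on block exit)
            while slot.stage != stage:          # wait for the previous thread
                slot.cond.wait()
            slot.data = [x + 1.0 for x in slot.data]
            slot.stage += 1
            slot.cond.notify_all()              # broadcast to the waiting threads

def run(nthreads, nvec):
    slots = [Slot() for _ in range(nvec)]
    threads = [threading.Thread(target=worker, args=(k, slots)) for k in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return slots

slots = run(8, 200)
print(sum(s.stage for s in slots), "critical sections executed")
```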
-------------------------------------------------------------------------------------------
             CPU/Elap/%       (1)             (2)              (3)              (4) 
-------------------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz         23.5/14/168%    4.3/1.2/365%     7.9/5.5/142%      4/2.15/190%
      Before ThSafeOp     17/178%
      with -O3 -g 
(b)xeon-lx-2.8GHz (2)     
(c)amd-lx                 0.6/1/63%        0.6/1/60%       3.5/3.5/102%      2.6/2.7/98% 

(d)osf-asc               4.5/3.4/132%     3.35/2/170%     15.8/10.5/150%     13/8/163%
     5.4/100%(NoThSafe)
(e')cool                 1.3/1.37/95%     1.35/1.5/89%     5.3/5.3/99%       5.2/5.2/99%    
(f)superosf (1)          
(h)G4-osx-1.25GHz (1)    40.5/42.6/95%    42.2/43.5/97%   

(g)G5-osx-1GHz (2)     2.6/130% (NoThSafe)                   
    -tune=G5            
(i)core-osx-1.83GHz 
              -O2   
(j)xeon-osx               
      Before ThSafeOp     2.55/143%

(p)ibm-aix-regatta        4.7/111%
(q)ibm-aix-meso           7.5/2.8/300%     17/3.8/450%    8.2/3.05/270%      4.85/2.43/200%    

--------------------------------------------------------------------------------------------



L/ I/O and PPF 
-----------------
Writing/reading n=10^7 rows of int+6 doubles, total ~ 500 MB 
(1) time tstdtable w xx.ppf swap 10000000 1024 0
(2) time tstdtable r xx.ppf swap 10000000 1024 0
(3) time tstdtable w xx.ppf swap 50000000 1024 0
(4) time tstdtable r xx.ppf swap 50000000 1024 0
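
The quoted volume follows from the row layout: one 4-byte int plus six 8-byte
doubles is 52 bytes per row, i.e. about 520 MB for 10^7 rows. The sketch below
illustrates this with a raw binary write/read of such rows. It does not
reproduce the PPF format or the tstdtable program; the packed little-endian
layout and the function names are assumptions made for the example.

```python
import os
import struct
import tempfile

# ASSUMED layout: little-endian, no padding: 1 int32 + 6 float64 = 52 bytes
ROW = struct.Struct("<i6d")

def write_rows(path, n):
    """Write n rows (row index followed by six doubles) as raw binary."""
    with open(path, "wb") as f:
        for i in range(n):
            f.write(ROW.pack(i, *(i * 0.5 + k for k in range(6))))

def read_rows(path):
    """Read the file back as a list of 7-element tuples."""
    with open(path, "rb") as f:
        return [ROW.unpack(chunk) for chunk in iter(lambda: f.read(ROW.size), b"")]

print(ROW.size, "bytes per row,", ROW.size * 10**7 / 1e6, "MB for 10^7 rows")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rows.bin")
    write_rows(path, 1000)
    rows = read_rows(path)
    print(len(rows), "rows read back,", os.path.getsize(path), "bytes on disk")
```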

-------------------------------------------------------------------------------------------
             CPU/Elap/%       (1)              (2)              (3)                (4)
-------------------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz          17/26/63%       5.5/5.6/94%        
(b)xeon-lx-2.8GHz 
(c)amd-lx                  5.9/6./97%      3.4/3.4/100%      30/32/93%      24/165/13% ?

(d)osf-asc 
(e')cool                   15/30/50%       13/13/99%
(f)superosf           

(h)G4-osx-1.25GHz 
(g)G5-osx-1GHz 
    -tune=G5            
(i)core-osx-1.83GHz 
              -O2   
(p)ibm-aix-regatta        
(q)ibm-aix-meso           5.5/16.8/38%    5.7/13.2/43%      32.7/85/39%    29/60/49%

--------------------------------------------------------------------------------------------
