-------------------------------------------------------------------------------------
 Performance comparison of several machines for compilation / execution (computation)
                                   -----------------------
    Measurements made in January 2007,         R. Ansari / C. Magneville
-------------------------------------------------------------------------------------

(a) eros3 : dual-processor, dual-core Xeon @ 2.4 GHz, Linux (xeon-lx-2.4GHz), gcc 3.2
(b) ccali : dual-processor, dual-core Xeon @ 2.8 GHz, Linux (xeon-lx-2.8GHz), icc 8.0 or 9.0
(c) sgsda: AMD (amd-lx-

(d) asc: dual-processor Alpha (@ ~1 GHz) DS20 server, OSF (osf1), cxx 6.5 (osf-asc)
    (the new asc, at 420 MFLOPS, is less powerful than the old asc at 800 MFLOPS)
(e) xp1000-dapnia: Alpha xp1000 @ ~600 MHz ?, OSF1, cxx ? (osf-xp1000)
(f) superosf-dapnia: multi-processor AlphaServer ES80, 6 EV7 processors @ 1 GHz (super-osf)

(g) ccsvx01: dual-processor XServe G5 @ ~1 GHz (Darwin/OSX) (G5-osx-1GHz), gcc 3.3
(h) PowerBook-Reza: Apple G4 @ 1.25 GHz (G4-osx-1.25GHz), gcc 3.3
(i) MacBook-Reza: Apple / dual-core Intel Core @ 1.83 GHz (core-osx-1.83GHz), gcc 4
(j) MacPro-Grosdidier: Apple / 2 dual-core Xeon @ 3 GHz, gcc 4.0.1, SOPHYA compiled with -O2 -g

(p) IBM-AIX regatta , xlC , 


NOTES:
- On the Xeon machines, processes / threads interact with each other in the way
they occupy the CPUs. Multi-thread / multi-task runs lose a factor of 3 in performance.
The MacPro machine running OSX nevertheless copes better.
- Is this an effect of the operating system or of the motherboard ???
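
One possible way to read the factor quoted in the notes is as the growth of CPU time per thread when several threads run at once. The sketch below computes that ratio from the CPU seconds listed in table B.1 for the 1-thread and 4-thread runs. It assumes that every zthr thread performs the same amount of work, which these notes do not state explicitly; the function name per_thread_slowdown is illustrative only.

```python
def per_thread_slowdown(cpu_1thread, cpu_nthreads, n_threads):
    """CPU time per thread with n threads, relative to the 1-thread run.

    Assumption (not stated in the notes): each thread does the same work,
    so an ideal machine would give a ratio of 1.0.
    """
    return (cpu_nthreads / n_threads) / cpu_1thread


# CPU seconds taken from table B.1: (1-thread run, 4-thread run)
timings = {
    "xeon-lx-2.8GHz": (2.3, 26.0),
    "xeon-osx": (2.5, 11.5),
}

slowdowns = {
    name: per_thread_slowdown(cpu1, cpu4, 4)
    for name, (cpu1, cpu4) in timings.items()
}

for name, ratio in slowdowns.items():
    print(f"{name}: {ratio:.2f}x CPU per thread")
```

Under that assumption the ratio is about 2.8 for the 2.8 GHz Xeon under Linux and 1.15 for the MacPro.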

A/ Compiling all of SOPHYA
----------------------------
csh> time make all   (1)
or
csh> time make -j 2 all  (2)
  CPU time (TCPU)
  Performance index: 100*(1000/TCPU)
  Elapsed (wall-clock) time
  TCPU / elapsed time
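
As a small worked example of the two derived columns, the sketch below recomputes them from a CPU time and an elapsed time in seconds. The ratio is computed as it is tabulated, that is CPU time divided by elapsed time; the function name compile_metrics is illustrative only.

```python
def compile_metrics(cpu_s, elapsed_s):
    """Return (performance index, CPU/elapsed ratio in percent)."""
    perf_index = 100.0 * (1000.0 / cpu_s)
    cpu_over_elapsed_pct = 100.0 * cpu_s / elapsed_s
    return perf_index, cpu_over_elapsed_pct


# Row (a) xeon-lx-2.4GHz of the table: 615 s CPU, 410 s elapsed
index, ratio = compile_metrics(615.0, 410.0)
print(f"IndPerf = {index:.1f}, TCPU/Elapsed = {ratio:.0f}%")
```

For that row this gives an index of about 162.6 and a ratio of 150%; a ratio above 100% means that more than one CPU was busy during the build.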


----------------------------------------------------------------------
                         CPU(s)  IndPerf   TElapsed , TCPU/Elapsed %
----------------------------------------------------------------------
(a)xeon-lx-2.4GHz (2)    615 s      162       410 s        150%  
      with -O3 -g (2)   1300 s       77       760 s        172%  
(b)xeon-lx-2.8GHz (2)    755 s      132       540 s        140%
(c)amd-lx

(d)osf-xp1000 (1)        533 s      187       660 s        80%
(e)superosf (1)          895 s      112       910 s        98%
(f)osf-asc (1)          1920 s       52      2340 s        83%   (??)

(f)G4-osx-1.25GHz (1)    660 s      151       710 s        93%
(h)G5-osx-1GHz (2)       453 s      221       250 s        182%
    -tune=G5            1100 s       90
(i)core-osx-1.83GHz (2)  209 s      478       116 s        180%
              -O2   (1)  367 s      272       381 s        96%    
(j)xeon-osx

(p)ibm-aix
----------------------------------------------------------------------

Shared library sizes:
(a) 
(f) = (e) = 57 MB
(h) 80 MB
(i) 83 MB

B/ Raw computation with / without threads
------------------------------------------

B.1/ arr = c1*a1+c2*a2
(1) time cpupower 2     # compiled with -O3
(2) time zthr arr 1 1000   1 thread
(3) time zthr arr 2 1000   2 threads
(4) time zthr arr 4 1000   4 threads
(5) time zthr arr 6 1000   6 threads
(6) time zthr arr 8 1000   8 threads
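
To make the CPU/Elap/% columns of the table below concrete, here is a minimal sketch of how such a triple can be measured for a threaded arr = c1*a1+c2*a2 workload. It is not the zthr program: the function names, array size and repeat count are all hypothetical, and the interpretation of each thread repeating the same computation is an assumption. In standard CPython the global interpreter lock keeps pure-Python threads from running in parallel, so the ratio stays near 100% there.

```python
import threading
import time


def linear_combination(c1, a1, c2, a2):
    """Element-wise c1*a1 + c2*a2 on two equal-length sequences."""
    return [c1 * x + c2 * y for x, y in zip(a1, a2)]


def run_threads(n_threads, size=10_000, repeats=5):
    """Run the same workload in n_threads threads.

    Returns (cpu_seconds, elapsed_seconds, per-thread results).
    """
    a1 = [float(i) for i in range(size)]
    a2 = [2.0 * i for i in range(size)]
    results = [None] * n_threads

    def worker(k):
        for _ in range(repeats):
            results[k] = linear_combination(3.0, a1, 0.5, a2)

    cpu_start = time.process_time()      # CPU time of the whole process
    wall_start = time.perf_counter()     # elapsed (wall-clock) time
    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return cpu, elapsed, results


cpu, elapsed, _ = run_threads(2)
print(f"{cpu:.2f}/{elapsed:.2f}/{100.0 * cpu / elapsed:.0f}%")
```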

-----------------------------------------------------------------------------------
                     (1)MFLOPS  (2)CPU/Elap/% (2)IndPerf  (3)CPU/Elap/% (3)IndPerf
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz     1167        5.15/5.2/99%              11.4/5.8/196%
                                           (4)36.6/9.28/394%
        -O3 -g                    4.9/5./99%
(b)xeon-lx-2.8GHz      920        2.3/2.3/100%              6.2/3.1/198%
                                           (4)26/6.6/396%
(c)amd-lx              690        3.6/3.6/99%               6.8/4/171%
                                           (4)13.5/7/193%
                                           (5)20.3/10.23/198%

(d)osf-xp1000          648        5.1/5.3/96.6%            11.4/11.4/99%       
                                           (4)25.2/25.5/99%
(e)superosf            842        2.87/2.88/99.6%          6.25/4.1/153%
                                           (4)11.6/3.06/379%          
(f)osf-asc             420        6.3s/6.5s/99%            16.9/8.8/192%   
                                           (4)29.9/15.7/191%

(f)G4-osx-1.25GHz      333        44s/48s/91%              86.7/99.8/92%
(h)G5-osx-1GHz        1151        20s/20s/99%              40s/23s/170%
                                           (4) 80.8/45/180%
   -tune=G5                       3.35/3.8/88%             7.1/3.6/196%
                                           (4) 14/7.5/187%
(i)core-osx-1.83GHz    855        11.5/11.5/100%           23/11.6/192%
                                           (4) 46/23/199%
              -O2                 3.85/3.89/99%            7.7/3.9/198%
                                           (4) 15.4/7.77/198%

(j)xeon-osx           2600        2.5/2.5/100%             5.1/2.6/199%
                                           (4) 11.5/3.2/362%
                                           (5) 17.4/4.77/365%

(p)ibm-aix-regatta     700        6.8/6.9/98%              13.1/6.75/195%
                                           (4) 26.3/11.7/225%

 -----------------------------------------------------------------------------------


B.2/ Matrix multiplication mtx = mtx1 * mtx2 

(1) time cpupower 2 
(2) time zthr mtx 1 1000   1 thread
(3) time zthr mtx 2 1000   2 threads
(4) time zthr mtx 4 1000   4 threads
(5) time zthr mtx 6 1000   6 threads
(6) time zthr mtx 8 1000   8 threads

-----------------------------------------------------------------------------------
                     (1)MFLOPS  (2)CPU/Elap/% (2)IndPerf  (3)CPU/Elap/% (3)IndPerf
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz     1167        6.5/6.5/100%               17.4/8.8/198%
                                           (4) 80.5/20.3/397%
                                           (5) 114.5/29.6/387%
                                           (6) 160/40.3/387%
                                           
(b)xeon-lx-2.8GHz      920        3.4/3.4/100%               12/6.1/199%
                                           (4) 55.8/14/400%
(c)amd-lx              690        6.98/6.98/100%             14.1/8.15/173%
                                           (4) 27.7/14.23/194%
                                           (5) 41.4/21.07/196%
                                           (6) 55.4/27.9/198.7%  

(d)osf-xp1000          648        13/14.1/92%                27.1/27.4/99%
                                           (4) 54/54.7/99.6%
                                           (5) 80.6/81/99.6%
                                           (6) 107.8/108.3/99.5%
(e)superosf            842        6.1/7.24/84%               12.35/6.29/196%
                                           (4) 24.3/6.31/385%
                                           (5) 36.5/10.9/335%
                                           (6) 50.1/18.15/276%
(f)osf-asc             420        13.5s/13.7s/98%            32/16.5/194%   
                                           (4) 67.5/34.4/196%

(f)G4-osx-1.25GHz      333        
(h)G5-osx-1GHz        1151        23/23.7/97%                46.5/27.5/170%
                                           (4) 93.4/49.4/189%
  -tune=G5                        5.7/5.8/98%                13.3/6.8/197%
                                           (4) 26.8/13.56/197%
                                           (6) 53.8/27.25/197%
(i)core-osx-1.83GHz    855        12.6/12.7/100%             25.8/13.4/194% 
                                           (4) 51.6/26/199%
            -O2                   4.25/4.5/94%               10.6/5.36/198%
                                           (4) 20.87/10.68/198%
      -O2, 2 parallel jobs    2 x 5/5.4/92%
(j)xeon-osx           2600        2.8/2.8/99%                9.3/4.66/199%
                                           (4) 31.4/8.6/364%
                                           (5) 47.1/12.96/364%
                                           (6) 62.8/17.38/362%

(p)ibm-aix-regatta     700        9.5/9.7/98%                18.3/16.0/114%
                                           (4) 38.3/24.7/155%
 -----------------------------------------------------------------------------------

B.3/ FFT tests (FFTW, FFTPack)

(1) time cpupower 2 
(2) time tfft 2000000 W d 0 0 (with FFTW)
(3) time tfft 2000000 P d 0 0 (with FFTPack_Sophya)

IndPerf=1000/TCPU  
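
The cells of the table below can be checked mechanically. The following sketch splits a CPU/Elap/% cell into its three numbers and recomputes IndPerf=1000/TCPU from the CPU time; the helper names parse_cell and fft_perf_index are illustrative only and are not part of SOPHYA.

```python
def parse_cell(cell):
    """Split a 'CPU/Elap/%' cell into (cpu_s, elapsed_s, percent).

    Accepts an optional trailing 's' on the two times, as some rows use it.
    """
    cpu, elapsed, pct = (part.strip() for part in cell.split("/"))
    return float(cpu.rstrip("s")), float(elapsed.rstrip("s")), float(pct.rstrip("%"))


def fft_perf_index(cell):
    """Performance index for this section: 1000 / CPU time in seconds."""
    cpu_s, _, _ = parse_cell(cell)
    return 1000.0 / cpu_s


# Row (p) ibm-aix-regatta, column (2)
cpu_s, elapsed_s, pct = parse_cell("6.25/18.9/33%")
print(round(fft_perf_index("6.25/18.9/33%")), round(100 * cpu_s / elapsed_s))
```

For that row the recomputed values are 160 and 33, matching the tabulated index and percentage.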


-----------------------------------------------------------------------------------
                     (1)MFLOPS     (2)CPU/Elap/%   (2)IndPerf   (3)CPU/Elap/%  (3)IndPerf
-----------------------------------------------------------------------------------
(a)xeon-lx-2.4GHz     1167        5.5/5.6/97%        180         7.4/7.4/100%     135    
       -O3 -g                                                    6.8/7.1/96%      147    
(b)xeon-lx-2.8GHz      920     
(c)amd-lx              690                                       4.2/4.2/100%     238
                                                            ~2x  4.7/4.7/99%

(d)osf-xp1000          648        9.9/10.2/97%       101        9.3/9.46/98.5%    107     
(e)superosf            842        6./6.22/96.7%      166        5.1/5.18/97.4%    196     
(f)osf-asc             420        13.3/13.9/95.7%     75        12.2/17.6/70%     82

(f)G4-osx-1.25GHz      333        15.2/15.9/96%       66        23.8/34.1/70%     42
(h)G5-osx-1GHz        1151        8.5/8.7/97.5%      120        11.5/11.6/99%     87
    -tune=G5                      5/5.1/98%          200        5.5/5.6/97%       180
(i)core-osx-1.83GHz    855        4.6/4.75/97%       217        7/7.08/99%        142
         -O2                      3.2/3.2/99%        312        3.6/3.6/99%       277
   -O2, 2 parallel jobs       2 x 3.9/4.3/90%                 2x  4.7/5/91%       200     
(j)xeon-osx           2600                                      2.6/2.6/98%       384
       2 parallel jobs                                     ~2x  3.6/4.1/87%       250
       4 parallel jobs

(p)ibm-aix-regatta     700        6.25/18.9/33%      160        5.25/15.7/33%     190

 -----------------------------------------------------------------------------------
