  r0: OMP1 
  r1: OMP2 
  r2: OMP4 
  r3: OMP8 
  r4: OMP16 
  r5: OMP32 
 
Metric                                r0        r1        r2        r3        r4        r5
Total Time (s)                        1.16E3    660.97    403.38    277.98    216.25    190.32
Max (Thread Active Time) (s)          1.09E3    621.67    374.46    253.64    194.44    170.90
Average Active Time (s)               1.09E3    575.45    313.90    185.90    122.87    96.44
Activity Ratio (%)                    93.9      87.1      77.9      67.0      57.0      50.8
Average number of active threads      5.635     10.447    18.676    32.099    54.548    97.291
Affinity Stability (%)                65.9      71.3      87.0      91.1      92.9      91.9
GFLOPS                                577.119   1.01E3    1.66E3    2.41E3    3.10E3    3.52E3
Time in analyzed loops (%)            90.0      85.8      80.5      71.5      60.1      51.5
Time in analyzed innermost loops (%)  88.9      84.7      79.4      70.5      59.2      50.8
Time in user code (%)                 7.76      7.36      6.69      5.65      4.29      2.77
Compilation Options Score (%)         100       100       100       100       100       100
Array Access Efficiency (%)           51.0      50.8      50.7      50.7      50.7      50.7
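As a cross-check on the table above, the speedup of each run over r0 can be computed both from Total Time and from GFLOPS, and dividing the time-based speedup by the growth in thread count gives a parallel efficiency. The sketch below is illustrative Python; the variable and function names are hypothetical, the thread counts are taken from the "Number of threads observed" row of the experiment information, and r0's time is only available rounded to 1.16E3 s, so every r0-relative figure carries up to about half a percent of rounding error.

```python
# Illustrative sketch: values copied from the report; r0's Total Time is rounded (1.16E3 s).
RUNS = ["r0", "r1", "r2", "r3", "r4", "r5"]
TOTAL_TIME_S = [1.16e3, 660.97, 403.38, 277.98, 216.25, 190.32]
GFLOPS = [577.119, 1.01e3, 1.66e3, 2.41e3, 3.10e3, 3.52e3]
THREADS = [6, 12, 24, 48, 96, 192]  # "Number of threads observed"


def speedups(values, higher_is_better):
    """Speedup of every run relative to the first one (r0)."""
    base = values[0]
    return [(v / base) if higher_is_better else (base / v) for v in values]


time_speedup = speedups(TOTAL_TIME_S, higher_is_better=False)
gflops_speedup = speedups(GFLOPS, higher_is_better=True)

# Parallel efficiency: achieved speedup divided by the increase in thread count.
efficiency = [s / (n / THREADS[0]) for s, n in zip(time_speedup, THREADS)]

for run, ts, gs, eff in zip(RUNS, time_speedup, gflops_speedup, efficiency):
    print(f"{run}: time speedup {ts:5.2f}x  GFLOPS speedup {gs:5.2f}x  efficiency {eff:6.1%}")
```

The two speedup columns agree to within rounding, so the GFLOPS row tracks wall time. Efficiency drops from 100% at r0 to about 19% at r5, in line with the declining Activity Ratio row.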
 
Potential Speedups                                          r0      r1      r2      r3      r4      r5
Perfect Flow Complexity                                     1.00    1.00    1.00    1.00    1.00    1.00
Perfect OpenMP + MPI + Pthread                              1.04    1.10    1.16    1.23    1.32    1.35
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution  1.05    1.19    1.40    1.81    2.51    3.32
Scalability - Gap                                           1.00    1.14    1.39    1.92    2.98    5.25
No Scalar Integer Potential Speedup                         1.00    1.00    1.00    1.00    1.00    1.00
  Nb Loops to get 80%                                       2       2       2       2       2       2
FP Vectorised Potential Speedup                             1.00    1.00    1.00    1.00    1.00    1.00
  Nb Loops to get 80%                                       1       1       1       1       1       1
Fully Vectorised Potential Speedup                          1.03    1.02    1.02    1.02    1.01    1.01
  Nb Loops to get 80%                                       4       3       3       3       3       3
Only FP Arithmetic Potential Speedup                        1.04    1.03    1.03    1.03    1.02    1.02
  Nb Loops to get 80%                                       5       5       5       5       5       5
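The Scalability - Gap values appear to be reproducible as each run's Total Time divided by the time that perfect linear scaling from r0 would give, i.e. gap = T_run / (T_r0 × threads_r0 / threads_run). This formula is inferred from the numbers in this report, not taken from MAQAO documentation. The illustrative Python sketch below (hypothetical names; T_r0 taken as 1.16E3 s because that is the only precision the report gives) checks it against the reported row.

```python
# Inferred, not documented by MAQAO: gap = T_run / (T_r0 * threads_r0 / threads_run).
TOTAL_TIME_S = [1.16e3, 660.97, 403.38, 277.98, 216.25, 190.32]  # r0 value is rounded
THREADS = [6, 12, 24, 48, 96, 192]  # "Number of threads observed"
REPORTED_GAP = [1.00, 1.14, 1.39, 1.92, 2.98, 5.25]  # "Scalability - Gap" row


def scalability_gap(times, threads):
    """Ratio of measured time to the ideal time under linear scaling from the first run."""
    t0, n0 = times[0], threads[0]
    return [t / (t0 * n0 / n) for t, n in zip(times, threads)]


gap = scalability_gap(TOTAL_TIME_S, THREADS)
for i, (computed, reported) in enumerate(zip(gap, REPORTED_GAP)):
    print(f"r{i}: computed {computed:.2f}, reported {reported:.2f}")
```

Under this reading, the reciprocal of the gap is the parallel efficiency relative to r0, about 19% for r5.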
 
Source Object Issue
xhpl
  HPL_lmul.c
  HPL_rand.c
  HPL_dlaswp02N.c
  HPL_bcast.c
  HPL_dlaswp04N.c
  HPL_1ring.c
  HPL_setran.c
  HPL_ladd.c
  HPL_dlaswp03N.c
  HPL_pdgesv0.c
  HPL_pdlange.c
  HPL_dlaswp01N.c
 
Experiment information for runs r0 to r5 (a single value applies to all six runs)
Experiment Name: (empty)
Application: ./hpl-2.3/bin/Linux_Intel64/xhpl
Timestamp: 2025-06-23 09:35:51
Experiment Type: r0: MPI; r1 to r5: MPI and OpenMP
Machine: isix06.benchmarkcenter.megware.com
Architecture: x86_64
Micro Architecture: GRANITE_RAPIDS
Model Name: Intel(R) Xeon(R) 6972P
Cache Size: 491520 KB
Number of Cores: 96
Maximal Frequency: 3.9 GHz
OS Version: Linux 5.14.0-503.19.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jan 7 17:08:27 EST 2025
Architecture used during static analysis: x86_64
Micro Architecture used during static analysis: GRANITE_RAPIDS
Compilation Options:
  xhpl: clang based Intel(R) oneAPI DPC++/C++ Compiler 2025.0.0 (2025.0.0.20241008)  --intel -I /beegfs/hackathon/users/eoseret/linpack/hpl-2.3/include -I /beegfs/hackathon/users/eoseret/linpack/hpl-2.3/include/Linux_Intel64 -I /cluster/intel/oneapi/2025.0.0/mkl/2025.0/mkl/include -I /cluster/intel/oneapi/2025.0.0/mpi/2021.14/include -o HPL_lmul.o -c -D Add__ -D F77_INTEGER=int -D StringSunStyle -D HPL_DETAILED_TIMING -D HPL_PROGRESS_REPORT -O3 -g -x Host -mprefer-vector-width=512 -Wall -fstrict-aliasing ../HPL_lmul.c -fveclib=SVML -fheinous-gnu-extensions
Number of processes observed: 6
Number of threads observed: r0: 6, r1: 12, r2: 24, r3: 48, r4: 96, r5: 192
Frequency Driver: intel_pstate
Frequency Governor: powersave
Huge Pages: always
Hyperthreading: on
Number of sockets: 2
Number of cores per socket: 96
MAQAO version: 2025.1.0
MAQAO build: b107544c0173fc3785aa7d997ff783dc12b975d2::20250527-133805
Comments: HPL benchmark compiled with Intel oneAPI 2025.0, using Intel MPI and MKL. Matrix order: 100K, block size 384. Run on Intel GNR with 6 NUMA nodes and 32 cores per NUMA node.
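The thread counts follow from the run labels if "OMPk" is read as k OpenMP threads per MPI rank. That reading is an assumption, but with the 6 observed processes it reproduces the "Number of threads observed" row exactly. The illustrative Python sketch below (hypothetical names) also relates each run to the 192 physical cores implied by the Comments row (6 NUMA nodes × 32 cores, which matches 2 sockets × 96 cores per socket). With 6 ranks and 6 NUMA nodes, r5 (OMP32) would fill one NUMA node per rank if ranks were pinned one per node; the report does not show the actual pinning.

```python
# Illustrative sketch: assumes "OMPk" means k OpenMP threads per MPI rank.
MPI_RANKS = 6  # "Number of processes observed"
OMP_THREADS_PER_RANK = {"r0": 1, "r1": 2, "r2": 4, "r3": 8, "r4": 16, "r5": 32}
NUMA_NODES = 6
CORES_PER_NUMA_NODE = 32
PHYSICAL_CORES = NUMA_NODES * CORES_PER_NUMA_NODE  # 192, same as 2 sockets x 96 cores


def core_usage(ranks, threads_per_rank, cores):
    """Return {run: (total threads, fraction of physical cores used)}."""
    usage = {}
    for run, k in threads_per_rank.items():
        threads = ranks * k
        usage[run] = (threads, threads / cores)
    return usage


usage = core_usage(MPI_RANKS, OMP_THREADS_PER_RANK, PHYSICAL_CORES)
for run, (threads, fraction) in usage.items():
    print(f"{run}: {threads:3d} threads = {fraction:6.1%} of physical cores")
```

Only r5 uses every physical core; r4 uses half of them, so the r4 to r5 step doubles threads while also removing any idle cores.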