  - r0: OMP1 - option thread_filter-threshold (1%) discards 7 threads, cumulating 0.12 seconds CPU time.
 
  - r1: OMP2 - option thread_filter-threshold (1%) discards 8 threads, cumulating 0.11 seconds CPU time.
 
  - r2: OMP4 - option thread_filter-threshold (1%) discards 8 threads, cumulating 0.13 seconds CPU time.
 
  - r3: OMP8 - option thread_filter-threshold (1%) discards 8 threads, cumulating 0.12 seconds CPU time.
 
  - r4: OMP16 - option thread_filter-threshold (1%) discards 8 threads, cumulating 0.13 seconds CPU time.
 
  - r5: OMP24 - option thread_filter-threshold (1%) discards 8 threads, cumulating 0.12 seconds CPU time.
 
| Metric | r0 | r1 | r2 | r3 | r4 | r5 | 
|---|---|---|---|---|---|---|
| Total Time (s) | 748.15 | 423.00 | 249.76 | 164.57 | 113.53 | 91.74 | 
| Max (Thread Active Time) (s) | 727.42 | 410.15 | 241.97 | 158.17 | 108.24 | 86.43 | 
| Average Active Time (s) | 726.56 | 397.74 | 224.29 | 140.94 | 87.72 | 67.75 | 
| Activity Ratio (%) | 97.1 | 94.2 | 90.1 | 86.1 | 78.0 | 74.7 | 
| Average number of active threads | 7.769 | 15.044 | 28.737 | 54.812 | 98.900 | 141.787 | 
| Affinity Stability (%) | 15.1 | 14.0 | 22.5 | 33.1 | 47.2 | 42.5 | 
| GFLOPS | 82.850 | 146.385 | 247.933 | 375.890 | 545.237 | 674.333 | 
| Time in analyzed loops (%) | 2.05 | 1.36 | 0.91 | 0.58 | 0.43 | 0.35 | 
| Time in analyzed innermost loops (%) | 2.04 | 1.35 | 0.90 | 0.58 | 0.43 | 0.35 | 
| Time in user code (%) | 99.4 | 95.9 | 90.8 | 86.1 | 73.2 | 68.4 | 
| Compilation Options Score (%) | 100 | 100 | 100 | 100 | 100 | 100 | 
| Array Access Efficiency (%) | 55.3 | 52.9 | 52.6 | 50.5 | 50.4 | 50.1 | 
| Potential Speedups |  |  |  |  |  |  |
| Perfect Flow Complexity | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 
| Perfect OpenMP + MPI + Pthread | 1.00 | 1.00 | 1.00 | 1.00 | 1.01 | 1.01 | 
| Perfect OpenMP + MPI + Pthread + Perfect Load Distribution | 1.00 | 1.07 | 1.19 | 1.30 | 1.68 | 1.86 | 
| Scalability - Gap | 1.00 | 1.13 | 1.34 | 1.76 | 2.43 | 2.94 | 
| No Scalar Integer: Potential Speedup | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| No Scalar Integer: Nb Loops to get 80% | 1 | 1 | 1 | 1 | 1 | 1 |
| FP Vectorised: Potential Speedup | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| FP Vectorised: Nb Loops to get 80% | 1 | 1 | 1 | 1 | 1 | 1 |
| Fully Vectorised: Potential Speedup | 1.01 | 1.01 | 1.01 | 1.00 | 1.00 | 1.00 |
| Fully Vectorised: Nb Loops to get 80% | 3 | 3 | 3 | 3 | 3 | 3 |
| Only FP Arithmetic: Potential Speedup | 1.02 | 1.01 | 1.01 | 1.00 | 1.00 | 1.00 |
| Only FP Arithmetic: Nb Loops to get 80% | 3 | 3 | 3 | 3 | 3 | 3 |
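
The "Scalability - Gap" row can be cross-checked against the "Total Time (s)" row. The sketch below assumes that the gap is the measured time divided by the ideal time obtained by scaling r0 linearly with the thread count (thread counts taken from the "Number of threads observed" row of the configuration table). This formula is an inference that happens to reproduce the reported values to two decimals; it is not taken from MAQAO documentation. The function name `scaling_summary` is hypothetical.

```python
# Values copied from the report: run names, observed thread counts, total times.
runs = ["r0", "r1", "r2", "r3", "r4", "r5"]
threads = [8, 16, 32, 64, 128, 192]
total_time_s = [748.15, 423.00, 249.76, 164.57, 113.53, 91.74]


def scaling_summary(threads, times):
    """Return (speedup, efficiency, gap) per run, relative to the first run.

    Assumption: ideal scaling means time shrinks in proportion to the
    thread count, so gap = measured time / ideal time.
    """
    n0, t0 = threads[0], times[0]
    rows = []
    for n, t in zip(threads, times):
        speedup = t0 / t            # how much faster than the reference run
        ideal = n / n0              # speedup expected under linear scaling
        efficiency = speedup / ideal
        gap = ideal / speedup       # equals t / (t0 * n0 / n)
        rows.append((speedup, efficiency, gap))
    return rows


summary = scaling_summary(threads, total_time_s)
for name, (speedup, efficiency, gap) in zip(runs, summary):
    print(f"{name}: speedup {speedup:5.2f}  efficiency {efficiency:6.1%}  gap {gap:4.2f}")
```

Under this assumption, a gap of 2.94 for r5 means the run takes about 2.94 times longer than it would if the r0 time scaled perfectly to 24 times as many threads.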
 
| Source Object | Issue |
|---|---|
| xhpl |  |
| HPL_lmul.c |  |
| HPL_rand.c |  |
| HPL_dlaswp03N.c |  |
| HPL_bcast.c |  |
| HPL_dlacpy.c |  |
| HPL_dlaswp04N.c |  |
| HPL_1ring.c |  |
| HPL_setran.c |  |
| HPL_ladd.c |  |
| HPL_pdlange.c |  |
| HPL_pdgesv0.c |  |
| HPL_dlaswp02N.c |  |
| HPL_dlaswp01N.c |  |
 
 
 
 
 
 
  
 
 
|  | r0 | r1 | r2 | r3 | r4 | r5 |
|---|---|---|---|---|---|---|
| Experiment Name |  |  |  |  |  |  | 
| Application | ./hpl-2.3/bin/Linux_Intel64_Zen5_AOCL/xhpl | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Timestamp | 2025-06-23 10:53:13 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Experiment Type | MPI | MPI; OpenMP | same as r1 | same as r1 | same as r1 | same as r1 |
| Machine | gmz12.benchmarkcenter.megware.com | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Architecture | x86_64 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Micro Architecture | ZEN_V5 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Model Name | AMD EPYC 9655 96-Core Processor | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Cache Size | 1024 KB | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Number of Cores | 96 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Maximal Frequency | 4.509375 GHz | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| OS Version | Linux 5.14.0-503.31.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Mar 13 06:50:51 EDT 2025 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Architecture used during static analysis | x86_64 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Micro Architecture used during static analysis | ZEN_V5 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Compilation Options | xhpl: AMD clang version 17.0.6 (CLANG: AOCC_5.0.0-Build#1377 2024_09_24) /cluster/comp/aocc/5.0.0/bin/clang-17 -o HPL_rand.o -c -D Add__ -D F77_INTEGER=int -D StringSunStyle -D HPL_DETAILED_TIMING -D HPL_PROGRESS_REPORT -I /beegfs/hackathon/users/eoseret/linpack/hpl-2.3/include -I /beegfs/hackathon/users/eoseret/linpack/hpl-2.3/include/Linux_Intel64_Zen5_AOCL -I /cluster/libs/aocl/5.0.0/aocc/include -fopenmp -O3 -ffast-math -g -grecord-command-line -march=znver5 -mprefer-vector-width=512 -Wall ../HPL_rand.c -I /cluster/hpcx/2.22/ompi-aocc/include -I /cluster/hpcx/2.22/ompi-aocc/include/openmpi -I /cluster/hpcx/2.22/ompi-aocc/include/openmpi/opal/mca/event/libevent2022/libevent -I /cluster/hpcx/2.22/ompi-aocc/include/openmpi/opal/mca/event/libevent2022/libevent/include | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
| Number of processes observed | 8 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Number of threads observed | 8 | 16 | 32 | 64 | 128 | 192 | 
| Frequency Driver | acpi-cpufreq | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Frequency Governor | ondemand | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Huge Pages | always | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Hyperthreading | on | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Number of sockets | 2 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Number of cores per socket | 96 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| MAQAO version | 2025.1.0 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| MAQAO build | b107544c0173fc3785aa7d997ff783dc12b975d2::20250527-133805 | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 | 
| Comments | HPL benchmark compiled with AMD AOCC/AOCL 5.0. Matrix order: 100K, block size 384. Run on AMD Zen 5 with 8 NUMA nodes and 24 cores per NUMA node | same as r0 | same as r0 | same as r0 | same as r0 | same as r0 |
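
The run labels (OMP1 to OMP24) can be related to the configuration rows with a little arithmetic. The sketch below divides the observed thread count by the number of processes and checks the core counts stated in the table. Reading "OMPn" as n OpenMP threads per MPI process is an assumption based on the labels, and the variable names are hypothetical.

```python
# Values copied from the configuration table.
processes = 8
threads_observed = {"r0": 8, "r1": 16, "r2": 32, "r3": 64, "r4": 128, "r5": 192}
sockets, cores_per_socket = 2, 96
numa_nodes, cores_per_numa_node = 8, 24  # from the Comments row

# Threads per MPI process, assuming threads are spread evenly over processes.
threads_per_process = {run: n // processes for run, n in threads_observed.items()}

# The two descriptions of the machine should give the same physical core count.
physical_cores = sockets * cores_per_socket
assert physical_cores == numa_nodes * cores_per_numa_node

for run, per_proc in threads_per_process.items():
    share = threads_observed[run] / physical_cores
    print(f"{run}: {per_proc:2d} threads per process, "
          f"{threads_observed[run]:3d} threads on {physical_cores} cores ({share:.0%})")
```

Under these assumptions the per-process counts come out as 1, 2, 4, 8, 16 and 24, matching the OMP suffixes, and r5 uses as many threads as the machine has physical cores.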