 
  r0: OMP1 
  r1: OMP2 
  r2: OMP4 
  r3: OMP8 
  r4: OMP16 
  r5: OMP24 
 
Metric                                   r0       r1       r2       r3       r4       r5
Total Time (s)                           4.73E3   2.68E3   1.64E3   979.98   628.93   508.50
Max (Thread Active Time) (s)             4.71E3   2.66E3   1.62E3   964.37   613.34   492.24
Average Active Time (s)                  4.71E3   2.61E3   1.54E3   880.25   525.13   402.11
Activity Ratio (%)                       99.6     97.4     94.5     89.8     83.5     79.1
Average number of active threads         3.984    7.791    15.115   28.743   53.437   75.914
Affinity Stability (%)                   100.0    100.0    99.9     99.9     99.8     99.8
GFLOPS                                   9.776    19.104   36.943   69.268   125.769  139.855
Time in analyzed loops (%)               0.42     0.36     0.30     0.27     0.23     0.20
Time in analyzed innermost loops (%)     0.41     0.36     0.30     0.27     0.22     0.20
Time in user code (%)                    2.35     2.10     1.76     1.55     1.30     1.14
Compilation Options Score (%)            100      100      100      100      100      100
Array Access Efficiency (%)              55.2     55.2     55.1     55.3     54.9     55.1
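
The Activity Ratio and Average number of active threads rows appear to follow from the time rows and the thread counts. The sketch below is a reading of the numbers in this report, not a definition taken from MAQAO: it assumes Activity Ratio = Average Active Time / Total Time and Average number of active threads = Activity Ratio x Number of threads observed. The function names are illustrative only. Because the first three time columns are rounded to three significant digits, the recomputed ratios for r0 to r2 can differ from the reported ones by up to about one point.

```python
# Values copied from the metrics table above (runs r0..r5).
total_time = [4.73e3, 2.68e3, 1.64e3, 979.98, 628.93, 508.50]  # Total Time (s)
avg_active = [4.71e3, 2.61e3, 1.54e3, 880.25, 525.13, 402.11]  # Average Active Time (s)
threads = [4, 8, 16, 32, 64, 96]                               # Number of threads observed


def activity_ratio(avg_active_s, total_s):
    """Assumed relation: share of the wall time a thread is active, in percent."""
    return 100.0 * avg_active_s / total_s


def average_active_threads(ratio_percent, n_threads):
    """Assumed relation: activity ratio scaled by the number of threads."""
    return ratio_percent / 100.0 * n_threads


ratios = [activity_ratio(a, t) for a, t in zip(avg_active, total_time)]
active = [average_active_threads(r, n) for r, n in zip(ratios, threads)]

for run, (r, a) in enumerate(zip(ratios, active)):
    print(f"r{run}: activity ratio {r:.1f} %, average active threads {a:.3f}")
```

Under these assumptions, the recomputed values for r3 to r5 match the reported rows to the printed precision, which suggests that the drop in Activity Ratio from 99.6 % to 79.1 % reflects threads spending a growing share of the run inactive.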
 
Potential Speedups                                           r0     r1     r2     r3     r4     r5
Perfect Flow Complexity                                      1.00   1.00   1.00   1.00   1.00   1.00
Perfect OpenMP + MPI + Pthread                               1.01   1.01   1.01   1.02   1.03   1.04
Perfect OpenMP + MPI + Pthread + Perfect Load Distribution   1.01   1.03   1.07   1.14   1.25   1.34
Scalability - Gap                                            1.00   1.13   1.38   1.66   2.13   2.58
No Scalar Integer: Potential Speedup                         1.00   1.00   1.00   1.00   1.00   1.00
No Scalar Integer: Nb Loops to get 80%                       1      1      1      1      1      1
FP Vectorised: Potential Speedup                             1.00   1.00   1.00   1.00   1.00   1.00
FP Vectorised: Nb Loops to get 80%                           1      1      1      1      1      1
Fully Vectorised: Potential Speedup                          1.00   1.00   1.00   1.00   1.00   1.00
Fully Vectorised: Nb Loops to get 80%                        3      3      3      3      3      4
Only FP Arithmetic: Potential Speedup                        1.00   1.00   1.00   1.00   1.00   1.00
Only FP Arithmetic: Nb Loops to get 80%                      3      3      3      3      3      3
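
The Scalability - Gap row can be reproduced from the Total Time row. The sketch below assumes, based only on the numbers in this report, that the gap is the ratio of ideal speedup to measured speedup, both taken relative to r0. This is an observation that matches the reported values within rounding, not necessarily MAQAO's own formula, and `scalability_gap` is a hypothetical helper name.

```python
# Total Time (s) for r0..r5 and OpenMP threads per rank from the run legend.
total_time = [4.73e3, 2.68e3, 1.64e3, 979.98, 628.93, 508.50]
omp_threads = [1, 2, 4, 8, 16, 24]


def scalability_gap(times, resources):
    """Ideal speedup divided by measured speedup, relative to the first run."""
    base_time, base_res = times[0], resources[0]
    gaps = []
    for t, r in zip(times, resources):
        measured = base_time / t   # how much faster the run actually is
        ideal = r / base_res       # how much faster it would be with linear scaling
        gaps.append(ideal / measured)
    return gaps


gaps = scalability_gap(total_time, omp_threads)
for run, g in enumerate(gaps):
    print(f"r{run}: gap {g:.2f}")
```

Read this way, a gap of 2.58 for r5 means the 24-thread run takes about 2.6 times longer than perfect linear scaling from r0 would predict.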
 
Source Object Issue
xhpl
  HPL_lmul.c
  HPL_rand.c
  HPL_dlaswp02N.c
  HPL_bcast.c
  HPL_dlacpy.c
  HPL_dlaswp04N.c
  HPL_1ring.c
  HPL_pdmatgen.c
  HPL_setran.c
  HPL_pdlange.c
  HPL_ladd.c
  HPL_dlaswp03N.c
  HPL_dlaswp01N.c
 
 
 
 
 
 
  
 
 
Run configuration (r0 to r5)
Values are identical for all six runs unless listed per run.

Experiment Name: (not reported)
Application: ./bin/Linux_AArch64/xhpl
Timestamp: 2025-06-23 12:25:04
Experiment Type: r0: MPI; r1 to r5: MPI; OpenMP
Machine: ip-172-31-47-249.ec2.internal
Architecture: aarch64
Micro Architecture: ARM_NEOVERSE_V2
Model Name: (not reported)
Cache Size: (not reported)
Number of Cores: (not reported)
Maximal Frequency: 0 GHz
OS Version: Linux 6.1.109-118.189.amzn2023.aarch64 #1 SMP Tue Sep 10 08:58:40 UTC 2024
Architecture used during static analysis: aarch64
Micro Architecture used during static analysis: ARM_NEOVERSE_V2
Compilation Options:
  xhpl: Arm C/C++/Fortran Compiler version 24.10.1 (build number 4) (based on LLVM 19.1.0) /opt/arm/arm-linux-compiler-24.10.1_AmazonLinux-2023/llvm-bin/clang-19 -o HPL_ladd.o -c -D Add__ -D F77_INTEGER=int -D StringSunStyle -D HPL_DETAILED_TIMING -D HPL_PROGRESS_REPORT -I /home/eoseret/hpl-2.3/include -I /home/eoseret/hpl-2.3/include/Linux_AArch64 -I /opt/arm/armpl-24.10.1_AmazonLinux-2023_arm-linux-compiler/include -fopenmp -O3 -ffast-math -g -grecord-command-line -mcpu=native -Wall ../HPL_ladd.c -I /home/eoseret/openmpi_acfl2410/include
Number of processes observed: 4
Number of threads observed: r0: 4, r1: 8, r2: 16, r3: 32, r4: 64, r5: 96
Frequency Driver: NA
Frequency Governor: NA
Huge Pages: madvise
Hyperthreading: off
Number of sockets: 1
Number of cores per socket: 96
MAQAO version: 2025.1.0
MAQAO build: b107544c0173fc3785aa7d997ff783dc12b975d2::20250527-133805
Comments: HPL benchmark compiled with ARM ACfL/Armpl 24.10. Matrix order: 100K, block size 384. Run on AWS Graviton 4 with 1 NUMA node and 96 cores. Using 4 MPI ranks to limit multithreading overhead.
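
The run labels and the configuration values fit together as a simple product. The sketch below assumes that each label OMPn means n OpenMP threads per MPI rank; that reading is an inference from the labels and the thread counts, not something the report states. It also estimates the matrix footprint on the assumption that "100K" means N = 100,000 and that the matrix holds 8-byte double-precision values. All variable names are illustrative.

```python
# Configuration values taken from this report.
mpi_ranks = 4                 # Number of processes observed
cores = 96                    # Number of cores per socket, 1 socket
omp_per_rank = {"r0": 1, "r1": 2, "r2": 4, "r3": 8, "r4": 16, "r5": 24}
threads_observed = {"r0": 4, "r1": 8, "r2": 16, "r3": 32, "r4": 64, "r5": 96}

# Assumed relation: total threads = MPI ranks x OpenMP threads per rank.
total_threads = {run: mpi_ranks * n for run, n in omp_per_rank.items()}

for run, n in total_threads.items():
    status = "matches" if n == threads_observed[run] else "differs from"
    print(f"{run}: {n} threads ({status} the observed count), "
          f"{100.0 * n / cores:.0f} % of the cores")

# Rough matrix footprint, assuming N = 100,000 and 8 bytes per element.
n_order = 100_000
matrix_bytes = 8 * n_order * n_order
print(f"matrix: {matrix_bytes / 2**30:.1f} GiB in total, "
      f"{matrix_bytes / mpi_ranks / 2**30:.1f} GiB per rank")
```

Under these assumptions only r5 occupies every core, so the earlier runs leave part of the 96-core socket idle by design.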