Loop Source Regions (paths abbreviated: `$SRC` = `/beegfs/hackathon/users/eoseret/qaas_runs_test/isix06.benchmarkcenter.megware.com/177-703-3988/llama.cpp/build/llama.cpp/ggml/src`, `$GCC_INC` = `/cluster/comp/gcc/14.2.0/lib/gcc/x86_64-pc-linux-gnu/14.2.0/include`):

| Run | Source file | Lines |
|---|---|---|
| orig_default | `$SRC/ggml-impl.h` | 346, 355, 389-404 |
| orig_default | `$SRC/ggml-cpu/arch/x86/quants.c` | 298-347, 353-355 |
| gcc_default | `$GCC_INC/avx2intrin.h` | 79, 86, 1046 |
| gcc_default | `$GCC_INC/xmmintrin.h` | 184, 240, 793 |
| gcc_default | `$GCC_INC/pmmintrin.h` | 71 |
| gcc_default | `$SRC/ggml-impl.h` | 354, 389-404 |
| gcc_default | `$GCC_INC/avxintrin.h` | 186, 296, 320, 475, 526, 905, 935, 1066, 1329, 1472 |
| gcc_default | `$SRC/ggml-cpu/arch/x86/quants.c` | 298, 304, 319-321 |
| aocc_default | `$SRC/ggml-impl.h` | 346, 355, 389-404 |
| aocc_default | `$SRC/ggml-cpu/arch/x86/quants.c` | 298-347, 353-355 |
| icx_1 | `$SRC/ggml-impl.h` | 346, 355, 389-404 |
| icx_1 | `$SRC/ggml-cpu/arch/x86/quants.c` | 298-347, 353-355 |
| gcc_6 | `$GCC_INC/avx2intrin.h` | 79, 86, 1046 |
| gcc_6 | `$GCC_INC/xmmintrin.h` | 184, 240, 793 |
| gcc_6 | `$GCC_INC/pmmintrin.h` | 71 |
| gcc_6 | `$SRC/ggml-impl.h` | 354, 389-404 |
| gcc_6 | `$GCC_INC/avxintrin.h` | 186, 296, 320, 475, 526, 905, 935, 1066, 1329 |
| gcc_6 | `$SRC/ggml-cpu/arch/x86/quants.c` | 298, 304, 319-321 |
| aocc_3 | `$SRC/ggml-impl.h` | 346, 355, 389-404 |
| aocc_3 | `$SRC/ggml-cpu/arch/x86/quants.c` | 298-347, 353-355 |
| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Coverage (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
|---|---|---|---|---|---|---|---|
| orig_default | 3022 | 1.15 | 0.02 | 0.31 | 59.66 | 29.26 | 341.18 |
| gcc_default | 2212 | 1.15 | 0.02 | 0.33 | 60 | 28.75 | 332.59 |
| aocc_default | 3307 | 1.40 | 0.02 | 0.40 | 58.33 | 28.75 | 269.64 |
| icx_1 | 3269 | 1.10 | 0.02 | 0.29 | 60.7 | 29.66 | 343.47 |
| gcc_6 | 2181 | 1.11 | 0.02 | 0.33 | 59.65 | 29.28 | 333.9 |
| aocc_3 | 3328 | 1.13 | 0.02 | 0.31 | 60.7 | 29.66 | 350.12 |
| Optimizer analysis (count) | orig_default | gcc_default | aocc_default | icx_1 | gcc_6 | aocc_3 |
|---|---|---|---|---|---|---|
| Analyzed loops | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option `--optimizer-loop-count`. | Sum on 1 analyzed binary loop (libggml-cpu.so - 2212) | Sum on 1 analyzed binary loop (libggml-cpu.so - 3307) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option `--optimizer-loop-count`. | Sum on 1 analyzed binary loop (libggml-cpu.so - 2181) | Sum on 1 analyzed binary loop (libggml-cpu.so - 3328) |
| Loop Computation Issues | | | | | | |
| Presence of expensive FP instructions | n/a | 1 | 1 | n/a | 1 | 1 |
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | n/a | 0 | 1 | n/a | 0 | 0 |
| Control Flow Issues | | | | | | |
| Presence of 2 to 4 paths | n/a | 0 | 0 | n/a | 0 | 1 |
| Presence of more than 4 paths | n/a | 1 | 1 | n/a | 1 | 0 |
| Data Access Issues | | | | | | |
| More than 10% of the vector load instructions are unaligned | n/a | 1 | 1 | n/a | 1 | 1 |
| Presence of special instructions executing on a single port | n/a | 1 | 1 | n/a | 1 | 1 |
| Vectorization Roadblocks | | | | | | |
| Presence of 2 to 4 paths | n/a | 0 | 0 | n/a | 0 | 1 |
| Presence of more than 4 paths | n/a | 1 | 1 | n/a | 1 | 0 |
| Inefficient Vectorization | | | | | | |
| Presence of special instructions executing on a single port | n/a | 1 | 1 | n/a | 1 | 1 |