Loop Source Regions per run. All line ranges refer to /beegfs/hackathon/users/eoseret/qaas_runs_test/isix07.benchmarkcenter.megware.com/177-703-4206/llama.cpp/build/llama.cpp/ggml/src/ggml-cpu/amx/mmq.cpp.

| Run | Loop Source Regions (line ranges) |
| --- | --- |
| orig_default | 303-326, 557-558, 617-617, 624-625, 636-636, 644-644, 792-792, 1390-1392 |
| gcc_default | |
| aocc_default | 303-330, 557-558, 617-617, 624-625, 632-636, 644-644, 792-792, 1390-1392 |
| icx_1 | 303-326, 557-558, 617-617, 624-625, 636-636, 644-644, 792-792, 1390-1392 |
| gcc_2 | |
| aocc_3 | 303-330, 557-558, 617-617, 624-625, 632-636, 644-644, 792-792, 1390-1392 |

| Run | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | GFLOP/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| orig_default | 661 | 0.05 | 0.02 | 0.27 | 91.67 | 35.76 | 0 |
| gcc_default | | | | | | | |
| aocc_default | 379 | 0.05 | 0.03 | 0.36 | 94.96 | 35.16 | 0 |
| icx_1 | 672 | 0.05 | 0.03 | 0.30 | 91.89 | 36.15 | 0 |
| gcc_2 | | | | | | | |
| aocc_3 | 380 | 0.05 | 0.03 | 0.37 | 94.96 | 35.16 | 0 |

| Run | Optimizer analysis scope |
| --- | --- |
| orig_default | Sum on 1 analyzed binary loop (libggml-cpu.so - 661) |
| gcc_default | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| aocc_default | Sum on 1 analyzed binary loop (libggml-cpu.so - 379) |
| icx_1 | Sum on 1 analyzed binary loop (libggml-cpu.so - 672) |
| gcc_2 | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. |
| aocc_3 | Sum on 1 analyzed binary loop (libggml-cpu.so - 380) |

| Analysis (count per run) | orig_default | gcc_default | aocc_default | icx_1 | gcc_2 | aocc_3 |
| --- | --- | --- | --- | --- | --- | --- |
| Data Access Issues | | | | | | |
| Presence of constant non-unit stride data access | 1 | | 1 | 1 | | 1 |
| Presence of indirect access | 1 | | 1 | 1 | | 1 |
| Presence of special instructions executing on a single port | 1 | | 1 | 1 | | 1 |
| More than 20% of the loads are accessing the stack | 1 | | 1 | 1 | | 1 |
| Vectorization Roadblocks | | | | | | |
| Presence of constant non-unit stride data access | 1 | | 1 | 1 | | 1 |
| Presence of indirect access | 1 | | 1 | 1 | | 1 |
| Inefficient Vectorization | | | | | | |
| Presence of special instructions executing on a single port | 1 | | 1 | 1 | | 1 |
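
For readers unfamiliar with two of the reported categories, "constant non-unit stride data access" and "indirect access", the following is a minimal sketch of what such access patterns typically look like in source code. It is an illustration only: the function names `sum_unit`, `sum_strided`, and `sum_indirect` are hypothetical and are not taken from mmq.cpp, and nothing here claims to reproduce the loop that was actually analyzed.

```cpp
#include <cstddef>
#include <vector>

// Unit stride: consecutive elements are read, so a compiler can
// usually use plain contiguous vector loads.
float sum_unit(const std::vector<float>& a) {
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i];
    return s;
}

// Constant non-unit stride: every `stride`-th element is read.
// Assumes stride >= 1. Vectorizing this usually needs gathers or
// shuffles, which tends to lower effective vector efficiency.
float sum_strided(const std::vector<float>& a, std::size_t stride) {
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); i += stride) s += a[i];
    return s;
}

// Indirect access: the address of each load depends on a value
// loaded from another array (a[idx[k]]). Assumes every index in
// `idx` is a valid position in `a`.
float sum_indirect(const std::vector<float>& a, const std::vector<int>& idx) {
    float s = 0.0f;
    for (std::size_t k = 0; k < idx.size(); ++k) s += a[idx[k]];
    return s;
}
```

Where the data layout allows it, rewriting a strided or indirect loop so that it reads contiguous memory is a common way to address both the data access and the vectorization findings at once. Whether that applies to the regions listed above would have to be checked against the actual source.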