Loops
kai_matmul_clamp_f32_qsi8d32p4x8_qsi4c32p4x8_16x4_neon_i8mm.c: 131 - 321.50 %
| Run orig_default | Run gcc_default | Run armclang_3 | Run gcc_3 | ||||||||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2354 | 1.73 | 2.13 | 76.04 | 28.14 | 55.37 | 2189 | 1.71 | 2.46 | 84.23 | 28.14 | 55.37 | 2528 | 1.70 | 2.07 | 76.85 | 28.14 | 55.37 | 2240 | 1.70 | 2.42 | 84.37 | 28.14 | 55.37 |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 2354) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2189) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2528) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2240) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Data Access Issues | Data Access Issues | Data Access Issues | Data Access Issues | ||||||||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||
| Vectorization Roadblocks | Vectorization Roadblocks | Vectorization Roadblocks | Vectorization Roadblocks | ||||||||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||
kai_lhs_quant_pack_qsi8d32p4x8sb_f32_neon.c: 96 - 3.02 %
| Run orig_default | Run gcc_default | Run armclang_3 | Run gcc_3 | ||||||||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2317 | 0.04 | 0.02 | 0.56 | 77.23 | 96.44 | 2158 | 0.05 | 0.02 | 0.78 | 76.21 | 96.75 | 2496 | 0.05 | 0.02 | 0.83 | 77.23 | 96.44 | 2210 | 0.05 | 0.02 | 0.85 | 76.21 | 96.75 |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 2317) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2158) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2496) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2210) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Loop Computation Issues | Loop Computation Issues | Loop Computation Issues | Loop Computation Issues | ||||||||||||||||||||
| Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 | Presence of expensive FP instructions | 1 | ||||||||||||||||
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | ||||||||||||||||
| Data Access Issues | Data Access Issues | Data Access Issues | Data Access Issues | ||||||||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||
| Vectorization Roadblocks | Vectorization Roadblocks | Vectorization Roadblocks | Vectorization Roadblocks | ||||||||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||
ops.cpp: 4325 - 1.80 %
| Run orig_default | Run gcc_default | Run armclang_3 | Run gcc_3 | ||||||||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1249 | 0.03 | 0.01 | 0.41 | 94.12 | 97.06 | 1148 | 0.03 | 0.02 | 0.53 | 0 | 26.56 | 1353 | 0.03 | 0.01 | 0.42 | 16.67 | 54.17 | 1193 | 0.03 | 0.01 | 0.43 | 17.78 | 55.56 |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 1249) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1148) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1353) | Sum on 1 analyzed binary loop (libggml-cpu.so - 1193) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Loop Computation Issues | Loop Computation Issues | Loop Computation Issues | Loop Computation Issues | ||||||||||||||||||||
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | ||||||||||||||||
| Data Access Issues | Data Access Issues | Data Access Issues | Data Access Issues | ||||||||||||||||||||
| Presence of constant non-unit stride data access | Presence of constant non-unit stride data access | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||||
| Vectorization Roadblocks | Vectorization Roadblocks | Vectorization Roadblocks | Vectorization Roadblocks | ||||||||||||||||||||
| Presence of constant non-unit stride data access | Presence of constant non-unit stride data access | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||||
binary-ops.cpp: 10 - 1.07 %
| Run orig_default | Run gcc_default | Run armclang_3 | Run gcc_3 | ||||||||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 413 | 0.02 | 0.01 | 0.24 | 0 | 25 | 419 | 0.02 | 0.01 | 0.26 | 25 | 100 | 437 | 0.02 | 0.01 | 0.31 | 0 | 25 | 431 | 0.02 | 0.01 | 0.25 | 25 | 100 |
| Sum on 1 analyzed binary loop (libggml-cpu.so - 413) | Sum on 1 analyzed binary loop (libggml-cpu.so - 419) | Sum on 1 analyzed binary loop (libggml-cpu.so - 437) | Sum on 1 analyzed binary loop (libggml-cpu.so - 431) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Loop Computation Issues | Loop Computation Issues | Loop Computation Issues | Loop Computation Issues | ||||||||||||||||||||
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | ||||||||||||||||
| Data Access Issues | Data Access Issues | Data Access Issues | Data Access Issues | ||||||||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||
| Vectorization Roadblocks | Vectorization Roadblocks | Vectorization Roadblocks | Vectorization Roadblocks | ||||||||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||
binary-ops.cpp: 18 - 0.93 %
| Run orig_default | Run gcc_default | Run armclang_3 | Run gcc_3 | ||||||||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 520 | 0.02 | 0.01 | 0.21 | 0 | 23.68 | 493 | 0.02 | 0.01 | 0.21 | 25 | 100 | 541 | 0.02 | 0.01 | 0.26 | 0 | 23.68 | 515 | 0.03 | 0.01 | 0.25 | 25 | 100 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 541) | Sum on 1 analyzed binary loop (libggml-cpu.so - 515) | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Loop Computation Issues | Loop Computation Issues | ||||||||||||||||||||||
| Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA | 1 | ||||||||||||||||||||
| Data Access Issues | Data Access Issues | ||||||||||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||||||
| Vectorization Roadblocks | Vectorization Roadblocks | ||||||||||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||||||
quants.c: 2506 - 0.79 %
| Run orig_default | Run gcc_default | Run armclang_3 | Run gcc_3 | ||||||||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2236 | 0.01 | 0.00 | 0.17 | 50.56 | 65.8 | 2056 | 0.01 | 0.01 | 0.21 | 49.72 | 67.11 | 2425 | 0.01 | 0.01 | 0.21 | 50.56 | 65.8 | 2113 | 0.01 | 0.01 | 0.19 | 49.44 | 66.92 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | Sum on 1 analyzed binary loop (libggml-cpu.so - 2056) | Sum on 1 analyzed binary loop (libggml-cpu.so - 2425) | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
| Data Access Issues | Data Access Issues | ||||||||||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||||||
| Vectorization Roadblocks | Vectorization Roadblocks | ||||||||||||||||||||||
| Presence of constant non-unit stride data access | 1 | Presence of constant non-unit stride data access | 1 | ||||||||||||||||||||
ggml-cpu.c: 3228 - 0.65 %
| Run orig_default | Run gcc_default | Run armclang_3 | Run gcc_3 | ||||||||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.02 | 0.00 | 0.16 | 92.5 | 98.75 | 5 | 0.02 | 0.00 | 0.16 | 88.37 | 93.9 | 0 | 0.01 | 0.00 | 0.14 | 91.67 | 98.61 | 1 | 0.02 | 0.01 | 0.18 | 86.21 | 94.83 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
kai_rhs_pack_nxk_qsi4c32pscalef16_qsu4c32s16s0.c: 127 - 0.60 %
| Run orig_default | Run gcc_default | Run armclang_3 | Run gcc_3 | ||||||||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2337 | 0.30 | 0.00 | 0.15 | 0 | 50 | 2172 | 0.31 | 0.00 | 0.17 | 0 | 50 | 2511 | 0.26 | 0.00 | 0.13 | 0 | 50 | 2223 | 0.29 | 0.00 | 0.16 | 0 | 50 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||
ops.cpp: 6446 - 0.29 %
| Run orig_default | Run gcc_default | Run armclang_3 | Run gcc_3 | ||||||||||||||||||||
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| Loop Source Regions |
| ||||||||||||||||
| ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) | ASM Loop ID | Max Time Over Threads (s) | Time w.r.t. Wall Time (s) | Cov (%) | Vect. Ratio (%) | Vector Length Use (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1403 | 0.01 | 0.00 | 0.08 | 35.29 | 39.71 | 785 | 0.01 | 0.00 | 0.05 | 42.86 | 53.57 | 1591 | 0.01 | 0.00 | 0.08 | 55.56 | 43.75 | 830 | 0.01 | 0.00 | 0.08 | 42.86 | 53.57 |
| No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | No Optimizer analysis found for any assembly loop. More loops can be analyzed using option --optimizer-loop-count. | ||||||||||||||||||||
| Analysis | Count | Analysis | Count | Analysis | Count | Analysis | Count | ||||||||||||||||

