exec - 2026-03-17 11:44:14 - MAQAO 2026.0.0

  • run_0
  • run_1
  • run_2
  • run_3
  • run_4
  • run_5

Optimizer

Loop ID / Analysis / Penalty Score
Loop 16 - exec: Execution Time: 28 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
Loop 13 - exec: Execution Time: 28 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 10 - exec: Execution Time: 20 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 21 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
  Inefficient Vectorization (penalty score: 2)
    [SA] Inefficient vectorization: use of masked instructions. Simplify control structure. This issue costs 2 points.
Loop 8 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 4 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
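
The FMA issue flags loops where fewer than 10% of the FP ADD/SUB/MUL operations are fused multiply-adds. As a minimal sketch (the source of the analyzed `exec` loops is not shown in this report, and the function names are hypothetical), evaluating a polynomial with explicit powers spends most of its work on standalone multiplications, while Horner's rule rewrites it as a chain of `r * x + c[k]` steps that a compiler can contract into one FMA each. Whether FMA instructions are actually emitted depends on the target (for example an FMA-capable `-march` setting on x86) and on the contraction mode (`-ffp-contract` in GCC and Clang); because an FMA rounds only once, results may differ from the unfused form in the last bits.

```c
#include <assert.h>
#include <stddef.h>

/* Before: each term recomputes x^k with a multiplication loop, so most
 * FP operations are standalone MULs with no ADD to fuse with. */
double poly_naive(const double *c, size_t n, double x)
{
    double sum = 0.0;
    for (size_t k = 0; k < n; k++) {
        double p = 1.0;
        for (size_t j = 0; j < k; j++)
            p *= x;
        sum += c[k] * p;
    }
    return sum;
}

/* After: Horner's rule, c[0] + x*(c[1] + x*(c[2] + ...)).
 * Every step has the a*b + c shape, i.e. one candidate FMA per coefficient. */
double poly_horner(const double *c, size_t n, double x)
{
    assert(n == 0 || c != NULL);
    double r = 0.0;
    for (size_t k = n; k-- > 0;)
        r = r * x + c[k];
    return r;
}
```

For example, with c = {1, 2, 3} and x = 2, both functions return 17 (1 + 2*2 + 3*4), but the Horner version needs only three multiply-add steps.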

Strategizer  

[ 4 / 4 ] Enough of the experiment time is spent in analyzed loops (74.90%)

If less than 30% of the time is spent in analyzed loops, standard loop optimizations will have a limited impact on application performance.
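
The 30% threshold follows the usual Amdahl's-law reasoning. As a rough worked bound (a simplified model, not necessarily MAQAO's exact scoring rule), assume the analyzed loops account for a fraction p of the run time and are sped up by a factor s while everything else stays unchanged. With the 74.90% measured in this run, even infinitely fast loops would give at most about a 3.98x overall gain, compared with about 1.43x at the 30% threshold:

```latex
S_{\text{overall}} = \frac{1}{(1 - p) + p/s} \;\le\; \frac{1}{1 - p}

p = 0.7490:\quad \frac{1}{1 - 0.7490} = \frac{1}{0.2510} \approx 3.98
p = 0.30:\quad  \frac{1}{1 - 0.30} = \frac{1}{0.70} \approx 1.43
```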

[ 4 / 4 ] Thread activity is good

On average, more than 763.08% of observed threads are actually active

[ 4 / 4 ] CPU activity is good

CPU cores are active 96.13% of the time.

[ 4 / 4 ] Loop profile is not flat

At least one loop has a coverage greater than 4% (27.29%), which makes it a hotspot for the application.

[ 4 / 4 ] Enough of the experiment time is spent in analyzed innermost loops (74.90%)

If less than 15% of the time is spent in analyzed innermost loops, standard innermost-loop optimizations such as vectorization will have a limited impact on application performance.

[ 4 / 4 ] Affinity is good (96.10%)

Threads are not migrating between CPU cores: they are probably successfully pinned.

[ 3 / 3 ] Less than 10% (0.00%) of the time is spent in BLAS1 operations

It could be more efficient to inline BLAS1 operations by hand.
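
This run spends 0.00% in BLAS1, so the advice does not apply here; it matters when a profile shows significant time in short BLAS1 calls, where per-call overhead and the lack of fusion with surrounding loops can add up. A minimal sketch of hand inlining, assuming unit-stride vectors: the helpers below are hypothetical replacements for `cblas_daxpy(n, alpha, x, 1, y, 1)` and `cblas_ddot(n, x, 1, y, 1)`. Declaring them `static inline` with `restrict` pointers lets the compiler inline and vectorize them, and possibly fuse them with neighboring loops, instead of calling into the library.

```c
#include <assert.h>
#include <stddef.h>

/* y := alpha*x + y, a hand-inlined stand-in for a unit-stride daxpy. */
static inline void axpy_inline(size_t n, double alpha,
                               const double *restrict x, double *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Returns x . y, a hand-inlined stand-in for a unit-stride ddot. */
static inline double dot_inline(size_t n, const double *restrict x,
                                const double *restrict y)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}
```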

[ 0 / 3 ] Too many functions do not use all threads

Functions running on a reduced number of threads (typically sequential code) cover at least 10% of the application wall time (20.75%). Check both "Max Inclusive Time Over Threads" and "Nb Threads" in the Functions or Loops tabs, and consider parallelizing sequential regions or improving the parallelization of regions that run on a reduced number of threads.
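
One common way to act on this, assuming the sequential region is a loop whose iterations are independent, is an OpenMP worksharing loop. The function below is a hypothetical example, not code from the profiled application: compiled with `-fopenmp` (GCC or Clang), the iterations are split across threads and the `reduction` clause combines the per-thread partial sums; without the flag the pragma is ignored and the loop runs sequentially. The parallel result may differ in the last bits because the additions are reordered.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of squares over an array. The reduction clause gives each thread a
 * private copy of s and adds the copies together at the end of the loop. */
double sum_squares(size_t n, const double *x)
{
    assert(n == 0 || x != NULL);
    double s = 0.0;
#pragma omp parallel for reduction(+ : s)
    for (size_t i = 0; i < n; i++)
        s += x[i] * x[i];
    return s;
}
```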

[ 3 / 3 ] Cumulative outermost/in-between loop coverage (0.00%) is lower than cumulative innermost loop coverage (74.90%)

A cumulative outermost/in-between loop coverage greater than the cumulative innermost loop coverage makes loop optimization more complex.

[ 2 / 2 ] Less than 10% (0.00%) of the time is spent in BLAS2 operations

BLAS2 calls often make poor use of the cache and could benefit from inlining.
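
This run spends 0.00% in BLAS2, so no change is needed here. Where a matrix-vector product does show up, a minimal sketch of hand inlining, assuming a row-major matrix and unit strides, is a loop equivalent to `cblas_dgemv(CblasRowMajor, CblasNoTrans, rows, cols, 1.0, A, cols, x, 1, 1.0, y, 1)`. The helper name is hypothetical. The inner loop reads each row of `A` contiguously, and once inlined the compiler can potentially fuse it with the code that produces or consumes `y` while that data is still in cache.

```c
#include <assert.h>
#include <stddef.h>

/* y := A*x + y for a row-major rows x cols matrix A with unit strides.
 * The inner loop walks one row of A contiguously; x is reused for every row. */
static inline void gemv_rowmajor_inline(size_t rows, size_t cols,
                                        const double *restrict A,
                                        const double *restrict x,
                                        double *restrict y)
{
    assert(rows == 0 || cols == 0 || (A != NULL && x != NULL && y != NULL));
    for (size_t i = 0; i < rows; i++) {
        double acc = y[i];
        for (size_t j = 0; j < cols; j++)
            acc += A[i * cols + j] * x[j];
        y[i] = acc;
    }
}
```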

[ 2 / 2 ] Less than 10% (0.00%) of the time is spent in Libm/SVML (special functions)

Optimizer

Loop ID / Analysis / Penalty Score
Loop 13 - exec: Execution Time: 27 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 16 - exec: Execution Time: 27 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
Loop 10 - exec: Execution Time: 20 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 8 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 21 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
  Inefficient Vectorization (penalty score: 2)
    [SA] Inefficient vectorization: use of masked instructions. Simplify control structure. This issue costs 2 points.
Loop 4 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
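
Loop 21 is penalized for masked instructions. One common source is a condition inside a vectorized loop body: the compiler typically computes both branches for each vector and merges the results under a mask. As a minimal sketch with hypothetical functions (the actual body of loop 21 is not shown in this report), an index-dependent branch can be removed by splitting the loop at the switch point so that each part has straight-line code. Compilers may already perform this split themselves, and masked instructions can also come from vectorized remainder iterations, which this rewrite does not remove.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the branch depends on i, so a vectorized version typically
 * evaluates both sides and blends them with a mask. */
void update_branchy(size_t n, size_t m, const double *restrict b,
                    double *restrict a)
{
    for (size_t i = 0; i < n; i++) {
        if (i < m)
            a[i] = 2.0 * b[i];
        else
            a[i] = b[i] + 1.0;
    }
}

/* After: split the iteration space at min(m, n); neither loop has a branch. */
void update_split(size_t n, size_t m, const double *restrict b,
                  double *restrict a)
{
    assert(n == 0 || (a != NULL && b != NULL));
    size_t k = m < n ? m : n;
    for (size_t i = 0; i < k; i++)
        a[i] = 2.0 * b[i];
    for (size_t i = k; i < n; i++)
        a[i] = b[i] + 1.0;
}
```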

Strategizer  

[ 4 / 4 ] Enough of the experiment time is spent in analyzed loops (79.92%)

If less than 30% of the time is spent in analyzed loops, standard loop optimizations will have a limited impact on application performance.

[ 4 / 4 ] Thread activity is good

On average, more than 1141.43% of observed threads are actually active

[ 4 / 4 ] CPU activity is good

CPU cores are active 96.00% of the time.

[ 4 / 4 ] Loop profile is not flat

At least one loop has a coverage greater than 4% (29.18%), which makes it a hotspot for the application.

[ 4 / 4 ] Enough of the experiment time is spent in analyzed innermost loops (79.92%)

If less than 15% of the time is spent in analyzed innermost loops, standard innermost-loop optimizations such as vectorization will have a limited impact on application performance.

[ 4 / 4 ] Affinity is good (96.08%)

Threads are not migrating between CPU cores: they are probably successfully pinned.

[ 3 / 3 ] Less than 10% (0.00%) of the time is spent in BLAS1 operations

It could be more efficient to inline BLAS1 operations by hand.

[ 0 / 3 ] Too many functions do not use all threads

Functions running on a reduced number of threads (typically sequential code) cover at least 10% of the application wall time (11.54%). Check both "Max Inclusive Time Over Threads" and "Nb Threads" in the Functions or Loops tabs, and consider parallelizing sequential regions or improving the parallelization of regions that run on a reduced number of threads.

[ 3 / 3 ] Cumulative outermost/in-between loop coverage (0.00%) is lower than cumulative innermost loop coverage (79.92%)

A cumulative outermost/in-between loop coverage greater than the cumulative innermost loop coverage makes loop optimization more complex.

[ 2 / 2 ] Less than 10% (0.00%) of the time is spent in BLAS2 operations

BLAS2 calls often make poor use of the cache and could benefit from inlining.

[ 2 / 2 ] Less than 10% (0.00%) of the time is spent in Libm/SVML (special functions)

Optimizer

Loop ID / Analysis / Penalty Score
Loop 13 - exec: Execution Time: 29 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 16 - exec: Execution Time: 28 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
Loop 10 - exec: Execution Time: 21 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 8 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 21 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
  Inefficient Vectorization (penalty score: 2)
    [SA] Inefficient vectorization: use of masked instructions. Simplify control structure. This issue costs 2 points.
Loop 4 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %

Strategizer  

[ 4 / 4 ] Enough of the experiment time is spent in analyzed loops (75.34%)

If less than 30% of the time is spent in analyzed loops, standard loop optimizations will have a limited impact on application performance.

[ 4 / 4 ] Thread activity is good

On average, more than 1519.82% of observed threads are actually active

[ 4 / 4 ] CPU activity is good

CPU cores are active 95.92% of the time.

[ 4 / 4 ] Loop profile is not flat

At least one loop has a coverage greater than 4% (27.55%), which makes it a hotspot for the application.

[ 4 / 4 ] Enough of the experiment time is spent in analyzed innermost loops (75.34%)

If less than 15% of the time is spent in analyzed innermost loops, standard innermost-loop optimizations such as vectorization will have a limited impact on application performance.

[ 4 / 4 ] Affinity is good (95.89%)

Threads are not migrating between CPU cores: they are probably successfully pinned.

[ 3 / 3 ] Less than 10% (0.00%) of the time is spent in BLAS1 operations

It could be more efficient to inline BLAS1 operations by hand.

[ 0 / 3 ] Too many functions do not use all threads

Functions running on a reduced number of threads (typically sequential code) cover at least 10% of the application wall time (16.30%). Check both "Max Inclusive Time Over Threads" and "Nb Threads" in the Functions or Loops tabs, and consider parallelizing sequential regions or improving the parallelization of regions that run on a reduced number of threads.

[ 3 / 3 ] Cumulative outermost/in-between loop coverage (0.00%) is lower than cumulative innermost loop coverage (75.34%)

A cumulative outermost/in-between loop coverage greater than the cumulative innermost loop coverage makes loop optimization more complex.

[ 2 / 2 ] Less than 10% (0.00%) of the time is spent in BLAS2 operations

BLAS2 calls often make poor use of the cache and could benefit from inlining.

[ 2 / 2 ] Less than 10% (0.00%) of the time is spent in Libm/SVML (special functions)

Optimizer

Loop ID / Analysis / Penalty Score
Loop 13 - exec: Execution Time: 27 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 16 - exec: Execution Time: 27 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
Loop 10 - exec: Execution Time: 20 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 8 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 4 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
Loop 21 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
  Inefficient Vectorization (penalty score: 2)
    [SA] Inefficient vectorization: use of masked instructions. Simplify control structure. This issue costs 2 points.

Strategizer  

[ 4 / 4 ] Enough of the experiment time is spent in analyzed loops (76.16%)

If less than 30% of the time is spent in analyzed loops, standard loop optimizations will have a limited impact on application performance.

[ 4 / 4 ] Thread activity is good

On average, more than 1898.54% of observed threads are actually active

[ 4 / 4 ] CPU activity is good

CPU cores are active 95.98% of the time.

[ 4 / 4 ] Loop profile is not flat

At least one loop has a coverage greater than 4% (27.72%), which makes it a hotspot for the application.

[ 4 / 4 ] Enough of the experiment time is spent in analyzed innermost loops (76.16%)

If less than 15% of the time is spent in analyzed innermost loops, standard innermost-loop optimizations such as vectorization will have a limited impact on application performance.

[ 4 / 4 ] Affinity is good (95.95%)

Threads are not migrating between CPU cores: they are probably successfully pinned.

[ 3 / 3 ] Less than 10% (0.00%) of the time is spent in BLAS1 operations

It could be more efficient to inline BLAS1 operations by hand.

[ 0 / 3 ] Too many functions do not use all threads

Functions running on a reduced number of threads (typically sequential code) cover at least 10% of the application wall time (23.62%). Check both "Max Inclusive Time Over Threads" and "Nb Threads" in the Functions or Loops tabs, and consider parallelizing sequential regions or improving the parallelization of regions that run on a reduced number of threads.

[ 3 / 3 ] Cumulative outermost/in-between loop coverage (0.00%) is lower than cumulative innermost loop coverage (76.16%)

A cumulative outermost/in-between loop coverage greater than the cumulative innermost loop coverage makes loop optimization more complex.

[ 2 / 2 ] Less than 10% (0.00%) of the time is spent in BLAS2 operations

BLAS2 calls often make poor use of the cache and could benefit from inlining.

[ 2 / 2 ] Less than 10% (0.00%) of the time is spent in Libm/SVML (special functions)

Optimizer

Loop ID / Analysis / Penalty Score
Loop 13 - exec: Execution Time: 27 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 16 - exec: Execution Time: 27 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
Loop 10 - exec: Execution Time: 20 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 8 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 4 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
Loop 21 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
  Inefficient Vectorization (penalty score: 2)
    [SA] Inefficient vectorization: use of masked instructions. Simplify control structure. This issue costs 2 points.

Strategizer  

[ 4 / 4 ] Enough of the experiment time is spent in analyzed loops (79.70%)

If less than 30% of the time is spent in analyzed loops, standard loop optimizations will have a limited impact on application performance.

[ 4 / 4 ] Thread activity is good

On average, more than 2269.44% of observed threads are actually active

[ 4 / 4 ] CPU activity is good

CPU cores are active 95.78% of the time.

[ 4 / 4 ] Loop profile is not flat

At least one loop has a coverage greater than 4% (29.05%), which makes it a hotspot for the application.

[ 4 / 4 ] Enough of the experiment time is spent in analyzed innermost loops (79.70%)

If less than 15% of the time is spent in analyzed innermost loops, standard innermost-loop optimizations such as vectorization will have a limited impact on application performance.

[ 4 / 4 ] Affinity is good (95.74%)

Threads are not migrating between CPU cores: they are probably successfully pinned.

[ 3 / 3 ] Less than 10% (0.00%) of the time is spent in BLAS1 operations

It could be more efficient to inline BLAS1 operations by hand.

[ 0 / 3 ] Too many functions do not use all threads

Functions running on a reduced number of threads (typically sequential code) cover at least 10% of the application wall time (11.52%). Check both "Max Inclusive Time Over Threads" and "Nb Threads" in the Functions or Loops tabs, and consider parallelizing sequential regions or improving the parallelization of regions that run on a reduced number of threads.

[ 3 / 3 ] Cumulative outermost/in-between loop coverage (0.00%) is lower than cumulative innermost loop coverage (79.70%)

A cumulative outermost/in-between loop coverage greater than the cumulative innermost loop coverage makes loop optimization more complex.

[ 2 / 2 ] Less than 10% (0.00%) of the time is spent in BLAS2 operations

BLAS2 calls often make poor use of the cache and could benefit from inlining.

[ 2 / 2 ] Less than 10% (0.00%) of the time is spent in Libm/SVML (special functions)

Optimizer

Loop ID / Analysis / Penalty Score
Loop 13 - exec: Execution Time: 29 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 16 - exec: Execution Time: 28 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
Loop 10 - exec: Execution Time: 21 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 4 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
Loop 8 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
Loop 21 - exec: Execution Time: 0 % - Vectorization Ratio: 100.00 % - Vector Length Use: 100.00 %
  Loop Computation Issues (penalty score: 4)
    [SA] Less than 10% of the FP ADD/SUB/MUL arithmetic operations are performed using FMA. Reorganize arithmetic expressions to exhibit potential for FMA. This issue costs 4 points.
  Inefficient Vectorization (penalty score: 2)
    [SA] Inefficient vectorization: use of masked instructions. Simplify control structure. This issue costs 2 points.