# Application Performance Snapshot
## Introduction
Use Application Performance Snapshot for a quick scan of performance aspects that relate to compute-intensive applications:
- MPI usage
- OpenMP usage
- Intel® oneAPI Collective Communications Library (oneCCL) usage
- CPU/GPU usage
- Efficiency of memory access
- Vectorization
- I/O and memory footprint
Application Performance Snapshot highlights key optimization areas and suggests specialized tools, such as Intel VTune Profiler and Intel® Advisor, for tuning particular performance aspects. You can run Application Performance Snapshot on large MPI workloads to analyze scalability issues.
## Instructions
### Loading the module
```
$ module load oneapi
$ aps --version
Intel(R) VTune(TM) Profiler 2025.0.1 (build 629235) Command Line Tool
Copyright (C) 2009 Intel Corporation. All rights reserved.
```
### Usage of `aps` and `aps-report`
```
$ aps -h
Intel(R) VTune(TM) Profiler 2025.0.1 (build 629235) Command Line Tool
Copyright (C) 2009 Intel Corporation. All rights reserved.
Usage: 1. aps [--result-dir=<dir> | -r=<dir>] [--start-paused] [--collection-mode=<mode> | -c=<mode>] <app>
2. <mpi_launcher> <mpi_parameters> aps [--result-dir=<dir> | -r=<dir>] [--collection-mode=<mode> | -c=<mode>] [--stat-level=<0-5> | -L <0-5>] [--mpi-imbalance=<0-2> | -I <0-2>] [--storage-format=<format> | -F <format>] <app>
3. aps --report | -O <option> <dir>
4. aps <option>
1. Run analysis for an application or script <app> and store results in <dir>
2. Run analysis for an MPI application <app> and store results in <dir>
--collection-mode=<mode> Specify a comma separated list of data to collect. Possible values: hwc - hardware counters, omp - OpenMP statistics, mpi - MPI statistics, all - all possible data (default).
--start-paused "Start data collection in the paused mode. Data collection resumes when the application calls __itt_resume or MPI_Pcontrol(1).
--stat-level Set MPI statistic collection level.
--mpi-imbalance Set MPI imbalance collection mode.
--tmp-dir=<path> Specify the directory path where temporary data (generated during the collection run) is saved.
--storage-format=<format> Specify the trace format: simple (a small number of MPI ranks per node) or compact (numerous MPI ranks per node).
3. Show analysis report based on data from <dir>
Tip: use "aps-report --help" to review report options. Additional details include statistics views by ranks, message sizes, collective operations, and communicators, as well as the ability to explore rank-to-rank and node-to-node communication statistics.
4. Show additional product info. Where <option> can be:
--help, -h show this help and exit
--version show version information and exit
```
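To illustrate the collect-then-report workflow shown in usage forms 1 and 3 above, here is a minimal sketch; the application name `./my_app` and the result directory `aps_result` are placeholders.

```bash
# Minimal sketch of the two-step APS workflow (placeholder names).
# Step 1: run the application under aps and store results in aps_result/.
aps --result-dir=aps_result ./my_app

# Step 2: generate the text summary and HTML report from the collected data.
aps --report aps_result
```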
```
$ aps-report -h
Usage: aps-report [keys] [options] <file-name(s)>
General KEYS:
-h [ --help ] Show this help and exit.
-a [ --all ] Show all available diagrams.
Specific KEYS:
-s [ --summary ] Show Summary information. Default
option.
-g [ --html-summary ] Generate Summary HTML report file.
Default option. (See option '-H' to set
the file name.)
-o [ --counters ] Show Counters & Memory usage
statistics.
-f [ --functions ] Show MPI Function Summary for all ranks
diagram. (Use Time filter. See '-T'
option.)
-t [ --mpi-time-per-rank ] Show MPI Function Time per Rank
diagram. (Use Time filter. See '-T'
option.)
-m [ --message-sizes ] Show Message Size diagram. (Use Time
filter. See '-T' option.)
-x [ --transfers-per-communication ] Show Data Transfers per Communication
diagram. (Use Time filter. See '-T'
option.) Use -v option to generate Data
Transfers by volume, not by time (which
is the default value)
-e [ --transfers-per-rank ] Show Data Transfers per Rank diagram.
(Use Volume filter. See '-V' option.)
-n [ --node-to-node ] Show Node-to-Node Transfers(MB)
-u [ --transfers-per-function ] Show Data Transfers per Function
diagram. (Use Volume filter. See '-V'
option.)
-l [ --communicators-list ] Show list of communicators used.
--node-topology Show the association between ranks,
nodes, and PCI devices in the collected
data.
--metrics arg Show the selected performance metrics
per node/rank. The metric names should
be separated using the comma. Use
--metrics=? to see the list of metrics
available for the selected result. The
special value "all" enables the display
of all available metrics. Example:
--metrics="Elapsed Time, Vectorization,
PCI"
-j [ --itac-config ] arg Generate specified Intel(R) Trace
Analyzer and Collector config.
Available config types are "time",
"imbalance", "volume". Use -N to
specify number of MPI functions in
config file.
OPTIONS:
-R [ --rank ] arg Specify ranks to show in the report
(the list of ranks should be defined
using comma and dash separators,
example: -R 0,1,4,4-10).
--node arg Specify nodes to show in the report
(the list of nodes should be defined
using commas as separators, example:
--node=localhost,test1).
-X [ --comm-matrix-size ] arg Show Communication matrix as a square
with specific dimensions. Applied to
'-x'.
-M [ --comm-id ] arg Show detailed info about collective
operations for specified communicator
ID. Applied to '-f'.
-D [ --details ] Show details for the ranks with
minimum, average, and maximum time
values from the selected diagram. For
Message Size diagram shows detailed
info about functions for each size.
Applied to the options '-t', '-c', '-f'
and '-m'.
-C [ --communicators ] Shows additional information about
communicators for collective
operations. Applied to the options '-f'
and '-m -D'.
-E [ --internal-communicators ] Shows information about internal IDs of
communicators provided by Intel(R) MPI.
Applied to '-l', '-f -C', '-m -D -C'.
-P [ --func-group ] Group data by function name. Applied to
the options '-m -D'.
-V [ --volume-threshold ] arg Threshold for data volume in % from
total transferred volume. Lines with
volume less than the threshold will be
skipped. Set to 0 to disable the
filter. By default - 1%.
-T [ --time-threshold ] arg Threshold for time in % from total
process lifetime. Lines with time less
than the threshold will be skipped. Set
to 0 to disable the filter. By default
- 1%.
--functions-filter arg Gather statistics only for specified
MPI functions
-v [ --by-volume ] Show Data Transfers per Communication
diagram by volume
--format arg Generate file of specified format for
the diagram. Supported values: txt,
csv, html
-H [ --html-summary-file ] arg Name of Summary HTML report file. By
default - 'aps_report_<date>_<time>.htm
l' in current folder.
The following environment variables are used:
APS_STAT_DIR_PREFIX a string to define the base name for the results directory.
By default, the results directory is "stat"
APS_STAT_DIR_POSTFIX a string to define a postfix for the results directory name and _mpi postfix.
The default value for the bundled version is _%D-%T, where:
%D - current date in the format YYYYMMDD
%T - current time in the format hhmmss
These placeholders are supported by the Intel MPI Library only.
APS_STAT_FILE_PREFIX a string to define the base name for statistics files.
The default value is "stat-". Rank number and ".bin" are added automatically.
FILE-NAMES:
Path to the results directory or any statistics file inside the results directory.
Examples: `aps-report -t -D stat_20160923-141516` - APS parses this directory.
`aps-report -t -D stat-0.bin` - APS tries to recognize all stat* files in the current directory.
```
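As a quick illustration of the keys and options above, a sketch with a placeholder result directory `<result-dir>`:

```bash
# MPI function time per rank, with min/avg/max details (keys -t and -D above).
aps-report -t -D <result-dir>

# Message Size diagram with the 1% time-threshold filter disabled (keys -m and -T).
aps-report -m -T 0 <result-dir>
```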
### Running with an MPI application
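Per usage form 2 of `aps`, place `aps` between the MPI launcher arguments and the application executable. A minimal sketch with placeholder rank counts, result directory, and application name:

```bash
# Collect an APS snapshot for an MPI application (placeholders in angle brackets).
mpirun -n <num_ranks> --ppn <ranks_per_node> aps -r <result-dir> ./my_app <app_args>
```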
### Collection on compute nodes
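A sketch of a PBS batch script that performs the collection on compute nodes; the queue and project names are placeholders, the node and rank counts are examples only, and `gpu_tile_compact.sh` is the GPU-binding wrapper used in the example below:

```bash
#!/bin/bash
#PBS -l select=4               # example: four nodes
#PBS -l walltime=00:30:00
#PBS -q <queue_name>           # placeholder queue
#PBS -A <project_name>         # placeholder project

cd ${PBS_O_WORKDIR}
module load oneapi

# 12 ranks per node, one rank per GPU tile, as in the amr-wind example below
mpirun -n 48 --ppn 12 gpu_tile_compact.sh aps -r aps_result ./my_app <app_args>
```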
### HTML report generation
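`aps --report <result-dir>` prints the text summary and also writes `aps_report_<date>_<time>.html` in the current directory, as shown in the example below. The HTML summary can also be generated directly with `aps-report`; a sketch with a placeholder result directory and output file name:

```bash
# Generate only the HTML summary, with an explicit file name (keys -g and -H above).
aps-report -g -H my_app_summary.html <result-dir>
```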
## Simple examples
### `aps` collection for the `amr-wind` application with 48 MPI ranks on four Aurora nodes
#### Collection
```
$ mpirun -n 48 --ppn 12 gpu_tile_compact.sh aps -r aps_report_amr-wind_4N ../amr_wind abl_godunov_4N_ST100.inp
==============================================================================
AMR-Wind (https://github.com/exawind/amr-wind)
AMR-Wind version :: v3.4.2
AMR-Wind Git SHA :: ed475a0533dfacf1fdff0b707518ccf99040d9f9
AMReX version :: 25.04-9-g30a9768150c4
...<skipping application standard output>...
AMReX (25.04-9-g30a9768150c4) finalized
Intel(R) VTune(TM) Profiler 2025.0.1 collection completed successfully. Use the "aps --report /lus/flare/projects/Aurora_deployment/jkwack/JK_AT_Tools/Apps/amr-wind_v3.4.2/build_20250513/jk_test/aps_report_amr-wind_4N" command to generate textual and HTML reports for the profiling session.
```
During the collection, the following warning can be ignored:
```
vtune: Warning: EMON Collector Message: Warning: PMU counter(s) [FP0 FP1 FP2 ] is (are) already in use by other software and the data collected on these counter(s) may be incorrect
```
#### Report
```
$ aps --report /lus/flare/projects/Aurora_deployment/jkwack/JK_AT_Tools/Apps/amr-wind_v3.4.2/build_20250513/jk_test/aps_report_amr-wind_4N
| Summary information
|--------------------------------------------------------------------
Application : amr_wind
Report creation date : 2025-05-19 00:39:10
Number of ranks : 48
Ranks per node : 12
HW Platform : Intel(R) Xeon(R) Processor code named Sapphirerapids
Frequency : 2.00 GHz
Logical core count per node : 208
Collector type : Event-based sampling driver,Event-based counting driver,User-mode sampling and tracing
Used statistics : /lus/flare/projects/Aurora_deployment/jkwack/JK_AT_Tools/Apps/amr-wind_v3.4.2/build_20250513/jk_test/aps_report_amr-wind_4N
|
| Your application might underutilize the available logical CPU cores
| because of insufficient parallel work, blocking on synchronization, or too much I/O. Perform function or source line-level profiling with tools like Intel(R) VTune(TM) Profiler to discover why the CPU is underutilized.
|
Elapsed Time: 867.22 s
SP GFLOPS: 0.00
DP GFLOPS: 0.01
Average CPU Frequency: 4.80 GHz
IPC Rate: 1.65
GPU Accumulated Time: 4732.36 s
MPI Time: 153.27 s 17.68% of Elapsed Time
| Your application is MPI bound. This may be caused by high busy wait time
| inside the library (imbalance), non-optimal communication schema or MPI
| library settings. Explore the MPI Imbalance metric if it is available or use
| MPI profiling tools like Intel(R) Trace Analyzer and Collector to explore
| possible performance bottlenecks.
| Some of the individual values contributing to this average metric are
| statistical outliers that can significantly distort the average metric value.
| They can also be a cause of performance degradation.
| Please use --counters or --metrics="MPI Time" reports for details.
MPI Imbalance: N/A*
| * No information about MPI Imbalance time is available. Set APS_IMBALANCE_TYPE
| to 1 or 2 to collect it.
Top 5 MPI functions (avg time):
MPI_Waitall: 103.02 s 11.89% of Elapsed Time
MPI_Allreduce: 22.22 s 2.56% of Elapsed Time
MPI_Isend: 19.92 s 2.30% of Elapsed Time
MPI_Barrier: 3.14 s 0.36% of Elapsed Time
MPI_Testall: 1.46 s 0.17% of Elapsed Time
Physical Core Utilization: 106.67%
Average Physical Core Utilization: 110.96 out of 208 Physical Cores
GPU Stack Utilization: 93.75%
XVE State:
Active: 66.68%
Stalled: 18.25%
| Some of the individual values contributing to this average metric are
| statistical outliers that can significantly distort the average metric value.
| They can also be a cause of performance degradation.
| Please use --counters or --metrics="XVE State: Stalled" reports for details.
Idle: 115.02%
| A significant portion of GPU time is spent idle. This is usually caused by
| imbalance or thread scheduling problems.
| Some of the individual values contributing to this average metric are
| statistical outliers that can significantly distort the average metric value.
| They can also be a cause of performance degradation.
| Please use --counters or --metrics="XVE State: Idle" reports for details.
GPU Occupancy: 72.47% of Peak Value
| Some of the individual values contributing to this average metric are
| statistical outliers that can significantly distort the average metric value.
| They can also be a cause of performance degradation.
| Please use --counters or --metrics="GPU Occupancy" reports for details.
Memory Stalls: 62.90% of Pipeline Slots
| The metric value can indicate that a significant fraction of execution
| pipeline slots could be stalled due to demand memory load and stores. See the
| second level metrics to define if the application is cache- or DRAM-bound and
| the NUMA efficiency. Use Intel(R) VTune(TM) Profiler Memory Access analysis to
| review a detailed metric breakdown by memory hierarchy, memory bandwidth
| information, and correlation by memory objects.
Cache Stalls: 66.60% of Cycles
| A significant proportion of cycles are spent on data fetches from cache. Use
| Intel(R) VTune(TM) Profiler Memory Access analysis to see if accesses to L2 or
| L3 cache are problematic and consider applying the same performance tuning as
| you would for a cache-missing workload. This may include reducing the data
| working set size, improving data access locality, blocking or partitioning the
| working set to fit in the lower cache levels, or exploiting hardware
| prefetchers.
DRAM Stalls: 3.95% of Cycles
DRAM Bandwidth
Peak: 426.75 GB/s
Average: 16.41 GB/s
Bound: 14.05%
Vectorization: 30.22%
Instruction Mix:
SP FLOPs: 0.00% of uOps
DP FLOPs: 0.00% of uOps
Non-FP: 200.00% of uOps
FP Arith/Mem Rd Instr. Ratio: 0.00
FP Arith/Mem Wr Instr. Ratio: 0.00
Average PCI Bandwidth:
Average PCIe Bandwidth Usage by GPU:
Inbound PCIe Read: 607.84 MB/s
Inbound PCIe Write: 6826.98 MB/s
Average PCIe Bandwidth Usage by Network Controller Devices:
Inbound PCIe Read: 6120.50 MB/s
Inbound PCIe Write: 4131.81 MB/s
Disk I/O Bound: 0.94 s 0.11% of Elapsed Time
Disk read: 153.1 KB
Disk write: 50.1 GB
Memory Footprint:
Resident:
Per node:
Peak resident set size : 8944.00 MB (node x4013c2s1b0n0)
Average resident set size : 8919.50 MB
Per rank:
Peak resident set size : 906.00 MB (rank 24)
Average resident set size : 743.29 MB
Virtual:
Per node:
Peak memory consumption : 64187493.00 MB (node x4013c2s0b0n0)
Average memory consumption : 64186991.25 MB
Per rank:
Peak memory consumption : 5349075.00 MB (rank 36)
Average memory consumption : 5348915.94 MB
Graphical representation of this data is available in the HTML report: /lus/flare/projects/Aurora_deployment/jkwack/JK_AT_Tools/Apps/amr-wind_v3.4.2/build_20250513/jk_test/aps_report_20250519_004709.html
```
Download the HTML file and open it in a web browser.
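For example, from a local machine (the username and report path are placeholders; the hostname may differ):

```bash
# Copy the HTML report from Aurora to the local machine, then open it in a browser.
scp <username>@aurora.alcf.anl.gov:<path-to>/aps_report_<date>_<time>.html .
```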
#### `aps-report` command-line interface for details
```
$ aps-report --metrics=? aps_report_amr-wind_4N/
| Available Metrics:
|--------------------
Elapsed Time
MPI Time
MPI Time
MPI Imbalance
MPI Hotspot 1 - MPI_Waitall
MPI Hotspot 1 - MPI_Waitall
MPI Hotspot 2 - MPI_Allreduce
MPI Hotspot 2 - MPI_Allreduce
MPI Hotspot 3 - MPI_Isend
MPI Hotspot 3 - MPI_Isend
MPI Hotspot 4 - MPI_Barrier
MPI Hotspot 4 - MPI_Barrier
MPI Hotspot 5 - MPI_Testall
MPI Hotspot 5 - MPI_Testall
Disk I/O Bound
Disk I/O Bound
Disk read
Disk write
Resident Memory Usage per Rank
Resident Memory Usage per Node
Virtual Memory Usage per Rank
Virtual Memory Usage per Node
Instructions Per Cycle Rate
Average CPU Frequency
Physical Core Utilization
Average Physical Core Utilization
Memory Stalls
Cache Stalls
DRAM Stalls
Average DRAM Bandwidth
DRAM Bandwidth Peak
DRAM Bandwidth Average
DRAM Bandwidth Bound
SP GFLOPS
DP GFLOPS
Vectorization
SP FLOPs
DP FLOPs
Non-FP
FP Arith/Mem Rd Instr. Ratio
FP Arith/Mem Wr Instr. Ratio
GPU Accumulated Time
GPU Stack Utilization
XVE State: Active
XVE State: Stalled
XVE State: Idle
GPU Occupancy
GPU Inbound PCIe Read
GPU Inbound PCIe Write
GPU Outbound PCIe Read
GPU Outbound PCIe Write
Network Controller Inbound PCIe Read
Network Controller Inbound PCIe Write
Network Controller Outbound PCIe Read
Network Controller Outbound PCIe Write
Inbound PCIe Read Per Device
Inbound PCIe Write Per Device
Outbound PCIe Read Per Device
Outbound PCIe Write Per Device
GPU Accumulated Time Per Device
GPU Stack Utilization Per Device
XVE State: Active Per Device
XVE State: Stalled Per Device
XVE State: Idle Per Device
GPU Occupancy Per Device
```
$ aps-report --metrics="GPU Stack Utilization Per Device, OpenMP Offload Time, GPU Accumulated Time Per Device, MPI Time" aps_report_amr-wind_4N
| Metric Table%
|--------------------------------------------------------------------------------------------------------------------------------
Metric Name Node Name Rank Device Type Device Name Metric Value Outlier Type
MPI Time, s x4013c2s1b0n0 14 N/A N/A 166.471 None
MPI Time, s x4013c2s1b0n0 15 N/A N/A 165.058 None
MPI Time, s x4013c2s3b0n0 36 N/A N/A 164.408 None
MPI Time, s x4013c2s2b0n0 29 N/A N/A 163.307 None
MPI Time, s x4013c2s3b0n0 39 N/A N/A 162.385 None
MPI Time, s x4013c2s3b0n0 38 N/A N/A 162.119 None
MPI Time, s x4013c2s3b0n0 41 N/A N/A 162.084 None
MPI Time, s x4013c2s2b0n0 28 N/A N/A 161.303 None
MPI Time, s x4013c2s1b0n0 16 N/A N/A 159.364 None
MPI Time, s x4013c2s2b0n0 25 N/A N/A 159.1 None
MPI Time, s x4013c2s3b0n0 37 N/A N/A 158.898 None
MPI Time, s x4013c2s0b0n0 4 N/A N/A 157.763 None
MPI Time, s x4013c2s1b0n0 13 N/A N/A 157.692 None
MPI Time, s x4013c2s1b0n0 20 N/A N/A 157.515 None
MPI Time, s x4013c2s0b0n0 5 N/A N/A 156.837 None
MPI Time, s x4013c2s2b0n0 30 N/A N/A 156.188 None
MPI Time, s x4013c2s2b0n0 34 N/A N/A 155.781 None
MPI Time, s x4013c2s1b0n0 21 N/A N/A 155.483 None
MPI Time, s x4013c2s1b0n0 12 N/A N/A 155.452 None
MPI Time, s x4013c2s3b0n0 44 N/A N/A 155.092 None
MPI Time, s x4013c2s1b0n0 22 N/A N/A 154.587 None
MPI Time, s x4013c2s2b0n0 26 N/A N/A 154.353 None
MPI Time, s x4013c2s2b0n0 27 N/A N/A 154.302 None
MPI Time, s x4013c2s1b0n0 18 N/A N/A 154.197 None
MPI Time, s x4013c2s2b0n0 32 N/A N/A 154.155 None
MPI Time, s x4013c2s0b0n0 8 N/A N/A 153.864 None
MPI Time, s x4013c2s3b0n0 42 N/A N/A 153.821 None
MPI Time, s x4013c2s1b0n0 19 N/A N/A 153.713 None
MPI Time, s x4013c2s3b0n0 43 N/A N/A 153.701 None
MPI Time, s x4013c2s2b0n0 33 N/A N/A 153.685 None
MPI Time, s x4013c2s3b0n0 47 N/A N/A 153.273 None
MPI Time, s x4013c2s3b0n0 45 N/A N/A 153.21 None
MPI Time, s x4013c2s2b0n0 31 N/A N/A 152.941 None
MPI Time, s x4013c2s3b0n0 46 N/A N/A 152.49 None
MPI Time, s x4013c2s0b0n0 9 N/A N/A 151.611 None
MPI Time, s x4013c2s2b0n0 35 N/A N/A 151.37 None
MPI Time, s x4013c2s0b0n0 6 N/A N/A 150.94 None
MPI Time, s x4013c2s0b0n0 1 N/A N/A 150.864 None
MPI Time, s x4013c2s0b0n0 3 N/A N/A 150.579 None
MPI Time, s x4013c2s0b0n0 7 N/A N/A 150.239 None
MPI Time, s x4013c2s1b0n0 23 N/A N/A 149.236 None
MPI Time, s x4013c2s0b0n0 10 N/A N/A 148.357 None
MPI Time, s x4013c2s0b0n0 11 N/A N/A 144.026 None
MPI Time, s x4013c2s0b0n0 0 N/A N/A 140.557 None
MPI Time, s x4013c2s3b0n0 40 N/A N/A 136.304 None
MPI Time, s x4013c2s1b0n0 17 N/A N/A 134.898 None
MPI Time, s x4013c2s2b0n0 24 N/A N/A 131.685 None
MPI Time, s x4013c2s0b0n0 2 N/A N/A 121.563 Statistic
MPI Time, % of Elapsed Time x4013c2s1b0n0 14 N/A N/A 19.2067 None
MPI Time, % of Elapsed Time x4013c2s1b0n0 15 N/A N/A 19.0436 None
MPI Time, % of Elapsed Time x4013c2s3b0n0 36 N/A N/A 18.9743 None
MPI Time, % of Elapsed Time x4013c2s2b0n0 29 N/A N/A 18.8438 None
MPI Time, % of Elapsed Time x4013c2s3b0n0 39 N/A N/A 18.7408 None
MPI Time, % of Elapsed Time x4013c2s3b0n0 38 N/A N/A 18.7101 None
MPI Time, % of Elapsed Time x4013c2s3b0n0 41 N/A N/A 18.7061 None
MPI Time, % of Elapsed Time x4013c2s2b0n0 28 N/A N/A 18.6125 None
MPI Time, % of Elapsed Time x4013c2s1b0n0 16 N/A N/A 18.3867 None
MPI Time, % of Elapsed Time x4013c2s2b0n0 25 N/A N/A 18.3583 None
MPI Time, % of Elapsed Time x4013c2s3b0n0 37 N/A N/A 18.3384 None
MPI Time, % of Elapsed Time x4013c2s1b0n0 13 N/A N/A 18.1938 None
MPI Time, % of Elapsed Time x4013c2s0b0n0 4 N/A N/A 18.1918 None
MPI Time, % of Elapsed Time x4013c2s1b0n0 20 N/A N/A 18.1733 None
MPI Time, % of Elapsed Time x4013c2s0b0n0 5 N/A N/A 18.0851 None
MPI Time, % of Elapsed Time x4013c2s2b0n0 30 N/A N/A 18.0223 None
MPI Time, % of Elapsed Time x4013c2s2b0n0 34 N/A N/A 17.9754 None
MPI Time, % of Elapsed Time x4013c2s1b0n0 21 N/A N/A 17.9389 None
MPI Time, % of Elapsed Time x4013c2s1b0n0 12 N/A N/A 17.9353 None
MPI Time, % of Elapsed Time x4013c2s3b0n0 44 N/A N/A 17.8991 None
MPI Time, % of Elapsed Time x4013c2s1b0n0 22 N/A N/A 17.8356 None
MPI Time, % of Elapsed Time x4013c2s2b0n0 26 N/A N/A 17.8105 None
MPI Time, % of Elapsed Time x4013c2s2b0n0 27 N/A N/A 17.8047 None
MPI Time, % of Elapsed Time x4013c2s1b0n0 18 N/A N/A 17.7905 None
MPI Time, % of Elapsed Time x4013c2s2b0n0 32 N/A N/A 17.7878 None
MPI Time, % of Elapsed Time x4013c2s3b0n0 42 N/A N/A 17.7524 None
MPI Time, % of Elapsed Time x4013c2s0b0n0 8 N/A N/A 17.7422 None
MPI Time, % of Elapsed Time x4013c2s3b0n0 43 N/A N/A 17.7386 None
MPI Time, % of Elapsed Time x4013c2s1b0n0 19 N/A N/A 17.7347 None
MPI Time, % of Elapsed Time x4013c2s2b0n0 33 N/A N/A 17.7335 None
MPI Time, % of Elapsed Time x4013c2s3b0n0 47 N/A N/A 17.6892 None
MPI Time, % of Elapsed Time x4013c2s3b0n0 45 N/A N/A 17.6819 None
MPI Time, % of Elapsed Time x4013c2s2b0n0 31 N/A N/A 17.6477 None
MPI Time, % of Elapsed Time x4013c2s3b0n0 46 N/A N/A 17.5988 None
MPI Time, % of Elapsed Time x4013c2s0b0n0 9 N/A N/A 17.4825 None
MPI Time, % of Elapsed Time x4013c2s2b0n0 35 N/A N/A 17.4664 None
MPI Time, % of Elapsed Time x4013c2s0b0n0 6 N/A N/A 17.4051 None
MPI Time, % of Elapsed Time x4013c2s0b0n0 1 N/A N/A 17.3963 None
MPI Time, % of Elapsed Time x4013c2s0b0n0 3 N/A N/A 17.3635 None
MPI Time, % of Elapsed Time x4013c2s0b0n0 7 N/A N/A 17.3243 None
MPI Time, % of Elapsed Time x4013c2s1b0n0 23 N/A N/A 17.2182 None
MPI Time, % of Elapsed Time x4013c2s0b0n0 10 N/A N/A 17.1073 None
MPI Time, % of Elapsed Time x4013c2s0b0n0 11 N/A N/A 16.6078 None
MPI Time, % of Elapsed Time x4013c2s0b0n0 0 N/A N/A 16.2078 None
MPI Time, % of Elapsed Time x4013c2s3b0n0 40 N/A N/A 15.7308 None
MPI Time, % of Elapsed Time x4013c2s1b0n0 17 N/A N/A 15.5639 None
MPI Time, % of Elapsed Time x4013c2s2b0n0 24 N/A N/A 15.1949 None
MPI Time, % of Elapsed Time x4013c2s0b0n0 2 N/A N/A 14.0176 Statistic
GPU Accumulated Time Per Device, s x4013c2s0b0n0 N/A GPU GPU 0 Stack 0 574.952 None
GPU Accumulated Time Per Device, s x4013c2s0b0n0 N/A GPU GPU 0 Stack 1 561.112 None
GPU Accumulated Time Per Device, s x4013c2s1b0n0 N/A GPU GPU 0 Stack 0 555.666 None
GPU Accumulated Time Per Device, s x4013c2s2b0n0 N/A GPU GPU 0 Stack 0 552.534 None
GPU Accumulated Time Per Device, s x4013c2s3b0n0 N/A GPU GPU 0 Stack 0 546.085 None
GPU Accumulated Time Per Device, s x4013c2s3b0n0 N/A GPU GPU 0 Stack 1 543.687 None
GPU Accumulated Time Per Device, s x4013c2s0b0n0 N/A GPU GPU 1 Stack 1 543.447 None
GPU Accumulated Time Per Device, s x4013c2s2b0n0 N/A GPU GPU 0 Stack 1 543.261 None
GPU Accumulated Time Per Device, s x4013c2s1b0n0 N/A GPU GPU 0 Stack 1 543.069 None
GPU Accumulated Time Per Device, s x4013c2s0b0n0 N/A GPU GPU 1 Stack 0 541.997 None
GPU Accumulated Time Per Device, s x4013c2s0b0n0 N/A GPU GPU 2 Stack 0 533.238 None
GPU Accumulated Time Per Device, s x4013c2s0b0n0 N/A GPU GPU 2 Stack 1 529.018 None
GPU Accumulated Time Per Device, s x4013c2s2b0n0 N/A GPU GPU 1 Stack 1 526.305 None
GPU Accumulated Time Per Device, s x4013c2s2b0n0 N/A GPU GPU 1 Stack 0 525.409 None
GPU Accumulated Time Per Device, s x4013c2s1b0n0 N/A GPU GPU 2 Stack 0 523.241 None
GPU Accumulated Time Per Device, s x4013c2s3b0n0 N/A GPU GPU 1 Stack 0 521.108 None
GPU Accumulated Time Per Device, s x4013c2s3b0n0 N/A GPU GPU 1 Stack 1 520.34 None
GPU Accumulated Time Per Device, s x4013c2s2b0n0 N/A GPU GPU 2 Stack 1 518.369 None
GPU Accumulated Time Per Device, s x4013c2s2b0n0 N/A GPU GPU 2 Stack 0 516.94 None
GPU Accumulated Time Per Device, s x4013c2s1b0n0 N/A GPU GPU 2 Stack 1 516.645 None
GPU Accumulated Time Per Device, s x4013c2s1b0n0 N/A GPU GPU 1 Stack 0 516.139 None
GPU Accumulated Time Per Device, s x4013c2s3b0n0 N/A GPU GPU 2 Stack 0 514.991 None
GPU Accumulated Time Per Device, s x4013c2s3b0n0 N/A GPU GPU 2 Stack 1 513.923 None
GPU Accumulated Time Per Device, s x4013c2s1b0n0 N/A GPU GPU 1 Stack 1 513.66 None
GPU Accumulated Time Per Device, s x4013c2s0b0n0 N/A GPU GPU 5 Stack 0 268.919 None
GPU Accumulated Time Per Device, s x4013c2s0b0n0 N/A GPU GPU 5 Stack 1 268.893 None
GPU Accumulated Time Per Device, s x4013c2s0b0n0 N/A GPU GPU 4 Stack 1 268.416 None
GPU Accumulated Time Per Device, s x4013c2s0b0n0 N/A GPU GPU 4 Stack 0 266.623 None
GPU Accumulated Time Per Device, s x4013c2s0b0n0 N/A GPU GPU 3 Stack 0 266.087 None
GPU Accumulated Time Per Device, s x4013c2s0b0n0 N/A GPU GPU 3 Stack 1 262.871 None
GPU Accumulated Time Per Device, s x4013c2s2b0n0 N/A GPU GPU 4 Stack 1 258.085 None
GPU Accumulated Time Per Device, s x4013c2s2b0n0 N/A GPU GPU 4 Stack 0 256.988 None
GPU Accumulated Time Per Device, s x4013c2s2b0n0 N/A GPU GPU 5 Stack 1 255.12 None
GPU Accumulated Time Per Device, s x4013c2s2b0n0 N/A GPU GPU 5 Stack 0 254.068 None
GPU Accumulated Time Per Device, s x4013c2s2b0n0 N/A GPU GPU 3 Stack 1 253.944 None
GPU Accumulated Time Per Device, s x4013c2s3b0n0 N/A GPU GPU 4 Stack 1 252.502 None
GPU Accumulated Time Per Device, s x4013c2s1b0n0 N/A GPU GPU 3 Stack 0 252.348 None
GPU Accumulated Time Per Device, s x4013c2s1b0n0 N/A GPU GPU 5 Stack 1 252.127 None
GPU Accumulated Time Per Device, s x4013c2s1b0n0 N/A GPU GPU 5 Stack 0 251.616 None
GPU Accumulated Time Per Device, s x4013c2s1b0n0 N/A GPU GPU 4 Stack 1 250.515 None
GPU Accumulated Time Per Device, s x4013c2s3b0n0 N/A GPU GPU 3 Stack 0 250.468 None
GPU Accumulated Time Per Device, s x4013c2s1b0n0 N/A GPU GPU 3 Stack 1 250.404 None
GPU Accumulated Time Per Device, s x4013c2s1b0n0 N/A GPU GPU 4 Stack 0 250.19 None
GPU Accumulated Time Per Device, s x4013c2s3b0n0 N/A GPU GPU 5 Stack 1 250.123 None
GPU Accumulated Time Per Device, s x4013c2s2b0n0 N/A GPU GPU 3 Stack 0 249.972 None
GPU Accumulated Time Per Device, s x4013c2s3b0n0 N/A GPU GPU 3 Stack 1 248.308 None
GPU Accumulated Time Per Device, s x4013c2s3b0n0 N/A GPU GPU 4 Stack 0 248.101 None
GPU Accumulated Time Per Device, s x4013c2s3b0n0 N/A GPU GPU 5 Stack 0 247.606 None
GPU Stack Utilization Per Device, % x4013c2s0b0n0 N/A GPU GPU 0 Stack 0 132.9 None
GPU Stack Utilization Per Device, % x4013c2s0b0n0 N/A GPU GPU 0 Stack 1 129.9 None
GPU Stack Utilization Per Device, % x4013c2s1b0n0 N/A GPU GPU 0 Stack 0 128.3 None
GPU Stack Utilization Per Device, % x4013c2s2b0n0 N/A GPU GPU 0 Stack 0 127.4 None
GPU Stack Utilization Per Device, % x4013c2s0b0n0 N/A GPU GPU 1 Stack 1 126.2 None
GPU Stack Utilization Per Device, % x4013c2s3b0n0 N/A GPU GPU 0 Stack 0 126.1 None
GPU Stack Utilization Per Device, % x4013c2s0b0n0 N/A GPU GPU 1 Stack 0 125.8 None
GPU Stack Utilization Per Device, % x4013c2s3b0n0 N/A GPU GPU 0 Stack 1 125.5 None
GPU Stack Utilization Per Device, % x4013c2s2b0n0 N/A GPU GPU 0 Stack 1 125.4 None
GPU Stack Utilization Per Device, % x4013c2s1b0n0 N/A GPU GPU 0 Stack 1 125.4 None
GPU Stack Utilization Per Device, % x4013c2s0b0n0 N/A GPU GPU 2 Stack 0 123.7 None
GPU Stack Utilization Per Device, % x4013c2s0b0n0 N/A GPU GPU 2 Stack 1 122.7 None
GPU Stack Utilization Per Device, % x4013c2s2b0n0 N/A GPU GPU 1 Stack 1 121.8 None
GPU Stack Utilization Per Device, % x4013c2s2b0n0 N/A GPU GPU 1 Stack 0 121.7 None
GPU Stack Utilization Per Device, % x4013c2s1b0n0 N/A GPU GPU 2 Stack 0 121.2 None
GPU Stack Utilization Per Device, % x4013c2s3b0n0 N/A GPU GPU 1 Stack 0 120.8 None
GPU Stack Utilization Per Device, % x4013c2s3b0n0 N/A GPU GPU 1 Stack 1 120.4 None
GPU Stack Utilization Per Device, % x4013c2s2b0n0 N/A GPU GPU 2 Stack 1 120 None
GPU Stack Utilization Per Device, % x4013c2s1b0n0 N/A GPU GPU 2 Stack 1 119.7 None
GPU Stack Utilization Per Device, % x4013c2s2b0n0 N/A GPU GPU 2 Stack 0 119.7 None
GPU Stack Utilization Per Device, % x4013c2s1b0n0 N/A GPU GPU 1 Stack 0 119.6 None
GPU Stack Utilization Per Device, % x4013c2s3b0n0 N/A GPU GPU 2 Stack 0 119.3 None
GPU Stack Utilization Per Device, % x4013c2s3b0n0 N/A GPU GPU 2 Stack 1 119 None
GPU Stack Utilization Per Device, % x4013c2s1b0n0 N/A GPU GPU 1 Stack 1 118.9 None
GPU Stack Utilization Per Device, % x4013c2s0b0n0 N/A GPU GPU 5 Stack 1 67.5 Threshold
GPU Stack Utilization Per Device, % x4013c2s0b0n0 N/A GPU GPU 5 Stack 0 67.5 Threshold
GPU Stack Utilization Per Device, % x4013c2s0b0n0 N/A GPU GPU 4 Stack 1 67.4 Threshold
GPU Stack Utilization Per Device, % x4013c2s0b0n0 N/A GPU GPU 4 Stack 0 66.9 Threshold
GPU Stack Utilization Per Device, % x4013c2s0b0n0 N/A GPU GPU 3 Stack 0 66.8 Threshold
GPU Stack Utilization Per Device, % x4013c2s0b0n0 N/A GPU GPU 3 Stack 1 66 Threshold
GPU Stack Utilization Per Device, % x4013c2s2b0n0 N/A GPU GPU 4 Stack 1 64.9 Threshold
GPU Stack Utilization Per Device, % x4013c2s2b0n0 N/A GPU GPU 4 Stack 0 64.6 Threshold
GPU Stack Utilization Per Device, % x4013c2s2b0n0 N/A GPU GPU 5 Stack 1 64.1 Threshold
GPU Stack Utilization Per Device, % x4013c2s2b0n0 N/A GPU GPU 5 Stack 0 63.9 Threshold
GPU Stack Utilization Per Device, % x4013c2s2b0n0 N/A GPU GPU 3 Stack 1 63.8 Threshold
GPU Stack Utilization Per Device, % x4013c2s3b0n0 N/A GPU GPU 4 Stack 1 63.5 Threshold
GPU Stack Utilization Per Device, % x4013c2s1b0n0 N/A GPU GPU 5 Stack 1 63.4 Threshold
GPU Stack Utilization Per Device, % x4013c2s1b0n0 N/A GPU GPU 3 Stack 0 63.4 Threshold
GPU Stack Utilization Per Device, % x4013c2s1b0n0 N/A GPU GPU 5 Stack 0 63.2 Threshold
GPU Stack Utilization Per Device, % x4013c2s1b0n0 N/A GPU GPU 4 Stack 1 63 Threshold
GPU Stack Utilization Per Device, % x4013c2s3b0n0 N/A GPU GPU 3 Stack 0 63 Threshold
GPU Stack Utilization Per Device, % x4013c2s3b0n0 N/A GPU GPU 5 Stack 1 62.9 Threshold
GPU Stack Utilization Per Device, % x4013c2s1b0n0 N/A GPU GPU 4 Stack 0 62.9 Threshold
GPU Stack Utilization Per Device, % x4013c2s1b0n0 N/A GPU GPU 3 Stack 1 62.9 Threshold
GPU Stack Utilization Per Device, % x4013c2s2b0n0 N/A GPU GPU 3 Stack 0 62.8 Threshold
GPU Stack Utilization Per Device, % x4013c2s3b0n0 N/A GPU GPU 4 Stack 0 62.4 Threshold
GPU Stack Utilization Per Device, % x4013c2s3b0n0 N/A GPU GPU 3 Stack 1 62.4 Threshold
GPU Stack Utilization Per Device, % x4013c2s3b0n0 N/A GPU GPU 5 Stack 0 62.2 Threshold
```