Miscellaneous
Status
GC-Monitor
The command gc-monitor is Graphcore's device usage monitor. See gc-monitor --help for other options. For ordinary monitoring, run it as follows:
export IPUOF_VIPU_API_HOST=10.1.3.101
gc-monitor --no-card-info --all-partitions
# or watch gc-monitor --no-card-info --all-partitions
Note: if there are no partitions active, gc-monitor will core dump:
Segmentation fault (core dumped)
When at least one partition is active, the output will look something like:
+--------------------------------------------------------------+-----------------------+
|      IPUs in slurm_2616 attached from other namespaces       |         Board         |
+----+------------------------------+--------------+-----------+-----------+-----------+
| ID |       Application host       |    Clock     |   Temp    |   Temp    |   Power   |
+----+------------------------------+--------------+-----------+-----------+-----------+
| 0  |         gc-poplar-02         |   1850MHz    |  24.2 C   |  21.1 C   |  92.3 W   |
+----+------------------------------+--------------+-----------+-----------+-----------+
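Because gc-monitor crashes when no partitions are active, it can help to wrap the call so that the crash is reported with a hint. The following is a minimal sketch, not part of the Graphcore tools: the function name safe_monitor is hypothetical, and it assumes the shell reports a segmentation fault as exit status 139 (128 + signal 11), as bash and dash do.

```shell
# Hypothetical helper: run a monitoring command and explain a segfault if one occurs.
# With no arguments it runs the gc-monitor invocation shown above.
safe_monitor() {
    if [ "$#" -eq 0 ]; then
        set -- gc-monitor --no-card-info --all-partitions
    fi

    "$@"
    _sm_status=$?

    # 139 = 128 + 11 (SIGSEGV): the command was killed by a segmentation fault.
    if [ "$_sm_status" -eq 139 ]; then
        echo "monitor command crashed (SIGSEGV); possibly no partitions are active" >&2
    fi
    return "$_sm_status"
}
```

Calling safe_monitor with no arguments runs the default command; any other command can be passed as arguments, which also makes the wrapper easy to try out on a machine without IPUs.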
GC-Info
The command gc-info displays device information. See gc-info --help for more options.
To list devices, use the device-listing option of gc-info (gc-info -l). It lists the partition and the individual IPU IDs, along with the multi-IPU configuration IDs.
-+- Id: [0], target: [Fabric], IPU-M host: [10.1.5.1], IPU#: [3]
-+- Id: [1], target: [Fabric], IPU-M host: [10.1.5.1], IPU#: [2]
-+- Id: [2], target: [Fabric], IPU-M host: [10.1.5.1], IPU#: [1]
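For scripting, listing lines of the form shown above can be reduced to plain "device ID, IPU number" pairs. This is a rough sketch that assumes every device line follows exactly the layout above; the function name parse_gc_info is hypothetical, and lines that do not match are silently skipped.

```shell
# Hypothetical helper: read gc-info listing lines on stdin and print "<Id> <IPU#>" per device.
parse_gc_info() {
    # Capture the number inside "Id: [...]" and the number inside "IPU#: [...]".
    sed -n 's/.*Id: \[\([0-9]*\)\].*IPU#: \[\([0-9]*\)\].*/\1 \2/p'
}
```

The function reads from standard input, so the device listing can be piped into it directly, or a saved copy of the listing can be redirected into it.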
One may also display detailed information for a specific device. The devices are numbered 0-63. See gc-info --help for more information.
How busy is the system?
Use one of