Getting Started with Cerebras CSL¶
Cerebras CSL (Cerebras System Language) is a low-level kernel programming language designed for the Cerebras system. It enables users to write code that runs on individual Processing Elements (PEs) and to define the placement of programs and the routing of data on the Wafer-Scale Engine (WSE).
To develop programs for the Cerebras system, users create two main components:
- Device Code: Written in CSL, this code executes directly on the Cerebras system.
- Host Code: Written in Python, this code leverages Cerebras APIs to facilitate data movement and execute functions on the Cerebras system.
CSL includes libraries for a variety of commonly used primitive operations, such as broadcasting, gathering, and scattering data across rows or columns of PEs.
The Cerebras SDK can be used in two primary modes:
1. Simulator Mode: For testing and debugging programs without access to physical hardware.
2. Appliance Mode: For executing programs on the actual Cerebras hardware.
For a comprehensive overview of the Cerebras SDK, refer to the Cerebras SDK Documentation.
SDK with Simulator¶
The Cerebras SDK relies on a Singularity container and associated scripts to execute CSL code on a simulator.
On the login node, the Cerebras SDK is available at /software/cerebras/cs_sdk for your convenience. You can copy it to your $HOME directory, add it to your $PATH, and you’re ready to get started.
To verify that the SDK is installed correctly, execute the command: cslc --help
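If you prefer to script this check, for example at the top of a job script, the following is a minimal sketch using only the Python standard library. The helper name `find_tool` is hypothetical and not part of the Cerebras SDK. It only confirms that `cslc` can be found on `$PATH`; it does not confirm that the container behind it runs correctly.

```python
import shutil


def find_tool(name="cslc"):
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("cslc not found; check that the SDK directory is on your PATH")
    else:
        print(f"cslc found at {path}")
```

If the tool is not found, revisit the `$PATH` step above before trying the examples.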
Examples¶
We will use examples from the csl-examples repository provided by Cerebras. To get these examples, clone the repository into your desired directory:
Sample Output
$ bash commands.sh
INFO: Using SIF: /software/cerebras/cs_sdk-1.2.0/sdk-cbcore-202406260214-3-f03c8e31.sif
INFO: User's specified CSL_IMPORT_PATH=
NOTE: CSL_IMPORT_PATH accepts colon separated list of paths generated by 'realpath <path>'
compile successful
INFO: Using SIF: /software/cerebras/cs_sdk-1.2.0/sdk-cbcore-202406260214-3-f03c8e31.sif
SUCCESS
SDK GUI¶
You can use the SDK Debug GUI to analyze and gain insights into your code execution. For detailed instructions, refer to the SDK GUI documentation.
To launch the SDK Debug GUI, run the following commands:
Sample Output
> sdk_debug_shell visualize
INFO: Using SIF: /software/cerebras/cs_sdk-1.2.0/sdk-cbcore-202406260214-3-f03c8e31.sif
INFO: User's specified CSL_IMPORT_PATH=
NOTE: CSL_IMPORT_PATH accepts colon separated list of paths generated by 'realpath <path>'
Click this link to open URL: http://cer-login-02.ai.alcf.anl.gov:8000/sdk-gui
Click this link to open URL: http://140.221.80.28:8000/sdk-gui
Press Ctrl-C to exit
To access the GUI from your local computer, forward port 8000 from the login node to your local machine and open the following URL in your web browser: http://localhost:8000/sdk-gui/
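Port forwarding is commonly done with an SSH tunnel of the form `ssh -L 8000:localhost:8000 <user>@<login-node>`, where the user and host are placeholders you must fill in for your account. Once the tunnel is up, a quick way to confirm that something is listening locally before opening the browser is sketched below. The helper name `port_is_open` is hypothetical, and the check only tests TCP reachability, not that the GUI itself is healthy.

```python
import socket


def port_is_open(host="localhost", port=8000, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Port 8000 is reachable; open http://localhost:8000/sdk-gui/")
    else:
        print("Nothing is listening on localhost:8000; check the SSH tunnel")
```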
SDK with Appliance Mode¶
Examples currently not working
With the release of Cerebras SDK version 2.4.0, the examples in the tutorial below are known to be broken on the CS-2. A fix and updates are forthcoming.
Appliance Mode runs code directly on the Cerebras Wafer-Scale Cluster. It is supported by the SDK in addition to the containerized Singularity build used with the simulator.
Setup¶
Create Virtual Environment: Follow these steps to set up the virtual environment for the Cerebras SDK:
Install SDK Packages: Install the cerebras_appliance and cerebras_sdk Python packages in the virtual environment, specifying the appropriate Cerebras Software release:
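The two setup steps can be sketched as a small helper that builds the commands to run. This is an illustration under stated assumptions, not the site's official procedure: the function name `setup_commands`, the environment directory, and the `==` version pin are all assumptions, and `<release>` is deliberately left as a placeholder for the Cerebras Software release that matches your cluster.

```python
import sys
from pathlib import Path


def setup_commands(env_dir, release):
    """Build argv lists that create a venv and install the two SDK packages.

    `release` is a placeholder for the Cerebras Software release string;
    pinning with `==` is an assumption about how the packages are versioned.
    """
    env_python = str(Path(env_dir) / "bin" / "python")
    packages = [f"cerebras_appliance=={release}", f"cerebras_sdk=={release}"]
    return [
        [sys.executable, "-m", "venv", str(env_dir)],
        [env_python, "-m", "pip", "install", "--upgrade", "pip"],
        [env_python, "-m", "pip", "install", *packages],
    ]


if __name__ == "__main__":
    # "<release>" is intentionally unfilled; substitute the release for your site.
    for argv in setup_commands(Path.home() / "venv_cerebras_sdk", "<release>"):
        # Print only; pass each argv to subprocess.run(argv, check=True) to execute.
        print(" ".join(argv))
```

Installing both packages with the same release string keeps the client libraries consistent with each other.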
Examples¶
We will use examples from the csl-examples repository provided by Cerebras. To access these examples, clone the repository into your desired directory:
Compile Code¶
Use the following compile.py script to compile the code in the respective example directory:
Sample Output
$ python compile.py
2023-10-11 00:55:33,107 DEBUG ClusterClient: server=10.140.65.35:443, authority=cluster-server.cerebras1.lab.alcf.anl.gov, cert=/opt/cerebras/certs/tls.crt, client-lease-strategy=0, heartbeat_options=HeartBeatOptions(cycle_seconds=10, cycle_threshold=12, lease_duration_seconds_override=0), options=[('grpc.service_config', '{"methodConfig": [{"name": [{"service": "cluster.cluster_mgmt_pb.ClusterManagement"}], "retryPolicy": {"maxAttempts": 3, "initialBackoff": "3s", "maxBackoff": "10s", "backoffMultiplier": 2, "retryableStatusCodes": ["UNAVAILABLE"]}}]}'), ('grpc.enable_retries', 1), ('grpc.default_authority', 'cluster-server.cerebras1.lab.alcf.anl.gov')]
2023-10-11 00:55:33,128 INFO Initiating a new SDK compile job against the cluster server
2023-10-11 00:55:33,142 DEBUG Run meta is available at /srv/projects/datascience/sraskar/cs2/cs_sdk/gemv/run_meta.json.
2023-10-11 00:55:33,142 INFO sdk_compile job id: wsjob-cgvksf7bv8mnfszfmftozv, log path: /n1/wsjob/workdir/wsjob-cgvksf7bv8mnfszfmftozv
2023-10-11 00:55:33,142 DEBUG Starting heartbeat thread for wsjob-cgvksf7bv8mnfszfmftozv. Heartbeat requests will be sent every 10 seconds.
2023-10-11 00:55:43,153 INFO Poll ingress status: Waiting for coordinator to be ready.
2023-10-11 00:56:13,186 INFO Ingress is ready.
2023-10-11 00:56:13,231 WARNING There is no existing compile job record.
2023-10-11 00:56:13,231 WARNING There is no existing compile job record.
2023-10-11 00:56:13,231 DEBUG Cluster mgmt job handle: {'job_id': 'wsjob-cgvksf7bv8mnfszfmftozv', 'service_authority': 'wsjob-cgvksf7bv8mnfszfmftozv-coordinator-0.cluster-server.cerebras1.lab.alcf.anl.gov', 'service_url': '10.140.65.35:443', 'credentials_path': '/opt/cerebras/certs/tls.crt'}
2023-10-11 00:56:13,255 INFO Application was found in the compile cache.
2023-10-11 00:56:13,266 DEBUG Signaling heartbeat thread to stop for wsjob-cgvksf7bv8mnfszfmftozv
The only difference between a CS-2 run and a simulator run is the fabric_dims setting. For simulator runs, set it to the minimum size the program requires. The script above generates artifact.json, which is used by the run.py script.
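The handoff between the compile and run steps through artifact.json can be illustrated with a small sketch. It uses only the standard library and does not call any Cerebras API. The function names and the `artifact_path` key are hypothetical, so the real scripts may store the information differently.

```python
import json
import tempfile
from pathlib import Path


def save_artifact(artifact_path, file):
    """Compile side: record where the compiled artifact lives."""
    payload = {"artifact_path": str(artifact_path)}  # key name is an assumption
    Path(file).write_text(json.dumps(payload), encoding="utf8")


def load_artifact(file):
    """Run side: read back the path recorded by the compile step."""
    return json.loads(Path(file).read_text(encoding="utf8"))["artifact_path"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        handoff = Path(tmp) / "artifact.json"
        save_artifact("out/compiled_artifact", handoff)  # illustrative path only
        print(load_artifact(handoff))
```

Keeping the artifact location in a file lets the run step be repeated without recompiling.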
Run Code¶
Use the following run.py script to run the code in the respective example directory:
Sample Output
$ python run.py
2023-10-11 00:56:21,281 DEBUG ClusterClient: server=10.140.65.35:443, authority=cluster-server.cerebras1.lab.alcf.anl.gov, cert=/opt/cerebras/certs/tls.crt, client-lease-strategy=0, heartbeat_options=HeartBeatOptions(cycle_seconds=10, cycle_threshold=12, lease_duration_seconds_override=0), options=[('grpc.service_config', '{"methodConfig": [{"name": [{"service": "cluster.cluster_mgmt_pb.ClusterManagement"}], "retryPolicy": {"maxAttempts": 3, "initialBackoff": "3s", "maxBackoff": "10s", "backoffMultiplier": 2, "retryableStatusCodes": ["UNAVAILABLE"]}}]}'), ('grpc.enable_retries', 1), ('grpc.default_authority', 'cluster-server.cerebras1.lab.alcf.anl.gov')]
2023-10-11 00:56:21,304 INFO Initiating a new SDK execute job against the cluster server
2023-10-11 00:56:21,316 DEBUG Run meta is available at /srv/projects/datascience/sraskar/cs2/cs_sdk/gemv/run_meta.json.
2023-10-11 00:56:21,316 INFO sdk_execute job id: wsjob-nm4gmz7jtq3khck2ltadbv, log path: /n1/wsjob/workdir/wsjob-nm4gmz7jtq3khck2ltadbv
2023-10-11 00:56:21,316 DEBUG Starting heartbeat thread for wsjob-nm4gmz7jtq3khck2ltadbv. Heartbeat requests will be sent every 10 seconds.
2023-10-11 00:56:31,327 INFO Poll ingress status: Waiting for coordinator to be ready.
2023-10-11 00:57:01,360 INFO Ingress is ready.
2023-10-11 00:57:01,410 WARNING There is no existing execute job record.
2023-10-11 00:57:01,410 WARNING There is no existing execute job record.
2023-10-11 00:57:01,410 DEBUG Cluster mgmt job handle: {'job_id': 'wsjob-nm4gmz7jtq3khck2ltadbv', 'service_authority': 'wsjob-nm4gmz7jtq3khck2ltadbv-coordinator-0.cluster-server.cerebras1.lab.alcf.anl.gov', 'service_url': '10.140.65.35:443', 'credentials_path': '/opt/cerebras/certs/tls.crt'}
SUCCESS!
2023-10-11 00:57:44,184 DEBUG Signaling heartbeat thread to stop for wsjob-nm4gmz7jtq3khck2ltadbv