Getting Started with Cerebras CSL

Cerebras CSL (Cerebras System Language) is a low-level kernel programming language designed for the Cerebras system. It enables users to write code that runs on individual Processing Elements (PEs) and to define the placement of programs and the routing of data on the Wafer-Scale Engine (WSE).

To develop programs for the Cerebras system, users create two main components:

  1. Device Code: Written in CSL, this code executes directly on the Cerebras system.
  2. Host Code: Written in Python, this code uses Cerebras APIs to move data and launch functions on the Cerebras system.

CSL includes libraries for a variety of commonly used primitive operations, such as broadcasting, gathering, and scattering data across rows or columns of PEs.

The Cerebras SDK can be utilized in two primary modes:

  1. Simulator Mode: For testing and debugging programs without access to physical hardware.
  2. Appliance Mode: For executing programs on the actual Cerebras hardware.

For a comprehensive overview of the Cerebras SDK, refer to the Cerebras SDK Documentation.

SDK with Simulator

The Cerebras SDK relies on a Singularity container and associated scripts to execute CSL code on a simulator.

On the login node, the Cerebras SDK is available at /software/cerebras/cs_sdk-1.2.0. Copy it to your $HOME directory and add it to your $PATH:

cp -r /software/cerebras/cs_sdk-1.2.0 ~
export PATH=~/cs_sdk-1.2.0:$PATH

To verify that the SDK is installed correctly, run:

cslc --help

Examples

We will use examples from the csl-examples repository provided by Cerebras. To get these examples, clone the repository into your desired directory:

git clone https://github.com/Cerebras/csl-examples.git
cd csl-examples
git checkout rel-sdk-1.2.0
cd ~/csl-examples/benchmarks/gemm-collectives_2d
bash commands.sh
Sample Output
$ bash commands.sh
INFO: Using SIF: /software/cerebras/cs_sdk-1.2.0/sdk-cbcore-202406260214-3-f03c8e31.sif
INFO: User's specified CSL_IMPORT_PATH=
NOTE: CSL_IMPORT_PATH accepts colon separated list of paths generated by 'realpath <path>'
compile successful
INFO: Using SIF: /software/cerebras/cs_sdk-1.2.0/sdk-cbcore-202406260214-3-f03c8e31.sif
SUCCESS

SDK GUI

You can use the SDK Debug GUI to analyze and gain insights into your code execution. For detailed instructions, refer to the SDK GUI documentation.

To launch the SDK Debug GUI, run the following commands:

cd ~/csl-examples/benchmarks/gemm-collectives_2d
sdk_debug_shell visualize

Sample Output
> sdk_debug_shell visualize
INFO: Using SIF: /software/cerebras/cs_sdk-1.2.0/sdk-cbcore-202406260214-3-f03c8e31.sif
INFO: User's specified CSL_IMPORT_PATH=
NOTE: CSL_IMPORT_PATH accepts colon separated list of paths generated by 'realpath <path>'
Click this link to open URL:  http://cer-login-02.ai.alcf.anl.gov:8000/sdk-gui
Click this link to open URL:  http://140.221.80.28:8000/sdk-gui
Press Ctrl-C to exit

To access the GUI from your local computer, forward port 8000 from the login node to your local machine and open the following URL in your web browser: http://localhost:8000/sdk-gui/
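
One way to set up the forwarding is an SSH local port forward. The sketch below only assembles and prints the command, it does not run it. The user name is a placeholder, the host name is taken from the sample output above, and the helper name ssh_forward_command is illustrative. Your site may require a different login host or a jump host.

```python
import shlex


def ssh_forward_command(user, host, port=8000):
    """Build an ssh command that forwards a local port to the same port on host."""
    # -N: do not run a remote command; -L: local port forward
    return ["ssh", "-N", "-L", f"{port}:localhost:{port}", f"{user}@{host}"]


# "your_username" is a placeholder; the host is the login node from the sample output.
cmd = ssh_forward_command("your_username", "cer-login-02.ai.alcf.anl.gov")
print(shlex.join(cmd))
```

Run the printed command in a terminal on your local machine, leave it open, and then load http://localhost:8000/sdk-gui/ in your browser.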

CS-2 connection diagram

SDK with Appliance Mode

Examples currently not working

With the release of Cerebras SDK version 2.4.0, the examples in the tutorial below are known to be broken on the CS-2. A fix and updates are forthcoming.

Appliance Mode runs code directly on the Cerebras Wafer-Scale Cluster. The SDK supports this mode in addition to the containerized Singularity build used for simulator runs.

Setup

Create Virtual Environment: Follow these steps to set up the virtual environment for the Cerebras SDK:

mkdir -p ~/cs_appliance_sdk
cd ~/cs_appliance_sdk
# Leave any active virtual environment and remove a previous one, if present
deactivate 2>/dev/null || true
rm -rf cs_appliance_sdk
/software/cerebras/python3.8/bin/python3.8 -m venv cs_appliance_sdk
source cs_appliance_sdk/bin/activate
pip install --upgrade pip

Install SDK Packages: Install the cerebras_appliance and cerebras_sdk Python packages in the virtual environment, specifying the appropriate Cerebras Software release:

pip install --upgrade pip
pip install cerebras_appliance==2.4.0
pip install cerebras_sdk==2.4.0
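
To confirm that both packages were installed at the intended release, you can query the installed distributions from inside the virtual environment. This is a minimal sketch using the standard library's importlib.metadata (available from Python 3.8). The helper name installed_version is illustrative and not part of the Cerebras SDK.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version of each package, or None if it is not installed.
for name in ("cerebras_appliance", "cerebras_sdk"):
    print(name, installed_version(name))
```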

Examples

We will use examples from the csl-examples repository provided by Cerebras. To access these examples, clone the repository into your desired directory:

git clone https://github.com/Cerebras/csl-examples.git
cd csl-examples
git checkout rel-sdk-1.2.0
cd ~/csl-examples/benchmarks/gemm-collectives_2d

Compile Code

Use the following compile.py script to compile the code in the respective example directory:

compile.py
import json
from cerebras_appliance.sdk import SdkCompiler

# Instantiate compiler
compiler = SdkCompiler()

# Launch compile job
artifact_id = compiler.compile(
    ".",
    "layout.csl",
    # For running on CS-2
    "--fabric-dims=757,996 --fabric-offsets=4,1 --memcpy --channels=1 -o out",
    # For running on the fabric simulator (use instead of the line above)
    #"--fabric-dims=8,3 --fabric-offsets=4,1 --memcpy --channels=1 -o out",
)

# Write the artifact_id to a JSON file
with open("artifact_id.json", "w", encoding="utf8") as f:
    json.dump({"artifact_id": artifact_id,}, f)
Sample Output
$ python compile.py
2023-10-11 00:55:33,107 DEBUG    ClusterClient: server=10.140.65.35:443, authority=cluster-server.cerebras1.lab.alcf.anl.gov, cert=/opt/cerebras/certs/tls.crt, client-lease-strategy=0, heartbeat_options=HeartBeatOptions(cycle_seconds=10, cycle_threshold=12, lease_duration_seconds_override=0), options=[('grpc.service_config', '{"methodConfig": [{"name": [{"service": "cluster.cluster_mgmt_pb.ClusterManagement"}], "retryPolicy": {"maxAttempts": 3, "initialBackoff": "3s", "maxBackoff": "10s", "backoffMultiplier": 2, "retryableStatusCodes": ["UNAVAILABLE"]}}]}'), ('grpc.enable_retries', 1), ('grpc.default_authority', 'cluster-server.cerebras1.lab.alcf.anl.gov')]
2023-10-11 00:55:33,128 INFO     Initiating a new SDK compile job against the cluster server
2023-10-11 00:55:33,142 DEBUG    Run meta is available at /srv/projects/datascience/sraskar/cs2/cs_sdk/gemv/run_meta.json.
2023-10-11 00:55:33,142 INFO     sdk_compile job id: wsjob-cgvksf7bv8mnfszfmftozv, log path: /n1/wsjob/workdir/wsjob-cgvksf7bv8mnfszfmftozv
2023-10-11 00:55:33,142 DEBUG    Starting heartbeat thread for wsjob-cgvksf7bv8mnfszfmftozv. Heartbeat requests will be sent every 10 seconds.
2023-10-11 00:55:43,153 INFO     Poll ingress status: Waiting for coordinator to be ready.
2023-10-11 00:56:13,186 INFO     Ingress is ready.
2023-10-11 00:56:13,231 WARNING  There is no existing compile job record.
2023-10-11 00:56:13,231 WARNING  There is no existing compile job record.
2023-10-11 00:56:13,231 DEBUG    Cluster mgmt job handle: {'job_id': 'wsjob-cgvksf7bv8mnfszfmftozv', 'service_authority': 'wsjob-cgvksf7bv8mnfszfmftozv-coordinator-0.cluster-server.cerebras1.lab.alcf.anl.gov', 'service_url': '10.140.65.35:443', 'credentials_path': '/opt/cerebras/certs/tls.crt'}
2023-10-11 00:56:13,255 INFO     Application was found in the compile cache.
2023-10-11 00:56:13,266 DEBUG    Signaling heartbeat thread to stop for wsjob-cgvksf7bv8mnfszfmftozv

The only difference between the CS-2 and simulator compile settings is the --fabric-dims value, which should be set to the minimum required for simulator runs. The script above generates artifact_id.json, which is used by the run.py script.
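
To make the switch between the two targets explicit, the compiler argument string can be assembled from a target name. This is a sketch only: the helper compile_args and the target labels "cs2" and "simulator" are hypothetical, and the dimensions are the ones that appear in compile.py above.

```python
# Fabric dimensions taken from compile.py: the value used for CS-2 runs
# and the small fabric used for simulator runs.
FABRIC_DIMS = {
    "cs2": (757, 996),
    "simulator": (8, 3),
}


def compile_args(target):
    """Return the compiler argument string for the given target label."""
    width, height = FABRIC_DIMS[target]
    return (
        f"--fabric-dims={width},{height} --fabric-offsets=4,1 "
        "--memcpy --channels=1 -o out"
    )


print(compile_args("cs2"))
print(compile_args("simulator"))
```

The returned string could then be passed as the third argument to compiler.compile in place of the hard-coded literal.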

Run Code

Use the following run.py script to run the code in the respective example directory:

run.py
import json
import os

import numpy as np

from cerebras_appliance.pb.sdk.sdk_common_pb2 import MemcpyDataType, MemcpyOrder
from cerebras_appliance.sdk import SdkRuntime

# Matrix dimensions
M = 4
N = 6

# Construct A, x, b
A = np.arange(M*N, dtype=np.float32).reshape(M, N)
x = np.full(shape=N, fill_value=1.0, dtype=np.float32)
b = np.full(shape=M, fill_value=2.0, dtype=np.float32)

# Calculate expected y
y_expected = A@x + b

# Read the artifact_id from the JSON file
with open("artifact_id.json", "r", encoding="utf8") as f:
    data = json.load(f)
    artifact_id = data["artifact_id"]

# Instantiate a runner object using a context manager
with SdkRuntime(artifact_id, simulator=False) as runner:
    # Launch the init_and_compute function on device
    runner.launch('init_and_compute', nonblock=False)

    # Copy y back from device
    y_symbol = runner.get_id('y')
    y_result = np.zeros([1*1*M], dtype=np.float32)
    runner.memcpy_d2h(
        y_result, y_symbol, 0, 0, 1, 1, M,
        streaming=False,
        order=MemcpyOrder.ROW_MAJOR,
        data_type=MemcpyDataType.MEMCPY_32BIT,
        nonblock=False,
    )

# Ensure that the result matches our expectation
np.testing.assert_allclose(y_result, y_expected, atol=0.01, rtol=0)
print("SUCCESS!")
Sample Output
$ python run.py
2023-10-11 00:56:21,281 DEBUG    ClusterClient: server=10.140.65.35:443, authority=cluster-server.cerebras1.lab.alcf.anl.gov, cert=/opt/cerebras/certs/tls.crt, client-lease-strategy=0, heartbeat_options=HeartBeatOptions(cycle_seconds=10, cycle_threshold=12, lease_duration_seconds_override=0), options=[('grpc.service_config', '{"methodConfig": [{"name": [{"service": "cluster.cluster_mgmt_pb.ClusterManagement"}], "retryPolicy": {"maxAttempts": 3, "initialBackoff": "3s", "maxBackoff": "10s", "backoffMultiplier": 2, "retryableStatusCodes": ["UNAVAILABLE"]}}]}'), ('grpc.enable_retries', 1), ('grpc.default_authority', 'cluster-server.cerebras1.lab.alcf.anl.gov')]
2023-10-11 00:56:21,304 INFO     Initiating a new SDK execute job against the cluster server
2023-10-11 00:56:21,316 DEBUG    Run meta is available at /srv/projects/datascience/sraskar/cs2/cs_sdk/gemv/run_meta.json.
2023-10-11 00:56:21,316 INFO     sdk_execute job id: wsjob-nm4gmz7jtq3khck2ltadbv, log path: /n1/wsjob/workdir/wsjob-nm4gmz7jtq3khck2ltadbv
2023-10-11 00:56:21,316 DEBUG    Starting heartbeat thread for wsjob-nm4gmz7jtq3khck2ltadbv. Heartbeat requests will be sent every 10 seconds.
2023-10-11 00:56:31,327 INFO     Poll ingress status: Waiting for coordinator to be ready.
2023-10-11 00:57:01,360 INFO     Ingress is ready.
2023-10-11 00:57:01,410 WARNING  There is no existing execute job record.
2023-10-11 00:57:01,410 WARNING  There is no existing execute job record.
2023-10-11 00:57:01,410 DEBUG    Cluster mgmt job handle: {'job_id': 'wsjob-nm4gmz7jtq3khck2ltadbv', 'service_authority': 'wsjob-nm4gmz7jtq3khck2ltadbv-coordinator-0.cluster-server.cerebras1.lab.alcf.anl.gov', 'service_url': '10.140.65.35:443', 'credentials_path': '/opt/cerebras/certs/tls.crt'}
SUCCESS!
2023-10-11 00:57:44,184 DEBUG    Signaling heartbeat thread to stop for wsjob-nm4gmz7jtq3khck2ltadbv
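
For reference, the expected result that run.py compares against can be reproduced on the host with NumPy alone, without any Cerebras packages. With the values used in run.py, row i of A holds the entries 6i through 6i+5 and therefore sums to 36i + 15, so y = Ax + b can be checked by hand.

```python
import numpy as np

# Same inputs as run.py
M, N = 4, 6
A = np.arange(M * N, dtype=np.float32).reshape(M, N)
x = np.full(shape=N, fill_value=1.0, dtype=np.float32)
b = np.full(shape=M, fill_value=2.0, dtype=np.float32)

# x is all ones, so A @ x is the vector of row sums; b adds 2 to each entry
y_expected = A @ x + b
print(y_expected.tolist())  # [17.0, 53.0, 89.0, 125.0]
```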