Instructions for gpt-neox:
We include below a set of instructions to get EleutherAI/gpt-neox
running on Polaris.
A batch submission script for the following example is available here.
Warning
The instructions below should be run directly from a compute node.
Explicitly, to request an interactive job (from polaris-login):
Refer to job scheduling and execution for additional information.
- Load and activate the base conda environment:
- We've installed the requirements for running gpt-neox into a virtual environment. To activate this environment:
- Clone the EleutherAI/gpt-neox repository if it doesn't already exist:
- Navigate into the gpt-neox directory:
  Note
  The remaining instructions assume you're inside the gpt-neox directory.
- Create a DeepSpeed compliant hostfile (each line is formatted as hostname slots=N):
- Create a .deepspeed_env file to ensure a consistent environment across all workers:
- Prepare data:
- Train:
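The hostfile and .deepspeed_env steps above can be sketched as follows. This is a minimal illustration, not the exact commands from the original instructions. The helper names make_hostfile and write_deepspeed_env are hypothetical, the default of 4 slots assumes one slot per GPU on a four-GPU Polaris node, and the choice of PATH and LD_LIBRARY_PATH as the variables to propagate is an assumption (add others, such as proxy settings, if your workers need them).

```shell
# Sketch only: helper names and defaults are assumptions, not part of gpt-neox or DeepSpeed.

# Print one "hostname slots=N" line per unique host, preserving the input order.
# $1: file with one hostname per line (e.g. $PBS_NODEFILE), $2: slots per host
make_hostfile() {
  awk -v n="$2" 'NF && !seen[$0]++ { print $0 " slots=" n }' "$1"
}

# Write NAME=value lines for the named environment variables.
# $1: output file; remaining arguments: variable names to record
write_deepspeed_env() {
  out="$1"
  shift
  : > "$out"
  for name in "$@"; do
    # Look up the current value of the variable whose name is in $name
    eval "value=\${$name-}"
    printf '%s=%s\n' "$name" "$value" >> "$out"
  done
}

# Only write files when running inside a PBS job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
  make_hostfile "$PBS_NODEFILE" "${SLOTS_PER_NODE:-4}" > hostfile
  write_deepspeed_env .deepspeed_env PATH LD_LIBRARY_PATH
fi
```

Run from inside the gpt-neox directory on a compute node, this would leave a hostfile and a .deepspeed_env file in the current directory. Outside a PBS job it only defines the two helpers.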