Anyscale is a platform that helps scale AI workloads. Built on Ray, it adds observability, data governance, developer tools and optimization. You can run Anyscale in Nebius AI Cloud, deploying it on a Managed Service for Kubernetes cluster. To store application data and artifacts, you can set up a Network File System (NFS) server on a Compute virtual machine (VM), and an Object Storage bucket.

Costs

This tutorial includes the following chargeable resources:

Prerequisites

  1. Create an Anyscale account.
  2. Install and configure the following tools:

Steps

Prepare the environment

  1. Clone the GitHub repository and go to the anyscale directory:
    git clone https://github.com/nebius/nebius-solutions-library.git
    cd nebius-solutions-library/anyscale
    
  2. Edit the environment.sh file to add the IDs of your tenant, project and the region of the project to the environment variables at the top of the file.
  3. Source environment.sh to export its environment variables into the current shell session:
    source ./environment.sh
    
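The IDs at the top of environment.sh might look like the following sketch. The actual variable names are defined in the file in the repository, so treat both the names and the values here as placeholders:

```shell
# Placeholder sketch of the variables edited in step 2; the real variable
# names and format are defined at the top of environment.sh in the repository.
export NEBIUS_TENANT_ID="tenant-e00example"     # your tenant ID (placeholder)
export NEBIUS_PROJECT_ID="project-e00example"   # your project ID (placeholder)
export NEBIUS_REGION="eu-north1"                # region of the project (placeholder)
```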
  4. Make a copy of the configuration file:
    cp default.yaml.tpl default.yaml
    
  5. Create an SSH key for the NFS server and Anyscale nodes:
    ssh-keygen -t ed25519
    

Create storage for Anyscale

The Terraform configuration in the prepare directory creates an Object Storage bucket to store workload artifacts and an NFS server to store Anyscale workspace data (user code, configuration files, etc.).
  1. Edit default.yaml:
    • In .ssh_public_key, paste the contents of the public SSH key (~/.ssh/id_ed25519.pub).
    • In .nfs_server.nfs_size, set the size of the disk in GiB. The disk is a Network SSD IO M3 disk, so its size must be a multiple of 93 GiB (for example, 1023 GiB).
  2. Apply the Terraform configuration from the prepare directory:
    terraform -chdir=prepare init
    terraform -chdir=prepare apply
    
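Because the NFS disk size from step 1 must be a multiple of 93 GiB, a small shell helper can round a desired size up to the nearest valid value (an illustrative sketch, not part of the repository):

```shell
# Round a desired NFS disk size up to the nearest multiple of 93 GiB,
# as required for Network SSD IO M3 disks.
desired_gib=1000
nfs_size=$(( (desired_gib + 92) / 93 * 93 ))
echo "$nfs_size"  # -> 1023
```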

Deploy Anyscale

The Terraform configuration in the deploy directory creates a Managed Kubernetes cluster and node groups, and deploys the Anyscale application on the cluster.
  1. Register the Anyscale cloud:
    ./register.sh <cloud_name>
    
    Replace <cloud_name> with a name for your Anyscale cloud deployment; this name is shown in the Anyscale console. The output contains a cloud deployment ID that starts with cldrsrc_. Save it for the next step.
  2. Create an Anyscale API key in Anyscale console.
  3. Edit default.yaml:
    • In .anyscale.cloud_deployment_id, paste the cloud deployment ID (starting with cldrsrc_) that you obtained in the previous step.
    • In .anyscale.anyscale_cli_token, paste the Anyscale API key.
    • In k8s_cluster, configure the cluster. It should have at least one GPU node group with one node, and at least one non-GPU node. The default configuration creates an NVIDIA H100 node group with one single-GPU node and a non-GPU node group with one node. For details about the parameters, see the following articles and resources:
  4. Apply the Terraform configuration from the deploy directory:
    terraform -chdir=deploy init
    terraform -chdir=deploy apply
    
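For orientation, the fields edited in steps 1–3 above might look like the following sketch. The anyscale.* and k8s_cluster.cpu_nodes_platform field paths are taken from this tutorial; all values are placeholders:

```yaml
# Placeholder values; field paths follow this tutorial.
anyscale:
  cloud_deployment_id: "cldrsrc_example"   # from the register.sh output
  anyscale_cli_token: "<your_api_key>"     # Anyscale API key
k8s_cluster:
  cpu_nodes_platform: "cpu-d3"             # platform of the non-GPU node group
```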

(Optional) Configure Anyscale

You can configure how the Anyscale head node and worker nodes are selected. To do this, log in to the Anyscale console, go to your workspace and follow the instructions in the next sections.

Force a non-GPU head node

The Anyscale head node is a Kubernetes Pod that does not use GPUs. However, by default, it can be scheduled on either a GPU node or a non-GPU node. You can force the head node to run on non-GPU nodes to avoid the cost of provisioning GPU nodes for it. To do that, perform the following steps in your workspace in the Anyscale console:
  1. On the Compute resources panel, under Head node, click the edit button.
  2. In the window that opens, expand Advanced config.
  3. Under Instance config, paste the node selector specification:
    {
      "spec": {
        "nodeSelector": {
          "node.kubernetes.io/instance-type": "cpu-d3"
        }
      }
    }
    
    The value of the node.kubernetes.io/instance-type node label must match the platform specified in the .k8s_cluster.cpu_nodes_platform field of the anyscale/default.yaml file.
  4. Click Save.
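Before pasting the instance config, you can sanity-check it locally. This sketch validates the JSON from step 3 with Python's standard json.tool module; any JSON validator works equally well:

```shell
# Validate the node selector spec from step 3 before pasting it into the console.
config='{
  "spec": {
    "nodeSelector": {
      "node.kubernetes.io/instance-type": "cpu-d3"
    }
  }
}'
result=$(printf '%s' "$config" | python3 -m json.tool >/dev/null 2>&1 && echo valid || echo invalid)
echo "$result"  # -> valid
```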

Configure the selection of worker nodes

Anyscale supports automatic and manual selection of worker nodes for your workspaces. To choose between these modes in the Anyscale console, on the Compute resources panel, under Worker nodes, select or clear the Auto-select worker nodes checkbox:
  • When the checkbox is selected, Anyscale tries to provision worker nodes automatically. This works well when you run Anyscale and other workloads at the same time in the Managed Kubernetes cluster, because Anyscale workloads only reserve as many GPUs as they require. However, since the number of worker nodes is scaled on demand, provisioning new worker nodes can take some time.
  • When the checkbox is not selected, you select worker nodes manually. This is recommended for workloads that need to scale up fast, or if you need granular control over GPU usage.

Test the deployment

For details on testing the Anyscale deployment, see Anyscale resources:

How to delete the created resources

Some of the created resources are chargeable. If you do not need them, delete these resources so Nebius AI Cloud does not charge for them:
  1. In Anyscale console, delete all the workloads that use the deployment.
  2. Delete all objects from the Anyscale bucket. Its name starts with anyscale-.
  3. Delete the Managed Kubernetes cluster, NFS server and bucket by running the following commands in the anyscale directory of the cloned repository:
    terraform -chdir=deploy destroy
    terraform -chdir=prepare destroy