
I tried training a BC algorithm using offline data and enabled the RL module in the algorithm configuration. I ran the code on Google Colab, which only provides 2 CPUs, and encountered the following error:

The following resource request cannot be scheduled right now: {'CPU': 1.0}. This is likely due to all cluster resources being claimed by actors.

If I disable the RL module, the BC algorithm runs without problems. However, when the RL module is enabled, the code gets stuck due to the CPU scheduling error above.

My question is:

  • How can I check which resources (CPUs, actors, etc.) are currently being scheduled or used?
  • How can I identify what might be causing this scheduling issue in my code?

Would really appreciate any suggestions or debugging tips!

My Environment

  • Platform: Google Colab (2 CPUs)
  • Ray version: 3.0.0.dev0
  • Python version: 3.10

My Code

import gymnasium as gym

from ray.rllib.algorithms.bc import BCConfig
from ray.rllib.core.rl_module.rl_module import RLModule, RLModuleSpec
from ray.rllib.core.testing.torch.bc_module import DiscreteBCTorchModule

config = (
    BCConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment("CartPole-v1")
    .learners(num_learners=0)
    .offline_data(
        input_="/content/cartpole/large.json",
        input_read_method="read_json",
        dataset_num_iters_per_learner=1,
    )
    .training(lr=0.00001, gamma=0.99, beta=0.0)
    .rl_module(rl_module_spec=RLModuleSpec(module_class=DiscreteBCTorchModule))
    .evaluation(
        evaluation_interval=1,
        evaluation_num_env_runners=1,
        evaluation_duration=1,
    )
)


algo = config.build()
result = algo.train()

Output Message

2025-10-17 07:46:34,505 INFO worker.py:1783 -- Started a local Ray instance.
<trimmed unrelated content>

(autoscaler +52s) Warning: The following resource request cannot be scheduled right now: {'CPU': 1.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster.
(autoscaler +1m27s) Warning: The following resource request cannot be scheduled right now: {'CPU': 1.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster.
....

1 Answer


An RLlib Algorithm has three parts that can run on extra CPUs (local or remote): the training EnvRunners, the evaluation EnvRunners, and the Learner(s).

The classmethod Algorithm.default_resource_request(your_config) gives you a list of dicts representing the resource requests the algorithm instance needs; these are called bundles. Sum up the CPU entries and you know how many CPUs are required. If you do not use Tune/Train, you can usually ignore the very first bundle, which represents the main process.
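A minimal sketch of that summation. The bundle list below is made up for illustration; in practice you would take it from the return value of Algorithm.default_resource_request(your_config):

```python
# Hypothetical bundles, mimicking the shape of the per-actor resource
# dicts described above. The real values depend on your config.
bundles = [
    {"CPU": 1, "GPU": 0},  # first bundle: the main (driver) process
    {"CPU": 1},            # Learner
    {"CPU": 1},            # evaluation EnvRunner
]

# Total CPUs the algorithm would try to reserve.
total_cpus = sum(b.get("CPU", 0) for b in bundles)
print(total_cpus)  # 3

# Skipping the first bundle (the main process) when not using Tune/Train:
remote_cpus = sum(b.get("CPU", 0) for b in bundles[1:])
print(remote_cpus)  # 2
```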

evaluation_num_env_runners=1 means you want one (remote) evaluation EnvRunner, which needs one free CPU. It appears this requirement cannot be met on your 2-CPU machine, hence the resource warning, and Ray waits until a CPU becomes free.

If you do not have many CPUs, set num_env_runners=0, num_learners=0, evaluation_num_env_runners=0, and, in case you use offline evaluation, num_offline_eval_runners=0.
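For example, the relevant config calls might look like this. This is a sketch assuming the env_runners()/learners()/evaluation() builder methods available in recent RLlib versions (your code already uses the latter two):

```python
from ray.rllib.algorithms.bc import BCConfig

config = (
    BCConfig()
    .env_runners(num_env_runners=0)   # no remote training EnvRunners
    .learners(num_learners=0)         # Learner runs in the main process
    .evaluation(evaluation_num_env_runners=0)  # evaluate locally as well
)
```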


Note that there are also the num_cpus_per... settings, with which you can control the number of CPUs per EnvRunner/Learner. If you have access to a GPU, then num_cpus_per_learner='auto' will take 0 CPUs, otherwise 1.
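Putting the counts and the per-actor settings together, a rough back-of-the-envelope CPU budget can be computed like this. The variable names below are illustrative plain-Python values mirroring the config options, assuming one CPU per runner/learner (the usual default):

```python
# Illustrative CPU budget: each remote actor reserves num_cpus_per_*
# CPUs (assumed 1 here, the usual default).
num_env_runners = 0
num_learners = 0
evaluation_num_env_runners = 1
num_cpus_per_env_runner = 1
num_cpus_per_learner = 1

total = (
    1  # main process
    + num_env_runners * num_cpus_per_env_runner
    + num_learners * num_cpus_per_learner
    + evaluation_num_env_runners * num_cpus_per_env_runner
)
print(total)  # 2
```

On a 2-CPU Colab machine this leaves no headroom, which is why dropping the counts to zero makes the warning go away.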


3 Comments

Thanks for your suggestion. I will try it.
If it works for you, mark the answer as accepted for everyone to see.
That helped, but I later found the issue wasn’t with the resource request — the message just misled me. Thanks anyway!
