I tried to train a BC algorithm on offline data with the RL module enabled in the algorithm configuration. I ran the code on Google Colab, which provides only 2 CPUs, and hit the following error:
The following resource request cannot be scheduled right now: {'CPU': 1.0}. This is likely due to all cluster resources being claimed by actors.
If I disable the RL module, the BC algorithm runs without problems. With the RL module enabled, however, the run hangs and keeps emitting the CPU scheduling warning above.
My questions are:
- How can I check which resources (CPUs, GPUs) are currently in use, and which actors or tasks are holding them?
- How can I identify what in my code is causing this scheduling issue?
I'd really appreciate any suggestions or debugging tips!
My Environment
- Platform: Google Colab (2 CPUs)
- Ray version: 3.0.0.dev0
- Python version: 3.10
My Code
import gymnasium as gym
from ray.rllib.algorithms.bc import BCConfig
from ray.rllib.core.rl_module.rl_module import RLModule, RLModuleSpec
from ray.rllib.core.testing.torch.bc_module import DiscreteBCTorchModule

config = (
    BCConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment("CartPole-v1")
    .learners(num_learners=0)
    .offline_data(
        input_="/content/cartpole/large.json",
        input_read_method="read_json",
        dataset_num_iters_per_learner=1,
    )
    .training(lr=0.00001, gamma=0.99, beta=0.0)
    .rl_module(rl_module_spec=RLModuleSpec(module_class=DiscreteBCTorchModule))
    .evaluation(
        evaluation_interval=1,
        evaluation_num_env_runners=1,
        evaluation_duration=1,
    )
)

algo = config.build()
result = algo.train()
Output Message
2025-10-17 07:46:34,505 INFO worker.py:1783 -- Started a local Ray instance.
<trimmed unrelated content>
(autoscaler +52s) Warning: The following resource request cannot be scheduled right now: {'CPU': 1.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster.
(autoscaler +1m27s) Warning: The following resource request cannot be scheduled right now: {'CPU': 1.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster.
....