2,587 questions
-2
votes
0
answers
85
views
Why does my model keep getting stuck in one spot? [closed]
Problem Description
The RL code implements a Gymnasium-compatible environment (DoomEnv) for training reinforcement learning agents in DOOM Retro. It captures game state (observations) via shared ...
1
vote
1
answer
56
views
Custom PyEnvironment time_step and time_step_spec do not match
I'm creating a custom PyEnvironment in TensorFlow Agents to simulate the track and field decathlon. I've managed to create a functioning environment in the sense that I can use _step and _reset, but ...
Best practices
0
votes
2
replies
47
views
Reward Function Design for Ideal Value of two Vectors
I have a reinforcement-learning agent which tries to move pressure (nodes) and velocity (edges) vectors by running physical simulations. At each step, the agent runs a physical network with some settings ...
1
vote
0
answers
65
views
Unity: ML-Agents training freeze with RenderTexture on Unity 6.3 LTS and ML-Agents 4.0.2 (eventually resumes)
I am training a PPO agent using ML-Agents 4.0.2 on Unity 6.3 LTS and I am encountering a reproducible long-running stall that appears to be related to visual observations via RenderTexture.
The setup ...
4
votes
0
answers
102
views
Implemented PPO algorithm fails to train
I wrote a PPO-based reinforcement learning code for the Gymnasium CarRacing-v3 environment.
(The code was generated with the help of Gemini)
However, even after 200,000 frames, the training does not ...
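Without seeing the generated code, the part of PPO most often implemented incorrectly is the clipped surrogate objective itself. A minimal sketch of that loss, under the standard PPO formulation (names and the toy inputs here are illustrative, not taken from the question's code):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate loss (to be minimized) for one sample.

    ratio     -- pi_new(a|s) / pi_old(a|s), the probability ratio
    advantage -- estimated advantage A(s, a)
    eps       -- clip range epsilon (0.2 is the PPO paper's default)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The objective maximizes min(unclipped, clipped); negate it for a loss.
    return -np.minimum(unclipped, clipped)

# With a positive advantage, gains from pushing the ratio past 1 + eps
# are clipped away, which is what keeps PPO updates conservative:
loss_outside = ppo_clip_loss(1.5, 1.0)  # ratio clipped at 1.2
loss_inside = ppo_clip_loss(1.1, 1.0)   # ratio inside the clip range
```

If training never improves, it is worth checking this loss sign, the advantage normalization, and that the old-policy log-probabilities are detached from the graph.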
Advice
0
votes
0
replies
60
views
Please explain this vectorized Bellman's Equation
Can someone please explain this vectorized Bellman equation to me in simple terms?
The Bellman equation usually has "summation symbol", "summation symbol", "summation symbol"...
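In vectorized form, the nested sums collapse into matrix notation: for a fixed policy, v = r + γPv, where P is the state-transition matrix, r the expected immediate rewards, and v the state values, giving the closed-form solution v = (I − γP)⁻¹r. A numerical sketch on a made-up 2-state MDP:

```python
import numpy as np

# Hypothetical 2-state MDP under a fixed policy:
# P[s, s'] = transition probability, r[s] = expected immediate reward.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma = 0.9

# Vectorized Bellman equation: v = r + gamma * P @ v
# Solved in closed form:      v = (I - gamma * P)^(-1) @ r
v = np.linalg.solve(np.eye(2) - gamma * P, r)

# The solution is a fixed point of the Bellman operator.
assert np.allclose(v, r + gamma * P @ v)
```

Each row of `P @ v` is exactly one of those "summation symbol" terms: the expected next-state value from that state.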
1
vote
0
answers
181
views
Custom GRPO Trainer not Learning
I am new to reinforcement learning, so as an educational exercise I am implementing GRPO from scratch with PyTorch. My goal is to mimic how TRL works, but boil it down to just the loss function and ...
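When a from-scratch GRPO implementation fails to learn, a common culprit is the group-relative advantage step: each completion's reward must be normalized against the mean and standard deviation of its own group of samples for the same prompt. A sketch of just that step (illustrative, not TRL's actual code):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Group-relative advantages as in the GRPO objective:
    A_i = (r_i - mean(r)) / (std(r) + eps), computed within the
    group of G completions sampled for a single prompt."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Rewards for 4 completions of one prompt (made-up numbers):
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
# Advantages are zero-mean within the group by construction,
# so only relative quality within the group drives the update.
assert abs(adv.mean()) < 1e-9
```

If the normalization is accidentally done across the whole batch instead of per group, the gradient signal largely washes out.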
Advice
0
votes
1
replies
111
views
How to read a large Python project (for example, a project of Deep Learning or Reinforcement Learning)
I've downloaded many Python projects about Reinforcement Learning from GitHub, but each one takes me a lot of time to read.
It's easy to comprehend a simple Python project with only a few *.py files, but ...
0
votes
1
answer
56
views
When using PyTorch torchrl TD0Estimator, how to handle the "done" and "terminated" flag
Based on the TD0Estimator documentation, it uses two TensorDict keys to flag whether an episode has ended or not, but I can't seem to find any indication of when and how to use them.
As an example, let's ...
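Independent of torchrl's API specifics, the two flags follow the usual Gymnasium convention: "terminated" means the episode truly ended (the value of the next state is zero), while "done" also covers time-limit truncation, where bootstrapping should continue. A sketch of how that distinction enters the TD(0) target (plain NumPy, not torchrl internals):

```python
import numpy as np

def td0_target(reward, next_value, terminated, gamma=0.99):
    """TD(0) target under the usual Gymnasium convention:
    bootstrap from V(s') unless the episode truly terminated.
    A time-limit truncation (done but not terminated) still bootstraps."""
    stop = np.asarray(terminated, dtype=np.float64)
    return reward + gamma * next_value * (1.0 - stop)

# Terminal transition: no bootstrapping, the target is just the reward.
assert td0_target(1.0, 5.0, True) == 1.0
# Truncated-but-not-terminated transition: V(s') is still used.
assert td0_target(1.0, 5.0, False) == 1.0 + 0.99 * 5.0
```

Conflating the two flags (treating every truncation as terminal) systematically biases value estimates downward in time-limited environments.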
0
votes
1
answer
207
views
Getting different results across different machines while training RL
While training my RL algorithm using SBX, I am getting different results across my HPC cluster and my PC. However, I did find that results are consistently the same within a given machine; they just diverge ...
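This is generally expected: with all seeds fixed, runs are reproducible on one machine, but different hardware and BLAS/GPU builds take different floating-point code paths, and those tiny differences compound over an RL training run. A toy sketch of the within-machine half of that statement (the divergence shown here uses a different seed as a stand-in for a different floating-point path):

```python
import numpy as np

def rollout(seed):
    # Stand-in for a seeded training run: all randomness
    # flows from one explicitly constructed RNG.
    rng = np.random.default_rng(seed)
    return rng.normal(size=5).sum()

# Same seed on the same machine: bitwise-identical results.
assert rollout(42) == rollout(42)
# Any perturbation of the computation (here, a different seed)
# diverges, just as differing BLAS kernels across machines do.
assert rollout(42) != rollout(43)
```

Bitwise identity across heterogeneous machines is usually not an achievable target; comparing learning-curve statistics across several seeds is the more common practice.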
Advice
0
votes
0
replies
45
views
What should I learn to deploy low-level RL locomotion policies on quadruped robots?
I am working on a project to build a robust locomotion policy for quadruped robots on adverse terrains using reinforcement learning.
My end goal is to deploy a trained RL policy on real hardware, and ...
Advice
0
votes
0
replies
41
views
When using TensorDictPrioritizedReplayBuffer, should I apply the priority weight manually or not?
With Prioritized Experience Replay (PER), we use Beta parameter, so we can find weight that will be used to offset the bias introduced by PER. Now, with PyTorch's TensorDictPrioritizedReplayBuffer, I ...
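Whatever torchrl does internally (the sampled TensorDict's keys are the place to check), the correction itself is the one from the PER paper: oversampled high-priority transitions get down-weighted in the loss. A library-independent sketch of the importance-sampling weights:

```python
import numpy as np

def per_is_weights(priorities, alpha=0.6, beta=0.4):
    """Importance-sampling weights from prioritized experience replay
    (Schaul et al.): P(i) = p_i^alpha / sum_j p_j^alpha,
    w_i = (N * P(i))^(-beta), normalized by max(w) for stability.
    These weights multiply the per-sample TD-error loss terms."""
    p = np.asarray(priorities, dtype=np.float64) ** alpha
    probs = p / p.sum()
    w = (len(p) * probs) ** (-beta)
    return w / w.max()

# With uniform priorities the correction is a no-op (all weights 1):
assert np.allclose(per_is_weights([1.0, 1.0, 1.0]), 1.0)
# A higher-priority (oversampled) transition gets down-weighted:
w = per_is_weights([4.0, 1.0, 1.0])
assert w[0] < w[1]
```

If the buffer already returns a weight per sample, applying this formula again would double-correct, so the question of who applies it is worth settling against the sampled keys.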
10
votes
4
answers
11k
views
Module 'numpy' has no attribute 'bool8' In cartpole problem openai gym
I'm a beginner & trying to run this simple code, but it is giving me this exception "module 'numpy' has no attribute 'bool8'", as you can see in the screenshot below. The Gym version is 0.26.2 & ...
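The cause here is a version mismatch: `np.bool8` was a deprecated alias for `np.bool_` that newer NumPy releases removed, while Gym 0.26.2 still references it. The clean fixes are migrating to Gymnasium (Gym's maintained successor) or pinning an older NumPy; as a stopgap, the alias can be restored before importing gym:

```python
import numpy as np

# np.bool8 was an alias for np.bool_ that NumPy deprecated and
# later removed; Gym 0.26.2 still references it. Restore it as a
# stopgap before "import gym" (migrating to Gymnasium or pinning
# an older NumPy are the cleaner long-term fixes):
if not hasattr(np, "bool8"):
    np.bool8 = np.bool_

assert np.bool8 is np.bool_
```

This monkey-patch must run before anything imports gym, e.g. at the very top of the training script.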
1
vote
0
answers
88
views
KeyError: 'advantages' in PPO MARL using Ray RLLib
I use ray 2.50.1 to implement a MARL model using PPO.
However, I run into the following problem:
'advantages'
KeyError: 'advantages'
During handling of the above exception, another exception occurred:
...
Advice
1
vote
0
replies
42
views
How can I design “story-driven NPCs” in a reinforcement-learned environment? Looking for development directions and architectural advice
I’m working on a thesis about "story-driven NPCs in a reinforcement-learning world", and I’m building a small multi-agent RL environment as a prototype. However, I’m unsure how to push the ...