2,587 questions
-2
votes
0
answers
85
views
Why does my model keep getting stuck in one spot? [closed]
Problem Description
The RL code implements a Gymnasium-compatible environment (DoomEnv) for training reinforcement learning agents in DOOM Retro. It captures game state (observations) via shared ...
1
vote
1
answer
56
views
Custom PyEnvironment time_step and time_step_spec do not match
I'm creating a custom PyEnvironment in TensorFlow Agents to simulate the track and field decathlon. I've managed to create a functioning environment in the sense that I can use _step and _reset, but ...
Best practices
0
votes
2
replies
47
views
Reward Function Design for Ideal Value of two Vectors
I have a reinforcement-learning agent which tries to move pressure (nodes) and velocity (edges) vectors by running physical simulations. At each step, the agent runs a physical network with some settings ...
1
vote
0
answers
65
views
Unity: ML-Agents training freeze with RenderTexture on Unity 6.3 LTS and ML-Agents 4.0.2 (eventually resumes)
I am training a PPO agent using ML-Agents 4.0.2 on Unity 6.3 LTS and I am encountering a reproducible long-running stall that appears to be related to visual observations via RenderTexture.
The setup ...
4
votes
0
answers
102
views
Implemented PPO algorithm fails to train
I wrote a PPO-based reinforcement learning code for the Gymnasium CarRacing-v3 environment.
(The code was generated with the help of Gemini)
However, even after 200,000 frames, the training does not ...
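Without seeing the generated code, the part of PPO most often implemented incorrectly is the clipped surrogate objective itself. A minimal sketch of that loss, under the standard PPO formulation (names and the toy inputs here are illustrative, not taken from the question's code):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate loss (to be minimized) for one sample.

    ratio     -- pi_new(a|s) / pi_old(a|s), the probability ratio
    advantage -- estimated advantage A(s, a)
    eps       -- clip range epsilon (0.2 is the PPO paper's default)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The objective maximizes min(unclipped, clipped); negate it for a loss.
    return -np.minimum(unclipped, clipped)

# With a positive advantage, gains from pushing the ratio past 1 + eps
# are clipped away, which is what keeps PPO updates conservative:
loss_outside = ppo_clip_loss(1.5, 1.0)  # ratio clipped at 1.2
loss_inside = ppo_clip_loss(1.1, 1.0)   # ratio inside the clip range
```

If training never improves, it is worth checking this loss sign, the advantage normalization, and that the old-policy log-probabilities are detached from the graph.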
Advice
0
votes
0
replies
60
views
Please explain this vectorized Bellman's Equation
Can someone please explain this vectorized Bellman equation to me in simple terms?
The Bellman equation usually has "summation symbol", "summation symbol", "summation symbol"...
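In vectorized form, the nested sums collapse into matrix notation: for a fixed policy, v = r + γPv, where P is the state-transition matrix, r the expected immediate rewards, and v the state values, giving the closed-form solution v = (I − γP)⁻¹r. A numerical sketch on a made-up 2-state MDP:

```python
import numpy as np

# Hypothetical 2-state MDP under a fixed policy:
# P[s, s'] = transition probability, r[s] = expected immediate reward.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma = 0.9

# Vectorized Bellman equation: v = r + gamma * P @ v
# Solved in closed form:      v = (I - gamma * P)^(-1) @ r
v = np.linalg.solve(np.eye(2) - gamma * P, r)

# The solution is a fixed point of the Bellman operator.
assert np.allclose(v, r + gamma * P @ v)
```

Each row of `P @ v` is exactly one of those "summation symbol" terms: the expected next-state value from that state.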
1
vote
0
answers
181
views
Custom GRPO Trainer not Learning
I am new to reinforcement learning, so as an educational exercise I am implementing GRPO from scratch with PyTorch. My goal is to mimic how TRL works, but boil it down to just the loss function and ...
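When a from-scratch GRPO implementation fails to learn, a common culprit is the group-relative advantage step: each completion's reward must be normalized against the mean and standard deviation of its own group of samples for the same prompt. A sketch of just that step (illustrative, not TRL's actual code):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Group-relative advantages as in the GRPO objective:
    A_i = (r_i - mean(r)) / (std(r) + eps), computed within the
    group of G completions sampled for a single prompt."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Rewards for 4 completions of one prompt (made-up numbers):
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
# Advantages are zero-mean within the group by construction,
# so only relative quality within the group drives the update.
assert abs(adv.mean()) < 1e-9
```

If the normalization is accidentally done across the whole batch instead of per group, the gradient signal largely washes out.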
Advice
0
votes
1
replies
111
views
How to read a large Python project (for example, a project of Deep Learning or Reinforcement Learning)
I've downloaded many Python projects about Reinforcement Learning from GitHub, but each one takes me a lot of time to read.
It's easy to comprehend a simple Python project with only a few *.py files, but ...
0
votes
1
answer
56
views
When using PyTorch torchrl TD0Estimator, how to handle the "done" and "terminated" flag
Based on the TD0Estimator documentation, it uses two TensorDict keys to flag whether an episode has ended or not, but I can't seem to find any indication of when and how to use them.
As an example, let's ...
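Independent of torchrl's API specifics, the two flags follow the usual Gymnasium convention: "terminated" means the episode truly ended (the value of the next state is zero), while "done" also covers time-limit truncation, where bootstrapping should continue. A sketch of how that distinction enters the TD(0) target (plain NumPy, not torchrl internals):

```python
import numpy as np

def td0_target(reward, next_value, terminated, gamma=0.99):
    """TD(0) target under the usual Gymnasium convention:
    bootstrap from V(s') unless the episode truly terminated.
    A time-limit truncation (done but not terminated) still bootstraps."""
    stop = np.asarray(terminated, dtype=np.float64)
    return reward + gamma * next_value * (1.0 - stop)

# Terminal transition: no bootstrapping, the target is just the reward.
assert td0_target(1.0, 5.0, True) == 1.0
# Truncated-but-not-terminated transition: V(s') is still used.
assert td0_target(1.0, 5.0, False) == 1.0 + 0.99 * 5.0
```

Conflating the two flags (treating every truncation as terminal) systematically biases value estimates downward in time-limited environments.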
0
votes
1
answer
207
views
Getting different results across different machines while training RL
While training my RL algorithm using SBX, I am getting different results across my HPC cluster and my PC. However, I did find that results are consistently the same within a given machine; they just diverge ...
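This is generally expected: with all seeds fixed, runs are reproducible on one machine, but different hardware and BLAS/GPU builds take different floating-point code paths, and those tiny differences compound over an RL training run. A toy sketch of the within-machine half of that statement (the divergence shown here uses a different seed as a stand-in for a different floating-point path):

```python
import numpy as np

def rollout(seed):
    # Stand-in for a seeded training run: all randomness
    # flows from one explicitly constructed RNG.
    rng = np.random.default_rng(seed)
    return rng.normal(size=5).sum()

# Same seed on the same machine: bitwise-identical results.
assert rollout(42) == rollout(42)
# Any perturbation of the computation (here, a different seed)
# diverges, just as differing BLAS kernels across machines do.
assert rollout(42) != rollout(43)
```

Bitwise identity across heterogeneous machines is usually not an achievable target; comparing learning-curve statistics across several seeds is the more common practice.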
Advice
0
votes
0
replies
45
views
What should I learn to deploy low-level RL locomotion policies on quadruped robots?
I am working on a project to build a robust locomotion policy for quadruped robots on adverse terrains using reinforcement learning.
My end goal is to deploy a trained RL policy on real hardware, and ...
Advice
0
votes
0
replies
41
views
When using TensorDictPrioritizedReplayBuffer, should I apply the priority weight manually or not?
With Prioritized Experience Replay (PER), we use Beta parameter, so we can find weight that will be used to offset the bias introduced by PER. Now, with PyTorch's TensorDictPrioritizedReplayBuffer, I ...
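Whatever torchrl does internally (the sampled TensorDict's keys are the place to check), the correction itself is the one from the PER paper: oversampled high-priority transitions get down-weighted in the loss. A library-independent sketch of the importance-sampling weights:

```python
import numpy as np

def per_is_weights(priorities, alpha=0.6, beta=0.4):
    """Importance-sampling weights from prioritized experience replay
    (Schaul et al.): P(i) = p_i^alpha / sum_j p_j^alpha,
    w_i = (N * P(i))^(-beta), normalized by max(w) for stability.
    These weights multiply the per-sample TD-error loss terms."""
    p = np.asarray(priorities, dtype=np.float64) ** alpha
    probs = p / p.sum()
    w = (len(p) * probs) ** (-beta)
    return w / w.max()

# With uniform priorities the correction is a no-op (all weights 1):
assert np.allclose(per_is_weights([1.0, 1.0, 1.0]), 1.0)
# A higher-priority (oversampled) transition gets down-weighted:
w = per_is_weights([4.0, 1.0, 1.0])
assert w[0] < w[1]
```

If the buffer already returns a weight per sample, applying this formula again would double-correct, so the question of who applies it is worth settling against the sampled keys.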
10
votes
4
answers
11k
views
Module 'numpy' has no attribute 'bool8' In cartpole problem openai gym
I'm a beginner & trying to run this simple code, but it is giving me this exception "module 'numpy' has no attribute 'bool8'", as you can see in the screenshot below. The Gym version is 0.26.2 & ...
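The cause here is a version mismatch: `np.bool8` was a deprecated alias for `np.bool_` that newer NumPy releases removed, while Gym 0.26.2 still references it. The clean fixes are migrating to Gymnasium (Gym's maintained successor) or pinning an older NumPy; as a stopgap, the alias can be restored before importing gym:

```python
import numpy as np

# np.bool8 was an alias for np.bool_ that NumPy deprecated and
# later removed; Gym 0.26.2 still references it. Restore it as a
# stopgap before "import gym" (migrating to Gymnasium or pinning
# an older NumPy are the cleaner long-term fixes):
if not hasattr(np, "bool8"):
    np.bool8 = np.bool_

assert np.bool8 is np.bool_
```

This monkey-patch must run before anything imports gym, e.g. at the very top of the training script.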
1
vote
0
answers
88
views
KeyError: 'advantages' in PPO MARL using Ray RLLib
I use ray 2.50.1 to implement a MARL model using PPO.
However, I run into the following problem:
'advantages'
KeyError: 'advantages'
During handling of the above exception, another exception occurred:
...
Advice
1
vote
0
replies
42
views
How can I design “story-driven NPCs” in a reinforcement-learned environment? Looking for development directions and architectural advice
I’m working on a thesis about "story-driven NPCs in a reinforcement-learning world", and I’m building a small multi-agent RL environment as a prototype. However, I’m unsure how to push the ...