2,587 questions
-2
votes
0
answers
74
views
Why does my model keep getting stuck in one spot? [closed]
Problem Description
The RL code implements a Gymnasium-compatible environment (DoomEnv) for training reinforcement learning agents in DOOM Retro. It captures game state (observations) via shared ...
1
vote
1
answer
56
views
Custom PyEnvironment time_step and time_step_spec do not match
I'm creating a custom PyEnvironment in TensorFlow Agents to simulate the track and field decathlon. I've managed to create a functioning environment in the sense that I can use _step and _reset, but ...
Best practices
0
votes
2
replies
47
views
Reward Function Design for Ideal Value of two Vectors
I have a reinforcement-learning agent that tries to move pressure (node) and velocity (edge) vectors by running physical simulations. At each step, the agent runs a physical network with some settings ...
1
vote
0
answers
65
views
Unity: ML-Agents training freeze with RenderTexture on Unity 6.3 LTS and ML-Agents 4.0.2 (eventually resumes)
I am training a PPO agent using ML-Agents 4.0.2 on Unity 6.3 LTS and I am encountering a reproducible long-running stall that appears to be related to visual observations via RenderTexture.
The setup ...
Advice
0
votes
0
replies
60
views
Please explain this vectorized Bellman's Equation
Can someone please explain this vectorized Bellman equation to me in simple terms?
The Bellman equation usually has "summation symbol", "summation symbol", "summation symbol"...
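Without seeing the exact equation in the question, the form most texts vectorize is the Bellman expectation equation for policy evaluation. The per-state sums over actions and next states get folded into a matrix-vector product (a sketch of the standard notation):

```latex
% v_\pi \in \mathbb{R}^{|S|}: one value per state
% r_\pi \in \mathbb{R}^{|S|}: expected one-step reward per state under \pi
% P_\pi \in \mathbb{R}^{|S| \times |S|}: state-transition matrix under \pi
v_\pi = r_\pi + \gamma P_\pi v_\pi
\quad\Longrightarrow\quad
v_\pi = (I - \gamma P_\pi)^{-1} r_\pi
```

The product $P_\pi v_\pi$ performs exactly the "summation over next states" written out element-wise in the scalar form: row $s$ of $P_\pi$ holds $\sum_a \pi(a \mid s)\, P(s' \mid s, a)$ for each $s'$, so the summation symbols disappear into matrix multiplication.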
4
votes
0
answers
102
views
Implemented PPO algorithm fails to train
I wrote a PPO-based reinforcement learning code for the Gymnasium CarRacing-v3 environment.
(The code was generated with the help of Gemini)
However, even after 200,000 frames, the training does not ...
0
votes
0
answers
59
views
SAC Implementation
I am implementing Soft Actor-Critic (SAC) and I am confused about the policy update step.
What I want is:
When I update the policy (actor), I do not want the parameters of the Q-networks (critics) to ...
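A minimal sketch of the usual resolution (network sizes and names here are made up for illustration): gradients are allowed to flow *through* the critic to the actor, but only the actor's optimizer steps, so the critic's parameters never change. Many implementations additionally wrap the update in `q.requires_grad_(False)` / `requires_grad_(True)` to avoid accumulating stale critic gradients.

```python
import torch
import torch.nn as nn

# Toy actor and single Q-network (real SAC uses two critics and a tanh-Gaussian policy).
obs_dim, act_dim = 4, 2
actor = nn.Linear(obs_dim, act_dim)
q1 = nn.Linear(obs_dim + act_dim, 1)

# The actor optimizer only owns actor parameters -- this is the key point.
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

obs = torch.randn(8, obs_dim)
action = torch.tanh(actor(obs))                    # reparameterized action; grads flow to actor
q_val = q1(torch.cat([obs, action], dim=-1))       # grads flow THROUGH q1 to the actor

actor_loss = (-q_val).mean()                       # maximize Q (entropy term omitted)

q_before = [p.clone() for p in q1.parameters()]
actor_opt.zero_grad()
actor_loss.backward()                              # q1 params receive .grad, but...
actor_opt.step()                                   # ...only actor params are updated

# Critic weights are unchanged, because actor_opt never held them.
assert all(torch.equal(a, b) for a, b in zip(q_before, q1.parameters()))
```

Note that after this `backward()` the critic parameters do hold gradients; zero them (or freeze the critic with `requires_grad_(False)` during the actor step) before the next critic update so the stale gradients are not applied.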
Advice
0
votes
0
replies
15
views
PPO performance drops drastically when reducing observation from 180-dim lidar to 12-dim handcrafted state (Flappy Bird)
I am training a PPO agent on flappy_bird_gymnasium.
Setup
Algorithm: PPO (stable-baselines3)
Environment: FlappyBird-v0
Two observation designs:
Case A: High-dimensional lidar
180 ~ 2000 ray ...
Advice
0
votes
0
replies
45
views
What should I learn to deploy low-level RL locomotion policies on quadruped robots?
I am working on a project to build a robust locomotion policy for quadruped robots on adverse terrains using reinforcement learning.
My end goal is to deploy a trained RL policy on real hardware, and ...
1
vote
0
answers
181
views
Custom GRPO Trainer not Learning
I am new to reinforcement learning, so as an educational exercise I am implementing GRPO from scratch with PyTorch. My goal is to mimic how TRL works, but boil it down to just the loss function and ...
0
votes
1
answer
56
views
When using PyTorch torchrl TD0Estimator, how to handle the "done" and "terminated" flag
Based on the TD0Estimator documentation, it uses two TensorDict keys to flag whether an episode has ended. But I can't seem to find any indication of when and how to use them.
As an example, let's ...
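For context, the convention in Gymnasium-style libraries (which torchrl follows) is that `terminated` marks a true MDP terminal state while `truncated`/`done` can also mark a time-limit cutoff. A plain-PyTorch sketch of how a TD(0) target uses the flags (the tensors here are toy data, not torchrl output):

```python
import torch

gamma = 0.99
# "terminated": the MDP truly ended, so there is no future value to bootstrap from.
# "truncated": a time limit cut the episode short; the next state still has value.
reward     = torch.tensor([1.0, 1.0, 1.0])
next_value = torch.tensor([0.5, 0.5, 0.5])
terminated = torch.tensor([False, True, False])
truncated  = torch.tensor([False, False, True])
done = terminated | truncated     # marks any episode boundary (reset needed)

# TD(0) target: bootstrap from next_value unless the state was terminal.
# Note the mask uses `terminated`, NOT `done` -- truncated episodes still bootstrap.
td_target = reward + gamma * next_value * (~terminated).float()
# td_target: [1.495, 1.0, 1.495]
```

The practical takeaway: masking with `done` instead of `terminated` would wrongly zero out the bootstrap for time-limit truncations, biasing the value estimate.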
Advice
0
votes
1
reply
111
views
How to read a large Python project (for example, a project of Deep Learning or Reinforcement Learning)
I've downloaded many Python reinforcement-learning projects from GitHub, but each one takes me a long time to read.
It's easy to comprehend a simple Python project with only a few *.py files, but ...
Advice
0
votes
0
replies
41
views
When using TensorDictPrioritizedReplayBuffer, should I apply the priority weight manually or not?
With Prioritized Experience Replay (PER), we use the beta parameter to compute importance-sampling weights that offset the bias introduced by PER. Now, with PyTorch's TensorDictPrioritizedReplayBuffer, I ...
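For reference, the bias correction the question describes is just a per-sample weight applied to the loss. A NumPy sketch of the math (toy priorities; whether the buffer computes these weights for you or you apply them manually depends on the library, so check what keys your sampled batch actually contains):

```python
import numpy as np

# Toy priorities for a buffer of N transitions (assume the alpha exponent is already applied).
priorities = np.array([0.5, 0.2, 0.2, 0.1])
probs = priorities / priorities.sum()      # sampling distribution P(i)
N = len(priorities)
beta = 0.4                                 # typically annealed from ~0.4 toward 1.0

# Importance-sampling weights: w_i = (N * P(i))^(-beta), normalized by the max
# so that weights only scale the loss DOWN for over-sampled transitions.
weights = (N * probs) ** (-beta)
weights /= weights.max()

# Applied manually: scale each sample's loss by its weight before averaging.
td_errors = np.array([1.0, 2.0, 0.5, 1.5])
loss = np.mean(weights * td_errors ** 2)
```

Rarely-sampled transitions (low priority) get the largest weight, which counteracts the fact that PER under-samples them; frequently-sampled ones get weights below 1.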
Advice
1
vote
0
replies
42
views
How can I design “story-driven NPCs” in a reinforcement-learned environment? Looking for development directions and architectural advice
I’m working on a thesis about "story-driven NPCs in a reinforcement-learning world", and I’m building a small multi-agent RL environment as a prototype. However, I’m unsure how to push the ...
1
vote
0
answers
88
views
KeyError: 'advantages' in PPO MARL using Ray RLLib
I use ray 2.50.1 to implement a MARL model using PPO.
However, I run into the following problem:
'advantages'
KeyError: 'advantages'
During handling of the above exception, another exception occurred:
...