2,587 questions
-2
votes
0
answers
80
views
Why does my model keep getting stuck in one spot? [closed]
Problem Description
The RL code implements a Gymnasium-compatible environment (DoomEnv) for training reinforcement learning agents in DOOM Retro. It captures game state (observations) via shared ...
0
votes
1
answer
307
views
Modifying Existing Mujoco Environment
I wish to add a block to an existing MuJoCo environment, e.g. HalfCheetah. Can anyone guide me on how the XML file should be modified to achieve this?
1
vote
1
answer
56
views
Custom PyEnvironment time_step and time_step_spec do not match
I'm creating a custom PyEnvironment in TensorFlow Agents to simulate the track and field decathlon. I've managed to create a functioning environment in the sense that I can use _step and _reset, but ...
Best practices
0
votes
2
replies
47
views
Reward Function Design for Ideal Value of two Vectors
I have a reinforcement-learning agent which tries to move pressure (node) and velocity (edge) vectors by running physical simulations. At each step, the agent runs a physical network with some settings ...
3
votes
0
answers
1k
views
Understanding multi agent learning in OpenAI gym and stable-baselines
I was trying to develop a multi-agent reinforcement learning model using OpenAI Stable Baselines and Gym, as explained in this article.
I am confused about how we specify opponent agents.
It seems ...
1
vote
0
answers
65
views
Unity: ML-Agents training freeze with RenderTexture on Unity 6.3 LTS and ML-Agents 4.0.2 (eventually resumes)
I am training a PPO agent using ML-Agents 4.0.2 on Unity 6.3 LTS and I am encountering a reproducible long-running stall that appears to be related to visual observations via RenderTexture.
The setup ...
4
votes
0
answers
102
views
Implemented PPO algorithm fails to train
I wrote PPO-based reinforcement learning code for the Gymnasium CarRacing-v3 environment.
(The code was generated with the help of Gemini)
However, even after 200,000 frames, the training does not ...
Advice
0
votes
0
replies
60
views
Please explain this vectorized Bellman's Equation
Can someone please explain this vectorized Bellman equation to me in simple terms?
The Bellman equation usually has "summation symbol", "summation symbol", "summation symbol"...
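The vectorized form the question asks about replaces the summations with matrix operations: for a fixed policy, V = R + γPV, where P is the policy's state-transition matrix, so V can be found by solving the linear system (I − γP)V = R. A toy sketch (the 2-state transition matrix and rewards here are hypothetical):

```python
import torch

# Vectorized Bellman equation for a fixed policy: V = R + gamma * P @ V.
# Solving the linear system gives V = (I - gamma * P)^{-1} R.
P = torch.tensor([[0.9, 0.1],    # hypothetical 2-state transition matrix
                  [0.2, 0.8]])
R = torch.tensor([1.0, 0.0])     # hypothetical expected rewards per state
gamma = 0.9

V = torch.linalg.solve(torch.eye(2) - gamma * P, R)

# The solution satisfies the Bellman equation exactly.
assert torch.allclose(V, R + gamma * P @ V)
```

Each row of `P @ V` is the expected next-state value, which is exactly what the per-state summations compute element by element.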
0
votes
0
answers
59
views
SAC Implementation
I am implementing Soft Actor-Critic (SAC) and I am confused about the policy update step.
What I want is:
When I update the policy (actor), I do not want the parameters of the Q-networks (critics) to ...
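The behavior the question describes is commonly achieved by freezing the critic's parameters during the actor update and giving the actor optimizer only the actor's parameters. A hedged sketch (hypothetical 4-dim observation / 2-dim action, single linear layers in place of real networks):

```python
import torch
import torch.nn as nn

# Sketch: the actor optimizer only holds actor parameters, and the critic
# is frozen during the actor update, so no gradient is accumulated on the
# Q-network weights while gradient still flows through the critic to the actor.
torch.manual_seed(0)
actor = nn.Linear(4, 2)
critic = nn.Linear(4 + 2, 1)
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

critic_before = critic.weight.detach().clone()
obs = torch.randn(8, 4)
action = actor(obs)                           # reparameterized action (sketch)

for p in critic.parameters():                 # freeze critic for actor update
    p.requires_grad_(False)

q = critic(torch.cat([obs, action], dim=-1))  # gradient flows via `action`
actor_loss = -q.mean()
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()

for p in critic.parameters():                 # unfreeze for the critic update
    p.requires_grad_(True)
```

After the backward pass, `actor.weight.grad` is populated while the critic's `.grad` fields stay empty, and `actor_opt.step()` cannot touch the critic in any case since it was never given those parameters.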
Advice
0
votes
0
replies
15
views
PPO performance drops drastically when reducing observation from 180-dim lidar to 12-dim handcrafted state (Flappy Bird)
I am training a PPO agent on flappy_bird_gymnasium.
Setup
Algorithm: PPO (stable-baselines3)
Environment: FlappyBird-v0
Two observation designs:
Case A: High-dimensional lidar
180 ~ 2000 ray ...
0
votes
1
answer
10k
views
Run gym-gazebo on Google Colaboratory
I am trying to run gym-gazebo on Google Colaboratory.
There is a problem running the Gazebo server (Gazebo without a GUI) on Colab.
There was a warning on the display: Unable to create X window. Rendering will be ...
1
vote
0
answers
181
views
Custom GRPO Trainer not Learning
I am new to reinforcement learning. So, as an educational exercise, I am implementing GRPO from scratch with PyTorch. My goal is to mimic how TRL works, but boil it down to just the loss function and ...
3
votes
2
answers
2k
views
How to clamp output of neuron in pytorch
I am using a simple linear nn model (20, 64, 64, 2) for deep reinforcement learning. I am using this model to approximate the policy gradients with the PPO algorithm. Hence the output layer gives 2 values, ...
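Two common ways to bound a network's outputs are a hard `torch.clamp` and a smooth `tanh` squashing layer. A minimal sketch of the 20 → 64 → 64 → 2 head from the question (the (−1, 1) bounds here are hypothetical):

```python
import torch
import torch.nn as nn

# The 20 -> 64 -> 64 -> 2 policy head from the question.
net = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
)

x = torch.randn(5, 20)
raw = net(x)

hard = torch.clamp(raw, min=-1.0, max=1.0)  # hard clip; zero gradient outside
soft = torch.tanh(raw)                      # smooth squashing into (-1, 1)
```

Note that `clamp` has zero gradient wherever the bound is active, which can stall policy-gradient training; for PPO-style policies a `tanh` squash (rescaled to the action range) is the more common choice.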
Advice
0
votes
0
replies
45
views
What should I learn to deploy low-level RL locomotion policies on quadruped robots?
I am working on a project to build a robust locomotion policy for quadruped robots on adverse terrains using reinforcement learning.
My end goal is to deploy a trained RL policy on real hardware, and ...
0
votes
1
answer
56
views
When using PyTorch torchrl TD0Estimator, how to handle the "done" and "terminated" flag
Based on the TD0Estimator documentation, it uses two TensorDict keys to flag whether an episode has ended or not. But I can't seem to find any indication of when and how to use them.
As an example, let's ...
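The distinction behind those two flags can be illustrated without torchrl: in a TD(0) target, bootstrapping through the next state is cut only when the episode truly *terminated*, not when it was merely truncated (`done` without `terminated`). A hedged sketch with hypothetical tensors:

```python
import torch

# TD(0) targets: bootstrap through truncations, cut only at true terminations.
reward     = torch.tensor([1.0, 1.0, 1.0])
next_value = torch.tensor([0.5, 0.5, 0.5])   # V(s') from the value network
terminated = torch.tensor([0.0, 1.0, 0.0])   # env reached a terminal state
gamma = 0.99

# Only `terminated` zeroes the bootstrap; a time-limit truncation would
# still use gamma * V(s') because the episode could have continued.
target = reward + gamma * (1.0 - terminated) * next_value
```

Here the middle transition terminated, so its target is just the reward (1.0), while the others bootstrap to 1 + 0.99 × 0.5 = 1.495.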