-2 votes
0 answers
80 views

Problem description: The RL code implements a Gymnasium-compatible environment (DoomEnv) for training reinforcement learning agents in DOOM Retro. It captures game state (observations) via shared ...
0 votes
1 answer
307 views

I wish to add a block to an existing MuJoCo environment, e.g. HalfCheetah. Can anyone explain how the XML file should be modified to achieve this?
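One common approach is to add a `<geom>` element to the model's `<worldbody>` in the MJCF XML. The sketch below does this programmatically with Python's standard library; the tiny inline model and the `obstacle` name are illustrative stand-ins, not the real `half_cheetah.xml`:

```python
import xml.etree.ElementTree as ET

# A tiny stand-in MJCF model; in practice you would parse half_cheetah.xml.
MJCF = """
<mujoco model="half_cheetah">
  <worldbody>
    <geom name="floor" type="plane" size="40 40 0.1"/>
  </worldbody>
</mujoco>
"""

root = ET.fromstring(MJCF)
worldbody = root.find("worldbody")

# Add a static box: a <geom> with type="box", a position, and half-extents.
block = ET.SubElement(worldbody, "geom")
block.set("name", "obstacle")
block.set("type", "box")
block.set("pos", "3 0 0.2")       # 3 m ahead of the robot, resting on the floor
block.set("size", "0.2 0.5 0.2")  # half-extents along x, y, z

print(ET.tostring(root, encoding="unicode"))
```

The same effect can be had by editing the XML file by hand; geoms attached directly to `<worldbody>` (with no joint) are static obstacles.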
1 vote
1 answer
56 views

I'm creating a custom PyEnvironment in TensorFlow Agents to simulate the track and field decathlon. I've managed to create a functioning environment in the sense that I can use _step and _reset, but ...
Best practices
0 votes
2 replies
47 views

I have a reinforcement-learning agent that tries to move pressure (node) and velocity (edge) vectors by running physical simulations. At each step, the agent runs a physical network with some settings ...
3 votes
0 answers
1k views

I was trying to develop a multi-agent reinforcement learning model using OpenAI Stable Baselines and Gym, as explained in this article. I am confused about how we specify opponent agents. It seems ...
1 vote
0 answers
65 views

I am training a PPO agent using ML-Agents 4.0.2 on Unity 6.3 LTS and I am encountering a reproducible long-running stall that appears to be related to visual observations via RenderTexture. The setup ...
4 votes
0 answers
102 views

I wrote a PPO-based reinforcement learning code for the Gymnasium CarRacing-v3 environment. (The code was generated with the help of Gemini) However, even after 200,000 frames, the training does not ...
Advice
0 votes
0 replies
60 views

Can someone please explain this vectorized Bellman equation to me in simple terms? The Bellman equation usually has summation symbols everywhere ...
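For reference, here is the standard way those summations collapse into a single matrix expression, assuming a fixed policy over a finite MDP (the notation is mine, not the original question's):

```latex
% Scalar Bellman expectation equation for a fixed policy \pi:
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \bigl[ R(s, a, s') + \gamma\, V^{\pi}(s') \bigr]

% Stack the values into a vector v^{\pi} \in \mathbb{R}^{|S|}, and define the
% policy-averaged reward vector r^{\pi} and transition matrix P^{\pi}:
r^{\pi}(s)     = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\, R(s, a, s')
P^{\pi}(s, s') = \sum_{a} \pi(a \mid s)\, P(s' \mid s, a)

% Then every summation is absorbed into one linear equation:
v^{\pi} = r^{\pi} + \gamma P^{\pi} v^{\pi}
\quad\Longrightarrow\quad
v^{\pi} = (I - \gamma P^{\pi})^{-1} r^{\pi}
```

Each row of the matrix equation is exactly one copy of the scalar equation, so "vectorized" just means all states are handled at once.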
0 votes
0 answers
59 views

I am implementing Soft Actor-Critic (SAC) and I am confused about the policy update step. What I want is: When I update the policy (actor), I do not want the parameters of the Q-networks (critics) to ...
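A common pattern for this in PyTorch implementations of SAC is to freeze the critic's parameters during the actor step: gradients still flow *through* the critic into the actor, but the critic's own parameters get no gradient. The networks and shapes below are toy stand-ins, not a full SAC agent (the entropy term is omitted):

```python
import torch
import torch.nn as nn

actor = nn.Linear(4, 2)   # toy policy: state -> action
critic = nn.Linear(6, 1)  # toy Q-network: concat(state, action) -> Q

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-2)

state = torch.randn(8, 4)
critic_before = [p.clone() for p in critic.parameters()]

# Freeze the critic so the actor loss cannot touch its parameters;
# gradients still propagate through the critic into the actor.
for p in critic.parameters():
    p.requires_grad_(False)

action = actor(state)
q = critic(torch.cat([state, action], dim=1))
actor_loss = (-q).mean()  # full SAC would add the entropy term here

actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()

# Unfreeze for the next critic update.
for p in critic.parameters():
    p.requires_grad_(True)
```

Note that even without freezing, the critic would not move, because `actor_opt` only holds the actor's parameters; freezing additionally prevents stale `.grad` values from accumulating on the critic.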
Advice
0 votes
0 replies
15 views

I am training a PPO agent on flappy_bird_gymnasium. Setup: Algorithm: PPO (stable-baselines3); Environment: FlappyBird-v0; two observation designs: Case A: high-dimensional lidar, 180 ~ 2000 ray ...
0 votes
1 answer
10k views

I am trying to run gym-gazebo on Google Colaboratory. There is a problem running the Gazebo server (Gazebo without a GUI) on Colab. There was a warning: Unable to create X window. Rendering will be ...
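The usual workaround on a display-less machine like Colab is a virtual framebuffer, so anything that needs an X display can attach to a fake one. A minimal setup sketch, assuming a Debian/Ubuntu host and a hypothetical `train.py` entry point:

```shell
# Install Xvfb, a virtual X server that renders into memory.
apt-get install -y xvfb

# Run the training script under a virtual display instead of a real X server.
xvfb-run -a -s "-screen 0 1400x900x24" python train.py
```

`xvfb-run -a` picks a free display number automatically, which avoids clashes when several jobs share the machine.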
1 vote
0 answers
181 views

I am new to reinforcement learning. So, as an educational exercise, I am implementing GRPO from scratch with PyTorch. My goal is to mimic how TRL works, but boil it down to just the loss function and ...
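The distinctive piece of GRPO is its advantage estimate: rewards for a group of completions sampled from the same prompt are standardized within the group, replacing a learned value baseline. A minimal NumPy sketch of just that step (function name and numbers are mine):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: standardize the rewards of the completions
    sampled for one prompt by the group's own mean and std."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# One prompt, four sampled completions with scalar rewards.
adv = grpo_advantages([1.0, 0.0, 0.5, 0.5])
print(adv)  # advantages sum to ~0 within the group
```

These per-completion advantages are then plugged into a PPO-style clipped ratio objective over the token log-probabilities, which is the part TRL's trainer handles around this computation.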
3 votes
2 answers
2k views

I am using a simple linear NN model (20, 64, 64, 2) for deep reinforcement learning. I am using this model to approximate the policy gradient with the PPO algorithm. Hence the output layer gives 2 values, ...
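If the two outputs are logits for two discrete actions (an assumption; the excerpt is truncated), PPO's policy head typically turns them into a probability distribution with a softmax. A minimal NumPy sketch with made-up logit values:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

logits = np.array([0.3, -1.2])  # the two raw outputs of the final layer
probs = softmax(logits)         # valid action probabilities, summing to 1
print(probs)
```

Actions are then sampled from this distribution during rollouts, and the same probabilities supply the log-prob ratios in PPO's clipped objective.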
Advice
0 votes
0 replies
45 views

I am working on a project to build a robust locomotion policy for quadruped robots on adverse terrains using reinforcement learning. My end goal is to deploy a trained RL policy on real hardware, and ...
0 votes
1 answer
56 views

Based on the TD0Estimator documentation, it uses 2 TensorDict keys to flag whether an episode has ended or not. But I can't seem to find any indication of when and how to use them. As an example, let's ...
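Whatever the exact key names in TorchRL, the role of an end-of-episode flag in TD(0) is to mask out bootstrapping: at a terminal step the target is the reward alone, with no contribution from the next state's value. A library-free sketch of that rule:

```python
def td0_target(reward, next_value, done, gamma=0.99):
    """TD(0) target: bootstrap from the next state's value
    unless the episode has ended (done == 1)."""
    return reward + gamma * next_value * (1.0 - done)

print(td0_target(1.0, 5.0, done=0.0))  # mid-episode: 1 + 0.99 * 5 = 5.95
print(td0_target(1.0, 5.0, done=1.0))  # terminal step: no bootstrap -> 1.0
```

Estimators that distinguish "terminated" from "truncated" apply this masking only for true termination, since a truncated episode's next state still has value.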
