2,587 questions
-2
votes
0
answers
80
views
Why does my model keep getting stuck in one spot? [closed]
Problem Description
The RL code implements a Gymnasium-compatible environment (DoomEnv) for training reinforcement learning agents in DOOM Retro. It captures game state (observations) via shared ...
0
votes
1
answer
307
views
Modifying Existing Mujoco Environment
I wish to add a block to an existing MuJoCo environment, e.g. HalfCheetah. Can anyone guide me on how the XML file should be modified to achieve this?
1
vote
1
answer
56
views
Custom PyEnvironment time_step and time_step_spec do not match
I'm creating a custom PyEnvironment in TensorFlow Agents to simulate the track and field decathlon. I've managed to create a functioning environment in the sense that I can use _step and _reset, but ...
Best practices
0
votes
2
replies
47
views
Reward Function Design for Ideal Value of two Vectors
I have a reinforcement-learning agent which tries to move pressure (node) and velocity (edge) vectors by running physical simulations. At each step, the agent runs a physical network with some settings ...
3
votes
0
answers
1k
views
Understanding multi agent learning in OpenAI gym and stable-baselines
I was trying to develop a multi-agent reinforcement learning model using OpenAI Stable Baselines and Gym, as explained in this article.
I am confused about how we specify opponent agents.
It seems ...
1
vote
0
answers
65
views
Unity: ML-Agents training freeze with RenderTexture on Unity 6.3 LTS and ML-Agents 4.0.2 (eventually resumes)
I am training a PPO agent using ML-Agents 4.0.2 on Unity 6.3 LTS and I am encountering a reproducible long-running stall that appears to be related to visual observations via RenderTexture.
The setup ...
4
votes
0
answers
102
views
Implemented PPO algorithm fails to train
I wrote PPO-based reinforcement learning code for the Gymnasium CarRacing-v3 environment.
(The code was generated with the help of Gemini)
However, even after 200,000 frames, the training does not ...
Advice
0
votes
0
replies
60
views
Please explain this vectorized Bellman's Equation
Can someone please explain this vectorized Bellman equation to me in simple terms?
The Bellman equation usually has "summation symbol", "summation symbol", "summation symbol"...
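The vectorized form the question asks about replaces the summations with matrix operations: for a fixed policy, V = R + γPV, where P is the policy's state-transition matrix, so V can be found by solving the linear system (I − γP)V = R. A toy sketch (the 2-state transition matrix and rewards here are hypothetical):

```python
import torch

# Vectorized Bellman equation for a fixed policy: V = R + gamma * P @ V.
# Solving the linear system gives V = (I - gamma * P)^{-1} R.
P = torch.tensor([[0.9, 0.1],    # hypothetical 2-state transition matrix
                  [0.2, 0.8]])
R = torch.tensor([1.0, 0.0])     # hypothetical expected rewards per state
gamma = 0.9

V = torch.linalg.solve(torch.eye(2) - gamma * P, R)

# The solution satisfies the Bellman equation exactly.
assert torch.allclose(V, R + gamma * P @ V)
```

Each row of `P @ V` is the expected next-state value, which is exactly what the per-state summations compute element by element.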
0
votes
0
answers
59
views
SAC Implementation
I am implementing Soft Actor-Critic (SAC) and I am confused about the policy update step.
What I want is:
When I update the policy (actor), I do not want the parameters of the Q-networks (critics) to ...
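The behavior the question describes is commonly achieved by freezing the critic's parameters during the actor update and giving the actor optimizer only the actor's parameters. A hedged sketch (hypothetical 4-dim observation / 2-dim action, single linear layers in place of real networks):

```python
import torch
import torch.nn as nn

# Sketch: the actor optimizer only holds actor parameters, and the critic
# is frozen during the actor update, so no gradient is accumulated on the
# Q-network weights while gradient still flows through the critic to the actor.
torch.manual_seed(0)
actor = nn.Linear(4, 2)
critic = nn.Linear(4 + 2, 1)
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

critic_before = critic.weight.detach().clone()
obs = torch.randn(8, 4)
action = actor(obs)                           # reparameterized action (sketch)

for p in critic.parameters():                 # freeze critic for actor update
    p.requires_grad_(False)

q = critic(torch.cat([obs, action], dim=-1))  # gradient flows via `action`
actor_loss = -q.mean()
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()

for p in critic.parameters():                 # unfreeze for the critic update
    p.requires_grad_(True)
```

After the backward pass, `actor.weight.grad` is populated while the critic's `.grad` fields stay empty, and `actor_opt.step()` cannot touch the critic in any case since it was never given those parameters.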
Advice
0
votes
0
replies
15
views
PPO performance drops drastically when reducing observation from 180-dim lidar to 12-dim handcrafted state (Flappy Bird)
I am training a PPO agent on flappy_bird_gymnasium.
Setup
Algorithm: PPO (stable-baselines3)
Environment: FlappyBird-v0
Two observation designs:
Case A: High-dimensional lidar
180 ~ 2000 ray ...
0
votes
1
answer
10k
views
Run gym-gazebo on Google Colaboratory
I am trying to run gym-gazebo on Google Colaboratory.
There is a problem running the Gazebo server (Gazebo without a GUI) on Colab.
There was a warning on the display: Unable to create X window. Rendering will be ...
1
vote
0
answers
181
views
Custom GRPO Trainer not Learning
I am new to reinforcement learning. So, as an educational exercise, I am implementing GRPO from scratch with PyTorch. My goal is to mimic how TRL works, but boil it down to just the loss function and ...
3
votes
2
answers
2k
views
How to clamp output of neuron in pytorch
I am using a simple linear nn model (20, 64, 64, 2) for deep reinforcement learning. I am using this model to approximate the policy gradients with the PPO algorithm. Hence the output layer gives 2 values, ...
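Two common ways to bound a network's outputs are a hard `torch.clamp` and a smooth `tanh` squashing layer. A minimal sketch of the 20 → 64 → 64 → 2 head from the question (the (−1, 1) bounds here are hypothetical):

```python
import torch
import torch.nn as nn

# The 20 -> 64 -> 64 -> 2 policy head from the question.
net = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
)

x = torch.randn(5, 20)
raw = net(x)

hard = torch.clamp(raw, min=-1.0, max=1.0)  # hard clip; zero gradient outside
soft = torch.tanh(raw)                      # smooth squashing into (-1, 1)
```

Note that `clamp` has zero gradient wherever the bound is active, which can stall policy-gradient training; for PPO-style policies a `tanh` squash (rescaled to the action range) is the more common choice.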
Advice
0
votes
0
replies
45
views
What should I learn to deploy low-level RL locomotion policies on quadruped robots?
I am working on a project to build a robust locomotion policy for quadruped robots on adverse terrains using reinforcement learning.
My end goal is to deploy a trained RL policy on real hardware, and ...
0
votes
1
answer
56
views
When using PyTorch torchrl TD0Estimator, how to handle the "done" and "terminated" flag
Based on the TD0Estimator documentation, it uses two TensorDict keys to flag whether an episode has ended or not. But I can't seem to find any indication of when and how to use them.
As an example, let's ...
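The distinction behind those two flags can be illustrated without torchrl: in a TD(0) target, bootstrapping through the next state is cut only when the episode truly *terminated*, not when it was merely truncated (`done` without `terminated`). A hedged sketch with hypothetical tensors:

```python
import torch

# TD(0) targets: bootstrap through truncations, cut only at true terminations.
reward     = torch.tensor([1.0, 1.0, 1.0])
next_value = torch.tensor([0.5, 0.5, 0.5])   # V(s') from the value network
terminated = torch.tensor([0.0, 1.0, 0.0])   # env reached a terminal state
gamma = 0.99

# Only `terminated` zeroes the bootstrap; a time-limit truncation would
# still use gamma * V(s') because the episode could have continued.
target = reward + gamma * (1.0 - terminated) * next_value
```

Here the middle transition terminated, so its target is just the reward (1.0), while the others bootstrap to 1 + 0.99 × 0.5 = 1.495.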