-2 votes
0 answers
85 views

Problem description: The RL code implements a Gymnasium-compatible environment (DoomEnv) for training reinforcement learning agents in DOOM Retro. It captures game state (observations) via shared ...
Steven Harrison III
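For the DoomEnv question above, a minimal Gymnasium-compatible environment skeleton for reference (a sketch only; the class name comes from the question, but the observation/action spaces and the shared-memory frame capture are replaced with hypothetical placeholders):

import gymnasium as gym
import numpy as np
from gymnasium import spaces

class DoomEnv(gym.Env):
    """Sketch of a Gymnasium-compatible environment; real frame capture omitted."""

    def __init__(self):
        super().__init__()
        # Hypothetical screen-sized observation and discrete action set
        self.observation_space = spaces.Box(0, 255, shape=(120, 160, 3), dtype=np.uint8)
        self.action_space = spaces.Discrete(8)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        obs = self.observation_space.sample()  # placeholder for a shared-memory frame
        return obs, {}  # Gymnasium reset returns (observation, info)

    def step(self, action):
        obs = self.observation_space.sample()  # placeholder for the next frame
        reward, terminated, truncated = 0.0, False, False
        return obs, reward, terminated, truncated, {}  # 5-tuple step API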
1 vote
1 answer
56 views

I'm creating a custom PyEnvironment in TensorFlow Agents to simulate the track and field decathlon. I've managed to create a functioning environment in the sense that I can use _step and _reset, but ...
Perry
  • 31
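For the TF-Agents question above, a hedged skeleton of a custom PyEnvironment with _reset and _step (the decathlon specifics are replaced with toy specs and dynamics; names such as DecathlonEnv are hypothetical):

import numpy as np
from tf_agents.environments import py_environment
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts

class DecathlonEnv(py_environment.PyEnvironment):
    def __init__(self):
        self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=2, name='action')
        self._observation_spec = array_spec.BoundedArraySpec(
            shape=(4,), dtype=np.float32, minimum=0.0, maximum=100.0, name='observation')
        self._state = np.zeros(4, dtype=np.float32)
        self._episode_ended = False

    def action_spec(self):
        return self._action_spec

    def observation_spec(self):
        return self._observation_spec

    def _reset(self):
        self._state = np.zeros(4, dtype=np.float32)
        self._episode_ended = False
        return ts.restart(self._state)

    def _step(self, action):
        if self._episode_ended:
            return self.reset()  # restart after a terminal step
        self._state[0] += float(action)  # toy dynamics
        if self._state[0] >= 10:
            self._episode_ended = True
            return ts.termination(self._state, reward=1.0)
        return ts.transition(self._state, reward=0.0, discount=1.0)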
Best practices
0 votes
2 replies
47 views

I have a reinforcement-learning agent that tries to adjust pressure (node) and velocity (edge) vectors by running physical simulations. At each step, the agent runs a physical network with some settings ...
oakca
  • 1,588
1 vote
0 answers
65 views

I am training a PPO agent using ML-Agents 4.0.2 on Unity 6.3 LTS and I am encountering a reproducible long-running stall that appears to be related to visual observations via RenderTexture. The setup ...
Ling
  • 505
4 votes
0 answers
102 views

I wrote PPO-based reinforcement-learning code for the Gymnasium CarRacing-v3 environment. (The code was generated with the help of Gemini.) However, even after 200,000 frames, the training does not ...
Rai Madu
Advice
0 votes
0 replies
60 views

Can someone please explain this vectorized Bellman equation to me in simple terms? The Bellman equation usually has summation symbols ...
Khosro Pourkavoos
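For the vectorized Bellman question above, the usual matrix form for policy evaluation, written here for reference (the question's exact equation may differ in notation):

\[
V^{\pi} \;=\; R^{\pi} + \gamma P^{\pi} V^{\pi}
\qquad\Longrightarrow\qquad
V^{\pi} \;=\; (I - \gamma P^{\pi})^{-1} R^{\pi},
\]
\[
\text{element-wise: }\;
V^{\pi}(s) \;=\; \sum_{a} \pi(a\mid s) \sum_{s'} P(s'\mid s,a)\,\bigl[R(s,a,s') + \gamma V^{\pi}(s')\bigr].
\]

Here V^π is the vector of state values, R^π the vector of expected one-step rewards under π, and P^π the |S|×|S| state-transition matrix under π; stacking the states into vectors and matrices is what makes the explicit summation symbols disappear.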
1 vote
0 answers
181 views

I am new to reinforcement learning. So, as an educational exercise, I am implementing GRPO from scratch with PyTorch. My goal is to mimic how TRL works, but boil it down to just the loss function and ...
csnate
  • 1,671
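For the GRPO-from-scratch question above, a hedged PyTorch sketch of the core loss: advantages standardized within each group of completions, combined with a PPO-style clipped ratio. Argument names are hypothetical, and TRL's actual implementation adds per-token masking and a KL term on top of this:

import torch

def grpo_loss(logp_new, logp_old, group_rewards, clip_eps=0.2):
    # group_rewards: (num_groups, group_size) rewards for completions of the same prompt
    # logp_new, logp_old: per-sample log-probabilities, flattened to (num_groups * group_size,)
    mean = group_rewards.mean(dim=1, keepdim=True)
    std = group_rewards.std(dim=1, keepdim=True) + 1e-8
    advantages = ((group_rewards - mean) / std).reshape(-1)  # group-relative advantage

    ratio = torch.exp(logp_new - logp_old)                   # importance ratio per sample
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()             # negative surrogate to minimize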
Advice
0 votes
1 replies
111 views

I've downloaded many Python projects about reinforcement learning from GitHub, but each takes me too much time to read. It's easy to comprehend a simple Python project with only a few *.py files, but ...
Xingrui Zhuang
0 votes
1 answer
56 views

Based on the TD0Estimator documentation, it uses two TensorDict keys to flag whether an episode has ended. But I can't seem to find any indication of when and how to use them. As an example, let's ...
Bejo
  • 13
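For the TD0Estimator question above, a conceptual plain-PyTorch sketch of how end-of-episode flags typically enter a TD(0) target: the terminated flag zeroes the bootstrap term. TorchRL's estimator reads equivalent flags from the TensorDict, so consult its docs for the exact key names:

import torch

def td0_target(reward, next_value, terminated, gamma=0.99):
    # terminated: boolean tensor, True where the episode truly ended.
    # No bootstrapping from next_value past a terminal state.
    not_terminal = (~terminated).float()
    return reward + gamma * not_terminal * next_value

# Example: a terminated transition keeps only its immediate reward.
print(td0_target(torch.tensor([1.0]), torch.tensor([5.0]), torch.tensor([True])))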
0 votes
1 answer
207 views

While training my RL algorithm using SBX, I am getting different results across my HPC cluster and my PC. However, I did find that results are consistently the same within a single machine. They just diverge ...
desert_ranger
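For the SBX reproducibility question above, a hedged seeding sketch (assuming SBX mirrors the Stable-Baselines3 constructor, where algorithms accept a seed argument). Note that identical seeds generally do not guarantee bit-identical results across different hardware, BLAS builds, or JAX/XLA backends:

import random
import numpy as np
import gymnasium as gym
from sbx import PPO  # assumption: SBX exposes PPO with an SB3-like interface

random.seed(0)
np.random.seed(0)
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, seed=0, verbose=0)
model.learn(total_timesteps=1_000)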
Advice
0 votes
0 replies
45 views

I am working on a project to build a robust locomotion policy for quadruped robots on adverse terrains using reinforcement learning. My end goal is to deploy a trained RL policy on real hardware, and ...
KANISHK KHANDELWAL
Advice
0 votes
0 replies
41 views

With Prioritized Experience Replay (PER), we use the beta parameter to compute the importance-sampling weight that offsets the bias introduced by PER. Now, with TorchRL's TensorDictPrioritizedReplayBuffer, I ...
Bejo
  • 13
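For the PER question above, a conceptual plain-PyTorch sketch of the importance-sampling weights that beta controls, w_i = (N * P(i))^(-beta), normalized by the maximum for stability. TorchRL's prioritized buffer exposes an analogous per-sample weight when sampling (check its docs for the exact key):

import torch

def per_is_weights(priorities, alpha=0.6, beta=0.4):
    # Sampling probabilities P(i) = p_i^alpha / sum_j p_j^alpha
    probs = priorities.pow(alpha)
    probs = probs / probs.sum()
    n = priorities.numel()
    # Importance-sampling correction that offsets the non-uniform sampling
    weights = (n * probs).pow(-beta)
    return weights / weights.max()

# Usage: multiply each sampled transition's loss by its weight before averaging,
# e.g. loss = (per_is_weights(prios)[idx] * td_error[idx].pow(2)).mean()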
10 votes
4 answers
11k views

I'm beginner & trying to run this simple code but it is giving me this exception "module 'numpy' has no attribute 'bool8'" as you can see in screenshot below. Gym version is 0.26.2 & ...
Jitender
  • 345
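For the np.bool8 question above, the error comes from gym 0.26.x referencing the NumPy alias np.bool8, which newer NumPy releases removed. Common workarounds are pinning an older NumPy, migrating to gymnasium, or restoring the alias before importing gym; the last option is shown here as a quick-hack sketch rather than a recommended long-term fix:

import numpy as np

if not hasattr(np, "bool8"):
    np.bool8 = np.bool_  # restore the alias that gym 0.26.x expects

import gym

env = gym.make("CartPole-v1")
obs, info = env.reset()  # gym 0.26 reset returns (observation, info)
print(obs)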
1 vote
0 answers
88 views

I use Ray 2.50.1 to implement a MARL model with PPO. However, I run into the following problem: KeyError: 'advantages' (followed by "During handling of the above exception, another exception occurred:") ...
geniusadven
Advice
1 vote
0 replies
42 views

I’m working on a thesis about "story-driven NPCs in a reinforcement-learning world", and I’m building a small multi-agent RL environment as a prototype. However, I’m unsure how to push the ...
DucTruong
