2 votes
0 answers
204 views

Could you explain to me what is wrong in this code? I am trying to implement SARSA(lambda) with eligibility traces. using ReinforcementLearningBase, GridWorlds using PyPlot world = GridWorlds....
przel123
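The question above concerns SARSA(λ) with eligibility traces. For reference, a minimal tabular sketch of one SARSA(λ) step with accumulating traces (written in Python rather than the question's Julia; the table shapes, function name, and hyperparameter defaults are my own assumptions, not taken from the question's code):

```python
import numpy as np

def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next,
                        alpha=0.1, gamma=0.99, lam=0.9):
    """One tabular SARSA(lambda) step with accumulating traces.

    Q: (n_states, n_actions) action-value table (updated in place)
    E: (n_states, n_actions) eligibility-trace table (updated in place)
    """
    # TD error for the observed transition (S, A, R, S', A')
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]
    # Accumulating trace: bump the visited pair...
    E[s, a] += 1.0
    # ...then every traced pair shares in the update,
    Q += alpha * delta * E
    # ...and all traces fade geometrically.
    E *= gamma * lam
    return delta
```

The key point is that the single TD error `delta` is broadcast to every state-action pair in proportion to its trace, which is what lets credit flow back along the recent trajectory.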
0 votes
1 answer
294 views

I have a deep SARSA algorithm which works great in PyTorch on lunar-lander-v2, and I would like to use it with Keras/TensorFlow. It uses mini-batches of size 64 which are used 128 times to train at each episode. There ...
rdpdo • 33
0 votes
1 answer
274 views

I am trying to implement a custom lunar lander environment by taking help from the already existing LunarLander-v2. https://github.com/openai/gym/blob/master/gym/envs/box2d/lunar_lander.py I'm having a ...
Shan • 1
1 vote
1 answer
1k views

I am solving the frozen lake game using Q-Learning and SARSA algorithms. I have the code implementation of the Q-Learning algorithm and that works. This code was taken from Chapter 5 of "Deep ...
ronanwa • 11
0 votes
1 answer
98 views

I am implementing a SARSA reinforcement learning function which chooses an action following the current policy and updates its Q-values. This throws the following error: TypeError: only size-1 ...
matheo-es
0 votes
1 answer
658 views

I am trying to learn the concepts of reinforcement learning at the moment. To that end, I tried to implement the SARSA algorithm for the cart-pole example using TensorFlow. I compared my algorithm to algorithms ...
Ralf • 73
1 vote
0 answers
71 views

I'm pretty new to Unity and Accord.Net but I'm currently making a small game in Unity and decided to see what I could do with some reinforcement learning to make it more interesting. Everything has ...
earlyLo • 11
2 votes
0 answers
508 views

I'm studying Reinforcement Learning and I'm facing a problem understanding the difference between SARSA, Q-Learning, Expected SARSA, Double Q-Learning and temporal difference. Can you please explain ...
Cooper • 25
0 votes
1 answer
337 views

My problem is the following. I have a simple grid world: https://i.sstatic.net/xrhJw.png The agent starts at the initial state labeled with START, and the goal is to reach the terminal state labeled ...
Genesist
1 vote
1 answer
415 views

I am reading Silver et al (2012) "Temporal-Difference Search in Computer Go", and trying to understand the update order for the eligibility trace algorithm. In Algorithms 1 and 2 of the paper, ...
Kota Mori • 6,762
2 votes
1 answer
3k views

I have a question about my own project for testing reinforcement learning techniques. First let me explain the purpose. I have an agent which can take 4 actions during 8 steps. At the end of this ...
T.L • 21
3 votes
1 answer
510 views

I have a question about this SARSA FA. In input cell 142 I see this modified update w += alpha * (reward - discount * q_hat_next) * q_hat_grad where q_hat_next is Q(S', a') and q_hat_grad is the ...
Chuk Lee • 3,608
0 votes
1 answer
336 views

So I've used the following code to implement Q-learning in Unity: using System; using System.Collections; using System.Collections.Generic; using System.Linq; using UnityEngine; namespace QLearner { ...
user3631213
6 votes
1 answer
4k views

I think I am messing something up. I always thought that:
- 1-step TD on-policy = Sarsa
- 1-step TD off-policy = Q-learning
Thus I conclude:
- n-step TD on-policy = n-step Sarsa
- n-step TD off-...
siva • 1,583
0 votes
1 answer
82 views

What does zeta represent in the critic method? I believe it keeps track of the state-action pairs and represents eligibility traces, which are a temporary record of visited state-action pairs, but what exactly ...
anon • 560
