33 questions
2 votes · 0 answers · 204 views
Implementing SARSA(lambda) - Gridworld - in Julia
Could you explain what is wrong in this code? I am trying to implement SARSA(lambda) with eligibility traces.
using ReinforcementLearningBase, GridWorlds
using PyPlot
world = GridWorlds....
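For reference, the tabular SARSA(λ) update with accumulating eligibility traces can be sketched as below. This is a minimal, language-agnostic illustration in Python following the standard Sutton & Barto formulation, not the asker's Julia code; the `step(s, a)` environment interface and all parameter values are assumptions for the sketch.

```python
import numpy as np

def sarsa_lambda_episode(Q, step, rng, alpha=0.1, gamma=0.99, lam=0.9,
                         epsilon=0.1, max_steps=100):
    """One episode of tabular SARSA(lambda) with accumulating traces.

    Q is an (n_states, n_actions) array, updated in place.
    `step(s, a)` is a hypothetical env hook returning (next_state, reward, done).
    """
    n_actions = Q.shape[1]

    def policy(s):
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    E = np.zeros_like(Q)                  # eligibility traces, reset per episode
    s, a = 0, policy(0)
    for _ in range(max_steps):
        s2, r, done = step(s, a)
        a2 = policy(s2)
        # on-policy TD error: bootstrap from the action actually chosen in s'
        delta = r + (0.0 if done else gamma * Q[s2, a2]) - Q[s, a]
        E[s, a] += 1.0                    # accumulate trace for the current pair
        Q += alpha * delta * E            # update every traced state-action pair
        E *= gamma * lam                  # decay all traces
        if done:
            break
        s, a = s2, a2
    return Q
```

The order of operations matters: the TD error is computed with the old values, the trace for (s, a) is incremented, then all traced pairs are updated and all traces decayed.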
0 votes · 1 answer · 294 views
Problem with Deep SARSA algorithm which works with PyTorch (Adam optimizer) but not with Keras/TensorFlow (Adam optimizer)
I have a deep SARSA algorithm which works great in PyTorch on LunarLander-v2, and I would like to use it with Keras/TensorFlow. It uses mini-batches of size 64, each used 128 times for training at each episode.
There ...
0 votes · 1 answer · 274 views
Helipad coordinates of LunarLander-v2 OpenAI Gym
I am trying to implement a custom lunar lander environment by adapting the existing LunarLander-v2. https://github.com/openai/gym/blob/master/gym/envs/box2d/lunar_lander.py
I'm having a ...
1 vote · 1 answer · 1k views
Implementing SARSA from Q-Learning algorithm in the frozen lake game
I am solving the frozen lake game using the Q-Learning and SARSA algorithms. I have a code implementation of the Q-Learning algorithm, and it works. This code was taken from Chapter 5 of "Deep ...
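The only structural difference between the two tabular updates is the bootstrap target: Q-learning maxes over the next state's actions (off-policy), while SARSA uses the action the behavior policy actually selects (on-policy). A minimal sketch of both one-step updates (function names and signatures are illustrative, not from the asker's book code):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s2, alpha, gamma, done):
    # off-policy target: bootstrap from the greedy action in s'
    target = r + (0.0 if done else gamma * np.max(Q[s2]))
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(Q, s, a, r, s2, a2, alpha, gamma, done):
    # on-policy target: bootstrap from the action a2 actually selected in s'
    target = r + (0.0 if done else gamma * Q[s2, a2])
    Q[s, a] += alpha * (target - Q[s, a])
```

So converting a working Q-learning loop to SARSA means selecting a2 (e.g. epsilon-greedily) before the update and carrying it into the next step, rather than recomputing the greedy action.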
0 votes · 1 answer · 98 views
Converting to Python scalars
I am implementing a SARSA reinforcement learning function which chooses an action following the current policy and updates its Q-values.
This throws me the following error:
TypeError: only size-1 ...
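This TypeError typically means a NumPy array with more than one element was passed where Python expects a single scalar, e.g. `int(...)` applied to a whole row of Q-values. A minimal reproduction and fix (the variable names are illustrative, not the asker's):

```python
import numpy as np

q_row = np.array([0.1, 0.5, 0.2])   # hypothetical Q-values for one state

# int(q_row) would raise:
#   TypeError: only size-1 arrays can be converted to Python scalars
# because q_row holds 3 elements. Reduce it to a single value first:
action = int(np.argmax(q_row))      # index of the best action, a plain int
best_q = float(q_row[action])       # its value, a plain float
```

The same error appears when indexing with an array instead of a scalar, so checking the shapes of whatever feeds the failing conversion usually locates the bug.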
0 votes · 1 answer · 658 views
SARSA implementation with tensorflow
I am currently trying to learn the concepts of reinforcement learning. To that end, I tried to implement the SARSA algorithm for the cart-pole example using TensorFlow. I compared my algorithm to algorithms ...
1 vote · 0 answers · 71 views
Cannot save Sarsa in Accord.NET
I'm pretty new to Unity and Accord.NET, but I'm currently making a small game in Unity and decided to see what I could do with some reinforcement learning to make it more interesting. Everything has ...
2 votes · 0 answers · 508 views
Is this true? What about Expected SARSA and Double Q-Learning?
I'm studying reinforcement learning and I'm having trouble understanding the difference between SARSA, Q-Learning, Expected SARSA, Double Q-Learning, and temporal difference. Can you please explain ...
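These methods differ mainly in the bootstrap target. SARSA samples the next action, Q-learning maxes, and Expected SARSA averages over the policy's action probabilities, which removes the sampling variance of SARSA. A sketch of the Expected SARSA target under an epsilon-greedy policy (the function and its signature are illustrative):

```python
import numpy as np

def expected_sarsa_target(q_row, r, gamma, epsilon, done=False):
    """Bootstrap target r + gamma * E_pi[Q(s', A')] under epsilon-greedy pi.

    SARSA samples A'; Q-learning takes the max; Expected SARSA averages.
    With epsilon = 0 this reduces to the Q-learning target.
    """
    if done:
        return r
    n = len(q_row)
    probs = np.full(n, epsilon / n)                 # exploration mass, spread evenly
    probs[int(np.argmax(q_row))] += 1.0 - epsilon   # greedy mass
    return r + gamma * float(np.dot(probs, q_row))
```

Double Q-Learning is a separate fix for a different problem: it keeps two Q-tables and uses one to select the argmax action and the other to evaluate it, reducing the maximization bias of plain Q-learning.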
0 votes · 1 answer · 337 views
Teach a robot to collect items in a grid world before reaching the terminal state using reinforcement learning
My problem is the following. I have a simple grid world:
https://i.sstatic.net/xrhJw.png
The agent starts at the initial state labeled with START, and the goal is to reach the terminal state labeled ...
1 vote · 1 answer · 415 views
Eligibility trace algorithm, the update order
I am reading Silver et al (2012) "Temporal-Difference Search in Computer Go", and trying to understand the update order for the eligibility trace algorithm.
In the Algorithm 1 and 2 of the paper, ...
2 votes · 1 answer · 3k views
SARSA and Q-Learning (reinforcement learning) don't converge to the optimal policy
I have a question about my own project for testing reinforcement learning techniques. First let me explain the purpose. I have an agent which can take 4 actions during 8 steps. At the end of this ...
3 votes · 1 answer · 510 views
SARSA value approximation for Cart Pole
I have a question about this SARSA function-approximation (FA) implementation.
In input cell 142 I see this modified update
w += alpha * (reward - discount * q_hat_next) * q_hat_grad
where q_hat_next is Q(S', a') and q_hat_grad is the ...
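For comparison, the textbook semi-gradient SARSA update (Sutton & Barto) uses the TD error r + γ·q̂(S′,A′,w) − q̂(S,A,w), so the quoted line differs from it in its signs and in dropping the −q̂(S,A,w) term. A minimal sketch of the standard update for a linear q̂, with illustrative names (this is not the notebook's code):

```python
import numpy as np

def semi_gradient_sarsa_step(w, x, x_next, r, alpha, gamma, done):
    """Textbook semi-gradient SARSA update for linear q_hat(s, a, w) = w . x.

    x and x_next are feature vectors for (S, A) and (S', A'); w is updated
    in place and the TD error is returned.
    """
    q_hat = float(np.dot(w, x))
    q_hat_next = 0.0 if done else float(np.dot(w, x_next))
    delta = r + gamma * q_hat_next - q_hat   # TD error: note the + gamma and - q_hat
    w += alpha * delta * x                   # for linear q_hat, the gradient is x
    return delta
```

Against this form, `w += alpha * (reward - discount * q_hat_next) * q_hat_grad` looks like a sign error plus a missing current-estimate term, which is presumably what the question is probing.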
0 votes · 1 answer · 336 views
Implementing SARSA in Unity
So I've used following code to implement Q-learning in Unity:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using UnityEngine;
namespace QLearner
{
...
6 votes · 1 answer · 4k views
Why is there no n-step Q-learning algorithm in Sutton's RL book?
I think I am messing something up.
I always thought that:
- 1-step TD on-policy = Sarsa
- 1-step TD off-policy = Q-learning
Thus I conclude:
- n-step TD on-policy = n-step Sarsa
- n-step TD off-...
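The n-step Sarsa target generalizes the one-step case by summing n discounted rewards and then bootstrapping from Q at the n-th state-action pair. A sketch of computing that target from stored transitions (function name and interface are illustrative):

```python
def n_step_sarsa_target(rewards, q_boot, gamma, n):
    """n-step return G = r_1 + gamma*r_2 + ... + gamma^(n-1)*r_n
                        + gamma^n * Q(s_n, a_n).

    `rewards` holds up to n rewards; `q_boot` is Q at the bootstrap pair
    (pass 0.0 if the episode terminated within the n steps).
    """
    used = rewards[:n]
    G = 0.0
    for k, r in enumerate(used):
        G += (gamma ** k) * r            # discounted reward sum
    G += (gamma ** len(used)) * q_boot   # bootstrap from Q(s_n, a_n)
    return G
```

With n = 1 this reduces to the ordinary Sarsa target r + γQ(s′, a′). The off-policy n-step analogue is subtler than "n-step Q-learning" because the intermediate actions come from the behavior policy, which is the usual explanation for why Sutton's book presents n-step tree backup and importance-sampling corrections instead.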
0 votes · 1 answer · 82 views
Zeta Variable of SARSA(lambda)
What does zeta represent in the critic method? I believe it keeps track of the state-action pairs and represents eligibility traces, which are a temporary record of the state-actions, but what exactly ...