158 votes
8 answers
93k views

Although I know that SARSA is on-policy while Q-learning is off-policy, when looking at their formulas it's hard for me to see any difference between the two algorithms. According to the book ...
asked by Ælex
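The two update rules differ only in how they bootstrap from the next state: SARSA backs up the value of the action the behavior policy actually takes, while Q-learning backs up the greedy maximum regardless of what the policy does next. A minimal sketch of one update of each kind (all values and hyperparameters below are made up for illustration):

```python
alpha, gamma = 0.5, 0.9            # learning rate and discount (invented)

q_next = {"left": 1.0, "right": 3.0}   # Q-values over actions in next state s'
q_sa = 2.0                             # current estimate Q(s, a)
reward = 1.0

# SARSA (on-policy): bootstrap from the action a' the policy actually took.
a_prime = "left"                       # suppose the exploring policy picked "left"
q_sarsa = q_sa + alpha * (reward + gamma * q_next[a_prime] - q_sa)

# Q-learning (off-policy): bootstrap from the greedy action, regardless of
# what the behavior policy does next.
q_qlearn = q_sa + alpha * (reward + gamma * max(q_next.values()) - q_sa)

print(q_sarsa)   # 1.95
print(q_qlearn)  # 2.85
```

Whenever the sampled a' happens to be the greedy action, the two updates coincide — which is why the formulas look so similar on the page.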
149 votes
6 answers
42k views

I'm currently trying to get an ANN to play a video game, and I was hoping to get some help from the wonderful community here. I've settled on Diablo 2. Gameplay is in real-time and from an ...
asked by zergylord
147 votes
5 answers
118k views

In reinforcement learning, what is the difference between policy iteration and value iteration? As far as I understand, in value iteration, you use the Bellman equation to solve for the optimal ...
asked by Arslán
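The short answer is that value iteration folds the greedy max into every Bellman sweep, while policy iteration alternates a full policy-evaluation phase with a greedy improvement step. A minimal value-iteration sketch on an invented deterministic 2-state, 2-action MDP:

```python
gamma = 0.9
states, actions = [0, 1], [0, 1]

# P[s][a] = (next_state, reward); deterministic transitions, all invented.
P = {0: {0: (0, 0.0), 1: (1, 1.0)},
     1: {0: (0, 0.0), 1: (1, 2.0)}}

V = {s: 0.0 for s in states}
for _ in range(200):  # sweep until (approximately) converged
    # Value iteration: each sweep applies the Bellman *optimality* backup,
    # taking the max over actions directly.
    V = {s: max(P[s][a][1] + gamma * V[P[s][a][0]] for a in actions)
         for s in states}

# Policy iteration would instead evaluate a fixed policy to convergence,
# then improve it greedily, and repeat; here we just read off the greedy
# policy once the values have converged.
policy = {s: max(actions, key=lambda a: P[s][a][1] + gamma * V[P[s][a][0]])
          for s in states}
```

For this toy MDP the values converge to V[1] = 2/(1 - 0.9) = 20 and V[0] = 1 + 0.9 · 20 = 19, with the greedy policy choosing action 1 in both states.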
68 votes
2 answers
32k views

I know the basics of feedforward neural networks and how to train them using the backpropagation algorithm, but I'm looking for an algorithm that I can use for training an ANN online with ...
asked by Kendall Frey
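The simplest online scheme is plain incremental SGD: update the weights after every single sample as it arrives, rather than over batches. A made-up single-neuron sketch (the task, learning rate, and iteration count are all invented for illustration):

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One logistic neuron learning the OR function, one sample at a time.
w, b, lr = [0.0, 0.0], 0.0, 0.5
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

for _ in range(5000):                 # stream samples one by one
    x, y = random.choice(data)
    p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
    err = p - y                       # gradient of log-loss w.r.t. pre-activation
    w[0] -= lr * err * x[0]
    w[1] -= lr * err * x[1]
    b -= lr * err

preds = [round(sigmoid(w[0] * x[0] + w[1] * x[1] + b)) for x, _ in data]
```

The same per-sample update generalizes to multi-layer networks by backpropagating `err` through each layer after every observation.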
59 votes
4 answers
50k views

I know the basics of Reinforcement Learning, but which terms do I need to understand to be able to read the arXiv PPO paper? What is the roadmap to learn and use PPO?
asked by Alexander Cyberman
55 votes
6 answers
54k views

I'm trying to get an agent to learn the mouse movements necessary to best perform some task in a reinforcement learning setting (i.e. the reward signal is the only feedback for learning). I'm hoping ...
asked by zergylord
47 votes
3 answers
47k views

I've seen statements such as: A policy defines the learning agent's way of behaving at a given time. Roughly speaking, a policy is a mapping from perceived states of the environment to actions to be ...
asked by Alexander Cyberman
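That textbook definition can be made concrete in a few lines: a deterministic policy is literally a mapping state → action, and a stochastic policy maps each state to a probability distribution over actions. The states and actions below are invented for illustration:

```python
import random

random.seed(1)

# Deterministic policy: perceived state -> action.
policy = {"enemy_near": "flee", "enemy_far": "explore"}
action = policy["enemy_near"]

# Stochastic policy: perceived state -> probability distribution over actions.
stochastic = {"enemy_near": {"flee": 0.8, "fight": 0.2}}
probs = stochastic["enemy_near"]
sampled = random.choices(list(probs), weights=list(probs.values()))[0]
```

Learning algorithms differ mainly in *how* they represent and improve this mapping — as a lookup table, a parameterized function, or implicitly via action values.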
43 votes
3 answers
37k views

How is Q-learning different from value iteration in reinforcement learning? I know Q-learning is model-free and training samples are transitions (s, a, s', r). But since we know the transitions and ...
asked by huskywolf
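The key difference is the backup: value iteration performs a full expected backup over a known transition model P(s'|s, a), while Q-learning updates from a single sampled transition (s, a, s', r) and never needs the model. A toy contrast with invented numbers:

```python
gamma, alpha = 0.9, 0.5

# Known model for one (s, a) pair: (probability, next_state, reward).
transitions = [(0.7, "s1", 1.0), (0.3, "s2", 0.0)]
V = {"s1": 2.0, "s2": 5.0}          # current value estimates (invented)

# Value iteration: full expected backup, requires the model.
q_model = sum(p * (r + gamma * V[s2]) for p, s2, r in transitions)

# Q-learning: one sampled transition, no model needed.
q_sa = 1.0                          # current estimate Q(s, a)
p, s2, r = transitions[0]           # suppose the environment sampled s' = "s1"
max_q_next = V[s2]                  # stand-in for max_a' Q(s', a')
q_sample = q_sa + alpha * (r + gamma * max_q_next - q_sa)
```

Averaged over many sampled transitions, the Q-learning updates approximate the same expectation the model-based backup computes in one step — which is why the two converge to the same answer when the model is accurate.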
38 votes
1 answer
39k views

I want to set up an RL agent on the OpenAI CarRacing-v0 environment, but before that I want to understand the action space. In the code on GitHub, line 119 says: self.action_space = spaces.Box( np....
asked by Faur
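For reference, CarRacing-v0's action space is a 3-dimensional Box: steering in [-1, 1], gas in [0, 1], brake in [0, 1]. The Box semantics can be sketched without gym — the stand-in below mimics `sample()` and `contains()` but is not the real `spaces.Box` class:

```python
import random

random.seed(0)

# Simplified stand-in for gym.spaces.Box with CarRacing-v0's bounds:
# action = [steering, gas, brake].
low = [-1.0, 0.0, 0.0]
high = [1.0, 1.0, 1.0]

def sample():
    """Draw a uniformly random action inside the box, like Box.sample()."""
    return [random.uniform(lo, hi) for lo, hi in zip(low, high)]

def contains(action):
    """Check membership, like Box.contains()."""
    return all(lo <= a <= hi for lo, a, hi in zip(low, action, high))

a = sample()
```

In practice a continuous-control agent outputs a vector in this box (or is clipped into it) each step, e.g. full throttle with slight left steering would be `[-0.2, 1.0, 0.0]`.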
37 votes
2 answers
20k views

What is the difference between deep reinforcement learning and reinforcement learning? I basically know what reinforcement learning is about, but what exactly does the term deep stand for in this ...
asked by Christopher Klaus
34 votes
5 answers
15k views

I know SVMs are supposedly 'ANN killers' in that they automatically select representation complexity and find a global optimum (see here for some SVM-praising quotes). But here is where I'm unclear --...
asked by zergylord
34 votes
4 answers
22k views

Is it possible to use openai's gym environments for multi-agent games? Specifically, I would like to model a card game with four players (agents). The player scoring a turn starts the next turn. How ...
asked by Martin Studer
32 votes
1 answer
38k views

When trying to create a neural network and optimize it using PyTorch, I am getting: ValueError: optimizer got an empty parameter list. Here is the code: import torch.nn as nn import torch.nn....
asked by Gulzar
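That error usually means `model.parameters()` yielded nothing when the optimizer was constructed — most often because layers were stored in a plain Python list (which `nn.Module` cannot see) instead of `nn.ModuleList`, or because no layers were assigned as module attributes at all. A hypothetical minimal reproduction and fix:

```python
import torch.nn as nn

class Broken(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain Python list: PyTorch does NOT register these layers,
        # so parameters() is empty and the optimizer raises ValueError.
        self.layers = [nn.Linear(4, 4), nn.Linear(4, 2)]

class Fixed(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers each layer's parameters with the module.
        self.layers = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 2)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

n_broken = len(list(Broken().parameters()))  # 0 -> empty parameter list
n_fixed = len(list(Fixed().parameters()))    # 4 -> weight + bias per Linear
```

With the fixed version, `torch.optim.SGD(Fixed().parameters(), lr=0.1)` constructs normally; the same registration rule applies to `nn.ModuleDict` and to layers assigned directly as attributes.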
30 votes
2 answers
36k views

I have recently been working on a project that uses a neural network for virtual robot control. I used TensorFlow to code it up, and it runs smoothly. So far, I have used sequential simulations to evaluate ...
asked by MrRed
27 votes
6 answers
38k views

[Note that I am using xvfb-run -s "-screen 0 1400x900x24" jupyter notebook] I am trying to run a basic set of commands in OpenAI Gym: import gym env = gym.make("CartPole-v0") obs = env.reset() env.render() ...
asked by midawn98