
Unanswered Questions

3,728 questions with no upvoted or accepted answers
15 votes
1 answer
496 views

Can you extend FaceNet’s triplet loss to object recognition?

FaceNet uses a novel loss function (triplet loss) to train a model to output embeddings (128-D in the paper), such that any two faces of the same identity have a small Euclidean distance, and ...
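For context on the question above, the triplet loss it refers to can be sketched minimally in NumPy; the margin value and toy embeddings here are illustrative assumptions, not taken from the question or the paper:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: pull same-identity pairs together, push
    different-identity pairs at least `margin` further apart."""
    d_pos = np.sum((anchor - positive) ** 2)   # squared distance, same identity
    d_neg = np.sum((anchor - negative) ** 2)   # squared distance, different identity
    return max(d_pos - d_neg + margin, 0.0)    # zero once the margin is satisfied

# Toy 128-D embeddings: positive is near the anchor, negative is unrelated.
rng = np.random.default_rng(0)
a = rng.normal(size=128)
p = a + 0.01 * rng.normal(size=128)
n = rng.normal(size=128)
loss = triplet_loss(a, p, n)
```

Extending this to general object recognition, as the question asks, is mostly a matter of what "identity" means when mining triplets.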
11 votes
2 answers
1k views

Is there a difference in the architecture of deep reinforcement learning when multiple actions are performed instead of a single action?

I've built a deep deterministic policy gradient reinforcement learning agent to be able to handle any games/tasks that have only one action. However, the agent seems to fail horribly when there are ...
9 votes
0 answers
5k views

What theoretical reasons explain the "catastrophic drop" in Deep Q-Learning?

I am implementing some "classical" papers in model-free RL, like DQN, Double DQN, and Double DQN with Prioritized Replay. Across the various models I'm running on ...
9 votes
3 answers
381 views

How to classify human actions?

I'm quite new to machine learning (I followed Andrew Ng's Coursera course and am now starting the deeplearning.ai courses). I want to classify human actions in real time, like: left arm bent, arm above ...
9 votes
1 answer
5k views

Does it make sense to use batch normalization in deep (stacked) or sparse auto-encoders?

Does it make sense to use batch normalization in deep (stacked) or sparse auto-encoders? I cannot find any resources for that. Is it safe to assume that, since it works for other DNNs, it will also ...
9 votes
2 answers
1k views

How should we interpret this figure that relates the perceptron criterion and the hinge loss?

I am currently studying the textbook Neural Networks and Deep Learning by Charu C. Aggarwal. Chapter 1.2.1.2 Relationship with Support Vector Machines says the following: The perceptron criterion is ...
8 votes
1 answer
987 views

Why is it recommended to use a "separate test environment" when evaluating a model?

I am training an agent (stable baselines3 algorithm) on a custom environment. During training, I want to have a callback so that for every $N$ steps of the learning process, I get the current model ...
8 votes
1 answer
295 views

How to graphically represent a RNN architecture implemented in Keras?

I'm trying to create a simple blog post on RNNs that should give a better insight into how they work in Keras. Let's say: ...
8 votes
0 answers
353 views

Is the Bellman equation that uses sampling weighted by the Q values (instead of max) a contraction?

It is proved that the Bellman update is a contraction (1). Here is the Bellman update that is used for Q-Learning: $$Q_{t+1}(s, a) = Q_{t}(s, a) + \alpha*(r(s, a, s') + \gamma \max_{a^*} (Q_{t}(s', ...
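The excerpt's equation is cut off by the listing, but the standard tabular Q-learning update it begins to quote can be sketched as follows; the state/action encoding and hyperparameters here are illustrative assumptions:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])   # bootstrapped one-step target
    Q[s, a] += alpha * (td_target - Q[s, a])    # move estimate toward target
    return Q

Q = np.zeros((3, 2))                            # 3 states, 2 actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)      # Q[0,1] moves toward the target
```

The question asks whether replacing the `max` over next-state actions with a sampling distribution weighted by the Q-values preserves the contraction property of this operator.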
8 votes
1 answer
711 views

How does SGD escape local minima?

"SGD is able to jump out of local minima that would otherwise trap BGD." I don't really understand the above statement. Could someone please provide a mathematical explanation for why SGD (Stochastic ...
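One way to see the claim in the question concretely: at the same point, a minibatch gradient is a noisy estimate of the full-batch gradient, and that per-step noise is what can kick the iterate out of a shallow basin. A toy illustration on synthetic linear-regression data (all values here are made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true

w = np.zeros(5)
# Full-batch gradient of the mean squared error (BGD direction).
full_grad = -2 * X.T @ (y - X @ w) / len(X)
# Gradient on a single random minibatch (one SGD direction).
idx = rng.choice(len(X), size=32, replace=False)
mini_grad = -2 * X[idx].T @ (y[idx] - X[idx] @ w) / len(idx)

# Same expectation, but the minibatch direction fluctuates step to step.
noise = np.linalg.norm(full_grad - mini_grad)
```

The mathematical version of the statement usually models this fluctuation as gradient noise whose variance lets the iterate escape basins whose barriers are small relative to the step size.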
8 votes
0 answers
181 views

Normalizing Normal Distributions in Thompson Sampling for online Reinforcement Learning

In my implementation of Thompson Sampling (TS) for online Reinforcement Learning, my distribution for selecting $a$ is $\mathcal{N}(Q(s, a), \frac{1}{C(s,a)+1})$, where $C(s,a)$ is the number of times ...
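A minimal sketch of action selection under the distribution the question describes, $\mathcal{N}(Q(s,a), \frac{1}{C(s,a)+1})$; the Q-table and count-table values below are hypothetical stand-ins:

```python
import numpy as np

def thompson_select(Q_row, C_row, rng):
    """Sample one value per action from N(Q(s,a), 1/(C(s,a)+1)) and act
    greedily on the samples; rarely tried actions get wider distributions
    and therefore more exploration."""
    std = np.sqrt(1.0 / (C_row + 1.0))          # variance 1/(C+1), per the question
    samples = rng.normal(loc=Q_row, scale=std)  # one posterior sample per action
    return int(np.argmax(samples))

rng = np.random.default_rng(0)
Q_row = np.array([0.5, 0.4, 0.1])    # value estimates for one state
C_row = np.array([100, 2, 0])        # visit counts: last action barely tried
action = thompson_select(Q_row, C_row, rng)
```

As counts grow, the per-action standard deviation shrinks toward zero and the rule converges to greedy selection.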
8 votes
1 answer
259 views

What is the impact of using multiple BMUs for self-organizing maps?

Here's a sort of conceptual question. I was implementing a SOM algorithm to better understand its variations and parameters. I got curious about one bit: the BMU (best matching unit, i.e. the neuron ...
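For context, finding the single BMU the excerpt mentions can be sketched as a nearest-neighbor search over the neurons' weight vectors; the weights and input below are illustrative assumptions:

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return the index of the SOM neuron whose weight vector is
    closest (in Euclidean distance) to the input x."""
    dists = np.linalg.norm(weights - x, axis=1)
    return int(np.argmin(dists))

weights = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])  # 3 neurons, 2-D inputs
bmu = best_matching_unit(weights, np.array([0.9, 1.1]))   # neuron 1 is closest
```

The question's variant would keep the k smallest distances instead of only the minimum and spread the update across all of them.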
8 votes
2 answers
1k views

What is the current state-of-the-art in Reinforcement Learning regarding data efficiency?

In other words, which existing reinforcement learning method learns with the fewest episodes? R-Max comes to mind, but it's very old, and I'd like to know whether there is something better now.
7 votes
1 answer
612 views

Why does the policy gradient theorem have two different forms?

I have been studying policy gradients recently but found different expositions from different sources, which greatly confused me. From the book "Reinforcement Learning: an Introduction (Sutton &...
7 votes
1 answer
198 views

Is there a proof that shows a dominating policy always exists in an MDP?

I think that it is common knowledge that for any infinite horizon discounted MDP $(S, A, P, r, \gamma)$, there always exists a dominating policy $\pi$, i.e. a policy $\pi$ such that for all policies $\...
