Unanswered Questions
3,728 questions with no upvoted or accepted answers
15 votes · 1 answer · 496 views
Can you extend FaceNet’s triplet loss to object recognition?
FaceNet uses a novel loss metric (triplet loss) to train a model to output embeddings (128-D from the paper), such that any two faces of the same identity will have a small Euclidean distance, and ...
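The triplet loss the question refers to can be sketched in a few lines of numpy. This is a minimal single-triplet illustration, not FaceNet's batched implementation; the margin value and the toy embeddings are assumptions for the example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss: pull same-identity embeddings together,
    push different-identity embeddings apart by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance anchor-positive
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance anchor-negative
    return max(d_pos - d_neg + margin, 0.0)

# Toy 128-D embeddings (the dimensionality used in the paper), L2-normalized
# as FaceNet's embeddings are.
rng = np.random.default_rng(0)
a = rng.normal(size=128); a /= np.linalg.norm(a)
p = a + 0.01 * rng.normal(size=128); p /= np.linalg.norm(p)  # same identity
n = rng.normal(size=128); n /= np.linalg.norm(n)             # different identity

loss = triplet_loss(a, p, n)
```

Extending this to general object recognition mainly means redefining "same identity" as "same object instance/class" when mining triplets; the loss itself is unchanged.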
11 votes · 2 answers · 1k views
Is there a difference in the architecture of deep reinforcement learning when multiple actions are performed instead of a single action?
I've built a deep deterministic policy gradient reinforcement learning agent to be able to handle any games/tasks that have only one action. However, the agent seems to fail horribly when there are ...
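Architecturally, moving from one action to several usually just means widening the actor's output layer to one unit per action dimension and squashing each dimension into its own bounds. Below is a hedged numpy sketch of such an actor's forward pass; the layer sizes, bounds, and random weights are stand-ins, not a trained DDPG agent.

```python
import numpy as np

rng = np.random.default_rng(1)
state_dim, action_dim = 8, 3                 # e.g. 3 simultaneous continuous controls
action_low = np.array([-1.0, 0.0, -2.0])     # hypothetical per-dimension bounds
action_high = np.array([1.0, 1.0, 2.0])

# One hidden layer; random weights standing in for a trained actor network.
W1 = rng.normal(scale=0.1, size=(state_dim, 32))
W2 = rng.normal(scale=0.1, size=(32, action_dim))

def actor(state):
    h = np.tanh(state @ W1)
    raw = np.tanh(h @ W2)                    # each component in (-1, 1)
    # rescale each action dimension to its own [low, high] range
    return action_low + (raw + 1.0) * 0.5 * (action_high - action_low)

action = actor(rng.normal(size=state_dim))
```

The critic changes symmetrically: it takes the full action vector as input, so Q(s, a) is still a single scalar.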
9 votes · 0 answers · 5k views
What theoretical reasons explain the "catastrophic drop" in Deep Q-Learning?
I am implementing some "classical" papers in Model Free RL like DQN, Double DQN, and Double DQN with Prioritized Replay.
Through the various models I'm running on ...
9 votes · 3 answers · 381 views
How to classify human actions?
I'm quite new to machine learning (I followed Andrew Ng's Coursera course and am now starting the deeplearning.ai courses).
I want to classify human actions real-time like:
Left arm bent
Arm above ...
9 votes · 1 answer · 5k views
Does it make sense to use batch normalization in deep (stacked) or sparse auto-encoders?
Does it make sense to use batch normalization in deep (stacked) or sparse auto-encoders?
I cannot find any resources for that. Is it safe to assume that, since it works for other DNNs, it will also ...
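For intuition about what batch normalization would do to an auto-encoder's hidden layer, here is a minimal training-mode forward pass in numpy. This is a sketch of the normalization step only (no learned running statistics, no backprop); the batch size, layer width, and activation statistics are made up for the example.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization over the batch axis (training-mode forward pass)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Hypothetical encoder activations: batch of 64 inputs, 32 hidden units,
# deliberately badly scaled to show what the normalization fixes.
rng = np.random.default_rng(2)
h = rng.normal(loc=5.0, scale=3.0, size=(64, 32))
h_bn = batch_norm(h)
```

After normalization each hidden unit has roughly zero mean and unit variance over the batch, which is the same conditioning benefit it provides in other deep networks.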
9 votes · 2 answers · 1k views
How should we interpret this figure that relates the perceptron criterion and the hinge loss?
I am currently studying the textbook Neural Networks and Deep Learning by Charu C. Aggarwal. Chapter 1.2.1.2 Relationship with Support Vector Machines says the following:
The perceptron criterion is ...
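The relationship the figure illustrates can be checked numerically: writing the margin as $y\,(\bar{W} \cdot \bar{X})$, the perceptron criterion is $\max(0, -\text{margin})$ while the hinge loss is $\max(0, 1 - \text{margin})$, i.e. the same function shifted by one unit of margin. A small numpy comparison:

```python
import numpy as np

def perceptron_criterion(margin):
    # penalizes only misclassified points (margin < 0)
    return np.maximum(0.0, -margin)

def hinge_loss(margin):
    # also penalizes correctly classified points inside the margin (margin < 1)
    return np.maximum(0.0, 1.0 - margin)

margins = np.array([-1.5, -0.5, 0.0, 0.5, 1.0, 2.0])
pc = perceptron_criterion(margins)   # [1.5, 0.5, 0.0, 0.0, 0.0, 0.0]
hl = hinge_loss(margins)             # [2.5, 1.5, 1.0, 0.5, 0.0, 0.0]
```

The hinge loss upper-bounds the perceptron criterion everywhere, which is the horizontal shift the textbook's figure depicts.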
8 votes · 1 answer · 987 views
Why is it recommended to use a "separate test environment" when evaluating a model?
I am training an agent (stable baselines3 algorithm) on a custom environment. During training, I want to have a callback so that for every $N$ steps of the learning process, I get the current model ...
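The usual reason for a separate evaluation environment (stable-baselines3's `EvalCallback` takes an `eval_env` argument for exactly this) is that environments are stateful: running evaluation rollouts on the training environment resets and advances its internal state mid-training. A toy illustration with a hypothetical stateful environment, not SB3 itself:

```python
import copy

class CounterEnv:
    """Toy stateful environment: the episode position t is internal state."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        done = self.t >= 5
        return self.t, 1.0, done          # obs, reward, done

def evaluate(env, n_steps=3):
    """Evaluation rollout: resets the env, then steps it."""
    env.reset()
    for _ in range(n_steps):
        env.step(0)
    return env.t

train_env = CounterEnv()
train_env.reset()
train_env.step(0)                         # training has advanced the env to t == 1

eval_env = copy.deepcopy(train_env)       # separate instance for evaluation
evaluate(eval_env)                        # resets and steps only the copy
```

Had `evaluate(train_env)` been called instead, the training episode would have been reset from under the learner, corrupting the next transitions it collects.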
8 votes · 1 answer · 295 views
How to graphically represent a RNN architecture implemented in Keras?
I'm trying to create a simple blogpost on RNNs, that should give a better insight into how they work in Keras. Let's say:
...
8 votes · 0 answers · 353 views
Is the Bellman equation that uses sampling weighted by the Q values (instead of max) a contraction?
It is proved that the Bellman update is a contraction (1).
Here is the Bellman update that is used for Q-Learning:
$$Q_{t+1}(s, a) = Q_{t}(s, a) + \alpha \left( r(s, a, s') + \gamma \max_{a'} Q_{t}(s', a') - Q_{t}(s, a) \right)$$
...
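For reference, the standard max-backup update in that equation is a one-liner in tabular form; the question asks whether replacing the max with a Q-weighted (softmax-style) expectation preserves the contraction property, which this sketch does not settle — it only shows the baseline update. The MDP sizes and hyperparameters are arbitrary.

```python
import numpy as np

n_states, n_actions = 4, 2
gamma, alpha = 0.9, 0.5
Q = np.zeros((n_states, n_actions))

def q_update(Q, s, a, r, s_next):
    """Tabular Q-learning update with the max backup from the question."""
    td_target = r + gamma * np.max(Q[s_next])   # r(s, a, s') + gamma * max_a' Q(s', a')
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)      # Q[0, 1] becomes 0.5
```

A weighted variant would replace `np.max(Q[s_next])` with a probability-weighted average of `Q[s_next]`; whether the resulting operator is still a gamma-contraction is exactly the open question above.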
8 votes · 1 answer · 711 views
How does SGD escape local minima?
SGD is able to jump out of local minima that would otherwise trap BGD
I don't really understand the above statement. Could someone please provide a mathematical explanation for why SGD (Stochastic ...
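One way to see the mechanism concretely: at a stationary point of the *averaged* loss, the full-batch gradient vanishes, but individual per-sample gradients generally do not, so a stochastic step still moves the iterate. A minimal constructed example (the two per-sample losses are chosen purely for illustration):

```python
import numpy as np

# Two per-sample losses f_i(w) = (w - c_i)^2 whose average has a stationary
# point (w = 0) at which no individual sample gradient vanishes.
centers = np.array([-1.0, 1.0])

def sample_grad(w, i):
    return 2.0 * (w - centers[i])   # gradient of sample i's loss at w

w = 0.0                              # stationary point of the averaged loss
full_grad = np.mean([sample_grad(w, i) for i in range(2)])   # 0.0: BGD stops here
sgd_grads = [sample_grad(w, i) for i in range(2)]            # [2.0, -2.0]: SGD moves
```

This is only half the story (here w = 0 is the global minimum of the average); the point is that minibatch gradients are noisy estimates of the full gradient, and that noise can kick the iterate out of shallow basins where the full gradient alone would keep it pinned.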
8 votes · 0 answers · 181 views
Normalizing Normal Distributions in Thompson Sampling for online Reinforcement Learning
In my implementation of Thompson Sampling (TS) for online Reinforcement Learning, my distribution for selecting $a$ is $\mathcal{N}(Q(s, a), \frac{1}{C(s,a)+1})$, where $C(s,a)$ is the number of times ...
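The selection rule described in the question can be sketched directly: draw one sample per action from $\mathcal{N}(Q(s,a), \frac{1}{C(s,a)+1})$ and act greedily on the samples. Note the second parameter is a *variance*, so the standard deviation passed to the sampler is its square root. The Q-values and counts below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
Q = np.array([0.2, 0.5, 0.4])       # hypothetical value estimates per action
C = np.array([100, 2, 10])          # visit counts per action

def thompson_select(Q, C, rng):
    """Sample one value per action from N(Q, 1/(C+1)); pick the argmax."""
    std = np.sqrt(1.0 / (C + 1.0))  # std dev = sqrt of the variance 1/(C+1)
    samples = rng.normal(loc=Q, scale=std)
    return np.argmax(samples), samples

action, samples = thompson_select(Q, C, rng)
```

Frequently visited actions get narrow distributions (exploitation of a confident estimate); rarely visited ones get wide distributions and are occasionally sampled high (exploration).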
8 votes · 1 answer · 259 views
What is the impact of using multiple BMUs for self-organizing maps?
Here's a sort of a conceptual question. I was implementing a SOM algorithm to better understand its variations and parameters. I got curious about one bit: the BMU (best matching unit == the neuron ...
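Mechanically, using multiple BMUs just means taking the k smallest entries of the distance map rather than the single argmin. A hedged numpy sketch (the grid size, input dimension, and random weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
weights = rng.normal(size=(10, 10, 3))   # 10x10 SOM grid of 3-D weight vectors
x = rng.normal(size=3)                   # one input sample

def top_k_bmus(weights, x, k=2):
    """Return grid coordinates of the k neurons closest to input x."""
    d = np.linalg.norm(weights - x, axis=-1)   # distance map, shape (10, 10)
    flat = np.argsort(d, axis=None)[:k]        # indices of the k smallest distances
    return np.unravel_index(flat, d.shape)

rows, cols = top_k_bmus(weights, x, k=2)
```

The substantive question above is what happens when the neighborhood update is then applied around each of these k winners instead of just the first; the snippet only shows how to locate them.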
8 votes · 2 answers · 1k views
What is the current state-of-the-art in Reinforcement Learning regarding data efficiency?
In other words, which existing reinforcement learning method learns with the fewest episodes? R-Max comes to mind, but it's very old and I'd like to know if there is something better now.
7 votes · 1 answer · 612 views
Why does the policy gradient theorem have two different forms?
I have been studying policy gradients recently but found different expositions from different sources, which greatly confused me. From the book "Reinforcement Learning: an Introduction (Sutton &...
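For context, the two forms usually meant here (as in Sutton & Barto) are the all-actions form of the policy gradient theorem,

$$\nabla_\theta J(\theta) \propto \sum_s \mu(s) \sum_a q_\pi(s, a)\, \nabla_\theta \pi(a \mid s, \theta),$$

and the sample-based (REINFORCE) expectation form,

$$\nabla_\theta J(\theta) = \mathbb{E}_\pi \!\left[ G_t\, \nabla_\theta \ln \pi(A_t \mid S_t, \theta) \right].$$

The second follows from the first by multiplying and dividing each term by $\pi(a \mid s, \theta)$ (the log-derivative trick, since $\nabla \pi / \pi = \nabla \ln \pi$), turning the sums into an expectation over states and actions visited under $\pi$, and replacing $q_\pi(S_t, A_t)$ by the sampled return $G_t$, which is an unbiased estimate of it.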
7 votes · 1 answer · 198 views
Is there a proof that shows a dominating policy always exists in an MDP?
I think that it is common knowledge that for any infinite horizon discounted MDP $(S, A, P, r, \gamma)$, there always exists a dominating policy $\pi$, i.e. a policy $\pi$ such that for all policies $\...
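For finite state and action spaces with $\gamma < 1$, this is indeed a standard result (it appears, e.g., in Puterman's treatment of discounted MDPs): there exists a stationary deterministic policy $\pi^*$ such that

$$V^{\pi^*}(s) \ge V^{\pi}(s) \quad \text{for all } s \in S \text{ and all policies } \pi,$$

typically proved via the Banach fixed-point theorem applied to the Bellman optimality operator, which is a $\gamma$-contraction in the sup norm. The open part of the question is presumably a self-contained proof or the precise conditions under which it extends beyond the finite case.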