
Unanswered Questions

3,728 questions with no upvoted or accepted answers
15 votes
1 answer
496 views

Can you extend FaceNet’s triplet loss to object recognition?

FaceNet uses a novel loss function (triplet loss) to train a model to output embeddings (128-D in the paper), such that any two faces of the same identity have a small Euclidean distance, and ...
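For context on the question above, the triplet loss it refers to can be sketched minimally in NumPy; the margin value and toy embeddings here are illustrative assumptions, not taken from the question or the paper:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: pull same-identity pairs together, push
    different-identity pairs at least `margin` further apart."""
    d_pos = np.sum((anchor - positive) ** 2)   # squared distance, same identity
    d_neg = np.sum((anchor - negative) ** 2)   # squared distance, different identity
    return max(d_pos - d_neg + margin, 0.0)    # zero once the margin is satisfied

# Toy 128-D embeddings: positive is near the anchor, negative is unrelated.
rng = np.random.default_rng(0)
a = rng.normal(size=128)
p = a + 0.01 * rng.normal(size=128)
n = rng.normal(size=128)
loss = triplet_loss(a, p, n)
```

Extending this to general object recognition, as the question asks, is mostly a matter of what "identity" means when mining triplets.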
11 votes
2 answers
1k views

Is there a difference in the architecture of deep reinforcement learning when multiple actions are performed instead of a single action?

I've built a deep deterministic policy gradient reinforcement learning agent to be able to handle any games/tasks that have only one action. However, the agent seems to fail horribly when there are ...
9 votes
0 answers
5k views

What theoretical reasons explain the "catastrophic drop" in Deep Q-Learning?

I am implementing some "classical" papers in model-free RL, like DQN, Double DQN, and Double DQN with Prioritized Replay. Across the various models I'm running on ...
9 votes
3 answers
381 views

How to classify human actions?

I'm quite new to machine learning (I followed Andrew Ng's Coursera course and am now starting the deeplearning.ai courses). I want to classify human actions in real time, like: left arm bent, arm above ...
9 votes
1 answer
5k views

Does it make sense to use batch normalization in deep (stacked) or sparse auto-encoders?

Does it make sense to use batch normalization in deep (stacked) or sparse auto-encoders? I cannot find any resources for that. Is it safe to assume that, since it works for other DNNs, it will also ...
9 votes
2 answers
1k views

How should we interpret this figure that relates the perceptron criterion and the hinge loss?

I am currently studying the textbook Neural Networks and Deep Learning by Charu C. Aggarwal. Chapter 1.2.1.2 Relationship with Support Vector Machines says the following: The perceptron criterion is ...
8 votes
1 answer
987 views

Why is it recommended to use a "separate test environment" when evaluating a model?

I am training an agent (stable baselines3 algorithm) on a custom environment. During training, I want to have a callback so that for every $N$ steps of the learning process, I get the current model ...
8 votes
1 answer
295 views

How to graphically represent a RNN architecture implemented in Keras?

I'm trying to create a simple blog post on RNNs that should give a better insight into how they work in Keras. Let's say: ...
8 votes
0 answers
353 views

Is the Bellman equation that uses sampling weighted by the Q values (instead of max) a contraction?

It is proved that the Bellman update is a contraction (1). Here is the Bellman update that is used for Q-Learning: $$Q_{t+1}(s, a) = Q_{t}(s, a) + \alpha*(r(s, a, s') + \gamma \max_{a^*} (Q_{t}(s', ...
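The excerpt's equation is cut off by the listing, but the standard tabular Q-learning update it begins to quote can be sketched as follows; the state/action encoding and hyperparameters here are illustrative assumptions:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])   # bootstrapped one-step target
    Q[s, a] += alpha * (td_target - Q[s, a])    # move estimate toward target
    return Q

Q = np.zeros((3, 2))                            # 3 states, 2 actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)      # Q[0,1] moves toward the target
```

The question asks whether replacing the `max` over next-state actions with a sampling distribution weighted by the Q-values preserves the contraction property of this operator.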
8 votes
1 answer
711 views

How does SGD escape local minima?

"SGD is able to jump out of local minima that would otherwise trap BGD." I don't really understand the above statement. Could someone please provide a mathematical explanation for why SGD (Stochastic ...
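One way to see the claim in the question concretely: at the same point, a minibatch gradient is a noisy estimate of the full-batch gradient, and that per-step noise is what can kick the iterate out of a shallow basin. A toy illustration on synthetic linear-regression data (all values here are made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true

w = np.zeros(5)
# Full-batch gradient of the mean squared error (BGD direction).
full_grad = -2 * X.T @ (y - X @ w) / len(X)
# Gradient on a single random minibatch (one SGD direction).
idx = rng.choice(len(X), size=32, replace=False)
mini_grad = -2 * X[idx].T @ (y[idx] - X[idx] @ w) / len(idx)

# Same expectation, but the minibatch direction fluctuates step to step.
noise = np.linalg.norm(full_grad - mini_grad)
```

The mathematical version of the statement usually models this fluctuation as gradient noise whose variance lets the iterate escape basins whose barriers are small relative to the step size.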
8 votes
0 answers
181 views

Normalizing Normal Distributions in Thompson Sampling for online Reinforcement Learning

In my implementation of Thompson Sampling (TS) for online Reinforcement Learning, my distribution for selecting $a$ is $\mathcal{N}(Q(s, a), \frac{1}{C(s,a)+1})$, where $C(s,a)$ is the number of times ...
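A minimal sketch of action selection under the distribution the question describes, $\mathcal{N}(Q(s,a), \frac{1}{C(s,a)+1})$; the Q-table and count-table values below are hypothetical stand-ins:

```python
import numpy as np

def thompson_select(Q_row, C_row, rng):
    """Sample one value per action from N(Q(s,a), 1/(C(s,a)+1)) and act
    greedily on the samples; rarely tried actions get wider distributions
    and therefore more exploration."""
    std = np.sqrt(1.0 / (C_row + 1.0))          # variance 1/(C+1), per the question
    samples = rng.normal(loc=Q_row, scale=std)  # one posterior sample per action
    return int(np.argmax(samples))

rng = np.random.default_rng(0)
Q_row = np.array([0.5, 0.4, 0.1])    # value estimates for one state
C_row = np.array([100, 2, 0])        # visit counts: last action barely tried
action = thompson_select(Q_row, C_row, rng)
```

As counts grow, the per-action standard deviation shrinks toward zero and the rule converges to greedy selection.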
8 votes
1 answer
259 views

What is the impact of using multiple BMUs for self-organizing maps?

Here's a sort of conceptual question. I was implementing a SOM algorithm to better understand its variations and parameters. I got curious about one bit: the BMU (best matching unit, i.e. the neuron ...
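For context, finding the single BMU the excerpt mentions can be sketched as a nearest-neighbor search over the neurons' weight vectors; the weights and input below are illustrative assumptions:

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return the index of the SOM neuron whose weight vector is
    closest (in Euclidean distance) to the input x."""
    dists = np.linalg.norm(weights - x, axis=1)
    return int(np.argmin(dists))

weights = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])  # 3 neurons, 2-D inputs
bmu = best_matching_unit(weights, np.array([0.9, 1.1]))   # neuron 1 is closest
```

The question's variant would keep the k smallest distances instead of only the minimum and spread the update across all of them.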
8 votes
2 answers
1k views

What is the current state-of-the-art in Reinforcement Learning regarding data efficiency?

In other words, which existing reinforcement learning method learns with the fewest episodes? R-Max comes to mind, but it's very old, and I'd like to know whether there is something better now.
7 votes
1 answer
612 views

Why does the policy gradient theorem have two different forms?

I have been studying policy gradients recently but found different expositions from different sources, which greatly confused me. From the book "Reinforcement Learning: an Introduction (Sutton &...
7 votes
1 answer
198 views

Is there a proof that shows a dominating policy always exists in an MDP?

I think that it is common knowledge that for any infinite horizon discounted MDP $(S, A, P, r, \gamma)$, there always exists a dominating policy $\pi$, i.e. a policy $\pi$ such that for all policies $\...
