All Questions
Tagged with deep-learning or neural-networks
9,985 questions
0
votes
0
answers
40
views
Independence and Correlation Structure of Weights Generated by a Hypernetwork
Suppose a hypernetwork $\mathcal{H}$ takes a latent variable $z \sim p_z(z)$,
where $p_z$ is Gaussian, and outputs the parameters of another neural network $f$.
In particular, each weight $w_i$ of $f$ ...
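As a side note on the setup above, here is a minimal numpy sketch of a *linear* hypernetwork (all shapes, names, and the linearity assumption below are illustrative choices, not from the question): sampling many $z \sim \mathcal{N}(0, I)$ and inspecting the empirical covariance of the generated weights shows how such an $\mathcal{H}$ induces correlations between the $w_i$.

    import numpy as np

    rng = np.random.default_rng(0)

    latent_dim, n_weights = 4, 10                      # illustrative sizes
    W_h = rng.normal(size=(n_weights, latent_dim))     # fixed hypernetwork weights
    b_h = rng.normal(size=n_weights)

    def hypernetwork(z):
        """Toy linear hypernetwork: maps a latent z to the weight vector of f."""
        return W_h @ z + b_h

    # Sample many latents and look at the covariance of the generated weights.
    Z = rng.normal(size=(100_000, latent_dim))
    W = np.array([hypernetwork(z) for z in Z])

    empirical_cov = np.cov(W, rowvar=False)
    analytic_cov = W_h @ W_h.T        # for a linear H and z ~ N(0, I)
    print(np.allclose(empirical_cov, analytic_cov, atol=0.05))

For a linear map of Gaussian noise the covariance is exactly $W_h W_h^\top$, so the generated weights are generally correlated unless $W_h$ has orthogonal rows.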
0
votes
0
answers
19
views
Why does batch normalization make lower layers 'useless' in purely linear networks?
I'm reading the Deep Learning book by Goodfellow, Bengio, and Courville (Chapter 8 section 8.7.1 on Batch Normalization, page 315). The authors use a simple example of a deep linear network without ...
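A minimal numpy check of the kind of effect the book describes, assuming the purely linear, scalar-weight network $\hat{y} = x\,w_1 w_2 \cdots w_l$ (the particular weight values and the factor-of-10 rescaling below are my own illustrative choices): after batch normalization of the final pre-activation, rescaling a lower-layer weight by a positive constant leaves the normalized output unchanged.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=256)                    # a batch of scalar inputs

    def batchnorm(h):
        # Plain normalization (no epsilon) so the cancellation is exact.
        return (h - h.mean()) / h.std()

    def linear_net(x, w1, w2, w3):
        """Purely linear 'deep' network with scalar weights."""
        return x * w1 * w2 * w3

    h_small = linear_net(x, w1=0.5, w2=1.3, w3=-0.7)
    h_large = linear_net(x, w1=5.0, w2=1.3, w3=-0.7)   # lower-layer weight scaled by 10

    # Without normalization the outputs differ by a factor of 10;
    # after batch normalization they are numerically identical.
    print(np.allclose(batchnorm(h_small), batchnorm(h_large)))   # True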
1
vote
0
answers
24
views
Is there any work on pretraining RNN models? [closed]
I really want to play around with RNNs. I'm trying to build an AI assistant with RNNs to run on my machine, as I've always been obsessed with RNN models...
To get good performance, I think I need to do some ...

0
votes
0
answers
9
views
Are there any other powerful optimization tools available besides the ABC and PSO algorithms? [duplicate]
What other optimization tools are powerful enough to improve the accuracy of a neural network model? Please suggest recent, powerful tools.
0
votes
0
answers
17
views
How to compare WT vs mutant predictions with MC Dropout ensemble (M=5, T=100) in a binary classifier?
I’m using an ensemble of M = 5 deep neural networks, each evaluated with T = 100 Monte Carlo dropout samples at test time to estimate predictive uncertainty.
The model performs binary classification (...
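A minimal sketch of how the M × T predictions could be aggregated, assuming the per-pass class-1 probabilities have already been collected into an array `probs` of shape (M, T, N) for N inputs (the array contents and variable names below are illustrative placeholders, not from the question):

    import numpy as np

    M, T, N = 5, 100, 8                      # ensemble size, MC dropout samples, inputs
    rng = np.random.default_rng(0)
    probs = rng.uniform(size=(M, T, N))      # placeholder for the sigmoid outputs

    # Treat all M*T stochastic forward passes as samples from the predictive distribution.
    pred_mean = probs.mean(axis=(0, 1))      # predictive probability per input
    pred_std = probs.std(axis=(0, 1))        # spread across ensemble members and dropout masks

    # Predictive entropy as a total-uncertainty summary (binary case, natural log).
    eps = 1e-12
    entropy = -(pred_mean * np.log(pred_mean + eps)
                + (1 - pred_mean) * np.log(1 - pred_mean + eps))

    print(pred_mean.round(3), pred_std.round(3), entropy.round(3))

Comparing two conditions (e.g. wild-type vs mutant inputs) then amounts to comparing their `pred_mean` values while reporting `pred_std` or `entropy` alongside, so that differences smaller than the model's own uncertainty are not over-interpreted.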
0
votes
0
answers
25
views
Is it a bad idea to use Transformer models on long-tailed datasets?
I’m working on a video classification task with a long-tailed dataset where a few classes have many samples while most classes have very few.
More specifically, my dataset has around 9k samples and 3....
1
vote
0
answers
26
views
Does compositional structure (actually) mitigate the curse of dimensionality?
The paper "Deep Quantile Regression: Mitigating the Curse of Dimensionality Through Composition" makes the following claim (top of page 4):
It is clear that smoothness is not the right ...
2
votes
0
answers
21
views
What causes the degradation problem - the higher training error in much deeper networks?
In the paper "Deep Residual Learning for Image Recognition", it's been mentioned that
"When deeper networks are able to start converging, a degradation problem has been exposed: with ...
0
votes
0
answers
24
views
What are the expected ideal values for the discriminator losses when using a generative adversarial imputation network to impute missing values?
I am new to GAIN (generative adversarial imputation network). I am trying to use GAIN to impute missing values. I have a question about the values of the losses for the discriminator. Are the values ...
0
votes
0
answers
40
views
Multiplying probabilities of weights in Bayesian neural networks to formulate a prior
A key element in Bayesian neural networks is finding the probability of a set of weights, so that it can be applied to Bayes' rule.
I cannot think of many ways of doing this for P(w) (also sometimes ...
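One common construction, sketched below under the assumption of a fully factorized Gaussian prior (the variance value is an illustrative choice): treat the weights as independent, so the prior probability of the whole set is the product of per-weight densities, computed in log space as a sum to avoid underflow.

    import numpy as np

    def log_prior(weights, sigma=1.0):
        """log p(w) for a factorized Gaussian prior: sum of per-weight log densities."""
        w = np.asarray(weights, dtype=float)
        return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * (w / sigma) ** 2)

    w = np.array([0.1, -0.3, 0.8, 0.0])
    print(log_prior(w))      # log of the product of the four Gaussian densities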
1
vote
1
answer
58
views
Bayes-by-backprop - meaning of partial derivative
The Google DeepMind paper "Weight Uncertainty in Neural Networks" features the following algorithm:
Note that the $\frac{\partial f(w,\theta)}{\partial w}$ term of the gradients for the mean and standard ...
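For what the $\frac{\partial f(w,\theta)}{\partial w}$ term contributes, here is a small numerical sketch under a toy choice of $f$ (the function below is my own illustrative stand-in, not the paper's objective): with the reparameterization $w = \mu + \sigma\varepsilon$, $\sigma = \log(1 + e^{\rho})$, the paper's gradient for the mean, $\partial f/\partial w + \partial f/\partial\mu$, matches differentiating straight through the sampling step.

    import numpy as np

    def f(w, mu, rho):
        """Toy objective depending on both w and the variational parameters."""
        sigma = np.log1p(np.exp(rho))
        log_q = -np.log(sigma) - 0.5 * ((w - mu) / sigma) ** 2   # log q(w|mu,sigma) + const
        log_p = -0.5 * w ** 2                                    # standard-normal log prior + const
        return log_q - log_p

    mu, rho, eps = 0.3, -1.0, 0.7
    sigma = np.log1p(np.exp(rho))
    w = mu + sigma * eps                # reparameterized weight sample

    h = 1e-6
    df_dw  = (f(w + h, mu, rho) - f(w - h, mu, rho)) / (2 * h)   # partial wrt w only
    df_dmu = (f(w, mu + h, rho) - f(w, mu - h, rho)) / (2 * h)   # partial wrt mu only

    grad_mu = df_dw + df_dmu            # the paper's combined gradient for the mean
    # Differentiating through the sampling step directly gives the same number.
    g = lambda m: f(m + sigma * eps, m, rho)
    print(grad_mu, (g(mu + h) - g(mu - h)) / (2 * h))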
1
vote
1
answer
113
views
Function omitted during formula derivation (KL-divergence)
From the above, I am trying to derive the below:
However, I do not see why the $q_\theta(w)$ has been omitted from $\log p(D)$ in equations 17 and 18.
Here is my attempt to derive the above:
$$\begin{...
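If the step in question is the usual ELBO manipulation, the reason $q_\theta(w)$ drops out of the $\log p(D)$ term is presumably just that $\log p(D)$ does not depend on $w$, so its expectation under $q_\theta$ is itself:
$$\mathbb{E}_{q_\theta(w)}\bigl[\log p(D)\bigr] = \int q_\theta(w)\,\log p(D)\,dw = \log p(D)\int q_\theta(w)\,dw = \log p(D).$$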
3
votes
0
answers
60
views
Normalizing observations in a nonlinear state space model
I am modelling the sequence $\{(a_t,y_t)\}_t$ as follows:
$$
\begin{cases}
Y_{t+1} &= g_\nu(X_{t+1}) + \alpha V_{t+1}\\
X_{t+1} &= X_t + \mu_\xi(a_t) + \sigma_\psi(a_t)Z_{t+1}\\
X_0 &= ...
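A minimal forward simulation of the stated recursion, with placeholder choices for $g_\nu$, $\mu_\xi$, $\sigma_\psi$, $\alpha$, and the initial state (all of them illustrative assumptions, since the excerpt is cut off):

    import numpy as np

    rng = np.random.default_rng(0)
    T_steps, alpha = 200, 0.1

    # Placeholder functions standing in for the parametric components.
    g_nu      = np.tanh                          # observation function g_nu
    mu_xi     = lambda a: 0.05 * a               # drift term mu_xi(a_t)
    sigma_psi = lambda a: 0.2 + 0.1 * np.abs(a)  # state noise scale sigma_psi(a_t)

    a = rng.normal(size=T_steps)                 # exogenous inputs a_t
    X = np.zeros(T_steps + 1)                    # assumes X_0 = 0 (the excerpt is truncated)
    Y = np.zeros(T_steps + 1)

    for t in range(T_steps):
        Z = rng.normal()
        V = rng.normal()
        X[t + 1] = X[t] + mu_xi(a[t]) + sigma_psi(a[t]) * Z
        Y[t + 1] = g_nu(X[t + 1]) + alpha * V

    print(X[:5].round(3), Y[:5].round(3))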
0
votes
0
answers
65
views
Why is one-hot encoding used in RL instead of binary encoding?
Basically, the question above: in RL, people typically encode the state as a tensor consisting of a plane with "channels", e.g. the original AlphaZero paper. These channels are typically one-...
0
votes
0
answers
38
views
Why do flow neural networks that are trained to only simulate vector fields for specific timesteps perform poorly compared to regular models?
I am currently learning about flow matching models and wanted to test whether we could train a flow matching model on just two time steps, 0 and 0.5, and sample at only those two time steps to ...