All Questions
Tagged with deep-learning or neural-networks
9,985 questions
0
votes
0
answers
40
views
Independence and Correlation Structure of Weights Generated by a Hypernetwork
Suppose a hypernetwork $\mathcal{H}$ takes a latent variable $z \sim p_z(z)$,
where $p_z$ is Gaussian, and outputs the parameters of another neural network $f$.
In particular, each weight $w_i$ of $f$ ...
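As a side note on the setup above, here is a minimal numpy sketch of a *linear* hypernetwork (all shapes, names, and the linearity assumption below are illustrative choices, not from the question): sampling many $z \sim \mathcal{N}(0, I)$ and inspecting the empirical covariance of the generated weights shows how such an $\mathcal{H}$ induces correlations between the $w_i$.

    import numpy as np

    rng = np.random.default_rng(0)

    latent_dim, n_weights = 4, 10                      # illustrative sizes
    W_h = rng.normal(size=(n_weights, latent_dim))     # fixed hypernetwork weights
    b_h = rng.normal(size=n_weights)

    def hypernetwork(z):
        """Toy linear hypernetwork: maps a latent z to the weight vector of f."""
        return W_h @ z + b_h

    # Sample many latents and look at the covariance of the generated weights.
    Z = rng.normal(size=(100_000, latent_dim))
    W = np.array([hypernetwork(z) for z in Z])

    empirical_cov = np.cov(W, rowvar=False)
    analytic_cov = W_h @ W_h.T        # for a linear H and z ~ N(0, I)
    print(np.allclose(empirical_cov, analytic_cov, atol=0.05))

For a linear map of Gaussian noise the covariance is exactly $W_h W_h^\top$, so the generated weights are generally correlated unless $W_h$ has orthogonal rows.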
0
votes
0
answers
19
views
Why does batch normalization make lower layers 'useless' in purely linear networks?
I'm reading the Deep Learning book by Goodfellow, Bengio, and Courville (Chapter 8 section 8.7.1 on Batch Normalization, page 315). The authors use a simple example of a deep linear network without ...
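A minimal numpy check of the kind of effect the book describes, assuming the purely linear, scalar-weight network $\hat{y} = x\,w_1 w_2 \cdots w_l$ (the particular weight values and the factor-of-10 rescaling below are my own illustrative choices): after batch normalization of the final pre-activation, rescaling a lower-layer weight by a positive constant leaves the normalized output unchanged.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=256)                    # a batch of scalar inputs

    def batchnorm(h):
        # Plain normalization (no epsilon) so the cancellation is exact.
        return (h - h.mean()) / h.std()

    def linear_net(x, w1, w2, w3):
        """Purely linear 'deep' network with scalar weights."""
        return x * w1 * w2 * w3

    h_small = linear_net(x, w1=0.5, w2=1.3, w3=-0.7)
    h_large = linear_net(x, w1=5.0, w2=1.3, w3=-0.7)   # lower-layer weight scaled by 10

    # Without normalization the outputs differ by a factor of 10;
    # after batch normalization they are numerically identical.
    print(np.allclose(batchnorm(h_small), batchnorm(h_large)))   # True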
1
vote
0
answers
24
views
Is there any work on pretraining RNN models? [closed]
I really want to play around with RNNs. I'm trying to build an AI assistant with RNNs to run on my machine, as I've always been obsessed with RNN models...
To get good performance, I think I need to do some ...

0
votes
0
answers
9
views
Are there any other powerful optimization tools available besides the ABC and PSO algorithms? [duplicate]
What other optimization tools are powerful enough to improve the accuracy of a neural network model? Please suggest recent, powerful tools.
0
votes
0
answers
17
views
How to compare WT vs mutant predictions with MC Dropout ensemble (M=5, T=100) in a binary classifier?
I’m using an ensemble of M = 5 deep neural networks, each evaluated with T = 100 Monte Carlo dropout samples at test time to estimate predictive uncertainty.
The model performs binary classification (...
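A minimal sketch of how the M × T predictions could be aggregated, assuming the per-pass class-1 probabilities have already been collected into an array `probs` of shape (M, T, N) for N inputs (the array contents and variable names below are illustrative placeholders, not from the question):

    import numpy as np

    M, T, N = 5, 100, 8                      # ensemble size, MC dropout samples, inputs
    rng = np.random.default_rng(0)
    probs = rng.uniform(size=(M, T, N))      # placeholder for the sigmoid outputs

    # Treat all M*T stochastic forward passes as samples from the predictive distribution.
    pred_mean = probs.mean(axis=(0, 1))      # predictive probability per input
    pred_std = probs.std(axis=(0, 1))        # spread across ensemble members and dropout masks

    # Predictive entropy as a total-uncertainty summary (binary case, natural log).
    eps = 1e-12
    entropy = -(pred_mean * np.log(pred_mean + eps)
                + (1 - pred_mean) * np.log(1 - pred_mean + eps))

    print(pred_mean.round(3), pred_std.round(3), entropy.round(3))

Comparing two conditions (e.g. wild-type vs mutant inputs) then amounts to comparing their `pred_mean` values while reporting `pred_std` or `entropy` alongside, so that differences smaller than the model's own uncertainty are not over-interpreted.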
0
votes
0
answers
25
views
Is it a bad idea to use Transformer models on long-tailed datasets?
I’m working on a video classification task with a long-tailed dataset where a few classes have many samples while most classes have very few.
More specifically, my dataset has around 9k samples and 3....
1
vote
0
answers
26
views
Does compositional structure (actually) mitigate the curse of dimensionality?
The paper "Deep Quantile Regression: Mitigating the Curse of Dimensionality Through Composition" makes the following claim (top of page 4):
It is clear that smoothness is not the right ...
2
votes
0
answers
21
views
What causes the degradation problem - the higher training error in much deeper networks?
In the paper "Deep Residual Learning for Image Recognition", it's been mentioned that
"When deeper networks are able to start converging, a degradation problem has been exposed: with ...
0
votes
0
answers
24
views
What are the expected ideal values for the discriminator losses when using a generative adversarial imputation network to impute missing values?
I am new to GAIN (generative adversarial imputation network). I am trying to use GAIN to impute missing values. I have a question about the values of the losses for the discriminator. Are the values ...
0
votes
0
answers
40
views
Multiplying probabilities of weights in Bayesian neural networks to formulate a prior
A key element in Bayesian neural networks is finding the probability of a set of weights, so that it can be applied to Bayes' rule.
I cannot think of many ways of doing this for P(w) (also sometimes ...
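One common construction, sketched below under the assumption of a fully factorized Gaussian prior (the variance value is an illustrative choice): treat the weights as independent, so the prior probability of the whole set is the product of per-weight densities, computed in log space as a sum to avoid underflow.

    import numpy as np

    def log_prior(weights, sigma=1.0):
        """log p(w) for a factorized Gaussian prior: sum of per-weight log densities."""
        w = np.asarray(weights, dtype=float)
        return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * (w / sigma) ** 2)

    w = np.array([0.1, -0.3, 0.8, 0.0])
    print(log_prior(w))      # log of the product of the four Gaussian densities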
1
vote
1
answer
58
views
Bayes-by-backprop - meaning of partial derivative
The Google DeepMind paper "Weight Uncertainty in Neural Networks" features the following algorithm:
Note that the $\frac{\partial f(w,\theta)}{\partial w}$ term of the gradients for the mean and standard ...
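For what the $\frac{\partial f(w,\theta)}{\partial w}$ term contributes, here is a small numerical sketch under a toy choice of $f$ (the function below is my own illustrative stand-in, not the paper's objective): with the reparameterization $w = \mu + \sigma\varepsilon$, $\sigma = \log(1 + e^{\rho})$, the paper's gradient for the mean, $\partial f/\partial w + \partial f/\partial\mu$, matches differentiating straight through the sampling step.

    import numpy as np

    def f(w, mu, rho):
        """Toy objective depending on both w and the variational parameters."""
        sigma = np.log1p(np.exp(rho))
        log_q = -np.log(sigma) - 0.5 * ((w - mu) / sigma) ** 2   # log q(w|mu,sigma) + const
        log_p = -0.5 * w ** 2                                    # standard-normal log prior + const
        return log_q - log_p

    mu, rho, eps = 0.3, -1.0, 0.7
    sigma = np.log1p(np.exp(rho))
    w = mu + sigma * eps                # reparameterized weight sample

    h = 1e-6
    df_dw  = (f(w + h, mu, rho) - f(w - h, mu, rho)) / (2 * h)   # partial wrt w only
    df_dmu = (f(w, mu + h, rho) - f(w, mu - h, rho)) / (2 * h)   # partial wrt mu only

    grad_mu = df_dw + df_dmu            # the paper's combined gradient for the mean
    # Differentiating through the sampling step directly gives the same number.
    g = lambda m: f(m + sigma * eps, m, rho)
    print(grad_mu, (g(mu + h) - g(mu - h)) / (2 * h))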
1
vote
1
answer
113
views
Function omitted during formula derivation (KL-divergence)
From the above, I am trying to derive the below:
However, I do not see why the $q_\theta(w)$ has been omitted from $\log p(D)$ in equations 17 and 18.
Here is my attempt to derive the above:
$$\begin{...
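If the step in question is the usual ELBO manipulation, the reason $q_\theta(w)$ drops out of the $\log p(D)$ term is presumably just that $\log p(D)$ does not depend on $w$, so its expectation under $q_\theta$ is itself:
$$\mathbb{E}_{q_\theta(w)}\bigl[\log p(D)\bigr] = \int q_\theta(w)\,\log p(D)\,dw = \log p(D)\int q_\theta(w)\,dw = \log p(D).$$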
3
votes
0
answers
60
views
Normalizing observations in a nonlinear state space model
I am modelling the sequence $\{(a_t,y_t)\}_t$ as follows:
$$
\begin{cases}
Y_{t+1} &= g_\nu(X_{t+1}) + \alpha V_{t+1}\\
X_{t+1} &= X_t + \mu_\xi(a_t) + \sigma_\psi(a_t)Z_{t+1}\\
X_0 &= ...
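A minimal forward simulation of the stated recursion, with placeholder choices for $g_\nu$, $\mu_\xi$, $\sigma_\psi$, $\alpha$, and the initial state (all of them illustrative assumptions, since the excerpt is cut off):

    import numpy as np

    rng = np.random.default_rng(0)
    T_steps, alpha = 200, 0.1

    # Placeholder functions standing in for the parametric components.
    g_nu      = np.tanh                          # observation function g_nu
    mu_xi     = lambda a: 0.05 * a               # drift term mu_xi(a_t)
    sigma_psi = lambda a: 0.2 + 0.1 * np.abs(a)  # state noise scale sigma_psi(a_t)

    a = rng.normal(size=T_steps)                 # exogenous inputs a_t
    X = np.zeros(T_steps + 1)                    # assumes X_0 = 0 (the excerpt is truncated)
    Y = np.zeros(T_steps + 1)

    for t in range(T_steps):
        Z = rng.normal()
        V = rng.normal()
        X[t + 1] = X[t] + mu_xi(a[t]) + sigma_psi(a[t]) * Z
        Y[t + 1] = g_nu(X[t + 1]) + alpha * V

    print(X[:5].round(3), Y[:5].round(3))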
0
votes
0
answers
65
views
Why is one-hot encoding used in RL instead of binary encoding?
Basically, the question above: in RL, people typically encode the state as a tensor consisting of a plane with "channels", e.g. the original AlphaZero paper. These channels are typically one-...
0
votes
0
answers
38
views
Why do flow neural networks that are trained to only simulate vector fields for specific timesteps perform poorly compared to regular models?
I am currently learning about flow matching models and wanted to test whether we could train a flow matching model on just two time steps, 0 and 0.5, and sample at only those two time steps to ...