Questions tagged [regularization]
For questions about the application of regularization techniques.
57 questions
0
votes
1
answer
102
views
What are the consequences when we multiply, instead of add, a penalty term?
The typical objective function in regression problems like Lasso or Ridge includes a Residual Sum of Squares (RSS) term added to a penalty term based on a norm of the coefficients.
What are the ...
2
votes
0
answers
48
views
Does this learning scenario have a name? If so, can someone point me to relevant literature?
I am faced with a problem which I suspect has already been solved, but which I have never seen before. Perhaps by discussing it abstractly, someone can point me to the relevant literature.
It goes like this: I have ...
0
votes
1
answer
115
views
How to handle BatchNorm in the last layers of Neural Networks?
I am creating a neural network using batchnorm as a regularization method to enable deep models and prevent overfitting.
I understand that batch norm suppresses internal covariate shift ...
4
votes
1
answer
2k
views
What is the best way to combine or weight multiple losses with gradient descent?
I am optimizing a neural network with Adam using 3 different losses. Their scale is very different, and the current method is to either sum the losses and clip the gradient or to manually weight them ...
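One common approach this question alludes to can be sketched in plain Python. The loss names and magnitudes below are illustrative assumptions, not from the question: each loss is weighted by the inverse of its initial magnitude so every term starts out contributing roughly equally.

```python
# Sketch (illustrative numbers): balance losses of very different scales
# by weighting each with the inverse of its initial magnitude.
initial = {"mse": 250.0, "kl": 0.8, "contrastive": 2.5}  # hypothetical first-step losses
weights = {k: 1.0 / v for k, v in initial.items()}

def combined(losses):
    """Weighted sum passed to the optimizer (e.g. Adam)."""
    return sum(weights[k] * losses[k] for k in losses)

# At the first step each term contributes exactly 1.0, so the
# gradient of no single loss dominates purely because of its scale.
print(combined(initial))
```

Whether fixed inverse-scale weights suffice, or the weights need tuning (or learning) as training progresses, is exactly the judgment call the question is about.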
1
vote
1
answer
157
views
Do different models using early stopping have the same validation set to check model training performance?
Hi, I have a doubt about validation using early stopping given two NN models.
Suppose I have two models M1 and M2, a Training set TS, and a Test set.
Take the TS and consider TS_80% and TS_20%...
0
votes
1
answer
133
views
Should weight decay regularization be divided by the number of samples?
I was watching a video by Andrew Ng about regularization in logistic regression and neural network models.
He uses the following term for regularization to (the sum is over the weights in the network)....
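The two conventions at issue can be compared in a tiny sketch; the weights, $\lambda$, and sample count below are made-up illustrations, not values from the video.

```python
# Sketch: L2 weight decay with and without division by the number of
# samples m. Weights, lam, and m are illustrative assumptions.
lam = 0.1
m = 50                        # number of training samples
weights = [0.5, -1.2, 0.3]

sum_sq = sum(w * w for w in weights)
penalty_unscaled = (lam / 2) * sum_sq       # independent of dataset size
penalty_scaled = (lam / (2 * m)) * sum_sq   # Ng's convention: divided by m

# Dividing by m keeps lambda's effective strength comparable across
# dataset sizes, since the data-fit term is also an average over m.
print(penalty_unscaled, penalty_scaled)
```

Either convention works; they differ only in how $\lambda$ must be rescaled when the training-set size changes.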
1
vote
1
answer
113
views
In the Dropout paper, why would increasing the dropout increase the error rate if the capacity is constant?
In the original paper on dropout, in section 7.3.2, we see that while keeping $pn$ constant, the test error increases as the retention probability drops below 0.6. Why would that happen? If $pn$ is ...
3
votes
1
answer
4k
views
How does dropout work during backpropagation?
I've searched for an answer to this, and read several scientific articles on the subject, but I can't find a practical explanation of how Dropout actually drops nodes in an algorithm.
I've read that ...
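A minimal sketch of the practical mechanism, assuming the common inverted-dropout formulation: the same random mask drawn in the forward pass also gates the gradient in the backward pass, so dropped units receive no gradient at all. Plain Python, no framework; the activations are arbitrary.

```python
import random

random.seed(0)
p_keep = 0.8  # probability of retaining a unit

def dropout_forward(activations):
    """Inverted dropout: zero out units, rescale survivors by 1/p_keep."""
    mask = [1.0 if random.random() < p_keep else 0.0 for _ in activations]
    out = [a * m / p_keep for a, m in zip(activations, mask)]
    return out, mask

def dropout_backward(grad_out, mask):
    """Dropped units pass no gradient; survivors are rescaled identically."""
    return [g * m / p_keep for g, m in zip(grad_out, mask)]

acts = [0.5, -1.0, 2.0, 0.1]
out, mask = dropout_forward(acts)
grads = dropout_backward([1.0, 1.0, 1.0, 1.0], mask)
# Units with mask 0 have zero output and receive zero gradient,
# so their incoming weights are simply not updated this step.
```

"Dropping a node" is thus nothing more than multiplying by a stored 0/1 mask in both passes; no nodes are removed from the graph.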
1
vote
0
answers
378
views
Higher validation loss after using Dropout
I’m working on a classification problem (500 classes). My NN has 3 fully connected layers, followed by an LSTM layer. I use nn.CrossEntropyLoss() as my loss ...
0
votes
1
answer
371
views
Dummy variable trap in neural networks and class visualization
Let's say I have data records looking like that: (x1, x2, x3, x4, ..., x100), where each x can be either ...
1
vote
1
answer
100
views
Is it mandatory to multiply every activation of a layer by the dropout factor during testing?
Dropout is a regularization technique used in neural networks. It helps prevent overfitting by making a single network behave like an ensemble of networks.
In dropout, we switch off $p$ percent of ...
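The short answer can be shown numerically. The scaling exists to match expected activations between training and testing; with "standard" dropout the scaling happens at test time, while with "inverted" dropout it happens at training time, so no test-time multiplication is needed. The values below are illustrative.

```python
# Sketch: matching expected activations under dropout (illustrative values).
p = 0.5   # retention probability
a = 2.0   # an activation

# Standard dropout: during training a unit outputs a with probability p,
# so its expected output is p * a. Test time multiplies by p to match.
expected_train_standard = p * a
test_standard = p * a

# Inverted dropout: training already divides survivors by p, so the
# expected training output is a, and test time uses activations as-is.
expected_train_inverted = p * (a / p)
test_inverted = a
```

So the multiplication is only "mandatory" if training used standard dropout; most modern frameworks use the inverted form precisely to avoid it.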
0
votes
1
answer
524
views
Can we Consider Regularization as a "Constraint"?
I have the following question on "Regularization vs. Constrained Optimization" :
In the context of statistical modelling, we are often taught about "Regularization" as a method of ...
4
votes
2
answers
480
views
How does Regularization Reduce Overfitting?
As I understand, this is the general summary of the Regularization-Overfitting Problem:
The classical "Bias-Variance Tradeoff" suggests that complicated models (i.e. models with more ...
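The shrinkage mechanism behind that tradeoff can be seen directly in ridge regression's closed form. A sketch in the simplest one-feature case, with synthetic illustrative data: the solution $w = \frac{\sum x_i y_i}{\sum x_i^2 + \lambda}$ is pulled toward zero as $\lambda$ grows, trading variance for bias.

```python
# Sketch: ridge shrinkage in the 1-D, no-intercept case.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x (illustrative data)

def ridge_1d(lam):
    """Closed-form ridge coefficient: (x.y) / (x.x + lam)."""
    xy = sum(x * y for x, y in zip(xs, ys))
    xx = sum(x * x for x in xs)
    return xy / (xx + lam)

w_ols = ridge_1d(0.0)    # ordinary least squares fit
w_reg = ridge_1d(10.0)   # heavier penalty -> smaller coefficient
# |w_reg| < |w_ols|: the penalty shrinks the fit toward the simpler
# (flatter) model, which is the variance-reduction the answers describe.
```

The same shrinkage happens coordinate-wise in the multivariate case, where $\lambda I$ is added to $X^\top X$ before inversion.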
2
votes
0
answers
78
views
What determines when Dropout, BatchNorm & other Regularization will be effective?
I just had a very strange experience where I was training an 8 layer deep & pretty wide (max: 512 neurons) neural network for a regression task. I had assumed since it was big enough that it would ...
1
vote
1
answer
160
views
What does it mean when accuracy of regularized model is higher for training set than for validation set?
Accuracy of my regularized model is higher for training set than for validation set.
The situation improves when the regularization coefficient is reduced:
What does this really imply?
From my ...