Questions tagged [regularization]
For questions about the application of regularization techniques.
57 questions
0
votes
1
answer
102
views
What are the consequences when we multiply, instead of add, a penalty term?
The typical objective function in regression problems like Lasso or Ridge includes a Residual Sum of Squares (RSS) term added to a penalty term based on a norm of the coefficients.
What are the ...
2
votes
0
answers
48
views
Does this learning scenario have a name? If so, can someone point me to relevant literature?
I am faced with a problem which I suspect has already been solved, but which I have never seen before. Perhaps by discussing it abstractly, someone can point me to the relevant literature.
It goes like this: I have ...
0
votes
1
answer
115
views
How to handle BatchNorm in the last layers of Neural Networks?
I am creating a neural network using batchnorm as a regularization method to enable deep models and prevent overfitting.
I understand that batch norm suppresses internal covariate shift ...
4
votes
1
answer
2k
views
What is the best way to combine or weight multiple losses with gradient descent?
I am optimizing a neural network with Adam using 3 different losses. Their scale is very different, and the current method is to either sum the losses and clip the gradient or to manually weight them ...
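One common approach this question alludes to can be sketched in plain Python. The loss names and magnitudes below are illustrative assumptions, not from the question: each loss is weighted by the inverse of its initial magnitude so every term starts out contributing roughly equally.

```python
# Sketch (illustrative numbers): balance losses of very different scales
# by weighting each with the inverse of its initial magnitude.
initial = {"mse": 250.0, "kl": 0.8, "contrastive": 2.5}  # hypothetical first-step losses
weights = {k: 1.0 / v for k, v in initial.items()}

def combined(losses):
    """Weighted sum passed to the optimizer (e.g. Adam)."""
    return sum(weights[k] * losses[k] for k in losses)

# At the first step each term contributes exactly 1.0, so the
# gradient of no single loss dominates purely because of its scale.
print(combined(initial))
```

Whether fixed inverse-scale weights suffice, or the weights need tuning (or learning) as training progresses, is exactly the judgment call the question is about.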
1
vote
1
answer
157
views
Do different models using early stopping have the same validation set to check model training performance?
Hi, I have a doubt about validation using early stopping given two NN models.
Suppose I have two models M1 and M2, a Training set TS, and a Test set.
Take the TS and consider TS_80% and TS_20%...
0
votes
1
answer
133
views
Should weight decay regularization be divided by the number of samples?
I was watching a video by Andrew Ng about regularization in logistic regression and neural network models.
He uses the following term for regularization to (the sum is over the weights in the network)....
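The two conventions at issue can be compared in a tiny sketch; the weights, $\lambda$, and sample count below are made-up illustrations, not values from the video.

```python
# Sketch: L2 weight decay with and without division by the number of
# samples m. Weights, lam, and m are illustrative assumptions.
lam = 0.1
m = 50                        # number of training samples
weights = [0.5, -1.2, 0.3]

sum_sq = sum(w * w for w in weights)
penalty_unscaled = (lam / 2) * sum_sq       # independent of dataset size
penalty_scaled = (lam / (2 * m)) * sum_sq   # Ng's convention: divided by m

# Dividing by m keeps lambda's effective strength comparable across
# dataset sizes, since the data-fit term is also an average over m.
print(penalty_unscaled, penalty_scaled)
```

Either convention works; they differ only in how $\lambda$ must be rescaled when the training-set size changes.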
1
vote
1
answer
113
views
In the Dropout paper, why would increasing the dropout increase the error rate if the capacity is constant?
In the original paper on dropout, in section 7.3.2, we see that while keeping $pn$ constant, the test error increases as the retention probability drops below 0.6. Why would that happen? If $pn$ is ...
3
votes
1
answer
4k
views
How does dropout work during backpropagation?
I've searched for an answer to this, and read several scientific articles on the subject, but I can't find a practical explanation of how Dropout actually drops nodes in an algorithm.
I've read that ...
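A minimal sketch of the practical mechanism, assuming the common inverted-dropout formulation: the same random mask drawn in the forward pass also gates the gradient in the backward pass, so dropped units receive no gradient at all. Plain Python, no framework; the activations are arbitrary.

```python
import random

random.seed(0)
p_keep = 0.8  # probability of retaining a unit

def dropout_forward(activations):
    """Inverted dropout: zero out units, rescale survivors by 1/p_keep."""
    mask = [1.0 if random.random() < p_keep else 0.0 for _ in activations]
    out = [a * m / p_keep for a, m in zip(activations, mask)]
    return out, mask

def dropout_backward(grad_out, mask):
    """Dropped units pass no gradient; survivors are rescaled identically."""
    return [g * m / p_keep for g, m in zip(grad_out, mask)]

acts = [0.5, -1.0, 2.0, 0.1]
out, mask = dropout_forward(acts)
grads = dropout_backward([1.0, 1.0, 1.0, 1.0], mask)
# Units with mask 0 have zero output and receive zero gradient,
# so their incoming weights are simply not updated this step.
```

"Dropping a node" is thus nothing more than multiplying by a stored 0/1 mask in both passes; no nodes are removed from the graph.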
1
vote
0
answers
378
views
Higher validation loss after using Dropout
I’m working on a classification problem (500 classes). My NN has 3 fully connected layers, followed by an LSTM layer. I use nn.CrossEntropyLoss() as my loss ...
0
votes
1
answer
371
views
Dummy variable trap in neural networks and class visualization
Let's say I have data records looking like that: (x1, x2, x3, x4, ..., x100), where each x can be either ...
1
vote
1
answer
100
views
Is it mandatory to multiply every activation of a layer by the dropout factor during testing?
Dropout is a regularization technique used in neural networks. It helps prevent overfitting by making a single network behave like an ensemble of networks.
In dropout, we switch off $p$ percent of ...
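The short answer can be shown numerically. The scaling exists to match expected activations between training and testing; with "standard" dropout the scaling happens at test time, while with "inverted" dropout it happens at training time, so no test-time multiplication is needed. The values below are illustrative.

```python
# Sketch: matching expected activations under dropout (illustrative values).
p = 0.5   # retention probability
a = 2.0   # an activation

# Standard dropout: during training a unit outputs a with probability p,
# so its expected output is p * a. Test time multiplies by p to match.
expected_train_standard = p * a
test_standard = p * a

# Inverted dropout: training already divides survivors by p, so the
# expected training output is a, and test time uses activations as-is.
expected_train_inverted = p * (a / p)
test_inverted = a
```

So the multiplication is only "mandatory" if training used standard dropout; most modern frameworks use the inverted form precisely to avoid it.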
0
votes
1
answer
524
views
Can we Consider Regularization as a "Constraint"?
I have the following question on "Regularization vs. Constrained Optimization" :
In the context of statistical modelling, we are often taught about "Regularization" as a method of ...
4
votes
2
answers
480
views
How does Regularization Reduce Overfitting?
As I understand, this is the general summary of the Regularization-Overfitting Problem:
The classical "Bias-Variance Tradeoff" suggests that complicated models (i.e. models with more ...
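The shrinkage mechanism behind that tradeoff can be seen directly in ridge regression's closed form. A sketch in the simplest one-feature case, with synthetic illustrative data: the solution $w = \frac{\sum x_i y_i}{\sum x_i^2 + \lambda}$ is pulled toward zero as $\lambda$ grows, trading variance for bias.

```python
# Sketch: ridge shrinkage in the 1-D, no-intercept case.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x (illustrative data)

def ridge_1d(lam):
    """Closed-form ridge coefficient: (x.y) / (x.x + lam)."""
    xy = sum(x * y for x, y in zip(xs, ys))
    xx = sum(x * x for x in xs)
    return xy / (xx + lam)

w_ols = ridge_1d(0.0)    # ordinary least squares fit
w_reg = ridge_1d(10.0)   # heavier penalty -> smaller coefficient
# |w_reg| < |w_ols|: the penalty shrinks the fit toward the simpler
# (flatter) model, which is the variance-reduction the answers describe.
```

The same shrinkage happens coordinate-wise in the multivariate case, where $\lambda I$ is added to $X^\top X$ before inversion.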
2
votes
0
answers
78
views
What determines when Dropout, BatchNorm & other Regularization will be effective?
I just had a very strange experience where I was training an 8 layer deep & pretty wide (max: 512 neurons) neural network for a regression task. I had assumed since it was big enough that it would ...
1
vote
1
answer
160
views
What does it mean when accuracy of regularized model is higher for training set than for validation set?
Accuracy of my regularized model is higher for training set than for validation set.
The situation improves when the regularization coefficient is reduced:
What does this really imply?
From my ...