
In theory, a neural network with no hidden layers should be equivalent to logistic regression; however, we get wildly varying results. What makes this even more bewildering is that the test case is incredibly basic, yet the neural network fails to learn.

[Figure: sklearn logistic regression decision boundary]

[Figure: tensorflow no-hidden-layer neural network decision boundary]

We have tried to make the parameters of both models as similar as possible (same number of epochs, no L2 penalty, same loss function, no additional optimizations such as momentum, etc.). The sklearn logistic regression consistently finds the correct decision boundary, with minimal variation. The tensorflow neural network is highly variable, and it looks like the bias is 'struggling' to train.

The code to reproduce the issue is included below. An ideal solution would make the tensorflow decision boundary very similar to the logistic regression decision boundary.

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras import Model
from tensorflow.keras.optimizers import SGD

import matplotlib.pyplot as plt

%matplotlib inline

from sklearn.linear_model import LogisticRegression


X = np.array([[1, 1],
              [2, 2]])
y = np.array([0, 1])

model = LogisticRegression(penalty = 'none',
                           solver='sag',
                           max_iter = 300,
                           tol = 1e-100)
model.fit(X, y)

print(model.score(X, y))        # training accuracy on the two points
print(model.coef_.flatten())    # learned weights
print(model.intercept_)         # learned intercept (bias)

w_1 = model.coef_.flatten()[0]
w_2 = model.coef_.flatten()[1]
b = model.intercept_
n = np.linspace(0, 3, 10000, endpoint=False)
x_n = -w_1 / w_2 * n - b / w_2

plt.scatter(X[:, 0], X[:, 1], c = y)
plt.plot(n, x_n)
plt.gca().set_aspect('equal')
plt.show()

X = np.array([[1, 1],
              [2, 2]])
y = np.array([0, 1])

optimizer = SGD(learning_rate=0.01,
                momentum = 0.0,
                nesterov = False,
                name = 'SGD')

inputs = Input(shape = (2,), name='inputs')
outputs = Dense(1, activation = 'sigmoid', name = 'outputs')(inputs)

model = Model(inputs = inputs, outputs = outputs, name = 'model')
model.compile(loss = 'bce', optimizer = optimizer, metrics = ['AUC', 'accuracy'])
model.fit(X, y, epochs = 100, verbose=False)

print(model.evaluate(X, y))

weights, bias = model.layers[1].get_weights()
weights = weights.flatten()

w_1 = weights[0]
w_2 = weights[1]
b = bias
n = np.linspace(0, 3, 10000, endpoint=False)
x_n = -w_1 / w_2 * n - b / w_2

plt.scatter(X[:, 0], X[:, 1], c = y)
plt.plot(n, x_n)
plt.grid()
plt.gca().set_aspect('equal')

plt.show()
  • Your labels aren't binary, so compiling with bce isn't going to do what you want. This is why you see the decision boundary strictly below 0. You'll find that LogisticRegression optimizes for arbitrary categorical xent, so you should find the same thing by optimizing for the same loss.
    – erip
    Commented Jul 14, 2022 at 20:05
  • @erip Thanks for pointing that out! We corrected the labels to y = np.array([0, 1]), as edited into the original post, but we still don't get theoretically correct results. Commented Jul 14, 2022 at 20:13
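For illustration, here is a minimal sketch of the point raised in the first comment. The original, pre-edit labels are not shown in the post; assuming they were non-binary values such as [1, 2] (a hypothetical choice), binary cross-entropy keeps decreasing as the predicted probability approaches 1 for any target of 1 or more, so both points get pushed to the same side of the boundary:

import numpy as np

def bce(t, p):
    # Binary cross-entropy for a single example with target t and predicted probability p
    return -(t * np.log(p) + (1 - t) * np.log(1 - p))

probs = np.array([0.1, 0.5, 0.9, 0.99])
print(bce(1.0, probs))  # decreases as p -> 1, as expected for a positive label
print(bce(2.0, probs))  # also decreases (and even goes negative) as p -> 1, so a
                        # non-binary target never behaves like the "other" class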

1 Answer


A simple way to determine if this is actually a bug is to let the number of epochs in your perceptron go to some arbitrary large number (say, 5000). You'll note that the decision boundary approaches that of your logistic regression model.
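For concreteness, a minimal sketch of that experiment, reusing the model, data, and optimizer already defined in the question (5000 is just an arbitrarily large epoch count, not a tuned value):

model.fit(X, y, epochs=5000, verbose=False)

weights, bias = model.layers[1].get_weights()
print(weights.flatten(), bias)  # should now be close to the sklearn coefficients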

The natural question is why LR needs fewer iterations to achieve a near-optimal decision boundary. For strongly convex functions (like in your example), SAG enjoys much faster convergence than SGD. Thus, it takes SGD longer to converge to a "globally good" solution (though not many iterations to reach a locally good one).

  • Thanks @erip! You helped us solve this problem. The issue was not a bug, but a convergence problem. It is solved by increasing the number of epochs (as you suggested), increasing the learning rate, or increasing the number of data points drawn from both class distributions. Commented Jul 14, 2022 at 22:32
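A hedged sketch of the alternative fix mentioned in the comment above: rebuild the same model but with a larger learning rate, so plain SGD takes bigger steps per update (0.5 is an arbitrary illustrative value, not one from the original post):

optimizer = SGD(learning_rate=0.5, momentum=0.0, nesterov=False, name='SGD')

inputs = Input(shape=(2,), name='inputs')
outputs = Dense(1, activation='sigmoid', name='outputs')(inputs)
model = Model(inputs=inputs, outputs=outputs, name='model')
model.compile(loss='bce', optimizer=optimizer, metrics=['accuracy'])
model.fit(X, y, epochs=100, verbose=False)  # same epoch budget as the original attempt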
