In order to fit a linear regression model to given training data X and labels y, I want to augment the training data X with nonlinear transformations of the given features. Let's say we have the features x1, x2 and x3, and we want to use the additional transformed features:
x4 = x1^2, x5 = x2^2 and x6 = x3^2
x7 = exp(x1), x8 = exp(x2) and x9 = exp(x3)
x10 = cos(x1), x11 = cos(x2) and x12 = cos(x3)
I tried the following approach, which however led to a model that performed very poorly with Root Mean Squared Error as the evaluation criterion:
import pandas as pd
import numpy as np
from sklearn import linear_model
# import the training data and extract the features and labels from it
DATAPATH = 'train.csv'
data = pd.read_csv(DATAPATH)
features = data.drop(['Id', 'y'], axis=1)
labels = data[['y']]
# squared features
features['x4'] = features['x1']**2
features['x5'] = features['x2']**2
features['x6'] = features['x3']**2
# exponential features
features['x7'] = np.exp(features['x1'])
features['x8'] = np.exp(features['x2'])
features['x9'] = np.exp(features['x3'])
# cosine features
features['x10'] = np.cos(features['x1'])
features['x11'] = np.cos(features['x2'])
features['x12'] = np.cos(features['x3'])
# fit an ordinary least-squares model on the augmented features
regr = linear_model.LinearRegression()
regr.fit(features, labels)
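For reference, the RMSE is measured roughly like this (just a sketch; the held-out split and random_state are arbitrary choices, not part of the task):
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# hold out part of the data and measure RMSE on it
X_train, X_val, y_train, y_val = train_test_split(features, labels, random_state=0)
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_val, regr.predict(X_val)))
print('validation RMSE:', rmse)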
I'm quite new to ML, and there is surely a better way to do these nonlinear feature transformations. I'd be very happy for your help.
Cheers, Lukas
The np.exp terms are much, much larger than everything else in your dataset, so your regression effectively fits only them. You can avoid that by normalising your data before training the model. Check out this post.
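For example, here is a minimal sketch of that idea using a scikit-learn Pipeline: StandardScaler centres each column and scales it to unit variance before the fit, so the exp() columns no longer dwarf the others, and the cross-validated RMSE matches your evaluation criterion. The derived column names like x1_sq are just illustrative, and it assumes the same train.csv layout as in your code.
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

data = pd.read_csv('train.csv')
features = data.drop(['Id', 'y'], axis=1)
labels = data['y']

# the same nonlinear augmentation as in the question
for col in ['x1', 'x2', 'x3']:
    features[col + '_sq'] = features[col] ** 2
    features[col + '_exp'] = np.exp(features[col])
    features[col + '_cos'] = np.cos(features[col])

# scaling happens inside the pipeline, so it is re-fit on each CV fold
model = make_pipeline(StandardScaler(), LinearRegression())

# 5-fold cross-validated RMSE (sklearn returns negated scores)
scores = cross_val_score(model, features, labels,
                         scoring='neg_root_mean_squared_error', cv=5)
print('CV RMSE:', -scores.mean())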