Implementing sklearn.linear_model.SGDClassifier using python

Question

I have an Excel file that contains the details used to determine the quality of a wine, and I want to fit linear models to it in Python with sklearn.linear_model.SGDClassifier: an SVM (hinge loss) and logistic regression (log loss). I learned the basics of these functions from the scikit-learn website, but I am not able to fit the model to the data in the Excel file. I am very new to Python and machine learning, and I am finding it hard to implement the model. I opened the Excel file in Python, took two columns (chosen at random) from the file, and used them as input to the model's fit function. But I got an error stating "Unknown label type: array". I tried a couple of other methods too, but nothing worked. Can someone guide me through the implementation process?

from xlrd import open_workbook
from sklearn import linear_model

fa = []  # feature rows: [column 0, column 10]
ph = []  # target values taken from column 8

book = open_workbook('F:/BIG DATA/winequality.xlsx')
sheet = book.sheet_by_name('Sheet1')
num_rows = sheet.nrows - 1  # index of the last row

# Row 0 is skipped (presumably the header); rows 1..num_rows are read.
curr_row = 0
while curr_row < num_rows:
    curr_row += 1
    cell_val = sheet.cell_value(curr_row, 0)
    cell_val1 = sheet.cell_value(curr_row, 10)
    fa.append([float(cell_val), float(cell_val1)])

    cell_val2 = sheet.cell_value(curr_row, 8)
    ph.append(float(cell_val2))

model = linear_model.SGDClassifier()
print(model.fit(fa, ph))  # this call raises "Unknown label type: array"

[Screenshot of the error message]

Zephyr · Accepted Answer · 2020-08-03 08:06:38Z

1

I think that this is the same issue as in this question.

The shape of $X$ must be (n_samples, n_features) as explained in the SVC.fit docstring. A 1-d array is interpreted as a single sample (for convenience when doing predictions on single samples). Reshape your $X$ to (n_samples, 1).

That means you should use numpy.reshape to reshape the $X$ column. If the data frame has n rows, you should use

X_new = X.reshape(n, 1)

Then use the fit method with $X_{new}$. Note: you probably don't need to do this if you use two or more $X$ columns for your model fitting.
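
A minimal sketch of that reshape, assuming X is a NumPy array holding a single feature column (the values below are made up for illustration):

```python
import numpy as np

# Made-up values standing in for a single feature column.
X = np.array([7.4, 7.8, 11.2, 7.3])
n = X.shape[0]

# Turn the 1-D array of n values into an (n_samples, 1) column.
X_new = X.reshape(n, 1)  # equivalently: X.reshape(-1, 1)
print(X.shape, X_new.shape)
```

A nested list with two values per row, like the fa list in the question, already has the (n_samples, n_features) layout and needs no reshape.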

edited Aug 3, 2020 at 8:06 by Zephyr

answered Jun 15, 2015 at 15:43 by Will Stanton

In the scikit-learn tutorial, this method is implemented with two np.arrays, one holding a 2-D list and one holding a 1-D list. I tried to replicate that pattern without NumPy and got the error mentioned above. I don't know how else I should approach the implementation.

– SRS, commented Jun 15, 2015 at 17:51

When it says "Unknown label type", it looks like your "y" is actually a continuous numeric array, not an array of class labels. If you are trying to predict "ph" from "fixed acid", you should use a Regressor, not a Classifier.

– Will Stanton, commented Jun 15, 2015 at 18:13
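
A rough sketch of that difference, using a few made-up rows in place of the spreadsheet. The numbers and the scaling step are assumptions for illustration, not taken from the question's data. The classifier rejects the continuous target, while SGDRegressor accepts it:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier, SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up rows: two feature columns and a continuous target (pH-like values).
X = np.array([[7.4, 9.4], [7.8, 9.8], [11.2, 9.8],
              [7.3, 10.0], [6.9, 10.5], [8.1, 9.5]])
y = np.array([3.51, 3.20, 3.16, 3.39, 3.45, 3.28])

# A classifier expects discrete class labels, so a continuous y is rejected.
classifier_error = None
try:
    SGDClassifier().fit(X, y)
except ValueError as exc:
    classifier_error = exc
print("SGDClassifier:", classifier_error)

# A regressor is meant for continuous targets. SGD is sensitive to feature
# scale, so the features are standardized first.
regressor = make_pipeline(StandardScaler(), SGDRegressor(random_state=0))
regressor.fit(X, y)
predictions = regressor.predict(X)
print("SGDRegressor predictions:", predictions)
```

The exact wording of the error depends on the scikit-learn version.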

That's what I thought. I am not 100% clear about what the SGDClassifier example on the scikit-learn website means. It has two lists and calls fit on the model to get some output. Could you please explain what I can do with this data set to make it work with SGDClassifier?

– SRS, commented Jun 15, 2015 at 18:27

Well, if you really want to use SGDClassifier, you could try using "quality" as the "y" variable. But that might not be too appropriate, because I assume that quality is really an ordered variable. Why don't you try SGDRegressor?

– Will Stanton, commented Jun 15, 2015 at 18:30

I have to use both SGDClassifier and SGDRegressor with this data set for my project. I'm not sure what I can do with this data set to implement the SGDClassifier model.

– SRS, commented Jun 15, 2015 at 18:32
