$\begingroup$

I have an Excel file containing data used to determine the quality of a wine, and I want to fit linear models with sklearn.linear_model.SGDClassifier (SVM => hinge loss; logistic regression => log loss) in Python. I learned the basics of these functions from the scikit-learn website, but I am not able to fit the model using the Excel file. I am very new to Python and machine learning and am finding it hard to implement the model. I opened the Excel file in Python, took two columns (chosen arbitrarily) from the file, and used them as input to the model's fit function. But I got an error stating Unknown label type: array. I tried a couple of other methods too, but nothing worked. Can someone guide me through the implementation process?

from xlrd import open_workbook
from sklearn import linear_model

fa = []  # feature rows: [column 0, column 10]
ph = []  # target values: column 8, stored as floats

book = open_workbook('F:/BIG DATA/winequality.xlsx')
sheet = book.sheet_by_name('Sheet1')
num_rows = sheet.nrows - 1
curr_row = 0
# Row 0 is skipped (curr_row is incremented before reading); rows 1..num_rows are read.
while curr_row < num_rows:
    curr_row += 1
    cell_val = sheet.cell_value(curr_row, 0)
    cell_val1 = sheet.cell_value(curr_row, 10)

    fa.append([float(cell_val), float(cell_val1)])
    cell_val2 = sheet.cell_value(curr_row, 8)
    ph.append(float(cell_val2))

model = linear_model.SGDClassifier()
# This call raises "Unknown label type: array".
print(model.fit(fa, ph))


$\endgroup$

1 Answer

$\begingroup$

I think that this is the same issue as in this question.

The shape of $X$ must be (n_samples, n_features) as explained in the SVC.fit docstring. A 1-d array is interpreted as a single sample (for convenience when doing predictions on single samples). Reshape your $X$ to (n_samples, 1).

That means you should use numpy.reshape to reshape the $X$ column. If the data frame has n rows, you should use

X_new = X.reshape(n, 1)

Then use the fit method with $X_{new}$. Note: you probably don't need to do this if you use two or more $X$ columns for your model fitting.
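
A minimal sketch of the reshape described above, assuming NumPy is available. The values in `x` and `x2` are made up for illustration and do not come from the wine file.

```python
import numpy as np

# Hypothetical single feature column (values invented for illustration).
x = np.array([7.4, 7.8, 7.8, 11.2])

# A 1-D array has shape (n_samples,); estimators expect (n_samples, n_features).
n = x.shape[0]
X_new = x.reshape(n, 1)  # x.reshape(-1, 1) gives the same result

# Two columns stacked side by side are already 2-D, so no reshape is needed.
x2 = np.array([9.4, 9.8, 9.8, 9.8])
X_two = np.column_stack([x, x2])

print(x.shape, X_new.shape, X_two.shape)
```

Here `X_new` has one column per sample row, which is the layout `fit` expects for a single feature.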

$\endgroup$
  • $\begingroup$ In the scikit-learn tutorial, to implement this method, they use two np.arrays, one holding a 2-D list and one a 1-D list. I tried to replicate the pattern without using numpy and got the error mentioned above. I don't know how else I should approach the implementation. $\endgroup$ Commented Jun 15, 2015 at 17:51
  • $\begingroup$ When it says "Unknown label type", it looks like your "y" is actually a numeric array, not an array of labels. If you are trying to predict "ph" from "fixed acid", you should use a Regressor, not a Classifier. $\endgroup$ Commented Jun 15, 2015 at 18:13
  • $\begingroup$ That's what I thought. I am not 100% clear about what the SGDClassifier example on the scikit-learn website means. They have two lists and use the model and fit to get some output. Could you please explain what I can do with this data set to make it work with SGDClassifier? $\endgroup$ Commented Jun 15, 2015 at 18:27
  • $\begingroup$ Well, if you really want to use SGDClassifier, you could try using "quality" as the "y" variable. But that might not be too appropriate, because I assume that quality is really an ordered variable. Why don't you try SGDRegressor? $\endgroup$ Commented Jun 15, 2015 at 18:30
  • $\begingroup$ I have to use both SGDClassifier and SGDRegressor with this data set for my project. I am not sure what I can do with this data set to implement the SGDClassifier model. $\endgroup$ Commented Jun 15, 2015 at 18:32
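
The comments above point at the label type rather than the array shape. A rough sketch of that distinction follows, using synthetic data in place of the Excel file. The random values, the stand-in names `ph` and `quality`, and the `max_iter`, `tol`, and `random_state` settings are all assumptions for illustration. A continuous target such as pH goes to `SGDRegressor`, while integer quality scores can be passed to `SGDClassifier`.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier, SGDRegressor
from sklearn.utils.multiclass import type_of_target

rng = np.random.RandomState(0)

# Synthetic stand-ins (not the real wine data): two features, pH, quality.
X = rng.normal(size=(200, 2))
ph = 3.3 + 0.1 * X[:, 0] + rng.normal(scale=0.05, size=200)  # continuous floats
quality = rng.randint(3, 9, size=200)                        # integer scores 3..8

# scikit-learn inspects the target to decide whether it is a valid set of labels.
print(type_of_target(ph))       # continuous: not usable as class labels
print(type_of_target(quality))  # multiclass: usable as class labels

# Continuous target: use the regressor.
reg = SGDRegressor(max_iter=1000, tol=1e-3, random_state=0)
reg.fit(X, ph)

# Discrete target: use the classifier. loss="hinge" (the default) is a linear SVM.
# The logistic-regression loss is "log_loss" in recent releases, "log" in older ones.
clf = SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3, random_state=0)
clf.fit(X, quality)

print(reg.predict(X[:3]), clf.predict(X[:3]))
```

Treating quality as unordered classes discards its ordering, as one of the comments notes, so this is only one possible way to exercise `SGDClassifier` on such data.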
