I am stuck on where I am going wrong with this loop to perform Logistic Regression on a dataframe with 25 features.

When I reshape the data, I get this error: "ValueError: Expected 2D array, got 1D array instead: array=[-12.36677125 -12.91946925 -12.89317629 -13.16951215 -12.20588875 -12.44694704 -12.71370778 -12.69351738 -12.89451587 -12.0776727 -12.63723271 -13.39461116 -12.52027792]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."

peptides = ['AYSLFSYNTQGR','IVLGQEQDSYGGK','EQLTPLIK','SPELQAEAK','SPELQAEAK','ALVQQMEQLR','SGVQQLIQYYQDQK','VVVHPDYR','GFVVAGPSR','CLCACPFK','VVEESELAR','FCDMPVFENSR','GYSIFSYATK',
'EPGCGCCSVCAR',
'LIQGAPTIR',
'YYLQGAK',
'ALGHLDLSGNR',
'DLLLPQPDLR',
'GPLQLER',
'IISIMDEK',
'LQDAEIAR',
'QINDYVEK',
'SVLGQLGITK',
'ADLSGITGAR',
'EQLSLLDR']

That is the list of peptides I would like to iterate over. They should be the column titles of X_train.

LR_scores = []
logit_roc_auc =[]
y_pred = []
acc_score = []

for peptide in peptides:
    model=LogisticRegression()
    model.fit(X_train[peptide], y_train)
    score = model.score(X_test[peptide], y_test)
    y_pred=model.predict(X_test[peptide])
    acc_score = accuracy_score(y_test, y_pred)
    LR_scores.append(peptide,acc_score)
    
    #Classification Report
    print (classification_report(y_test,y_pred))
    
    #Confusion Matrix
    cnf_matrix = confusion_matrix(y_test,y_pred)
    print(cnf_matrix)
    
    #ROC_AUC Curves
    y_predict_proba = model.predict_proba(X_test[peptide])
    probabilities = np.array(y_predict_proba)[:, 1]
    fpr, tpr, thresholds = roc_curve(y_test, probabilities, pos_label=1)
    roc_auc = auc(fpr, tpr)
    logit_roc_auc = roc_auc_score(y_test, model.predict(X_test[peptide]))

Any help is appreciated.

Screenshot of Jupyter Notebook

This loop works with different input lists

1 Answer

When fitting the model, X is expected to be a 2D array and y a 1D array.

X_train[peptide] returns a Series, which is 1D:

X_train[peptide].shape
#Output  = (nrows,)

Instead, select the column with a list of names, which returns a single-column DataFrame (2D):

X_train[[peptide]].shape
#Output = (nrows,1)

Or convert the Series to a NumPy array and reshape it:

X_train[peptide].to_numpy().reshape(-1,1)
#Output = (nrows,1)
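
The difference between these selections can be checked on a small made-up frame. This is only a sketch: the values, the two column names picked from the peptide list, and the labels are invented for illustration and are not the asker's data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two peptide columns, 8 samples (not the real dataset).
rng = np.random.default_rng(0)
X_train = pd.DataFrame({
    "EQLTPLIK": rng.normal(-12.5, 0.4, size=8),
    "VVVHPDYR": rng.normal(-13.0, 0.4, size=8),
})
y_train = np.array([0, 1, 0, 1, 0, 1, 0, 1])

peptide = "EQLTPLIK"

# Single brackets give a 1D Series; double brackets give a 2D DataFrame.
series_shape = X_train[peptide].shape
frame_shape = X_train[[peptide]].shape
reshaped_shape = X_train[peptide].to_numpy().reshape(-1, 1).shape
print(series_shape, frame_shape, reshaped_shape)

# The 2D selection is what fit() accepts for a single feature.
model = LogisticRegression()
model.fit(X_train[[peptide]], y_train)
```

Passing X_train[peptide] to fit() here instead would raise the same "Expected 2D array, got 1D array instead" error as in the question.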

If you still get an error after this change, then there is more than one issue with the code. Please post that error in the comments as well.

This should work:

for peptide in peptides:
    model=LogisticRegression()
    model.fit(X_train[[peptide]], y_train)
    score = model.score(X_test[[peptide]], y_test)
    y_pred=model.predict(X_test[[peptide]])
    acc_score = accuracy_score(y_test, y_pred)
    LR_scores.append((peptide, acc_score))  # list.append takes one argument, so pass a tuple
    
    #Classification Report
    print (classification_report(y_test,y_pred))
    
    #Confusion Matrix
    cnf_matrix = confusion_matrix(y_test,y_pred)
    print(cnf_matrix)
    
    #ROC_AUC Curves
    y_predict_proba = model.predict_proba(X_test[[peptide]])
    probabilities = np.array(y_predict_proba)[:, 1]
    fpr, tpr, thresholds = roc_curve(y_test, probabilities, pos_label=1)
    roc_auc = auc(fpr, tpr)
    logit_roc_auc = roc_auc_score(y_test, model.predict(X_test[[peptide]]))
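
For reference, a self-contained version of the per-peptide loop might look like the sketch below. Everything about the data is an assumption made so the example runs on its own: the hypothetical make_split helper, the three column names, the sample sizes and the labels are invented. It also passes the predicted probabilities to roc_auc_score rather than the hard predictions, which is a swapped-in choice and not what the loop above does.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(42)
peptides = ["EQLTPLIK", "VVVHPDYR", "GFVVAGPSR"]  # subset, for illustration only


def make_split(n):
    """Hypothetical helper: build synthetic features and balanced 0/1 labels."""
    y = np.tile([0, 1], n // 2)
    X = pd.DataFrame(
        {p: rng.normal(-12.5, 0.5, size=n) + 0.5 * y for p in peptides}
    )
    return X, y


X_train, y_train = make_split(40)
X_test, y_test = make_split(20)

results = []
for peptide in peptides:
    model = LogisticRegression()
    # Double brackets keep the single feature as a 2D (n_samples, 1) frame.
    model.fit(X_train[[peptide]], y_train)
    y_pred = model.predict(X_test[[peptide]])
    proba = model.predict_proba(X_test[[peptide]])[:, 1]
    results.append(
        (peptide, accuracy_score(y_test, y_pred), roc_auc_score(y_test, proba))
    )

# One row per peptide, so the scores are not overwritten on each iteration.
results_df = pd.DataFrame(results, columns=["peptide", "accuracy", "roc_auc"])
print(results_df)
```

Collecting one tuple per peptide keeps every score, whereas assigning to acc_score and logit_roc_auc inside the loop keeps only the last one.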
  • I have tried that, but I end up with this error: Found input variables with inconsistent numbers of samples: [26, 13]
    – thejahcoop
    Commented Aug 29, 2020 at 22:01
  • Does this work? You are using multiple functions that take 2D training data and 1D y data. Please make sure you are making these changes in ALL of them. Commented Aug 29, 2020 at 22:04
  • I have tried all of these without success. Most recently I tried the middle one, which produced this error: Expected 2D array, got 1D array instead: array=[13. 1.]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
    – thejahcoop
    Commented Aug 29, 2020 at 22:15
  • I have only applied these changes to the X_train or X_test values
    – thejahcoop
    Commented Aug 29, 2020 at 22:15
  • I am using this same loop with an input list of peptide combinations made with "combos = list(combinations(X_train.columns,4))". I then use "for combo in combos" and the code works fine. I am not sure what is happening.
    – thejahcoop
    Commented Aug 29, 2020 at 22:20