ValueError: setting an array element with a sequence in scikit-learn (sklearn) using GaussianNB

Question

I am trying to make a sklearn image classifier but I am unable to fit the data into a classifier.

x_train = np.array(im_matrix)
y_train = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
clf = GaussianNB()
clf.fit(x_train, y_train)

At clf.fit(x_train, y_train) I get the following error:

ValueError: setting an array element with a sequence.

im_matrix is an array holding image matrices:

for file in files:
    path = os.path.join(root, file)
    im_matrix.append(mpimg.imread(path))

The shape of x_train is (10, 1) and the shape of y_train is (10,).

I am guessing the problem is with x_train, as it is weirdly shaped:

array([array([[[227, 255, 233],
        [227, 255, 233],
        [227, 255, 233],
        ...,
        [175, 140, 160],
        [175, 140, 160],
        [175, 140, 160]],

       [[227, 255, 233],
        [227, 255, 233],
        [227, 255, 233],
        ...,
        [174, 139, 159],
        [174, 139, 159],
        [174, 139, 159]],

       [[227, 255, 233],
        [227, 255, 233],
        [227, 255, 233],
        ...,
        [173, 138, 158],
        [173, 138, 158],
        [173, 138, 158]],

       ...,

       [[199, 222, 253],
        [121, 142, 169],
        [ 13,  34,  55],
        ...,
        [ 31,  40,  49],
        [ 31,  40,  49],
        [ 32,  41,  50]],

       [[187, 206, 246],
        [ 80,  98, 134],
        [  0,  13,  41],
        ...,
        [ 36,  44,  63],
        [ 35,  43,  62],
        [ 35,  43,  62]],

       [[187, 206, 246],
        [ 80,  98, 134],
        [  0,  13,  41],
        ...,
        [ 36,  44,  63],
        [ 35,  43,  62],
        [ 35,  43,  62]]], dtype=uint8),

This has been asked here several times, but I could not find a solution. Any help would be appreciated.

You have an array of 3-D images (RGB colors), so your data is currently 4-D. That won't work with scikit-learn, whose estimators expect 2-D data. You need to reshape each image into a single vector and then append it to im_matrix.
– Vivek Kumar, Commented Jul 25, 2018 at 10:31
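A minimal sketch of how this error can arise, assuming the images differ in size (which the comment thread under the answer suggests). The two arrays are synthetic stand-ins, not the asker's files. When the images do not share a shape, NumPy can only hold them as an object array, and converting that to a numeric array fails with the same ValueError:

```python
import numpy as np

# Two hypothetical "images" with different heights (synthetic data).
img_a = np.zeros((4, 5, 3), dtype=np.uint8)
img_b = np.zeros((6, 5, 3), dtype=np.uint8)

# Build an object array explicitly: each element is a whole image.
ragged = np.empty(2, dtype=object)
ragged[0], ragged[1] = img_a, img_b
print(ragged.shape, ragged.dtype)  # (2,) object

# Converting to a numeric array fails, because each element
# is itself an array rather than a single number.
raised = False
try:
    np.asarray(ragged, dtype=np.float64)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

Checking x_train.dtype and x_train.shape before calling fit is a quick way to spot this situation.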

seralouk · Accepted Answer · 2018-07-25 11:38:52Z


Most (if not all) scikit-learn estimators expect the input X to be a 2D array with shape (n_samples, n_features).

See the doc: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB.fit

Fit Gaussian Naive Bayes according to X, y

Parameters: X : array-like, shape (n_samples, n_features)

Training vectors, where n_samples is the number of samples and n_features is the number of features.

To solve your problem, use a vector representation of the images and then put each vector as a row in your x_train matrix.

Finally, use this X for the fitting of the GaussianNB.

How to vectorize an image?

Use something like this:

import numpy as np
from PIL import Image

img = Image.open('orig.png').convert('RGBA')
arr = np.array(img)


# make a 1-dimensional view of arr
flat_arr = arr.ravel()
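Putting the steps together, here is a rough sketch of the whole fit, assuming every image has the same shape. The random arrays stand in for the loaded images, and the 8x8 size is an arbitrary choice for illustration:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-ins for the loaded images: 10 RGB arrays of identical size.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8) for _ in range(10)]

# One flattened image per row gives shape (n_samples, n_features).
x_train = np.stack([img.ravel() for img in images])
y_train = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(x_train.shape)  # (10, 192)

clf = GaussianNB()
clf.fit(x_train, y_train)
preds = clf.predict(x_train)
```

np.stack raises an error if the flattened images differ in length, which makes a size mismatch visible before the data reaches the classifier.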

edited Jul 25, 2018 at 11:38

answered Jul 25, 2018 at 11:23


3 Comments

Ach113 Over a year ago

Vectorization seems to work, and to add each element as a row I have tried using np.vstack, but the length of each row seems to be different from each other, so vstack gives an error. Could this be due to the size difference of pictures?

seralouk Over a year ago

Yes. Again, scikit-learn needs samples (images in your case) that all have the same number of features (e.g. the same number of elements in each vectorized image). Do you have different sizes/types of images?

Ach113 Over a year ago

I see, that is the issue then. Thank you very much for the help.
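One possible way to handle images of different sizes, sketched under assumptions: resize every image to a common size before flattening. The helper name to_feature_vector and the 32x32 target are hypothetical choices, and the sketch assumes the images are uint8 RGB arrays like the one printed in the question.

```python
import numpy as np
from PIL import Image

TARGET_SIZE = (32, 32)  # (width, height); an arbitrary choice for illustration

def to_feature_vector(arr, size=TARGET_SIZE):
    """Resize a uint8 image array to a fixed size and flatten it (hypothetical helper)."""
    img = Image.fromarray(arr).convert("RGB").resize(size)
    return np.asarray(img).ravel()

# Synthetic stand-ins for two images of different sizes.
images = [
    np.zeros((40, 60, 3), dtype=np.uint8),
    np.full((25, 30, 3), 255, dtype=np.uint8),
]

# After resizing, every row has the same length, so vstack succeeds.
x_train = np.vstack([to_feature_vector(a) for a in images])
print(x_train.shape)  # (2, 3072)
```

Resizing changes the aspect ratio when the originals are not square; cropping or padding first would be an alternative if that matters for the classifier.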
