I am trying to build an image classifier with scikit-learn, but I am unable to fit my data to the classifier.

import numpy as np
from sklearn.naive_bayes import GaussianNB

x_train = np.array(im_matrix)
y_train = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
clf = GaussianNB()
clf.fit(x_train, y_train)

At clf.fit(x_train, y_train) I get the following error:

ValueError: setting an array element with a sequence.

im_matrix is a list holding the image matrices:

for file in files:
    path = os.path.join(root, file)
    im_matrix.append(mpimg.imread(path))

The shape of x_train is (10, 1) and the shape of y_train is (10,).

I am guessing the problem is with x_train, as it is weirdly shaped:

array([array([[[227, 255, 233],
        [227, 255, 233],
        [227, 255, 233],
        ...,
        [175, 140, 160],
        [175, 140, 160],
        [175, 140, 160]],

       [[227, 255, 233],
        [227, 255, 233],
        [227, 255, 233],
        ...,
        [174, 139, 159],
        [174, 139, 159],
        [174, 139, 159]],

       [[227, 255, 233],
        [227, 255, 233],
        [227, 255, 233],
        ...,
        [173, 138, 158],
        [173, 138, 158],
        [173, 138, 158]],

       ...,

       [[199, 222, 253],
        [121, 142, 169],
        [ 13,  34,  55],
        ...,
        [ 31,  40,  49],
        [ 31,  40,  49],
        [ 32,  41,  50]],

       [[187, 206, 246],
        [ 80,  98, 134],
        [  0,  13,  41],
        ...,
        [ 36,  44,  63],
        [ 35,  43,  62],
        [ 35,  43,  62]],

       [[187, 206, 246],
        [ 80,  98, 134],
        [  0,  13,  41],
        ...,
        [ 36,  44,  63],
        [ 35,  43,  62],
        [ 35,  43,  62]]], dtype=uint8),

This has been asked here several times, but I could not find a solution. Any help would be appreciated.

  • You have an array of 3-d images (RGB colors), so your data is currently 4-d. That won't work with scikit-learn: its estimators only work with 2-d data. You need to reshape each image into a single vector and then append that vector to im_matrix. Commented Jul 25, 2018 at 10:31

1 Answer

Most (if not all) scikit-learn estimators expect the input X to be a 2D array with shape (n_samples, n_features).

See the doc: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB.fit

Fit Gaussian Naive Bayes according to X, y

Parameters: X : array-like, shape (n_samples, n_features)

Training vectors, where n_samples is the number of samples and n_features is the number of features.

To solve your problem, use a vector representation of the images and then put each vector as a row in your x_train matrix.

Finally, use this X for the fitting of the GaussianNB.


How to vectorize an image?

Use something like this:

import numpy as np
from PIL import Image

img = Image.open('orig.png').convert('RGBA')
arr = np.array(img)


# make a 1-dimensional view of arr
flat_arr = arr.ravel()
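
To go from one flattened image to a full training matrix, something like the following might work. This is only a sketch: the random 8x8 RGB arrays are made-up stand-ins for the loaded images (it assumes every image has the same size), and the labels are the ones from the question.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Stand-ins for ten loaded RGB images (assumption: all are 8x8 pixels)
images = [rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8) for _ in range(10)]
y_train = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Flatten each image into one row, giving shape (n_samples, n_features)
x_train = np.stack([img.ravel() for img in images]).astype(np.float64)
print(x_train.shape)  # (10, 192)

clf = GaussianNB()
clf.fit(x_train, y_train)
predictions = clf.predict(x_train)
```

Each row of x_train has 8 * 8 * 3 = 192 values, so the matrix is 2D and fit no longer raises the ValueError.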

3 Comments

Vectorization seems to work, and to add each element as a row I have tried using np.vstack, but the length of each row seems to be different from each other, so vstack gives an error. Could this be due to the size difference of pictures?
Yes. Again, scikit-learn needs samples (images in your case) that all have the same number of features (i.e. the same number of elements in each vectorized image). Do you have images of different sizes or types?
I see, that is the issue then. Thank you very much for the help.
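
If the pictures do differ in size, one possible fix is to resize them all to a common size before flattening. A minimal sketch, assuming RGB uint8 input: the helper name image_to_vector and the 32x32 target size are made up for illustration, and the random arrays stand in for real images.

```python
import numpy as np
from PIL import Image

TARGET_SIZE = (32, 32)  # (width, height); an arbitrary choice for this sketch


def image_to_vector(arr, size=TARGET_SIZE):
    """Resize an H x W x 3 uint8 array to a common size and flatten it."""
    img = Image.fromarray(arr).convert('RGB').resize(size)
    return np.asarray(img).ravel()


rng = np.random.default_rng(0)

# Two stand-in images with different dimensions
images = [
    rng.integers(0, 256, size=(40, 60, 3), dtype=np.uint8),
    rng.integers(0, 256, size=(100, 80, 3), dtype=np.uint8),
]

# Every row now has the same length, so vstack succeeds
x = np.vstack([image_to_vector(a) for a in images])
print(x.shape)  # (2, 3072)
```

Resizing changes the aspect ratio of non-square images; whether that matters depends on the data.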
