In short:
How do I write this expression correctly?
[(self._mean,self._var,self._priors)] = [ ([X[y==c].mean(axis=0)] , [X[y==c].var(axis=0)],[X[y==c].shape[0] / n_samples ]) for c in self.classes]
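For reference, the left-hand pattern [(a, b, c)] = seq only succeeds when seq contains exactly one item; my comprehension yields one tuple per class, so with more than one class it blows up. A tiny standalone demo of that failure mode:

[(a, b, c)] = [(1, 2, 3)]               # fine: the outer list has exactly one item
print(a, b, c)                          # 1 2 3
# [(a, b, c)] = [(1, 2, 3), (4, 5, 6)]  # ValueError: too many values to unpack (expected 1)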
A minimal reproducible example that generates the same error:
import numpy as np
from sklearn import datasets

X, y = datasets.make_classification(n_samples=1000, n_classes=2, n_features=10, random_state=1234)
Classes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[[_mean, _var]] = [[(np.mean(X[i % 10 == c]), np.var(X[i % 10 == c])) for c in Classes] for i in range(len(X))]
print(_mean)
print(_var)
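As an aside, the RuntimeWarnings in this MRE are a separate bug: i % 10 == c is a plain Python bool, and indexing a NumPy array with a scalar boolean either prepends a length-1 axis (True) or selects nothing at all (False), hence the "Mean of empty slice" warnings. A small demo of that indexing behavior:

import numpy as np

A = np.arange(6).reshape(3, 2)
print(A[True].shape)   # (1, 3, 2) -- scalar True prepends a length-1 axis
print(A[False].shape)  # (0, 3, 2) -- scalar False selects nothing -> empty-slice warnings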
Running it produces this output:
/bin/python3 "/home/vivek/Documents/GitHub/ML-Coding-Playground/LecturesSeries1/Lecture 5 - Naive Bayes/CodeSample.py"
/home/vivek/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3474: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/vivek/.local/lib/python3.8/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
/home/vivek/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3757: RuntimeWarning: Degrees of freedom <= 0 for slice
return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/home/vivek/.local/lib/python3.8/site-packages/numpy/core/_methods.py:222: RuntimeWarning: invalid value encountered in true_divide
arrmean = um.true_divide(arrmean, div, out=arrmean, casting='unsafe',
/home/vivek/.local/lib/python3.8/site-packages/numpy/core/_methods.py:256: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
File "/home/vivek/Documents/GitHub/ML-Coding-Playground/LecturesSeries1/Lecture 5 - Naive Bayes/CodeSample.py", line 12, in <module>
[[_mean, _var]] = [[ (np.mean(X[i%10==c]),np.var(X[i%10==c])) for c in Classes ] for i in range(len(X)) ]
ValueError: too many values to unpack (expected 1)
Context for the line of code:
I am writing a naive Bayes classifier from scratch, and the following script drives it:
#script.py
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
from NaiveBayes import *

def accuracy(y_true, y_pred):
    return np.sum(y_true == y_pred) / len(y_true)

X, y = datasets.make_classification(n_samples=1000, n_classes=2, n_features=10, random_state=1234)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=69420)

nb = NaiveBayes()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)

print("Accuracy:", accuracy(y_test, y_pred))
print("Confusion Matrix:")
# rows = true class, columns = predicted class (my first version printed
# per-array class counts, which is not a confusion matrix)
print(np.array([[np.sum((y_test == 0) & (y_pred == 0)), np.sum((y_test == 0) & (y_pred == 1))],
                [np.sum((y_test == 1) & (y_pred == 0)), np.sum((y_test == 1) & (y_pred == 1))]]))
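As a sanity check on the hand-rolled matrix above, sklearn ships an equivalent helper; a minimal sketch, assuming the same y_test and y_pred as in the script:

from sklearn.metrics import confusion_matrix

# rows = true class, columns = predicted class
print(confusion_matrix(y_test, y_pred))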
I have made a few attempts at the NaiveBayes class itself:
- Using a plain for loop (working):
#NaiveBayes.py
import numpy as np

class NaiveBayes:
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.classes = np.unique(y)
        n_classes = len(self.classes)
        # init mean, var, priors
        self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
        self._var = np.zeros((n_classes, n_features), dtype=np.float64)
        self._priors = np.zeros(n_classes, dtype=np.float64)
        for c in self.classes:
            X_c = X[y == c]
            self._mean[c] = X_c.mean(axis=0)
            self._var[c] = X_c.var(axis=0)
            self._priors[c] = X_c.shape[0] / n_samples
        # debugging
        print(self._mean)
        print(self._var)
        print(self._priors)

    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):
        posteriors = [self._posterior(x, c, idx) for (idx, c) in enumerate(self.classes)]
        return self.classes[np.argmax(posteriors)]

    def _posterior(self, x, c, idx):
        prior = np.log(self._priors[idx])
        likelihood = np.prod(self._likelihood(idx, x))
        return prior + np.log(likelihood)

    def _likelihood(self, class_idx, x):
        # Gaussian PDF of the single sample x under class class_idx, using that
        # class's fitted mean and variance (the _pdf function from the video).
        mean = self._mean[class_idx]
        var = self._var[class_idx]
        coeff = 1.0 / np.sqrt(2 * np.pi * var)
        exp = np.exp(-(x - mean) ** 2 / (2 * var))
        return coeff * exp
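An aside on this version: _posterior takes log(prod(pdf)), which underflows easily once there are many features. A numerically safer sketch (my own tweak, not from the video) sums per-feature log-densities instead; written as a standalone function for clarity:

import numpy as np

def log_posterior(x, mean, var, prior):
    # log(prior) + sum of per-feature Gaussian log-densities --
    # algebraically equal to log(prior) + log(prod(pdf)), minus the underflow.
    log_pdf = -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)
    return np.log(prior) + np.sum(log_pdf)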
- Using three list comprehensions (working). Only fit() changes from here on; predict, _predict, _posterior, and _likelihood stay exactly as above:
#NaiveBayes.py (fit only)
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.classes = np.unique(y)
        # one list comprehension per statistic, replacing the loop
        self._mean = [X[y == c].mean(axis=0) for c in self.classes]
        self._var = [X[y == c].var(axis=0) for c in self.classes]
        self._priors = [X[y == c].shape[0] / n_samples for c in self.classes]
        # debugging
        print(self._mean)
        print(self._var)
        print(self._priors)
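One note on this variant: it leaves the three attributes as plain Python lists, whereas the loop version stores ndarrays. Wrapping each comprehension in np.array keeps the types consistent with the loop version; a sketch of the same three lines inside fit():

self._mean = np.array([X[y == c].mean(axis=0) for c in self.classes])
self._var = np.array([X[y == c].var(axis=0) for c in self.classes])
self._priors = np.array([X[y == c].shape[0] / n_samples for c in self.classes])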
- Using one list comprehension plus NumPy array manipulation (not working; I did not understand the failure -- see the note after the code). Again, fit only:
#NaiveBayes.py (fit only)
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.classes = np.unique(y)
        # one comprehension building [mean, var, prior] per class, then flatten
        print(np.array([[np.array([X[y == c].mean(axis=0)]).flatten(),
                         np.array([X[y == c].var(axis=0)]).flatten(),
                         np.array([X[y == c].shape[0] / n_samples]).flatten()]
                        for c in self.classes], dtype=object).flatten())  # debugging
        TempArray = np.array([[np.array([X[y == c].mean(axis=0)]).flatten(),
                               np.array([X[y == c].var(axis=0)]).flatten(),
                               np.array([X[y == c].shape[0] / n_samples]).flatten()]
                              for c in self.classes]).flatten()
        self._mean = TempArray[0]
        self._var = TempArray[1]
        self._priors = TempArray[2]
        # debugging
        print(self._mean)
        print(self._var)
        print(self._priors)
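If I read the shapes correctly, what goes wrong here is this: with ragged rows (two length-10 vectors plus a length-1 prior per class), NumPy builds a (n_classes, 3) object array, and .flatten() walks it row-major. TempArray[0], TempArray[1], and TempArray[2] are therefore the mean, var, and prior of class 0 alone, not the per-statistic collections. A small standalone demo, with the object dtype made explicit:

import numpy as np

temp = np.empty((2, 3), dtype=object)  # 2 classes x (mean, var, prior)
for row in range(2):
    temp[row, 0] = np.full(10, row)        # stand-in "mean" of class `row`
    temp[row, 1] = np.full(10, row + 0.5)  # stand-in "var"
    temp[row, 2] = 0.5                     # stand-in "prior"

flat = temp.flatten()  # row-major: 6 elements
print(flat[0])  # class 0's mean  -- not the stack of all means
print(flat[1])  # class 0's var   -- not the stack of all vars
print(flat[2])  # class 0's prior -- not the list of all priors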
And the one that raises the error:
- Using one list comprehension and iterable unpacking. It fails on the unpacking line (line 33 in my file) with:
ValueError: too many values to unpack (expected 1)
Again, fit only:
#NaiveBayes.py (fit only)
    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.classes = np.unique(y)
        # one comprehension plus iterable unpacking -- this is the line that raises
        [(self._mean, self._var, self._priors)] = [([X[y == c].mean(axis=0)],
                                                    [X[y == c].var(axis=0)],
                                                    [X[y == c].shape[0] / n_samples])
                                                   for c in self.classes]
        # debugging
        print(self._mean)
        print(self._var)
        print(self._priors)
- Other failed attempts:
(self._mean, self._var, self._priors) = ([X[y==c].mean(axis=0)], [X[y==c].var(axis=0)], [X[y==c].shape[0] / n_samples] for c in self.classes)
This one does not even run: the trailing for clause binds only to the last tuple element, so Python rejects it with "SyntaxError: Generator expression must be parenthesized".
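Even with the generator fully parenthesized, I believe the unpacking would still be wrong: a generator of per-class tuples unpacked into three names hands each name one class's tuple (or raises if the class count is not exactly three), rather than giving each name one statistic. A small demo with three stand-in classes:

gen = ((c, 10 * c, 100 * c) for c in range(3))  # one (mean, var, prior)-like tuple per class
a, b, c = gen
print(a)  # (0, 0, 0)    -- all of class 0's stats, not all the means
print(b)  # (1, 10, 100) -- all of class 1's stats
print(c)  # (2, 20, 200)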
Can you explain the correct way to do this, and why these other approaches of mine fail?
Thank you for your time.
For reference:
[(i, 2*i) for i in range(5)]
produces [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)] -- one tuple per iteration, which is exactly the shape my comprehension over self.classes has.
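For what it's worth, the transposition I think I am after can be spelled with zip(*...), which flips a list of per-class tuples into one sequence per statistic. A minimal standalone sketch, assuming the same dataset as above (this is my guess at the working form, not a confirmed answer):

import numpy as np
from sklearn import datasets

X, y = datasets.make_classification(n_samples=1000, n_classes=2, n_features=10, random_state=1234)
classes = np.unique(y)
n_samples = X.shape[0]

# one (mean, var, prior) tuple per class ...
stats = [(X[y == c].mean(axis=0),
          X[y == c].var(axis=0),
          X[y == c].shape[0] / n_samples) for c in classes]
# ... transposed by zip(*...) into one sequence per statistic
_mean, _var, _priors = (np.array(s) for s in zip(*stats))
print(_mean.shape, _var.shape, _priors.shape)  # (2, 10) (2, 10) (2,)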