How do I remove NaN values from a NumPy array?

Question

[1, 2, NaN, 4, NaN, 8]   ⟶   [1, 2, 4, 8]

Mateen Ulhaq · Accepted Answer · 2022-07-30 05:52:24Z

620

To remove NaN values from a NumPy array x:

x = x[~numpy.isnan(x)]

Explanation

The inner function numpy.isnan returns a boolean array that is True everywhere x is NaN (not-a-number). Since we want the opposite, we apply the ~ operator, which acts as an elementwise logical NOT on a boolean array, to get an array that is True everywhere x is not NaN.

Lastly, we use this logical array to index into the original array x, in order to retrieve just the non-NaN values.
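A minimal sketch of those steps, assuming a small float array named x like the one in the question:

```python
import numpy as np

x = np.array([1, 2, np.nan, 4, np.nan, 8])

# Step 1: boolean mask that is True wherever x is NaN
nan_mask = np.isnan(x)    # [False False  True False  True False]

# Step 2: invert the mask so True marks the values to keep
keep_mask = ~nan_mask     # same as np.logical_not(nan_mask)

# Step 3: boolean indexing returns a new 1-D array with only the kept values
x = x[keep_mask]
print(x)                  # [1. 2. 4. 8.]
```

Note that the integers are upcast to float64, because an integer array cannot hold NaN in the first place.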

edited Jul 30, 2022 at 5:52

Mateen Ulhaq


answered Jul 23, 2012 at 21:42

jmetz


53

Or x = x[numpy.isfinite(x)]
– Miki Tebeka
Commented Jul 23, 2012 at 22:29
23

Or x = x[~numpy.isnan(x)], which is equivalent to mutzmatron's original answer, but shorter. In case you want to keep your infinities around, know that numpy.isfinite(numpy.inf) == False, of course, but ~numpy.isnan(numpy.inf) == True.
– chbrown
Commented Nov 19, 2013 at 19:02
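A quick sketch of the difference between the two masks, assuming an array that contains NaN as well as both infinities:

```python
import numpy as np

x = np.array([1.0, np.nan, np.inf, -np.inf, 5.0])

# ~isnan drops only NaN, so the infinities survive
without_nan = x[~np.isnan(x)]    # keeps 1.0, inf, -inf and 5.0

# isfinite drops NaN *and* +/-inf
finite_only = x[np.isfinite(x)]  # keeps 1.0 and 5.0
```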
17

For people looking to solve this with an ndarray while keeping its dimensions, use np.where, which replaces the non-finite values instead of removing them: np.where(np.isfinite(x), x, 0)
– BoltzmannBrain
Commented Sep 7, 2017 at 2:51
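A small sketch of the np.where approach on a 2D array (the array itself is just an illustrative assumption). Using np.isfinite replaces infinities too; switching the condition to np.isnan replaces only NaN:

```python
import numpy as np

x = np.array([[1.0, np.nan],
              [np.inf, 4.0]])

# Replace NaN and +/-inf with 0; the result has the same shape as x
filled = np.where(np.isfinite(x), x, 0)

# Replace only NaN, leaving the infinities untouched
nan_only_filled = np.where(np.isnan(x), 0, x)
```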
1

TypeError: only integer scalar arrays can be converted to a scalar index
– towry
Commented Jun 30, 2018 at 14:29
1

@towry: this is happening because your input x is not a NumPy array. Boolean (logical) indexing only works on arrays, so convert it first, e.g. x = np.array(x)
– jmetz
Commented Jul 2, 2018 at 11:32
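A sketch reproducing that situation, assuming x starts out as a plain Python list: np.isnan happily accepts the list, but the list itself cannot be indexed with the resulting boolean array, which is where the TypeError comes from.

```python
import numpy as np

x = [1.0, float("nan"), 3.0]   # a plain Python list, not an ndarray

mask = ~np.isnan(x)            # np.isnan converts the list and returns a boolean ndarray

raised = False
try:
    x[mask]                    # list indexing does not understand boolean arrays
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Converting to an ndarray first makes boolean indexing work
arr = np.asarray(x)
clean = arr[mask]
```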


udibr · Accepted Answer · 2015-04-16 15:46:36Z

77

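list(filter(lambda v: v == v, x))

This works for both lists and NumPy arrays, since v != v is True only for NaN. (In Python 3, filter returns a lazy iterator, so wrap it in list() to get a list back.)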
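A minimal sketch on a mixed-type list (the list contents are an illustrative assumption). NaN compares unequal to itself under IEEE 754 rules, so v == v is False exactly for the NaN entries, and no NumPy import is needed:

```python
nan = float("nan")
mixed = ["a", nan, 1.0, None, "b", nan]

# Keep every element that compares equal to itself; NaN does not
cleaned = list(filter(lambda v: v == v, mixed))
print(cleaned)  # ['a', 1.0, None, 'b']
```

Note that None is kept: only NaN fails the self-equality test.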

answered Apr 16, 2015 at 15:46

udibr


12

A hack but an especially useful one in the case where you are filtering nans from an array of objects with mixed types, such as a strings and nans.
– Austin Richardson
Commented Jun 29, 2015 at 14:15
5

This might seem clever, but it obscures the logic, and in theory other objects (such as instances of custom classes) can also have this property.
– Chris_Rands
Commented Jul 31, 2018 at 15:02
1

Also useful because it only needs x to be specified once as opposed to solutions of the type x[~numpy.isnan(x)]. This is convenient when x is defined by a long expression and you don't want to clutter the code by creating a temporary variable to store the result of this long expression.
– Christian O'Reilly
Commented Jun 15, 2020 at 1:09
1

It might be slow compared to x[~numpy.isnan(x)].
– smm
Commented Aug 21, 2020 at 21:23
1

what is v and what is x?
– M_Idk392845
Commented Nov 21, 2022 at 21:12


Daniel Kislyuk · Accepted Answer · 2017-04-18 14:37:51Z

46

For me the answer by @jmetz didn't work; however, using pandas isnull() did.

import pandas as pd

x = x[~pd.isnull(x)]
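A sketch of a case where the NumPy approach fails but pd.isnull works, assuming an object-dtype array that mixes strings, NaN and None:

```python
import numpy as np
import pandas as pd

# An object-dtype array mixing strings with missing values
x = np.array(["a", np.nan, "b", None], dtype=object)

# np.isnan does not support object arrays and raises TypeError
isnan_failed = False
try:
    np.isnan(x)
except TypeError:
    isnan_failed = True

# pd.isnull handles object arrays and treats both NaN and None as missing
x = x[~pd.isnull(x)]
print(x)
```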

answered Apr 18, 2017 at 14:37

Daniel Kislyuk


2

or, if x is a pandas Series: x = x[x.notnull()]
– kbridge4096
Commented Jun 5, 2022 at 17:18
I am not fond of adding pandas to the pipeline, but the accepted solution gave me TypeError: ufunc 'isnan' not supported for the input types. It does not work with strings or object types. This solution did.
– Llohann
Commented Jun 23, 2023 at 7:32
Added benefit for this one is it removes NaTs out of the box
– Rafs
Commented Feb 16, 2024 at 15:53


liori · Accepted Answer · 2012-07-23 21:39:59Z

36

Try this:

import math
print([value for value in x if not math.isnan(value)])

For more, read up on list comprehensions.
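A self-contained sketch of the list comprehension, assuming x is a plain list of floats. math.isnan raises TypeError for non-numeric values such as strings, so this version suits numeric data only:

```python
import math

x = [1.0, 2.0, math.nan, 4.0, math.nan, 8.0]

# Keep every value for which math.isnan is False; the result is a new list
cleaned = [value for value in x if not math.isnan(value)]
print(cleaned)  # [1.0, 2.0, 4.0, 8.0]
```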

answered Jul 23, 2012 at 21:39

liori


6

If you're using NumPy, both my answer and the one by @lazy1 are almost an order of magnitude faster than the list comprehension; lazy1's solution is slightly faster (though it will also drop any infinite values).
– jmetz
Commented Jul 24, 2012 at 13:54
Don't forget the brackets :) print ([value for value in x if not math.isnan(value)])
– hypers
Commented Nov 22, 2017 at 16:09
If you're using NumPy as in the top answer, you can use the same list comprehension with np.isnan; it returns your list without the NaNs: [value for value in x if not np.isnan(value)]
– yeliabsalohcin
Commented Nov 23, 2018 at 14:09


M4urice · Accepted Answer · 2020-05-04 09:43:05Z

28

@jmetz's answer is probably the one most people need; however, it always yields a one-dimensional array, which makes it unsuitable for removing entire rows or columns from matrices.

To do so, one should reduce the logical array to one dimension, then index the target array. For instance, the following will remove rows which have at least one NaN value:

x = x[~numpy.isnan(x).any(axis=1)]

See more detail here.
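A sketch covering rows, columns, and the "only if every entry is NaN" variant, assuming small illustrative arrays. The axis passed to any/all is the one being collapsed: axis=1 gives one flag per row, axis=0 one flag per column.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [np.nan, 5.0, 6.0],
              [np.nan, 8.0, np.nan]])

nan_mask = np.isnan(x)

# Drop rows containing at least one NaN
rows_kept = x[~nan_mask.any(axis=1)]

# Drop columns containing at least one NaN
cols_kept = x[:, ~nan_mask.any(axis=0)]

# Drop only rows that are entirely NaN
y = np.array([[np.nan, np.nan],
              [1.0, np.nan]])
y_kept = y[~np.isnan(y).all(axis=1)]
```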

answered May 4, 2020 at 9:43

M4urice



Shashank Srivastava · Accepted Answer · 2020-01-09 13:55:35Z

10

As shown by others,

x[~numpy.isnan(x)]

works. However, it raises a TypeError if the array's dtype is not numeric, for example if it is object. In that case you can use pandas:

x[~pandas.isna(x)] or x[~pandas.isnull(x)]

edited Jan 9, 2020 at 13:55

Shashank Srivastava


answered Nov 25, 2017 at 12:55

koliyat9811



aloha · Accepted Answer · 2018-02-16 09:19:02Z

8

If you're using NumPy:

# first build a boolean mask that is True where the values are finite
ii = np.isfinite(x)

# then keep only those values (this drops NaN as well as +/-inf)
x = x[ii]

answered Feb 16, 2018 at 9:19

aloha



Markus Dutschke · Accepted Answer · 2019-03-16 06:37:23Z

The accepted answer flattens 2D arrays into 1D. I present a solution here using the pandas dropna() functionality. It works for 1D and 2D arrays; in the 2D case you can choose whether to drop the rows or the columns containing np.nan.

import pandas as pd
import numpy as np

def dropna(arr, *args, **kwarg):
    assert isinstance(arr, np.ndarray)
    dropped=pd.DataFrame(arr).dropna(*args, **kwarg).values
    if arr.ndim==1:
        dropped=dropped.flatten()
    return dropped

x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan, 1700])
y = np.array([[1400, 1500, 1600], [np.nan, 0, np.nan], [1700, 1800, np.nan]])


print('='*20+' 1D Case: ' +'='*20+'\nInput:\n',x,sep='')
print('\ndropna:\n',dropna(x),sep='')

print('\n\n'+'='*20+' 2D Case: ' +'='*20+'\nInput:\n',y,sep='')
print('\ndropna (rows):\n',dropna(y),sep='')
print('\ndropna (columns):\n',dropna(y,axis=1),sep='')

print('\n\n'+'='*20+' x[np.logical_not(np.isnan(x))] for 2D: ' +'='*20+'\nInput:\n',y,sep='')
print('\ndropna:\n',y[np.logical_not(np.isnan(y))],sep='')

Result:

==================== 1D Case: ====================
Input:
[1400. 1500. 1600.   nan   nan   nan 1700.]

dropna:
[1400. 1500. 1600. 1700.]


==================== 2D Case: ====================
Input:
[[1400. 1500. 1600.]
 [  nan    0.   nan]
 [1700. 1800.   nan]]

dropna (rows):
[[1400. 1500. 1600.]]

dropna (columns):
[[1500.]
 [   0.]
 [1800.]]


==================== x[np.logical_not(np.isnan(x))] for 2D: ====================
Input:
[[1400. 1500. 1600.]
 [  nan    0.   nan]
 [1700. 1800.   nan]]

dropna:
[1400. 1500. 1600.    0. 1700. 1800.]

Robin Teuwens · Accepted Answer · 2021-03-15 18:36:11Z

In case it helps, for simple 1d arrays:

import numpy as np

x = np.array([np.nan, 1, 2, 3, 4])

x[~np.isnan(x)]
# array([1., 2., 3., 4.])

but if you wish to expand to matrices and preserve the 2D shape:

x = np.array([
    [np.nan, np.nan],
    [np.nan, 0],
    [1, 2],
    [3, 4]
])

x[~np.isnan(x).any(axis=1)]
# array([[1., 2.],
#        [3., 4.]])

I encountered this issue when dealing with the pandas .shift() functionality, and I wanted to avoid using .apply(..., axis=1) at all costs due to its inefficiency.

melissaOu · Accepted Answer · 2016-06-23 20:35:51Z

6

Doing the above:

x = x[~numpy.isnan(x)]

or

x = x[numpy.logical_not(numpy.isnan(x))]

I found that reassigning to the same variable (x) did not remove the actual NaN values, and I had to use a different variable instead. Assigning the result to a different variable removed the NaNs, e.g.

y = x[~numpy.isnan(x)]

answered Jun 23, 2016 at 20:35

melissaOu


This is strange; according to the docs, boolean array indexing (which this is) falls under advanced indexing, which apparently "always returns a copy of the data", so you should be overwriting x with the new value (i.e. without the NaNs). Can you provide any more info as to why this could be happening?
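A sketch of what the docs imply, assuming a fresh session with a small float array: rebinding x to the boolean-indexed result does remove the NaNs, and the result does not share memory with the original. If the NaNs seem to persist, one possibility is that some other name still refers to the original array.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
original = x                           # a second reference to the same array

# Boolean indexing builds a new array; rebinding x points the name at it
x = x[~np.isnan(x)]

print(x)                               # [1. 3.]
print(np.isnan(x).any())               # False
print(np.shares_memory(x, original))   # False: x is a copy
print(original)                        # still contains the NaN
```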
– jmetz
Commented Mar 24, 2017 at 10:35


bitbang · Accepted Answer · 2020-12-18 10:59:17Z

2

If you only need to replace the NaNs rather than remove them, simply fill them:

x = numpy.array([
    [0.99929941, 0.84724713, -0.1500044],
    [-0.79709026, numpy.nan, -0.4406645],
    [-0.3599013, -0.63565744, -0.70251352]])

x[numpy.isnan(x)] = .555

print(x)

# [[ 0.99929941  0.84724713 -0.1500044 ]
#  [-0.79709026  0.555      -0.4406645 ]
#  [-0.3599013  -0.63565744 -0.70251352]]

edited Dec 18, 2020 at 10:59

answered Dec 18, 2020 at 10:08

bitbang



Darren Weber · Accepted Answer · 2022-05-19 16:50:42Z

pandas provides a way to represent missing values across all data types.

https://pandas.pydata.org/docs/user_guide/missing_data.html

The np.isnan() function does not support all data types; for example, it fails on strings:

>>> import numpy as np
>>> values = [np.nan, "x", "y"]
>>> np.isnan(values)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

The pd.isna() and pd.notna() functions are compatible with many data types and pandas introduces a pd.NA value:

>>> import numpy as np
>>> import pandas as pd

>>> values = pd.Series([np.nan, "x", "y"])
>>> values
0    NaN
1      x
2      y
dtype: object
>>> values.loc[pd.isna(values)]
0    NaN
dtype: object
>>> values.loc[pd.isna(values)] = pd.NA
>>> values.loc[pd.isna(values)]
0    <NA>
dtype: object
>>> values
0    <NA>
1       x
2       y
dtype: object

#
# using map with lambda, or a list comprehension
#

>>> values = [np.nan, "x", "y"]
>>> list(map(lambda x: pd.NA if pd.isna(x) else x, values))
[<NA>, 'x', 'y']
>>> [pd.NA if pd.isna(x) else x for x in values]
[<NA>, 'x', 'y']
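To actually remove the missing entries instead of replacing them, a sketch along the same lines uses pd.notna as a filter (the mixed list below is an illustrative assumption). pd.notna treats NaN, None, pd.NA and NaT all as missing:

```python
import numpy as np
import pandas as pd

values = [np.nan, "x", None, pd.NA, "y", pd.NaT]

# pd.notna is False for NaN, None, pd.NA and NaT, True for ordinary values
kept = [v for v in values if pd.notna(v)]
print(kept)  # ['x', 'y']
```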

Bruno Rodrigues de Oliveira · Accepted Answer · 2017-06-21 18:03:06Z

-2

The simplest way is:

numpy.nan_to_num(x)

Documentation: https://docs.scipy.org/doc/numpy/reference/generated/numpy.nan_to_num.html
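A sketch of what nan_to_num actually does, assuming a float64 array: it replaces values rather than removing them, so the length is unchanged. By default NaN becomes 0.0 and +/-inf become the largest/smallest finite float64; the nan= keyword (NumPy 1.17 and later) picks a different NaN replacement.

```python
import numpy as np

x = np.array([1.0, np.nan, np.inf, -np.inf])

# Default replacements: NaN -> 0.0, inf -> max finite float, -inf -> min finite float
replaced = np.nan_to_num(x)

# Choose the NaN replacement explicitly
custom = np.nan_to_num(x, nan=-1.0)

print(replaced)
print(custom)
```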

answered Jun 21, 2017 at 18:03

Bruno Rodrigues de Oliveira


6

Welcome to SO! The solution you propose does not answer the problem: your solution replaces NaNs with zero (and infinities with very large finite numbers), while the OP asked to remove the elements entirely.
– Pier Paolo
Commented Jun 21, 2017 at 18:49

