Attempting to convert to numeric and then checking for nulls won't work: almost all data files have missing numeric values, which will appear as NA. Data-loading functions like read_csv generate NaN for every empty field and for the common NaN markers.
By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘<NA>’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘None’, ‘n/a’, ‘nan’, ‘null’.
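For example, a tiny CSV with one empty field and one "N/A" marker (a minimal sketch with made-up data):

import io
import pandas as pd

csv_text = "a,b\n1,\n2,N/A\n3,4\n"
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
#    a    b
# 0  1  NaN
# 1  2  NaN
# 2  3  4.0
print(df_csv.dtypes)  # a: int64, b: float64 (NaN forces float)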
Besides, converting all the values in a series and then checking whether any failed does the same job twice. Pandas has built-in methods to detect or convert types that stop immediately if a conversion fails.
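For instance, pd.to_numeric fails fast on the first value it cannot parse, or can flag bad values as NaN instead (a minimal sketch; the series here is made up for illustration):

import pandas as pd

# errors="raise" (the default) stops at the first value that cannot be parsed;
# errors="coerce" turns unparseable entries into NaN instead
s = pd.Series(["1", "2", "oops"], dtype="object")
pd.to_numeric(s, errors="coerce")
# 0    1.0
# 1    2.0
# 2    NaN
# dtype: float64
# pd.to_numeric(s)  # would raise ValueError at "oops"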
One option is infer_objects, which tries to detect the best type for each object Series. Another option is convert_dtypes, which tries to find the best (nullable) dtype for the values.
Using this dataframe, where everything is object:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": pd.Series([1, 2, 3], dtype=np.dtype("O")),
        "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
        "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
        "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
        "e": pd.Series([10, np.nan, 20], dtype=np.dtype("O")),
        "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("O")),
    }
)
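For reference, every column starts out as object:

df.dtypes
-----------------------
a    object
b    object
c    object
d    object
e    object
f    object
dtype: object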
infer_objects() produces these types:
df_i = df.infer_objects()
df_i.dtypes
-----------------------
a int64
b object
c object
d object
e float64
f float64
dtype: object
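infer_objects only performs a soft conversion: b, c and d cannot be represented by a more specific NumPy dtype (strings stay object, and the boolean column contains a NaN), so they are left as object, while the NaN in e and f forces float64 rather than an integer dtype. For example:

df_i["e"]
0    10.0
1     NaN
2    20.0
Name: e, dtype: float64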
convert_dtypes, on the other hand, goes further:
df_c = df.convert_dtypes()
df_c.dtypes
------------------------
a Int64
b string
c boolean
d string
e Int64
f Float64
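The capitalised Int64/Float64, boolean and string dtypes are pandas' nullable extension types, so missing values become pd.NA and integers are no longer forced up to float:

df_c["e"]
0      10
1    <NA>
2      20
Name: e, dtype: Int64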
Finally, it isn't clear what you're after: are you trying to convert object or string columns to float, or to determine the contents of mixed columns? Why not use astype or convert_dtypes? Please post an example of what you want.
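In the meantime, if the target types are already known, astype is the most direct route (a minimal sketch against the df defined above):

# Explicit casts when the intended type of each column is already known
df["a"].astype("int64")      # object -> int64
df["f"].astype("float64")    # object -> float64, np.nan is kept
df["d"].astype("string")     # object -> pandas string dtype, NaN becomes <NA>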