How to insert column of different type to numpy array?

Question

I would like to append two NumPy arrays, one of dtype np.datetime64 and one of dtype int, to one another.

This raises an error. What do I have to do to fix it?

Appending either array to itself works without error, i.e. np.append(c,c,axis=1) or np.append(a,a,axis=1).

numpy version: 1.14.3

import numpy as np
a = np.array([['2018-04-01T15:30:00'],
              ['2018-04-01T15:31:00'],
              ['2018-04-01T15:32:00'],
              ['2018-04-01T15:33:00'],
              ['2018-04-01T15:34:00']], dtype='datetime64[s]')
c = np.array([0,1,2,3,4]).reshape(-1,1)
c
Out[2]: 
array([[0],
       [1],
       [2],
       [3],
       [4]])
d = np.append(c,a,axis=1)
Traceback (most recent call last):
  File "/home/user/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-10548a83d1a2>", line 1, in <module>
    d = np.append(c,a,axis=1)
  File "/home/user/anaconda3/lib/python3.6/site-packages/numpy/lib/function_base.py", line 5166, in append
    return concatenate((arr, values), axis=axis)
TypeError: invalid type promotion

What dtype and shape were you expecting? Remember, a numpy array has only one dtype (though it may be compound).
– hpaulj, Commented Jan 20, 2019 at 17:02
While it's not the problem here, don't get in the habit of using np.append. It's a poorly named way of using np.concatenate.
– hpaulj, Commented Jan 20, 2019 at 17:10
Thanks for your advice regarding np.concatenate. Could you please explain to a numpy novice why np.concatenate is better than np.append? Thanks in advance!
– user7468395, Commented Jan 21, 2019 at 3:28
np.concatenate is the base function. Look at the code for np.append. It just tweaks the inputs (just 2) and calls concatenate. But more than that, people tend to misuse it, thinking it's just like the list append. It's NOT. There are several stack functions that also use concatenate. np.stack is perhaps the most useful of these. But you can look at their code as well.
– hpaulj, Commented Jan 21, 2019 at 3:52
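
To make the point in the comments concrete, here is a small sketch of how np.append relates to np.concatenate. The arrays x and y are made up for illustration and are not part of the question.

```python
import numpy as np

# Illustrative arrays (not from the question).
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6]])

# Without an axis, np.append flattens both inputs and returns a new 1-D array.
flat = np.append(x, y)
print(flat)  # [1 2 3 4 5 6]

# With an axis, np.append gives the same result as np.concatenate on two arrays.
appended = np.append(x, y, axis=0)
joined = np.concatenate((x, y), axis=0)
print(np.array_equal(appended, joined))  # True

# np.concatenate accepts any number of arrays in a single call.
many = np.concatenate((x, y, y), axis=0)
print(many.shape)  # (4, 2)

# Unlike list.append, neither function modifies x in place.
print(x.shape)  # (2, 2)
```

Because every call allocates a new array, calling np.append repeatedly in a loop is usually slower than collecting the pieces in a list and concatenating once.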

tel · Accepted Answer · 2019-01-20 19:39:26Z

Probably easiest - work with a Pandas `DataFrame` instead of an array

Truthfully, while Numpy arrays can be made to work with heterogeneous columns, they may not be what most users actually need in this case. For many use cases, you may be better off using a Pandas DataFrame. Here's how to convert your two columns to a DataFrame called df:

import numpy as np
import pandas as pd

a = np.array([['2018-04-01T15:30:00'],
              ['2018-04-01T15:31:00'],
              ['2018-04-01T15:32:00'],
              ['2018-04-01T15:33:00'],
              ['2018-04-01T15:34:00']], dtype='datetime64[s]')
c = np.array([0,1,2,3,4]).reshape(-1,1)


df = pd.DataFrame(dict(date=a.ravel(), val=c.ravel()))
print(df)
# output:
#                      date  val
#     0 2018-04-01 15:30:00    0
#     1 2018-04-01 15:31:00    1
#     2 2018-04-01 15:32:00    2
#     3 2018-04-01 15:33:00    3
#     4 2018-04-01 15:34:00    4

You can then work with each of your columns like so:

print(df['date'])
# output:
#     0   2018-04-01 15:30:00
#     1   2018-04-01 15:31:00
#     2   2018-04-01 15:32:00
#     3   2018-04-01 15:33:00
#     4   2018-04-01 15:34:00
#     Name: date, dtype: datetime64[ns]

DataFrame objects provide a ton of methods that make it pretty easy to analyze this kind of data. See the Pandas docs (or other QAs on this site) for more info about DataFrame objects.
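
As a rough sketch of what those methods look like in practice, the snippet below rebuilds the same df and runs a few common operations on it. The particular operations (filtering, indexing by date, the .dt accessor) are just examples I picked, not something the question asked for.

```python
import numpy as np
import pandas as pd

a = np.array(['2018-04-01T15:30:00', '2018-04-01T15:31:00',
              '2018-04-01T15:32:00', '2018-04-01T15:33:00',
              '2018-04-01T15:34:00'], dtype='datetime64[s]')
c = np.arange(5)
df = pd.DataFrame({'date': a, 'val': c})

# Filter rows with a boolean mask on the datetime column.
late = df[df['date'] >= pd.Timestamp('2018-04-01 15:32:00')]
print(late['val'].tolist())  # [2, 3, 4]

# Use the dates as the index, then select a time window by label
# (label slices with .loc include both endpoints).
indexed = df.set_index('date')
start = pd.Timestamp('2018-04-01 15:31:00')
stop = pd.Timestamp('2018-04-01 15:33:00')
window = indexed.loc[start:stop]
print(window['val'].sum())  # 6

# The .dt accessor exposes datetime components of the column.
minutes = df['date'].dt.minute.tolist()
print(minutes)  # [30, 31, 32, 33, 34]
```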

Numpy only solution - structured arrays

Generally, you should avoid arrays of dtype=object if you can. They cause performance issues with many of the basic Numpy operations (such as arithmetic, e.g. arr0 + arr1), and they may behave in ways you don't expect.

A better Numpy only solution is structured arrays. These arrays have a compound dtype, with one part per field (for the sake of this discussion, "field" is equivalent to "column", though you can do more interesting things with fields). Given your a and c arrays, here's how you can create a structured array:

# create the compound dtype
dtype = np.dtype(dict(names=['date', 'val'], formats=[arr.dtype for arr in (a, c)]))

# create an empty structured array
struct = np.empty(a.shape[0], dtype=dtype)

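# populate the structured array with the data from your column arrays
# (ravel flattens each (5, 1) column to shape (5,), matching the fields)
struct['date'], struct['val'] = a.ravel(), c.ravel()

# repr shows the dtype along with the records
print(repr(struct))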
# output:
#     array([('2018-04-01T15:30:00', 0), ('2018-04-01T15:31:00', 1),
#            ('2018-04-01T15:32:00', 2), ('2018-04-01T15:33:00', 3),
#            ('2018-04-01T15:34:00', 4)],
#           dtype=[('date', '<M8[s]'), ('val', '<i8')])

You can then access the specific columns by indexing them with their name (just like you could with the DataFrame):

print(struct['date'])
# output:
#     ['2018-04-01T15:30:00' '2018-04-01T15:31:00' '2018-04-01T15:32:00'
#      '2018-04-01T15:33:00' '2018-04-01T15:34:00']
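
Besides selecting a whole field, you can also index individual records and filter rows with a mask built from one field. The following is a minimal, self-contained sketch; the names dates and vals are placeholders standing in for your a and c columns.

```python
import numpy as np

# Placeholder 1-D columns standing in for a and c.
dates = np.array(['2018-04-01T15:30:00', '2018-04-01T15:31:00',
                  '2018-04-01T15:32:00', '2018-04-01T15:33:00',
                  '2018-04-01T15:34:00'], dtype='datetime64[s]')
vals = np.arange(5)

dtype = np.dtype([('date', dates.dtype), ('val', vals.dtype)])
struct = np.empty(dates.shape[0], dtype=dtype)
struct['date'] = dates
struct['val'] = vals

# Integer indexing returns one record; its fields are accessed by name.
first = struct[0]
print(first['val'])  # 0

# A boolean mask built from one field selects whole records.
mask = struct['date'] > np.datetime64('2018-04-01T15:32:00')
selected = struct[mask]
print(selected['val'])  # [3 4]
```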

Structured array pitfalls

You can't, for example, add two structured arrays:

# doesn't work
struct0 + struct1

but you can add the fields of two structured arrays:

# works great
struct0['val'] + struct1['val']

In general, the fields behave just like standard Numpy arrays.
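
Here is a small runnable version of that pitfall. struct0 and struct1 are built with made-up values, and I am assuming the failed addition surfaces as a TypeError (or a subclass of it), which is what I would expect from Numpy when no arithmetic loop exists for a structured dtype.

```python
import numpy as np

dtype = np.dtype([('date', 'datetime64[s]'), ('val', 'int64')])

# Two small structured arrays with made-up values.
struct0 = np.zeros(3, dtype=dtype)
struct1 = np.zeros(3, dtype=dtype)
struct0['val'] = [1, 2, 3]
struct1['val'] = [10, 20, 30]

# Adding whole structured arrays is not defined.
try:
    struct0 + struct1
    whole_array_add_failed = False
except TypeError as exc:
    whole_array_add_failed = True
    print('cannot add structured arrays:', exc)

# Adding matching fields works like ordinary array arithmetic.
total = struct0['val'] + struct1['val']
print(total)  # [11 22 33]

# The datetime field supports the usual datetime64 arithmetic as well.
shifted = struct0['date'] + np.timedelta64(60, 's')
print(shifted[0])  # 1970-01-01T00:01:00
```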

user7468395 · Accepted Answer · 2019-01-20 17:21:53Z

Based on the other users' comments, converting the first array to dtype object is at least a workaround.

import numpy as np
a = np.array([['2018-04-01T15:30:00'],
       ['2018-04-01T15:31:00'],
       ['2018-04-01T15:32:00'],
       ['2018-04-01T15:33:00'],
       ['2018-04-01T15:34:00']], dtype='datetime64[s]')
a = a.astype("object")
c = np.array([0,1,2,3,4]).reshape(-1,1)
d = np.append(a,c,axis=1)
d

array([[datetime.datetime(2018, 4, 1, 15, 30), 0],
       [datetime.datetime(2018, 4, 1, 15, 31), 1],
       [datetime.datetime(2018, 4, 1, 15, 32), 2],
       [datetime.datetime(2018, 4, 1, 15, 33), 3],
       [datetime.datetime(2018, 4, 1, 15, 34), 4]], dtype=object)
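
One thing to keep in mind with this workaround: each cell of the object array holds a plain Python object, so to get typed columns back you have to cast them again. A short sketch with a two-row version of the data (shortened only for brevity):

```python
import numpy as np

a = np.array([['2018-04-01T15:30:00'],
              ['2018-04-01T15:31:00']], dtype='datetime64[s]')
c = np.array([0, 1]).reshape(-1, 1)

# Same workaround as above, written with np.concatenate.
d = np.concatenate((a.astype(object), c), axis=1)
print(d.dtype)  # object

# The first column now holds datetime.datetime instances, not datetime64 values.
print(type(d[0, 0]).__name__)  # datetime

# Cast each column back to recover its original dtype.
dates = d[:, 0].astype('datetime64[s]')
vals = d[:, 1].astype(np.int64)
print(dates.dtype, vals.dtype)  # datetime64[s] int64
```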
