13

I have a NumPy array of size 31x36 and I want to convert it into a pandas DataFrame in order to process it. I am trying to convert it using the following code:

pd.DataFrame(data=matrix,
          index=np.array(range(1, 31)),
          columns=np.array(range(1, 36)))

However, I am receiving the following error:

ValueError: Shape of passed values is (36, 31), indices imply (35, 30)

How can I solve the issue and transform it properly?

1
  • pd.DataFrame(matrix) would work, no? To use custom index, this is another option: pd.DataFrame(arr).rename(index=lambda x: x + 1, columns=lambda x: x + 1)
    – cs95
    Commented Jun 7, 2019 at 15:34

3 Answers

8

As to why what you tried failed: the ranges are off by 1. With the stop values corrected:

pd.DataFrame(data=matrix,
          index=np.array(range(1, 32)),
          columns=np.array(range(1, 37)))

The stop value isn't included in the range, so range(1, 32) yields 1 through 31 and range(1, 37) yields 1 through 36.
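A quick way to see the off-by-one is to count the labels each range produces. This is a minimal sketch: the zero-filled matrix is only a stand-in for the 31x36 array in the question, and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Placeholder data with the same shape as the array in the question.
matrix = np.zeros((31, 36))

short_index = range(1, 31)  # 1..30, only 30 labels for 31 rows
full_index = range(1, 32)   # 1..31, one label per row

# Too few labels: pandas rejects the mismatch with a ValueError.
try:
    pd.DataFrame(matrix, index=short_index, columns=range(1, 36))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Matching label counts: construction succeeds.
df = pd.DataFrame(matrix, index=full_index, columns=range(1, 37))
```

Checking len(short_index) against matrix.shape[0] before building the frame catches this kind of mismatch early.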

Actually, looking at what you're doing, you could've just used np.arange:

pd.DataFrame(data=matrix,
          index=np.arange(1, 32),
          columns=np.arange(1, 37))

Or in pure pandas:

pd.DataFrame(data=matrix,
          index=pd.RangeIndex(1, 32),
          columns=pd.RangeIndex(1, 37))

Also, if you don't specify the index and columns params, pandas auto-generates both, starting from 0. It's unclear why you need them to start from 1.

You could also have not passed the index and column params and just modified them after construction:

In[9]:
df = pd.DataFrame(adaption)
df.columns = df.columns+1
df.index = df.index + 1
df

Out[9]: 
          1         2         3         4         5         6
1 -2.219072 -1.637188  0.497752 -1.486244  1.702908  0.331697
2 -0.586996  0.040052  1.021568  0.783492 -1.263685 -0.192921
3 -0.605922  0.856685 -0.592779 -0.584826  1.196066  0.724332
4 -0.226160 -0.734373 -0.849138  0.776883 -0.160852  0.403073
5 -0.081573 -1.805827 -0.755215 -0.324553 -0.150827 -0.102148
5

You get an error because the end argument in range(start, end) is non-inclusive. You have a couple of options to account for this:

Don't pass index and columns

Just use df = pd.DataFrame(matrix). The pd.DataFrame constructor adds integer indices implicitly.
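As a rough illustration of what the implicit labels look like, here is a small sketch. The array contents are arbitrary placeholder values; only the 31x36 shape comes from the question.

```python
import numpy as np
import pandas as pd

# Arbitrary placeholder values; only the shape matters here.
matrix = np.arange(31 * 36).reshape(31, 36)

# No index/columns passed: pandas generates zero-based integer labels.
df = pd.DataFrame(matrix)

first_row_label = df.index[0]
last_row_label = df.index[-1]
last_col_label = df.columns[-1]
```

The labels run from 0 to 30 for rows and 0 to 35 for columns, so no manual counting is needed.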

Pass in the shape of the array

matrix.shape gives a tuple of row and column count, so you need not specify them manually. For example:

df = pd.DataFrame(matrix, index=range(matrix.shape[0]),
                          columns=range(matrix.shape[1]))

If you need to start at 1, remember to add 1:

df = pd.DataFrame(matrix, index=range(1, matrix.shape[0] + 1),
                          columns=range(1, matrix.shape[1] + 1))
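If this comes up often, the shape-based approach could be wrapped in a small helper. The function name to_one_based_frame is hypothetical (it is not part of pandas), and the sketch assumes a 2-D array.

```python
import numpy as np
import pandas as pd


def to_one_based_frame(array):
    """Build a DataFrame whose row and column labels start at 1.

    Hypothetical helper for illustration; assumes `array` is 2-D.
    """
    n_rows, n_cols = array.shape
    return pd.DataFrame(
        array,
        index=pd.RangeIndex(1, n_rows + 1),    # 1..n_rows inclusive
        columns=pd.RangeIndex(1, n_cols + 1),  # 1..n_cols inclusive
    )


# Placeholder data with the shape from the question.
matrix = np.zeros((31, 36))
df = to_one_based_frame(matrix)
```

Because the label counts are derived from array.shape, the same call works unchanged if the array's dimensions change later.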
1
  • Downvoter care to comment? Using matrix properties directly is the natural solution here.
    – jpp
    Commented Jun 3, 2019 at 21:21
1

In addition to the above answer, range(1, X) describes the numbers from 1 up to and including X-1. You need to use range(1, 32) and range(1, 37) to do what you describe.
