
I have the following lines of code:

import numpy as np
data = np.genfromtxt(path...,delimiter=',',dtype=None)

This returns an array of lists. Is there an easy way to get a matrix from a csv so I can use operations like data[:,:3] to get the first 3 columns of the matrix?

I've tried (data[1])[:3] to get the first 3 entries of the 2nd row but I get the following error:

invalid index

I'm really confused because if I just copy the 2nd row from the file and then do (copiedata)[:3] things work.

So my question has two parts:

  1. Can I import a CSV (with strings and numbers) as an array of arrays?
  2. Why does (data[1])[:3] raise an "invalid index" error?
  • I believe that data.shape returns the dimensions of a nested array, so here it would return (number of rows,) because the entries of this array are lists, not arrays. I wrote (data[1])[:3] in the belief that something[index] is overloaded: inside the parentheses, [1] acts on an array, and outside, [:3] acts on a list. I did this deliberately, since I don't think [1,:3] acts on a meaningful object (due to the way shape behaves). Commented Oct 12, 2013 at 1:31

2 Answers


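This is an answer to part 1: read the file with the csv module and build the array from the rows.

import csv
import numpy as np

# path is the location of the CSV file, as in the question
data = []
with open(path, newline='') as file:  # on Python 2, use open(path, 'rb')
    reader = csv.reader(file)
    for row in reader:
        data.append(row)  # each row is a list of strings

data = np.array(data)  # 2-D array of strings, so data[:,:3] works

But I'm still vexed about part 2.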


2 Comments

What is the shape of data?
With the code in the original post it is (4177,), and with the code in my answer it is (4177, 8).
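
The one-axis shape reported here is consistent with genfromtxt returning a structured array when dtype=None meets mixed text and numeric columns. In that case each row is a record with named fields, not a list, which would explain why slicing a row with [:3] misbehaves. The sketch below is an illustration under that assumption; the three-row sample and the variable names are made up, since the real file is not shown.

```python
import io
import numpy as np

# Hypothetical three-row sample standing in for the real file.
sample = io.StringIO("M,0.455,15\nF,0.53,9\nI,0.33,7\n")

# dtype=None lets genfromtxt pick a type per column; with mixed columns the
# result is a 1-D structured array holding one record per row.
data = np.genfromtxt(sample, delimiter=',', dtype=None, encoding='utf-8')

print(data.shape)        # a single axis: (number of rows,)
print(data.dtype.names)  # auto-generated field names, one per column

# Columns are selected by field name rather than by a second index.
first_col = data['f0']
second_row_first_field = data[1]['f0']

# To slice rows and columns by position, one option is an object array.
table = np.array(data.tolist(), dtype=object)
first_two_cols = table[:, :2]
```

The object array keeps each value's Python type (str, float, int) at the cost of NumPy's fast numeric operations, so it suits inspection and slicing more than arithmetic.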

What does your data look like? Maybe you should use np.loadtxt().
CSV file:

0,1,2,3,4,5,6,7,8,9
10,11,12,13,14,15,16,17,18,19
20,21,22,23,24,25,26,27,28,29
30,31,32,33,34,35,36,37,38,39
40,41,42,43,44,45,46,47,48,49
50,51,52,53,54,55,56,57,58,59
60,61,62,63,64,65,66,67,68,69
70,71,72,73,74,75,76,77,78,79
80,81,82,83,84,85,86,87,88,89
90,91,92,93,94,95,96,97,98,99

Load into an array and index into it:

>>> a = np.loadtxt('data.csv', delimiter = ',')
>>> a
array([[  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.,  25.,  26.,  27.,  28.,  29.],
       [ 30.,  31.,  32.,  33.,  34.,  35.,  36.,  37.,  38.,  39.],
       [ 40.,  41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.,  49.],
       [ 50.,  51.,  52.,  53.,  54.,  55.,  56.,  57.,  58.,  59.],
       [ 60.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,  69.],
       [ 70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.],
       [ 80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.],
       [ 90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99.]])
>>> a[1]
array([ 10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.])
>>> a[1][:3]
array([ 10.,  11.,  12.])
>>> a[1,:3]
array([ 10.,  11.,  12.])

1 Comment

There are numbers and strings in the matrix, and loadtxt confuses the types. I guess I should have mentioned that.
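
If mixed types are the obstacle, one possible workaround is to have loadtxt read every field as a string and convert the numeric columns afterwards. This is only a sketch: the sample rows are hypothetical, and it assumes the first column is text and the remaining columns are numeric.

```python
import io
import numpy as np

# Hypothetical sample with one text column followed by numeric columns.
sample = io.StringIO("M,0.455,15\nF,0.53,9\nI,0.33,7\n")

# dtype=str keeps every field as text, so the result is an ordinary 2-D array.
raw = np.loadtxt(sample, delimiter=',', dtype=str)

# Positional slicing now works on both axes.
first_two = raw[:, :2]

# Convert only the columns assumed to be numeric.
numeric = raw[:, 1:].astype(float)
```

The trade-off is that the text and numeric parts end up in separate arrays, which keeps slicing simple but means the two must be kept aligned by row.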