I have a .dat file with about 1000 lines. Each line contains

letter int int int int boolean

and I am trying to read it in so that each line becomes a new row of my array. Currently I have np.genfromtxt('myfile.dat'), which gives me

nan 23. 34. 23. 55. 1.

This is almost right, but that nan should be the letter 't'. Any idea how I can read in the correct letter? Also, how do I get rid of the . after each number? Cheers

1 Answer

One way is defining a new dtype. For example:

import numpy as np

desc = np.dtype([('letter', 'S1'), ('v1', float), ('v2', float),
                 ('v3', float), ('v4', float)])

and use it in genfromtxt (fobj is the file name or an open file object):

data = np.genfromtxt(fobj, dtype=desc)

This file content:

x 23. 34. 23. 55. 1.
y 23. 34. 23. 55. 1.

would give you this data:

array([(b'x', 23.0, 34.0, 23.0, 55.0), (b'y', 23.0, 34.0, 23.0, 55.0)], 
      dtype=[('letter', 'S1'), ('v1', '<f8'), ('v2', '<f8'), ('v3', '<f8'), ('v4', '<f8')])

This is a structured array. You can access one line:

>>> data[0]
(b'x', 23.0, 34.0, 23.0, 55.0)

or one column:

>>> data['letter']
array([b'x', b'y'],
      dtype='|S1')

or one entry:

>>> data[0][1]
23.0
>>> data['v1'][1]
23.0
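
The dtype above names five float fields, while the question describes six columns and asks for numbers without the trailing dot. A minimal sketch of one way to cover that, assuming the file holds plain integers and a 0/1 flag in the last column (the sample text and the field name 'flag' are made up for illustration):

```python
import io
import numpy as np

# Hypothetical sample in the layout the question describes:
# letter int int int int flag (0 or 1)
sample = io.StringIO("t 23 34 23 55 1\nf 10 20 30 40 0\n")

# Integer fields avoid the trailing "." that the default float dtype shows.
desc = np.dtype([('letter', 'U1'), ('v1', int), ('v2', int),
                 ('v3', int), ('v4', int), ('flag', int)])

data = np.genfromtxt(sample, dtype=desc)

# Read the last column as an integer, then convert it to booleans.
flags = data['flag'].astype(bool)

print(data['letter'])
print(data['v1'])
print(flags)
```

The flag is read as an integer and converted afterwards because genfromtxt may not accept 0/1 text directly for a bool field.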

7 Comments

This is working, but the file contains 2000 lines, and when I check the shape I get (2000l,). This doesn't seem right; isn't there meant to be another number in there? Cheers
shape (2001,) or (20001,) ?
Try dtype=None. Either way the result will be a 1d structured array, one record per line.
Mike, it gives (2000L,), with the letter L; not sure what that means. dtype=None doesn't work.
2000L is a long integer. Nothing to worry about. Looks like you are on Windows using Python 2.
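
To illustrate why the shape has only one number: a structured array holds one record per line, and the columns are part of the dtype rather than a second axis. A small sketch, assuming a recent NumPy where genfromtxt accepts the encoding argument, and using made-up sample data:

```python
import io
import numpy as np

# Hypothetical two-line sample standing in for the real file.
sample = io.StringIO("t 23 34 23 55 1\nf 10 20 30 40 0\n")

# dtype=None lets genfromtxt infer a type per column;
# encoding=None keeps text columns as str instead of bytes.
data = np.genfromtxt(sample, dtype=None, encoding=None)

print(data.shape)        # one entry per line of the file
print(data.dtype.names)  # the columns live in the dtype, not in the shape

# Stack the numeric fields to get an ordinary 2-D array if one is needed.
numeric = np.column_stack([data[name] for name in data.dtype.names[1:]])
print(numeric.shape)
```

With 2000 lines the first shape would be (2000,), and the stacked numeric array would have five columns.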