1

Let's say I have a file that looks like this:

text a

bla bla

1 2 3   
4 5 6

text b

bla

7 8 9
10 11 12

text c

bla bla bla

13 14 15
16 17 18

I am trying to extract only the number arrays and place them into a numpy array:

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12],
       [13, 14, 15, 16, 17, 18]])

I tried np.genfromtxt('test.txt', usecols=[0, 1, 2], invalid_raise=False), which returns:

array([[  1.,   2.,   3.],
       [  4.,   5.,   6.],
       [  7.,   8.,   9.],
       [ 10.,  11.,  12.],
       [ nan,  nan,  nan],
       [ 13.,  14.,  15.],
       [ 16.,  17.,  18.]])

But this doesn't group the rows into sub-arrays, and it converts text into NaNs. Is there a better way of doing this?

  • Why is the 1 in the first line not included? Commented Apr 27, 2018 at 21:52
  • @chrisz: Because it's part of the text "text 1". I'm just interested in the number arrays after the "bla" Commented Apr 27, 2018 at 22:00
  • Read the lines as ordinary text, and pass the array lines to genfromtxt Commented Apr 27, 2018 at 22:17
  • or filter the bad rows out after parsing and reshape the rest. Commented Apr 27, 2018 at 22:54
  • @hpaulj: Thanks for your comment. Reading the text file with np.loadtxt raises an exception. Are you suggesting reading the file outside numpy? I don't understand the filtering out after parsing bit, could you maybe provide an answer to this post? Commented Apr 27, 2018 at 22:59
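
The "filter the bad rows out after parsing and reshape the rest" suggestion could look roughly like the sketch below. It starts from the array shown in the question (typed in by hand here so the snippet is self-contained) and assumes every block consists of exactly two rows of three numbers; the names parsed, valid and result are only illustrative.

```python
import numpy as np

# The genfromtxt output reported in the question, reproduced by hand.
parsed = np.array([
    [1., 2., 3.],
    [4., 5., 6.],
    [7., 8., 9.],
    [10., 11., 12.],
    [np.nan, np.nan, np.nan],
    [13., 14., 15.],
    [16., 17., 18.],
])

# Keep only the rows that contain no NaN at all.
valid = parsed[~np.isnan(parsed).any(axis=1)]

# Assumption: each block is two rows of three numbers, so six values per output row.
result = valid.reshape(-1, 6).astype(int)
print(result)
```

This only works if the surviving rows really come in pairs; a block with a different number of rows would make the reshape fail or silently misalign the groups.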

2 Answers

1

You could use itertools.groupby, along these lines:

>>> import itertools
>>> import numpy as np
>>> 
>>> content = """text a
... 
... bla bla
... 
... 1 2 3   
... 4 5 6
... 
... text b
... 
... bla
... 
... 7 8 9
... 10 11 12
... 
... text c
... 
... bla bla bla
... 
... 13 14 15
... 16 17 18"""
>>> 
>>> import io
>>> filelike = io.StringIO(content)

>>> # you may want to refine this test
>>> allowed_characters = set('0123456789 ')
>>> def isnumeric(line):
...     return set() < set(line.strip()) <= allowed_characters
... 
>>> [np.genfromtxt(gr) for k, gr in itertools.groupby(filelike, isnumeric) if k]
[array([[1., 2., 3.],
       [4., 5., 6.]]), array([[ 7.,  8.,  9.],
       [10., 11., 12.]]), array([[13., 14., 15.],
       [16., 17., 18.]])]
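
The list above holds one 2x3 array per block, whereas the question asks for one row of six numbers per block. A possible way to get there is to flatten each block and stack the results, as in the sketch below. It repeats the groupby approach so that it runs on its own; dtype=int, the list(gr) conversion and the names blocks and result are choices made for this illustration.

```python
import io
import itertools

import numpy as np

content = """text a

bla bla

1 2 3
4 5 6

text b

bla

7 8 9
10 11 12

text c

bla bla bla

13 14 15
16 17 18"""

allowed_characters = set('0123456789 ')

def isnumeric(line):
    # True for non-empty lines made up only of digits and spaces.
    return set() < set(line.strip()) <= allowed_characters

filelike = io.StringIO(content)

# One integer array per run of consecutive numeric lines.
blocks = [np.genfromtxt(list(gr), dtype=int)
          for k, gr in itertools.groupby(filelike, isnumeric) if k]

# Flatten each block into a single row and stack the rows.
result = np.array([block.ravel() for block in blocks])
print(result)
```

Stacking into a single 2-D array assumes all blocks hold the same number of values; with unequal blocks, keeping the list of arrays is the safer choice.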

0

You'll likely have to resort to a bit of "manual" parsing. Assuming the file has exactly the form given, here's one solution (there are surely others):

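import numpy as np

filename = 'test.txt'  # the file from the question

def parser(fname):
    # Yield only the number lines, relying on the fixed 7-line layout of each block.
    with open(fname) as fh:
        for i, line in enumerate(fh):
            # Within each block, the number rows sit at 0-based offsets 4 and 5.
            p = i % 7
            if p not in (4, 5):
                continue
            yield line.rstrip()

# Join all number lines into one string and parse it as whitespace-separated ints.
a = ' '.join(parser(filename))
arr = np.fromstring(a, dtype=int, sep=' ')
arr = arr.reshape((-1, 6))  # six numbers per block
print(arr)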
