For Loop over a list in Python

Question

I have a train_file.txt which has 3 columns on each row.

For example;

I am reading this txt file with

train_data = open("train_file.txt", 'r').readlines()

Then I try to get each value with a for loop:

for eachline in train_data:
    uid, lid, x = eachline.strip().split()

Question: The training data is a huge file, so I want to get only the first 1000 rows.

I tried the following code, but I get an error ('list' object cannot be interpreted as an integer):

for eachline in range(train_data,1000)
        uid, lid, x = eachline.strip().split()

user2390182 · Accepted Answer · 2020-12-23 10:44:40Z

6

It is not necessary to read the entire file at all. You could use enumerate on the file directly and break early or use itertools.islice:

from itertools import islice

train_data = list(islice(open("train_file.txt", 'r'), 1000))

You can also keep using the same file handle to read more data later:

f = open("train_file.txt", 'r')
train_data = list(islice(f, 1000)) # reads first 1000
test_data = list(islice(f, 100))   # reads next 100
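A minimal sketch of the same idea inside a with block, so the handle is closed afterwards. Here io.StringIO stands in for open("train_file.txt", 'r'), and the sample rows are made up for illustration:

```python
import io
from itertools import islice

# Stand-in for open("train_file.txt", 'r'); the rows are made-up sample data.
sample = io.StringIO("1 10 0.5\n2 20 0.7\n3 30 0.1\n4 40 0.9\n5 50 0.3\n")

with sample as f:
    first = list(islice(f, 3))   # consumes lines 1-3 only
    rest = list(islice(f, 10))   # continues at line 4 and stops at end of file

# Split each line into its three columns (still strings at this point).
rows = [line.split() for line in first]
```

Asking islice for more lines than remain is not an error; it simply stops when the file is exhausted.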

U13-Forward · Accepted Answer · 2020-12-23 10:41:30Z

2

Maybe try changing this line:

train_data = open("train_file.txt", 'r').readlines()

To:

train_data = open("train_file.txt", 'r').readlines()[:1000]

buran · Accepted Answer · 2020-12-23 10:58:13Z

2

train_data is a list, use slicing: for eachline in train_data[:1000]:
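The error in the question comes from range(), which only accepts integers, so range(train_data, 1000) fails when given the list. A short sketch of the slicing fix, using a made-up list in place of the real train_data:

```python
# Made-up stand-in for the list returned by readlines().
train_data = ["1 10 5\n", "2 20 3\n", "3 30 4\n"]

parsed = []
for eachline in train_data[:1000]:   # a slice is safe even if the list is shorter
    uid, lid, x = eachline.strip().split()
    parsed.append((uid, lid, x))

# Index-based equivalent, if an index is really needed:
by_index = [train_data[i] for i in range(min(1000, len(train_data)))]
```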

As the file is "huge" in your words, a better approach is to read just the first 1000 rows (readlines() reads the whole file into memory):

with open("train_file.txt", 'r') as f:
    train_data = []
    for idx, line in enumerate(f, start=1):
        train_data.append(line.strip().split())
        if idx == 1000:
            break

Note that the values will be str, not int. You probably want to convert them to int.
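A possible way to do that conversion, assuming every column really is an integer (the question does not say; use float instead for a column with decimals). The sample lines are made up:

```python
# Made-up sample lines; assumes all three columns hold integers.
lines = ["1 10 5\n", "2 20 3\n"]

train_data = []
for line in lines:
    # split() with no argument also discards the trailing newline.
    uid, lid, x = map(int, line.split())
    train_data.append((uid, lid, x))
```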

edited Dec 23, 2020 at 10:58

answered Dec 23, 2020 at 10:42

Mathieu · Accepted Answer · 2020-12-23 10:41:14Z

1

You could use enumerate and a break:

for k, line in enumerate(lines):
    if k >= 1000:
        break  # exit the loop once 1000 lines have been processed

    # do stuff on the line

Hugh · Accepted Answer · 2020-12-29 10:36:28Z

I would recommend using the built-in csv module, since the data is CSV-like (or pandas, if you're already using it), together with a with block. Something like this:

import csv
from itertools import islice

with open('./test.csv', 'r') as input_file:
  csv_reader = csv.reader(input_file, delimiter=' ')
  rows = list(islice(csv_reader, 1000))

# Use rows
print(rows)

You don't need it right now, but it makes escaped characters and multiline entries much easier to parse. Also, if the file has a header row, you can use csv.DictReader to key each row by column name.
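A rough sketch of the csv.DictReader variant. The header names (uid, lid, x) and the values are assumptions for illustration, and io.StringIO stands in for the opened file:

```python
import csv
import io
from itertools import islice

# Stand-in for a file with a header row; column names and values are made up.
sample = io.StringIO("uid lid x\n1 10 5\n2 20 3\n3 30 4\n")

reader = csv.DictReader(sample, delimiter=' ')
rows = list(islice(reader, 2))   # first 2 data rows; the header is not counted

first_uid = rows[0]["uid"]       # values are strings, keyed by column name
```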

Regarding your original code:

The call to readlines() reads every line at that point, so any filtering done afterwards does not reduce how much is read.
If you did read it that way, your for loop should look like this to get the first 1000 lines:

for eachline in train_data[:1000]:
  ...
