Printing random lines from a file in Python

Question

Is this the most efficient solution or can the performance be improved? Just opens a file and prints random words (lines with newlines stripped out).

import random

filename = "20k.txt"

def main(filename):
    words_file = open(filename, 'r')

    words = words_file.read().split("\n")
    temp_words = []
    for word in words:
        temp_words.append(word.strip("\r"))
    words = temp_words

    while True:
        length = int(raw_input("Words:"))
        for i in range(0,length):
            print(words[random.randrange(0,len(words)+1)]),
        print("\n")


if __name__ == "__main__":
    main(filename)

Graipher · Accepted Answer · 2016-10-04 16:30:58Z

3

First, you should always close your files. Or, use the with keyword, which takes care of this for you.

Then, the 'r' mode is already the default mode for opening a file. Also, consider storing your words in a set, instead of a list (gets rid of duplicate words, which might be wanted, or not if you want a higher probability for some words).

You should consider using random.choice to get a random element from an iterable (actually an iterable with len defined).

I would also separate the file reading into a separate method. and make it more explicit what the prompt actually means (the number of words to generate).

I would use str.join and give it a generator expression rather than multiple calls to print.

import random

def get_words(file_name):
    with open(filename) as word_file:
        return list(set(word.strip() for word in word_file))


def random_words(n, words)
    return " ".join(random.choice(words) for _ in range(n))


def main(file_name):
    words = get_words(file_name)
    while True:
        n = int(raw_input("Number of words:"))
        print(random_words(n, words))

if __name__ == "__main__":
    file_name = "20k.txt"
    main(file_name)

edited Oct 4, 2016 at 16:30

answered Oct 4, 2016 at 16:12

Graipher

41.3k7 gold badges70 silver badges133 bronze badges

\$\begingroup\$ Right, I forgot about the close. Using a set is a better option, yes. Didn't know about random.choice. Thanks a lot. \$\endgroup\$
– Samuel Shifterovich
Commented Oct 4, 2016 at 16:19
\$\begingroup\$ Your code ran: TypeError: 'set' object does not support indexing \$\endgroup\$
– Samuel Shifterovich
Commented Oct 4, 2016 at 16:23
\$\begingroup\$ What does .strip() do without any arguments? \$\endgroup\$
– Samuel Shifterovich
Commented Oct 4, 2016 at 16:27
\$\begingroup\$ @SamuelShifterovich It strips away whitespace from the edges, including all kinds of line-breaks. So " asgsg \n\r\n".strip() == 'asgsg'. \$\endgroup\$
– Graipher
Commented Oct 4, 2016 at 16:28
\$\begingroup\$ Your answer contains words_file = open(filename) and with open(filename) as word_file:. I guess the first line is a mistake? \$\endgroup\$
– Samuel Shifterovich
Commented Oct 4, 2016 at 16:30

| Show 2 more comments

Artem Gromov · Accepted Answer · 2016-10-05 12:55:37Z

1

You would greatly improve performance if you stop reading the whole 20k lines file into memory. The difference for 100 executions (3 words per execution) is 0.018 vs 0.793 seconds.

I suggest you to use word_file.seek method instead. It manipulates stream position to read just the word you need, not the whole file.

To get random position, use random.randint(0, EOF).

Details are here: https://github.com/artgromov/CorrectHorse.

I have made pull request to your repo.

edited Oct 5, 2016 at 12:55

answered Oct 5, 2016 at 11:53

Artem Gromov

114 bronze badges

\$\begingroup\$ I should've specified it's Python 2.7 \$\endgroup\$
– Samuel Shifterovich
Commented Oct 5, 2016 at 13:20

Add a comment |

Stack Exchange Network

Printing random lines from a file in Python

2 Answers 2

Hot Network Questions

Printing random lines from a file in Python

2 Answers 2

Related

Hot Network Questions