2
\$\begingroup\$

Is this the most efficient solution or can the performance be improved? Just opens a file and prints random words (lines with newlines stripped out).

import random

filename = "20k.txt"

def main(filename):
    words_file = open(filename, 'r')

    words = words_file.read().split("\n")
    temp_words = []
    for word in words:
        temp_words.append(word.strip("\r"))
    words = temp_words

    while True:
        length = int(raw_input("Words:"))
        for i in range(0,length):
            print(words[random.randrange(0,len(words)+1)]),
        print("\n")


if __name__ == "__main__":
    main(filename)
\$\endgroup\$

2 Answers 2

3
\$\begingroup\$

First, you should always close your files. Or, use the with keyword, which takes care of this for you.

Then, the 'r' mode is already the default mode for opening a file. Also, consider storing your words in a set, instead of a list (gets rid of duplicate words, which might be wanted, or not if you want a higher probability for some words).

You should consider using random.choice to get a random element from an iterable (actually an iterable with len defined).

I would also separate the file reading into a separate method. and make it more explicit what the prompt actually means (the number of words to generate).

I would use str.join and give it a generator expression rather than multiple calls to print.

import random

def get_words(file_name):
    with open(filename) as word_file:
        return list(set(word.strip() for word in word_file))


def random_words(n, words)
    return " ".join(random.choice(words) for _ in range(n))


def main(file_name):
    words = get_words(file_name)
    while True:
        n = int(raw_input("Number of words:"))
        print(random_words(n, words))

if __name__ == "__main__":
    file_name = "20k.txt"
    main(file_name)
\$\endgroup\$
7
  • \$\begingroup\$ Right, I forgot about the close. Using a set is a better option, yes. Didn't know about random.choice. Thanks a lot. \$\endgroup\$ Commented Oct 4, 2016 at 16:19
  • \$\begingroup\$ Your code ran: TypeError: 'set' object does not support indexing \$\endgroup\$ Commented Oct 4, 2016 at 16:23
  • \$\begingroup\$ What does .strip() do without any arguments? \$\endgroup\$ Commented Oct 4, 2016 at 16:27
  • \$\begingroup\$ @SamuelShifterovich It strips away whitespace from the edges, including all kinds of line-breaks. So " asgsg \n\r\n".strip() == 'asgsg'. \$\endgroup\$
    – Graipher
    Commented Oct 4, 2016 at 16:28
  • \$\begingroup\$ Your answer contains words_file = open(filename) and with open(filename) as word_file:. I guess the first line is a mistake? \$\endgroup\$ Commented Oct 4, 2016 at 16:30
1
\$\begingroup\$

You would greatly improve performance if you stop reading the whole 20k lines file into memory. The difference for 100 executions (3 words per execution) is 0.018 vs 0.793 seconds.

I suggest you to use word_file.seek method instead. It manipulates stream position to read just the word you need, not the whole file.

To get random position, use random.randint(0, EOF).

Details are here: https://github.com/artgromov/CorrectHorse.

I have made pull request to your repo.

\$\endgroup\$
1
  • \$\begingroup\$ I should've specified it's Python 2.7 \$\endgroup\$ Commented Oct 5, 2016 at 13:20

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.