import sys

with open(sys.argv[1], 'r') as f:
    for line in f:
        for token in tokenize(line):  # tokenize is defined further below
            print(token)

Note that I renamed the file variable to f, to avoid shadowing the built-in file.

The Tokenizer class has two important methods. The first is the __iter__ method, which just returns self; this tells Python that the class is itself the iterator it iterates over. That matters when you nest iter calls, since iter(iter(tokenizer)) == iter(tokenizer).

Note that since we call iter on the output of the tokenize function, it is enough for tokenize to return an iterable (this can be a list, like here, but it can also be an iterator itself).
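
To illustrate that last point (this variant is my own sketch, not part of the original answer), tokenize could just as well return an iterator instead of a list, for example a generator expression built on re.finditer:

import re

def tokenize(line):
    # Returns an iterator (a generator expression) rather than a list;
    # calling iter() on it still works, since iter(iterator) is the iterator itself.
    return (match.group(0) for match in re.finditer(r"\S+", line))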

The tokenize function is then passed into the class, rather than being hard-coded:

def tokenize(line):
    return line.split()

class Tokenizer:
    def __init__(self, f, tokenize):
        self.f_it = iter(f)
        self.tokenize = tokenize
        self.token_it = None
    ...

Or, using inheritance:

class Tokenizer:
    ...
    def tokenize(self, line):
        raise NotImplementedError


class SplitTokenizer(Tokenizer):
    def tokenize(self, line):
        return line.split()
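
The ... elides the rest of the iterator protocol. As a rough sketch of how the composition variant could be completed (my assumption, not necessarily the original implementation), __iter__ returns self and __next__ pulls tokens line by line:

import io

class Tokenizer:
    def __init__(self, f, tokenize):
        self.f_it = iter(f)
        self.tokenize = tokenize
        self.token_it = None

    def __iter__(self):
        # The object is its own iterator, so iter(iter(t)) is iter(t).
        return self

    def __next__(self):
        while True:
            if self.token_it is None:
                # Advance to the next line; StopIteration from the file
                # iterator ends the whole token stream.
                self.token_it = iter(self.tokenize(next(self.f_it)))
            try:
                return next(self.token_it)
            except StopIteration:
                # The current line ran out of tokens, move on to the next line.
                self.token_it = None


# Example usage with an in-memory "file":
f = io.StringIO("first line\nsecond line\n")
print(list(Tokenizer(f, str.split)))  # ['first', 'line', 'second', 'line']

The inheritance variant would presumably be used the same way, just as SplitTokenizer(f) with the tokenize parameter dropped from __init__.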