My code ported to that pattern follows. That version of the code also includes proper handling of expression termination, which was missing in my initial code.
Good alternative: a class that implements the expected interface
OPERATORS = '+', '-', '*', '/'


class Tokenizer:
    def __init__(self, expression):
        self.token = None
        self.char = None  # event currently being interpreted
        self.char_consumed = True
        self.finished = False
        self.state = Tokenizer._state_none
        self.expression = iter(expression)

    def __iter__(self):
        return self

    # State functions: each returns a token, or None if none is ready yet.
    def _state_none(self, c):
        if c.isdecimal():
            self.token = c
            self.state = Tokenizer._state_number
        elif c in OPERATORS:
            return 'operator', c

    def _state_number(self, c):
        if c.isdecimal():
            self.token += c
        else:
            # The character ends the number but is not part of it:
            # leave it unconsumed so that the next state handles it.
            self.char_consumed = False
            self.state = Tokenizer._state_none
            return 'number', self.token

    def _interpret_character(self, c):
        self.char_consumed = True
        return self.state(self, c)

    def __next__(self):
        while not self.finished:
            if self.char_consumed:
                # Fetch the next event; '' is the termination event.
                self.char = next(self.expression, '')
            token = self._interpret_character(self.char)
            # An unconsumed event is kept and interpreted again,
            # even across calls to __next__().
            self.finished = self.char == '' and self.char_consumed
            if token:
                return token
        raise StopIteration


def main():
    for x in Tokenizer('15+ 2 * 378 / 5'):
        print(x)
    # ('number', '15')
    # ('operator', '+')
    # ('number', '2')
    # ('operator', '*')
    # ('number', '378')
    # ('operator', '/')
    # ('number', '5')


if __name__ == '__main__':
    main()
I have also made a few other minor improvements on my previous solutions.
The main difference with 200_success' solution is that my states do not take care of fetching the events. The code driving the FSM does that instead.
Other thoughts
My question was really not about finite state machine design, and I feel that I am somewhat punished for making the effort of providing a realistic example instead of a dummy one.
Since there has been some debate in the comments about good FSM design, I will give a few opinions:
- I like to keep my state functions clean. They should be a description of how the FSM reacts to events, not a handling of how the events come in. I.e. in this case, they should not have to take care of fetching the input characters. That also means that the code driving the FSM has to provide input that represents all possible events, including the end-of-expression event (i.e. a valid character terminator or a symbolic representation of the end). This is not a hack. This is a valid design decision.
- In fact, I don't even like the idea of having the state functions return anything, except possibly the new state, in some designs. Since the state machine in this case produces tokens, I made a compromise and let the state functions return tokens (or nothing) as proposed by 200_success, since it saves a flag and gives better looking code.
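To make the two points above concrete, here is a minimal sketch of the stricter design, in which state functions return only the next state and tokens go through a side channel. The names `Machine`, `tokenize_all` and `END` are hypothetical and chosen for illustration only; this is not the code of this answer, just one way such a design could look.

```python
from itertools import chain

OPERATORS = '+', '-', '*', '/'
END = ''  # hypothetical symbolic end-of-expression event


class Machine:
    """Hypothetical FSM whose state functions return only the next state."""

    def __init__(self):
        self.digits = ''
        self.tokens = []  # side channel for the tokens produced

    def state_none(self, c):
        if c.isdecimal():
            self.digits = c
            return self.state_number
        if c in OPERATORS:
            self.tokens.append(('operator', c))
        return self.state_none

    def state_number(self, c):
        if c.isdecimal():
            self.digits += c
            return self.state_number
        self.tokens.append(('number', self.digits))
        # The event that ended the number is handed to the idle state.
        return self.state_none(c)


def tokenize_all(expression):
    machine = Machine()
    state = machine.state_none
    # The driver, not the states, supplies every event, END included.
    for event in chain(expression, (END,)):
        state = state(event)
    return machine.tokens


print(tokenize_all('15+ 2 * 378 / 5'))
```

The states never touch the input source, and the end of the expression is just one more event. The price is the extra `tokens` attribute, which is the kind of bookkeeping that returning tokens from the state functions avoids.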
Because the code driving the FSM fetches the events, my __next__() method is heavier, but my states are lighter. I also do not need a specific terminating state. I made that design choice out of habit (I like to keep my states as free from non-FSM logic as possible), but it has the additional benefit of making the state machine cheaper to scale.
As far as naming is concerned, I do not buy the argument that the name of the class above should be tokenize just because instantiating it looks like a function call. That can be said of all class instantiations.
The class above defines tokenizing objects that implement the iterator interface. Tokenizer is a good name for it. Also, since the solution is a regular class, I see no reason for not using the traditional class naming conventions.
More about alternatives to 'nonlocal'
A nice and clear discussion can be found there. To summarize:
- For a function that needs internal state and nested functions, with no return value (only side-effects), a class where __init__() does the whole job can be used. Function naming conventions can be used for the class name.
- If the function needs to return a value (not discussed at the link above), the answer will depend on the type of value. In particular, one should consider whether that 'value' should provide an interface, like that of an iterator. In that case, a solution along the lines of the code in this answer could be appropriate.
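As an illustration of the first alternative, here is a minimal sketch of a class whose __init__() does the whole job and which is named like a function. The name `collect_tokens` and its `sink` parameter are hypothetical; the only effect of the call is the side effect of appending tokens to the list passed in.

```python
OPERATORS = '+', '-', '*', '/'


class collect_tokens:  # function-style name: it is meant to be called like one
    """Hypothetical 'function' that appends the tokens of expression to sink."""

    def __init__(self, expression, sink):
        self.sink = sink
        self.digits = ''  # internal state, instead of a 'nonlocal' variable
        for c in expression:
            self._handle(c)
        self._flush()  # end of expression: emit a pending number, if any

    def _flush(self):
        if self.digits:
            self.sink.append(('number', self.digits))
            self.digits = ''

    def _handle(self, c):
        if c.isdecimal():
            self.digits += c
        else:
            self._flush()
            if c in OPERATORS:
                self.sink.append(('operator', c))


tokens = []
collect_tokens('15+ 2', tokens)  # reads like a function call; the instance is discarded
print(tokens)
# [('number', '15'), ('operator', '+'), ('number', '2')]
```

The methods play the role of the nested functions, and the instance attributes replace the variables that would otherwise need nonlocal.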