Skip to main content
added 9 characters in body
Source Link
nilo
  • 805
  • 2
  • 7
  • 15

My code ported to that pattern isfollows. That version of the following:code also includes a proper handling of expression termination, which was missing in my initial code.

Good alternative: a class that implements the expected interface

OPERATORS = '+', '-', '*', '/'

class Tokenizer:
    def __init__(self, expression):
        self.token = None
        self.char_consumed = True
        self.state = Tokenizer._state_none
        self.expression = iter(expression)

    def __iter__(self):
        return self
        
    def _state_none(self, c):
        if c.isdecimal():
            self.token = c
            self.state = Tokenizer._state_number
        elif c in OPERATORS:
            return 'operator', c
    
    def _state_number(self, c):
        if c.isdecimal():
            self.token += c
        else:
            self.char_consumed = False
            self.state = Tokenizer._state_none
            return 'number', self.token

    def _interpret_character(self, c):
        self.char_consumed = True
        return self.state(self, c)
        
    def __next__(self):
        for c in self.expression:
            self.char_consumed = False
            while not self.char_consumed:
                token = self._interpret_character(c)
                if token:
                    return token
        token = self._interpret_character('') # termination event
        if token:
            return token
        raise StopIteration

def main():
    for x in Tokenizer('15+ 2 * 378 / 5'):
        print(x)

    # ('number', '15')
    # ('operator', '+')
    # ('number', '2')
    # ('operator', '*')
    # ('number', '378')
    # ('operator', '/')
    # ('number', '5')

if __name__ == '__main__':
    main()

I have also made a few other minor improvements onThe main difference with 200_success' solution is that my previous solutions.

Other thoughts

My question was reallystates do not about finite state machine design, and I feel that I am somewhat punished for making the efforttake care of providing a realistic example insteadthe fetching of a dummy oneevents.

Since there has been some debate in The code driving the comments about good FSM design, I will give a few opinions:

  • I like to keep my state functions clean. They should be a description of how the FSM reacts to events, not a handling of how the events come in. I.e. in this case, they should not have to take care of fetching the input characters. That also means that the code driving the FSM has to provide input that represents all possible events, including the end-of-expression event (i.e. a valid character terminator or a symbolic representation of the end). This is not a hack. This is a valid design decision.
  • In fact, I don't even like the idea of having the state functions return anything, except possibly the new state, in some designs. Since the state machine in this case produces tokens, I made a compromise and let the state functions return tokens (or nothing) as proposed by 200_success, since it saves a flag and gives better looking code.

As far as namingFSM does that instead. This means that my __next__() method is concernedheavier, but my states are lighter. I also do not buy the argumentneed a specific terminating state. I made that the namedesign choice out of the class above should be totenizehabit (I like to keep my states as free from non-FSM logic as possible) just because instantiating, but it looks like a function call. That can be saidhas the additional benefit to provide cheaper scaling of all class instantiationsthe state machine.

The class above defines tokenizing objects that implementAlso, since the iterator interface. Tokenizersolution is a good nameregular class, I see no reason for it.

More about alternatives to 'nonlocal'

A nice and clear discussion can be found therenot using the traditional class naming conventions. To summarize:

  • For a function that needs internal state and nested functions, with no return value (only side-effects), a class where __init__() does the whole job can be used. Function naming conventions can be used for the class name.
  • If the function needs to return a value (not discussed at the link above), the answer will depend on the type of value. Especially, one should wonder whether that 'value' should provide an interface, like that of an iterator. In that case, a solution along the lines of the code in this answer could be appropriate.

My code ported to that pattern is the following:

OPERATORS = '+', '-', '*', '/'

class Tokenizer:
    def __init__(self, expression):
        self.token = None
        self.char_consumed = True
        self.state = Tokenizer._state_none
        self.expression = iter(expression)

    def __iter__(self):
        return self
        
    def _state_none(self, c):
        if c.isdecimal():
            self.token = c
            self.state = Tokenizer._state_number
        elif c in OPERATORS:
            return 'operator', c
    
    def _state_number(self, c):
        if c.isdecimal():
            self.token += c
        else:
            self.char_consumed = False
            self.state = Tokenizer._state_none
            return 'number', self.token

    def _interpret_character(self, c):
        self.char_consumed = True
        return self.state(self, c)
        
    def __next__(self):
        for c in self.expression:
            self.char_consumed = False
            while not self.char_consumed:
                token = self._interpret_character(c)
                if token:
                    return token
        token = self._interpret_character('')
        if token:
            return token
        raise StopIteration

def main():
    for x in Tokenizer('15+ 2 * 378 / 5'):
        print(x)

    # ('number', '15')
    # ('operator', '+')
    # ('number', '2')
    # ('operator', '*')
    # ('number', '378')
    # ('operator', '/')
    # ('number', '5')

if __name__ == '__main__':
    main()

I have also made a few other minor improvements on my previous solutions.

Other thoughts

My question was really not about finite state machine design, and I feel that I am somewhat punished for making the effort of providing a realistic example instead of a dummy one.

Since there has been some debate in the comments about good FSM design, I will give a few opinions:

  • I like to keep my state functions clean. They should be a description of how the FSM reacts to events, not a handling of how the events come in. I.e. in this case, they should not have to take care of fetching the input characters. That also means that the code driving the FSM has to provide input that represents all possible events, including the end-of-expression event (i.e. a valid character terminator or a symbolic representation of the end). This is not a hack. This is a valid design decision.
  • In fact, I don't even like the idea of having the state functions return anything, except possibly the new state, in some designs. Since the state machine in this case produces tokens, I made a compromise and let the state functions return tokens (or nothing) as proposed by 200_success, since it saves a flag and gives better looking code.

As far as naming is concerned, I do not buy the argument that the name of the class above should be totenize() just because instantiating it looks like a function call. That can be said of all class instantiations.

The class above defines tokenizing objects that implement the iterator interface. Tokenizer is a good name for it.

More about alternatives to 'nonlocal'

A nice and clear discussion can be found there. To summarize:

  • For a function that needs internal state and nested functions, with no return value (only side-effects), a class where __init__() does the whole job can be used. Function naming conventions can be used for the class name.
  • If the function needs to return a value (not discussed at the link above), the answer will depend on the type of value. Especially, one should wonder whether that 'value' should provide an interface, like that of an iterator. In that case, a solution along the lines of the code in this answer could be appropriate.

My code ported to that pattern follows. That version of the code also includes a proper handling of expression termination, which was missing in my initial code.

Good alternative: a class that implements the expected interface

OPERATORS = '+', '-', '*', '/'

class Tokenizer:
    def __init__(self, expression):
        self.token = None
        self.char_consumed = True
        self.state = Tokenizer._state_none
        self.expression = iter(expression)

    def __iter__(self):
        return self
        
    def _state_none(self, c):
        if c.isdecimal():
            self.token = c
            self.state = Tokenizer._state_number
        elif c in OPERATORS:
            return 'operator', c
    
    def _state_number(self, c):
        if c.isdecimal():
            self.token += c
        else:
            self.char_consumed = False
            self.state = Tokenizer._state_none
            return 'number', self.token

    def _interpret_character(self, c):
        self.char_consumed = True
        return self.state(self, c)
        
    def __next__(self):
        for c in self.expression:
            self.char_consumed = False
            while not self.char_consumed:
                token = self._interpret_character(c)
                if token:
                    return token
        token = self._interpret_character('') # termination event
        if token:
            return token
        raise StopIteration

def main():
    for x in Tokenizer('15+ 2 * 378 / 5'):
        print(x)

    # ('number', '15')
    # ('operator', '+')
    # ('number', '2')
    # ('operator', '*')
    # ('number', '378')
    # ('operator', '/')
    # ('number', '5')

if __name__ == '__main__':
    main()

The main difference with 200_success' solution is that my states do not take care of the fetching of events. The code driving the FSM does that instead. This means that my __next__() method is heavier, but my states are lighter. I also do not need a specific terminating state. I made that design choice out of habit (I like to keep my states as free from non-FSM logic as possible), but it has the additional benefit to provide cheaper scaling of the state machine.

Also, since the solution is a regular class, I see no reason for not using the traditional class naming conventions.

added 9 characters in body
Source Link
nilo
  • 805
  • 2
  • 7
  • 15
OPERATORS = '+', '-', '*', '/'

class Tokenizer:
    def __init__(self, expression):
        self.token = None
        self.char_consumed = True
        self.state = Tokenizer._state_none
        self.expression = iter(expression)

    def __iter__(self):
        return self
        
    def _state_none(self, c):
        if c.isdecimal():
            self.token = c
            self.state = Tokenizer._state_number
        elif c in OPERATORS:
            return 'operator', c
    
    def _state_number(self, c):
        if c.isdecimal():
            self.token += c
        else:
            self.char_consumed = False
            self.state = Tokenizer._state_none
            return 'number', self.token

    def _interpret_character(self, c):
        self.char_consumed = True
        return self.state(self, c)
        
    def __next__(self):
        for c in self.expression:
            self.char_consumed = False
            while not self.char_consumed:
                token = self._interpret_character(c)
                if token:
                    return token
        token = self._interpret_character('')
        if token:
            return token
        raise StopIteration

def main():
    for x in Tokenizer('15+ 2 * 378 / 5'):
        print(x)

    # ('number', '15')
    # ('operator', '+')
    # ('number', '2')
    # ('operator', '*')
    # ('number', '378')
    # ('operator', '/')
    # ('number', '5')

if __name__ == "__main__"'__main__':
    main()
OPERATORS = '+', '-', '*', '/'

class Tokenizer:
    def __init__(self, expression):
        self.token = None
        self.char_consumed = True
        self.state = Tokenizer._state_none
        self.expression = iter(expression)

    def __iter__(self):
        return self
        
    def _state_none(self, c):
        if c.isdecimal():
            self.token = c
            self.state = Tokenizer._state_number
        elif c in OPERATORS:
            return 'operator', c
    
    def _state_number(self, c):
        if c.isdecimal():
            self.token += c
        else:
            self.char_consumed = False
            self.state = Tokenizer._state_none
            return 'number', self.token

    def _interpret_character(self, c):
        self.char_consumed = True
        return self.state(self, c)
        
    def __next__(self):
        for c in self.expression:
            self.char_consumed = False
            while not self.char_consumed:
                token = self._interpret_character(c)
                if token:
                    return token
        token = self._interpret_character('')
        if token:
            return token
        raise StopIteration

def main():
    for x in Tokenizer('15+ 2 * 378 / 5'):
        print(x)

    # ('number', '15')
    # ('operator', '+')
    # ('number', '2')
    # ('operator', '*')
    # ('number', '378')
    # ('operator', '/')
    # ('number', '5')

if __name__ == "__main__":
    main()
OPERATORS = '+', '-', '*', '/'

class Tokenizer:
    def __init__(self, expression):
        self.token = None
        self.char_consumed = True
        self.state = Tokenizer._state_none
        self.expression = iter(expression)

    def __iter__(self):
        return self
        
    def _state_none(self, c):
        if c.isdecimal():
            self.token = c
            self.state = Tokenizer._state_number
        elif c in OPERATORS:
            return 'operator', c
    
    def _state_number(self, c):
        if c.isdecimal():
            self.token += c
        else:
            self.char_consumed = False
            self.state = Tokenizer._state_none
            return 'number', self.token

    def _interpret_character(self, c):
        self.char_consumed = True
        return self.state(self, c)
        
    def __next__(self):
        for c in self.expression:
            self.char_consumed = False
            while not self.char_consumed:
                token = self._interpret_character(c)
                if token:
                    return token
        token = self._interpret_character('')
        if token:
            return token
        raise StopIteration

def main():
    for x in Tokenizer('15+ 2 * 378 / 5'):
        print(x)

    # ('number', '15')
    # ('operator', '+')
    # ('number', '2')
    # ('operator', '*')
    # ('number', '378')
    # ('operator', '/')
    # ('number', '5')

if __name__ == '__main__':
    main()
deleted 154 characters in body; deleted 302 characters in body
Source Link
nilo
  • 805
  • 2
  • 7
  • 15

200_success has a very good point in hisTo summarize 200_success' answer: if I want an iterator object with an internal state, I should create a class that implementsimplement the iterator interface in a class, instead of blindly trying to use yield (I guess my excuse is that I discovered yield in Python justreturn a few days ago)generator.

I have also made a few other minor improvements on my previous solutions.

Since a class ends up being a perfect solution for this case, it is not a good example to discuss alternatives to nonlocal for local variables in a function with nested functions, which was the whole purpose of this question :-(. I guess I will have to find a better example and post a new question.

200_success has a very good point in his answer: if I want an iterator object with an internal state, I should create a class that implements the iterator interface, instead of blindly trying to use yield (I guess my excuse is that I discovered yield in Python just a few days ago).

I have also made a few other minor improvements on my previous solutions.

Since a class ends up being a perfect solution for this case, it is not a good example to discuss alternatives to nonlocal for local variables in a function with nested functions, which was the whole purpose of this question :-(. I guess I will have to find a better example and post a new question.

To summarize 200_success' answer, I should implement the iterator interface in a class, instead of trying to return a generator.

I have also made a few other minor improvements on my previous solutions.

added 781 characters in body
Source Link
nilo
  • 805
  • 2
  • 7
  • 15
Loading
deleted 199 characters in body
Source Link
nilo
  • 805
  • 2
  • 7
  • 15
Loading
added 199 characters in body
Source Link
nilo
  • 805
  • 2
  • 7
  • 15
Loading
deleted 29 characters in body
Source Link
nilo
  • 805
  • 2
  • 7
  • 15
Loading
added 1171 characters in body
Source Link
nilo
  • 805
  • 2
  • 7
  • 15
Loading
added 1171 characters in body
Source Link
nilo
  • 805
  • 2
  • 7
  • 15
Loading
Source Link
nilo
  • 805
  • 2
  • 7
  • 15
Loading