5

I am trying to separate code that scans input in a nontrivial way from the code that consumes pieces of that input, while still supporting generation/iteration. What is the most Pythonic way to do this?

The easy way to get what I want is not to support iteration, and split responsibility into a Scanner and a ScanHandler:

import abc

class ScanHandler(abc.ABC):
    @abc.abstractmethod
    def onFoo(self, foo):
        """ handle a foo """
        pass

    @abc.abstractmethod
    def onBar(self, bar):
        """ handle a bar """
        pass

    @abc.abstractmethod
    def onBaz(self, baz):
        """ handle a baz """
        pass

class Scanner:
    ...
    def scan(self, someInput, handler: ScanHandler):
        for chunk in someInput:
            ... complicated logic ...
            if ... something ...:
                foo = ...
                handler.onFoo(foo)
            if ... something else ...:
                bar = ...
                handler.onBar(bar)
            if ... something else ...:
                baz = ...
                handler.onBaz(baz)

Then the handler object can do whatever it wants with the foo/bar/baz bits that are scanned.

But in some cases all I want to do is handle foo pieces, and I'd like to just do something like:

for foo in scanner.someMethod(someInput):
    ... handle each foo ...

and I can't figure out how to do this unless I change the Scanner architecture to

class Scanner:
    ...
    def scan(self, someInput):
        for chunk in someInput:
            ... complicated logic ...
            if ... something ...:
                foo = ...
                yield foo
            if ... something else ...:
                bar = ...
                yield bar
            if ... something else ...:
                baz = ...
                yield baz

and then if I want to just handle the foo objects, I have to do something like:

def filterOnlyFoo(things):
    for thing in things:
        if [thing is a foo]:      # annoying extra work
            yield thing

for foo in filterOnlyFoo(scanner.scan(someInput)):
    doSomethingWithFoo

Is there a more straightforward design that would make sense to use here?

3
  • Are you aware of the built-in filter() function? You don't even need to import it. Commented Jan 28 at 17:58
  • 1
    If the 'annoying extra work' is determining if something is a foo, either define and generate Foo objects and check with isinstance(object, Foo) or, if you prefer you can do something like create an Enum of foo, bar, baz and set a field on the emitted objects with the appropriate one. Commented Jan 28 at 18:04
  • By "annoying extra work", do you refer to the verbosity of the if statement (or even the need to define filterOnlyFoo); or do you refer to the work happening in the complicated logic of if ... something ...: bar = ... that you would like to optimise away when you know beforehand that you will only handle foos? Commented Jan 30 at 4:20

6 Answers

4

Your second approach fails when foo, bar and baz are not of three different types, so it is actually a bad idea. Heck, in a dynamically typed language like Python, it is not even guaranteed that each foo object will have the same type.

What you can do without changing the initial scan method: you can implement a class FooCollector derived from ScanHandler, where onFoo just collects all passed foo objects in a list, and onBar as well as onBaz stay empty. Then call scan(someInput, fooCollector) and ask fooCollector afterwards for the list of foo objects it got. This is dead simple and does not require any type checks.

You could then implement scanner.getAllFoos this way:

class Scanner:
    ...
    def getAllFoos(self, someInput):
        fooCollector = FooCollector()
        self.scan(someInput, fooCollector)
        return fooCollector.listOfCollectedFoos

As a variant of this, you may implement a FooHandler in a similar manner, which does not just fill a list, but does the foo processing directly in-place.
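A minimal sketch of both variants follows. The FooCollector matches the description above; CallbackFooHandler and toy_scan are hypothetical names introduced only for illustration, and toy_scan stands in for the real scan by treating ints as foos and strings as bars.

```python
import abc


class ScanHandler(abc.ABC):
    @abc.abstractmethod
    def onFoo(self, foo): ...

    @abc.abstractmethod
    def onBar(self, bar): ...

    @abc.abstractmethod
    def onBaz(self, baz): ...


class FooCollector(ScanHandler):
    """Collects every foo in a list and ignores bars and bazzes."""

    def __init__(self):
        self.listOfCollectedFoos = []

    def onFoo(self, foo):
        self.listOfCollectedFoos.append(foo)

    def onBar(self, bar):
        pass

    def onBaz(self, baz):
        pass


class CallbackFooHandler(FooCollector):
    """Variant: processes each foo immediately instead of storing it."""

    def __init__(self, callback):
        super().__init__()
        self.callback = callback

    def onFoo(self, foo):
        self.callback(foo)


def toy_scan(someInput, handler):
    # Hypothetical stand-in for the real scan: ints are foos, strs are bars,
    # everything else is a baz.
    for chunk in someInput:
        if isinstance(chunk, int):
            handler.onFoo(chunk)
        elif isinstance(chunk, str):
            handler.onBar(chunk)
        else:
            handler.onBaz(chunk)


collector = FooCollector()
toy_scan([1, "a", 2, None], collector)

seen = []
toy_scan([1, "a", 2, None], CallbackFooHandler(seen.append))
```

The callback variant never builds a full list of its own, which may matter when the input contains many items.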

I would also take a closer look at the section

... complicated logic ... 

should be refactored into a function whose result makes the following conditionals (if ... something ...:) simple. This gives you the option to provide an extra function scanForFoos that shares the same logic as scan without duplicating it. Something along the lines of

def scan(someInput, handler: ScanHandler):
    for chunk in someInput:
        result = callToComplicatedLogic(...)
        if result.conditionForFoo():
            foo = result.getFoo()
            handler.onFoo(foo)
        if result.conditionForBar():
            bar = result.getBar()
            handler.onBar(bar)
        if result.conditionForBaz():
            baz = result.getBaz()
            handler.onBaz(baz)

def scanForFoos(someInput):
    for chunk in someInput:
        result = callToComplicatedLogic(...)
        if result.conditionForFoo():
            foo = result.getFoo()
            yield foo

This keeps the complicated logic in one place and the code DRY, at the cost of a little more boilerplate. This approach also has the advantage that the caller can iterate over a subset of the foo objects and stop, let's say, after the first 10 objects, without making the scanner go through the full input data.
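To illustrate the early-stopping point, here is a small sketch using itertools.islice. The scanForFoos below is a toy stand-in (it treats even numbers as foos), and counting is a hypothetical helper that only records how many chunks were actually pulled from the input.

```python
from itertools import islice


def counting(iterable, counter):
    # Hypothetical helper: counts how many chunks the scanner consumes.
    for item in iterable:
        counter[0] += 1
        yield item


def scanForFoos(someInput):
    # Toy stand-in for the real logic: every even number is a foo.
    for chunk in someInput:
        if chunk % 2 == 0:
            yield chunk


counter = [0]
first_three = list(islice(scanForFoos(counting(range(1000), counter)), 3))
print(first_three, counter[0])
```

Because the generator is lazy, only as many chunks are read as are needed to produce the requested foos; the rest of the input is never touched.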

3
  • "As a variant of this, you may implement a FooHandler in a similar manner, which does not just fill a list, but does the foo processing directly in-place." -- that's the part that I'm trying to figure out how to do --- should onFoo()/onBar()/onBaz() be generators and if so, how do I do that? I'm hitting a mental block, and because there may be cases where there are hundreds of items (such as tag elements in HTML, although in my case there are only 3 or 4 different types of items, and without that level of hierarchy) I'd prefer not to collect the entire list first and then process. Commented Jan 28 at 18:13
  • 1
    @JasonS You could add an optional filter parameter to scanner or just a list of types you want emitted. Commented Jan 28 at 18:23
  • @JasonS: my second suggestion gives you a generator solution. What exactly are you still missing ? Commented Jan 28 at 21:31
2

Based on your comment on Doc Brown's answer, I think you are looking for something like this:

class Scanner:
    ...
    def scan(self, someInput, handler: ScanHandler, emit=('foo', 'bar', 'baz')):
        for chunk in someInput:
            ... complicated logic ...
            if ... something ... and 'foo' in emit:
                foo = ...
                handler.onFoo(foo)
            if ... something else ... and 'bar' in emit:
                bar = ...
                handler.onBar(bar)
            if ... something else ... and 'baz' in emit:
                baz = ...
                handler.onBaz(baz)

If you omit the emit parameter, it will work as it does now. If you just want 'foo' output, you can call it like this:

scanner.scan(someInput, handler, ['foo'])

There are many levels of clever fanciness that you can add to this basic idea. Another simple one would be to do this instead:

def scan(self, someInput, handler: ScanHandler, emit=lambda x: True):
    for chunk in someInput:
        ... complicated logic ...
        if ... something ... and emit('foo'):
            foo = ...
            handler.onFoo(foo)

Then the foo-only call becomes:

scanner.scan(someInput, handler, lambda x: x == 'foo')

The advantage here being that your function allows for arbitrary logic. You can add more parameters to add more capabilities.

You might also want to consider adding the emit list (or function) to the Scanner object. This might be useful if the decision of what to emit does not change during the scanner's lifetime. There's also nothing preventing you from doing both, that is, setting it on the object and allowing it on the method call. You just need to decide what you want to do if both are set.
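One possible way to combine the two is sketched below, assuming that a per-call emit should override the one stored on the object (that precedence is a design choice, not a requirement). For brevity this toy version yields (kind, chunk) pairs instead of calling a handler, and its classification rule (ints are foos, strings are bars, the rest are bazzes) is purely illustrative.

```python
class Scanner:
    ALL = frozenset({'foo', 'bar', 'baz'})

    def __init__(self, emit=ALL):
        # Default filter for the lifetime of this scanner.
        self.emit = frozenset(emit)

    def scan(self, someInput, emit=None):
        # Assumption: an emit passed to the call wins over the stored one.
        active = self.emit if emit is None else frozenset(emit)
        for chunk in someInput:
            # Toy classification standing in for the complicated logic.
            if isinstance(chunk, int):
                kind = 'foo'
            elif isinstance(chunk, str):
                kind = 'bar'
            else:
                kind = 'baz'
            if kind in active:
                yield kind, chunk


foo_scanner = Scanner(emit=['foo'])
only_foos = list(foo_scanner.scan([1, 'a', 2.5]))
only_bars = list(foo_scanner.scan([1, 'a', 2.5], emit=['bar']))
```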

2
  • True, somehow I'm still missing the yield solution that smells right to me for a Python-style inversion of control / separation of the thing producing the items to be iterated over, vs. the thing processing the yielded items. Commented Jan 28 at 20:12
  • 2
    @JasonS You can replace the handler calls with yields in the above. It's totally normal and valid to emit everything and then filter out what you don't want. If the cost of that is low, I would do that. But if you want to avoid the cost of creating things in the scanner that you don't need, that logic needs to happen in the scanner. Passing in a filtering function (or object) is a kind of IoC that is pretty standard in Python. Commented Jan 28 at 20:26
1

Filter

Nothing is faster, than doing nothing.

Andrei Alexandrescu

In general, constructing an object just to discard it is pretty wasteful work. If the construction is fast, and free of side-effects, then you may just go ahead and do it... but do consider filtering.

Filtering is relatively easy, provided you have a fixed list of item kinds/tags you care about:

  1. Define an enum, where each element is but an integer.
  2. Pass a set of integers -- possibly a bit-set -- to the scan method.
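As a rough sketch of those two steps, enum.Flag from the standard library gives integer-backed members that combine into a bit-set with the | operator. The Tag names and the toy classification rule below are assumptions made for the example.

```python
from enum import Flag, auto


class Tag(Flag):
    FOO = auto()
    BAR = auto()
    BAZ = auto()


def scan(someInput, wanted=Tag.FOO | Tag.BAR | Tag.BAZ):
    for chunk in someInput:
        # Toy classification (an assumption): ints are foos, strs are bars,
        # anything else is a baz.
        if isinstance(chunk, int):
            tag = Tag.FOO
        elif isinstance(chunk, str):
            tag = Tag.BAR
        else:
            tag = Tag.BAZ
        if tag in wanted:
            # Only at this point would a costly item be constructed.
            yield tag, chunk


foos_and_bazzes = list(scan([1, "a", 2.5], Tag.FOO | Tag.BAZ))
```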

Iteration

The idiomatic way to create an external iterator in Python is to use yield, as you mentioned.

And it can easily be augmented with filtering capabilities:

class Scanner:
    def scan(self, someInput, filter):
        for chunk in someInput:
            ... complicated logic ...

            # Check the filter before constructing anything. Each kind is
            # skipped on its own, so the remaining checks still run for
            # the same chunk.
            if ... something ... and Tag.FOO in filter:
                foo = ...
                yield foo

            if ... something else ... and Tag.BAR in filter:
                bar = ...
                yield bar

            if ... something else ... and Tag.BAZ in filter:
                baz = ...
                yield baz

However, do note that you are losing some information on the client side. When calling a handler method, the method name that is called (onFoo, onBar) is telling the client what it is dealing with, whereas just yielding the item may require the client to "reconstruct" it.

Since Python is dynamic, divining what the kind/tag of the object is may be complicated, and brittle. For a reliable solution, you can:

  1. Create a base class with a tag method, which returns the Tag, allowing the client to switch on it.
  2. Use a Sum Type to wrap the item.

The former works in any version of Python.
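A minimal sketch of the first option, with hypothetical Item, Foo and Bar classes and a describe function standing in for client code:

```python
from dataclasses import dataclass
from enum import Enum, auto


class Tag(Enum):
    FOO = auto()
    BAR = auto()


class Item:
    """Hypothetical base class: every scanned item reports its own tag."""

    def tag(self) -> Tag:
        raise NotImplementedError


@dataclass
class Foo(Item):
    value: int

    def tag(self) -> Tag:
        return Tag.FOO


@dataclass
class Bar(Item):
    text: str

    def tag(self) -> Tag:
        return Tag.BAR


def describe(item: Item) -> str:
    # The client switches on the tag instead of guessing from the type.
    if item.tag() is Tag.FOO:
        return f"foo {item.value}"
    if item.tag() is Tag.BAR:
        return f"bar {item.text}"
    raise ValueError(f"unexpected tag {item.tag()}")
```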

Note: you can also use generator.send(filter) and filter = yield x to allow the user to switch the filter every time they get an item.
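A small sketch of that idea, under a toy classification rule (ints are foos, everything else is a bar) and with the parameter named wanted rather than filter to avoid shadowing the built-in:

```python
def scan(someInput, wanted):
    for chunk in someInput:
        # Toy classification standing in for the complicated logic.
        kind = 'foo' if isinstance(chunk, int) else 'bar'
        if kind in wanted:
            # A plain next() makes this yield evaluate to None;
            # send(new_set) makes it evaluate to new_set.
            new_wanted = yield kind, chunk
            if new_wanted is not None:
                wanted = new_wanted


gen = scan([1, 'a', 2, 'b', 3], {'foo'})
out = [next(gen)]               # first item under the initial filter
out.append(gen.send({'bar'}))   # switch the filter; send() returns the next item
out.extend(gen)                 # drain the rest under the new filter
```

Note that send() itself resumes the generator and returns the next yielded item, so mixing it with a plain for loop needs some care to avoid dropping items.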

1

Right now, your scanner is mixing three concerns:

  1. Iterating over your chunks
  2. Parsing the input based on several types
  3. Yielding / dispatching input based on several types.

The scanner can be made in an open-closed way, where these concerns are separated:
(Note that I am mostly comfortable with C#, but will attempt to produce Python.)

from abc import ABC, abstractmethod

class Parser(ABC):
    @abstractmethod
    def try_parse(self, chunk):
        """Return an object or None if this parser doesn't apply."""
        pass

class FooParser(Parser):
    def try_parse(self, chunk):
        if some_complicated_check:
            return Foo(...)
        return None

class BarParser(Parser):
    def try_parse(self, chunk):
        if other_logic:
            do_processing(chunk)
            return Bar(...)
        return None

Your scanner will become:

class Scanner:
    def __init__(self, parsers):
        self.parsers = parsers

    def scan(self, some_input):
        for chunk in some_input:
            for parser in self.parsers:
                item = parser.try_parse(chunk)
                if item is not None:
                    yield item

You can use the scanner with any combination of parsers you like:

scanner = Scanner([FooParser(), BarParser(), BazParser()])

for item in scanner.scan(some_input):
    ...

or if you care about foo only:

scanner = Scanner([FooParser()])

for foo in scanner.scan(some_input):
    ...

The open-closed principle shows itself when you want to add a new type to be parsed. In that case, you supply the scanner with a new parser (e.g. QuixParser()), and the scanner code does not need to be modified.
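A runnable toy version of this, to show that adding a parser leaves Scanner untouched. The parsing rules here (ints become foos, floats become quixes) and the (name, chunk) tuples they return are assumptions chosen only to keep the example short.

```python
from abc import ABC, abstractmethod


class Parser(ABC):
    @abstractmethod
    def try_parse(self, chunk):
        """Return an object or None if this parser doesn't apply."""


class FooParser(Parser):
    def try_parse(self, chunk):
        # Toy rule: ints are foos.
        return ('Foo', chunk) if isinstance(chunk, int) else None


class QuixParser(Parser):
    def try_parse(self, chunk):
        # Toy rule for the newly added type: floats are quixes.
        return ('Quix', chunk) if isinstance(chunk, float) else None


class Scanner:
    def __init__(self, parsers):
        self.parsers = parsers

    def scan(self, some_input):
        for chunk in some_input:
            for parser in self.parsers:
                item = parser.try_parse(chunk)
                if item is not None:
                    yield item


data = [1, 2.5, 'x']
foos_only = list(Scanner([FooParser()]).scan(data))
with_quix = list(Scanner([FooParser(), QuixParser()]).scan(data))
```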

1

Here's how I would approach this. First define each of those entities as separate classes:

class Foo: ...
class Bar: ...
class Baz: ...

Next, define a generic abstract handler per object type, rather than a single abstract handler for all objects, like this:

from abc import ABC, abstractmethod
from typing import Generic, TypeVar

T = TypeVar('T', Foo, Bar, Baz)

class Handler(ABC, Generic[T]):
    @abstractmethod
    def handle(self, obj: T):
        pass

Now we want to store handlers per object type. We will use a dict of lists for that. But first let's define a scanner:

class Scanner:
    def __init__(self, handlers):
        # handlers is a dict of lists of Handler instances
        self.handlers = handlers

    def _apply_handlers(self, obj):
        # additional checks and logs here
        obj_type = type(obj)
        for handler in self.handlers.get(obj_type, []):  # no handlers registered: do nothing
            handler.handle(obj)

    def run(self, input):
        for chunk in input:
            ... complicated logic ...

            obj = None

            if ... something ...:
                obj = Foo(...)
            if ... something else ...:
                obj = Bar(...)
            if ... something else ...:
                obj = Baz(...)

            if obj is not None:
                self._apply_handlers(obj)
            else:
                pass  # log it?

and then the corresponding builder:

class ScannerBuilder:
    def __init__(self): 
        self.handlers = {}

    def register(self, obj_type: type, handler):
        if obj_type not in self.handlers:
            self.handlers[obj_type] = []
        self.handlers[obj_type].append(handler)

    def build(self) -> Scanner:
        return Scanner(self.handlers)

And finally the usage:

builder = ScannerBuilder()
builder.register(Foo, foo_handler)
builder.register(Bar, bar_handler)
builder.register(Baz, baz_handler)
builder.register(Baz, other_baz_handler)
scanner = builder.build()
scanner.run(input)

Looks like a clean, separated, easily testable solution to me.
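To illustrate the testability claim, here is a compact sketch of the per-type dispatch in isolation. HandlerRegistry and RecordingHandler are hypothetical names, not part of the code above; the registry uses collections.defaultdict so that registration needs no existence check.

```python
from collections import defaultdict


class Foo: ...
class Bar: ...


class RecordingHandler:
    """Test double: remembers every object it was asked to handle."""

    def __init__(self):
        self.seen = []

    def handle(self, obj):
        self.seen.append(obj)


class HandlerRegistry:
    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, obj_type, handler):
        self._handlers[obj_type].append(handler)

    def dispatch(self, obj):
        # Types without registered handlers are silently ignored.
        for handler in self._handlers.get(type(obj), ()):
            handler.handle(obj)


registry = HandlerRegistry()
foo_recorder = RecordingHandler()
registry.register(Foo, foo_recorder)

foo, bar = Foo(), Bar()
registry.dispatch(foo)
registry.dispatch(bar)  # no Bar handler registered, so nothing happens
```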

0

I'd say the simplest (most pythonic?) way to achieve this is to yield tuples consisting of a type marker and the thing itself. You don't need an OOP visitor pattern with ScanHandler instances; not everything needs to be a class.

def scan(someInput):
    for chunk in someInput:
        ... complicated logic ...
        if ... something ...:
            foo = ...
            yield 'Foo', foo
        elif ... something else ...:
            bar = ...
            yield 'Bar', bar
        elif ... something else ...:
            baz = ...
            yield 'Baz', baz

You would typically consume the output with a match statement:

for token in scan(myInput):
    match token:
        case 'Foo', foo:
            """handle the foo"""
        case 'Bar', bar:
            """handle the bar"""
        case other, _:
            raise NotImplementedError(f'Unexpected "{other}" token')

But with this basic scan, you can implement all the other patterns on top of this:

def handleScan(someInput, handler: ScanHandler):
    for typ, val in scan(someInput):
        getattr(handler, 'on' + typ)(val)

If you want to get only foos, you'd write

foos = (val for typ, val in scan(myInput) if typ == 'Foo')

For one-off use, the if statement inside the for loop is totally fine. If you need to scan for only foos in multiple places, and for only bars in multiple other places, add some helper functions that wrap scan. I would, however, definitely recommend writing separate helper functions (handleScan, scanFoos, scanBars, etc.). Do not try to combine them into one function that returns different types of values depending on how it is called.
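Such helpers might look like the sketch below. The scan here is a toy stand-in (ints are foos, strings are bars, anything else is a baz), and scanOnly is a hypothetical shared helper; scanFoos and scanBars each return values of one kind only.

```python
def scan(someInput):
    # Toy stand-in for the real scanner.
    for chunk in someInput:
        if isinstance(chunk, int):
            yield 'Foo', chunk
        elif isinstance(chunk, str):
            yield 'Bar', chunk
        else:
            yield 'Baz', chunk


def scanOnly(someInput, wantedTyp):
    # Lazily keep only the values whose type marker matches.
    return (val for typ, val in scan(someInput) if typ == wantedTyp)


def scanFoos(someInput):
    return scanOnly(someInput, 'Foo')


def scanBars(someInput):
    return scanOnly(someInput, 'Bar')


foos = list(scanFoos([1, 'a', 2, None]))
bars = list(scanBars([1, 'a', 2, None]))
```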
