What's the best way to provide some kind of global context to all nodes? Say I'm processing a file and I have an object that represents the file, with attributes etc., and I want to be able to access it in every node.

1 Answer

There are a lot of options depending on what you mean.

This is something you can achieve with regular Python software development practices (see the class example below).

The first option, if it really is "global", is to use a global variable. OK, that's not the best practice in the world, mostly because you rarely want global state (or you should rarely want it).

Now that the obvious is probably crossed out, let's think about what "global context" means in your question.

If it means something "global" to the execution, and defined outside of it, you can expose this configuration object as a service and inject it into the nodes that need it.

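import bonobo
from bonobo.config import use

# Shared, read-only configuration exposed to nodes as a service.
config = {"location": "at joe"}

@use("config")
def choose_a_place_to_eat(config):
    return config["location"]

# The "config" key must match the name passed to @use.
bonobo.run(..., services={"config": config})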

If this only concerns a bunch of nodes, you can also group those nodes in a class whose role is to hold this configuration state.

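import bonobo

class Venue:
    def __init__(self, name):
        # State shared by every node defined on this instance.
        self.name = name

    def producer(self):
        return self.name

    def checker(self, name):
        return self.name == name

# Bound methods are used as nodes, so they all see the same instance.
venue = Venue(name="Joe")
graph = bonobo.Graph(venue.producer, venue.checker, print)
bonobo.run(graph)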

Or even (replacing the last part of the previous example):

def build_graph(venue):
    return bonobo.Graph(venue.producer, venue.checker, print)

bonobo.run(build_graph(Venue(name="Joe")))

Bonobo also keeps an internal context for nodes and graphs, tied to a single execution, and you can use those objects if you're familiar with threads (execution order is not predictable, so be careful about that). There are plans to add thread-safe tooling for this to the graph execution context a few releases from now.

Those contexts are created when the execution starts, and returned when execution finishes.

Hope that helps.

4 Comments

Thanks, looks like the services approach would work. I don't want the context to be completely global, just accessible by all nodes. What if the first node is a file generator (yields file objects), the second node is a record extractor (yields records from the file), and I want to be able to access the corresponding file object in every node after the file generator node? It might be easiest to keep the file-generator loop outside the Bonobo graph completely, have the record extractor as the first node, and provide the file as a service to the graph?
I think the approach works, but not after the record extractor. I can't see why you would want the file object past that point, especially because sharing objects between threads like this can be (well, is) very dangerous. Also, I would tend to pass filenames instead of file objects from one node to another, so the node responsible for file operations can handle the file correctly (with a context manager, or a finally: close()).
It's because each node is a transformation that may want to use attributes associated with the file object, as well as with each record.
I think you'd be better off having some kind of "file metadata" object created from the file object, handling the file cleanly in one node, and using only the (read-only) metadata in the other nodes. Using Python file objects with random open/close timings and uncertain file handle availability in nodes that run in an unpredictable order would for sure get you into trouble at some point.
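
For what it's worth, the "file metadata" idea from the last comment could look roughly like the sketch below. It is only an illustration under assumptions: the FileMetadata class, its fields, and the extract_records / tag_record functions are hypothetical names, and the demo runs as plain Python so it does not depend on how Bonobo passes values between nodes. One function opens and closes the file with a context manager; everything downstream only sees an immutable metadata object and the record.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMetadata:
    """Read-only facts about a source file (hypothetical fields)."""
    path: str
    size_bytes: int


def extract_records(path):
    """Open the file, yield (metadata, record) pairs, and always close it."""
    meta = FileMetadata(path=path, size_bytes=os.path.getsize(path))
    with open(path, encoding="utf-8") as handle:  # closed even if an error occurs
        for line in handle:
            yield meta, line.rstrip("\n")


def tag_record(meta, record):
    """A downstream transformation that needs only the metadata, not the file."""
    return f"{os.path.basename(meta.path)}: {record}"


# Stand-alone demonstration without Bonobo: write a small temporary file,
# run the two steps by hand, then clean up.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("alpha\nbeta\n")

tagged = [tag_record(meta, record) for meta, record in extract_records(tmp.name)]
os.remove(tmp.name)
```

Because the dataclass is frozen, a node that tries to modify the metadata gets an error instead of silently changing state that other threads are reading. Only the path (a string) and the metadata ever leave the extractor, so no other node has to care whether the file handle is still open.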