In the second line of code above, what variable is lambda taking and what function is it carrying out?
A lambda function is an anonymous (without name) function. So a lambda expression like:
tree = lambda: collections.defaultdict(tree)
is, apart from some details (for example, the __name__ attribute of the def version is 'tree', whereas that of the lambda is '<lambda>'), equivalent to:
def tree():
    return collections.defaultdict(tree)
The difference from a plain expression is thus that the computation is wrapped in a function: nothing runs until the function is called, and we can call it never, once, or multiple times.
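It also allows us to tie a knot. Notice that the body of the function (lambda expression) passes a reference to the function itself to defaultdict. This works because the name tree is only looked up when the function is called, by which time the assignment has completed. We thus have a function that constructs a defaultdict whose factory is the function itself, which lets us construct subtrees recursively.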
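A minimal sketch of that knot, assuming only the standard library (the names root and child are made up for illustration): the default_factory attribute of the resulting dictionary is the tree function itself, and so is that of every child it creates.

```python
import collections

tree = lambda: collections.defaultdict(tree)

root = tree()
# The factory stored on the dictionary is the very same function object.
print(root.default_factory is tree)

# Looking up a missing key calls tree() and stores the result under that key.
child = root['left']
print(isinstance(child, collections.defaultdict))
print(child.default_factory is tree)
```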
I also understand that defaultdict can take parameters such as int or list. In the second line, defaultdict takes the parameter tree which is a variable. What is the significance of that?
The tree that we pass to the defaultdict is thus a reference to the lambda expression we constructed. This means that when the defaultdict invokes the "factory" (which happens on a lookup of a missing key), we get another defaultdict that again has tree as its factory.
If we thus evaluate some_dict['foo']['bar']['qux'], we get a defaultdict in a defaultdict in a defaultdict. All these defaultdicts have the tree function as factory. If we later construct extra children, these will again be defaultdicts with tree as factory.
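As a rough illustration of the nesting described above, the sketch below performs that lookup and then prints the structure. The helper to_plain is hypothetical, it only converts the nested defaultdicts into ordinary dicts so the output is easier to read.

```python
import collections

tree = lambda: collections.defaultdict(tree)

some_dict = tree()
some_dict['foo']['bar']['qux']  # the lookup alone creates every level

def to_plain(node):
    """Recursively copy nested dictionaries into ordinary dicts (illustrative helper)."""
    if isinstance(node, dict):
        return {key: to_plain(value) for key, value in node.items()}
    return node

print(to_plain(some_dict))  # {'foo': {'bar': {'qux': {}}}}

# Children added later are created by the same factory.
some_dict['foo']['baz']['leaf'] = 1
print(to_plain(some_dict))  # {'foo': {'bar': {'qux': {}}, 'baz': {'leaf': 1}}}
```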
The list or int case is not special. If you invoke list (like list()), then you construct a new empty list. The same happens with int: if you call int(), you will obtain 0. The fact that this is a reference to a class object is irrelevant: the defaultdict does not take this into account (it does not know what the factory is, it only invokes it with no parameters).
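To make that concrete, here is a small sketch with three different factories (the variable names are made up for illustration). In each case the defaultdict simply calls the factory with no arguments when a key is missing, whether the factory is a class such as int or list, or any other zero-argument callable.

```python
import collections

counts = collections.defaultdict(int)
counts['a'] += 1           # missing key starts from int(), which is 0

groups = collections.defaultdict(list)
groups['even'].append(2)   # missing key starts from list(), which is []

# A plain function (here a lambda) works just as well as a class.
sevens = collections.defaultdict(lambda: 7)

print(counts['a'], groups['even'], sevens['anything'])  # 1 [2] 7
```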