It's very simple.
Wherever you would recurse down the left child and then the right child, instead spawn a task that recurses down the left, and continue down the right yourself. When you are done with the right, you may have to wait on the left, but in a balanced tree that wait won't be long.
When you are at depth N in your tree, you have the potential to keep 2^N cores busy.
That specific diagram only spawns tasks at certain depths, probably because it isn't queueing tasks; it is using knowledge of the structure of the tree to avoid overloading the scheduler (there is a sketch of that cutoff idea at the end of this answer).
If you have a task queue and a thread pool, the implementation can lean on those, and you don't have to worry about how many tasks get spawned.
Say you have
class Node
{
public:
    int data; // or whatever payload the node carries
    Node * left;
    Node * right;

    template <typename Action>
    void SerialTraverse(Action action)
    {
        action(data);                             // Pre-order traversal
        if (left)  left->SerialTraverse(action);  // Traverse the left
        if (right) right->SerialTraverse(action); // Traverse the right
    }
};
#include <future>

namespace library
{
    template<typename Task, typename Class, typename Arg>
    std::future<void> enqueue_task(Task task, Class * obj, Arg arg);
    // On some other thread, call "obj->task(arg);".
    // Returns a handle upon which we can wait.
}
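(That library is hypothetical. If you just want to try the idea, a minimal stand-in built on std::async could look something like the sketch below; a real thread pool would reuse a fixed set of worker threads instead of potentially launching a new one per call.)

#include <future>

namespace library
{
    // Minimal stand-in: run "obj->task(arg)" asynchronously and hand back
    // the future. This only sketches the shape of the API above; it is not
    // a real task queue or thread pool.
    template<typename Task, typename Class, typename Arg>
    std::future<void> enqueue_task(Task task, Class * obj, Arg arg)
    {
        return std::async(std::launch::async,
                          [task, obj, arg] { (obj->*task)(arg); });
    }
}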
Then you can change SerialTraverse to
template <typename Action>
void ParallelTraverse(Action action)
{
    action(data); // Pre-order traversal, same as before

    std::future<void> fut;
    if (left)  // start traversing the left subtree on another thread
        fut = library::enqueue_task(&Node::ParallelTraverse<Action>, left, action);

    if (right) // traverse the right subtree on this thread, in parallel with the left
        right->ParallelTraverse(action);

    if (fut.valid()) // wait for the left traversal to finish
        fut.wait();
}