I have seen a few papers on parallel/GPU processing of trees, but after briefly looking through them I wasn't able to grasp what they did. The closest thing to a helpful explanation was this diagram from Parallelization: Binary Tree Traversal:
But I'm having a difficult time following the paper.
I'm wondering whether someone could outline an algorithm for parallel processing of a tree. I can imagine this being possible, and the existence of papers on it suggests it is, but I can't work out what you would actually do to make it happen.
If it's any help, specifically I'm wondering how to traverse a B+tree to find the matches.
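To make the question concrete, here is a minimal sketch of the sequential lookup I would like to parallelize. It assumes "matches" means all keys in a range `[lo, hi]`, and the node layout and the names `Leaf`, `Internal`, `find_leaf`, and `range_search` are my own placeholders for illustration, not taken from either paper.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]
    values: List[str]
    next: Optional["Leaf"] = None  # sibling link followed by range scans


@dataclass
class Internal:
    keys: List[int]   # separator keys; children[i + 1] holds keys >= keys[i]
    children: list    # len(children) == len(keys) + 1


def find_leaf(node, key):
    """Descend from the root to the leaf that could contain `key`."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_search(root, lo, hi):
    """Return the values of all keys in [lo, hi], in key order."""
    leaf = find_leaf(root, lo)
    out = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append(v)
        leaf = leaf.next
    return out


# Tiny hand-built tree (hypothetical data): three linked leaves under one root.
l3 = Leaf([9, 11], ["v9", "v11"])
l2 = Leaf([5, 7], ["v5", "v7"], next=l3)
l1 = Leaf([1, 3], ["v1", "v3"], next=l2)
root = Internal([5, 9], [l1, l2, l3])

print(range_search(root, 3, 9))
```

The descent in `find_leaf` looks inherently sequential to me, since each step depends on the child chosen in the previous one. So I'm not sure where the parallelism is supposed to come from: across many independent queries, within the key comparison inside a single node, or along the leaf chain.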
Update
Here is another diagram (from here) which seems to shed some light, but I'm having difficulty understanding it.

