- Balancing the tree would decrease the worst-case insertion, lookup and deletion costs from \$\mathcal{O}(n)\$ to \$\mathcal{O}(\log n)\$. However, it might increase the average runtime by a constant factor. As always with performance: Measure, measure, measure!
- Trees are notoriously bad regarding cache misses because of all the cache misses due to pointer chasing. Maybe using an arena allocator (aka "object pool" or "region allocator") could help to reduce those cache misses.