Sjsonnet performance improvements #117
Conversation
We have to include the absolute path in addition to the source text in the cache key now that we're generating the FileScope up front and sharing it across multiple compilations. Otherwise we could get wrong positions with wrong paths in identical files.
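To illustrate the shape of that fix, here is a minimal sketch (the `CacheKey` name, the `String` result type, and the `cached` helper are hypothetical, not the actual sjsonnet code): keying the cache on source text alone would let two identical files share one cached result, so the key includes the absolute path as well.

```scala
import scala.collection.mutable

// Hypothetical sketch of the cache-key change described above.
// Keying on source text alone would let two byte-identical files share
// one cached compilation, whose positions would then name the wrong
// file. Including the absolute path in the key avoids that.
final case class CacheKey(absolutePath: String, sourceText: String)

val cache = mutable.HashMap.empty[CacheKey, String]

// `compile` is only evaluated on a cache miss (by-name parameter).
def cached(path: String, text: String)(compile: => String): String =
  cache.getOrElseUpdate(CacheKey(path, text), compile)
```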
I've run some tests with
Added yet another bug fix. Now
lihaoyi-databricks left a comment:
@szeiger I read through all the commits. I don't really have any code-level feedback, but I trust that the test suites (both in this repo, and in our work codebase using Sjsonnet) would be enough to catch any correctness issues (including on the Scala.js and Scala.Native backends), and I trust that the benchmarks are sufficient to ensure the performance improvement is as stated.
Some high-level feedback:
- Could you write up a few-paragraph summary of the major things that happened in this PR as part of its description? e.g. "unboxed options", "convert collection combinators to while-loops", "re-use literal AST nodes", "replace bitsets", "avoid position allocations", "optimize std", and so on, along with some explanation of why each change was helpful. That would definitely help future reviewers to see at a glance what your high-level approach was, vs. having to reconstruct your intentions from the line-diff like I did when reviewing this.
- I noticed you turned on `-opt` in the SBT build. Should we turn it on in the Mill build? Or should we only publish from the SBT build?
- If we're publishing artifacts compiled with `-opt` enabled, that will break Scala binary compatibility, right? Would that cause an issue given that we're still using 2.13.3 in our work code, but this repo is on 2.13.4?
- On a similar note, should we publish both `-opt` and non-`-opt` artifacts? There are folks using Sjsonnet as a library rather than as an executable, and they may be on Scala point versions different from what we use at Databricks.
Oh, I thought Ahir already added the optimizer options to the Mill build. Maybe that's why it wasn't as fast in universe as I expected? I never benchmarked without optimizations. There's no need to publish two versions or to turn optimization off, but we should only inline from our own codebase. (At this point there aren't many standard library calls left that are worth inlining anyway.) This will not affect binary compatibility in any way.
I added the optimizer settings to the Mill build. It doesn't make a big difference to performance.
If it doesn't make much difference, let's leave it out. That would simplify things and avoid questions about binary compatibility.
It doesn't cause any binary compatibility issues. We're only inlining from sjsonnet, nothing else.
Oh got it
This is the collection of changes that @ahirreddy and I made to improve Sjsonnet performance.
Performance baseline for me was Ahir's branch (which should already be 1/3 faster than the current release) running on Zulu 11:
Switching to OpenJDK 16:
And my various improvements on top of that:
Hottest methods in the final state:

High-level overview of important changes:
- **Unboxed options:** `Array[(Int, Option[T])]` has an extra `Tuple2`, an extra boxed `Integer`, and potentially an extra `Some` per element. Replacing it with two separate arrays `Array[Int]` and `Array[T]` (using `null` instead of `None`) is always going to be faster. There's not even an argument to be made for better locality with the tupled version, because the tuple contains references; overall locality is worse (in the new version at least the `Int`s are colocated). Many commits in this PR are concerned with removing more and more unnecessary objects.
- **Collection types:** there is no universal winner between the Java and Scala collections; we use `java.util.BitSet` and `java.util.LinkedHashMap` in some places but `scala.collection.mutable.HashMap` in others. The choice is easier for sequence types: just use plain arrays.
- **Collection combinators converted to while-loops:** higher-order methods like `map` allocate closures and go through generic dispatch. The scalac optimizer could inline all of these calls, but we can't allow inlining from the standard library because of binary compatibility requirements, so we have to do the rewriting manually.
- **Avoid `Position` allocations:** evaluation should not create a new `Position` object. An AST node can be evaluated many times, so it is better to create the `Position` objects directly in the parser. This is also more correct: an offset is always tied to a specific file.
- **Cheaper `Lazy`:** the old approach relied on the usual Scala mechanisms (a `Function0` for the thunk, plus the actual `Lazy`); a `lazy val` requires a separate `Boolean` flag to keep track of the state, and `lazy val` uses thread-safe double-checked locking. We can cut the allocations in half by making `Lazy` a SAM type, use our own state handling to encode a missing value as `null` (because we know that `f` can never return `null`), and omit the locking (because we don't need thread-safety).
- **Table-based dispatch:** `match` expressions like the ones in `visitExpr` and `visitBinaryOp` are very efficient. The Scala compiler will remove the tuples and you end up with a nice table dispatch on final class types. We avoid separate handling of special cases (like the lazy `and` and `or` operators) because a single table-based dispatch is faster.
- **Re-use literal AST nodes as `Val`s:** there is no need to have the parser emit, for example, an `Expr.Str(s)`, only to get it turned into a `Val.Str(s)` immediately by the evaluator. All literals are emitted directly as `Val`s and the evaluator can skip right over them.
- **Drop `Expr.Parened`:** `Expr.Parened(e)` was used for an expression in parentheses. This information is not needed for evaluating the expression: the result is the same as evaluating `e` directly, so we don't even have to produce the node first.
- **Optimize `std`:** the standard library used `builtin` methods to avoid some boilerplate. These had a common `builtin0` abstraction for arbitrary arities, which required wrapping values in arrays. We only need 3 different arities, and avoiding this abstraction actually makes the code simpler (in addition to removing the unnecessary arrays). For the most common functions we do not use `builtin` at all anymore: a direct implementation with a bit more boilerplate is even faster.
- **Faster string splitting:** `String.split(Char)` quotes the char into a regular expression which then gets compiled, just so it can reuse the regular expression engine for splitting. This can be implemented much more efficiently.
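The unboxed-options point can be sketched with a toy container (the `IntToValue` name and `String` element type are invented for illustration, not taken from the PR):

```scala
// Hypothetical sketch of the two-parallel-arrays encoding.
// Instead of Array[(Int, Option[String])] -- one Tuple2, one boxed
// Integer, and possibly one Some per element -- we keep the Ints in a
// primitive array and encode None as null in the values array.
final class IntToValue(size: Int) {
  private val keys = new Array[Int](size)      // unboxed Ints
  private val values = new Array[String](size) // null means "no value"

  def set(i: Int, key: Int, value: String): Unit = {
    keys(i) = key
    values(i) = value
  }

  def keyAt(i: Int): Int = keys(i)

  // Only materialize an Option at the API boundary, if at all:
  def valueAt(i: Int): Option[String] = Option(values(i))
}
```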
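The combinator-to-while-loop rewrite, shown on a toy function (not code from the PR):

```scala
// The idiomatic version allocates a closure and goes through the
// generic map machinery:
def doubleAllIdiomatic(xs: Array[Int]): Array[Int] = xs.map(_ * 2)

// The manual rewrite is roughly what the scalac optimizer would
// produce if we were allowed to inline from the standard library:
def doubleAll(xs: Array[Int]): Array[Int] = {
  val out = new Array[Int](xs.length)
  var i = 0
  while (i < xs.length) {
    out(i) = xs(i) * 2
    i += 1
  }
  out
}
```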
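A minimal sketch of the single-allocation `Lazy` idea, assuming the computed value can never be `null` (the real class holds sjsonnet values; `String` is used here only to keep the sketch self-contained):

```scala
// One allocation (the instance itself), no separate Boolean flag (the
// cached field doubles as the "computed yet?" state via null), and no
// locking because we don't need thread-safety.
abstract class Lazy {
  private[this] var cached: String = null // null encodes "not yet computed"
  def compute(): String                   // the thunk body (single abstract method)
  final def force: String = {
    if (cached == null) cached = compute() // relies on compute() never returning null
    cached
  }
}
```

Because `Lazy` has a single abstract method, Scala 2.12+ SAM conversion should let call sites create one from a plain lambda where a `Lazy` is expected.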
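The table-dispatch style can be illustrated with a toy evaluator (the `V`/`Num`/`Str` types and the string-typed operator are invented for this sketch; sjsonnet matches on its own final AST classes):

```scala
// Toy illustration of dispatching on (op, lhs, rhs) in one flat match.
// On final classes the compiler can compile this to efficient dispatch
// without actually allocating the tuple.
sealed trait V
final case class Num(d: Double) extends V
final case class Str(s: String) extends V

def binaryOp(op: String, l: V, r: V): V = (op, l, r) match {
  case ("+", Num(a), Num(b)) => Num(a + b)
  case ("+", Str(a), Str(b)) => Str(a + b)
  case ("*", Num(a), Num(b)) => Num(a * b)
  case _ => throw new IllegalArgumentException(s"cannot apply $op")
}
```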
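The literals-as-values idea in miniature (hypothetical `Expr`/`Val`/`VStr` names): if `Val` is a subtype of `Expr`, the parser can emit literals as finished values and the evaluator passes them straight through.

```scala
// Sketch: Val extends Expr, so a literal parsed directly as a Val
// needs no evaluation step at all.
sealed trait Expr
sealed trait Val extends Expr
final case class VStr(s: String) extends Val
final case class Concat(l: Expr, r: Expr) extends Expr

def eval(e: Expr): Val = e match {
  case v: Val => v // literals: already a value, skip right over
  case Concat(l, r) =>
    (eval(l), eval(r)) match { case (VStr(a), VStr(b)) => VStr(a + b) }
}
```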
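The last point can be sketched as a hand-rolled single-character split (a hypothetical helper, not the PR's implementation; note that unlike `java.lang.String.split` it keeps trailing empty segments):

```scala
import scala.collection.mutable.ArrayBuffer

// Splits on a single character with a plain scan, never touching the
// regular expression engine. Trailing empty segments are kept, unlike
// String.split, so callers wanting that behavior must trim themselves.
def fastSplit(s: String, sep: Char): Array[String] = {
  val out = ArrayBuffer.empty[String]
  var start = 0
  var i = 0
  while (i < s.length) {
    if (s.charAt(i) == sep) {
      out += s.substring(start, i)
      start = i + 1
    }
    i += 1
  }
  out += s.substring(start)
  out.toArray
}
```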