6
$\begingroup$

In Simple and Effective Type Check Removal through Lazy Basic Block Versioning, which introduces Lazy Basic Block Versioning (LBBV), runtime type checks branch to two stubs, one for the case where the check passes and one for where it fails; executing the pass stub emits a successor block specialized with the knowledge that the variable has the checked type.
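To make the mechanism concrete, here is a toy sketch of that versioning scheme (all names are hypothetical, not the paper's implementation): block versions are memoized per incoming type context, and the "check passes" stub extends the context with the positive type fact before compiling the successor.

```python
# Toy sketch of lazy basic block versioning. Assumed/hypothetical names:
# get_version, compile_block, on_type_check_pass.

versions = {}  # (block_id, frozen type context) -> compiled version


def get_version(block_id, ctx):
    """Return (lazily creating) the version of `block_id` specialized
    for type context `ctx`, a mapping var -> known type tag."""
    key = (block_id, frozenset(ctx.items()))
    if key not in versions:
        versions[key] = compile_block(block_id, ctx)
    return versions[key]


def compile_block(block_id, ctx):
    # Stand-in for real code generation: just record the context the
    # block was specialized under.
    return {"block": block_id, "ctx": dict(ctx)}


def on_type_check_pass(block_id, succ_id, ctx, var, type_tag):
    # Executed the first time the "pass" stub is hit: we now know `var`
    # has `type_tag`, so the successor is compiled under that knowledge.
    new_ctx = dict(ctx)
    new_ctx[var] = type_tag
    return get_version(succ_id, new_ctx)
```

Note that only the positive fact is recorded; a failure stub in this sketch would compile the successor with the original, unextended context, which is exactly the asymmetry the question is about.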

However, they only propagate specific, positive types this way. In the type-check-failure case they don't record that the variable is not the given type, which leads to things like the diagram in section 3.4 having duplicate type checks for is_i32(i) at both block B and block J, even though block J is dominated by the type check in block B.

Indeed, in section 3.3, "Flow-based Representation Analysis", they implement a static type propagation pass as a baseline to compare LBBV against, which does use set-based type information, including negative sets, and they point out that this analysis is richer than the strictly precise type tags LBBV uses.

Why doesn't LBBV track type sets? Does the block versioning scheme fall apart with the introduction of negative sets, or does the number of block versions explode? Is it simply a runtime performance trade-off?

$\endgroup$
1
  • 2
    $\begingroup$ Many languages' type systems cannot represent types like "not i32". Given that this is a research article, the authors may have thought it was more important to investigate their technique's effectiveness in the more typical scenario where negative types aren't possible. But unless they have explained this choice in the paper itself, then this question can only really be answered by the authors themselves; I would suggest sending a brief email (and perhaps including their response in an answer here). $\endgroup$ Commented May 31 at 8:19

1 Answer

8
$\begingroup$

Author here.

At the end of my PhD thesis I outlined some possible future directions, and I suggested that LBBV could be extended to encode arbitrary facts about programs, e.g. trees of boolean expressions (see section 9.5). You could encode that a value is not a given type, but also things like numerical ranges and specific constants. In practice it can be particularly useful to propagate small constants such as the booleans true and false and the integers 0 and 1. This wasn't discussed in the papers we published, but as part of my PhD I also used LBBV to eliminate overflow checks used in for loop increments by propagating some range information. You could also imagine using it to optimize a reference-counting garbage collector, for example.
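The overflow-check idea mentioned above can be illustrated with a small sketch (hypothetical helper name `needs_overflow_check`, not the thesis's actual code): if the versioning context carries a known (lo, hi) range for the loop counter, the compiler can statically decide whether incrementing it can ever overflow int32.

```python
INT32_MAX = 2**31 - 1


def needs_overflow_check(var_range, step):
    """Given a known (lo, hi) range for a loop counter and a constant
    increment `step`, decide whether `counter + step` can overflow
    int32. Illustrative only."""
    lo, hi = var_range
    return hi + step > INT32_MAX


# A counter known to stay in [0, 1000) can be incremented by 1 with no
# overflow check...
assert not needs_overflow_check((0, 999), 1)
# ...while a counter that may reach INT32_MAX still needs one.
assert needs_overflow_check((0, INT32_MAX), 1)
```

The same context machinery that carries type tags could carry ranges or known constants like 0, 1, true and false, which is the generalization suggested in the thesis.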

In practice, with the type representation that we used, we found that there was no basic block explosion, and I don't think one would happen if you used type sets. Most types tend to be monomorphic, and most functions tend to be small and fairly monomorphic, so if an explosion is going to happen at all, it will be limited to a few relatively rare functions. The basic block version limit also places a hard cap that greatly limits the impact when it does happen.
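That hard cap can be sketched as follows (names like `MAX_VERSIONS` and the fallback behavior are illustrative assumptions, not the paper's exact implementation): once a block has reached its version limit, further specialization requests fall back to a single generic version compiled with an empty context.

```python
MAX_VERSIONS = 5     # hypothetical per-block version cap
versions = {}        # (block_id, frozen ctx) -> compiled version
version_count = {}   # block_id -> number of versions created so far


def get_version(block_id, ctx):
    key = (block_id, frozenset(ctx.items()))
    if key in versions:
        return versions[key]
    count = version_count.get(block_id, 0)
    if count >= MAX_VERSIONS and ctx:
        # Limit reached: reuse a single generic (unspecialized) version
        # instead of creating yet another specialization.
        return get_version(block_id, {})
    version_count[block_id] = count + 1
    versions[key] = {"block": block_id, "ctx": dict(ctx)}
    return versions[key]
```

However polymorphic the incoming contexts are, each block's code size growth is bounded by the cap plus the one generic fallback version.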

As for why we didn't use type sets: we wanted a compact representation. In JS you tend to need to know the exact type of a value to be able to operate on it, and I doubt it happens often that you know, for instance, that a value is not an integer and can then avoid running further type checks to find out its exact type. I'm not sure how useful it would be to propagate "x is not an integer". That said, we haven't run that experiment; maybe type sets would turn out to work quite well. It's an open research question.

You specifically pointed to the diagram in section 3.4 of the paper. That diagram shows the code before LBBV specialization is applied: it depicts the implicit type checks in an unoptimized JS program. The reason the type check is duplicated is that in the overflow case, i would become a floating-point value (or something wider than an int32). Figure 5 shows the code with LBBV specialization applied, and there the type of i is never checked inside the loop body: provided there is no overflow, we know that i remains an int32 at every point in the loop body.
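The effect in Figure 5 can be mimicked with a small sketch (hypothetical names; a real JIT would of course elide the check in generated machine code, not source): the is_i32 check runs once when the specialized loop version is entered, and the loop body itself contains no per-iteration type check on i.

```python
checks_run = 0  # count how many times the type check actually executes


def is_i32(v):
    global checks_run
    checks_run += 1
    return isinstance(v, int) and -2**31 <= v < 2**31


def sum_loop(n):
    i, total = 0, 0
    # Entry: a single check establishes that i is int32 for the whole
    # specialized loop body.
    assert is_i32(i)
    while i < n:
        # Specialized body: no per-iteration type check on i. An
        # overflow on the increment would branch to a separate version
        # where i is a float; elided in this sketch.
        total += i
        i += 1
    return total


assert sum_loop(10) == 45
assert checks_run == 1  # the check ran once, not once per iteration
```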

$\endgroup$
3
  • $\begingroup$ Welcome! Does "basic block explosion" mean that you would make different specialised versions of each basic block for each different set of type information you have about it, and hence that the number of different versions could be very large if there are more fine-grained types (like "not i32" as opposed to "any")? $\endgroup$ Commented Jun 1 at 19:14
  • $\begingroup$ I don't think that's a concern in practice. We implemented a basic block version limit so we could have a hard guarantee that it would never be the case, because hard guarantees are always better than intuitive guesses, but even without the limit, we never saw that much code size growth. $\endgroup$ Commented Jun 2 at 21:10
  • $\begingroup$ Just checking that I understood the term correctly. $\endgroup$ Commented Jun 2 at 21:30
