Strings and arrays in Project Valhalla

Question

My understanding of Project Valhalla's impact on arrays and Strings (please let me know if this is off):

arrays will still be reference objects but an array of value objects may be flattened on the heap
despite the fact that the String class is discussed in JEP 401 as an example of a class where identity is confusing, Strings will still have identity after Valhalla

I can see the sense behind this:

arrays can be LARGE
arrays are currently mutable (though Strings are not)

I want to make the argument that Java will never have identity-less Strings or arrays. I'm looking for a "clinch" reason for that. I'm not sure the reasons I've given here are really "clinching". I'm hoping a real expert can chime in and explain. (Or tell me I'm wrong.)

Are there other reasons on top of that?

Is there any chance that a String object will move to the stack or that there might be some allowance for immutable, small, value arrays in the future?

"Any other reasons on top of that?" You mentioned a few quite different things - arrays being flattened, arrays still being reference objects but it making sense to have small arrays as value objects (how small?), strings still having identity vs. being on the stack - which are you asking about the reason for? I think this question is asking too many things but can probably be narrowed down. Also, welcome!
– kaya3, Commented Jan 17 at 23:52
Thanks, @kaya3. I think/hope I've clarified the question, now.
– BPS, Commented Jan 18 at 1:35
Never is a very long time and the owner of the language can be arbitrarily capricious, so of course there's a chance, but I think that's not what you mean though I'm not sure how to pinpoint what you are aiming for. It's a very fair thing to wonder but I'm not sure it can get an absolute yes or no. Perhaps this is about the (known) costs and tradeoffs that would be anticipated?
– Michael Homer ♦, Commented Jan 18 at 2:59
It's worth noting that "identity-less" isn't a necessary or sufficient condition for being "on the stack". It's possible as an optimisation to allocate an object with identity on the stack, when escape analysis determines that the object won't outlive the scope it was created from. It's also possible to use the heap for identity-less data which is too large for the stack, dynamically sized, shared between threads, etc. Functional languages generally don't have a notion of object identity, but they use the heap for lots of things.
– kaya3, Commented Jan 18 at 11:53
@kaya3 Also: stack allocation is not the optimization that everyone thinks it is. If we can elide heap allocation, we're far more likely to scalarize the object and hoist its fields into registers.
– Brian Goetz, Commented Jan 19 at 16:21

Brian Goetz · Accepted Answer · 2025-01-18 15:21:38Z

There are a number of questions tangled together here; let me try to tease them apart.

How do arrays of value types work?
Are there short-term plans to migrate String to be a value class, like we have for Integer?
Are there long-term impediments to migrating String to be a value class?

Arrays will remain identity objects. (All objects will remain reference objects.) In the short term, this is a forced move because arrays in Java are mutable, and mutability requires identity. In the longer term, even if we had immutable arrays (which we would like to do, we call these "frozen" arrays), there is little value to stripping arrays of their identity, for several reasons, including the one you mention (there is limited value in trying to flatten arrays as they are generally large.)
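As a small illustration of why mutability forces identity, here is a sketch that runs on current Java and uses nothing Valhalla-specific. The class and method names are made up for the example. A write through one reference is visible through every alias, so the array must be a single, distinguishable object; a clone has equal contents but is a different object.

```java
import java.util.Arrays;

final class ArrayIdentityDemo {

    // A write through one reference is visible through every alias of the same array.
    static int writeThroughAlias() {
        int[] original = {1, 2, 3};
        int[] alias = original;   // same array object, not a copy
        alias[0] = 42;
        return original[0];       // observes the write made through the alias
    }

    // A clone is a distinct object, so == (identity comparison) is false.
    static boolean cloneIsSameObject() {
        int[] original = {1, 2, 3};
        return original == original.clone();
    }

    // The clone still has element-wise equal contents.
    static boolean cloneHasEqualContents() {
        int[] original = {1, 2, 3};
        return Arrays.equals(original, original.clone());
    }

    public static void main(String[] args) {
        System.out.println(writeThroughAlias());
        System.out.println(cloneIsSameObject());
        System.out.println(cloneHasEqualContents());
    }
}
```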

Arrays of (references to) value objects are candidates for flattening in their layout, just as are objects whose fields are (references to) value objects. (The model that all objects are referred to by references is unchanged; for references to value types we may be able to optimize away the physical representation of the reference, that is, the pointer.)

In the short term, String is not on the list of classes that will be migrated to a value type. (Classes on that list include the primitive box classes like Integer and Long, date-time classes like LocalDate, and other utility classes like Optional, likely with more over time.)

Migrating String to a value class is more challenging for multiple reasons:

The identity of String may be significant to some code. The String::intern method has been present since day 1, and many frameworks rely on interning of strings. While this is theoretically not a problem (the effect would just be that more strings that are .equals() would also be considered ==, which mostly seems harmless), String is so pervasive that it is hard to imagine all the code that would be affected by such a change.
There may well be code out there that synchronizes on String instances. While we might say that this is silly, it has been valid Java code for 30 years, so such a change would not be behaviorally compatible. This might still be OK, but it would have to be done carefully over a longer term.
The current implementation of String uses mutable fields for caching the hash code, and mutability requires identity.
It would likely perturb the cost of == operations on String, which might conceivably show up as a macro change in performance (since Strings are so pervasive).
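The first two points can be observed on a current JDK. The following is a minimal sketch with hypothetical class and method names; it shows today's behavior only and says nothing about how a future value-class String would behave.

```java
final class StringIdentityDemo {

    // new String(...) creates a fresh object with its own identity,
    // so == against the literal it was copied from is false today.
    static boolean freshCopyIsSameObject() {
        String literal = "valhalla";
        String copy = new String(literal);
        return copy == literal;
    }

    // String literals are interned, so interning the copy yields
    // the very same object as the literal.
    static boolean internedCopyIsSameObject() {
        String literal = "valhalla";
        String copy = new String(literal);
        return copy.intern() == literal;
    }

    // Locking on a String is legal today because every identity object has a monitor.
    static int lockOnString(String lock) {
        synchronized (lock) {
            return lock.length();
        }
    }

    public static void main(String[] args) {
        System.out.println(freshCopyIsSameObject());
        System.out.println(internedCopyIsSameObject());
        System.out.println(lockOnString("abc"));
    }
}
```

Code that depends on the first method returning false, or that uses the locking pattern in the third, is the kind of code a migration would have to account for.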

So, there are many reasons to tread carefully. But, could String ever be migrated? I believe so. Assuming that the nonfunctional impact (performance perturbations, equality becoming more liberal) were acceptable, the mutable state of the String used to cache the hash code (which is also the subject of a benign data race) could be moved into the backing character array instead. This would be an expensive implementation change (since there is so much code in the JVM to special-case String operations to make them fast), but would be possible, and might eventually be considered.
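For readers unfamiliar with the hash-caching idiom, here is a simplified sketch of the general pattern. It is not the actual JDK source, and the class name CachedHashText is invented for illustration. The cache field is neither final nor volatile; racing threads may each compute the hash, but they all compute the same value, which is why the race is benign.

```java
final class CachedHashText {
    private final char[] chars;

    // Lazily written cache: mutable, non-volatile. Zero means "not computed yet".
    // This mutable field is what ties the object to having identity.
    private int hash;

    CachedHashText(String s) {
        this.chars = s.toCharArray();
    }

    // equals is omitted to keep the sketch short; a real class would override it too.
    @Override
    public int hashCode() {
        int h = hash;                 // read the field once into a local
        if (h == 0 && chars.length > 0) {
            for (char c : chars) {
                h = 31 * h + c;       // same polynomial that String.hashCode is specified to use
            }
            hash = h;                 // racy write, but every thread would store the same value
        }
        return h;
    }

    public static void main(String[] args) {
        CachedHashText t = new CachedHashText("abc");
        System.out.println(t.hashCode() == "abc".hashCode());
    }
}
```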

So, there is no "clinch" argument here that it will never happen, but there are surely impediments to overcome.

Finally, I would recommend that you get "stack allocation" out of your mental model of value classes; it is not a helpful mental model. It carries two incorrect assumptions: one, that all Java objects are allocated on the heap, which is simply not true, completely separate from Valhalla. And two, to the extent that the JVM can elide heap allocation, it is far more likely that the object will be scalarized and its fields hoisted into registers than that it will be allocated on the stack (though excess registers can spill to the stack as part of the method invocation protocol).

Valhalla won't change this; it just increases the number of objects that the JVM can prove can be safely scalarized.
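As a sketch of the kind of code where this can apply today: in the hypothetical example below, the Point created in each loop iteration never escapes the method, so a JIT compiler with escape analysis is free to avoid allocating it at all and keep x and y in registers. Whether a particular JVM actually does so depends on inlining and other heuristics, and it cannot be observed from the program's result. The names are invented for illustration, and records need Java 16 or later.

```java
final class ScalarizationDemo {

    // An ordinary identity class today (records are not value classes).
    record Point(int x, int y) {}

    // Each Point is created, read, and dropped inside the loop body.
    // It is never stored in a field, returned, or passed elsewhere,
    // so it is a candidate for scalar replacement.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            total += (long) p.x() * p.x() + (long) p.y() * p.y();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(3));
    }
}
```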

Straight from the horse's mouth. Doesn't get any better than this.
– Jörg W Mittag, Commented Jan 18 at 15:47
