This question tries to assemble, from fragments of information, the full picture of if and when a stale object reference can arise from an old-generation (immutable) array referring to a newer-generation object.
Preface: I was originally researching unsafeThaw/unsafeFreeze of vectors, but since these seem to use the underlying unsafe[...]Array (M)Array operations, I continued the search on those. I am assuming the thin vector wrapper around them doesn't change the situation much.
The information fragments
In https://mail.haskell.org/pipermail/glasgow-haskell-users/2014-May/024978.html (Using mutable array after an unsafeFreezeArray, and GC details, reply by Edward Z. Yang):
[...] newArray# and unsafeFreezeArray#, what this operation does
is allocate a new array of pointers (initially recorded as mutable), and
then freezes it in-place (by changing the info-table associated with
it), but while maintaining a pointer to the original mutable array. Nothing bad
will happen immediately, but if you use this mutable reference to mutate
the pointer array, you can cause a crash (in particular, if the array
makes it to the old generation, it will not be on the mutable list and
so if you mutate it, you may be missing roots.)
which, IIUC, describes the following scenario: an immutable array residing in some old generation is unsafely mutated by writing into it a reference to a newer-generation object. This creates an old-to-new reference that the GC does not follow, so the newer-generation object is deemed unreferenced and gets collected.
This is understandable, but it is somewhat contradicted by the (admittedly two years older) https://mail.haskell.org/pipermail/glasgow-haskell-users/2012-March/022140.html (How unsafe are unsafeThawArray# and unsafeFreezeArray#, reply by Simon Marlow):
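To make the 2014 scenario concrete, here is a minimal sketch of the pattern Edward Z. Yang describes, written directly against the GHC.Exts primops. The wrapper types Arr/MArr and the helpers newAndFreeze, writeStale and indexArr are hypothetical names introduced only for illustration. The sketch shows the aliasing (the frozen Array# and the retained MutableArray# are one heap object); it makes no claim about whether the stale write can crash, which is exactly the open question here.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
  ( Array#, MutableArray#, RealWorld, Int (I#)
  , newArray#, writeArray#, unsafeFreezeArray#, indexArray# )
import GHC.IO (IO (..))

-- Hypothetical boxed wrappers: unlifted array types cannot be returned
-- from IO directly.
data Arr a = Arr (Array# a)
data MArr a = MArr (MutableArray# RealWorld a)

-- Allocate a mutable array, then freeze it in place while keeping the
-- original MutableArray#. Both results refer to the SAME heap object.
newAndFreeze :: Int -> a -> IO (Arr a, MArr a)
newAndFreeze (I# n) x = IO $ \s0 ->
  case newArray# n x s0 of
    (# s1, marr #) -> case unsafeFreezeArray# marr s1 of
      (# s2, arr #) -> (# s2, (Arr arr, MArr marr) #)

-- Write through the MutableArray# retained after the freeze: the pattern
-- the 2014 reply warns about (unsafe; shown only for illustration).
writeStale :: MArr a -> Int -> a -> IO ()
writeStale (MArr marr) (I# i) x = IO $ \s ->
  case writeArray# marr i x s of
    s' -> (# s', () #)

-- Read from the frozen array (no bounds check).
indexArr :: Arr a -> Int -> a
indexArr (Arr arr) (I# i) = case indexArray# arr i of (# x #) -> x
```

Calling writeStale after newAndFreeze makes the write visible through the frozen Arr, since both values name one object; whether that object is still on a mutable list at that point is what decides the GC question.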
I just ran across some code that calls unsafeThawArray#, writeArray#,
and unsafeFreezeArray#, in that order. How unsafe is that?
- Is it unsafe in the sense that if someone has a reference to the
original Array# they will see the value of that pure array change?
Yes.
- Is it unsafe in the sense things will crash and burn?
No. (at least, we would consider it a bug if a crash was the result)
The RTS implementation details of unsafeThawArray included at the bottom of the reply also hint that these unsafe thaw/freeze operations not only change the array's mutability marker in place, but also move the array between generations and the mutable list in some way.
The last message of this mailing thread https://mail.haskell.org/pipermail/glasgow-haskell-users/2012-March/022145.html (reply by Johan Tibell) also hints that the final unsafeFreezeArray is necessary for its side effect of marking the array as immutable (combined with Simon's reply above: the array stays on the mutable list as FROZEN0 until a GC eventually moves it to an immutable GC space), but I'm not sure this is the full picture.
Searching some more, in the Q&A here at Mutable Array in GHC Compact Region, Ben Gamari nicely describes the multi-generation mutable reference scenario, and in the comments section we find the question:
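The thaw/write/freeze sequence from the 2012 thread can be sketched the same way. Again, Arr, fromValue, updateInPlace and indexArr are hypothetical helpers; the sketch only shows that the input and the result are the same heap object, which is why anyone holding the original Array# sees the change.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
  ( Array#, Int (I#)
  , newArray#, unsafeFreezeArray#, unsafeThawArray#, writeArray#, indexArray# )
import GHC.IO (IO (..))

-- Hypothetical boxed wrapper around an immutable array.
data Arr a = Arr (Array# a)

-- Build an immutable array of n copies of x.
fromValue :: Int -> a -> IO (Arr a)
fromValue (I# n) x = IO $ \s0 ->
  case newArray# n x s0 of
    (# s1, marr #) -> case unsafeFreezeArray# marr s1 of
      (# s2, arr #) -> (# s2, Arr arr #)

-- The sequence from the 2012 thread: thaw in place, write, freeze in place.
-- The MutableArray# is not returned, so no mutable alias survives the freeze.
updateInPlace :: Arr a -> Int -> a -> IO (Arr a)
updateInPlace (Arr arr) (I# i) x = IO $ \s0 ->
  case unsafeThawArray# arr s0 of
    (# s1, marr #) -> case writeArray# marr i x s1 of
      s2 -> case unsafeFreezeArray# marr s2 of
        (# s3, arr' #) -> (# s3, Arr arr' #)

-- Read from the frozen array (no bounds check).
indexArr :: Arr a -> Int -> a
indexArr (Arr arr) (I# i) = case indexArray# arr i of (# x #) -> x
```

Unlike the 2014 pattern, no MutableArray# outlives the final unsafeFreezeArray# here, which may be part of why the two replies reach different conclusions.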
When someone calls unsafeFreezeArray#, it seems like they would end up with an immutable value that can point to younger generations. It doesn't seem like GHC changes the frozen array to be in generation 0, so I don't understand how this works with garbage collection.
which, I believe, touches on what we are after. It continues:
just found out about eager promotion, which answers my question.
Then https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/rts/storage/gc/eager-promotion and https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/rts/storage/gc/remembered-sets describe how the newer-generation object being referred to would be promoted into the old generation in which the immutable array lives.
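The reasoning from these two wiki pages can be illustrated with a deliberately tiny toy model of a minor GC with a remembered set (mutable list). Everything here (Obj, Heap, minorGC, the generation numbering, and the rule that a survivor reached from a mutable-list entry goes straight into that entry's generation) is a hypothetical simplification, not how the RTS is implemented. It only shows why an old object missing from the mutable list can lose a young referent, and what eager promotion would do when the object is on the list.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Map.Strict (Map)
import Data.Set (Set)

-- Hypothetical toy model, NOT the GHC RTS algorithm.
type ObjId = Int

-- A heap object: the generation it lives in and the objects it points to.
data Obj = Obj { gen :: Int, ptrs :: [ObjId] } deriving (Show)

type Heap = Map ObjId Obj

-- One toy minor collection of generation 0.
-- Roots: the stack roots plus the objects on the mutable list. Old objects
-- NOT on the mutable list are never scanned, so a young object reachable
-- only through them is dropped.
-- Survivors leave generation 0: those reached from a mutable-list entry go
-- into that entry's generation (a crude stand-in for eager promotion),
-- the rest into generation 1.
minorGC :: [ObjId] -> Set ObjId -> Heap -> Heap
minorGC stackRoots mutList heap = Map.union promoted old
  where
    (young, old) = Map.partition ((== 0) . gen) heap
    -- (object, target generation) pairs; mutable-list seeds come first so
    -- eager promotion wins when an object is reachable both ways.
    seeds =
      [ (p, max 1 (gen o))
      | m <- Set.toList mutList
      , Just o <- [Map.lookup m heap]
      , p <- ptrs o
      ]
        ++ [ (r, 1) | r <- stackRoots ]
    promoted = go Map.empty seeds
    -- Traverse young objects only; objects already old are left untouched.
    go acc [] = acc
    go acc ((i, g) : rest) = case Map.lookup i young of
      Just o
        | not (Map.member i acc) ->
            go (Map.insert i (o { gen = g }) acc) ([ (p, g) | p <- ptrs o ] ++ rest)
      _ -> go acc rest
```

For a heap in which object 1 is an old (generation 2) array pointing to young object 2, minorGC [1] Set.empty drops object 2, while minorGC [1] (Set.singleton 1) keeps it and places it in generation 2.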
Synthesis?
It sounds like the 2014 post's statement
if you use this mutable reference to mutate the pointer array, you can cause a crash (in particular, if the array makes it to the old generation, it will not be on the mutable list and so if you mutate it, you may be missing roots.)
is not true: the array that was mutable and then marked immutable is on the mutable list, which serves as a GC root for the newer generations. Then, as the frozen array is moved off the mutable list (does it get moved off? The RTS code copied into Simon Marlow's reply suggests it would, but the remembered-sets wiki is not clear about this), the newer-generation object should get eagerly promoted into that older generation as well, shouldn't it?
Unless there is some race or gap between moving the frozen array off the mutable list and eagerly promoting the referenced newer-generation object? Or maybe eager promotion cannot always be performed?
Aside: in Johan Tibell's example, what would happen if the final unsafeFreezeArray were not called?
Without fully understanding the mutable-list and moving mechanics mentioned above, and with the main question still open, this is more of a shot in the dark:
The unsafeThawArray likely puts the array on a mutable list, because the RTS comments referred to above mention:
// So, when we thaw a MUT_ARR_PTRS_FROZEN, we must cope with two cases:
// either it is on a mut_list, or it isn't. We adopt the convention that
// the closure type is MUT_ARR_PTRS_FROZEN0 if it is on the mutable list,
// and MUT_ARR_PTRS_FROZEN otherwise. In fact it wouldn't matter if
// we put it on the mutable list more than once, but it would get scavenged
// multiple times during GC, which would be unnecessarily slow.
Then, since unsafeFreezeArray was not called, the array would happily stay on the mutable list (though it's not clear which generation's mutable list; I assume its own generation's), serving as a GC root for the younger generations. That would be fine; it would just prevent the eager-promotion optimization.
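The closure-type convention in the quoted comment, together with the reading of unsafeFreezeArray above, can be written as a small state machine. The ClosureType constructors mirror the names in the comment (MUT_ARR_PTRS, MUT_ARR_PTRS_FROZEN0, MUT_ARR_PTRS_FROZEN), but the model is hypothetical: it ignores generations entirely, and gcFrozen encodes this question's guess that a GC eventually retags a FROZEN0 array as FROZEN and takes it off the mutable list.

```haskell
-- Hypothetical model of the closure-type convention from the RTS comment.
data ClosureType = MutArrPtrs | MutArrPtrsFrozen0 | MutArrPtrsFrozen
  deriving (Eq, Show)

-- Toy array header: closure type plus mutable-list membership.
data ArrState = ArrState { closureType :: ClosureType, onMutList :: Bool }
  deriving (Eq, Show)

-- unsafeThaw, per the comment: a FROZEN0 array is already on the mutable
-- list, so only its type changes; a FROZEN array must be pushed onto it.
thaw :: ArrState -> ArrState
thaw (ArrState MutArrPtrsFrozen0 onList) = ArrState MutArrPtrs onList
thaw (ArrState MutArrPtrsFrozen _) = ArrState MutArrPtrs True
thaw st = st

-- unsafeFreeze, per the reading of the 2012 thread: retag as FROZEN0 and
-- leave mutable-list membership alone.
freeze :: ArrState -> ArrState
freeze st = st { closureType = MutArrPtrsFrozen0 }

-- Hypothetical GC step (this question's guess): a FROZEN0 array is retagged
-- FROZEN and dropped from the mutable list; mutable arrays stay as they are.
gcFrozen :: ArrState -> ArrState
gcFrozen (ArrState MutArrPtrsFrozen0 _) = ArrState MutArrPtrsFrozen False
gcFrozen st = st
```

Under this model, thaw followed by gcFrozen with no intervening freeze leaves the array as MutArrPtrs on the mutable list, matching the guess above; only after freeze does gcFrozen take it off.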
Aside over
So the main question is: can this lost-object-reference issue ever arise, and if so, what are the exact circumstances and mechanics involved?