We already have a more general question about different types of garbage collection, but performance is only touched on briefly, and in pretty unspecific ways. What are the more specific ways in which reference counting and tracing GC compare, performance-wise?

In addition to just a binary "this one's faster", answers should preferably go into some detail on different variables which can impact things (e.g., if one's faster with less memory, or one's better for caching, or one's better for concurrency, etc.), and possible pitfalls or pathological situations for the two.

So: in general, is RC or tracing GC faster? In what situations might this change, and what factors go into determining the performance of these two?

  • Reference counting has additional costs if a language allows more than a single thread. Commented Oct 9, 2024 at 13:36
  • Tracing GCs become more efficient when the system has a lot more memory than is required to store all live objects. Likewise if you have some spare cores to run the GC on. Commented Oct 9, 2024 at 13:38

1 Answer


Disclaimer: I haven't implemented either in any non-trivial form. I'll delete my answer if a true expert has a better answer.

The true answer is unsatisfactory: it depends. There is no general answer simply because there are too many variables.

Garbage collection (by which I mean anything that can handle cycles) and reference counting (by which I mean anything that can't handle cycles) have such widely different possible implementations that either one could be orders of magnitude faster in both trivial and non-trivial implementations.

Perhaps the question you meant was: which one would be faster with the same amount of effort expended on the implementation?

Again the answer is it depends.

Why? Because there are still too many variables.

Here's a non-exhaustive list of the variables that might matter:

  • Do you have to handle cycles?
  • Does the language design push one or the other?
  • Does the hardware architecture help or hinder one versus the other?
  • As mentioned above, what actual implementation designs are you considering?
  • What aspects of performance do you care about, and in what order?
  • Do you have any non-standard constraints?
  • And finally, what is user code really doing?

Let's address each of these.

Do you have to handle cycles?

You'd think that the answer to this question would immediately tell you whether you should use GC or RC. After all, only GC can handle cycles, right?

Not so fast. How often do cycles happen? Not often? Or maybe you can just blame users if they get a cycle (looking at you, Objective-C and Swift). Then you can still get away with just RC.

Or maybe you can tweak your RC to detect cycles. Perhaps it doesn't handle them itself, but activates a trivial tracing GC to do so. That would work, but then you have a hybrid, not one or the other.
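CPython is a real-world example of exactly this hybrid: reference counting is the primary mechanism, and a backup tracing collector (exposed via the `gc` module) catches the cycles that refcounts alone can never reclaim. A quick demonstration, assuming CPython:

```python
import gc

class Node:
    def __init__(self):
        self.other = None

# Build a reference cycle. Pure reference counting can never free
# these two objects: each one keeps the other's count above zero.
a, b = Node(), Node()
a.other, b.other = b, a
del a, b  # no names reach them anymore, but the cycle keeps them alive

# CPython's backup tracing collector finds and frees the cycle.
# The return value is the number of unreachable objects it collected.
collected = gc.collect()
```

If you comment out the `gc.collect()` call (or run with `gc.disable()`), the two `Node` objects simply leak, which is the fate Objective-C and Swift accept by default.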

Does the language design push one or the other?

Language design matters a lot.

Look at Java. Nearly everything is a pointer, nearly everything is allocated, and new might be the most common keyword. In that situation, you know that so much garbage will be generated, with so many cycles, that you need a fast allocator and that RC would be a fool's errand no matter what else.

But perhaps your language has better value semantics. Fewer pointers, fewer references: RC starts to look better.

Or maybe your language eschews pointers entirely for ownership and copies, lots of copies. Or maybe it just hides pointers. Or maybe your language enforces structured concurrency such that a thread will always free memory it allocates. Or maybe it does something so exotic that you might as well get a PhD for your work.

In each case, the choice could easily change.

For example, if you expose pointers directly, you may not be able to use a compacting GC anymore since you can't move stuff around. Don't expose them, and maybe you can.
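One classic trick to keep compaction on the table is an extra level of indirection: hand out opaque handles instead of raw addresses, so the collector can move objects behind the user's back. Here's a toy sketch (all the names, `Heap`, `alloc`, and so on, are invented for illustration):

```python
class Heap:
    """Toy heap that hands out handles, not addresses, so it can compact."""

    def __init__(self):
        self.slots = []       # the "heap": slot index -> object
        self.handles = {}     # handle -> slot index (the indirection table)
        self.next_handle = 0

    def alloc(self, obj):
        self.slots.append(obj)
        h = self.next_handle
        self.next_handle += 1
        self.handles[h] = len(self.slots) - 1
        return h              # user code only ever sees the handle

    def free(self, h):
        self.slots[self.handles.pop(h)] = None  # leaves a hole

    def deref(self, h):
        return self.slots[self.handles[h]]

    def compact(self):
        # Slide live objects to the front and fix up the table.
        new_slots = []
        for h, idx in self.handles.items():
            self.handles[h] = len(new_slots)
            new_slots.append(self.slots[idx])
        self.slots = new_slots

heap = Heap()
a = heap.alloc("a")
b = heap.alloc("b")
c = heap.alloc("c")
heap.free(b)    # hole in the middle of the heap
heap.compact()  # objects move...
assert heap.deref(a) == "a" and heap.deref(c) == "c"  # ...handles still work
```

The price, of course, is an extra table lookup on every access, which is exactly the kind of trade-off that makes "which is faster" unanswerable in general.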

Does the hardware architecture help or hinder one versus the other?

First, do you even have threads? Do you need synchronization at all?

Okay, since you need threads (I mean, who doesn't), would it be better to stop all of them or just some of them? Are atomics expensive? How much contention do you expect on synchronization? Can you have a separate GC/RC thread always be running to avoid stopping the world?
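To make the synchronization cost concrete, here's a toy thread-safe refcount in Python (all names invented for illustration; a real implementation would use hardware atomics rather than a lock, but the point stands either way: every copy of a reference pays for synchronization, whether or not the object is ever actually shared):

```python
import threading

class RefCounted:
    """Toy object header: a count plus a lock guarding it."""

    def __init__(self):
        self.count = 1
        self.lock = threading.Lock()

    def incref(self):
        with self.lock:          # this cost is paid on EVERY reference copy
            self.count += 1

    def decref(self):
        with self.lock:
            self.count -= 1
            if self.count == 0:
                self.destroyed = True  # stand-in for actually freeing memory

obj = RefCounted()

def share(n):
    # Each loop iteration models copying and dropping a reference.
    for _ in range(n):
        obj.incref()
        obj.decref()

threads = [threading.Thread(target=share, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert obj.count == 1  # balanced increments/decrements across all threads
```

Without the lock (or an atomic), concurrent increments can be lost and the count can hit zero while references still exist: a use-after-free. A tracing GC sidesteps this per-operation cost entirely, paying instead at collection time.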

What actual implementation designs are you considering?

For GC, do you want generations or not? Do you want to stop the world or not? Do you want to compact or not? Do you want a simple allocator and move complexity into deallocation or not?

For RC, how big of a counter do you need? Should you check for overflow? Should you increment on every copy or can you elide some? How much can you elide?

For both, where do you store the metadata, whether in the allocation, in its own place, or something completely different? Do you track the type of object or not? Do you have destructors (finalizers), and if so, do you care about checking for use-after-free or double frees?
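For a sense of scale on the GC side, here's roughly the simplest possible design: a stop-the-world mark-and-sweep over a toy object graph, with the mark bit stored in the object header (one of the metadata choices above). All names are invented for illustration:

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references
        self.marked = False  # metadata stored in the object header

heap = []

def new(name):
    o = Obj(name)
    heap.append(o)
    return o

def collect(roots):
    # Mark phase: flag everything reachable from the roots.
    stack = list(roots)
    while stack:
        o = stack.pop()
        if not o.marked:
            o.marked = True
            stack.extend(o.refs)
    # Sweep phase: drop everything unmarked, clear marks for next time.
    live = [o for o in heap if o.marked]
    for o in live:
        o.marked = False
    heap[:] = live

a = new("a"); b = new("b"); c = new("c")
a.refs.append(b)       # a -> b is reachable; c is garbage
collect(roots=[a])
assert [o.name for o in heap] == ["a", "b"]
```

Even this trivial sketch forces several of the decisions above: the mark bit lives in the header rather than a side table, the sweep touches every object (so pause time scales with heap size, not garbage size), and nothing is compacted. Each of those could be decided the other way, with different performance consequences.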

What aspects of performance do you care about, and in what order?

There are more aspects to performance than just speed of code.

What is the average case performance? What is the worst case performance? How do both relate to the amount of allocated memory?

What are the average case pauses? What are the worst case pauses? How do both relate to the amount of allocated memory?

What is the fragmentation, internal or external, in the average case? What is the fragmentation in the worst case? How do they relate to the amount of allocated memory?

What is the average case memory overhead? What is the worst case memory overhead? How do both relate to the amount of allocated memory?

Finally, do you have any real-time constraints? If you do, you must ensure that pauses and execution speed, taken together, stay within bounds. Good luck with that, especially since:

In any garbage collector, there exists some pathological case where the responsiveness of your program will be compromised.

Do you have any non-standard constraints?

Going off of real-time constraints, what if you have other constraints? Maybe your language can only tolerate 15% memory overhead. Maybe you have a stack limit. Maybe you have a hard heap limit.

And finally, what is user code really doing?

This is really what comes after all your hard work.

Maybe you just thought users would use your language a certain way, and they don't. Read this keynote to see what can happen when real code meets a GC. The same can happen with RC.

So basically, you can do tons of work, put out good designs, implement them well, and users will still find some way to break them.

But really, who needs users anyway? \s

  • Please don't delete the answer. Even if someone with better expertise in this area answers it, yours can still provide some perspective. Commented Feb 7, 2024 at 2:17
