Skip to content

Fix a deadlock in CanvasRenderRD#120071

Merged
Repiteo merged 1 commit into
godotengine:masterfrom
hpvb:fix-canvasrenderrd-deadlock
Jun 8, 2026
Merged

Fix a deadlock in CanvasRenderRD#120071
Repiteo merged 1 commit into
godotengine:masterfrom
hpvb:fix-canvasrenderrd-deadlock

Conversation

@hpvb

@hpvb hpvb commented Jun 7, 2026

Copy link
Copy Markdown
Member

RenderingServerDefault::_draw wants() to get all of the materials. The threaded loader is still loading the materials:

  1. The main thread locks canvas_singleton->shader.mutex via
RendererRD::Utilities::update_dirty_resources()
  -> RendererRD::MaterialStorage::_update_queued_materials
    -> RendererCanvasRenderRD::CanvasMaterialData::update_parameters()
  1. RendererCanvasRenderRD::CanvasMaterialData::update_parameters() wants to get the current shader via ShaderRD::version_get_shader(). This eventually wants to wait on the load of that shader via WorkerThreadPool::wait_for_group_task_completion() in ShaderRD::_compile_version_end()

At this point the main thread is waiting for the threaded load to finish loading the shader. Meanwhile in a thread not too far away the load is progressing:

  1. The loader is loading the tres in ResourceLoaderText::load() which eventually wants to ShaderMaterial::set_shader()
  2. We eventually end up in RendererCanvasRenderRD::CanvasShaderData::set_code()
  3. set_code wants to lock canvas_singleton->shader.mutex to protect against concurrent compilation

At this point the main thread is waiting on the load to complete, but holds canvas_singleton->shader.mutex via update_parameters(), and the loader is waiting on canvas_singleton->shader.mutex in set_code().

Tragedy ensues etc.

To fix the problem we don't acquire the lock until after we're sure we're done compiling the shader.

The same pattern exists in scene_shader_forward_clustered.cpp which was created in the same commit as the code in the canvas singleton.

With this change the deadlock can no longer occur as set_code() can now run concurrently with RenderingServerDefault::_draw() as the path where the main thread waits on the WorkerThreadPool with
canvas_singleton->shader.mutex held can no longer occur.

Data integrity wise: previously you'd think that canvas_singleton->shader.mutex would have protected against metadata changes to uniforms, ubo_offsets, and texture_uniforms but set_code() used to write to some of these (uniforms, ubo_size) without taking the lock with no ill effects because the material isn't updated until the shader has loaded.

A similar pattern was found in scene_shader_forward_mobile.cpp and was fixed in the same way.

@hpvb hpvb added this to the 4.7 milestone Jun 7, 2026
@hpvb hpvb requested a review from a team as a code owner June 7, 2026 21:56
RenderingServerDefault::_draw wants to get all of the materials. The threaded
loader is still loading the materials:

1) The main thread locks canvas_singleton->shader.mutex via
   RendererRD::Utilities::update_dirty_resources
    -> RendererRD::MaterialStorage::_update_queued_materials
      -> RendererCanvasRenderRD::CanvasMaterialData::update_parameters()
2) RendererCanvasRenderRD::CanvasMaterialData::update_parameters() wants
   to get the current shader via ShaderRD::version_get_shader. This
   eventually wants to wait on the load of that shader via
   WorkerThreadPool::wait_for_group_task_completion in
   ShaderRD::_compile_version_end

At this point the main thread is waiting for the threaded load to finish
loading the shader. Meanwhile in a thread not too far away the load is
progressing:

1) The loader is loading the tres in ResourceLoaderText::load which
   eventually wants to ShaderMaterial::set_shader
2) We eventually end up in RendererCanvasRenderRD::CanvasShaderData::set_code
3) set_code wants to lock canvas_singleton->shader.mutex to protect
   against concurrent compilation

At this point the main thread is waiting on the load to complete, but
holds "canvas_singleton->shader.mutex" via update_parameters(), and the
loader is waiting on "canvas_singleton->shader.mutex" in set_code.

Tragedy ensues etc.

To fix the problem we don't acquire the lock until after we're sure
we're done compiling the shader.

The same pattern exists in scene_shader_forward_clustered.cpp which
was created in the same commit as the code in the canvas singleton.

With this change the deadlock can no longer occur as set_code() can now run
concurrently with RenderingServerDefault::_draw() as the path where the
main thread waits on the workerthreadpool with
canvas_singleton->shader.mutex held can no longer occur.

Data integrity wise: previously you'd think that
canvas_singleton->shader.mutex would have protected against metadata
changes to uniforms, ubo_offsets, and texture_uniforms but set_code() used
to write to some of these (uniforms, ubo_size) without taking the lock with
no ill effects because the material isn't updated until the shader has loaded.

A similar pattern was found in scene_shader_forward_mobile.cpp and was
fixed in the same way.

@clayjohn clayjohn left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this issue was fixed for the Forward+ backend in #105138. I wonder if some of the other changes should also be copied over?

CC @stuartcarnie

At any rate, the code in this PR looks good to me

@hpvb

hpvb commented Jun 8, 2026

Copy link
Copy Markdown
Member Author

Looks like this issue was fixed for the Forward+ backend in #105138

That PR didn't fix the particular issue this PR fixes. It looks like forward+ didn't have this particular deadlock.

But there's some things in there that I did consider but rejected because of how close we are to release, if you want I can take the lock scope reduction from this PR into this one for canvasrenderrd also?

Or I can make a separate PR. But then this maybe? Isn't a pure bugfix pr any longer.

@akien-mga akien-mga moved this from Unassessed to Release Blocker in 4.x Release Priorities & Blockers Jun 8, 2026
@akien-mga

Copy link
Copy Markdown
Member

Further lock scope reduction might be interesting but I'd keep them for a separate PR. This one can be merged as is for 4.7 RC 1 today.

@hpvb

hpvb commented Jun 8, 2026

Copy link
Copy Markdown
Member Author

Further lock scope reduction might be interesting but I'd keep them for a separate PR. This one can be merged as is for 4.7 RC 1 today.

There are several opportunities, the lock probably doesn't need to be held past compilation, the lock probably doesn't need to be held for performance counters either. There's a couple more small things like that. But yeah, maybe not during RC :)

@Repiteo Repiteo merged commit 279e7ca into godotengine:master Jun 8, 2026
20 checks passed
@Repiteo

Repiteo commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment