Fix a deadlock in CanvasRenderRD#120071
Conversation
RenderingServerDefault::_draw wants to get all of the materials. The threaded
loader is still loading the materials:
1) The main thread locks canvas_singleton->shader.mutex via
RendererRD::Utilities::update_dirty_resources
-> RendererRD::MaterialStorage::_update_queued_materials
-> RendererCanvasRenderRD::CanvasMaterialData::update_parameters()
2) RendererCanvasRenderRD::CanvasMaterialData::update_parameters() wants
to get the current shader via ShaderRD::version_get_shader. This
eventually wants to wait on the load of that shader via
WorkerThreadPool::wait_for_group_task_completion in
ShaderRD::_compile_version_end
At this point the main thread is waiting for the threaded load to finish
loading the shader. Meanwhile in a thread not too far away the load is
progressing:
1) The loader is loading the tres in ResourceLoaderText::load which
eventually wants to ShaderMaterial::set_shader
2) We eventually end up in RendererCanvasRenderRD::CanvasShaderData::set_code
3) set_code wants to lock canvas_singleton->shader.mutex to protect
against concurrent compilation
At this point the main thread is waiting on the load to complete, but
holds "canvas_singleton->shader.mutex" via update_parameters(), and the
loader is waiting on "canvas_singleton->shader.mutex" in set_code.
Tragedy ensues etc.
To fix the problem we don't acquire the lock until after we're sure
we're done compiling the shader.
The same pattern exists in scene_shader_forward_clustered.cpp which
was created in the same commit as the code in the canvas singleton.
With this change the deadlock can no longer occur as set_code() can now run
concurrently with RenderingServerDefault::_draw() as the path where the
main thread waits on the workerthreadpool with
canvas_singleton->shader.mutex held can no longer occur.
Data integrity wise: previously you'd think that
canvas_singleton->shader.mutex would have protected against metadata
changes to uniforms, ubo_offsets, and texture_uniforms but set_code() used
to write to some of these (uniforms, ubo_size) without taking the lock with
no ill effects because the material isn't updated until the shader has loaded.
A similar pattern was found in scene_shader_forward_mobile.cpp and was
fixed in the same way.
01658c0 to
52f2737
Compare
That PR didn't fix the particular issue this PR fixes. It looks like forward+ didn't have this particular deadlock. But there's some things in there that I did consider but rejected because of how close we are to release, if you want I can take the lock scope reduction from this PR into this one for canvasrenderrd also? Or I can make a separate PR. But then this maybe? Isn't a pure bugfix pr any longer. |
|
Further lock scope reduction might be interesting but I'd keep them for a separate PR. This one can be merged as is for 4.7 RC 1 today. |
There are several opportunities, the lock probably doesn't need to be held past compilation, the lock probably doesn't need to be held for performance counters either. There's a couple more small things like that. But yeah, maybe not during RC :) |
|
Thanks! |
RenderingServerDefault::_draw wants()to get all of the materials. The threaded loader is still loading the materials:canvas_singleton->shader.mutexviaShaderRD::version_get_shader(). This eventually wants to wait on the load of that shader viaWorkerThreadPool::wait_for_group_task_completion()inShaderRD::_compile_version_end()At this point the main thread is waiting for the threaded load to finish loading the shader. Meanwhile in a thread not too far away the load is progressing:
ResourceLoaderText::load()which eventually wants toShaderMaterial::set_shader()RendererCanvasRenderRD::CanvasShaderData::set_code()canvas_singleton->shader.mutexto protect against concurrent compilationAt this point the main thread is waiting on the load to complete, but holds
canvas_singleton->shader.mutexviaupdate_parameters(), and the loader is waiting oncanvas_singleton->shader.mutexinset_code().Tragedy ensues etc.
To fix the problem we don't acquire the lock until after we're sure we're done compiling the shader.
The same pattern exists in
scene_shader_forward_clustered.cppwhich was created in the same commit as the code in the canvas singleton.With this change the deadlock can no longer occur as
set_code()can now run concurrently withRenderingServerDefault::_draw()as the path where the main thread waits on the WorkerThreadPool withcanvas_singleton->shader.mutexheld can no longer occur.Data integrity wise: previously you'd think that
canvas_singleton->shader.mutexwould have protected against metadata changes to uniforms, ubo_offsets, and texture_uniforms butset_code()used to write to some of these (uniforms, ubo_size) without taking the lock with no ill effects because the material isn't updated until the shader has loaded.A similar pattern was found in
scene_shader_forward_mobile.cppand was fixed in the same way.