“A bit more Deferred” - CryEngine 3
  Martin Mittring, Lead Graphics Programmer
  Triangle Game Conference 2009
Crytek
  • Main office: Frankfurt, Germany
  • More studios: Kiev, Budapest, Sofia, Nottingham, Seoul
  • English as company language
  • 30+ nationalities
  • CryEngine 1: PC only (Far Cry …)
  • CryEngine 2: PC only (Crysis …)
  • CryEngine 3: PC, XBox360, PS3 (announced GDC09)
  • 32/64 Bit, WinXP/Vista, DX9/10, Multi CPU/GPU
  • WYSIWYP
  • ResourceCompiler: Source asset -> Platform specific
  • Direct Light: Shadow mapping
  • Indirect Light (AO): SSAO / RAM / ...
  • No precomputed lighting: Production time saver, Memory, Consistency, Dynamic content
  • Übershaders [Mittring07]
Goals after CryEngine 2
  • PS3 / XBox360: GPU, CPU, memory
  • Improve streaming
  • Improve multithreading
  • Improve lighting
  • More predictable performance
  • Tackle the shader combination issue
What is the shader combination issue?
  • Übershader is one shader with many features (e.g. 0..4 lights, light types, CM reflection, fog, detail texture, normalmap, specular texture)
  • Compiling all possible permutations is a memory, production and performance problem
  • Usual solutions: dynamic branching / separating into multiple passes / reducing combinations and accepting less functionality and less performance
  • Asynchronous shader compiling
  • Distributed Job System to compile the shader cache
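To make the size of the problem concrete, here is a minimal sketch that counts übershader permutations. The feature axes follow the examples on the slide, but the option counts per axis (and the names `FEATURE_OPTIONS` and `permutation_count`) are illustrative assumptions, not figures from the talk.

```python
from math import prod

# Hypothetical feature axes; the number of options per axis is assumed
# for illustration only and is not taken from CryEngine.
FEATURE_OPTIONS = {
    "light_count": 5,       # 0..4 lights
    "cm_reflection": 2,     # off / on
    "fog": 2,
    "detail_texture": 2,
    "normalmap": 2,
    "specular_texture": 2,
}

def permutation_count(options):
    """Number of distinct shaders if every combination is compiled."""
    return prod(options.values())

if __name__ == "__main__":
    print(permutation_count(FEATURE_OPTIONS))
    # Every additional on/off feature doubles the count.
    print(permutation_count({**FEATURE_OPTIONS, "extra_toggle": 2}))
```

Under these assumptions the count grows multiplicatively with each feature, which is why the slide lists it as a memory, production and performance problem.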
Why Deferred Rendering?
  • Rendering is a multi-dimensional query: View x Geometry x Material x Light
  • Classic Forward Rendering:
      • for each light: render geometry from scene query with shader
  • Classic Deferred Rendering:
      • render geometry from scene query outputting GBuffer
      • render each light from scene query and shade with GBuffer
  • => Decouples geometric complexity from lighting and shading
  • => Helps on shader combination issue and predictable performance
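The decoupling can be illustrated with a rough draw-count model. This is a simplified sketch, assuming every light touches every object and ignoring culling and multi-light shaders; the function names are hypothetical.

```python
def forward_draws(num_objects, num_lights):
    """Classic forward: the geometry is rendered again for each light."""
    return num_objects * num_lights

def deferred_draws(num_objects, num_lights):
    """Classic deferred: one GBuffer pass over the geometry,
    then one screen-space pass per light."""
    return num_objects + num_lights

if __name__ == "__main__":
    for lights in (1, 4, 16):
        print(lights, forward_draws(1000, lights), deferred_draws(1000, lights))
```

In this model the forward cost is a product of geometry and lights while the deferred cost is a sum, which is the decoupling the slide refers to.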
GBuffer in CryEngine 2
  • Minimal GBuffer (depth)
  • Slower Early Z pass when outputting linear depth
  • Formats: R16, R16G16, R32
  • Proved to be very useful
  • [Image: Z]
Deferred in CryEngine 2
  • Main use: Deferred shadows, Per pixel fog
  • Additionally: Soft Z clipped Particles, Motion Blur, Beach/Ocean, EdgeAA, Sun Rays, SSAO, Fake lights, 2.5D TerrainAO
  • [Images: SSAO, ShadowMask]
Deferred Lighting in CryEngine 3
  • Passes:
      1) Forward GBuffer generation
      2) Deferred light accumulation into texture (Phong)
      3) Forward shading with light accumulation texture
      => No deferred shading
  • Deferred Lighting*
      + Multiple light primitives are possible
      + even Image Based Lighting (IBL)
      + easy to extend
  • Compared to Deferred Shading
      + Less bandwidth and memory problems (10MB EDRAM XBox360)
      + More flexibility on shading (besides Phong)
  • [Images: Z (native), Normal, Specular Power]
  • * [Geldreich09] aka Light Pre-Pass Renderer [Engel08]
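The three passes can be sketched for a single pixel on the CPU. This is a minimal illustration under stated assumptions, not CryEngine code: the dictionary layout, the function names and the use of the plain Phong reflection term are all hypothetical choices made for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)

def gbuffer_pass(surface):
    """Pass 1 (forward): store only what the light pass needs."""
    return {"normal": normalize(surface["normal"]),
            "spec_power": surface["spec_power"]}

def light_pass(gbuf, lights, view_dir):
    """Pass 2 (deferred): accumulate diffuse and specular light (Phong)."""
    diffuse, specular = [0.0] * 3, [0.0] * 3
    n = gbuf["normal"]
    for light in lights:
        l = normalize(light["dir_to_light"])
        n_dot_l = max(dot(n, l), 0.0)
        # Phong reflection vector: r = 2 (n.l) n - l
        r = tuple(2.0 * dot(n, l) * nc - lc for nc, lc in zip(n, l))
        spec = max(dot(r, view_dir), 0.0) ** gbuf["spec_power"] if n_dot_l > 0.0 else 0.0
        for c in range(3):
            diffuse[c] += light["color"][c] * n_dot_l
            specular[c] += light["color"][c] * spec
    return diffuse, specular

def shading_pass(surface, diffuse, specular):
    """Pass 3 (forward): combine material colours with accumulated light."""
    return tuple(surface["albedo"][c] * diffuse[c] +
                 surface["spec_color"][c] * specular[c] for c in range(3))
```

Because material colours only enter in the third pass, the GBuffer in this sketch carries no albedo, which is consistent with the lower bandwidth and memory cost the slide attributes to deferred lighting.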
Options for the light accumulation texture
  • 6 channels: Diffuse and Specular
      • two 7e3 7e3 7e3, A16R16G16B16f or A8R8G8B8*
  • 4 channels: Diffuse and Specular strength
      • a single A16R16G16B16f or A8R8G8B8* (specular approximated by diffuse*strength)
  • The following pictures show lighting with two differently coloured lights: 6 channels (correct), 4 channels (fast)
  • * sRGB helps to distribute more details in dark areas
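The difference between the two layouts can be sketched numerically. The slide only states that specular is "approximated by diffuse*strength"; the exact reconstruction below (diffuse chromaticity scaled by an accumulated scalar strength, with Rec. 709 luminance weights) is an assumption made for this example, as are all function names. Each light is given as a `(color, n_dot_l, spec)` tuple with the scalar lighting terms already evaluated.

```python
REC709 = (0.2126, 0.7152, 0.0722)  # luminance weights (assumed choice)

def luminance(rgb):
    return sum(w * c for w, c in zip(REC709, rgb))

def accumulate_6ch(lights):
    """6 channels: full diffuse RGB and specular RGB."""
    diffuse, specular = [0.0] * 3, [0.0] * 3
    for color, n_dot_l, spec in lights:
        for i in range(3):
            diffuse[i] += color[i] * n_dot_l
            specular[i] += color[i] * spec
    return diffuse, specular

def accumulate_4ch(lights):
    """4 channels: diffuse RGB plus one scalar specular strength."""
    diffuse, strength = [0.0] * 3, 0.0
    for color, n_dot_l, spec in lights:
        for i in range(3):
            diffuse[i] += color[i] * n_dot_l
        strength += luminance(color) * spec
    return diffuse, strength

def specular_from_4ch(diffuse, strength):
    """Approximate specular RGB: diffuse chromaticity times strength."""
    lum = luminance(diffuse)
    if lum == 0.0:
        return [0.0] * 3
    return [c / lum * strength for c in diffuse]
```

With a single light this reconstruction reproduces the 6-channel specular colour. With two differently coloured lights the specular highlight takes on the colour mix of the diffuse term, which is the kind of error the "correct" versus "fast" comparison on the slide refers to.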
Light accumulation texture in IBL
  • The following pictures show lighting with Diffuse and Specular Cubemaps:
      • High Quality (left): Diffuse RGB, Specular RGB
      • Fast Rendering (right): Diffuse RGB, Specular Strength
  • Difference often negligible (depends on environment)
Storing normals in the GBuffer
  • XYZ world space
      • 8 bit: problematic with extreme reflections/specular
      • 10 bit: good, but what about specular power and PS3
  • Solving Quantization Artefacts
      • Detail Normalmaps, Noise, Dither
  • XY view space (Z reconstruct)
      • 8/10/16 bit, negate Z bit (perspective and normal mapping) [Lee09] [Lob09]
      • => Problematic
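The 8 bit versus 10 bit trade-off can be estimated with a small experiment. This sketch assumes the common `n * 0.5 + 0.5` mapping into an unsigned normalized channel and measures the worst angular error over sample directions on a Fibonacci sphere; both choices, and all function names, are assumptions for illustration.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def quantize_normal(n, bits):
    """Store each component in an unsigned channel with `bits` bits,
    then decode and renormalize."""
    levels = (1 << bits) - 1
    stored = [round((c * 0.5 + 0.5) * levels) for c in n]
    return normalize([s / levels * 2.0 - 1.0 for s in stored])

def angle_deg(a, b):
    d = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.degrees(math.acos(d))

def fibonacci_sphere(count):
    """Roughly uniform unit vectors, generated deterministically."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(count):
        z = 1.0 - 2.0 * (i + 0.5) / count
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(i * golden_angle),
                       r * math.sin(i * golden_angle), z))
    return points

def max_error_deg(bits, count=500):
    return max(angle_deg(n, quantize_normal(n, bits))
               for n in fibonacci_sphere(count))

if __name__ == "__main__":
    print("8 bit :", max_error_deg(8))
    print("10 bit:", max_error_deg(10))
```

Errors of a fraction of a degree are invisible in diffuse lighting but can show up as banding in sharp reflections and high specular powers, which matches the slide's remark about 8 bit storage.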
Alternative: VS Normal in 2 scalars
  • -1..1 => -1..1
  • Normal to GBuffer:
      G = normalize(N.xy) * sqrt(N.z*0.5 + 0.5)
  • GBuffer to Normal:
      N.z = length2(G.xy)*2 - 1
      N.xy = normalize(G.xy) * sqrt(1 - N.z*N.z)
  • Pros:
      + more precision where it matters (bright part)
      + framebuffer blending friendly
      + no z reconstruction issues
  • Cons:
      • wasted area
      • more ALU than WS
  • => still, WS normals are faster
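The two formulas can be checked with a direct transcription. This is a sketch: the function names are hypothetical, and the handling of normals with `N.xy == 0` (where `normalize` is undefined) is an assumption added for the example, since the slide does not cover that case.

```python
import math

def encode_normal(n):
    """G = normalize(N.xy) * sqrt(N.z * 0.5 + 0.5)"""
    x, y, z = n
    len_xy = math.hypot(x, y)
    if len_xy == 0.0:
        # Direction is undefined for N = (0, 0, +-1); pick +x (assumption).
        dx, dy = 1.0, 0.0
    else:
        dx, dy = x / len_xy, y / len_xy
    s = math.sqrt(max(z * 0.5 + 0.5, 0.0))
    return (dx * s, dy * s)

def decode_normal(g):
    """N.z = length2(G.xy) * 2 - 1;  N.xy = normalize(G.xy) * sqrt(1 - N.z^2)"""
    gx, gy = g
    len2 = gx * gx + gy * gy
    z = len2 * 2.0 - 1.0
    len_g = math.sqrt(len2)
    if len_g == 0.0:
        return (0.0, 0.0, z)  # only reached for N.z == -1
    s = math.sqrt(max(1.0 - z * z, 0.0)) / len_g
    return (gx * s, gy * s, z)
```

Since the squared length of `G` equals `N.z*0.5 + 0.5`, the decode recovers `N.z` without a sign ambiguity, which is the "no z reconstruction issues" point. The encoded values fill a unit disc inside the -1..1 square, so the corners are unused ("wasted area").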
Improved SSAO (with normals) Video
Light rasterization in 2D (Rectangle) or 3D (Convex Object)
  • 2D
      + cheap WS position reconstruction (Interpolator+MAD)
      + Combining multiple lights
      • Stencil prepass (if not fullscreen)
      • Coarse blocks can be rejected based on z min/max
  • 3D
      + Z buffer
      + tighter bounding object (fewer pixels to process)
      • Depth bounds test (only on some HW)
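One common reading of "Interpolator+MAD" is that a world-space ray to the far plane is interpolated across the screen rectangle and then scaled by linear depth and offset by the camera position in a single multiply-add. The sketch below assumes that interpretation; the corner-ray layout, the 0..1 linear depth convention and the function names are hypothetical.

```python
def lerp(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def interpolate_ray(corners, u, v):
    """Bilinear interpolation of the four far-plane corner rays,
    standing in for the hardware interpolator.
    corners = (top_left, top_right, bottom_left, bottom_right)."""
    top_left, top_right, bottom_left, bottom_right = corners
    top = lerp(top_left, top_right, u)
    bottom = lerp(bottom_left, bottom_right, u)
    return lerp(top, bottom, v)

def reconstruct_world_pos(camera_pos, ray, linear_depth):
    """One multiply-add per component: pos = camera + ray * depth,
    with linear_depth in 0..1 (0 at the camera, 1 at the far plane)."""
    return tuple(c + r * linear_depth for c, r in zip(camera_pos, ray))
```

This only works cheaply when the light is drawn as a screen-space rectangle, because the ray is then linear in screen coordinates; for a 3D convex light volume the ray generally has to be derived per pixel.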
Deferred Light Types 1/3:
  • Directional light
      • optional with cloud shadows, multiple shadowmaps
  • Point/Projector lights
      • optional with projector texture
  • Procedural Caustics (before this was multi-pass, one drawcall for each object under water including terrain)
  • Interleaved Shadowmap lookups
      • no extra memory
      • less bandwidth needed
      • no limits on shadow mask channel count
Deferred Light Types 2/3: Image Based Lighting (IBL)
  • Light Probes are the high quality solution for distant light
  • Cubemaps allow efficient HDR lighting in real-time
  • Diffuse CM can be computed from specular CM
  • Mip adjusted lookup allows different specular power values
  • Improves shading in ambient lighting condition by adding normal dependent and specular lighting
  • Light Probes can be generated at specified level positions
  • Deferred Lighting allows blending of localized Light Probes
  • Looks even better with SSAO
Ambient without SSAO with hemispherical lighting
[Images]
  • Bright ambient, SSAO
  • Black ambient, Shadow casting light source, SSAO
  • Grey ambient (hemispherical), Shadow casting light source, SSAO
  • IBL ambient (Specular and Diffuse), Shadow casting light source, SSAO
[Video] IBL ambient (Specular and Diffuse), SSAO
  • * brightened up for better display
Deferred Light Types 3/3: Real-time Dynamic Global Illumination
  • Details will be presented at the upcoming Siggraph 2009 by Anton Kaplanyan, who developed it at Crytek
  • Implemented and fast on XBox360, PS3 and PC
  • No precomputation
  • Fully dynamic (geometry, materials and lights)
  • Unified for static and dynamic objects
[Images: Global Illumination off / Global Illumination on]
  • black ambient (to emphasize where GI affects the image)
  • color bleeding
  • bump without light
  • fully dynamic
  • real-time
Global Illumination * brightened up for better display
Something missing?
  • Transparency => falling back to well known techniques:
      • Per pixel global fog and fog volumes (deferred)
      • Back to front sorted alpha transparent objects
      • Volume texture clouds, Imposter clouds, Distance clouds
      • Particle systems avoiding per particle sorting
  • Anti-aliasing => Nasty but possible
      • EdgeAA, …
      • ... we are working on it
References
  • [Mittring07] “Finding Next Gen CryEngine2”, Siggraph 2007, Martin Mittring. http://ati.amd.com/developer/gdc/2007/mittring-finding_nextgen_cryengine2(siggraph07).pdf
  • [Engel08] “The Light Pre-Pass Renderer”, ShaderX7, Wolfgang Engel. http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html
  • [Lee09a] “Prelighting”, Mark Lee. http://www.insomniacgames.com/tech/articles/0209/files/prelighting.pdf
  • [Lee09b] “Pre-lighting in Resistance 2”, GDC 2009, Mark Lee. http://www.gdconf.com/conference/Tutorial%20Handouts/200_insomniac/gdc09_insomniac_prelighting.pdf
  • [Geldreich09] “Deferred Lighting and Shading”, GDC 2009, Rich Geldreich, Matt Pritchard, John Brooks. http://archive.gdconf.com/gdc_2004/pritchard_matt.ppt
  • [Shish05] “Deferred Shading in S.T.A.L.K.E.R.”, GPU Gems 2, Oles Shishkovtsov. http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html
  • [Lob09] “S.T.A.L.K.E.R.: Clear Sky – a showcase for Direct3D 10.0/1”, GDC 2009, Igor A. Lobanchikov, Holger Gruen. http://www.gdconf.com/conference/Tutorial%20Handouts/100_Advanced%20Visual%20Effects%20with%20Direct3D/100_Handout%202.pdf
  • [Valient07] “Deferred Rendering in Killzone 2”, Develop Conference 2007, Michal Valient. http://www.guerrilla-games.com/publications/dr_kz2_rsx_dev07.pdf
Slides should be available soon at http://www.crytek.com/technology/presentations
Special thanks to all the passionate people at Crytek
 
 


Editor's Notes

  • #2 “A bit more Deferred”: “Deferred” in the sense of using the so-called “deferred rendering” techniques