Draft: Work in progress. Wording, structure, and claims may still change. Feedback welcome.

04 foundation

The Frame Composes Itself

The lighting pass and the render graph. Reading from the G-Buffer, composing passes as pure data, and an immutable representation of the frame.


A render graph does not execute passes. It describes what each pass reads and writes, and a pure function figures out the rest.


Picking Up Where the Light Went Out

The last post ended in the dark. We built a G-Buffer — three textures holding everything a lighting algorithm needs: world position, surface normal, and albedo. The geometry pass writes them. Every visible pixel has a location, a direction, and a color. The data is complete.

But the screen shows nothing. The G-Buffer is a return value that no caller reads: no lighting pass samples it, no tonemapper maps it to the display, and no system connects the passes that would make the frame work. We have decomposition without composition.

This post fixes that. We will build the lighting pass, add a tonemap pass, and face the real challenge: composing multiple passes into a correct, synchronized execution order. The solution turns out to be more interesting than any individual pass — because it is the render graph, and it is built entirely from pure, immutable data.


Reading the G-Buffer: A Fullscreen Shader With Three Inputs

The lighting pass does not draw geometry. There are no vertex buffers, no index buffers, no mesh data. Instead, it draws a single fullscreen triangle (one oversized triangle whose vertices extend past the screen edges, so its visible part covers every pixel) and, for every pixel, reads the G-Buffer to reconstruct what the geometry pass wrote.
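A common way to build that triangle is to bind no vertex buffer at all and derive each vertex from its index in the vertex shader. The Python sketch below reproduces that arithmetic on the CPU; the `(index << 1) & 2` idiom is a widespread convention assumed here, not necessarily what this engine uses, and `fullscreen_triangle_vertex` and `covers` are illustrative names.

```python
def fullscreen_triangle_vertex(index):
    """Clip-space position and UV for vertex `index` (0, 1 or 2), computed
    the way a vertex shader would from gl_VertexIndex, with no vertex buffer."""
    u = float((index << 1) & 2)          # 0, 2, 0
    v = float(index & 2)                 # 0, 0, 2
    x, y = u * 2.0 - 1.0, v * 2.0 - 1.0  # (-1,-1), (3,-1), (-1,3)
    return (x, y), (u, v)


def covers(point, tri):
    """True if `point` lies inside or on the edge of triangle `tri`,
    using the signs of the three edge functions."""
    def edge(a, b, p):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    a, b, c = tri
    d = (edge(a, b, point), edge(b, c, point), edge(c, a, point))
    has_neg = any(x < 0 for x in d)
    has_pos = any(x > 0 for x in d)
    return not (has_neg and has_pos)


# The three clip-space corners of the triangle.
tri = [fullscreen_triangle_vertex(i)[0] for i in range(3)]
```

Only the part of the triangle inside the [-1, 1] square survives clipping, and because UV = (position + 1) / 2, the UVs run from 0 to 1 across exactly the visible area.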

The fragment shader samples three textures:

// Lighting fragment shader (pseudocode)
input:  uv (screen coordinates)
uniform: gPosition (sampler2D), gNormal (sampler2D), gAlbedo (sampler2D)
uniform: cameraPos, lightPos, lightColor, lightIntensity   // delivered as push constants

main:
    position = sample(gPosition, uv).xyz
    normal   = normalize(sample(gNormal, uv).xyz)   // renormalize after storage/filtering
    albedo   = sample(gAlbedo, uv).rgb
    
    lightDir = normalize(lightPos - position)
    viewDir  = normalize(cameraPos - position)
    halfDir  = normalize(lightDir + viewDir)
    
    // Lambertian diffuse
    diffuse = max(dot(normal, lightDir), 0.0)
    
    // Blinn-Phong specular
    specular = pow(max(dot(normal, halfDir), 0.0), 32.0)
    
    // Attenuation
    dist  = length(lightPos - position)
    atten = lightIntensity / (dist * dist)
    
    hdrColor = (albedo * diffuse + specular) * lightColor * atten
    output = vec4(hdrColor, 1.0)

Notice what this shader does not know. It has no idea how many triangles were drawn, what mesh they came from, or whether the geometry was static or animated. It operates entirely on the G-Buffer’s data — positions, normals, albedo values arranged in a 2D grid. If you replaced the geometry pass with a ray-traced G-Buffer, or a neural network that predicted surface properties from a photograph, this lighting shader would work unchanged. That is the power of the interface boundary: the G-Buffer is a contract between passes, and either side can be rewritten independently.
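Because the shader only ever sees per-pixel values, its math can be written as an ordinary function and checked on the CPU with no mesh in sight. A minimal Python transcription of the lighting pseudocode, using tuples as vectors; the helper names are illustrative, and it renormalizes the sampled normal, a common precaution when normals are stored at reduced precision.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def mul(a, b): return tuple(x * y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)

def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))

def shade(position, normal, albedo, camera_pos, light_pos, light_color, intensity):
    """Blinn-Phong point light evaluated from one G-Buffer texel.
    Returns an HDR RGB tuple; no triangle or mesh data is involved."""
    normal = normalize(normal)
    to_light = sub(light_pos, position)
    light_dir = normalize(to_light)
    view_dir = normalize(sub(camera_pos, position))
    half_dir = normalize(add(light_dir, view_dir))

    diffuse = max(dot(normal, light_dir), 0.0)            # Lambertian
    specular = max(dot(normal, half_dir), 0.0) ** 32.0    # Blinn-Phong
    atten = intensity / dot(to_light, to_light)           # intensity / dist^2

    lit = add(scale(albedo, diffuse), (specular,) * 3)
    return scale(mul(lit, light_color), atten)
```

With both the light and the camera straight above a surface point, the light 2 units away with intensity 8, and albedo 0.5, each channel comes out at (0.5 + 1.0) × 8 / 2² = 3.0, already well beyond what a display can show.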

The light parameters (camera position, light position, light color, and intensity) arrive as push constants: a small block of data recorded into the command buffer just before the draw call, with no buffer or descriptor set required. The lighting pass is parameterized: you can move the light, change its color, or adjust its intensity without recompiling anything. The shader, the geometry pass, and the light configuration are three independent concerns.
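Push constants have a fixed binary layout that the host must match byte for byte. Assuming the shader declares the block as `layout(push_constant) uniform Light { vec3 cameraPos; vec3 lightPos; vec3 lightColor; float lightIntensity; }` (a hypothetical declaration; the post does not show one), std430 rules start each vec3 on a 16-byte boundary and let the trailing float fill the last vec3's padding. That gives 48 bytes, well under the 128 bytes Vulkan guarantees for push constants. A Python sketch of the host-side packing, with a hypothetical function name:

```python
import struct

# Little-endian; '3f' is a vec3, '4x' is 4 bytes of padding, 'f' is a float.
# Offsets: cameraPos 0, lightPos 16, lightColor 32, lightIntensity 44.
LIGHT_PC_FORMAT = "<3f4x3f4x3ff"

def pack_light_push_constants(camera_pos, light_pos, light_color, intensity):
    """Return the raw bytes the host would hand to vkCmdPushConstants."""
    return struct.pack(LIGHT_PC_FORMAT,
                       *camera_pos, *light_pos, *light_color, intensity)
```

If the layout on either side drifts, the shader silently reads the wrong bytes, so keeping the format string next to the shader declaration is worth the small duplication.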

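The output is an HDR image (high dynamic range). Its color values can exceed 1.0 because real light intensities do: a bright specular highlight might produce 5.0 or higher. That is physically meaningful, but it has two consequences. The HDR target needs a floating-point format such as VK_FORMAT_R16G16B16A16_SFLOAT, because an 8-bit UNORM target would clamp everything to 1.0. And a monitor cannot display those values directly. We need one more transformation.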


From HDR to Screen: The Tonemap Pass

The tonemap pass is the frame’s final transformation: it takes the unbounded HDR values from the lighting pass and maps them into the [0, 1] range that a display can show.

// Tonemap fragment shader (pseudocode)
input:  uv
uniform: hdrInput (sampler2D)

main:
    hdr = sample(hdrInput, uv).rgb
    
    // Reinhard tonemapping — compress HDR to [0,1]
    mapped = hdr / (hdr + 1.0)
    
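    // Gamma correction: approximate linear → sRGB with a 2.2 power curve.
    // Skip this if the swapchain format is *_SRGB, which already encodes on write.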
    corrected = pow(mapped, 1.0 / 2.2)
    
    output = vec4(corrected, 1.0)

Reinhard tonemapping is not the best operator — it desaturates highlights and has a soft rolloff that some find too flat. ACES, Uncharted 2’s filmic curve, and AgX are all better for different aesthetic goals. But Reinhard is one line of code and it works. We can swap it later.
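Both steps are simple enough to check on the CPU. A Python sketch of the same per-channel math; passing the operator in as a parameter is an illustrative choice that mirrors the "swap it later" point, not the engine's API.

```python
def reinhard(hdr):
    """Reinhard operator: maps [0, inf) into [0, 1)."""
    return hdr / (hdr + 1.0)

def gamma_encode(linear, gamma=2.2):
    """Approximate linear-to-sRGB encoding with a pure power curve."""
    return linear ** (1.0 / gamma)

def tonemap(rgb, operator=reinhard):
    """Tonemap each channel, then gamma-encode it.
    Any function from [0, inf) to [0, 1] can be passed as `operator`."""
    return tuple(gamma_encode(operator(c)) for c in rgb)
```

A highlight at 5.0 maps to 5 / 6 ≈ 0.83 before gamma, and no finite input ever reaches 1.0, which is also why very bright values crowd together just below white.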

And that “swap it later” is exactly why this is a separate pass. Tonemapping is a different concern from lighting. You might want to change the tonemap operator without touching the lighting code. You might want to insert a bloom pass between lighting and tonemapping — read the HDR image, extract bright pixels, blur them, add them back, then tonemap the combined result. If tonemapping were baked into the lighting shader, every one of these changes would mean editing lighting code. As a separate pass, the change is additive: add a pass, adjust the connections, done.

The frame is now a composition of three passes:

Render graph — three passes composing a frame through resource dependencies

Three functions. Each one transforms its inputs into its outputs. None of them knows the others exist. Together, they produce a fully lit, tonemapped image from raw scene geometry. The decomposition from the last post is paying off.

But we have a problem.


Three Passes, Three Problems

The passes work individually. Composition breaks them.

In a single-pass forward pipeline, there is no composition — one shader does everything. The moment you split the frame into multiple passes sharing resources, three problems appear that did not exist before:

Order. The lighting pass must run after the G-Buffer pass. The tonemap pass must run after the lighting pass. With three passes, the order is obvious. With ten passes — some independent, some dependent, some sharing resources in complex patterns — the correct order is no longer trivial. Who decides it?

Synchronization. The G-Buffer textures must be fully written before the lighting pass reads them. On a GPU, execution is asynchronous and pipelined — without explicit pipeline barriers, the lighting shader might sample textures that the geometry pass is still writing to. The result: flickering, corruption, or undefined behavior that only appears under load. Who inserts the barriers?

Layout transitions. A Vulkan image used as a color attachment (being written to) lives in a different memory layout than the same image used as a shader input (being read from). The layout must change between passes, and the transition must happen at the right time. Who tracks which layout each image is in and when it needs to change?

You could solve all three by hand. Hard-code the pass order. Manually insert barriers between each pair of passes. Manually track image layouts and transitions. I did this in the first version of the engine, and it worked.

It does not scale. The moment you add a fourth pass — SSAO, bloom, screen-space reflections — you are editing barrier code, reordering pass calls, and hoping you got the layout transitions right. The complexity grows as O(passes x resources), and every bit of it is mechanical. There is no creative decision in choosing when to transition an image from ColorAttachmentWrite to ShaderRead. It is bookkeeping. And bookkeeping is exactly the kind of work that a machine should do.

What if the passes just declared what they read and write — and something else figured out the rest?


The RenderGraph: Passes as Pure Declarations

The render graph starts with a vocabulary for describing passes as data. No execution. No GPU state. Just descriptions.

ResourceName: a string identifying a logical resource
    "GBuffer.Position", "GBuffer.Normal", "GBuffer.Albedo", "HDR", "Backbuffer"

ResourceUsage: how a pass uses a resource
    ColorAttachmentWrite    — the pass writes to this image as a render target
    ShaderRead              — the pass reads this image as a texture
    DepthStencilWrite       — the pass writes to this image as a depth buffer
    Present                 — this image will be presented to the display

PassInput:   { resource: ResourceName, usage: ResourceUsage }
PassOutput:  { resource: ResourceName, usage: ResourceUsage }

RenderPassDeclaration: { name: string, inputs: PassInput[], outputs: PassOutput[] }
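As a sketch of how this vocabulary might look in a real language, here it is as Python frozen dataclasses. Tuples stand in for the arrays so that declarations stay immutable and hashable, and the enum spellings are Python-style renderings of the names above; both are assumptions, not the engine's code.

```python
from dataclasses import dataclass
from enum import Enum, auto

ResourceName = str   # e.g. "GBuffer.Position", "HDR", "Backbuffer"

class ResourceUsage(Enum):
    COLOR_ATTACHMENT_WRITE = auto()   # written as a render target
    SHADER_READ = auto()              # sampled as a texture
    DEPTH_STENCIL_WRITE = auto()      # written as a depth buffer
    PRESENT = auto()                  # handed to the display

@dataclass(frozen=True)
class PassInput:
    resource: ResourceName
    usage: ResourceUsage

@dataclass(frozen=True)
class PassOutput:
    resource: ResourceName
    usage: ResourceUsage

@dataclass(frozen=True)
class RenderPassDeclaration:
    name: str
    inputs: tuple[PassInput, ...] = ()
    outputs: tuple[PassOutput, ...] = ()

# Example: the Lighting pass from the list that follows.
LIGHTING = RenderPassDeclaration(
    name="Lighting",
    inputs=tuple(PassInput(r, ResourceUsage.SHADER_READ)
                 for r in ("GBuffer.Position", "GBuffer.Normal", "GBuffer.Albedo")),
    outputs=(PassOutput("HDR", ResourceUsage.COLOR_ATTACHMENT_WRITE),),
)
```

Two declarations built from the same data compare equal and hash equally, and assigning to a field raises `FrozenInstanceError`, which is what makes them safe to cache, compare, and test.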

With this vocabulary, the three passes become:

GBuffer:
    inputs:  []
    outputs: [ Position(Write), Normal(Write), Albedo(Write) ]

Lighting:
    inputs:  [ Position(Read), Normal(Read), Albedo(Read) ]
    outputs: [ HDR(Write) ]

Tonemap:
    inputs:  [ HDR(Read) ]
    outputs: [ Backbuffer(Present) ]

Read those declarations. No Vulkan types. No pipeline objects. No command buffers. Just: “I write these resources. I read those resources.” Each declaration is an immutable record — you can create it, inspect it, serialize it, compare it, test it. It describes intent, not action.

Look at how the dependencies emerge. The GBuffer pass writes Position. The Lighting pass reads Position. That’s a data dependency — Lighting depends on GBuffer. No explicit “depends on” annotation is needed. The dependency is implicit in the resource names. The compiler infers the execution order from the data flow, the same way a functional language infers evaluation order from expression dependencies.
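The inference takes only a few lines. In this Python sketch each pass is reduced to the resource names it reads and writes (usages are dropped because only names determine dependencies), and an edge appears whenever one pass reads a name another pass writes. `infer_dependencies` is an illustrative helper, not the engine's code.

```python
# The three passes, reduced to the resource names they read and write.
PASSES = {
    "GBuffer":  {"reads": set(),
                 "writes": {"GBuffer.Position", "GBuffer.Normal", "GBuffer.Albedo"}},
    "Lighting": {"reads": {"GBuffer.Position", "GBuffer.Normal", "GBuffer.Albedo"},
                 "writes": {"HDR"}},
    "Tonemap":  {"reads": {"HDR"}, "writes": {"Backbuffer"}},
}

def infer_dependencies(passes):
    """Return (producer, consumer) edges implied by shared resource names:
    the consumer depends on the producer if it reads something it writes."""
    writer_of = {res: name for name, p in passes.items() for res in p["writes"]}
    return {
        (writer_of[res], name)
        for name, p in passes.items()
        for res in p["reads"]
        if res in writer_of and writer_of[res] != name
    }
```

The three G-Buffer reads collapse into the single edge GBuffer → Lighting, and no pass ever names another pass.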

And look at what the declarations are. Each pass is a function signature written as data. Inputs are arguments. Outputs are return values. The pass name is the function name. The render graph is a set of function signatures that compose through shared resource names.

This is what post 1 meant when it said “rendering is already a data transformation.” The pass declarations make that claim concrete. The frame is not a sequence of imperative commands — it is a set of pure declarations that describe what each pass needs and produces. The execution order, the synchronization, and the layout transitions are all derivable from these declarations.

All we need is something to derive them.


The Compiler: A Pure Function Over Pure Data

The render graph compiler takes an immutable array of pass declarations and returns an immutable array of resolved passes. It is a pure function: same input, same output, every time, with no side effects and no GPU involvement.

It does two things.

Step 1: Topological sort. Build a dependency graph from the resource names. For each pass, find which other passes write the resources it reads, and count those incoming dependencies (its in-degree). Start with the passes that have no unresolved inputs: the GBuffer pass, in our case, since it reads nothing. Process each one, decrement the in-degree of every pass that reads one of its outputs, and queue any pass whose in-degree drops to zero. Repeat until the queue is empty. If passes remain unprocessed, the graph has a cycle, so throw an error. This is Kahn's algorithm.

// Topological sort (pseudocode): Kahn's algorithm
// Assumes each resource has exactly one writer.
function topoSort(passes):
    writerOf  = {}          // resource → which pass writes it
    readersOf = {}          // resource → passes that read it
    inDegree  = {}          // pass → number of unresolved inputs
    
    for each pass in passes:
        for each output in pass.outputs:
            writerOf[output.resource] = pass
    
    for each pass in passes:
        inDegree[pass] = 0
        for each input in pass.inputs:
            if input.resource in writerOf:
                inDegree[pass] += 1
                readersOf[input.resource].add(pass)
    
    queue = [passes where inDegree == 0]
    sorted = []
    
    while queue is not empty:
        pass = queue.dequeue()
        sorted.add(pass)
        
        for each output in pass.outputs:
            for each dependent in readersOf[output.resource]:
                inDegree[dependent] -= 1
                if inDegree[dependent] == 0:
                    queue.enqueue(dependent)
    
    if sorted.length != passes.length:
        throw "Cycle detected in render graph"
    
    return sorted
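The same algorithm as runnable Python: a sketch assuming unique pass names and at most one writer per resource (the `writerOf` map in the pseudocode makes the same assumption). Passes carry bare resource names here, since usages play no part in ordering; the `Pass` type is illustrative.

```python
from collections import deque
from typing import NamedTuple

class Pass(NamedTuple):
    name: str
    inputs: tuple = ()    # resource names this pass reads
    outputs: tuple = ()   # resource names this pass writes

def topo_sort(passes):
    """Kahn's algorithm over resource dependencies."""
    writer_of = {res: p.name for p in passes for res in p.outputs}

    readers_of = {}                       # resource -> passes reading it
    in_degree = {p.name: 0 for p in passes}
    for p in passes:
        for res in p.inputs:
            if res in writer_of:
                in_degree[p.name] += 1
                readers_of.setdefault(res, []).append(p.name)

    by_name = {p.name: p for p in passes}
    queue = deque(p.name for p in passes if in_degree[p.name] == 0)
    ordered = []
    while queue:
        name = queue.popleft()
        ordered.append(by_name[name])
        # Each output this pass produces resolves one input of every reader.
        for res in by_name[name].outputs:
            for reader in readers_of.get(res, []):
                in_degree[reader] -= 1
                if in_degree[reader] == 0:
                    queue.append(reader)

    if len(ordered) != len(passes):
        raise ValueError("Cycle detected in render graph")
    return ordered
```

Feeding the passes in reverse order still yields GBuffer, Lighting, Tonemap, and two passes that read each other's outputs raise the cycle error.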

Step 2: Barrier insertion. Walk the sorted passes in order. For each resource, track its last usage. When a pass uses a resource differently than the previous pass — the GBuffer wrote Position as ColorAttachmentWrite, now Lighting reads it as ShaderRead — emit a barrier descriptor that records the transition.

// Barrier insertion (pseudocode)
function insertBarriers(sortedPasses):
    lastUsage = {}      // resource → most recent ResourceUsage
    resolved = []
    
    for each pass in sortedPasses:
        barriers = []
        
        // Inputs and outputs alike: a first write must leave the Undefined
        // layout, and a write after a read is a hazard too.
        for each use in pass.inputs + pass.outputs:
            prev = lastUsage[use.resource] or Undefined   // Undefined = not yet used
            if prev != use.usage:
                barriers.add(Barrier(use.resource, from: prev, to: use.usage))
            lastUsage[use.resource] = use.usage
        
        resolved.add(ResolvedPass(pass, barriers))
    
    return resolved
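A Python sketch of the barrier walk, assuming usages are plain strings and that a resource's state before its first use in the frame is `Undefined` (an added state, not part of the vocabulary above). It checks outputs as well as inputs, so the first write to an image, and any write after a read, also gets a barrier. The `Pass`, `Barrier`, and `ResolvedPass` types are illustrative.

```python
from typing import NamedTuple

UNDEFINED = "Undefined"   # state of a resource before its first use

class Pass(NamedTuple):
    name: str
    inputs: tuple = ()    # (resource, usage) pairs the pass reads
    outputs: tuple = ()   # (resource, usage) pairs the pass writes

class Barrier(NamedTuple):
    resource: str
    src: str              # usage before the barrier
    dst: str              # usage after the barrier

class ResolvedPass(NamedTuple):
    name: str
    barriers: tuple

def insert_barriers(sorted_passes):
    """Walk passes in execution order and emit a barrier whenever a
    resource's usage differs from its previous use."""
    last_usage = {}
    resolved = []
    for p in sorted_passes:
        barriers = []
        for resource, usage in p.inputs + p.outputs:
            prev = last_usage.get(resource, UNDEFINED)
            if prev != usage:
                barriers.append(Barrier(resource, prev, usage))
            last_usage[resource] = usage
        resolved.append(ResolvedPass(p.name, tuple(barriers)))
    return tuple(resolved)
```

Comparing usages catches layout changes and read/write switches, but not two consecutive writes with the same usage; those would still need a memory barrier that this sketch does not emit.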

That’s it. Two phases, roughly fifty lines of real code, and together they solve the ordering, synchronization, and layout transition problems for any number of passes. Add a pass, rerun the compiler, get the new order and barriers. Remove a pass, same thing. The compiler does not care how many passes exist — it just follows the data.

The testability payoff is the part I find most satisfying. The compiler's unit tests cover single passes, two-pass dependencies, reversed declaration order (to prove the compiler sorts correctly regardless of input order), independent passes, cycle detection, and diamond dependencies. Every test runs without a GPU, without Vulkan, without a window. Just data in, data out, assertions.

// Test: two-pass dependency (pseudocode)
test "lighting depends on gbuffer":
    passes = [
        Declaration("Lighting", inputs: [("Color", ShaderRead)], outputs: []),
        Declaration("GBuffer",  inputs: [],                      outputs: [("Color", Write)])
    ]
    
    resolved = compile(passes)
    
    assert resolved[0].name == "GBuffer"      // GBuffer runs first
    assert resolved[1].name == "Lighting"     // Lighting runs second
    assert resolved[1].barriers contains      // Barrier between them
        Barrier("Color", from: Write, to: ShaderRead)

Notice that the declarations are passed in the wrong order — Lighting before GBuffer. The compiler sorts them correctly anyway. The input order is irrelevant. Only the data dependencies matter.

This is the functional core. It knows nothing about Vulkan, nothing about GPU memory layouts, nothing about pipeline stages. It could compile a render graph for DirectX, Metal, or a software rasterizer. The GPU is irrelevant here. Only the data matters.


The Executor: Where the Side Effects Live

The compiler produces resolved passes — sorted declarations with barrier descriptors attached. To actually render a frame, someone has to turn those descriptors into real GPU commands. That someone is the executor.

The executor takes three inputs: the resolved passes from the compiler, a map from resource names to actual Vulkan images, and a map from pass names to recorder functions (the closures that record draw calls into a command buffer). It walks the resolved passes in order. For each pass, it translates the barrier descriptors into Vulkan pipeline barriers — mapping ColorAttachmentWrite to the correct stage flags, access masks, and image layouts — and then calls the pass’s recorder function to record its draw calls.

// Executor (pseudocode)
function execute(resolvedPasses, resourceImages, recorders, cmd):
    for each resolved in resolvedPasses:
        for each barrier in resolved.barriers:
            image = resourceImages[barrier.resource]
            (srcStage, srcAccess, oldLayout) = mapUsage(barrier.from)
            (dstStage, dstAccess, newLayout) = mapUsage(barrier.to)
            
            insertImageBarrier(cmd, image,
                srcStage, srcAccess, oldLayout,
                dstStage, dstAccess, newLayout)
        
        recorders[resolved.name](cmd)

This is the imperative shell. Every Vulkan API call lives here. The barrier translation is mechanical — a lookup table from ResourceUsage to Vulkan constants. The recorder calls are the only place where draw commands, pipeline binds, and descriptor set binds happen. Everything impure is isolated in this function.
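A sketch of that lookup table plus the executor loop, in Python. The Vulkan names are real constants from the original (pre-synchronization2) barrier API, written as strings because no Vulkan binding is assumed; a real executor would pass them to `vkCmdPipelineBarrier` inside a `VkImageMemoryBarrier`. The command buffer is a plain list so the emitted order can be inspected, and the `Undefined` row, the dict shape of resolved passes, and the fragment-shader stage for `ShaderRead` are all assumptions.

```python
USAGE_TO_VULKAN = {
    # usage: (pipeline stage mask, access mask, image layout)
    "Undefined": (
        "VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT", "0",
        "VK_IMAGE_LAYOUT_UNDEFINED"),
    "ColorAttachmentWrite": (
        "VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT",
        "VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT",
        "VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL"),
    "ShaderRead": (
        "VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT",  # assumes fragment-shader sampling
        "VK_ACCESS_SHADER_READ_BIT",
        "VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL"),
    "DepthStencilWrite": (
        "VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT | VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT",
        "VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT",
        "VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL"),
    # Present is a terminal state: the backbuffer should reach this layout
    # after the last pass has rendered into it as a color attachment.
    "Present": (
        "VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT", "0",
        "VK_IMAGE_LAYOUT_PRESENT_SRC_KHR"),
}


def execute(resolved_passes, resource_images, recorders, cmd):
    """Emit one image barrier per barrier descriptor, then run the pass's
    recorder. `cmd` is a list standing in for a VkCommandBuffer."""
    for resolved in resolved_passes:
        for resource, src, dst in resolved["barriers"]:
            src_stage, src_access, old_layout = USAGE_TO_VULKAN[src]
            dst_stage, dst_access, new_layout = USAGE_TO_VULKAN[dst]
            cmd.append(("barrier", resource_images[resource],
                        old_layout, new_layout,
                        src_stage, dst_stage, src_access, dst_access))
        recorders[resolved["name"]](cmd)
```

Everything above `execute` is a table; the only decisions the loop makes are the ones the compiler already encoded in the resolved passes.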

The boundary is explicit. Above the executor (declarations, compiler, resolved passes) everything is pure and testable. From the executor down (Vulkan barriers, command buffer recording, and the queue submission that ends the frame) everything is impure and contained. If you drew a line through the architecture, the executor would be the line.

The pure/impure boundary — data descriptions vs Vulkan execution

This is the pattern from post 1 (the XR lazy-follow, where a pure pose function sits inside a thin MonoBehaviour shell), scaled to an entire rendering pipeline. Describe the work as data. Let a pure function organize it. Push the side effects to the boundary.


The Frame, Composed

Let’s zoom out and look at a full frame:

  1. Declare passes — pure immutable records describing inputs and outputs
  2. Compile the graph — pure function that sorts passes and inserts barriers
  3. Begin frame — acquire a swapchain image, wait for the previous frame’s fence
  4. Execute resolved passes — the imperative boundary: insert Vulkan barriers, call recorders
  5. End frame — submit the command buffer, present to the display

Steps 1 and 2 produce the same result every frame unless passes are added or removed, so the compiler runs once at startup (or when the graph changes). The executor runs every frame, but it issues the same barriers and draw calls a hand-written pipeline would, plus a few map lookups, so the architecture adds almost no runtime cost.

What we have is a deferred rendering pipeline where the passes do not know about each other, the execution order is automatic, the synchronization barriers are automatic, and the entire frame description — every pass, every dependency, every resource transition — is testable without a GPU.

The G-Buffer is a function return value. The lighting pass is a function call. The tonemap pass is another function call. The RenderGraph is the composition operator that connects them, and the compiler checks that the composition is well-formed: it orders the passes, derives every barrier, and rejects cycles.

The next step tests this architecture for real. We will add a pass that the existing system has never seen — screen-space ambient occlusion — and the only thing that should change is one new declaration and one new recorder. No existing pass modified. No existing barrier edited. No existing order hard-coded.

If the architecture holds, the decomposition was worth it. That’s the next post.