Draft Work in progress. Wording, structure, and claims may still change. Feedback welcome.


Choosing a Vulkan Allocator

Field note. Decision record for adopting VMA. Three copies of FindMemoryType, paired tuple lifetimes, ImGui destroy-then-reallocate on resize.


Technical note — why the engine is moving to VMA before SSAO.

Status: Accepted. Migration follows this note.


Context

SSAO is next on the roadmap. It needs roughly four new render targets (linearized depth, an AO target, and a blur ping-pong pair), and I opened VulkanImage.cs to start wiring them up. Before writing anything new, I noticed I was about to produce the fourth copy of this function:

uint FindMemoryType(uint typeFilter, MemoryPropertyFlags properties):
    memProps = queryPhysicalDeviceMemoryProperties()
    for i in 0..memProps.MemoryTypeCount:
        if (typeFilter has bit i)
           and (memProps.MemoryTypes[i] has all `properties`):
            return i
    throw "no suitable memory type"

It already exists in VulkanBuffer.cs. It already exists in VulkanImage.cs. The ImGui backend has its own copy too. Adding it to SSAO would make four. That’s the visible smell. The deeper shape under it is worth describing, because it sets up why VMA is on the table at all.
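For reference, the selection logic in the pseudocode above can be sketched as runnable Python. This is a model, not the engine's C# code: the MemoryType class and the EXAMPLE_TYPES device are hypothetical, and only the three flag values are taken from the Vulkan headers.

```python
from dataclasses import dataclass
from typing import Sequence

# Flag values mirror VkMemoryPropertyFlagBits in the Vulkan headers.
DEVICE_LOCAL = 0x1
HOST_VISIBLE = 0x2
HOST_COHERENT = 0x4


@dataclass(frozen=True)
class MemoryType:
    """Hypothetical stand-in for one entry of the device's memory type table."""
    property_flags: int


def find_memory_type(type_filter: int, required: int,
                     memory_types: Sequence[MemoryType]) -> int:
    """Return the first index allowed by type_filter whose flags include all of `required`."""
    for i, mem_type in enumerate(memory_types):
        allowed = bool(type_filter & (1 << i))
        has_all = (mem_type.property_flags & required) == required
        if allowed and has_all:
            return i
    raise RuntimeError("no suitable memory type")


# Hypothetical device: type 0 is device-local, type 1 is host-visible and coherent.
EXAMPLE_TYPES = [MemoryType(DEVICE_LOCAL), MemoryType(HOST_VISIBLE | HOST_COHERENT)]
```

The point of the sketch is how little there is to it: a bitmask test and a subset test, which is exactly why it is so easy to copy.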

The current allocation path

Every buffer and every image in the engine goes through the same five calls:

createBuffer(usage, data):
    buffer  = vkCreateBuffer(size, usage)
    memReqs = vkGetBufferMemoryRequirements(buffer)
    memory  = vkAllocateMemory(memReqs.size, FindMemoryType(...))
    vkBindBufferMemory(buffer, memory)
    map → memcpy → unmap
    return (buffer, memory)

The return type is a tuple, (buffer, memory), because the buffer and its backing memory are two separate Vulkan handles with two separate lifetimes. Nothing in the type system enforces that they travel together — the discipline is entirely in the caller. Destroy mirrors it: two calls out, in the right order. Images add a third handle (the view), so their tuple is a three-tuple.

None of this is hard. It’s relentlessly mechanical, and the mechanical part is duplicated across every resource type.
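To make the "discipline is entirely in the caller" point concrete, here is a small Python model of the paired lifetime. FakeDevice and OwnedBuffer are hypothetical names used only for illustration; the engine has no such wrapper today, which is the problem being described.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDevice:
    """Stand-in for the Vulkan device: records calls instead of touching a GPU."""
    log: list = field(default_factory=list)

    def destroy_buffer(self, buffer):
        self.log.append(("destroy_buffer", buffer))

    def free_memory(self, memory):
        self.log.append(("free_memory", memory))


class OwnedBuffer:
    """Hypothetical owner that keeps a buffer and its memory together."""

    def __init__(self, device, buffer, memory):
        self._device = device
        self.buffer = buffer
        self.memory = memory
        self._destroyed = False

    def destroy(self):
        """Release both handles in order: the buffer first, then its memory."""
        if self._destroyed:
            return  # a second call is a no-op rather than a double free
        self._device.destroy_buffer(self.buffer)
        self._device.free_memory(self.memory)
        self._destroyed = True
```

With a bare (buffer, memory) tuple, every call site has to reproduce the body of destroy() by hand, in the same order, exactly once.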

The ImGui resize path

The place where raw allocation actually hurts today is the ImGui backend. It keeps per-frame vertex and index buffers that grow on demand, and the resize path looks like this:

if vtxSize > currentVertexBufferSize[frame]:
    vkDestroyBuffer(vertexBuffer[frame])
    vkFreeMemory(vertexMemory[frame])
    (vertexBuffer[frame], vertexMemory[frame])
        = createBuffer(VertexBuffer, vtxSize)
    currentVertexBufferSize[frame] = vtxSize

Every time the frame’s geometry outgrows the current buffer, the entire backing DeviceMemory block is freed and a fresh one is allocated. vkAllocateMemory is not a cheap call. The device also caps the number of live allocations through maxMemoryAllocationCount: the spec guarantees only 4096, and many implementations report exactly that. This path calls vkAllocateMemory from inside the frame loop.

This isn’t broken. ImGui’s geometry is tiny and resizes are rare. But it is a preview: it’s the shape of code I’d keep writing as the engine grows. The moment I start caching transient render targets or building a particle system, I’m back here writing destroy-then-reallocate against raw vkAllocateMemory.
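A rough way to see what the exact-fit resize path costs is to count how often it would fire for a sequence of per-frame vertex sizes. The function below is a model of the pseudocode above, not engine code, and the sizes in the usage note are made up.

```python
def count_reallocations(frame_vertex_sizes, initial_capacity=0):
    """Count destroy-then-reallocate events for one frame slot.

    Mirrors the resize path above: reallocate only when the requested size
    exceeds the current capacity, and grow to exactly the requested size.
    """
    capacity = initial_capacity
    reallocations = 0
    for size in frame_vertex_sizes:
        if size > capacity:
            reallocations += 1  # one free plus one fresh device allocation
            capacity = size
    return reallocations
```

For the made-up sequence [100, 120, 90, 150, 150, 400] this returns 4: every new high-water mark is a trip to the driver, however small the increase.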


Options

Three directions considered.

1. Stay raw, accept the duplication. Copy FindMemoryType one more time for SSAO, keep the tuple lifetimes, keep the ImGui resize path. Zero dependencies, zero new abstractions. Costs a linear amount of boilerplate per new resource type and leaves the ImGui smell in place.

2. Hand-rolled sub-allocator. Write a small pool: fewer vkAllocateMemory calls, suballocate from larger blocks, centralize memory type selection. Dismissed immediately — this is reinventing VMA poorly. The engine is a learning project about rendering, not a research project on allocator design.

3. VMA (Vulkan Memory Allocator). AMD’s GPUOpen allocator, via Silk.NET.Vulkan.Extensions.VMA on the C# side. Three concrete changes:

  • Sub-allocation. A small number of large vkAllocateMemory calls feed many resources. The 4096-allocation ceiling stops mattering. ImGui’s resize path stops hitting the driver directly.
  • Memory type selection as intent. You describe what the allocation is for — GPU-only, CPU-to-GPU upload, readback — and VMA picks the type. FindMemoryType disappears, along with all three copies.
  • Coupled lifetime. vmaCreateBuffer returns a buffer and a single opaque VmaAllocation. No separate DeviceMemory to track. Destroy is one call, vmaDestroyBuffer(allocator, buffer, allocation), which releases both. The pair still travels together, but there is no second free and no ordering to get wrong.

The buffer-create path collapses to:

createBuffer(usage, data):
    (buffer, allocation) = vmaCreateBuffer(
        bufferInfo:     { size, usage },
        allocationInfo: { usage: AutoPreferHost,
                          // Auto usages are only mappable with a host-access flag
                          flags: HostAccessSequentialWrite }
    )
    vmaCopyMemoryToAllocation(allocation, data)
    return (buffer, allocation)

Images get the symmetric treatment via vmaCreateImage. The image view is still a separate object — that’s Vulkan, not memory.
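The sub-allocation idea from the list above can be sketched in a few lines of Python. This is a deliberately naive bump allocator, not VMA's algorithm: it never frees, and Block, BumpSubAllocator, and the device_allocations counter are hypothetical names standing in for large vkAllocateMemory blocks.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


class Block:
    """One large device allocation that resources are carved out of."""

    def __init__(self, size):
        self.size = size
        self.cursor = 0  # first unused byte

    def try_suballocate(self, size, alignment):
        start = align_up(self.cursor, alignment)
        if start + size > self.size:
            return None  # does not fit in what is left of this block
        self.cursor = start + size
        return start


class BumpSubAllocator:
    def __init__(self, block_size):
        self.block_size = block_size
        self.blocks = []
        self.device_allocations = 0  # stands in for vkAllocateMemory calls

    def allocate(self, size, alignment):
        """Return (block_index, offset), opening a new block only when needed."""
        if size > self.block_size:
            raise ValueError("request larger than block size")
        for index, block in enumerate(self.blocks):
            offset = block.try_suballocate(size, alignment)
            if offset is not None:
                return index, offset
        self.blocks.append(Block(self.block_size))
        self.device_allocations += 1
        return len(self.blocks) - 1, self.blocks[-1].try_suballocate(size, alignment)
```

Even this toy version shows the effect that matters here: the number of device allocations tracks the number of blocks, not the number of resources. Everything a real allocator adds on top (freeing, defragmentation, per-memory-type pools) is the part option 2 would have had to reinvent.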


Decision

Adopt VMA before starting SSAO.

The weights:

  • For: removes three duplicated helpers, collapses tuple lifetimes to a single handle, fixes the ImGui allocation-per-resize smell, and makes SSAO’s four new render targets cost exactly the boilerplate they should cost.
  • Against: a new dependency; a layer of abstraction the engine didn’t have before; version lag between the Silk.NET VMA extension and native VMA.
  • The sharper against — techniques where VMA actively fights you. These are real and not hypothetical forever:
    • Transient aliasing in a render graph — when the G-Buffer depth and a later blur target could physically share memory because their lifetimes don’t overlap. VMA supports aliasing, but it’s not the happy path; you end up managing custom pools and lifetimes yourself.
    • Tile-based deferred on mobile: the optimal G-Buffer layout on tile GPUs is “lazily allocated, never leaves on-chip memory,” a pattern adjacent to VK_HUAWEI_subpass_shading that VMA has opinions about.
    • Sparse / residency-managed resources — not on the roadmap, but if they ever were, VMA would sit in the way rather than help.

None of these are on the critical path for SSAO, HBAO, or GTAO. They might matter a year from now when the render graph becomes real or the mobile build starts caring about tile memory. The cost structure is asymmetric: the friction of staying raw is immediate and certain; the friction VMA adds is deferred and uncertain. I’ll take the deferred, uncertain cost.
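For the transient-aliasing item in the list above, the core question is whether two targets are ever live at the same time. A minimal sketch of that check, assuming each target is tagged with the first and last pass that touches it; TransientTarget and the pass indices are hypothetical, since the render graph does not exist yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransientTarget:
    name: str
    first_pass: int  # index of the first pass that touches the target
    last_pass: int   # index of the last pass that touches the target


def can_alias(a, b):
    """Two targets are aliasing candidates only if their pass ranges are disjoint."""
    return a.last_pass < b.first_pass or b.last_pass < a.first_pass
```

Disjoint lifetimes are only the necessary condition. Real aliasing also needs compatible memory requirements and correct barriers between the two uses, which is where the custom-pool bookkeeping mentioned above comes in.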


Consequences

Migration scope. Three files: VulkanBuffer.cs, VulkanImage.cs, VulkanImGui.cs. Thirteen allocation sites. Replace vkAllocateMemory / FindMemoryType with vmaCreateBuffer / vmaCreateImage. Collapse (buffer, memory) and (image, memory, view) tuples into (buffer, allocation) and (image, allocation, view).

Preserve the raw path. Tag the last commit before the migration. The raw allocator code doesn’t need to live on the main branch forever, but it should stay easy to find — anyone (including future me) can git log back to see exactly what raw looked like and what it cost.

Do not touch Handles.cs. The opaque BufferHandle / ImageHandle pool is defined but unused. Activating it is a separate refactor. Bundling it with the VMA migration is exactly the scope creep this note exists to avoid.

Escape hatch. VMA doesn’t forbid raw vkAllocateMemory. If implementing a future paper needs raw allocation for transient aliasing, tile memory, or anything exotic, the engine can allocate that specific resource raw alongside VMA for everything else. Unwinding is contained: blast radius is still three files.

Functional angle

Worth naming because it’s the reason this decision is smaller than it looks. VMA lives entirely on one side of the functional core / imperative shell line. vkAllocateMemory, vmaCreateBuffer, the physical device queries — all of that is already in RenderLab.Gpu, the imperative shell. The functional core, which describes frames as data (render graph nodes, pass descriptors, G-Buffer layouts), doesn’t know or care which allocator sits underneath. This is a shell-layer swap, not an architectural change. That’s the test for whether an abstraction is in the right layer: can you swap it without the rest of the code noticing? Here, yes.
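The swap test can be sketched in Python. The names below (RenderTargetDesc, ImageAllocator, the two allocator classes, and the format strings) are hypothetical and do not come from RenderLab.Gpu; the sketch only illustrates where the seam sits.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple


@dataclass(frozen=True)
class RenderTargetDesc:
    """Core-side description of a target: plain data, no GPU handles."""
    name: str
    width: int
    height: int
    format: str


class ImageAllocator(Protocol):
    """Shell-side seam: anything that can turn a description into an image."""
    def create_image(self, desc: RenderTargetDesc) -> str: ...


class RawAllocator:
    def create_image(self, desc: RenderTargetDesc) -> str:
        return f"raw:{desc.name}"  # placeholder for the vkAllocateMemory path


class VmaAllocator:
    def create_image(self, desc: RenderTargetDesc) -> str:
        return f"vma:{desc.name}"  # placeholder for the vmaCreateImage path


def ssao_targets(width: int, height: int) -> Tuple[RenderTargetDesc, ...]:
    """Functional core: describe the four SSAO targets as data (formats are guesses)."""
    return (
        RenderTargetDesc("linear_depth", width, height, "R32_SFLOAT"),
        RenderTargetDesc("ao", width, height, "R8_UNORM"),
        RenderTargetDesc("blur_ping", width, height, "R8_UNORM"),
        RenderTargetDesc("blur_pong", width, height, "R8_UNORM"),
    )


def realize(targets, allocator: ImageAllocator):
    """Imperative shell: the only code that knows which allocator is in use."""
    return [allocator.create_image(target) for target in targets]
```

ssao_targets returns the same data whichever allocator is passed to realize, which is the sense in which the rest of the code does not notice the swap.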


Follow-ups (not in this note)

  • The VMA migration itself — separate code-pillar task.
  • Revisit when transient aliasing in the render graph becomes a real concern.
  • Revisit when the mobile build starts caring about tile memory bandwidth.
  • Handles.cs pool activation — unblocked by this decision but not caused by it.