Choosing a Vulkan Allocator
Field note. Decision record for adopting VMA. Three copies of FindMemoryType, paired tuple lifetimes, ImGui destroy-then-reallocate on resize.
Technical note — why the engine is moving to VMA before SSAO.
Status: Accepted. Migration follows this note.
Context
SSAO is next on the roadmap. It needs roughly four new render targets —
linearized depth, an AO target, a blur ping-pong pair — and I opened
VulkanImage.cs to start wiring them up. Before writing anything new,
I noticed I was about to produce the fourth copy of this function:
    uint FindMemoryType(uint typeFilter, MemoryPropertyFlags properties):
        memProps = queryPhysicalDeviceMemoryProperties()
        for i in 0..memProps.MemoryTypeCount:
            if (typeFilter has bit i)
               and (memProps.MemoryTypes[i] has all `properties`):
                return i
        throw "no suitable memory type"
It already exists in VulkanBuffer.cs. It already exists in
VulkanImage.cs. The ImGui backend has its own copy too. Adding it
to SSAO would make four. That’s the visible smell. The deeper shape
under it is worth describing, because it sets up why VMA is on the
table at all.
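For reference, here is the selection loop as a self-contained C++ sketch rather than the engine’s C#. It runs against a mock table of memory-type flags instead of a real VkPhysicalDeviceMemoryProperties, and it returns a sentinel instead of throwing. The constant names, the sentinel, and the signature are assumptions made for illustration; this is not the code in VulkanBuffer.cs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local stand-ins for VkMemoryPropertyFlagBits. The bit values follow the
// Vulkan headers; the names are specific to this sketch.
constexpr uint32_t DeviceLocal  = 0x1;
constexpr uint32_t HostVisible  = 0x2;
constexpr uint32_t HostCoherent = 0x4;

// Hypothetical sentinel used here instead of throwing.
constexpr uint32_t kNoMemoryType = UINT32_MAX;

// typeFlags[i] holds the property flags of memory type i.
// typeFilter has bit i set when type i is acceptable for the resource.
// Returns the first type that is acceptable and has every required flag.
uint32_t findMemoryType(const std::vector<uint32_t>& typeFlags,
                        uint32_t typeFilter,
                        uint32_t required) {
    assert(typeFlags.size() <= 32);  // Vulkan exposes at most 32 memory types
    for (uint32_t i = 0; i < typeFlags.size(); ++i) {
        const bool allowed = (typeFilter & (1u << i)) != 0;
        const bool hasAll  = (typeFlags[i] & required) == required;
        if (allowed && hasAll) {
            return i;
        }
    }
    return kNoMemoryType;  // caller decides whether this is fatal
}
```

With a mock table of { DeviceLocal, HostVisible | HostCoherent, all three } and a filter that accepts every type, a request for HostVisible | HostCoherent lands on index 1, because the loop takes the first match rather than the best one.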
The current allocation path
Every buffer and every image in the engine goes through the same five calls:
    createBuffer(usage, data):
        buffer  = vkCreateBuffer(size, usage)
        memReqs = vkGetBufferMemoryRequirements(buffer)
        memory  = vkAllocateMemory(memReqs.size, FindMemoryType(...))
        vkBindBufferMemory(buffer, memory)
        map → memcpy → unmap
        return (buffer, memory)
The return type is a tuple, (buffer, memory), because the buffer
and its backing memory are two separate Vulkan handles with two
separate lifetimes. Nothing in the type system enforces that they
travel together — the discipline is entirely in the caller. Destroy
mirrors it: two calls out, in the right order. Images add a third
handle (the view), so their tuple is a three-tuple.
None of this is hard. It’s relentlessly mechanical, and the mechanical part is duplicated across every resource type.
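To make the “discipline is entirely in the caller” point concrete, here is a hedged C++ sketch of what it would look like for the type system to carry both handles together. It is not what the engine does today. CallLog and OwnedBuffer are hypothetical names, the handles are plain integers, and the destructor only records the calls it would make, in the order described above (buffer first, then memory).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the driver calls a real implementation would issue.
struct CallLog {
    std::vector<std::string> calls;
};

// Move-only owner of a buffer handle and its backing memory handle.
class OwnedBuffer {
public:
    OwnedBuffer(CallLog& log, int buffer, int memory)
        : log_(&log), buffer_(buffer), memory_(memory) {
        assert(buffer != 0 && memory != 0);  // 0 plays the role of a null handle
    }
    OwnedBuffer(const OwnedBuffer&) = delete;
    OwnedBuffer& operator=(const OwnedBuffer&) = delete;
    OwnedBuffer(OwnedBuffer&& other) noexcept
        : log_(other.log_), buffer_(other.buffer_), memory_(other.memory_) {
        other.log_ = nullptr;  // the moved-from object owns nothing
    }
    ~OwnedBuffer() {
        if (log_ == nullptr) {
            return;
        }
        // Same order as the raw destroy path: buffer, then memory.
        log_->calls.push_back("vkDestroyBuffer " + std::to_string(buffer_));
        log_->calls.push_back("vkFreeMemory " + std::to_string(memory_));
    }

private:
    CallLog* log_;
    int buffer_;
    int memory_;
};
```

The wrapper removes the ordering mistake but not the boilerplate: every resource type would still need its own create path and its own memory type lookup.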
The ImGui resize path
The place where raw allocation actually hurts today is the ImGui backend. It keeps per-frame vertex and index buffers that grow on demand, and the resize path looks like this:
    if vtxSize > currentVertexBufferSize[frame]:
        vkDestroyBuffer(vertexBuffer[frame])
        vkFreeMemory(vertexMemory[frame])
        (vertexBuffer[frame], vertexMemory[frame])
            = createBuffer(VertexBuffer, vtxSize)
        currentVertexBufferSize[frame] = vtxSize
Every time the frame’s geometry outgrows the current buffer, the
entire backing DeviceMemory block is freed and a fresh one is
allocated. vkAllocateMemory is not a cheap call. Drivers also cap
the number of live allocations through maxMemoryAllocationCount
(the spec guarantees only 4096, and many implementations report
exactly that), and we’re spending them inside the frame loop.
This isn’t broken. ImGui’s geometry is tiny and resizes are rare.
But it is a preview — it’s the shape of code I’d keep writing as the
engine grows. The moment I start caching transient render targets or
building a particle system, I’m back here writing destroy-then-
reallocate against raw vkAllocateMemory.
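A rough way to see what that path costs is to replay a sequence of per-frame sizes through the same grow-only rule and count how often it would reach vkAllocateMemory. The C++ sketch below does only that. It models a single buffer slot rather than the per-frame array, and countAllocations is a hypothetical helper, not engine code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Replays requested sizes through the rule "reallocate only when the
// request exceeds the current capacity" and returns how many allocations
// that rule would perform. Capacity starts at zero and never shrinks.
std::size_t countAllocations(const std::vector<std::size_t>& frameSizes) {
    std::size_t capacity = 0;
    std::size_t allocations = 0;
    for (std::size_t requested : frameSizes) {
        if (requested > capacity) {
            ++allocations;         // destroy-then-reallocate happens here
            capacity = requested;  // exact fit, as in the resize path above
        }
    }
    assert(allocations <= frameSizes.size());
    return allocations;
}
```

Because the new capacity is an exact fit, any sequence that keeps creeping upward pays one driver allocation per step. That is tolerable for ImGui and much less so for anything that grows every frame.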
Options
Three directions considered.
1. Stay raw, accept the duplication. Copy FindMemoryType one
more time for SSAO, keep the tuple lifetimes, keep the ImGui resize
path. Zero dependencies, zero new abstractions. Costs a linear
amount of boilerplate per new resource type and leaves the ImGui
smell in place.
2. Hand-rolled sub-allocator. Write a small pool: fewer
vkAllocateMemory calls, suballocate from larger blocks, centralize
memory type selection. Dismissed immediately — this is reinventing
VMA poorly. The engine is a learning project about rendering, not a
research project on allocator design.
3. VMA (Vulkan Memory Allocator). AMD’s GPUOpen allocator, via
Silk.NET.Vulkan.Extensions.VMA on the C# side. Three concrete
changes:
- Sub-allocation. A small number of large vkAllocateMemory calls
  feed many resources. The 4096-allocation ceiling stops mattering.
  ImGui’s resize path stops hitting the driver directly.
- Memory type selection as intent. You describe what the
  allocation is for (GPU-only, CPU-to-GPU upload, readback) and
  VMA picks the type. FindMemoryType disappears, along with all
  three copies.
- Coupled lifetime. vmaCreateBuffer returns a buffer and a single
  opaque VmaAllocation. No separate DeviceMemory to track. Destroy
  is vmaDestroyBuffer(buffer, allocation). The tuple problem goes
  away at the type level.
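The sub-allocation idea in the first bullet can be pictured with a bump-pointer sketch in C++: one large block stands in for a single vkAllocateMemory call, and each resource receives an aligned offset inside it. This is only the core idea under simplifying assumptions (nothing is ever freed); VMA’s real block management is considerably more elaborate. Block, subAllocate, and kOutOfSpace are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// One large device allocation, carved up by offset.
struct Block {
    uint64_t size;  // total bytes in the block
    uint64_t used;  // bytes handed out so far, including padding
};

// Hypothetical sentinel for "does not fit in this block".
constexpr uint64_t kOutOfSpace = UINT64_MAX;

// Rounds value up to the next multiple of alignment (a power of two).
uint64_t alignUp(uint64_t value, uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Returns the offset of the new sub-allocation, or kOutOfSpace.
// No driver call is involved; only the bookkeeping in Block changes.
uint64_t subAllocate(Block& block, uint64_t size, uint64_t alignment) {
    const uint64_t offset = alignUp(block.used, alignment);
    if (offset > block.size || size > block.size - offset) {
        return kOutOfSpace;
    }
    block.used = offset + size;
    return offset;
}
```

In a real binding the returned offset would be passed to vkBindBufferMemory or vkBindImageMemory alongside the shared memory handle, and size and alignment would come from the resource’s memory requirements.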
The buffer-create path collapses to:
    createBuffer(usage, data):
        (buffer, allocation) = vmaCreateBuffer(
            bufferInfo: { size, usage },
            allocationInfo: {
                usage: AutoPreferHost,
                // Auto* usages need a host-access flag to be mappable
                flags: HostAccessSequentialWrite
            }
        )
        vmaCopyMemoryToAllocation(allocation, data)
        return (buffer, allocation)
Images get the symmetric treatment via vmaCreateImage. The image
view is still a separate object — that’s Vulkan, not memory.
Decision
Adopt VMA before starting SSAO.
The weights:
- For: removes three duplicated helpers, collapses tuple lifetimes to a single handle, fixes the ImGui allocation-per-resize smell, and makes SSAO’s four new render targets cost exactly the boilerplate they should cost.
- Against: a new dependency; a layer of abstraction the engine didn’t have before; version lag between the Silk.NET VMA extension and native VMA.
- The sharper against: techniques where VMA actively fights you.
  These are real and not hypothetical forever:
  - Transient aliasing in a render graph: when the G-Buffer depth
    and a later blur target could physically share memory because
    their lifetimes don’t overlap. VMA supports aliasing, but it’s
    not the happy path; you end up managing custom pools and
    lifetimes yourself.
  - Tile-based deferred on mobile: the optimal G-Buffer layout on
    tile GPUs is “lazily allocated, never leaves on-chip memory,”
    a VK_HUAWEI_subpass_shading-adjacent pattern that VMA has
    opinions about.
  - Sparse / residency-managed resources: not on the roadmap, but
    if they ever were, VMA would sit in the way rather than help.
None of these are on the critical path for SSAO, HBAO, or GTAO. They might matter a year from now when the render graph becomes real or the mobile build starts caring about tile memory. The cost structure is asymmetric: the friction staying raw is immediate and certain; the friction VMA adds is deferred and uncertain. I’ll pay the certain cost.
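For the transient-aliasing caveat, the condition “their lifetimes don’t overlap” is just an interval test over pass indices. The C++ sketch below shows that test in isolation. Lifetime and canAlias are hypothetical names, pass indices are assumed to be inclusive, and a real render graph would also have to check that the two resources have compatible memory requirements and insert the right barriers.

```cpp
#include <cassert>

// First and last pass (inclusive) in which a transient resource is live.
struct Lifetime {
    int firstPass;
    int lastPass;
};

// Two resources may share memory only if one is finished before the
// other is first used.
bool canAlias(const Lifetime& a, const Lifetime& b) {
    assert(a.firstPass <= a.lastPass && b.firstPass <= b.lastPass);
    return a.lastPass < b.firstPass || b.lastPass < a.firstPass;
}
```

A depth target live in passes 0 to 2 and a blur target live in passes 3 to 5 pass the test; if the depth target is still read in pass 3, they do not.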
Consequences
Migration scope. Three files: VulkanBuffer.cs,
VulkanImage.cs, VulkanImGui.cs. Thirteen allocation sites.
Replace vkAllocateMemory / FindMemoryType with
vmaCreateBuffer / vmaCreateImage. Collapse (buffer, memory)
and (image, memory, view) tuples into (buffer, allocation) and
(image, allocation, view).
Preserve the raw path. Tag the last commit before the
migration. The raw allocator code doesn’t need to live on the main
branch forever, but it should stay easy to find — anyone (including
future me) can git log back to see exactly what raw looked like
and what it cost.
Do not touch Handles.cs. The opaque BufferHandle /
ImageHandle pool is defined but unused. Activating it is a
separate refactor. Bundling it with the VMA migration is exactly
the scope creep this note exists to avoid.
Escape hatch. VMA doesn’t forbid raw vkAllocateMemory. If a
future paper implementation needs raw for transient aliasing, tile
memory, or anything exotic, the engine can do raw for that specific
resource alongside VMA for everything else. Unwinding is contained:
blast radius is still three files.
Functional angle
Worth naming because it’s the reason this decision is smaller than
it looks. VMA lives entirely on one side of the functional core /
imperative shell line. vkAllocateMemory, vmaCreateBuffer, the
physical device queries — all of that is already in RenderLab.Gpu,
the imperative shell. The functional core, which describes frames
as data (render graph nodes, pass descriptors, G-Buffer layouts),
doesn’t know or care which allocator sits underneath. This is a
shell-layer swap, not an architectural change. That’s the test for
whether an abstraction is in the right layer: can you swap it
without the rest of the code noticing? Here, yes.
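The swap test can be sketched in a few lines of C++. The core below describes the SSAO targets as plain data and never mentions memory; the shell realizes them through an allocator interface with two interchangeable backends. All names here (TargetDesc, describeSsaoTargets, Allocator, RawAllocator, PooledAllocator, realize) are hypothetical, and the backends only count what they would ask the driver for. This is an illustration of the layering, not the layout of RenderLab.Gpu.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Functional core: a render target described as data.
struct TargetDesc {
    std::string name;
    int width;
    int height;
};

// Pure function; it knows nothing about allocators.
std::vector<TargetDesc> describeSsaoTargets(int width, int height) {
    assert(width > 0 && height > 0);
    return {{"linearDepth", width, height},
            {"ao", width, height},
            {"blurPing", width, height},
            {"blurPong", width, height}};
}

// Imperative shell: the only layer that knows how memory is obtained.
struct Allocator {
    virtual ~Allocator() = default;
    virtual void createImage(const TargetDesc& desc) = 0;
    std::size_t imagesCreated = 0;
    std::size_t driverAllocations = 0;
};

// Raw path: one driver allocation per image.
struct RawAllocator : Allocator {
    void createImage(const TargetDesc&) override {
        ++driverAllocations;
        ++imagesCreated;
    }
};

// Pooled path: the first image allocates a block, later images reuse it.
struct PooledAllocator : Allocator {
    void createImage(const TargetDesc&) override {
        if (imagesCreated == 0) {
            ++driverAllocations;
        }
        ++imagesCreated;
    }
};

void realize(const std::vector<TargetDesc>& targets, Allocator& allocator) {
    for (const TargetDesc& target : targets) {
        allocator.createImage(target);
    }
}
```

The same target list goes through either backend unchanged; only the driver allocation count differs.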
Follow-ups (not in this note)
- The VMA migration itself — separate code-pillar task.
- Revisit when transient aliasing in the render graph becomes a real concern.
- Revisit when the mobile build starts caring about tile memory bandwidth.
- Handles.cs pool activation: unblocked by this decision but not caused by it.