Descriptors in Vulkan: Pools, Sets, Buffers, and suffering

I’m currently learning Vulkan by following the (excellent!) Vulkan Guide. Vulkan has a notoriously steep learning curve, coming not only from the explicit and verbose nature of the API but also from the fact that most existing Vulkan resources assume prior knowledge of 3D graphics programming - of which I have none :)

One aspect of Vulkan that I found especially confusing upon first encounter was the concept of descriptors. There are seemingly several different ways to do things; the Vulkan Guide uses “bindful” descriptor sets, while most people nowadays recommend using something called “bindless”; there’s a new way using “Descriptor Buffers,” and something called “Buffer Device Address” thrown in there that doesn’t appear to replace descriptors but can be used alongside them. You will find different tutorials and resources using different approaches depending on when they were written, the opinions of the author, the phase of the moon, and other parameters. This post is an attempt to consolidate everything that I’ve learned so far not just about which method to use, but the motivation and rationale behind each one and how everything actually works under the hood.

What is a descriptor, anyways?

Modern graphics programming is generally done by writing shaders, programs that run on the GPU. You will write your shaders in a shader language, typically GLSL, HLSL, or Slang, and when using Vulkan these will be compiled into the SPIR-V intermediate representation¹ that you can ship alongside your application binary. This IR is then compiled again at runtime into the hardware-specific instruction set of your actual GPU. Shaders can be incredibly varied in what they actually do: you can pull data from any number of on-device buffers containing vertex, texture, or image data, and mix and match that data however you want. This requires a flexible way to specify which on-device resources your shader should have access to, and this is what descriptors provide. You will generally bind a set of descriptors to your pipeline, which makes them accessible from the shader stages that declare them.

So are descriptors just pointers to GPU memory?

Yes and no. For buffers that your shader addresses directly, such as buffers of raw vertex data, this is close to the truth. But images and textures are different: on most hardware, the pixels of an image may be laid out not linearly but in some hardware-dependent arrangement, which requires a mapping layer between pixel coordinates and physical memory offsets. Additionally, you will often need to wrap textures onto 3D models or map them onto 2D regions of a different size. This is where samplers come in. They let you access texture data through continuous coordinates, with wrapping, downsampling, and upsampling performed by preconfigured fixed-function hardware.
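The sketch below shows why an image needs more than a base address. It maps (x, y) pixel coordinates to an element index in two ways: a plain row-major layout, and Morton (Z-order) bit interleaving. Z-order is only an illustration of a non-linear layout. It is not the tiling scheme AMD hardware actually uses, and all function names are hypothetical.

```c
#include <stdint.h>

/* Spread the low 16 bits of v so that bit i moves to bit 2*i. */
static uint32_t spread_bits(uint32_t v)
{
    v &= 0xFFFF;
    v = (v | (v << 8)) & 0x00FF00FF;
    v = (v | (v << 4)) & 0x0F0F0F0F;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

/* Linear layout: rows are stored one after another. */
static uint32_t linear_index(uint32_t x, uint32_t y, uint32_t width)
{
    return y * width + x;
}

/* Z-order layout: x bits land in even positions and y bits in odd positions,
 * so pixels that are close in 2D tend to be close in memory. */
static uint32_t morton_index(uint32_t x, uint32_t y)
{
    return spread_bits(x) | (spread_bits(y) << 1);
}
```

Take a 4-pixel-wide image. Pixel (0, 1) sits 4 elements after (0, 0) in the linear layout, but only 2 elements after it in Z-order. A descriptor for a tiled image therefore has to tell the hardware which mapping to apply.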

There are several basic types of descriptor:²

Samplers, configurations for wrapping textures around 3D meshes and interpolating continuous coordinates to their resulting pixel values, among other things.
- VK_DESCRIPTOR_TYPE_SAMPLER
Images, handles to images located in GPU memory. There are two types of image descriptors:
- VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE, an image that is read through a sampler as described above.
- VK_DESCRIPTOR_TYPE_STORAGE_IMAGE, a raw image whose individual pixels can be read or written directly from shader code.
Combined image samplers, an image and a sampler bundled into one handle.
- VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER
Buffers. There are several buffer descriptor types, depending on how the shader accesses the buffer: read-only uniform access or read/write storage access, formatted access through texel buffers, or an offset supplied at bind time for the dynamic variants. They are all fundamentally similar:
- VK_DESCRIPTOR_TYPE_UNIFORM_TEXEL_BUFFER
- VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER
- VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER
- VK_DESCRIPTOR_TYPE_STORAGE_BUFFER
- VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC
- VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC

Using the Descriptor Buffer extension, we can query how many bytes each descriptor type occupies in GPU memory on my hardware, an AMD Radeon RX 7800 XT.

Note: Some of the code in this post is written in Zig, a modern C-like language. It should be relatively easy to follow if you know C, but there may be some minor differences.

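// Assumes `device` is a vulkan-zig device wrapper with VK_EXT_descriptor_buffer enabled.
const descriptor_types = [_]vk.DescriptorType{
    .sampler,
    .combined_image_sampler,
    .sampled_image,
    .storage_image,
    .uniform_texel_buffer,
    .storage_texel_buffer,
    .uniform_buffer,
    .storage_buffer,
    .uniform_buffer_dynamic,
    .storage_buffer_dynamic,
};

std.debug.print("| Type | Size |\n", .{});
std.debug.print("| - | - |\n", .{});

inline for (descriptor_types) |descriptor_type| {
    // A layout with a single binding holding one descriptor of the type being measured.
    const binding: vk.DescriptorSetLayoutBinding = .{
        .binding = 0,
        .descriptor_type = descriptor_type,
        .descriptor_count = 1,
        .stage_flags = .{},
    };
    const layout_info: vk.DescriptorSetLayoutCreateInfo = .{
        // getDescriptorSetLayoutSizeEXT requires layouts created with this flag.
        // The spec does not allow the two *_dynamic types in such layouts, so the
        // validation layers will report errors for those two iterations.
        .flags = .{ .descriptor_buffer_bit_ext = true },
        .binding_count = 1,
        .p_bindings = @ptrCast(&binding),
    };
    const layout = try device.createDescriptorSetLayout(&layout_info, null);
    defer device.destroyDescriptorSetLayout(layout, null);
    std.debug.print("| `{}` | {} bytes |\n", .{descriptor_type, device.getDescriptorSetLayoutSizeEXT(layout)});
}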

This results in the following table:

Descriptor Sizes

Type	Size
`vk.DescriptorType.sampler`	16 bytes
`vk.DescriptorType.combined_image_sampler`	96 bytes
`vk.DescriptorType.sampled_image`	64 bytes
`vk.DescriptorType.storage_image`	32 bytes
`vk.DescriptorType.uniform_texel_buffer`	16 bytes
`vk.DescriptorType.storage_texel_buffer`	16 bytes
`vk.DescriptorType.uniform_buffer`	16 bytes
`vk.DescriptorType.storage_buffer`	16 bytes
`vk.DescriptorType.uniform_buffer_dynamic`	0 bytes
`vk.DescriptorType.storage_buffer_dynamic`	0 bytes

Ignoring dynamic buffers for now, we can see that most descriptor types are 16 bytes, the size of two 64-bit pointers, while image descriptors are larger. Let’s look at the driver code to see what’s actually going on under the hood.

mesa/src/amd/common/ac_descriptors.c:

void
ac_build_buffer_descriptor(const enum amd_gfx_level gfx_level, const struct ac_buffer_state *state, uint32_t desc[4])
{
   uint32_t rsrc_word1 = S_008F04_BASE_ADDRESS_HI(state->va >> 32) | S_008F04_STRIDE(state->stride);
   uint32_t rsrc_word3;

   if (gfx_level >= GFX11) {
      rsrc_word1 |= S_008F04_SWIZZLE_ENABLE_GFX11(state->swizzle_enable);
   } else {
      rsrc_word1 |= S_008F04_SWIZZLE_ENABLE_GFX6(state->swizzle_enable);
   }

   ac_set_buf_desc_word3(gfx_level, state, &rsrc_word3);

   desc[0] = state->va;
   desc[1] = rsrc_word1;
   desc[2] = state->size;
   desc[3] = rsrc_word3;
}

This is the function that writes buffer descriptors. As you can see, the descriptor consists of four 32-bit words, in this order:

- the low 4 bytes of the virtual address;
- a word containing the high bits of the virtual address and the stride;
- the 4-byte buffer size;
- a metadata word (word3, which contains swizzle and compression information among other things).

This is essentially just a pointer with some extra metadata about how the buffer is accessed. Let’s look at samplers next.
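Before moving on to samplers, here is a standalone sketch that packs the four words in the same order as ac_build_buffer_descriptor and unpacks the address again. The bit positions used for word 1 are an assumption: the high address bits go in bits 0-15 and the stride in bits 16-29, which is my reading of the pre-GFX12 register definitions. Word 3 is passed through as an opaque value. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical mirror of the inputs ac_build_buffer_descriptor reads. */
struct buffer_desc_inputs {
    uint64_t va;      /* GPU virtual address of the buffer */
    uint32_t stride;  /* element stride in bytes */
    uint32_t size;    /* buffer size in bytes */
    uint32_t word3;   /* format/swizzle metadata, treated as opaque here */
};

/* Assumed layout of word 1: address bits 32..47 in bits 0..15, stride in bits 16..29. */
#define ADDR_HI_MASK 0xFFFFu
#define STRIDE_SHIFT 16
#define STRIDE_MASK  0x3FFFu

static void pack_buffer_desc(const struct buffer_desc_inputs *in, uint32_t desc[4])
{
    desc[0] = (uint32_t)in->va;                                /* low 32 address bits */
    desc[1] = ((uint32_t)(in->va >> 32) & ADDR_HI_MASK) |
              ((in->stride & STRIDE_MASK) << STRIDE_SHIFT);    /* high bits + stride */
    desc[2] = in->size;                                        /* size in bytes */
    desc[3] = in->word3;                                       /* opaque metadata */
}

static uint64_t unpack_va(const uint32_t desc[4])
{
    return (uint64_t)desc[0] | ((uint64_t)(desc[1] & ADDR_HI_MASK) << 32);
}

static uint32_t unpack_stride(const uint32_t desc[4])
{
    return (desc[1] >> STRIDE_SHIFT) & STRIDE_MASK;
}
```

Four 32-bit words come to 16 bytes, which matches the size measured for every non-dynamic buffer type in the table.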

mesa/src/amd/common/ac_descriptors.c:

void
ac_build_sampler_descriptor(const enum amd_gfx_level gfx_level, const struct ac_sampler_state *state, uint32_t desc[4])
{
   const unsigned perf_mip = state->max_aniso_ratio ? state->max_aniso_ratio + 6 : 0;
   const bool compat_mode = gfx_level == GFX8 || gfx_level == GFX9;

   desc[0] = S_008F30_CLAMP_X(state->address_mode_u) |
             S_008F30_CLAMP_Y(state->address_mode_v) |
             S_008F30_CLAMP_Z(state->address_mode_w) |
             S_008F30_MAX_ANISO_RATIO(state->max_aniso_ratio) |
             S_008F30_DEPTH_COMPARE_FUNC(state->depth_compare_func) |
             S_008F30_FORCE_UNNORMALIZED(state->unnormalized_coords) |
             S_008F30_ANISO_THRESHOLD(state->max_aniso_ratio >> 1) |
             S_008F30_ANISO_BIAS(state->max_aniso_ratio) |
             S_008F30_DISABLE_CUBE_WRAP(!state->cube_wrap) |
             S_008F30_COMPAT_MODE(compat_mode) |
             S_008F30_TRUNC_COORD(state->trunc_coord) |
             S_008F30_FILTER_MODE(state->filter_mode);
   desc[1] = 0;
   desc[2] = S_008F38_XY_MAG_FILTER(state->mag_filter) |
             S_008F38_XY_MIN_FILTER(state->min_filter) |
             S_008F38_MIP_FILTER(state->mip_filter);
   desc[3] = S_008F3C_BORDER_COLOR_TYPE(state->border_color_type);

   if (gfx_level >= GFX12) {
      desc[1] |= S_008F34_MIN_LOD_GFX12(util_unsigned_fixed(CLAMP(state->min_lod, 0, 17), 8)) |
                 S_008F34_MAX_LOD_GFX12(util_unsigned_fixed(CLAMP(state->max_lod, 0, 17), 8));
      desc[2] |= S_008F38_PERF_MIP_LO(perf_mip);
      desc[3] |= S_008F3C_PERF_MIP_HI(perf_mip >> 2);
   } else {
      desc[1] |= S_008F34_MIN_LOD_GFX6(util_unsigned_fixed(CLAMP(state->min_lod, 0, 15), 8)) |
                 S_008F34_MAX_LOD_GFX6(util_unsigned_fixed(CLAMP(state->max_lod, 0, 15), 8)) |
                 S_008F34_PERF_MIP(perf_mip);
   }

   if (gfx_level >= GFX10) {
      desc[2] |= S_008F38_LOD_BIAS(util_signed_fixed(CLAMP(state->lod_bias, -32, 31), 8)) |
                 S_008F38_ANISO_OVERRIDE_GFX10(!state->aniso_single_level);
   } else {
      desc[2] |= S_008F38_LOD_BIAS(util_signed_fixed(CLAMP(state->lod_bias, -16, 16), 8)) |
                 S_008F38_DISABLE_LSB_CEIL(gfx_level <= GFX8) |
                 S_008F38_FILTER_PREC_FIX(1) |
                 S_008F38_ANISO_OVERRIDE_GFX8(gfx_level >= GFX8 && !state->aniso_single_level);
   }

   if (gfx_level >= GFX11) {
      desc[3] |= S_008F3C_BORDER_COLOR_PTR_GFX11(state->border_color_ptr);
   } else {
      desc[3] |= S_008F3C_BORDER_COLOR_PTR_GFX6(state->border_color_ptr);
   }
}

No pointers here, just lots of sampler state packed into 16 bytes (four 32-bit words). What about images?
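Before getting to images, one detail of the sampler code is worth unpacking. Several fields are floats converted to fixed point before packing. The sketch below reproduces the arithmetic behind `util_unsigned_fixed(CLAMP(state->min_lod, 0, 15), 8)` in the pre-GFX12 branch: clamp the value, then scale it by 2^8 so that the low 8 bits hold the fraction. The helpers are hypothetical re-implementations of that arithmetic, not Mesa's own functions.

```c
#include <stdint.h>

static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Float -> unsigned fixed point with frac_bits fractional bits (truncating). */
static uint32_t to_unsigned_fixed(float value, unsigned frac_bits)
{
    return value < 0.0f ? 0u : (uint32_t)(value * (float)(1u << frac_bits));
}

/* MIN_LOD/MAX_LOD encoding in the pre-GFX12 branch: clamp to [0, 15], 8 fractional bits. */
static uint32_t encode_lod_gfx6(float lod)
{
    return to_unsigned_fixed(clampf(lod, 0.0f, 15.0f), 8);
}
```

An LOD of 1.5 becomes 384 (1.5 × 256). Anything above 15 saturates at 3840, which fits in 12 bits.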

mesa/src/amd/vulkan/radv_image_view.h:

union radv_descriptor {
   struct {
      uint32_t plane0_descriptor[8];
      uint32_t fmask_descriptor[8];
   };
   struct {
      uint32_t plane_descriptors[3][8];
   };
};

struct radv_image_view {
   struct vk_image_view vk;
   struct radv_image *image; /**< VkImageViewCreateInfo::image */

   unsigned plane_id;
   VkExtent3D extent; /**< Extent of VkImageViewCreateInfo::baseMipLevel. */

   /* Whether the image iview supports fast clear. */
   bool support_fast_clear;

   bool disable_dcc_mrt;
   bool disable_tc_compat_cmask_mrt;

   union radv_descriptor descriptor;

   /* Descriptor for use as a storage image as opposed to a sampled image.
    * This has a few differences for cube maps (e.g. type).
    */
   union radv_descriptor storage_descriptor;

   /* Block-compressed image views on GFX10+. */
   struct ac_surf_nbc_view nbc_view;
};

mesa/src/amd/vulkan/radv_descriptor_set.c:

static ALWAYS_INLINE void
write_image_descriptor(unsigned *dst, unsigned size, VkDescriptorType descriptor_type,
                       const VkDescriptorImageInfo *image_info)
{
   struct radv_image_view *iview = NULL;
   union radv_descriptor *descriptor;

   if (image_info)
      iview = radv_image_view_from_handle(image_info->imageView);

   if (!iview) {
      memset(dst, 0, size);
      return;
   }

   if (descriptor_type == VK_DESCRIPTOR_TYPE_STORAGE_IMAGE) {
      descriptor = &iview->storage_descriptor;
   } else {
      descriptor = &iview->descriptor;
   }
   assert(size > 0);

   /* Encourage compilers to inline memcpy for combined image/sampler descriptors. */
   switch (size) {
   case 32:
      memcpy(dst, descriptor, 32);
      break;
   case 64:
      memcpy(dst, descriptor, 64);
      break;
   default:
      unreachable("Invalid size");
   }
}

As we can see, there are two possible paths here, depending on whether we’re dealing with a storage image or not, because storage descriptors have “a few differences for cube maps.” We copy at most 64 bytes (sixteen 32-bit words). That means we only ever touch the first 64 bytes of union radv_descriptor: plane0_descriptor and fmask_descriptor, which alias the first two entries of plane_descriptors.

Where do these get written? The code path is quite complicated, so I won’t repeat all of the details here. Long story short, we end up calling radv_make_texture_descriptor for each plane. That function diverges depending on the hardware version and calls into ac_build_texture_descriptor, which diverges again. Here is the code for gfx12 and later, which writes into desc. I don’t pretend to understand any of the cubemapping implementation details, but this function sets most of the image state. A later call to ac_set_mutable_tex_desc_fields (omitted here for brevity) fills in the rest and sets the virtual address, spread across the low and high u32 chunks.

mesa/src/amd/common/ac_descriptors.c:

static void
ac_build_gfx12_texture_descriptor(const struct radeon_info *info, const struct ac_texture_state *state, uint32_t desc[8])
{
   const struct radeon_surf *surf = state->surf;
   const struct util_format_description *fmt_desc = util_format_description(state->format);
   const uint32_t img_format = ac_get_gfx10_img_format(info->gfx_level, state);
   const uint32_t field_last_level = state->num_samples > 1 ? util_logbase2(state->num_samples) : state->last_level;
   const bool no_edge_clamp = state->num_levels > 1 && util_format_is_compressed(state->img_format) &&
                              !util_format_is_compressed(state->format);
   const uint32_t min_lod_clamped = util_unsigned_fixed(CLAMP(state->min_lod, 0, 15), 8);
   const struct ac_surf_nbc_view *nbc_view = state->gfx9.nbc_view;

   uint32_t max_mip = state->num_samples > 1 ? util_logbase2(state->num_samples) : state->num_levels - 1;
   if (nbc_view && nbc_view->valid)
      max_mip = nbc_view->num_levels - 1;

   desc[0] = 0;
   desc[1] = S_00A004_MAX_MIP_GFX12(max_mip) |
             S_00A004_FORMAT_GFX12(img_format) |
             S_00A004_BASE_LEVEL(state->num_samples > 1 ? 0 : state->first_level) |
             S_00A004_WIDTH_LO(state->width - 1);
   desc[2] = S_00A008_WIDTH_HI((state->width - 1) >> 2) |
             S_00A008_HEIGHT(state->height - 1);
   desc[3] = S_00A00C_DST_SEL_X(ac_map_swizzle(state->swizzle[0])) |
             S_00A00C_DST_SEL_Y(ac_map_swizzle(state->swizzle[1])) |
             S_00A00C_DST_SEL_Z(ac_map_swizzle(state->swizzle[2])) |
             S_00A00C_DST_SEL_W(ac_map_swizzle(state->swizzle[3])) |
             S_00A00C_NO_EDGE_CLAMP(no_edge_clamp) |
             S_00A00C_LAST_LEVEL_GFX12(field_last_level) |
             S_00A00C_BC_SWIZZLE(ac_border_color_swizzle(fmt_desc)) |
             S_00A00C_TYPE(state->type);

   /* Depth is the the last accessible layer on gfx9+. The hw doesn't need
    * to know the total number of layers.
    */
   desc[4] = S_00A010_DEPTH_GFX12(state->depth) |
             S_00A010_BASE_ARRAY(state->first_layer);
   desc[5] = S_00A014_UAV3D(state->gfx10.uav3d) |
             S_00A014_PERF_MOD(4) |
             S_00A014_MIN_LOD_LO_GFX12(min_lod_clamped);
   desc[6] = S_00A018_MAX_UNCOMPRESSED_BLOCK_SIZE(1 /*256B*/) |
             S_00A018_MAX_COMPRESSED_BLOCK_SIZE(surf->u.gfx9.color.dcc.max_compressed_block_size) |
             S_00A018_MIN_LOD_HI(min_lod_clamped >> 6);
   desc[7] = 0;
}

In summary, this is just a bunch of image metadata along with a virtual address, packed into eight 32-bit words in a hardware-dependent way.
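Going back to write_image_descriptor for a moment, its 32- and 64-byte copies line up with union radv_descriptor. The sketch below re-declares that union's layout under a hypothetical name, image_descriptor_copy, so that it compiles on its own. It then checks which fields each copy size covers. It needs C11 for the anonymous structs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standalone copy of the layout of union radv_descriptor (C11 anonymous structs). */
union image_descriptor_copy {
    struct {
        uint32_t plane0_descriptor[8];    /* bytes 0..31 */
        uint32_t fmask_descriptor[8];     /* bytes 32..63 */
    };
    struct {
        uint32_t plane_descriptors[3][8]; /* bytes 0..95 */
    };
};

/* Copy the first `size` bytes, as write_image_descriptor does with 32 or 64. */
static void copy_image_descriptor(uint32_t *dst, const union image_descriptor_copy *src,
                                  size_t size)
{
    memcpy(dst, src, size);
}
```

A 64-byte copy picks up plane0_descriptor and fmask_descriptor, which are the same bytes as plane_descriptors[0] and plane_descriptors[1]. The third plane is never reached.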

Finally, combined image samplers, which are just a sampler stuck to a sampled image.

mesa/src/amd/vulkan/radv_descriptor_set.c:

static ALWAYS_INLINE void
write_combined_image_sampler_descriptor(struct radv_device *device, struct radv_cmd_buffer *cmd_buffer, unsigned *dst,
                                        struct radeon_winsys_bo **buffer_list, VkDescriptorType descriptor_type,
                                        const VkDescriptorImageInfo *image_info, bool has_sampler)
{
   write_image_descriptor_impl(device, cmd_buffer, 64, dst, buffer_list, descriptor_type, image_info);
   /* copy over sampler state */
   if (has_sampler) {
      VK_FROM_HANDLE(radv_sampler, sampler, image_info->sampler);
      const uint32_t sampler_offset = RADV_COMBINED_IMAGE_SAMPLER_DESC_SAMPLER_OFFSET;

      memcpy(dst + sampler_offset / sizeof(*dst), sampler->state, RADV_SAMPLER_DESC_SIZE);
   }
}
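The expression `dst + sampler_offset / sizeof(*dst)` converts a byte offset into a count of 32-bit words, because dst is an `unsigned *`. Here is a standalone sketch of the same two-part write. The 64-byte image part and the 64-byte sampler offset are assumptions for illustration; RADV's real values live in RADV_COMBINED_IMAGE_SAMPLER_DESC_SAMPLER_OFFSET and RADV_SAMPLER_DESC_SIZE. The 16-byte sampler size matches the four words built by ac_build_sampler_descriptor. The names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define IMAGE_DESC_SIZE     64u /* assumed size of the image part */
#define SAMPLER_DESC_OFFSET 64u /* assumed byte offset of the sampler state */
#define SAMPLER_DESC_SIZE   16u /* four 32-bit words of sampler state */

static void write_combined(unsigned *dst, const uint32_t image[16], const uint32_t sampler[4])
{
    memcpy(dst, image, IMAGE_DESC_SIZE);
    /* dst points to 32-bit words, so the byte offset is divided by the word size. */
    memcpy(dst + SAMPLER_DESC_OFFSET / sizeof(*dst), sampler, SAMPLER_DESC_SIZE);
}
```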

So what is a descriptor?

That was a lot of code! Let’s take a step back and go over some high-level details. It seems like buffers are pointers with a little bit of metadata, images are pointers with a lot of metadata, and samplers directly encode sampler state. Now that we know what descriptors are, we can finally look at how you as the user of the Vulkan API interact with them.

Vulkan 1.0

In the beginning³, there was Vulkan 1.0. Base Vulkan provides a highly abstracted descriptor model that consists of several Vulkan objects working together.

VkDescriptorSetLayout, which is a layout definition for a descriptor set.
VkDescriptorPool, which can allocate a certain number of VkDescriptorSet objects before it fills up and has to be reset.
VkDescriptorSet, a single allocated block of data laid out according to a VkDescriptorSetLayout.

With this model, you define set layouts for all of your shaders up front, then call vkAllocateDescriptorSets to allocate sets from your VkDescriptorPool. Writing to those descriptor sets is done with vkUpdateDescriptorSets, and after you’ve filled them in you can call vkCmdBindDescriptorSets before each of your draw calls to bind the set or sets you want to read from your shaders.

Since we know that on modern desktop GPUs descriptors are just small blobs of pointers and metadata, we can translate these objects into more familiar terms:

VkDescriptorSetLayout is a struct definition.
VkDescriptorPool is a fixed-size arena allocator for descriptor sets.
VkDescriptorSet is a single struct.
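To make the analogy concrete, here is a tiny C model of a pool as a fixed-size bump allocator. Sets are carved out of one block, nothing is freed individually, and a reset makes the whole block available again. This only models the behavior of pools created without VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT. The type and function names are hypothetical, and no Vulkan calls are involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a descriptor pool: a fixed-capacity bump allocator. */
struct pool_model {
    uint8_t *memory;   /* backing storage for every set */
    size_t capacity;   /* total bytes available */
    size_t used;       /* bytes handed out so far */
};

/* Allocate one "set" of set_size bytes. Returns NULL when the pool is full,
 * the situation vkAllocateDescriptorSets reports as VK_ERROR_OUT_OF_POOL_MEMORY. */
static void *pool_alloc_set(struct pool_model *pool, size_t set_size)
{
    if (set_size > pool->capacity - pool->used)
        return NULL;
    void *set = pool->memory + pool->used;
    pool->used += set_size;
    return set;
}

/* Like vkResetDescriptorPool: every set allocated from the pool becomes invalid. */
static void pool_reset(struct pool_model *pool)
{
    pool->used = 0;
}
```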

This model totally works, but there are a few limitations and annoyances.

Since VkDescriptorPool objects have a fixed size, you need to manually balance your allocations across multiple pools if you’re using more descriptors than can fit in a single pool, which requires an additional application-side abstraction layer.
Reusing descriptors across multiple draw calls requires manually defining sets for each group of descriptors you want to reuse, and manually dealing with their lifetimes.
You typically have to call vkCmdBindDescriptorSets for each draw call, potentially multiple times. For many draw calls, this overhead can add up and impact frametimes.
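The first limitation usually leads to a small wrapper that keeps a list of pools and opens a new one whenever the current pool is full. Below is a minimal model of that idea. The names are hypothetical, and plain heap memory stands in for real VkDescriptorPool objects.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_POOLS 8

/* Hypothetical stand-in for one VkDescriptorPool: a fixed-size bump allocator. */
struct pool {
    uint8_t *memory;
    size_t capacity;
    size_t used;
};

/* Keeps every pool it has created; allocation moves to a new pool when the current one is full. */
struct pool_list {
    struct pool pools[MAX_POOLS];
    size_t count;
    size_t pool_size; /* capacity of each new pool */
};

static void *try_alloc(struct pool *p, size_t size)
{
    if (size > p->capacity - p->used)
        return NULL;
    void *out = p->memory + p->used;
    p->used += size;
    return out;
}

static void *pool_list_alloc(struct pool_list *list, size_t set_size)
{
    /* Try the most recently created pool first. */
    if (list->count > 0) {
        void *set = try_alloc(&list->pools[list->count - 1], set_size);
        if (set)
            return set;
    }
    /* Current pool is full (or none exists yet): open a new one. */
    if (list->count == MAX_POOLS || set_size > list->pool_size)
        return NULL;
    struct pool *fresh = &list->pools[list->count];
    fresh->memory = malloc(list->pool_size);
    if (!fresh->memory)
        return NULL;
    fresh->capacity = list->pool_size;
    fresh->used = 0;
    list->count++;
    return try_alloc(fresh, set_size);
}

static void pool_list_destroy(struct pool_list *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->pools[i].memory);
    list->count = 0;
}
```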

Bindless

While the Vulkan 1.0 model is an “array-of-structs” approach, where we bind a specific struct within the array to our shader, the bindless⁴ descriptor technique lets us do “structure-of-arrays” instead, passing the indices through push constants or a uniform buffer. This is also called “descriptor indexing.” Here’s an outline of how it works.⁵

We use a single set layout, a single pool, and a single set for our entire application.
Our set layout has one binding for each type of descriptor we want to use.
Using VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT_EXT, which lets us have uninitialized descriptors, we set the descriptorCount parameter for each binding to an arbitrarily high number (usually somewhere around 100k, which will allocate a few megabytes of GPU memory).⁶
With VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT and VK_DESCRIPTOR_BINDING_UPDATE_UNUSED_WHILE_PENDING_BIT set, we can add new descriptors to our bound arrays at any time and overwrite descriptors that are no longer in use using vkUpdateDescriptorSets.
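The array-of-structs versus structure-of-arrays distinction can be shown with plain C data. In the sketch below, descriptors are stood in for by integers and every name is hypothetical. With sets, each draw binds its own small struct. With the bindless layout, there is one global array per descriptor type, and a draw only carries indices, the way a push constant block would.

```c
#include <stdint.h>

#define MAX_DESCRIPTORS 1024

/* Stand-in for a real descriptor blob. */
typedef uint32_t descriptor;

/* Vulkan 1.0 style: one small "set" per draw, bound before the draw. */
struct material_set {
    descriptor albedo_texture;
    descriptor sampler;
};

/* Bindless style: one global array per descriptor type... */
struct bindless_heap {
    descriptor sampled_images[MAX_DESCRIPTORS];
    descriptor samplers[MAX_DESCRIPTORS];
};

/* ...and each draw only carries indices into those arrays. */
struct draw_push_constants {
    uint32_t albedo_index;
    uint32_t sampler_index;
};

static descriptor albedo_from_set(const struct material_set *bound)
{
    return bound->albedo_texture;
}

static descriptor albedo_from_heap(const struct bindless_heap *heap, struct draw_push_constants pc)
{
    return heap->sampled_images[pc.albedo_index];
}
```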

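Here’s an example set layout for reference, with shader stage flags left empty. The binding flags listed above are not shown. They are passed through a VkDescriptorSetLayoutBindingFlagsCreateInfo chained to the layout create info, and the layout itself also needs VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT.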

const bindings = [_]vk.DescriptorSetLayoutBinding{
    .{
        .binding = 0,
        .descriptor_type = .sampler,
        .descriptor_count = 131_072,
        .stage_flags = .{},
    },
    .{
        .binding = 1,
        .descriptor_type = .combined_image_sampler,
        .descriptor_count = 131_072,
        .stage_flags = .{},
    },
    .{
        .binding = 2,
        .descriptor_type = .sampled_image,
        .descriptor_count = 131_072,
        .stage_flags = .{},
    },
    .{
        .binding = 3,
        .descriptor_type = .storage_image,
        .descriptor_count = 131_072,
        .stage_flags = .{},
    },
    .{
        .binding = 4,
        .descriptor_type = .uniform_texel_buffer,
        .descriptor_count = 131_072,
        .stage_flags = .{},
    },
    .{
        .binding = 5,
        .descriptor_type = .storage_texel_buffer,
        .descriptor_count = 131_072,
        .stage_flags = .{},
    },
    .{
        .binding = 6,
        .descriptor_type = .uniform_buffer,
        .descriptor_count = 131_072,
        .stage_flags = .{},
    },
    .{
        .binding = 7,
        .descriptor_type = .storage_buffer,
        .descriptor_count = 131_072,
        .stage_flags = .{},
    },
};
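Using the descriptor sizes measured earlier on the RX 7800 XT, we can estimate how much descriptor memory this layout reserves. The sketch ignores any per-binding alignment or padding the driver may add, so treat the result as a rough lower bound rather than an exact figure.

```c
#include <stddef.h>
#include <stdint.h>

#define DESCRIPTOR_COUNT 131072u

/* Per-descriptor sizes from the table above (RADV, RX 7800 XT), in binding order. */
static const uint32_t binding_sizes[] = {
    16, /* sampler */
    96, /* combined_image_sampler */
    64, /* sampled_image */
    32, /* storage_image */
    16, /* uniform_texel_buffer */
    16, /* storage_texel_buffer */
    16, /* uniform_buffer */
    16, /* storage_buffer */
};

static uint64_t layout_bytes(void)
{
    uint64_t total = 0;
    for (size_t i = 0; i < sizeof binding_sizes / sizeof binding_sizes[0]; i++)
        total += (uint64_t)binding_sizes[i] * DESCRIPTOR_COUNT;
    return total;
}
```

Each index costs 272 bytes across the eight bindings, which comes to 34 MiB for the whole set. The combined image sampler binding alone takes 12 MiB, so "a few megabytes" describes a single binding rather than the layout as a whole.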

Compared to the 1.0 model, there are a few advantages.

Instead of calling vkCmdBindDescriptorSets for each draw call, we will instead pass per-draw indices through push constants, which can be faster.
We don’t have to worry about set or pool capacity at all - only allocating descriptors from our big arrays.
Reusing descriptors is easy; just store and reuse your index for each resource.
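Because each resource is identified by its slot index, the application needs a small allocator that hands out free slots and recycles them when a resource is destroyed. A minimal free-list version is sketched below, with hypothetical names. A real renderer would also have to defer recycling until no in-flight frame can still read the slot, in line with the update-unused-while-pending rule above.

```c
#include <stdint.h>

#define SLOT_COUNT   131072u
#define INVALID_SLOT UINT32_MAX

/* Hypothetical slot allocator for one bindless descriptor array. */
struct slot_allocator {
    uint32_t next_unused;           /* slots at or above this have never been handed out */
    uint32_t free_count;            /* number of recycled slots on the stack */
    uint32_t free_list[SLOT_COUNT]; /* stack of recycled slots */
};

static uint32_t slot_alloc(struct slot_allocator *a)
{
    if (a->free_count > 0)
        return a->free_list[--a->free_count]; /* reuse a recycled slot first */
    if (a->next_unused < SLOT_COUNT)
        return a->next_unused++;
    return INVALID_SLOT; /* the array is full */
}

/* Only call this for a slot returned by slot_alloc, once no in-flight
 * command buffer can still read it. */
static void slot_free(struct slot_allocator *a, uint32_t slot)
{
    a->free_list[a->free_count++] = slot;
}
```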

Mutable Descriptors

In the previous example, we had to use a different binding for each type of descriptor we want to use. The mutable descriptor type, provided by VK_EXT_mutable_descriptor_type, lets you use a single binding for multiple descriptor types (typically excluding samplers), simplifying your set layout to something like this.

const bindings = [_]vk.DescriptorSetLayoutBinding{
    .{
        .binding = 0,
        .descriptor_type = .sampler,
        .descriptor_count = 131_072,
        .stage_flags = .{},
    },
    .{
        .binding = 1,
        .descriptor_type = .combined_image_sampler,
        .descriptor_count = 131_072,
        .stage_flags = .{},
    },
    .{
        .binding = 2,
        .descriptor_type = .mutable_ext,
        .descriptor_count = 131_072,
        .stage_flags = .{},
    },
};

Should you use mutable descriptors? The extension is primarily intended for DX12 emulation, and the Best Practices validation layer will in fact warn you about this if you try to use it.

vkCreateDevice(): Attempting to enable extension VK_EXT_mutable_descriptor_type, but this extension is intended to support D3D emulation layers, and applications ported from D3D, by adding functionality specific to D3D and it is strongly recommended that it be otherwise avoided.

So why is it a bad idea? Well, let’s look at the mesa source again to see what’s actually going on under the hood.

static VkResult
radv_create_descriptor_pool(struct radv_device *device, const VkDescriptorPoolCreateInfo *pCreateInfo,
                            const VkAllocationCallbacks *pAllocator, VkDescriptorPool *pDescriptorPool)
{
...
      switch (pCreateInfo->pPoolSizes[i].type) {
...
      case VK_DESCRIPTOR_TYPE_MUTABLE_EXT:
         /* Per spec, if a mutable descriptor type list is provided for the pool entry, we
          * allocate enough memory to hold any subset of that list.
          * If there is no mutable descriptor type list available,
          * we must allocate enough for any supported mutable descriptor type, i.e. 64 bytes if
          * FMASK is used.
          */
         if (mutable_info && i < mutable_info->mutableDescriptorTypeListCount) {
            uint64_t mutable_size, mutable_alignment;
            if (radv_mutable_descriptor_type_size_alignment(device, &mutable_info->pMutableDescriptorTypeLists[i],
                                                            &mutable_size, &mutable_alignment)) {
               /* 32 as we may need to align for images */
               mutable_size = align(mutable_size, 32);
               bo_size += mutable_size * pCreateInfo->pPoolSizes[i].descriptorCount;
               if (mutable_size < 32)
                  num_16byte_descriptors += pCreateInfo->pPoolSizes[i].descriptorCount;
            }
         } else {
            const uint32_t max_desc_size = pdev->use_fmask ? 64 : 32;
            bo_size += max_desc_size * pCreateInfo->pPoolSizes[i].descriptorCount;
         }
         break;
...
      }
...
}

This is effectively allocating a union of the descriptor types we specify: each mutable slot is sized for the largest type in its list, rounded up to 32 bytes. If we refer back to our table of descriptor sizes, buffers are all 16 bytes, but sampled and storage images are 64 and 32 bytes respectively. So every slot in a binding that can hold sampled images is 64 bytes. Buffers and storage images leave much of their slot unused, which would likely result in worse cache performance.
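The waste is easy to quantify. The sketch assumes, as the comments in the RADV code suggest, that a mutable slot is as large as the biggest type in its list, rounded up to 32 bytes. It uses the sizes from the earlier table, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

/* Size of one mutable slot: the largest member, rounded up to 32 bytes. */
static uint32_t mutable_slot_size(const uint32_t *member_sizes, size_t count)
{
    uint32_t largest = 0;
    for (size_t i = 0; i < count; i++)
        if (member_sizes[i] > largest)
            largest = member_sizes[i];
    return align_up(largest, 32);
}
```

Put sampled images (64 bytes), storage images (32), and storage buffers (16) in one list and every slot is 64 bytes. A buffer descriptor then leaves 48 bytes of its slot unused, and a storage image leaves 32.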

Buffer Device Address

You may have noticed from the previous section about descriptor sizes that all buffer types store the same data, and they are really just very thin wrappers around a pointer to GPU memory. If we drop that metadata or transmit it out-of-band (e.g. by making assumptions in our code about the buffer size and format), we could theoretically just pass a GPU address directly, without going through descriptors at all.

The VK_KHR_buffer_device_address extension (core since Vulkan 1.2) lets you do this, completely eliminating the need for bound buffers at the cost of potentially more difficult debugging. The steps are:

- Create your buffers with VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT.
- Call vkGetBufferDeviceAddressKHR to retrieve their device addresses.
- Pass those addresses to your shader as push constants. There you can perform manual pointer arithmetic and dereference them as needed (in GLSL, through the GL_EXT_buffer_reference extension).

Now you really only need four descriptor bindings: one for samplers, one for combined image samplers, one for sampled images, and one for storage images. If you’re only using combined image samplers, you can even get away with two by dropping your sampled image and sampler bindings. Buffer Device Address is orthogonal to your chosen descriptor solution. It will reduce complexity and improve performance in most cases, and should be used if supported by your target platform(s).
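On the CPU side, buffer device address mostly means putting 64-bit addresses into your push constant block. The sketch below models such a block in C with hypothetical field names. The addresses would come from vkGetBufferDeviceAddressKHR. Element addresses are computed with the same base-plus-index-times-stride arithmetic a shader would use. The sketch also checks the block against 128 bytes, the minimum maxPushConstantsSize that every Vulkan implementation must support.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t device_address; /* same representation as VkDeviceAddress */

/* Hypothetical per-draw push constant block. */
struct draw_constants {
    device_address vertex_buffer;   /* base address of the vertex buffer */
    device_address material_buffer; /* base address of an array of materials */
    uint32_t material_index;        /* which material this draw uses */
    uint32_t texture_index;         /* index into the bindless image array */
};

/* Address of element `index` in an array of `stride`-byte elements. */
static device_address element_address(device_address base, uint32_t index, uint32_t stride)
{
    return base + (device_address)index * stride;
}

_Static_assert(sizeof(struct draw_constants) <= 128,
               "must fit in the guaranteed minimum maxPushConstantsSize");
```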

Push Descriptors

Push descriptors (VK_KHR_push_descriptor, promoted to core in Vulkan 1.4) let you skip pools and sets for a single set layout. Instead of allocating and updating a descriptor set, you record descriptor writes directly into the command buffer with vkCmdPushDescriptorSetKHR, and the driver takes care of the memory. This removes the layer of indirection that comes from indexing into a global set. You are generally limited in how many push descriptors you can use, since maxPushDescriptors is only guaranteed to be at least 32. Even so, this can be a good approach for descriptors that change frequently (e.g. with each draw call).

Descriptor Buffer

While reading this post you may have figured out already that this whole descriptor thing is essentially just another memory allocation problem. In the Vulkan 1.0 model, the API provides a fixed-size arena allocator that hands out blocks of memory for sets of various predefined layouts. In the bindless model, we allocate a big block of memory for each descriptor type up front, then manually manage the descriptors stored within through indexing. But this is all done indirectly through the abstraction of pools and sets. What if we could just handle this allocation problem ourselves?

With descriptor buffers, you manually allocate blocks of GPU memory to store descriptors. You query the size and layout of the descriptor types and set layouts you want to use (vkGetDescriptorSetLayoutSizeEXT, vkGetDescriptorSetLayoutBindingOffsetEXT), since they may differ across platforms. This process looks something like the following.

Manually allocating a VkBuffer of your desired size with one or more of the VK_BUFFER_USAGE_*_DESCRIPTOR_BUFFER_BIT_EXT flags set, plus VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT, since descriptor buffers are bound by device address.
Calling vkGetDescriptorEXT to retrieve descriptor data from your images, samplers, or buffers, and storing that data in the descriptor buffer.
Calling vkCmdBindDescriptorBuffersEXT to bind the descriptor buffer to your command buffer.
Calling vkCmdSetDescriptorBufferOffsetsEXT before each draw call to set the offsets into the descriptor buffer for the sets that draw uses.
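Stripped of the API calls, the application's side of this comes down to placing bytes. Each set occupies an aligned range of the buffer, and each descriptor is copied to its binding's offset inside that range. The sketch below models this with plain memory. The alignment, set sizes, and binding offsets are made-up values. They stand in for what you would query (descriptorBufferOffsetAlignment, vkGetDescriptorSetLayoutSizeEXT, vkGetDescriptorSetLayoutBindingOffsetEXT). The descriptor bytes stand in for the output of vkGetDescriptorEXT. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OFFSET_ALIGNMENT 64u /* stand-in for descriptorBufferOffsetAlignment */

/* Hypothetical descriptor buffer: mapped bytes plus a bump offset. */
struct descriptor_buffer {
    uint8_t *mapped;   /* host pointer to the VkBuffer's memory */
    size_t size;       /* total size in bytes */
    size_t next;       /* next free byte */
};

/* Reserve space for one set. The returned offset is what would be passed to
 * vkCmdSetDescriptorBufferOffsetsEXT. Returns SIZE_MAX when out of space. */
static size_t reserve_set(struct descriptor_buffer *buf, size_t set_size)
{
    size_t offset = (buf->next + OFFSET_ALIGNMENT - 1) / OFFSET_ALIGNMENT * OFFSET_ALIGNMENT;
    if (offset > buf->size || set_size > buf->size - offset)
        return SIZE_MAX;
    buf->next = offset + set_size;
    return offset;
}

/* Copy descriptor bytes to a binding's offset within a set. */
static void write_descriptor(struct descriptor_buffer *buf, size_t set_offset,
                             size_t binding_offset, const void *desc, size_t desc_size)
{
    memcpy(buf->mapped + set_offset + binding_offset, desc, desc_size);
}
```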

Note that since vkCmdSetDescriptorBufferOffsetsEXT can take multiple offsets into the same buffer, you can actually achieve a bindless-like model without having to do any descriptor indexing in your shader code - just use one set shader-side for each resource your shader wants to access, and pass the offsets accordingly.

This approach replaces Pools and Sets, but is orthogonal to and compatible with both the traditional way of binding descriptors and the bindless model. We now have complete flexibility in how we want to organize our descriptors.

For more information about descriptor buffers, this official sample is a good resource.

Conclusion

In summary:

Probably use descriptor indexing, without mutable descriptors, for images and samplers unless you have a reason not to.
Use buffer device address for all of your buffers.
Push descriptors can be used for frequently changing data, but they’re optional. They may increase application-side complexity if you have to mix-and-match them with regular descriptors, and the number of descriptors in a push descriptor set is limited.
Descriptor buffers generally aren’t supported yet on enough targets,⁷ but if you have advanced descriptor management needs and the feature is supported on your target platform(s), it’s generally a better way of doing things. For simple use cases, however, it may increase application-side complexity since the driver is no longer doing descriptor allocation for you.

I hope that this post is useful to anyone who was as confused about descriptors as I was when I was first starting out. Good luck on your Vulkan rendering journey!

Similar in concept to LLVM IR. ↩︎
There are a few other descriptor types, but they are generally only used in special cases and I will be ignoring them for the sake of simplicity. ↩︎
2016 ↩︎
Not actually bindless! Just less binding. ↩︎
Some approaches are a bit more complicated than this and use multiple sets sorted by modification frequency, but this is the most basic example. ↩︎
VK_DESCRIPTOR_BINDING_VARIABLE_DESCRIPTOR_COUNT_BIT can be used, but in this case we need one set per binding as you can only toggle this for the last binding in a set. ↩︎
As of June 2025, the extension is supported on approximately half of Linux and Windows devices, but generally not on MoltenVK/Mac/iOS or Android. ↩︎