
Add InstanceContributionToHitGroupIndex YAML field and shader query #1286

Draft
MarijnS95 wants to merge 4 commits into llvm:main from Traverse-Research:inlinert-instance-contribution

Collaborator

@MarijnS95 MarijnS95 commented Jun 4, 2026

Depends on #1245

Summary

Adds a 24-bit per-instance InstanceContributionToHitGroupIndex slot on AccelerationStructureInstance / InstanceDesc, plumbed through:

  • DX: D3D12_RAYTRACING_INSTANCE_DESC.InstanceContributionToHitGroupIndex
  • Vulkan: instanceShaderBindingTableRecordOffset
  • Metal: the IR converter's addressOfInstanceContributions sidecar (the existing stub buffer in setupDispatch is now filled from per-instance values instead of being hardcoded to zeros — intersectionFunctionTableOffset is left at zero because the IRC bypasses it for inline RT)

The covering test (Feature/InlineRT/instance-contribution.test) verifies CommittedInstanceContributionToHitGroupIndex() returns the per-instance value across three distinct instances, including the top-of-range 0xFFFFFF.
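The 24-bit limit comes from how both D3D12_RAYTRACING_INSTANCE_DESC and VkAccelerationStructureInstanceKHR pack the field: the contribution shares one 32-bit word with the 8-bit instance flags, just as the 24-bit instance ID shares a word with the 8-bit mask. A minimal sketch of that packing follows, assuming manual shifts rather than bitfields; `PackedInstanceWords`, `packInstanceWords`, and `unpackContribution` are hypothetical names for illustration and are not taken from this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the two packed 32-bit words that follow the
// 3x4 transform in the DX12 and Vulkan instance structs.
struct PackedInstanceWords {
  uint32_t IdAndMask;            // bits 0..23: instance ID, bits 24..31: mask
  uint32_t ContributionAndFlags; // bits 0..23: contribution, bits 24..31: flags
};

constexpr uint32_t Max24 = 0xFFFFFFu;

// Returns std::nullopt when a 24-bit field would overflow, instead of
// silently truncating the value.
inline std::optional<PackedInstanceWords>
packInstanceWords(uint32_t InstanceID, uint8_t Mask, uint32_t Contribution,
                  uint8_t Flags) {
  if (InstanceID > Max24 || Contribution > Max24)
    return std::nullopt;
  PackedInstanceWords Words{InstanceID | (uint32_t(Mask) << 24),
                            Contribution | (uint32_t(Flags) << 24)};
  // Sanity check: the low 24 bits still hold the contribution.
  assert((Words.ContributionAndFlags & Max24) == Contribution);
  return Words;
}

// Recovers the 24-bit contribution from the packed word.
inline uint32_t unpackContribution(const PackedInstanceWords &Words) {
  return Words.ContributionAndFlags & Max24;
}
```

Packing `0xFFFFFF` round-trips unchanged, while `0x1000000` is rejected, which is why the test includes the top-of-range value.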

Part of the inline-RT test coverage epic (#1258).

Test plan

  • instance-contribution.test passes on Metal
  • instance-contribution.test passes on Vulkan
  • instance-contribution.test passes on DX12
  • D3D12 via Wine + vkd3d-proton + cross-compiled offloader.exe on an NVIDIA RTX 3060 — passes

MarijnS95 and others added 4 commits June 3, 2026 13:33
…rce allocation

Introduce the foundational types for ray tracing acceleration structures:
abstract `AccelerationStructure` base class, geometry/instance descriptors,
BLAS/TLAS build-request structs with size queries, the
`AccelerationStructureBuildFlags` bitmask (using
`LLVM_DECLARE_ENUM_AS_BITMASK` since `TextureUsage` already uses the
intrusive `LLVM_MARK_AS_BITMASK_ENUM`; `TextureUsage` also gains its
previously-missing `LLVM_ENABLE_BITMASK_ENUMS_IN_NAMESPACE()`), and AS
resource allocation across DX12, Vulkan, and Metal. Recording build
commands lands in a follow-up commit on top of the ComputeEncoder
abstraction.

Vulkan device creation switches to a single `vkGetPhysicalDeviceFeatures2`
call covering every extension feature struct we care about (atomic-int64,
mesh-shader, acceleration-structure, BDA on 1.1): each struct is chained
into `pNext` before the query, and post-query we verify the gating bool
and clear the sub-features we don't enable (capture-replay,
indirect-build, multiview, etc.).

Drive-by: rather than letting `vkCreateDevice` reject the device with a
generic `VK_ERROR_FEATURE_NOT_PRESENT`, the code now returns a
descriptive `llvm::Error` naming the extension and the bool that came
back zero — pinpointing the case where a driver advertises an extension
but reports its base feature as `VK_FALSE`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…helper

Move acceleration-structure build commands behind the abstract
ComputeEncoder interface so the orchestration (data upload, build-request
creation, AS allocation, build recording) can live in one place rather
than splitting across three backends.

ComputeEncoder gains a single batchBuildAS(ArrayRef<ASBuildItem>) method.
Each item carries an AccelerationStructure plus a BLAS or TLAS build
request via PointerUnion. The caller guarantees no inter-item memory
dependencies inside a batch — backends record the whole batch with one
barrier slot, no per-element barriers.

  - Vulkan: single vkCmdBuildAccelerationStructuresKHR call covering the
    whole batch. TLAS items serialize VkAccelerationStructureInstanceKHR
    into a device-address upload buffer, BLAS items pull addresses from
    each VulkanBuffer (new getDeviceAddress accessor). Storage buffers
    transparently gain SHADER_DEVICE_ADDRESS + ACCEL_BUILD_INPUT_READ_ONLY
    flags when ray tracing is supported, with the matching
    VkMemoryAllocateFlagsInfo chained on every allocation.
  - DX12: loop calling BuildRaytracingAccelerationStructure per item with
    no intermediate barriers; D3D12_RAYTRACING_INSTANCE_DESC is
    bit-identical to the Vulkan instance struct.
  - Metal: lazy transition to MTL::AccelerationStructureCommandEncoder,
    deduplicates BLAS handles into the
    MTL::InstanceAccelerationStructureDescriptor's
    instancedAccelerationStructures array (Metal references BLASes by
    index, not GPU address).
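The Metal deduplication step can be sketched as follows. This is a simplified illustration, not the PR's code: `BlasHandle`, `DedupedInstances`, and `dedupBlasHandles` are hypothetical names, and an opaque pointer stands in for the backend's BLAS object.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical opaque handle standing in for a backend BLAS object.
using BlasHandle = const void *;

struct DedupedInstances {
  std::vector<BlasHandle> UniqueBlases;  // one entry per distinct BLAS
  std::vector<uint32_t> InstanceToIndex; // per instance: index into UniqueBlases
};

// Maps each instance's BLAS handle to an index in a deduplicated array,
// preserving first-seen order.
inline DedupedInstances
dedupBlasHandles(const std::vector<BlasHandle> &PerInstance) {
  DedupedInstances Out;
  std::unordered_map<BlasHandle, uint32_t> Seen;
  for (BlasHandle H : PerInstance) {
    assert(H && "every instance must reference a BLAS");
    auto [It, Inserted] =
        Seen.try_emplace(H, uint32_t(Out.UniqueBlases.size()));
    if (Inserted)
      Out.UniqueBlases.push_back(H);
    Out.InstanceToIndex.push_back(It->second);
  }
  return Out;
}
```

Three instances that reference BLASes A, B, A would yield a two-entry array and the indices 0, 1, 0.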

Each backend's CommandBuffer now carries a back-pointer to its owning
Device so the encoder can reach device-loaded entry points and helpers,
plus a keep-alive list for AS scratch and instance buffers.

A shared helper buildPipelineAccelerationStructures in lib/API/Device.cpp
walks Pipeline::AccelStructs, uploads vertex/index data via the new
createBufferWithData, builds requests, allocates AS objects, and issues
two batchBuildAS calls (BLAS batch then TLAS batch — VUID-03403 forbids
referencing a sibling dstAccelerationStructure in one command). Each
backend's executeProgram calls this helper to build the pipeline's AS
objects.
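A rough sketch of the two-batch split, under stated assumptions: `BlasRequest`, `TlasRequest`, `BuildItem`, and `splitIntoBatches` are hypothetical names, and `std::variant` is swapped in for the PointerUnion the commit mentions so the example stays self-contained.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the BLAS and TLAS build-request structs.
struct BlasRequest { std::string Name; };
struct TlasRequest { std::string Name; };

// One build item holds either kind of request.
using BuildItem = std::variant<BlasRequest, TlasRequest>;

struct Batches {
  std::vector<BuildItem> Blas; // recorded first
  std::vector<BuildItem> Tlas; // recorded second, after the BLAS batch
};

// Splits the items so that no batch contains an item that depends on
// another item in the same batch: all BLASes, then all TLASes.
inline Batches splitIntoBatches(const std::vector<BuildItem> &Items) {
  Batches Out;
  for (const BuildItem &Item : Items) {
    if (std::holds_alternative<BlasRequest>(Item))
      Out.Blas.push_back(Item);
    else
      Out.Tlas.push_back(Item);
  }
  assert(Out.Blas.size() + Out.Tlas.size() == Items.size());
  return Out;
}
```

Each resulting batch would then map to one batchBuildAS call, so a TLAS never shares a build command with the BLASes it references.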

Descriptor binding for AS resources is intentionally still missing — the
tests progress past AS-build now and surface only the descriptor-write
gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wire up acceleration-structure descriptor binding end-to-end across
all three backends so shaders can actually consume the TLAS that
buildPipelineAccelerationStructures produced — completing the stack
and promoting the three InlineRT tests from XFAIL to passing.

Vulkan: createDescriptorPool counts AS descriptors in a separate
scalar (the KHR enum value 1000150000 doesn't fit in the indexed
array used for the core types) and emits one VkDescriptorPoolSize
for them. createDescriptorSets resolves each AS resource via
Resource::TLASPtr, locates the matching VulkanAccelerationStructure
in InvocationState::AccelStructs (BLASes-then-TLASes layout, matching
the helper's documented declaration order), and writes the handle
through a VkWriteDescriptorSetAccelerationStructureKHR chained on the
descriptor write's pNext. The dispatch's pre-barrier dst access now
includes VK_ACCESS_ACCELERATION_STRUCTURE_READ_BIT_KHR so the prior
AS-build's writes are made visible to the shader's RayQuery reads.
Device creation also enables VK_KHR_ray_query when supported so the
RayQuery shader instructions actually function.
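The separate AS counter can be illustrated as below. The enum values match the Vulkan headers but are redeclared so the sketch builds without them; `PoolSize` and `countPoolSizes` are hypothetical names, not the PR's code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Core descriptor types run from VK_DESCRIPTOR_TYPE_SAMPLER (0) through
// VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT (10), so an 11-entry array indexes them.
constexpr uint32_t NumCoreDescriptorTypes = 11;
// VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_KHR is far outside that range.
constexpr uint32_t DescriptorTypeAccelerationStructureKHR = 1000150000;

// Hypothetical stand-in for VkDescriptorPoolSize.
struct PoolSize {
  uint32_t Type;
  uint32_t Count;
};

// Counts core types in an indexed array and AS descriptors in a scalar,
// then emits one PoolSize per type that is actually used.
inline std::vector<PoolSize>
countPoolSizes(const std::vector<uint32_t> &Types) {
  std::array<uint32_t, NumCoreDescriptorTypes> CoreCounts{};
  uint32_t ASCount = 0;
  for (uint32_t T : Types) {
    if (T == DescriptorTypeAccelerationStructureKHR) {
      ++ASCount;
      continue;
    }
    assert(T < NumCoreDescriptorTypes && "unexpected extension type");
    ++CoreCounts[T];
  }
  std::vector<PoolSize> Sizes;
  for (uint32_t T = 0; T < NumCoreDescriptorTypes; ++T)
    if (CoreCounts[T] != 0)
      Sizes.push_back({T, CoreCounts[T]});
  if (ASCount != 0)
    Sizes.push_back({DescriptorTypeAccelerationStructureKHR, ASCount});
  return Sizes;
}
```

Indexing the array directly with the AS enum value would require about a billion entries, which is why a scalar is the simpler choice for a single extension type.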

DX12: writes a D3D12_SRV_DIMENSION_RAYTRACING_ACCELERATION_STRUCTURE
SRV with the AS GPU virtual address as Location into the heap slot
that createBuffers reserved (CreateShaderResourceView with a null
resource — the AS data lives in the buffer pointed to by Location).

Metal: the Metal shader converter doesn't bind the AS directly; the
shader reads a buffer containing an
IRRaytracingAccelerationStructureGPUHeader that holds the AS's gpuResourceID
plus a pointer to an
instance-contributions array. createBuffers allocates and fills both
buffers per AS-descriptor entry, then points the descriptor at the
header buffer's GPU address. The TLAS itself is built with the UserID
instance-descriptor variant so HLSL CommittedInstanceID() returns the
YAML-specified per-instance ID instead of the array index.

The three InlineRT tests now actually exercise the AS end-to-end:
TraceRayInline issues a RayQuery against `Scene` and writes a
hit-dependent value into `Output` (the instance ID for multi-instance,
1/0 otherwise). The catch-all `XFAIL: *` is dropped; `XFAIL: Clang`
remains. The test shaders gain explicit `[[vk::binding]]` annotations
since their `t0`/`u0` registers would otherwise collide under the
default dxc HLSL→SPIR-V mapping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a 24-bit per-instance InstanceContributionToHitGroupIndex slot on
AccelerationStructureInstance / InstanceDesc, plumbed through DX
(D3D12_RAYTRACING_INSTANCE_DESC), VK
(instanceShaderBindingTableRecordOffset), and the Metal IR converter's
addressOfInstanceContributions sidecar. The covering test verifies
CommittedInstanceContributionToHitGroupIndex() returns the per-instance
value across distinct instances, including the top-of-range 0xFFFFFF.

Part of the inline-RT test coverage epic
(llvm#1258).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>