Add per-instance InstanceFlags YAML field and front-CCW test #1287
Draft
MarijnS95 wants to merge 5 commits into
…rce allocation
Introduce the foundational types for ray tracing acceleration structures: abstract `AccelerationStructure` base class, geometry/instance descriptors, BLAS/TLAS build-request structs with size queries, the `AccelerationStructureBuildFlags` bitmask (using `LLVM_DECLARE_ENUM_AS_BITMASK` since `TextureUsage` already uses the intrusive `LLVM_MARK_AS_BITMASK_ENUM`; `TextureUsage` also gains its previously-missing `LLVM_ENABLE_BITMASK_ENUMS_IN_NAMESPACE()`), and AS resource allocation across DX12, Vulkan, and Metal. Recording build commands lands in a follow-up commit on top of the ComputeEncoder abstraction.
Vulkan device creation switches to a single `vkGetPhysicalDeviceFeatures2` call covering every extension feature struct we care about (atomic-int64, mesh-shader, acceleration-structure, BDA on 1.1): each struct is chained into `pNext` before the query, and post-query we verify the gating bool and clear the sub-features we don't enable (capture-replay, indirect-build, multiview, etc.).
Drive-by: rather than letting `vkCreateDevice` reject the device with a generic `VK_ERROR_FEATURE_NOT_PRESENT`, the code now returns a descriptive `llvm::Error` naming the extension and the bool that came back zero, pinpointing the case where a driver advertises an extension but reports its base feature as `VK_FALSE`.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
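The chain-then-verify flow described in this commit can be sketched without the Vulkan headers. Everything below is a hypothetical stand-in: `FeatureNode`, `AccelStructFeatures`, `chainFeature`, and `verifyAccelStructFeatures` are illustration names, not the PR's code, and the real structs also carry an `sType` tag and are filled in by `vkGetPhysicalDeviceFeatures2`.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins: the real structs come from <vulkan/vulkan.h> and
// also carry an sType tag. Only the pNext link and the bools are modelled.
struct FeatureNode {
  FeatureNode *pNext = nullptr;
};

struct AccelStructFeatures : FeatureNode {
  bool accelerationStructure = false;              // gating bool
  bool accelerationStructureCaptureReplay = false; // sub-feature left disabled
  bool accelerationStructureIndirectBuild = false; // sub-feature left disabled
};

// Link Node in right after Head, as one would before the features query.
inline void chainFeature(FeatureNode &Head, FeatureNode &Node) {
  assert(Node.pNext == nullptr && "node is already part of a chain");
  Node.pNext = Head.pNext;
  Head.pNext = &Node;
}

// Post-query check: returns a descriptive message when the extension is
// advertised but its base feature came back false; otherwise clears the
// sub-features that should stay disabled and returns std::nullopt.
inline std::optional<std::string>
verifyAccelStructFeatures(bool ExtensionAdvertised, AccelStructFeatures &F) {
  if (!ExtensionAdvertised)
    return std::nullopt;
  if (!F.accelerationStructure)
    return std::string("VK_KHR_acceleration_structure is advertised but "
                       "accelerationStructure is VK_FALSE");
  F.accelerationStructureCaptureReplay = false;
  F.accelerationStructureIndirectBuild = false;
  return std::nullopt;
}
```

Returning the message instead of letting device creation fail keeps the failing extension and bool visible to the caller, which is the point of the drive-by change.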
…helper
Move acceleration-structure build commands behind the abstract
ComputeEncoder interface so the orchestration (data upload, build-request
creation, AS allocation, build recording) can live in one place rather
than splitting across three backends.
ComputeEncoder gains a single batchBuildAS(ArrayRef<ASBuildItem>) method.
Each item carries an AccelerationStructure plus a BLAS or TLAS build
request via PointerUnion. The caller guarantees no inter-item memory
dependencies inside a batch — backends record the whole batch with one
barrier slot, no per-element barriers.
- Vulkan: single vkCmdBuildAccelerationStructuresKHR call covering the
whole batch. TLAS items serialize VkAccelerationStructureInstanceKHR
into a device-address upload buffer, BLAS items pull addresses from
each VulkanBuffer (new getDeviceAddress accessor). Storage buffers
transparently gain SHADER_DEVICE_ADDRESS + ACCEL_BUILD_INPUT_READ_ONLY
flags when ray tracing is supported, with the matching
VkMemoryAllocateFlagsInfo chained on every allocation.
- DX12: loop calling BuildRaytracingAccelerationStructure per item with
no intermediate barriers; D3D12_RAYTRACING_INSTANCE_DESC is
bit-identical to the Vulkan instance struct.
- Metal: lazy transition to MTL::AccelerationStructureCommandEncoder,
deduplicates BLAS handles into the
MTL::InstanceAccelerationStructureDescriptor's
instancedAccelerationStructures array (Metal references BLASes by index,
not GPU address).
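The Metal deduplication step can be illustrated with a minimal sketch. `BlasHandle`, `DedupedInstances`, and `dedupBlases` are hypothetical names chosen for this example; the actual backend works with Metal objects rather than opaque pointers.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a backend BLAS object.
using BlasHandle = const void *;

struct DedupedInstances {
  // Plays the role of the instancedAccelerationStructures array.
  std::vector<BlasHandle> UniqueBlases;
  // For each instance, the index of its BLAS in UniqueBlases.
  std::vector<uint32_t> InstanceToIndex;
};

// Map every instance's BLAS handle to an index into a duplicate-free array,
// preserving first-seen order.
inline DedupedInstances dedupBlases(const std::vector<BlasHandle> &PerInstance) {
  DedupedInstances Out;
  std::unordered_map<BlasHandle, uint32_t> Seen;
  for (BlasHandle H : PerInstance) {
    assert(H != nullptr && "every instance must reference a BLAS");
    auto [It, Inserted] =
        Seen.try_emplace(H, static_cast<uint32_t>(Out.UniqueBlases.size()));
    if (Inserted)
      Out.UniqueBlases.push_back(H);
    Out.InstanceToIndex.push_back(It->second);
  }
  return Out;
}
```

Two instances of the same BLAS end up sharing one array slot, so the instance descriptors only need to store a small index.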
Each backend's CommandBuffer now carries a back-pointer to its owning
Device so the encoder can reach device-loaded entry points and helpers,
plus a keep-alive list for AS scratch and instance buffers.
A shared helper buildPipelineAccelerationStructures in lib/API/Device.cpp
walks Pipeline::AccelStructs, uploads vertex/index data via the new
createBufferWithData, builds requests, allocates AS objects, and issues
two batchBuildAS calls (BLAS batch then TLAS batch — VUID-03403 forbids
referencing a sibling dstAccelerationStructure in one command). Each
backend's executeProgram calls this helper to build the pipeline's AS
objects.
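A rough sketch of the two-batch ordering follows. `BuildItem`, `BuildBatches`, `splitIntoBatches`, and `buildAll` are hypothetical simplifications: the real `ASBuildItem` carries an AccelerationStructure plus a build request, and the callback stands in for a `batchBuildAS` call.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class ASKind { BLAS, TLAS };

// Hypothetical simplification of a build item: only the kind and a label.
struct BuildItem {
  ASKind Kind;
  std::string Name;
};

struct BuildBatches {
  std::vector<BuildItem> Blas;
  std::vector<BuildItem> Tlas;
};

// Separate bottom-level from top-level items, keeping declaration order.
inline BuildBatches splitIntoBatches(const std::vector<BuildItem> &Items) {
  BuildBatches Out;
  for (const BuildItem &Item : Items)
    (Item.Kind == ASKind::BLAS ? Out.Blas : Out.Tlas).push_back(Item);
  assert(Out.Blas.size() + Out.Tlas.size() == Items.size());
  return Out;
}

// Record the BLAS batch first, then the TLAS batch, so that no batch
// references an acceleration structure built inside the same batch.
template <typename RecordFn>
void buildAll(const std::vector<BuildItem> &Items, RecordFn Record) {
  BuildBatches Batches = splitIntoBatches(Items);
  if (!Batches.Blas.empty())
    Record(Batches.Blas);
  if (!Batches.Tlas.empty())
    Record(Batches.Tlas);
}
```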
Descriptor binding for AS resources is intentionally still missing — the
tests progress past AS-build now and surface only the descriptor-write
gap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wire up acceleration-structure descriptor binding end-to-end across all three backends so shaders can actually consume the TLAS that buildPipelineAccelerationStructures produced, completing the stack and promoting the three InlineRT tests from XFAIL to passing.
Vulkan: createDescriptorPool counts AS descriptors in a separate scalar (the KHR enum value 1000150000 doesn't fit in the indexed array used for the core types) and emits one VkDescriptorPoolSize for them. createDescriptorSets resolves each AS resource via Resource::TLASPtr, locates the matching VulkanAccelerationStructure in InvocationState::AccelStructs (BLASes-then-TLASes layout, matching the helper's documented declaration order), and writes the handle through a VkWriteDescriptorSetAccelerationStructureKHR chained on the descriptor write's pNext. The dispatch's pre-barrier dst access now includes VK_ACCESS_ACCELERATION_STRUCTURE_READ_BIT_KHR so the prior AS-build's writes are made visible to the shader's RayQuery reads. Device creation also enables VK_KHR_ray_query when supported so the RayQuery shader instructions actually function.
DX12: writes a D3D12_SRV_DIMENSION_RAYTRACING_ACCELERATION_STRUCTURE SRV with the AS GPU virtual address as Location into the heap slot that createBuffers reserved (CreateShaderResourceView with a null resource, since the AS data lives in the buffer pointed to by Location).
Metal: the Metal shader converter doesn't bind the AS directly; the shader reads a buffer containing an IRRaytracingAccelerationStructureGPUHeader that holds the AS's gpuResourceID plus a pointer to an instance-contributions array. createBuffers allocates and fills both buffers per AS-descriptor entry, then points the descriptor at the header buffer's GPU address. The TLAS itself is built with the UserID instance-descriptor variant so HLSL CommittedInstanceID() returns the YAML-specified per-instance ID instead of the array index.
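The Vulkan pool-size counting described above can be sketched as follows. `PoolSize` and `countPoolSizes` are hypothetical stand-ins for `VkDescriptorPoolSize` and the counting logic in createDescriptorPool; the sketch assumes the core descriptor types occupy values 0 through 10 and takes the acceleration-structure value from the commit message.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Core VkDescriptorType values run from 0 (sampler) to 10 (input attachment).
constexpr uint32_t CoreTypeCount = 11;
// Enum value quoted in the commit message; far too large to use as an index.
constexpr uint32_t AccelStructType = 1000150000;

// Hypothetical stand-in for VkDescriptorPoolSize.
struct PoolSize {
  uint32_t Type;
  uint32_t Count;
};

// Count core types in an indexed array and AS descriptors in a separate
// scalar, then emit one pool-size entry per type that is actually used.
inline std::vector<PoolSize>
countPoolSizes(const std::vector<uint32_t> &DescriptorTypes) {
  std::array<uint32_t, CoreTypeCount> CoreCounts{};
  uint32_t AccelStructCount = 0;
  for (uint32_t Type : DescriptorTypes) {
    if (Type == AccelStructType) {
      ++AccelStructCount;
      continue;
    }
    assert(Type < CoreTypeCount && "descriptor type not handled by this sketch");
    ++CoreCounts[Type];
  }
  std::vector<PoolSize> Sizes;
  for (uint32_t Type = 0; Type < CoreTypeCount; ++Type)
    if (CoreCounts[Type] != 0)
      Sizes.push_back({Type, CoreCounts[Type]});
  if (AccelStructCount != 0)
    Sizes.push_back({AccelStructType, AccelStructCount});
  return Sizes;
}
```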
The three InlineRT tests now actually exercise the AS end-to-end: TraceRayInline issues a RayQuery against `Scene` and writes a hit-dependent value into `Output` (the instance ID for multi-instance, 1/0 otherwise). The catch-all `XFAIL: *` is dropped; `XFAIL: Clang` remains. The test shaders gain explicit `[[vk::binding]]` annotations since their `t0`/`u0` registers would otherwise collide under the default dxc HLSL→SPIR-V mapping.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a 24-bit per-instance InstanceContributionToHitGroupIndex slot on AccelerationStructureInstance / InstanceDesc, plumbed through DX (D3D12_RAYTRACING_INSTANCE_DESC), VK (instanceShaderBindingTableRecordOffset), and the Metal IR converter's addressOfInstanceContributions sidecar. The covering test verifies CommittedInstanceContributionToHitGroupIndex() returns the per-instance value across distinct instances, including the top-of-range 0xFFFFFF.
Part of the inline-RT test coverage epic (llvm#1258).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
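In both the D3D12 and Vulkan instance structs, the 24-bit contribution field shares one 32-bit word with the 8-bit instance flags. The sketch below packs that word with explicit shifts; the helper names are hypothetical, and placing the contribution in the low 24 bits is an assumption that matches how the bitfields are laid out on common little-endian ABIs.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t ContributionMask = 0xFFFFFF; // 24 bits

// Pack a 24-bit hit-group contribution and 8 bits of instance flags into one
// 32-bit word: contribution in bits 0..23, flags in bits 24..31.
inline uint32_t packContributionAndFlags(uint32_t Contribution, uint8_t Flags) {
  assert(Contribution <= ContributionMask && "value must fit in 24 bits");
  return (Contribution & ContributionMask) |
         (static_cast<uint32_t>(Flags) << 24);
}

inline uint32_t unpackContribution(uint32_t Word) {
  return Word & ContributionMask;
}

inline uint8_t unpackFlags(uint32_t Word) {
  return static_cast<uint8_t>(Word >> 24);
}
```

The top-of-range value 0xFFFFFF used by the test is the largest contribution that survives this round trip without touching the flag bits.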
Introduces AccelerationStructureInstanceFlags with bit values that intentionally match D3D12_RAYTRACING_INSTANCE_FLAGS, VkGeometryInstanceFlagBitsKHR, and MTLAccelerationStructureInstanceOptions so backends can pass the value through unchanged. YAML exposes the field as a bitset list (e.g. `InstanceFlags: [TriangleCullDisable, ForceOpaque]`). The covering test sets TriangleFrontCounterclockwise on one of two instances and verifies CommittedTriangleFrontFace() flips against the same BLAS.
Part of the inline-RT test coverage epic (llvm#1258).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
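A minimal sketch of such a pass-through flag enum is shown below. The enum name `InstanceFlags`, the `ForceNonOpaque` enumerator, and `toBackendFlags` are assumptions for illustration; the PR's actual declaration may differ. The numeric values are the ones D3D12, Vulkan, and Metal share for these four bits.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a backend-neutral flag enum. Bit values mirror the native enums
// so that no per-backend translation table is needed.
enum class InstanceFlags : uint8_t {
  None = 0,
  TriangleCullDisable = 1 << 0,
  TriangleFrontCounterclockwise = 1 << 1,
  ForceOpaque = 1 << 2,
  ForceNonOpaque = 1 << 3, // assumed name, not quoted in the PR text
};

constexpr InstanceFlags operator|(InstanceFlags A, InstanceFlags B) {
  return static_cast<InstanceFlags>(static_cast<uint8_t>(A) |
                                    static_cast<uint8_t>(B));
}

// The whole "translation": a single cast to the integer the backend expects.
inline uint32_t toBackendFlags(InstanceFlags F) {
  uint32_t Bits = static_cast<uint32_t>(F);
  assert((Bits & ~0xFu) == 0 && "unknown instance flag bit");
  return Bits;
}

// Compile-time guards pinning the values to the shared native encoding.
static_assert(static_cast<uint8_t>(InstanceFlags::TriangleCullDisable) == 0x1, "");
static_assert(static_cast<uint8_t>(InstanceFlags::TriangleFrontCounterclockwise) == 0x2, "");
static_assert(static_cast<uint8_t>(InstanceFlags::ForceOpaque) == 0x4, "");
static_assert(static_cast<uint8_t>(InstanceFlags::ForceNonOpaque) == 0x8, "");
```

In a real backend the `static_assert`s would compare against the native constants directly, so a mismatch fails at compile time rather than at test time.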
Depends on #1245, #1286
Summary
Introduces `AccelerationStructureInstanceFlags` with bit values that intentionally match `D3D12_RAYTRACING_INSTANCE_FLAGS`, `VkGeometryInstanceFlagBitsKHR`, and `MTLAccelerationStructureInstanceOptions` so the value passes straight through each backend's instance fill with a single `static_cast`. YAML exposes the field as a bitset list (e.g. `InstanceFlags: [TriangleCullDisable, ForceOpaque]`).
The covering test (`Feature/InlineRT/instance-flags.test`) sets `TriangleFrontCounterclockwise` on one of two instances of the same single-triangle BLAS and verifies `CommittedTriangleFrontFace()` flips: same world-space ray, same vertices, only the per-instance flag changes.
Part of the inline-RT test coverage epic (#1258).
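To make the bitset-list idea concrete, here is a rough sketch of folding a list of flag names into one bitmask. It is not the PR's YAML code, which presumably relies on LLVM's YAML I/O; `FlagName`, `KnownFlags`, and `parseInstanceFlags` are hypothetical, as is the `ForceNonOpaque` name.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical name-to-bit table; values follow the shared native encoding.
struct FlagName {
  const char *Name;
  uint8_t Bit;
};

constexpr FlagName KnownFlags[] = {
    {"TriangleCullDisable", 0x1},
    {"TriangleFrontCounterclockwise", 0x2},
    {"ForceOpaque", 0x4},
    {"ForceNonOpaque", 0x8}, // assumed name
};

// OR together the bits for every listed name; an unknown name is an error,
// reported here as std::nullopt.
inline std::optional<uint8_t>
parseInstanceFlags(const std::vector<std::string> &Names) {
  uint8_t Bits = 0;
  for (const std::string &Name : Names) {
    bool Found = false;
    for (const FlagName &Known : KnownFlags) {
      if (Name == Known.Name) {
        Bits |= Known.Bit;
        Found = true;
        break;
      }
    }
    if (!Found)
      return std::nullopt;
  }
  return Bits;
}
```

With this table, the example list `[TriangleCullDisable, ForceOpaque]` folds to `0x5`, and an empty list yields no flags set.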
Test plan
- `instance-flags.test` passes on Metal
- `instance-flags.test` passes on Vulkan
- `instance-flags.test` passes on DX12