Cover ClosestHit ray system values: barycentrics, PrimitiveIndex, WorldRay #1278
Draft
MarijnS95 wants to merge 9 commits into
…rce allocation
Introduce the foundational types for ray tracing acceleration structures: abstract `AccelerationStructure` base class, geometry/instance descriptors, BLAS/TLAS build-request structs with size queries, the `AccelerationStructureBuildFlags` bitmask (using `LLVM_DECLARE_ENUM_AS_BITMASK` since `TextureUsage` already uses the intrusive `LLVM_MARK_AS_BITMASK_ENUM`; `TextureUsage` also gains its previously-missing `LLVM_ENABLE_BITMASK_ENUMS_IN_NAMESPACE()`), and AS resource allocation across DX12, Vulkan, and Metal. Recording build commands lands in a follow-up commit on top of the ComputeEncoder abstraction.
Vulkan device creation switches to a single `vkGetPhysicalDeviceFeatures2` call covering every extension feature struct we care about (atomic-int64, mesh-shader, acceleration-structure, BDA on 1.1): each struct is chained into `pNext` before the query, and post-query we verify the gating bool and clear the sub-features we don't enable (capture-replay, indirect-build, multiview, etc.).
Drive-by: rather than letting `vkCreateDevice` reject the device with a generic `VK_ERROR_FEATURE_NOT_PRESENT`, the code now returns a descriptive `llvm::Error` naming the extension and the bool that came back zero — pinpointing the case where a driver advertises an extension but reports its base feature as `VK_FALSE`.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
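The drive-by gating check can be sketched in isolation. This is a minimal illustration, not the commit's code: `FeatureGate` and `findMissingFeature` are hypothetical names, and a plain `std::optional<std::string>` stands in for the `llvm::Error` the commit returns. `Enabled` is assumed to hold whatever the driver wrote into the matching struct chained on the `pNext` of the `vkGetPhysicalDeviceFeatures2` query.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record: one gating feature bool read back from the
// vkGetPhysicalDeviceFeatures2 query for an extension we intend to enable.
struct FeatureGate {
  std::string Extension;   // e.g. "VK_KHR_acceleration_structure"
  std::string FeatureName; // e.g. "accelerationStructure"
  bool Enabled;            // value the driver wrote into the chained struct
};

// Returns a descriptive message for the first advertised extension whose
// gating bool came back false, or std::nullopt when every gate is satisfied.
// (The commit returns llvm::Error; a string keeps this sketch dependency-free.)
std::optional<std::string>
findMissingFeature(const std::vector<FeatureGate> &Gates) {
  for (const FeatureGate &G : Gates)
    if (!G.Enabled)
      return G.Extension + " is advertised but " + G.FeatureName +
             " was reported as VK_FALSE";
  return std::nullopt;
}
```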
…lper
Move acceleration-structure build commands behind the abstract
ComputeEncoder interface so the orchestration (data upload, build-request
creation, AS allocation, build recording) can live in one place rather
than splitting across three backends.
ComputeEncoder gains a single batchBuildAS(ArrayRef<ASBuildItem>) method.
Each item carries an AccelerationStructure plus a BLAS or TLAS build
request via PointerUnion. The caller guarantees no inter-item memory
dependencies inside a batch — backends record the whole batch with one
barrier slot, no per-element barriers.
- Vulkan: single vkCmdBuildAccelerationStructuresKHR call covering the
whole batch. TLAS items serialize VkAccelerationStructureInstanceKHR
into a device-address upload buffer, BLAS items pull addresses from
each VulkanBuffer (new getDeviceAddress accessor). Storage buffers
transparently gain SHADER_DEVICE_ADDRESS + ACCEL_BUILD_INPUT_READ_ONLY
flags when ray tracing is supported, with the matching
VkMemoryAllocateFlagsInfo chained on every allocation.
- DX12: loop calling BuildRaytracingAccelerationStructure per item with
no intermediate barriers; D3D12_RAYTRACING_INSTANCE_DESC is
bit-identical to the Vulkan instance struct.
- Metal: lazily transitions to MTL::AccelerationStructureCommandEncoder and
deduplicates BLAS handles into the
MTL::InstanceAccelerationStructureDescriptor's instancedAccelerationStructures
array (Metal references BLASes by index, not GPU
address).
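The claim that the DX12 and Vulkan instance records are bit-identical can be checked at compile time with a mirror struct. The sketch below is illustrative (`InstanceRecord` is a hypothetical name): it reproduces the documented field order of `D3D12_RAYTRACING_INSTANCE_DESC`, notes the `VkAccelerationStructureInstanceKHR` counterpart of each field, and asserts the 64-byte size. It does not apply to Metal, which references BLASes by index rather than address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirror of the 64-byte TLAS instance record. Field names follow
// D3D12_RAYTRACING_INSTANCE_DESC; the Vulkan equivalents
// (VkAccelerationStructureInstanceKHR) are noted alongside.
struct InstanceRecord {
  float Transform[3][4];                              // VkTransformMatrixKHR::matrix
  uint32_t InstanceID : 24;                           // instanceCustomIndex
  uint32_t InstanceMask : 8;                          // mask
  uint32_t InstanceContributionToHitGroupIndex : 24;  // instanceShaderBindingTableRecordOffset
  uint32_t Flags : 8;                                 // flags
  uint64_t AccelerationStructure;                     // accelerationStructureReference
};

// Bit-field packing is implementation-defined, but mainstream compilers
// (MSVC, Clang, GCC) pack each 24 + 8 pair into a single 32-bit word.
static_assert(sizeof(InstanceRecord) == 64, "instance record must be 64 bytes");
static_assert(offsetof(InstanceRecord, AccelerationStructure) == 56,
              "BLAS reference must occupy the last 8 bytes");
```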
Each backend's CommandBuffer now carries a back-pointer to its owning
Device so the encoder can reach device-loaded entry points and helpers,
plus a keep-alive list for AS scratch and instance buffers.
A shared helper buildPipelineAccelerationStructures in lib/API/Device.cpp
walks Pipeline::AccelStructs, uploads vertex/index data via the new
createBufferWithData, builds requests, allocates AS objects, and issues
two batchBuildAS calls (BLAS batch then TLAS batch — VUID-03403 forbids
referencing a sibling dstAccelerationStructure in one command). Each
backend's executeProgram calls this helper to build the pipeline's AS
objects.
Descriptor binding for AS resources is intentionally still missing — the
tests progress past AS-build now and surface only the descriptor-write
gap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
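The BLAS-then-TLAS split that buildPipelineAccelerationStructures performs before its two batchBuildAS calls could look roughly like this. `std::variant` stands in for `llvm::PointerUnion`, and `BLASRequest`, `TLASRequest`, `BuildItem`, and `splitIntoBatches` are hypothetical names; the data upload and AS allocation the real helper also does are omitted.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the build requests; the commit carries them
// through a PointerUnion on each ASBuildItem instead.
struct BLASRequest { std::string Name; };
struct TLASRequest { std::string Name; std::vector<std::string> Instances; };
using BuildItem = std::variant<BLASRequest, TLASRequest>;

// The two batches submitted back to back: every BLAS first, then every TLAS.
// Items inside one batch must not depend on each other; a TLAS depends on
// BLASes, so keeping TLASes out of the first batch satisfies that rule.
struct Batches {
  std::vector<BuildItem> BLASBatch;
  std::vector<BuildItem> TLASBatch;
};

Batches splitIntoBatches(const std::vector<BuildItem> &Items) {
  Batches B;
  for (const BuildItem &I : Items) {
    if (std::holds_alternative<BLASRequest>(I))
      B.BLASBatch.push_back(I);
    else
      B.TLASBatch.push_back(I);
  }
  return B;
}
```

Each resulting vector maps to one batchBuildAS call, so no single command both builds a BLAS and references it from a TLAS in the same batch.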
Wire up acceleration-structure descriptor binding end-to-end across all three backends so shaders can actually consume the TLAS that buildPipelineAccelerationStructures produced — completing the stack and promoting the three InlineRT tests from XFAIL to passing.
Vulkan: createDescriptorPool counts AS descriptors in a separate scalar (the KHR enum value 1000150000 doesn't fit in the indexed array used for the core types) and emits one VkDescriptorPoolSize for them. createDescriptorSets resolves each AS resource via Resource::TLASPtr, locates the matching VulkanAccelerationStructure in InvocationState::AccelStructs (BLASes-then-TLASes layout, matching the helper's documented declaration order), and writes the handle through a VkWriteDescriptorSetAccelerationStructureKHR chained on the descriptor write's pNext. The dispatch's pre-barrier dst access now includes VK_ACCESS_ACCELERATION_STRUCTURE_READ_BIT_KHR so the prior AS-build's writes are made visible to the shader's RayQuery reads. Device creation also enables VK_KHR_ray_query when supported so the RayQuery shader instructions actually function.
DX12: writes a D3D12_SRV_DIMENSION_RAYTRACING_ACCELERATION_STRUCTURE SRV with the AS GPU virtual address as Location into the heap slot that createBuffers reserved (CreateShaderResourceView with a null resource — the AS data lives in the buffer pointed to by Location).
Metal: the Metal shader converter doesn't bind the AS directly; the shader reads a buffer containing an IRRaytracingAccelerationStructureGPUHeader that holds the AS's gpuResourceID plus a pointer to an instance-contributions array. createBuffers allocates and fills both buffers per AS-descriptor entry, then points the descriptor at the header buffer's GPU address. The TLAS itself is built with the UserID instance-descriptor variant so HLSL CommittedInstanceID() returns the YAML-specified per-instance ID instead of the array index.
The three InlineRT tests now actually exercise the AS end-to-end: TraceRayInline issues a RayQuery against `Scene` and writes a hit-dependent value into `Output` (the instance ID for multi-instance, 1/0 otherwise). The catch-all `XFAIL: *` is dropped; `XFAIL: Clang` remains. The test shaders gain explicit `[[vk::binding]]` annotations since their `t0`/`u0` registers would otherwise collide under the default dxc HLSL→SPIR-V mapping.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
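The separate-scalar counting for AS descriptors is easy to see in a standalone sketch. The numeric values match the Vulkan headers (core types 0 through 10, `VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_KHR` = 1000150000), but `PoolSize` and `computePoolSizes` are hypothetical stand-ins for `VkDescriptorPoolSize` and the commit's createDescriptorPool; other extension descriptor types are simply ignored here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Values from the Vulkan headers: core descriptor types are 0..10
// (SAMPLER .. INPUT_ATTACHMENT); the KHR acceleration-structure type is an
// extension enum far outside that range.
constexpr uint32_t kNumCoreDescriptorTypes = 11;
constexpr uint32_t kDescriptorTypeAccelerationStructureKHR = 1000150000;

// Hypothetical mirror of VkDescriptorPoolSize.
struct PoolSize {
  uint32_t Type;
  uint32_t DescriptorCount;
};

std::vector<PoolSize> computePoolSizes(const std::vector<uint32_t> &Types) {
  std::array<uint32_t, kNumCoreDescriptorTypes> CoreCounts{}; // indexed by type value
  uint32_t ASCount = 0; // separate scalar: 1000150000 cannot index the array
  for (uint32_t T : Types) {
    if (T == kDescriptorTypeAccelerationStructureKHR)
      ++ASCount;
    else if (T < kNumCoreDescriptorTypes)
      ++CoreCounts[T]; // other extension types are ignored in this sketch
  }
  std::vector<PoolSize> Sizes;
  for (uint32_t T = 0; T < kNumCoreDescriptorTypes; ++T)
    if (CoreCounts[T] != 0)
      Sizes.push_back({T, CoreCounts[T]});
  if (ASCount != 0) // one extra VkDescriptorPoolSize covers every AS descriptor
    Sizes.push_back({kDescriptorTypeAccelerationStructureKHR, ASCount});
  return Sizes;
}
```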
This was referenced Jun 3, 2026
Foundational bring-up for PSO-based raytracing tracked in llvm#1268. Lays out the framework-side surface (stage enums, pipeline kind, YAML schema, lit infrastructure) so subsequent per-backend bring-up PRs (VK → DX12 → Metal) only have to fill in pipeline-state-object creation, SBT construction, and DispatchRays. No backend can run an RT pipeline yet — each one's executeProgram gains a terminal `else if (P.isRayTracing())` that returns a "not yet supported" error.
Pipeline.h gets six new Stages (RayGeneration, Miss, ClosestHit, AnyHit, Intersection, Callable), `ShaderPipelineKind::RayTracing`, an `isRayTracingStage` predicate, and `Pipeline::isRayTracing()`.
The declarative YAML schema for an RT pipeline lives alongside the existing AccelerationStructureDescs: a `HitGroup` (Triangles | Procedural, with ClosestHit + optional AnyHit / Intersection entries), a `RayTracingPipelineConfig` block (MaxTraceRecursionDepth, MaxPayloadSizeInBytes, MaxAttributeSizeInBytes, optional PipelineFlags), and a `ShaderBindingTable` block with raygen / miss / hit-group / callable record arrays. SBTEntry carries an optional `LocalRootData` byte array reserved for the upcoming local-root-signature work.
validatePipelineKind grows an RT branch: it allows multiple shaders of the same RT stage (a pipeline can have several misses or hit groups — the existing duplicate check would have rejected them), requires at least one RayGeneration, and rejects mixing RT with Compute/Vertex/Mesh. The reverse check rejects HitGroups / RTConfig / SBT on any non-RT pipeline. validateDispatchParameters reinterprets DispatchGroupCount as {Width, Height, Depth} for the eventual DispatchRays and forbids VertexCount on RT.
Existing Stages switches grow the six new cases:
* VK: getShaderStageFlag maps each RT stage to its VK_SHADER_STAGE_*_KHR bit so PR 2 can build VkPipelineShaderStageCreateInfos for the RT pipeline.
* Metal: getShaderStage unreachables on RT (the metal-irconverter RT path takes a different route from the IRShaderStage one).
* TraditionalRasterPipelineCreateDesc::setShader adds the RT stages to its existing "not a raster stage" unreachable group.
test/lit.cfg.py adds a `%dxc_target_lib` substitution (same compiler, distinct name to signal `-T lib_6_x` library targets at a glance) and a `raytracing-pipeline` available-feature. On DX it tracks RaytracingTier >= 1.0; on Vulkan it aliases off the VK_KHR_ray_tracing_pipeline extension already reported by the device. The extension isn't enabled on the VkDevice yet — that lands in PR 2 — but the lit-level capability detection is independent of what the backend currently consumes, so a developer on a VK box can already see the foundational test routed through the RT path.
The foundational test `Feature/RT/raygen-roundtrip.test` exercises the full RT YAML schema in one shape: raygen + miss + closest-hit shaders, a BLAS/TLAS pair, a HitGroups list, RayTracingPipelineConfig, and a ShaderBindingTable. `# REQUIRES: raytracing-pipeline` and `# XFAIL: *` keep it expectedly failing until the per-backend PRs drop entries from the XFAIL list as each one starts dispatching real rays.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
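The RT branch of validatePipelineKind boils down to three rules. A minimal sketch, assuming a simplified `Stage` enum (the non-RT members are a hypothetical subset of Pipeline.h) and returning an error string instead of `llvm::Error`; `validateRayTracingStages` is an illustrative name.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical subset of the Stages enum from Pipeline.h.
enum class Stage { Compute, Vertex, Pixel, Mesh,
                   RayGeneration, Miss, ClosestHit, AnyHit, Intersection, Callable };

bool isRayTracingStage(Stage S) {
  switch (S) {
  case Stage::RayGeneration: case Stage::Miss: case Stage::ClosestHit:
  case Stage::AnyHit: case Stage::Intersection: case Stage::Callable:
    return true;
  default:
    return false;
  }
}

// Repeated RT stages are fine (several misses or hit groups), at least one
// RayGeneration is required, and any non-RT stage is rejected.
std::optional<std::string> validateRayTracingStages(const std::vector<Stage> &Stages) {
  bool HasRayGen = false;
  for (Stage S : Stages) {
    if (!isRayTracingStage(S))
      return std::string("ray tracing pipelines cannot mix in non-RT stages");
    if (S == Stage::RayGeneration)
      HasRayGen = true;
  }
  if (!HasRayGen)
    return std::string("ray tracing pipeline requires a RayGeneration shader");
  return std::nullopt;
}
```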
First per-backend bring-up in the PSO raytracing series (llvm#1268). Adds the API surface (ComputeEncoder::dispatchRays, Device::createPipelineRT, Device::createShaderBindingTable, RayTracingPipelineCreateDesc) plus the Vulkan implementation behind it. D3D12 and Metal stub the new methods with not-yet-supported errors; their bring-up lands in follow-up PRs.
The pre-existing YAML schema struct from PR llvm#1270 is renamed ShaderBindingTable -> ShaderBindingTableDesc so the bare name is free for the runtime resource class (parallel to BLASDesc / TLASDesc vs AccelerationStructure). A new include/API/ShaderBindingTable.h holds the abstract runtime base; concrete backend SBT classes derive from it with LLVM-style classof / cast<>.
The VulkanDevice's prior `RaytracingFunctions RT` lumped AS and RT pipeline entry points together. They split into two structs — `ASFunctions AS` and `RTPipelineFunctions RT` — matching the actual feature-gate split (AS+ray-query is a complete configuration on its own, RT pipeline is layered on top). `HasRayTracingSupport` is renamed to `HasASSupport`, and a separate `HasRTPipelineSupport` tracks the new VK_KHR_ray_tracing_pipeline extension.
Vulkan bring-up:
- Extension: VK_KHR_ray_tracing_pipeline is requested when reported, with VkPhysicalDeviceRayTracingPipelineFeaturesKHR chained into the pre-create feature query. After the query the gating rayTracingPipeline bool is checked; capture-replay / trace-rays-indirect / traversal-primitive-culling sub-features are cleared since the tests don't exercise them.
- Function pointers: vkCreateRayTracingPipelinesKHR, vkGetRayTracingShaderGroupHandlesKHR, vkCmdTraceRaysKHR.
- Properties: VkPhysicalDeviceRayTracingPipelinePropertiesKHR is cached at device-create time for SBT handle size / alignment / base-alignment.
- VKRayTracingPipelineState derives from VulkanPipelineState; an IsRayTracing flag on the base lets the existing Vulkan cast<> path stay polymorphic without adding a new GPUAPI value. classof tests both the API and the flag. The derived class also carries a StringMap<uint32_t> resolving each shader EntryPoint or HitGroup Name to its index in the pipeline's group array, plus per-bucket counts so the SBT builder can slice the contiguous handle blob into raygen / miss / hit / callable regions.
- createPipelineRT builds a single VkShaderModule (the DXIL library compiles to one SPIR-V module with multiple OpEntryPoints), then one VkPipelineShaderStageCreateInfo per Shader entry and one VkRayTracingShaderGroupCreateInfoKHR per general shader / hit group. Pipeline layout is shared with the compute path via createPipelineLayout, gated on all six RT stage flags so any binding can be consumed from any RT shader.
- createShaderBindingTable allocates a host-visible coherent buffer big enough for four regions and lays out each entry as [handle bytes][localRootData bytes][padding-to-stride]. Per-region stride = align(handleSize + max-local-root-data-in-region, handleAlignment); per-region size = align(count * stride, baseAlignment). LocalRootData support comes free from the PR1 SBT schema; the test doesn't exercise it yet. Each region's VkStridedDeviceAddressRegionKHR derives from the buffer's vkGetBufferDeviceAddress.
- dispatchRays binds the pipeline at VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR, emits a pre-barrier with AS_READ + SHADER_READ/WRITE dst access into RAY_TRACING_SHADER_BIT_KHR, then calls vkCmdTraceRaysKHR with the SBT's four region structs.
- createCommands picks the new bind point for RT pipelines so vkCmdBindDescriptorSets binds to the right point.
executeProgram's isRayTracing branch builds a RayTracingPipelineCreateDesc from the YAML, calls createPipelineRT then createShaderBindingTable, and keeps both on InvocationState for the dispatch.
raygen-roundtrip.test now expects DirectX/Metal/Clang to XFAIL; on a DXC + Vulkan combo with VK_KHR_ray_tracing_pipeline supported the test should PASS via this implementation. On the user's Linux + clang-dxc loop the test still XFAILs because clang-dxc doesn't yet lower [shader("raygeneration")] entry points to SPIR-V, so the Clang XFAIL token catches the compile failure. CI on a working DXC install will exercise the runtime path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
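The per-region SBT sizing formulas can be worked through with a small helper. `alignTo`, `layoutRegion`, and `RegionLayout` are hypothetical names, and the example values (32-byte handle size and alignment, 64-byte base alignment) are assumptions chosen for illustration; a real device reports its own values in `VkPhysicalDeviceRayTracingPipelinePropertiesKHR` (`shaderGroupHandleSize`, `shaderGroupHandleAlignment`, `shaderGroupBaseAlignment`).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Round Value up to a multiple of Alignment, which must be a power of two.
uint64_t alignTo(uint64_t Value, uint64_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
  return (Value + Alignment - 1) & ~(Alignment - 1);
}

struct RegionLayout {
  uint64_t Stride; // bytes between consecutive records
  uint64_t Size;   // total bytes reserved for the region
};

// One SBT region: each record is [handle][local root data][padding], and the
// stride is driven by the largest LocalRootData among the region's entries.
RegionLayout layoutRegion(const std::vector<uint64_t> &LocalRootBytes,
                          uint64_t HandleSize, uint64_t HandleAlignment,
                          uint64_t BaseAlignment) {
  uint64_t MaxLocal = 0;
  for (uint64_t B : LocalRootBytes)
    MaxLocal = B > MaxLocal ? B : MaxLocal;
  uint64_t Stride = alignTo(HandleSize + MaxLocal, HandleAlignment);
  uint64_t Size = alignTo(LocalRootBytes.size() * Stride, BaseAlignment);
  return {Stride, Size};
}
```

With those assumed values, a miss region holding three records without local data gets a 32-byte stride and a 128-byte size (96 rounded up to 64), while a single hit-group record carrying 8 bytes of local data gets a 64-byte stride (40 rounded up to 32) and a 64-byte size.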
Second per-backend bring-up in the PSO raytracing series (llvm#1268). Mirrors PR llvm#1273 for D3D12: builds an ID3D12StateObject from the YAML schema, hands out shader identifiers via ID3D12StateObjectProperties, lays out the SBT in an upload heap, and routes DispatchRays through ID3D12GraphicsCommandList4 (same query path the AS build already uses).
DXRayTracingPipelineState derives from DXPipelineState with an IsRayTracing flag on the base for classof — matching the VulkanPipelineState pattern. It carries the ID3D12StateObject + a cached ID3D12StateObjectProperties + a StringMap<const void *> that resolves each shader EntryPoint or hit-group Name to its 32-byte shader identifier blob. The identifiers are driver-owned and stay alive for the Properties COM lifetime, so the PSO keeps Properties alive.
DXShaderBindingTable holds a single upload-heap buffer plus four pre-built D3D12_DISPATCH_RAYS_DESC ranges (raygen, miss, hit-group, callable) — `RANGE` for raygen since it's always one record, and `RANGE_AND_STRIDE` for the others.
createPipelineRT builds a CD3DX12_STATE_OBJECT_DESC with subobjects for the DXIL library (one export per Shader entry), per-hit-group subobjects with closest-hit / any-hit / intersection imports, the pipeline shader config (max payload + max attribute bytes), pipeline config (max recursion depth), and a global root signature subobject. The root signature comes from the library's embedded RTS0 part when present, falling back to the BindingsDesc path (matching the existing compute / raster pipeline behaviour). Wide strings for the subobject exports live in a SmallVector that outlives the SODesc, since the helper classes store pointers into the strings rather than copying.
createShaderBindingTable lays out each entry as [identifier][LocalRootData][padding-to-stride] with per-region stride = align(D3D12_SHADER_IDENTIFIER_SIZE_IN_BYTES + max-LocalRootData-in-region, D3D12_RAYTRACING_SHADER_RECORD_BYTE_ALIGNMENT) and per-region size = align(count * stride, D3D12_RAYTRACING_SHADER_TABLE_BYTE_ALIGNMENT). The buffer lives in an upload heap with D3D12_RESOURCE_STATE_GENERIC_READ — PR3 simplification; a staging copy into a default heap is a follow-up.
dispatchRays queries the underlying CommandListX for ID3D12GraphicsCommandList4 (matching the AS-build path), binds the global root signature via SetComputeRootSignature, calls SetPipelineState1 with the state object, and issues DispatchRays with a D3D12_DISPATCH_RAYS_DESC populated from the SBT's four ranges plus the dispatch dimensions. The descriptor heap + descriptor-table bindings are set up by the existing createComputeCommands helper before the encoder is created.
createComputeCommands grows an isRayTracing branch at the dispatch point so it calls dispatchRays instead of dispatch, reusing all of the descriptor-heap and root-signature wiring. InvocationState carries a ShaderBindingTable unique_ptr that's only populated for RT pipelines.
executeProgram's isRayTracing branch builds a RayTracingPipelineCreateDesc from Pipeline.Shaders / HitGroups / RTConfig, calls createPipelineRT then createShaderBindingTable, then re-enters createComputeCommands which dispatches via the new RT path.
raygen-roundtrip.test's XFAIL becomes Clang, Metal — DirectX should PASS via this implementation on Windows CI (and via Wine + vkd3d-proton locally on Linux). The Clang token still catches the compile failure on clang-dxc since [shader("raygeneration")] doesn't yet lower to either DXIL libraries or SPIR-V on that path.
Locally verified by cross-compiling lib/API/DX/Device.cpp via `clang++ --target=x86_64-pc-windows-msvc` against the xwin Windows SDK headers and the project's bundled DirectX-Headers. Runtime verification is left to Windows CI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
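Because the CD3DX12 helpers keep raw pointers into the export strings, the owning container must not relocate those strings while the state-object description is alive. A `SmallVector` satisfies this as long as it stops growing before any pointer is taken; the hypothetical `WideNameStorage` below sketches one alternative whose pointers also survive later growth, assuming ASCII entry-point and hit-group names that can be widened one code unit at a time.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Owns the wide export names whose raw pointers are handed to the state
// object description. std::deque::push_back/emplace_back never relocates
// existing elements, so pointers returned earlier stay valid as names are
// added (a growing vector may move short, SSO-stored strings).
class WideNameStorage {
public:
  // Assumes ASCII input: each char is widened to one wchar_t.
  const wchar_t *intern(const std::string &Name) {
    Names.emplace_back(Name.begin(), Name.end());
    return Names.back().c_str();
  }

private:
  std::deque<std::wstring> Names;
};
```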
RayTracing pipelines compile every entry point — raygen, miss, closest-hit, any-hit, intersection, callable — into a single DXIL library via `dxc -T lib_6_x` / `clang-dxc -T lib_6_x`. That's the shape DXR apps typically ship: D3D12's CreateStateObject requires a DXIL-library subobject anyway, and the driver fuses entry points across the whole library at link time, so writing one .hlsl file and compiling it once is both idiomatic and the path the framework's `%dxc_target_lib` substitution emits.
Compute and raster pipelines stay one-to-one (the existing position-based mapping handles VS+PS, AS+MS+PS, etc.). Before this change, RT pipelines needed N positional args even though one library blob holds every entry — which the foundational `raygen-roundtrip.test` runs straight into: 3 Shaders[] entries vs 1 input file fails the count check before any GPU work happens.
Detect the RT-pipeline-with-one-input shape and copy the library blob into every `Shaders[].Shader` slot via `MemoryBuffer::getMemBufferCopy`. Each entry owns its own buffer copy (DXIL libraries are KBs, no real memory pressure), keeping the existing `unique_ptr<MemoryBuffer>` ownership model intact. Non-RT pipelines still go through the positional path and still enforce the count check.
Verified by re-running `raygen-roundtrip.test`'s pipeline.yaml + the DXIL library via Wine + vkd3d-proton with a single .o argument — same 0xBEEF result the prior three-arg invocation produced.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
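The single-library fan-out amounts to one extra branch ahead of the positional mapping. This sketch swaps `std::unique_ptr<llvm::MemoryBuffer>` for plain `std::string` blobs, so copying a string stands in for `MemoryBuffer::getMemBufferCopy`; `ShaderEntry` and `assignShaderInputs` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical simplified pipeline entry; the real code stores a
// std::unique_ptr<llvm::MemoryBuffer> per Shaders[] entry.
struct ShaderEntry {
  std::string EntryPoint;
  std::string Blob; // compiled shader bytes
};

// Returns an error message, or std::nullopt once every entry has a blob.
std::optional<std::string> assignShaderInputs(bool IsRayTracing,
                                              std::vector<ShaderEntry> &Shaders,
                                              const std::vector<std::string> &Inputs) {
  if (IsRayTracing && Inputs.size() == 1) {
    // One DXIL library holds every RT entry point: each entry gets its own copy.
    for (ShaderEntry &S : Shaders)
      S.Blob = Inputs.front();
    return std::nullopt;
  }
  // Non-RT (or multi-input) pipelines keep the positional mapping and count check.
  if (Inputs.size() != Shaders.size())
    return std::string("expected ") + std::to_string(Shaders.size()) +
           " shader inputs, got " + std::to_string(Inputs.size());
  for (std::size_t I = 0; I < Shaders.size(); ++I)
    Shaders[I].Blob = Inputs[I];
  return std::nullopt;
}
```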
DXR-style ray tracing reaches Metal through metal_irconverter: each RT
entry point is lowered from DXIL to a Metal IR function, raygen is
emitted as a kernel (IRRayGenerationCompilationKernel) so it can be
dispatched directly, and miss / closest-hit / any-hit / intersection /
callable functions are emitted as visible functions and pulled into a
MTLVisibleFunctionTable.
Implements the three virtuals the foundation PR left stubbed on Metal:
• MTLDevice::createPipelineRT compiles every Shaders[] entry against a
single IRRayTracingPipelineConfiguration (max attribute/recursion
from the YAML RTConfig), builds one MTL::Library per entry, hands
the raygen function to the compute pipeline as the kernel, and
registers the rest as LinkedFunctions. The freshly-built pipeline
then mints a MTLVisibleFunctionTable and resolves each callable
function's handle into a slot index that the SBT builder reuses.
• MTLDevice::createShaderBindingTable lays the four SBT regions out
via the shared computeSBTLayout helper sized for IRShaderIdentifier
records, looks up each region entry's ShaderName in the pipeline's
name → IRShaderIdentifier map, and memcpys the records into a
shared-storage MTL::Buffer the runtime will dereference at dispatch.
• MTLComputeEncoder::dispatchRays binds the raygen pipeline and runs
dispatchThreads(Width, Height, Depth) on the encoder. The caller
(createRayTracingCommands) is responsible for binding the global
descriptor heap, top-level argument buffer, IRDispatchRaysArgument
(slot 3), and marking the SBT buffer + function tables resident.
The IRDispatchRaysArgument struct is built per-dispatch in
createRayTracingCommands: SBT region addresses + sizes (read off the
MTLShaderBindingTable), GRS / ResDescHeap GPU pointers, and the
visible / intersection function table resourceIDs. It's parked in a
shared MTL::Buffer kept alive on the command buffer's KeepAlive list
and bound at kIRRayDispatchArgumentsBindPoint so callees reached via
TraceRay() inherit the same dispatch state through that pointer.
Plumbs the existing executeProgram RT branch on Metal the same way the
VK / DX backends already do (validate Shaders / SBT / RTConfig, build
RayTracingPipelineCreateDesc from the YAML pipeline, create PSO, build
SBT, record commands), and adds the raytracing-pipeline lit feature
on Metal so test/Feature/RT/raygen-roundtrip.test drops Metal from its
XFAIL list and passes natively on Apple Silicon (the 0xBEEF payload
roundtrip matches the DX / VK references, verified locally on
macOS 15 / metal-irconverter 3.1.1).
This PR1 bring-up only handles Triangle hit groups whose only member
is a ClosestHit shader — any-hit / intersection / procedural / local
root signatures land in follow-ups; createPipelineRT now returns a
clear unsupported error for those shapes instead of silently producing
wrong output.
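The unsupported-shape guard can be expressed as a small per-hit-group check. The `HitGroup` struct below is a hypothetical mirror of the YAML schema (Triangles | Procedural, ClosestHit plus optional AnyHit / Intersection), and `checkMetalHitGroupSupported` is an illustrative name; local root signatures are not modelled.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical mirror of the YAML HitGroup schema.
enum class HitGroupType { Triangles, Procedural };
struct HitGroup {
  std::string Name;
  HitGroupType Type = HitGroupType::Triangles;
  std::string ClosestHit;   // required
  std::string AnyHit;       // optional, empty when absent
  std::string Intersection; // optional, empty when absent
};

// Accepts only Triangles hit groups whose sole member is a ClosestHit shader;
// every other shape yields an explicit error instead of silent wrong output.
std::optional<std::string> checkMetalHitGroupSupported(const HitGroup &HG) {
  if (HG.Type != HitGroupType::Triangles)
    return "hit group '" + HG.Name + "': procedural hit groups are not supported yet";
  if (!HG.AnyHit.empty() || !HG.Intersection.empty())
    return "hit group '" + HG.Name + "': only ClosestHit members are supported yet";
  return std::nullopt;
}
```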
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ldRay
Three small PSO RT tests stacked on llvm#1275, each isolating one shader-observable closest-hit system value from llvm#1268's 👍 list. Same shape as the prior batch in llvm#1277 — one .test file per behavior, single-purpose shader, exact buffer comparison.
- `closest-hit-barycentrics.test` — 3-lane dispatch, each lane fires at a clearly-interior point of the single triangle so the closest-hit shader reports a known `BuiltInTriangleIntersectionAttributes::barycentrics` (u, v). Points are picked from the inside of the triangle to avoid the watertight-traversal edge-rule lottery you hit at edge midpoints / vertices (the first cut of this test used midpoint(v0, v1) and one lane silently missed on both backends).
- `closest-hit-primitive-index.test` — three triangles tiled at x = -3, 0, +3 in a single BLAS. 3-lane dispatch fires straight down at each triangle's centroid; the closest-hit reports `PrimitiveIndex()` and must match the lane index 0..2.
- `closest-hit-world-ray.test` — 2-lane dispatch with rays from different z heights (1.0 and 2.0). Closest-hit packs `WorldRayOrigin().z`, `WorldRayDirection().z`, and `RayTCurrent()` through the payload; raygen flattens the float3 into a 6-element Float32 buffer. Verifies the system values match the raygen-side `RayDesc` and that t is correctly computed by the traversal.
All three are `# REQUIRES: raytracing-pipeline` with `# XFAIL: Clang` — `clang-dxc` doesn't yet lower `[shader(…)]` entry points. With this branch rebased on top of the Metal RT bring-up, all three pass natively on Apple Silicon and Metal is dropped from the XFAIL list.
Locally verified end-to-end on the user's Linux box: all three pass on Vulkan via the native offloader, and on D3D12 via Wine + vkd3d-proton + the cross-compiled `offloader.exe`, against an NVIDIA RTX 3060. And on macOS 15 / metal-irconverter 3.1.1 via the native offloader: all three PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
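Checking that a target point is clearly interior before baking it into the barycentrics test can be done on the host. The sketch assumes DXR's convention that `barycentrics.x` weights vertex 1 and `barycentrics.y` weights vertex 2 (vertex 0 gets `1 - x - y`), and that rays travel along z so the triangle can be solved in the XY plane; `barycentricsOf`, `isClearlyInterior`, the 0.1 margin, and the triangle in the usage note are hypothetical, not the values the tests use.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float X, Y; };
// U weights v1, V weights v2; v0 receives 1 - U - V.
struct Barycentrics { float U, V; };

// Solves P = v0 + U * (v1 - v0) + V * (v2 - v0) for (U, V) with Cramer's rule.
Barycentrics barycentricsOf(Vec2 P, Vec2 V0, Vec2 V1, Vec2 V2) {
  float E1X = V1.X - V0.X, E1Y = V1.Y - V0.Y;
  float E2X = V2.X - V0.X, E2Y = V2.Y - V0.Y;
  float DX = P.X - V0.X, DY = P.Y - V0.Y;
  float Det = E1X * E2Y - E2X * E1Y; // nonzero for a non-degenerate triangle
  return {(DX * E2Y - E2X * DY) / Det, (E1X * DY - DX * E1Y) / Det};
}

// A point is safely away from edges and vertices (where watertight edge rules
// decide ownership) when all three weights exceed the margin.
bool isClearlyInterior(Barycentrics B, float Margin = 0.1f) {
  return B.U > Margin && B.V > Margin && (1.0f - B.U - B.V) > Margin;
}
```

For the hypothetical triangle (0,0), (1,0), (0,1), the point (0.25, 0.5) yields (u, v) = (0.25, 0.5) and is comfortably interior, whereas the edge midpoint (0.5, 0) yields v = 0 and sits exactly on the edge that the watertight rules arbitrate.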
81848cb to dd817cc
Depends on #1281
Summary
Three small PSO raytracing tests stacked on #1275, each isolating one shader-observable closest-hit system value from #1268's 👍 list. Same shape as the prior batch in #1277 — one `.test` file per behavior, single-purpose shader, exact buffer comparison.
- `closest-hit-barycentrics.test` — 3-lane dispatch, each lane fires at a clearly-interior point of the single triangle so the closest-hit shader reports a known `BuiltInTriangleIntersectionAttributes::barycentrics` (u, v). Points are picked from the inside of the triangle to avoid the watertight-traversal edge-rule lottery you hit at edge midpoints / vertices (the first cut of this test used `midpoint(v0, v1)` and one lane silently missed on both backends).
- `closest-hit-primitive-index.test` — three triangles tiled at x = -3, 0, +3 in a single BLAS. 3-lane dispatch fires straight down at each triangle's centroid; the closest-hit reports `PrimitiveIndex()` and must match the lane index 0..2.
- `closest-hit-world-ray.test` — 2-lane dispatch with rays from different z heights (1.0 and 2.0). Closest-hit packs `WorldRayOrigin().z`, `WorldRayDirection().z`, and `RayTCurrent()` through the payload; raygen flattens the float3 into a 6-element Float32 buffer. Verifies the system values match the raygen-side `RayDesc` and that t is correctly computed by the traversal.
All three are `# REQUIRES: raytracing-pipeline` with `# XFAIL: Clang` — `clang-dxc` doesn't yet lower `[shader("…")]` entry points. With the Metal RT bring-up in #1281 rebased underneath this branch, all three pass natively on Apple Silicon and `Metal` is dropped from the XFAIL list.
Test plan
- Vulkan via the native `offloader` against an NVIDIA RTX 3060: all three PASS.
- D3D12 via Wine + vkd3d-proton + the cross-compiled `offloader.exe` on the same GPU: all three PASS.
- Metal via the native `offloader` on Apple Silicon (macOS 15 / metal-irconverter 3.1.1): all three PASS.