Further tests would have been interesting. I wanted to measure CPU-to-GPU bandwidth using the GPU’s copy engine. DMA engines can queue up memory accesses independently of CPU (or GPU) cores, and are generally more latency tolerant. Nemes does have a test that uses vkCmdCopyBuffer to exercise exactly that path. Unfortunately, that test hung and never completed. Checking dmesg showed the kernel complaining about PCIe errors and graphics exceptions. I tried looking up some of those messages in the Linux source code, but couldn’t find anything. They probably come from a closed-source Nvidia kernel module.

Overall, I had a frustrating experience exercising NVLink C2C. At least the Vulkan test didn’t hang the system, unlike a plain memory latency test targeting the HBM3 memory pool. I also couldn’t run any OpenCL tests. clinfo could detect the GPU, but clpeak and every other application I tried failed to create an OpenCL context. I didn’t have the same frustrating experience with H100 PCIe cloud instances, where the GPU behaved pretty much as expected with Vulkan or OpenCL code. It’s a good reminder that designing and validating a custom platform like GH200 can be an incredibly difficult task.
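For reference, this isn’t Nemes’s test code, but the core of a copy-engine transfer built on vkCmdCopyBuffer looks roughly like the sketch below. It assumes the device, a transfer-capable queue, a command pool, and the source/destination buffers have already been created; on Nvidia hardware, a queue family that advertises only VK_QUEUE_TRANSFER_BIT usually maps onto a dedicated copy (DMA) engine rather than the graphics/compute queue.

```c
#include <vulkan/vulkan.h>

/* Records and submits a single vkCmdCopyBuffer, then waits for completion.
 * device, transferQueue, cmdPool, src, and dst are assumed to have been
 * created elsewhere; error handling is trimmed down for brevity. */
VkResult copy_buffer_once(VkDevice device, VkQueue transferQueue,
                          VkCommandPool cmdPool,
                          VkBuffer src, VkBuffer dst, VkDeviceSize size)
{
    VkCommandBufferAllocateInfo allocInfo = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
        .commandPool = cmdPool,
        .level = VK_COMMAND_BUFFER_LEVEL_PRIMARY,
        .commandBufferCount = 1,
    };
    VkCommandBuffer cmd;
    VkResult r = vkAllocateCommandBuffers(device, &allocInfo, &cmd);
    if (r != VK_SUCCESS) return r;

    VkCommandBufferBeginInfo beginInfo = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
        .flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT,
    };
    vkBeginCommandBuffer(cmd, &beginInfo);

    /* The copy itself: when submitted to a transfer-only queue, the driver
     * can hand this off to the GPU's DMA/copy engine. */
    VkBufferCopy region = { .srcOffset = 0, .dstOffset = 0, .size = size };
    vkCmdCopyBuffer(cmd, src, dst, 1, &region);

    vkEndCommandBuffer(cmd);

    VkSubmitInfo submit = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .commandBufferCount = 1,
        .pCommandBuffers = &cmd,
    };
    r = vkQueueSubmit(transferQueue, 1, &submit, VK_NULL_HANDLE);
    if (r == VK_SUCCESS)
        r = vkQueueWaitIdle(transferQueue);  /* time this for bandwidth */

    vkFreeCommandBuffers(device, cmdPool, 1, &cmd);
    return r;
}
```

A bandwidth test would put the source buffer in host-visible (CPU-side) memory and the destination in device-local memory, then time repeated copies of a large buffer; it was a submission along these lines that hung on GH200.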
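The OpenCL failure happened at the very first setup step. A minimal check along these lines (not clpeak’s code, just standard OpenCL boilerplate) exercises the same sequence: platform and GPU device enumeration succeed, and it is clCreateContext that fails.

```c
#define CL_TARGET_OPENCL_VERSION 300
#include <stdio.h>
#include <CL/cl.h>

/* Enumerate the first platform and GPU device, then try to create a context,
 * mirroring the step where clpeak and other OpenCL applications failed. */
int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_int err;

    err = clGetPlatformIDs(1, &platform, NULL);
    if (err != CL_SUCCESS) { printf("no platform: %d\n", err); return 1; }

    err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    if (err != CL_SUCCESS) { printf("no GPU device: %d\n", err); return 1; }

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    if (err != CL_SUCCESS) {
        printf("clCreateContext failed: %d\n", err);  /* the step that failed */
        return 1;
    }
    printf("context created\n");
    clReleaseContext(ctx);
    return 0;
}
```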