#performance
2 posts tagged #performance.
-
CUDA 13.3: tile programming in C++ without the boilerplate
NVIDIA CUDA 13.3 (May 26) adds C++ tile programming: declarative tile abstractions replace manual shared-memory management, synchronization, and indexing. CompileIQ autotuning uses evolutionary algorithms to tune tile sizes and memory layout per kernel (up to 15% speedup on GEMM and attention kernels). Works on Hopper and all other supported architectures.
-
Could C++ handle an ABI break? The 2026 case
Two pieces dropped in the same week: Luis Caro Campos' CppCon 2025 talk arguing that package managers make ABI breaks manageable, and an HFT University article claiming a 58x P99 latency gap between the Rust and C++ standard libraries. The ABI debate is back. Here is what both sides are saying, and what C++26 shipped despite the constraint.