Explores GPU architecture fundamentals and advanced techniques for maximizing SIMD unit utilization and VALU throughput. Covers bottleneck identification using profiling tools, shader type selection strategies, occupancy optimization, and async compute for parallel execution. Discusses trade-offs between pixel shaders and compute shaders, wave size considerations, and practical approaches to overlap rendering tasks for better GPU resource utilization.