The post shares profiling data from DeepSeek's training and inference framework, focusing on communication-computation overlap strategies and low-level implementation details. The data was captured with the PyTorch Profiler and can be visualized in the Chrome or Edge tracing tools (`chrome://tracing` or `edge://tracing`). It covers training, prefilling, and decoding.
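To make "communication-computation overlap" concrete, here is a minimal sketch of how one might estimate it from a trace in the Chrome Trace Event JSON format, which is what those tracing tools load. It is not taken from the post: the toy event names (`gemm`, `all_to_all`, `attention`), the `is_comm` predicate, and the choice to treat every non-communication event as compute are illustrative assumptions. A real profiler trace also contains CPU-side events and would need filtering first.

```python
import json

def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def overlap_length(a, b):
    """Total intersection length of two sorted, disjoint interval lists."""
    i = j = 0
    total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total

def comm_overlap_fraction(trace, is_comm):
    """Fraction of communication time that is hidden behind compute.

    `is_comm` is a hypothetical predicate on the event name; every other
    complete event is counted as compute (an assumption of this sketch).
    """
    comm, comp = [], []
    for ev in trace["traceEvents"]:
        if ev.get("ph") != "X":  # "X" marks a complete event with ts and dur
            continue
        span = (ev["ts"], ev["ts"] + ev["dur"])
        (comm if is_comm(ev["name"]) else comp).append(span)
    comm, comp = merge_intervals(comm), merge_intervals(comp)
    comm_total = sum(end - start for start, end in comm)
    return overlap_length(comm, comp) / comm_total if comm_total else 0.0

# Toy trace with made-up events; timestamps and durations are in microseconds.
toy_trace = json.loads("""
{"traceEvents": [
  {"ph": "X", "name": "gemm",       "ts": 0,   "dur": 100},
  {"ph": "X", "name": "all_to_all", "ts": 50,  "dur": 100},
  {"ph": "X", "name": "attention",  "ts": 120, "dur": 30}
]}
""")

fraction = comm_overlap_fraction(toy_trace, lambda name: "all_to_all" in name)
print(f"communication hidden behind compute: {fraction:.0%}")
```

In the toy trace, the communication event spans 50 to 150 µs while compute covers 0 to 100 and 120 to 150 µs, so 80 of the 100 µs of communication run concurrently with compute.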