The post shares profiling data from DeepSeek's training and inference framework, focusing on communication-computation overlap strategies and low-level implementation details. The data was captured with the PyTorch Profiler and can be visualized in the Chrome or Edge tracing tools. It covers training, prefilling, and decoding.
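Traces of this kind can also be inspected programmatically. The following is a minimal sketch, assuming the trace is in the Chrome Trace Event JSON format (the format the Chrome and Edge tracing tools load). The `summarize_trace` helper and the event names in the sample are hypothetical and are not taken from the post's data.

```python
import json
from collections import defaultdict


def summarize_trace(trace, top_n=5):
    """Return up to top_n (event name, total duration in microseconds) pairs."""
    # The JSON Object Format wraps events in "traceEvents";
    # the JSON Array Format is a bare list of events.
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for event in events:
        # "X" marks a complete event, which carries its own "dur" field.
        if event.get("ph") == "X":
            totals[event.get("name", "<unnamed>")] += event.get("dur", 0)
    # Largest total first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hand-written sample with made-up event names, for illustration only.
# For a real file: with open(path) as f: trace = json.load(f)
sample_trace = json.loads("""
{
  "traceEvents": [
    {"name": "all_to_all", "ph": "X", "ts": 0, "dur": 120, "pid": 0, "tid": 1},
    {"name": "gemm", "ph": "X", "ts": 10, "dur": 300, "pid": 0, "tid": 2},
    {"name": "all_to_all", "ph": "X", "ts": 400, "dur": 80, "pid": 0, "tid": 1},
    {"name": "process_name", "ph": "M", "pid": 0, "args": {"name": "rank 0"}}
  ]
}
""")

for name, total_us in summarize_trace(sample_trace):
    print(f"{name}: {total_us:.0f} us")
```

The sketch sums durations per event name and ignores nesting and overlap between events, so the totals are only a rough guide to where time goes. Judging communication-computation overlap still requires viewing the timeline itself.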
