TorchSpec is a PyTorch-native framework for training speculative decoding draft models at scale using a disaggregated architecture. It separates the inference system (which generates hidden states from a large target model) from the training system (which trains the draft model), streaming tensor data between them via RDMA or

10 min read · From pytorch.org
Table of contents
Introduction
Background
TorchSpec: Disaggregated Draft Model Training
Roadmap
Acknowledgement
