TorchSpec is a PyTorch-native framework for training speculative decoding draft models at scale using a disaggregated architecture. It separates the inference system (which generates hidden states from a large target model) from the training system (which trains the draft model), streaming tensor data between them via RDMA or
Table of contents
Introduction
Background
TorchSpec: Disaggregated Draft Model Training
Roadmap
Acknowledgement