Alibaba's START (Self-Taught Reasoner with Tools) paper demonstrates how LLMs can integrate Python execution into their chain-of-thought reasoning. The approach injects strategic 'hints' during inference to prompt the model to write and run Python code, then refine answers based on execution results. Training involves two
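
The hint-injection loop described above (generate, inject a hint, let the model write Python, run it, then refine) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `stub_model` is a canned stand-in for a real LLM call, the hint wording, the `<python>` and `<output>` tags, and the function names `run_python` and `hint_infer` are all assumptions made for this example, and `exec` is used without any sandboxing.

```python
import contextlib
import io
import re

# Hypothetical hint text and code delimiters; the exact wording and tags are assumptions.
HINT = "\nWait, maybe using Python here is a good idea.\n"
CODE_RE = re.compile(r"<python>(.*?)</python>", re.DOTALL)


def run_python(source: str) -> str:
    """Execute source and return captured stdout, or the error as text.

    No sandboxing here: a real system would isolate untrusted model code.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {})
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def stub_model(prompt: str) -> str:
    """Stand-in for an LLM call; returns canned text based on the prompt state."""
    if "<output>" in prompt:
        return "The tool printed 5050, so the answer is 5050."
    if HINT in prompt:
        return "<python>print(sum(range(1, 101)))</python>"
    return "The sum of 1..100 is roughly 5000."


def hint_infer(question: str, generate=stub_model) -> str:
    """Run one hint-injected reasoning pass and return the full trace."""
    # 1. Let the model reason in plain text first.
    trace = question + "\n" + generate(question)
    # 2. Inject the hint to nudge the model toward writing code.
    trace += HINT
    code_step = generate(trace)
    trace += code_step
    # 3. If code was produced, execute it and feed the result back.
    match = CODE_RE.search(code_step)
    if match:
        result = run_python(match.group(1))
        trace += f"\n<output>{result}</output>\n"
        # 4. Let the model refine its answer using the execution result.
        trace += generate(trace)
    return trace


if __name__ == "__main__":
    print(hint_infer("What is the sum of the integers from 1 to 100?"))
```

Swapping `stub_model` for a real generation function would keep the same control flow. The sketch injects a single hint at a fixed point, which is a simplification chosen for brevity.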

