Alibaba's START (Self-Taught Reasoner with Tools) paper demonstrates how LLMs can integrate Python execution into their chain-of-thought reasoning. The approach injects strategic 'hints' during inference to prompt the model to write and run Python code, then refine answers based on execution results. Training involves two
• 8m watch time