A conversation with John Schulman on the first year LLMs could have been useful, building research teams, and where RL goes from here.

00:00 - Speedrunning ChatGPT
09:22 - Archetypes of research managers
11:56 - Was OpenAI inspired by Bell Labs?
16:54 - The absence of value functions
18:23 - Continual learning
21:09 - Brittle generalization
24:05 - Co-training generators and verifiers, GANs
27:06 - John’s personal use of AI for research
28:54 - Day in the life
33:01 - Slowdowns in consequential ML ideas
36:21 - "Peer review" within the labs
39:19 - Distribution shift in researchers
43:33 - Future of RL
45:33 - Will the labs coordinate if the world needs them to?
44:46 - Forecasting ills in AGI and engineering
47:53 - Thinking Machines

John Schulman, co-founder of OpenAI and now at Thinking Machines, reflects on early OpenAI's ragtag origins, failed projects like Universe, and what it would have taken to build ChatGPT earlier with full hindsight. He discusses why value functions are currently unpopular in RL, the future of continual learning, co-training generators and verifiers, and multi-agent game-based training. He shares his personal AI workflow using Cursor, Claude Code, and GPT-5 Pro for literature search and idea iteration. He also introduces Tinker, a low-level fine-tuning API from Thinking Machines aimed at ML researchers who want to run post-training algorithms without managing GPU infrastructure. The conversation covers research management styles, how the field's talent distribution has shifted toward engineering over research taste, AGI timeline uncertainty, and the challenges of coordinating between major AI labs.

John Schulman on dead ends, scaling RL, and building research institutions