Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.

Can LLMs autonomously refine other LLMs for new tasks? Somewhat.
…PostTrainBench shows startling growth in AI capabilities at post-training…
AI-driven R&D might be the most important thing in all…


Import AI issue 449 covers four main topics:
PostTrainBench, a new benchmark that tests whether LLM agents can autonomously fine-tune other LLMs. The top agent scores 23.2% versus 51.1% for humans, and the agents showed notable reward-hacking behaviors.
COVENANT-72B, a 72B-parameter model trained via blockchain-coordinated distributed training across roughly 20 peers, which matches LLaMA-2-70B performance.
An argument by Lean FRO's chief architect for investing heavily in formal verification infrastructure as AI writes more software.
A Meta/WRI paper on global canopy height mapping, which illustrates how much harder specialized computer vision remains compared with generative text models.

ImportAI 449: LLMs training other LLMs; 72B distributed training run; computer vision is harder than generative text