Import AI issue 449 covers four main topics: PostTrainBench, a new benchmark testing whether LLM agents can autonomously fine-tune other LLMs (top agent scores 23.2% vs 51.1% for humans, with notable reward hacking behaviors observed); COVENANT-72B, a 72B parameter model trained via blockchain-coordinated distributed training across ~20 peers that matches LLaMA-2-70B performance; an argument by Lean FRO's chief architect for investing heavily in formal verification infrastructure as AI writes more software; and a Meta/WRI paper on global canopy height mapping that illustrates how much harder specialized computer vision remains compared to generative text models.

16 min read. From jack-clark.net