LangChain shares their methodology for building evaluations for Deep Agents, an open-source, model-agnostic agent harness. The core principle is that more evals don't equal better agents: targeted evals that reflect real production behaviors do. They cover three areas: how they curate data (e.g., dogfooding, adapting external benchmarks), how they define metrics, and how they run evals.
Table of contents

- Evals shape agent behavior
- How we curate data
- How we define metrics
- How we run evals
- What's next