This study evaluates nine state-of-the-art LLMs in the context of a functional programming course taught in OCaml, a low-resource language. Three benchmarks were created: λCodeGen (homework programming problems), λRepair (programs with syntax, type, and logical errors drawn from student submissions), and λExplain (theoretical concept questions). Results show that the top LLMs perform well overall but solve fewer problems in OCaml than in Python or Java. LLMs excel at fixing syntax and type errors and at answering basic conceptual questions. The benchmarks aim to help instructors, students, and PL researchers understand both the limitations of LLMs and the opportunities for improving them in low-resource language settings.
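To make the λRepair task concrete, the following is a minimal hypothetical sketch (not an example from the benchmark itself) of the kind of type error an LLM would be asked to fix: OCaml distinguishes integer and floating-point arithmetic operators, a frequent source of student type errors.

```ocaml
(* Hypothetical illustration of a λRepair-style type error.
   OCaml's `+` operates only on int, so using it on floats
   fails to type-check. *)

(* Buggy version (does not compile): `+` expects int operands.
let sum = List.fold_left (fun acc x -> acc + x) 0.0 [1.0; 2.0; 3.0]
*)

(* Repaired version: use the float addition operator `+.`. *)
let sum = List.fold_left (fun acc x -> acc +. x) 0.0 [1.0; 2.0; 3.0]

let () = Printf.printf "%f\n" sum
```

A repair model is given the buggy version plus the compiler's error message and must produce the corrected program, as in the repaired version above.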

From programming-journal.org