A peer-reviewed study published in Methods in Ecology and Evolution tests how prompt quality affects LLM performance on ecological statistics tasks. Key findings: generic prompts led to correct statistical test selection less than 40% of the time, while detailed prompts achieved 90–100% accuracy and produced more consistent agent-generated R code. The authors recommend a three-stage workflow—choosing a statistical approach, planning implementation with a structured readme, and writing code—with separate detailed prompts at each stage. Practical tips include declaring an expert role upfront, avoiding multi-turn corrections, using prompt bootstrapping, and attaching vetted references. The authors stress that statistical expertise remains essential for evaluating LLM output, and advocate for LLM literacy as part of statistical training.
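The recommended three-stage workflow (approach, readme plan, code) can be sketched as a set of separate, detailed prompts. This is an illustrative sketch only: the function name `build_stage_prompts`, the `ask_llm` placeholder idea, and the prompt wording are assumptions, not the authors' exact templates.

```python
# Illustrative sketch of the three-stage prompting workflow, assuming
# each stage is sent as its own detailed prompt to some chat client.
# All names and wording here are hypothetical, not from the paper.

def build_stage_prompts(question: str, data_description: str) -> list[str]:
    # Declaring an expert role upfront, as the authors recommend.
    role = "You are an expert ecological statistician."
    return [
        # Stage 1: choose a statistical approach.
        f"{role}\nResearch question: {question}\n"
        f"Data: {data_description}\n"
        "Recommend an appropriate statistical test and justify the choice.",
        # Stage 2: plan the implementation as a structured readme.
        f"{role}\nWrite a structured readme planning the analysis step by "
        "step: data checks, model formula, assumptions to verify, outputs.",
        # Stage 3: write the code from the plan.
        f"{role}\nFollowing the readme, write complete, commented R code "
        "for the analysis, including checks of model assumptions.",
    ]

prompts = build_stage_prompts(
    "Does fertilizer treatment affect plant biomass?",
    "60 plots; biomass (g, continuous); treatment (3 levels); block (random).",
)
```

Each prompt would be sent in a fresh conversation rather than as multi-turn corrections, matching the authors' advice to restart with a better prompt instead of patching a bad answer.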
Table of contents
Why we wrote it
What we found
The workflow we recommend
General prompting tips
The important role for the scientist