Forge is a Python framework (forge-guardrails on PyPI) that adds a reliability layer to self-hosted LLM tool-calling and multi-step agentic workflows. It provides guardrails (rescue parsing, retry nudges, step enforcement), VRAM-aware context management, and tiered compaction to boost small local models (~8B) to competitive performance. Three usage modes are offered: a WorkflowRunner for full agentic loops, composable guardrails middleware for custom orchestration, and a drop-in OpenAI-compatible proxy server. Supported backends include Ollama, llama-server (llama.cpp), Llamafile, and Anthropic. The top self-hosted config (Ministral-3 8B Instruct Q8 on llama-server) scores 86.5% on forge's 26-scenario eval suite. The framework is backed by a published IEEE paper with an ablation study.

6m read timeFrom github.com
Post cover image
Table of contents
RequirementsInstallQuick StartProxy ServerBackendsRunning TestsEval HarnessProject StructureDocumentationPaperLicense

Sort: