Production-grade RAG pipelines with evaluation baked in — not bolted on after deployment.
Most RAG projects ship without evaluation, and most evaluation libraries don't help you build the pipeline. Few tools score maturity end-to-end — so teams often don't know if they're at "a demo that sometimes works" or "a system you can put in front of customers."
- Building a RAG pipeline is easy. Knowing whether it works is hard. RAG-Forge closes that loop.
- Eval is a first-class citizen, not an afterthought. Every template ships with a golden set and an audit gate.
- The RAG Maturity Model (RMM-0 → RMM-5) gives you a concrete scorecard for any RAG system — yours or someone else's.
RAG-Forge is one of the few toolkits that scaffolds production-ready RAG pipelines, runs continuous evaluation as a CI/CD gate, and scores any existing system against a published maturity model — all in one CLI.
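To make the idea of evaluation as a CI/CD gate concrete, here is a minimal sketch of a gate script, not RAG-Forge's own implementation. It assumes a report file shaped like `{"metrics": {...}}`; that shape, the metric names `recall_at_5` and `faithfulness`, and the threshold values are all hypothetical and chosen only for illustration.

```python
import json
import sys
from pathlib import Path

# Hypothetical thresholds: the keys and values are illustrative only
# and do not describe RAG-Forge's actual report schema.
THRESHOLDS = {"recall_at_5": 0.70, "faithfulness": 0.85}


def failing_metrics(metrics: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return names of metrics that are missing or not strictly above their threshold."""
    return [
        name
        for name, minimum in thresholds.items()
        if metrics.get(name, float("-inf")) <= minimum
    ]


def gate(report_path: str) -> int:
    """Return a process exit code: 0 if every threshold is met, 1 otherwise."""
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    failures = failing_metrics(report.get("metrics", {}), THRESHOLDS)
    for name in failures:
        print(f"FAIL {name}: not above threshold {THRESHOLDS[name]}", file=sys.stderr)
    return 1 if failures else 0
```

A CI job could call `sys.exit(gate(path_to_report))` so that a non-zero exit code fails the build whenever a metric regresses.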
The RMM is the scoring framework at the heart of RAG-Forge. Run rag-forge assess on any audit report to see where your system sits.
| Level | Name | Exit Criteria |
|---|---|---|
| RMM-0 | Naive | Basic vector search works |
| RMM-1 | Better Recall | Hybrid search, Recall@5 > 70% |
| RMM-2 | Better Precision | Reranker active, nDCG@10 +10% |
| RMM-3 | Better Trust | Guardrails, faithfulness > 85% |
| RMM-4 | Better Workflow | Caching, P95 < 4s, cost tracking |
| RMM-5 | Enterprise | Drift detection, CI/CD gates, adversarial tests |
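Several exit criteria in the table rest on standard retrieval and latency metrics. The sketch below shows one common way to compute Recall@k, nDCG@k, and a P95 latency. The function names are hypothetical, nDCG uses linear gain with a log2 discount, and P95 uses the nearest-rank method; RAG-Forge's evaluator may define these differently.

```python
import math


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def ndcg_at_k(retrieved: list[str], gains: dict[str, float], k: int) -> float:
    """Normalized discounted cumulative gain over the top-k results (linear gain)."""
    dcg = sum(
        gains.get(doc_id, 0.0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(retrieved[:k])
    )
    # The ideal ranking places the highest-gain documents first.
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2) for rank, gain in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def p95(latencies_s: list[float]) -> float:
    """95th percentile of the samples using the nearest-rank method."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]
```

Under these definitions, the RMM-1 criterion would correspond to `recall_at_k(results, relevant, 5) > 0.70` averaged over the golden set, and the RMM-4 latency criterion to `p95(latencies) < 4.0`.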
npm install -g @rag-forge/cli
# Scaffold a project (use --directory to name the folder)
rag-forge init basic --directory my-rag-project
cd my-rag-project
# Drop your documents into a folder of your choice (or use the example below)
mkdir docs
echo "RAG-Forge is a CLI for building and evaluating RAG pipelines." > docs/example.md
rag-forge index --source ./docs
rag-forge audit --golden-set eval/golden_set.json
rag-forge assess --audit-report reports/audit-report.json
From empty directory to a scored RAG system with a golden set and an audit report, in under a minute.
CLI (Node.js 20+):
npm install -g @rag-forge/cli
Python packages (Python 3.11+):
pip install rag-forge-core rag-forge-evaluator rag-forge-observability
| Template | Use Case |
|---|---|
| basic | First RAG project, simple Q&A |
| hybrid | Production-ready document Q&A with reranking |
| agentic | Multi-hop reasoning with query decomposition |
| enterprise | Regulated industries with full security suite |
| n8n | AI automation agency deployments |
Templates generate editable source code in your project — not framework dependencies. Fork the code, not the abstraction.
| Category | Commands |
|---|---|
| Scaffolding | init, add |
| Ingestion | parse, chunk, index |
| Query | query, inspect |
| Evaluation | audit, assess, golden add, golden validate |
| Operations | report, cache stats, drift report, cost |
| Security | guardrails test, guardrails scan-pii |
| Integration | serve --mcp, n8n export |
Run rag-forge --help for the full command reference.
There are great tools in this space. Here's an honest look at where each fits.
| Capability | RAG-Forge | RAGAS | LangChain Eval | Giskard |
|---|---|---|---|---|
| Scaffolds a RAG pipeline | ✓ | — | — | — |
| Evaluation metrics | ✓ | ✓ | ✓ | ✓ |
| Maturity scoring (RMM-0 → 5) | ✓ | — | — | — |
| CI gate workflow (audit action) | ✓ | — | partial | partial |
| MCP server | ✓ | — | — | — |
| Guardrails / PII scanning | ✓ | — | partial | ✓ |
| Drift detection | ✓ | — | — | partial |
| Multi-language (TS + Python) | ✓ | — | ✓ | — |
| Framework-agnostic | ✓ | ✓ | — | ✓ |
Peer strengths worth knowing:
- RAGAS has deeper metric research and a large community. RAG-Forge's evaluator supports RAGAS as a backend: run rag-forge audit --evaluator ragas to use it directly.
- LangChain Eval has the broadest ecosystem of integrations if you're already invested in LangChain.
- Giskard has a strong general-purpose ML testing story beyond RAG.
Pick the tool that matches your stage. RAG-Forge's wedge is the full lifecycle — scaffold → evaluate → score → ship — in one CLI, with the RMM as the objective function.
RAG-Forge is a polyglot monorepo. The CLI and MCP server are TypeScript; all RAG logic is Python. The CLI delegates to Python via a subprocess bridge so the two halves can be developed and versioned independently.
rag-forge/
├── packages/
│ ├── cli/ TypeScript — Commander.js CLI (rag-forge command)
│ ├── mcp/ TypeScript — MCP server (@modelcontextprotocol/sdk)
│ ├── core/ Python — RAG pipeline primitives
│ ├── evaluator/ Python — RAGAS + DeepEval + LLM-as-Judge
│ └── observability/ Python — OpenTelemetry + Langfuse
├── templates/ Project templates (basic, hybrid, agentic, enterprise, n8n)
└── apps/site/ Docs and marketing site (Next.js, deployed to Vercel)
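To illustrate the subprocess bridge described above, here is a minimal sketch of what the Python side of such a bridge could look like. It is not the project's actual protocol: the `--args-json` flag, the `HANDLERS` table, and `handle_index` are hypothetical names. The idea is that the caller spawns the Python process with a command name and a JSON payload, then parses a single JSON document from stdout.

```python
import argparse
import json
import sys
from typing import Callable


def handle_index(args: dict) -> dict:
    # Placeholder handler: a real one would call into the indexing code.
    return {"status": "ok", "source": args.get("source")}


# Hypothetical command table mapping command names to handlers.
HANDLERS: dict[str, Callable[[dict], dict]] = {"index": handle_index}


def run(argv: list[str]) -> tuple[int, str]:
    """Dispatch one command and return (exit_code, JSON text for stdout)."""
    parser = argparse.ArgumentParser(prog="bridge")
    parser.add_argument("command")
    parser.add_argument("--args-json", default="{}")
    parsed = parser.parse_args(argv)

    handler = HANDLERS.get(parsed.command)
    if handler is None:
        message = f"unknown command: {parsed.command}"
        return 2, json.dumps({"status": "error", "message": message})
    return 0, json.dumps(handler(json.loads(parsed.args_json)))


def main() -> None:
    # Entry point for the spawned process: one JSON document on stdout,
    # with the exit code signalling success or failure to the caller.
    code, payload = run(sys.argv[1:])
    print(payload)
    sys.exit(code)
```

Keeping the contract to "arguments in, one JSON document out" is one way to let the TypeScript and Python halves evolve independently, since neither side needs to import the other.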
See docs/architecture.md for a deeper dive.
- 📚 Docs: https://rag-forge-docs.vercel.app/
- 🌐 Website: https://rag-forge-site.vercel.app/
- 💬 Discussions: https://github.com/hallengray/rag-forge/discussions
- 🔒 Security: see SECURITY.md
- 📝 Changelog: docs/release-notes
See CONTRIBUTING.md for development setup and contribution guidelines. All contributors are expected to follow our Code of Conduct.
MIT — see LICENSE