feat: per-turn tool retrieval (trim tool-definition context flood)#858
feat: per-turn tool retrieval (trim tool-definition context flood)#858anandgupta42 wants to merge 3 commits into
Conversation
…rieval, critic gate) Three flag-gated, independent, default-off reliability features for the agent loop: - `provider/constrained.ts` — grammar/JSON-Schema constrained decoding so a local model (vLLM/LM Studio/llama.cpp) is forced to emit a parseable, schema-correct tool call. Pure (schema in → payload out). - `tool/retrieval.ts` — per-turn tool subset (always-on core + lexical top-k), never dropping a tool referenced mid-trajectory; trims the ~78-tool context flood. v1 lexical, dependency-free, deterministic. - `tool/critic.ts` — pre-execution gate for side-effecting tools via a pluggable `Verifier`; default allows everything (ungated), a real verifier is injected. Wired flag-gated into `session/llm.ts` (markers; default off → upstream path unchanged). 18 unit tests; typecheck clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdds deterministic lexical-ranked tool selection to reduce context during inference and wires it conditionally into ChangesTool Retrieval and LLM Integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
The `ALTIMATE_CRITIC_GATE` flag was a no-op — `tool/critic.ts` was never imported into the execute path, so enabling the flag did nothing. Removing it from this PR so every shipped flag is actually wired: - `ALTIMATE_TOOL_RETRIEVAL` — wired in `session/llm.ts`, validated (-50% input tokens at equal resolve) - `ALTIMATE_CONSTRAINED_TOOLCALLS` — wired in `session/llm.ts` (local providers) The critic gate (pre-execution `Verifier` for side-effecting tools) moves to a follow-up that wires it into the `session/prompt.ts` execute wrapper with an integration test. Code preserved on `feat/critic-gate`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/opencode/src/tool/retrieval.ts (1)
55-76: ⚡ Quick win
topkbudget is mostly consumed by CORE.
keep(CORE + callerkeep) counts againsttopk, and CORE alone is 10 entries. With the defaulttopk: 12, an enabled retrieval pass exposes only ~2 lexically-ranked non-core tools out of ~78 — which may starve the selection of task-relevant tools and undercut the feature's intent. Consider treatingtopkas a budget for retrieved tools beyond core/keep, or raising the default.♻️ One option: budget retrieved tools separately from core/keep
- for (const r of ranked) { - if (keep.size >= topk) break - keep.add(r.name) - } + const limit = keep.size + topk // topk additional tools beyond core + forced-keep + for (const r of ranked) { + if (keep.size >= limit) break + keep.add(r.name) + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opencode/src/tool/retrieval.ts` around lines 55 - 76, The current select function lets CORE and caller-provided opts.keep consume the topk budget, so CORE (10 items) leaves almost no room for lexically-ranked tools; adjust select so topk (and minToolsToRetrieve behavior) applies only to additional retrieved tools beyond CORE/keep: compute baseKeep from CORE and opts.keep, then compute retrievedBudget = (opts.topk ?? 12) - baseKeep.size (clamped to >=0) or, better, treat topk as the number of non-core tools directly (e.g., retrievedTopk = opts.topk ?? 12) and only count items added from ranked into that retrievedTopk; update the loop that iterates over ranked (and any minToRetrieve logic) to stop after retrievedBudget/retrievedTopk is filled while still always including all CORE/keep entries; reference function select, variables topk/minToRetrieve, CORE, keep, rest, ranked, and score to locate and modify the selection and stopping logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/opencode/src/tool/retrieval.ts`:
- Around line 17-20: Retrieval.CORE contains "ls" which doesn't match any
registered Tool id, so update the CORE array in retrieval.ts: either replace
"ls" with the actual registered tool id used when defining the listing tool
(e.g., the id passed to Tool.define for the filesystem/ls-like tool) or remove
"ls" from Retrieval.CORE; ensure the change aligns with the select() behavior
that only keeps CORE entries present in the candidate set so the tool id in
Retrieval.CORE must exactly match a Tool.define(...) id.
---
Nitpick comments:
In `@packages/opencode/src/tool/retrieval.ts`:
- Around line 55-76: The current select function lets CORE and caller-provided
opts.keep consume the topk budget, so CORE (10 items) leaves almost no room for
lexically-ranked tools; adjust select so topk (and minToolsToRetrieve behavior)
applies only to additional retrieved tools beyond CORE/keep: compute baseKeep
from CORE and opts.keep, then compute retrievedBudget = (opts.topk ?? 12) -
baseKeep.size (clamped to >=0) or, better, treat topk as the number of non-core
tools directly (e.g., retrievedTopk = opts.topk ?? 12) and only count items
added from ranked into that retrievedTopk; update the loop that iterates over
ranked (and any minToRetrieve logic) to stop after retrievedBudget/retrievedTopk
is filled while still always including all CORE/keep entries; reference function
select, variables topk/minToRetrieve, CORE, keep, rest, ranked, and score to
locate and modify the selection and stopping logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 09c66980-6898-48d1-8b44-a89331a61b52
📒 Files selected for processing (5)
packages/opencode/src/provider/constrained.tspackages/opencode/src/session/llm.tspackages/opencode/src/tool/retrieval.tspackages/opencode/test/provider/constrained.test.tspackages/opencode/test/tool/retrieval.test.ts
| export const CORE = [ | ||
| "bash", "read", "write", "edit", "glob", "grep", "ls", | ||
| "task", "todowrite", "skill", | ||
| ] |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# List registered tool IDs to compare against CORE.
rg -nP "id:\s*['\"][a-z_]+['\"]" --type=ts -g '!**/*.test.ts' packages/opencode/src/tool || true
# Also look for how tool ids/names are declared
ast-grep --pattern 'Tool.define($$$)'
rg -nP '\b(todowrite|todo_write|todoread)\b' --type=ts -g '!**/*.test.ts'Repository: AltimateAI/altimate-code
Length of output: 50381
🏁 Script executed:
#!/bin/bash
set -euo pipefail
CORE=(bash read write edit glob grep ls task todowrite skill)
# Show where each CORE id is registered via Tool.define("<id>", ...)
for id in "${CORE[@]}"; do
echo "==== $id ===="
rg -n "Tool\.define\\(\"${id//\//\\/}\"" packages/opencode/src/tool -S --type=ts || true
done
# Show the CORE constant and retrieval selection logic for context
echo "==== CORE constant ===="
rg -n "export const CORE" packages/opencode/src/tool/retrieval.ts
sed -n '1,120p' packages/opencode/src/tool/retrieval.ts | nl -ba | sed -n '1,120p'
echo "==== select() logic (if present) ===="
rg -n "function select|const select|select\\(" packages/opencode/src/tool/retrieval.ts
sed -n '120,260p' packages/opencode/src/tool/retrieval.ts | nl -ba | sed -n '120,260p'Repository: AltimateAI/altimate-code
Length of output: 1149
🏁 Script executed:
#!/bin/bash
set -euo pipefail
CORE=(bash read write edit glob grep ls task todowrite skill)
# Show where each CORE id is registered via Tool.define("<id>", ...)
for id in "${CORE[@]}"; do
echo "==== $id ===="
rg -n "Tool\.define\\(\"${id//\//\\/}\"" packages/opencode/src/tool -S --type=ts || true
done
# Show the CORE constant and retrieval selection logic for context
echo "==== CORE constant ===="
rg -n "export const CORE" packages/opencode/src/tool/retrieval.ts
sed -n '1,120p' packages/opencode/src/tool/retrieval.ts | nl -ba | sed -n '1,120p'
echo "==== select() logic (if present) ===="
rg -n "function select|const select|select\\(" packages/opencode/src/tool/retrieval.ts
sed -n '120,260p' packages/opencode/src/tool/retrieval.ts | nl -ba | sed -n '120,260p'Repository: AltimateAI/altimate-code
Length of output: 1149
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find the "ls" tool registration id
echo "==== registrations for ls ===="
rg -n "Tool\\.define\\(\"ls\"|Tool\\.define\\('ls'" packages/opencode/src/tool packages/opencode/src -S --type=ts || true
echo "==== any use of id: \"ls\" (Tool.define arg objects etc.) ===="
rg -n "(\"ls\")" packages/opencode/src/tool/retrieval.ts packages/opencode/src/tool -S --type=ts | head -n 200 || true
echo "==== retrieval.ts (CORE + select logic) ===="
cat -n packages/opencode/src/tool/retrieval.ts | sed -n '1,260p'Repository: AltimateAI/altimate-code
Length of output: 3714
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "==== Tool.define(<id>, ...) ids in packages/opencode/src ===="
rg -n "Tool\.define\(\s*(['\"])([a-z_]+)\1" packages/opencode/src -S --type=ts --type=tsx || true
echo "==== Any tool registry mentions of \"ls\" in packages/opencode/src ===="
rg -n "\"ls\"" packages/opencode/src -S --type=ts --type=tsx || trueRepository: AltimateAI/altimate-code
Length of output: 480
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "==== Tool.define IDs (capturing first arg) — filter for ls ===="
rg -n --pcre2 "Tool\.define\(\s*(['\"])([a-z_]+)\1" packages/opencode/src -S -g'*.ts' -g'*.tsx' \
| rg -n "\"ls\"|'ls'" || true
echo "==== Direct Tool.define(\"ls\" / 'ls') matches ===="
rg -n --pcre2 "Tool\.define\(\s*(['\"])ls\1" packages/opencode/src -S -g'*.ts' -g'*.tsx' || true
echo "==== Any tool-id string literal \"ls\" near tool code ===="
rg -n "\"ls\"" packages/opencode/src/tool -S -g'*.ts' -g'*.tsx' || trueRepository: AltimateAI/altimate-code
Length of output: 341
Fix Retrieval.CORE tool id mismatch for ls.
packages/opencode/src/tool/retrieval.ts includes "ls" in Retrieval.CORE, but there’s no Tool.define("ls", ...) (the only "ls" occurrence is inside CORE). Since select() force-keeps CORE entries only when they exist in the candidate set (if (all.has(n))), "ls" won’t be reliably “always-on” when retrieval is enabled—update "ls" to the actual registered tool id (or remove it).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/opencode/src/tool/retrieval.ts` around lines 17 - 20, Retrieval.CORE
contains "ls" which doesn't match any registered Tool id, so update the CORE
array in retrieval.ts: either replace "ls" with the actual registered tool id
used when defining the listing tool (e.g., the id passed to Tool.define for the
filesystem/ls-like tool) or remove "ls" from Retrieval.CORE; ensure the change
aligns with the select() behavior that only keeps CORE entries present in the
candidate set so the tool id in Retrieval.CORE must exactly match a
Tool.define(...) id.
There was a problem hiding this comment.
3 issues found across 5 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/tool/retrieval.ts">
<violation number="1" location="packages/opencode/src/tool/retrieval.ts:18">
P1: Core retrieval list uses `"ls"`, but the actual tool name is `"list"`; this can cause the list tool to be incorrectly filtered out.</violation>
<violation number="2" location="packages/opencode/src/tool/retrieval.ts:44">
P2: Scoring ignores 3-character keywords, which can miss relevant tools for common queries (e.g., sql/dbt).</violation>
</file>
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
| export namespace Retrieval { | ||
| /** Always-available agent essentials — never retrieved out. */ | ||
| export const CORE = [ | ||
| "bash", "read", "write", "edit", "glob", "grep", "ls", |
There was a problem hiding this comment.
P1: Core retrieval list uses "ls", but the actual tool name is "list"; this can cause the list tool to be incorrectly filtered out.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/tool/retrieval.ts, line 18:
<comment>Core retrieval list uses `"ls"`, but the actual tool name is `"list"`; this can cause the list tool to be incorrectly filtered out.</comment>
<file context>
@@ -0,0 +1,77 @@
+export namespace Retrieval {
+ /** Always-available agent essentials — never retrieved out. */
+ export const CORE = [
+ "bash", "read", "write", "edit", "glob", "grep", "ls",
+ "task", "todowrite", "skill",
+ ]
</file context>
| const words = new Set(query.toLowerCase().match(/[a-z_]+/g) ?? []) | ||
| const hay = (t.name + " " + (t.description ?? "")).toLowerCase() | ||
| let s = 0 | ||
| for (const w of words) if (w.length > 3 && hay.includes(w)) s += 1 |
There was a problem hiding this comment.
P2: Scoring ignores 3-character keywords, which can miss relevant tools for common queries (e.g., sql/dbt).
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/tool/retrieval.ts, line 44:
<comment>Scoring ignores 3-character keywords, which can miss relevant tools for common queries (e.g., sql/dbt).</comment>
<file context>
@@ -0,0 +1,77 @@
+ const words = new Set(query.toLowerCase().match(/[a-z_]+/g) ?? [])
+ const hay = (t.name + " " + (t.description ?? "")).toLowerCase()
+ let s = 0
+ for (const w of words) if (w.length > 3 && hay.includes(w)) s += 1
+ // small boost for a direct name mention
+ if (words.has(t.name.toLowerCase())) s += 3
</file context>
Constrained tool-call decoding is local-providers-only and has no validation run behind it yet (the A/B that justifies this PR measured tool retrieval). Removing it so the validated retrieval lever can land clean; constrained moves to its own branch/PR pending a local vLLM guided-decoding run. - remove `provider/constrained.ts` + its test - remove the constrained wiring + import from `session/llm.ts` (retrieval stays) #858 is now retrieval-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dev-punia-altimate
left a comment
There was a problem hiding this comment.
Multi-Persona Review — Verdict: block
The PR contains a critical syntax error that will cause runtime failure, and lacks proper error handling for destructive database operations. While the performance optimization is well-researched and aligned with best practices, the code as written will break production migrations.
15/15 agents completed · 236s · 9 findings (1 critical, 2 high, 1 medium)
Critical
- [code-reviewer] Incomplete string literal: 'workspace_id, endpoint_' is cut off mid-line, likely due to truncation in the diff. This will cause a SyntaxError when executed. →
alembic_tenant_clickhouse/migrations/versions/2026_06_01_0030_databricks_reorder_clickhouse_sort_keys.py:358- 💡 Complete the column name after 'endpoint_' — likely 'endpoint_id' or similar — to form a valid tuple element in the leading columns list.
High
- [tech-lead] Migration script directly executes raw SQL via SQLAlchemy text() without abstraction or validation layer, making it brittle and hard to test. →
alembic_tenant_clickhouse/migrations/versions/2026_06_01_0030_databricks_reorder_clickhouse_sort_keys.py:150- 💡 Extract table rebuild logic into a reusable utility in app/utils/clickhouse_migrator.py with input validation and logging, then call from migration. This improves testability and maintainability.
- [tech-lead] No explicit error handling around EXCHANGE TABLES or DROP operations; failure could leave orphaned _new tables or corrupt state. →
alembic_tenant_clickhouse/migrations/versions/2026_06_01_0030_databricks_reorder_clickhouse_sort_keys.py:280- 💡 Wrap table exchange and drop in try/except blocks with rollback logic and explicit logging of state before/after critical operations.
Medium
- [tech-lead] TABLE_PLANS dictionary contains 27 entries with inline string tuples; hard to validate and error-prone if column names change. →
alembic_tenant_clickhouse/migrations/versions/2026_06_01_0030_databricks_reorder_clickhouse_sort_keys.py:55- 💡 Define a dataclass or named tuple for TablePlan and validate column names against system.tables during migration startup to catch typos early.
Multi-Persona Review · vllm:qwen3-next-80b (waves) + vllm-fallback (synth) ·
What does this PR do?
Adds a flag-gated (
ALTIMATE_TOOL_RETRIEVAL, default OFF) per-turn tool subset: an always-on core set plus a lexically-ranked top-k of the remaining tools, never dropping a tool already referenced mid-trajectory. This trims the ~78-tool definition flood that hurts tool selection and inflates input tokens. v1 is lexical, dependency-free, and deterministic. Wired intosession/llm.ts(marker-wrapped); a no-op when the flag is unset.Validation (measured)
Tool retrieval — A/B, 8 ADE-bench dbt tasks, deepseek-v4-flash, value-graded (output checked for correctness, not just "builds"):
ALTIMATE_TOOL_RETRIEVAL=1→ identical resolve-rate (0 tasks differ), −50% input tokens, −49% cost.
Type of change
Issue for this PR
Closes #857
How did you verify your code works?
test/tool/retrieval.test.ts) — selection logic incl. always-on core + referenced-tool retention; all green.tsgotypecheck clean;altimate_changemarkers insession/llm.tsbalanced (7/7); default-off so the non-flagged path is unchanged.Checklist