fix(agents): raise step limit and centralize on one global default #1389
kovtcharov-amd wants to merge 4 commits into
Most agents capped at max_steps=10 (some 5/6) and the Agent UI hardcoded 10 in four more places, so multi-step agents (e.g. the browser agent) ran out of steps mid-task while the CLI used 100 and background ticks 20 — the limit was both too low and inconsistent across entry points. Introduce default_max_steps() in base/agent.py as the single source of truth (DEFAULT_MAX_STEPS=50, overridable at runtime via GAIA_AGENT_MAX_STEPS) and have every agent config inherit it via field(default_factory=...), plus the base Agent, the UI chat paths, and the autonomous-tick loop. Agents that intentionally need more keep their explicit override (CodeAgent=100, EMR=50). A present-but-invalid env value raises instead of silently capping.
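A minimal sketch of the helper described above, for readers who want to see the shape of it. Only `default_max_steps()`, `DEFAULT_MAX_STEPS=50`, `GAIA_AGENT_MAX_STEPS`, and the use of `field(default_factory=...)` come from the commit message. The `ExampleAgentConfig` class, the choice of `ValueError`, and treating an empty value as unset are assumptions made for illustration, not the actual code in base/agent.py.

```python
import os
from dataclasses import dataclass, field

DEFAULT_MAX_STEPS = 50
ENV_VAR = "GAIA_AGENT_MAX_STEPS"


def default_max_steps() -> int:
    """Return the global step limit, honoring the env override when set."""
    raw = os.environ.get(ENV_VAR)
    if raw is None or raw.strip() == "":
        # Assumption: unset or empty means "no override".
        return DEFAULT_MAX_STEPS
    try:
        value = int(raw)
    except ValueError:
        # Fail loudly on a typo instead of silently falling back.
        raise ValueError(
            f"{ENV_VAR} must be a positive integer, got {raw!r}"
        ) from None
    if value <= 0:
        raise ValueError(f"{ENV_VAR} must be a positive integer, got {raw!r}")
    return value


@dataclass
class ExampleAgentConfig:  # hypothetical config class, not a real GAIA config
    # default_factory defers the lookup to instantiation time, so an env
    # override set after import is still picked up.
    max_steps: int = field(default_factory=default_max_steps)
```

Using `default_factory` rather than `max_steps: int = default_max_steps()` matters here: a plain default would be evaluated once when the class is defined and would freeze whatever the environment held at import time.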
Review: fix(agents): raise step limit and centralize on one global default
Summary
Solid, well-tested refactor that does most of what it promises:
The one thing the author should know: the headline claim - "
Issues
🟡 Centralization is incomplete -
Verified:
Two clean options:
Either is fine, but right now the docs and the code disagree, and the env override silently no-ops on the primary launch path (the very
🟢 Blender scene creation lost its step headroom (
🟢 New env var isn't documented (
Strengths
Verdict
Request changes: the core refactor is clean and well-tested, but the CLI gap means the PR's central promise (fleet-wide env override) is materially untrue on the primary launch path. Either thread the default through
The CLI's shared --max-steps defaulted to 100 and was always passed explicitly, so the new global default and GAIA_AGENT_MAX_STEPS were ignored for gaia chat/browse/analyze/blender — the env override silently no-op'd on the primary launch path. Default the flag to None so it falls through to default_max_steps() in Agent.__init__, making the env var work everywhere. Document the knob (default 50, env override, fail-loud) in the CLI reference.
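The fall-through described in this commit can be sketched roughly as follows. The `--max-steps` flag, the `None` default, and resolution inside `Agent.__init__` are taken from the commit message. The parser wiring, the `make_agent` helper, and the simplified `default_max_steps()` stand-in (which skips the positive-value check) are hypothetical and only illustrate the idea.

```python
import argparse
import os


def default_max_steps() -> int:
    # Simplified stand-in for the real helper: env override, else 50.
    return int(os.environ.get("GAIA_AGENT_MAX_STEPS") or 50)


class Agent:  # stand-in for the base Agent
    def __init__(self, max_steps=None):
        # None means "not specified", so resolve the global default here.
        self.max_steps = default_max_steps() if max_steps is None else max_steps


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gaia-sketch")
    # default=None (not 100) lets the agent tell "flag omitted" apart
    # from an explicit value.
    parser.add_argument("--max-steps", type=int, default=None)
    return parser


def make_agent(argv) -> Agent:  # hypothetical helper
    args = build_parser().parse_args(argv)
    return Agent(max_steps=args.max_steps)
```

With this shape, `make_agent([])` picks up the env var or the global default, while `make_agent(["--max-steps", "60"])` still wins over both.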
The Code Quality (Lint) job runs pylint over all of src/gaia and was already failing on main: #1355 left a mutable-default (W0102) in quality_metrics.py and several Protocol-parity unused-argument signatures (W0613) in the Outlook/email-MCP backends. Fix the mutable default with a None sentinel and annotate the interface-parity stubs, so the gate is green for this PR and every other open PR.
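For context, the W0102 fix follows the usual None-sentinel pattern. The functions below are hypothetical and are not the code in quality_metrics.py; they only show why pylint flags a mutable default and what the sentinel changes.

```python
from typing import List, Optional


def record_metric_buggy(name: str, bucket: List[str] = []) -> List[str]:
    # W0102: the default list is created once and shared across calls.
    bucket.append(name)
    return bucket


def record_metric(name: str, bucket: Optional[List[str]] = None) -> List[str]:
    # None sentinel: a fresh list is created on every call that omits bucket.
    if bucket is None:
        bucket = []
    bucket.append(name)
    return bucket
```

The buggy variant accumulates entries across unrelated calls, which is the failure mode the lint rule guards against.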
Thanks for the thorough review. I addressed the 🟡 and both 🟢s, plus the two failing checks (neither was caused by the original change).
🟡 CLI gap: fixed (
🟢 Blender doubling: intentional simplification. The old
🟢 Env var docs: added (
On the red checks:
🟡
This contradicts the documentation added in this same PR (
Fix: widen the guard in
The same pattern should be applied to
The CLI now passes max_steps=None to mean 'use the global default', but CodeAgent/JiraAgent/DockerAgent/EMR guarded their override with 'if "max_steps" not in kwargs' (and EMR with setdefault), which treats an explicit None as a provided value. That skipped the override and let Agent.__init__ resolve None to the global default — e.g. CodeAgent's 100-step budget would silently drop to 50. Treat None as 'no override' so intentional per-agent limits survive the sentinel.
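The difference between the three guards can be shown in isolation. The guard expressions (`"max_steps" not in kwargs`, `setdefault`, and `kwargs.get("max_steps") is None`) are from this thread. The `resolve_*` wrapper functions are hypothetical and exist only so each guard can be called side by side.

```python
def resolve_not_in(kwargs: dict, override: int):
    # Old guard: an explicit max_steps=None counts as "provided".
    kwargs = dict(kwargs)
    if "max_steps" not in kwargs:
        kwargs["max_steps"] = override
    return kwargs["max_steps"]


def resolve_setdefault(kwargs: dict, override: int):
    # Old EMR-style guard: setdefault also leaves an existing None in place.
    kwargs = dict(kwargs)
    kwargs.setdefault("max_steps", override)
    return kwargs["max_steps"]


def resolve_none_aware(kwargs: dict, override: int):
    # New guard: a missing key and an explicit None both mean "no override given".
    kwargs = dict(kwargs)
    if kwargs.get("max_steps") is None:
        kwargs["max_steps"] = override
    return kwargs["max_steps"]
```

With `{"max_steps": None}` as input, the first two return `None` (which the base class would then resolve to the global default), while the third returns the per-agent override. An explicit integer passes through all three unchanged.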
Good catch - fixed in

    if kwargs.get("max_steps") is None:
        kwargs["max_steps"] = 100  # CodeAgent; 50 for EMR; default_max_steps() for Jira/Docker

One nuance on reachability:
Why this matters
Agents were running out of steps mid-task. The browser agent (and others) stopped at 10 steps and told users to re-run with --max-steps 60, even on routine multi-step work. The cap was hardcoded low almost everywhere (10 for most agents, 5–6 for a couple) and the Agent UI pinned 10 in four more places, while the CLI used 100 and background ticks 20. So the limit was both too low and inconsistent depending on how you launched the agent.
After this change there's one knob: default_max_steps() in base/agent.py (default 50, overridable at runtime with GAIA_AGENT_MAX_STEPS=<n>). Every agent config inherits it, as do the base Agent, the UI chat paths, and the autonomous-tick loop, so retuning the whole fleet is a one-line change or a single env var, with no per-agent edits. Agents that genuinely need more keep their explicit override (CodeAgent=100, EMR=50), and a typo'd env value now fails loudly instead of silently capping.
Test plan
pytest tests/unit/agents/test_default_max_steps.py: new helper tests (default, env override, empty, loud-failure on non-int/non-positive, configs inherit override)
pytest tests/unit/agents/: existing config-default tests updated to assert against the helper; all green (pre-existing context-overflow + Windows-encoding failures are unrelated, confirmed failing on main)
python util/lint.py --black --isort clean; pyflakes clean on all touched files
GAIA_AGENT_MAX_STEPS=5 and confirm every agent caps at 5