fix(agents): raise step limit and centralize on one global default#1389

Open
kovtcharov-amd wants to merge 4 commits into main from kalin/agent-max-steps-global-default

Conversation

@kovtcharov-amd
Collaborator

Why this matters

Agents were running out of steps mid-task. The browser agent (and others) stopped at 10 steps and told users to re-run with --max-steps 60, even on routine multi-step work. The cap was hardcoded low almost everywhere (10 for most agents, 5 or 6 for a couple), and the Agent UI pinned 10 in four more places, while the CLI used 100 and background ticks used 20. So the limit was both too low and inconsistent depending on how you launched the agent.

After this change there's one knob: default_max_steps() in base/agent.py (default 50, overridable at runtime with GAIA_AGENT_MAX_STEPS=<n>). Every agent config inherits it, as do the base Agent, the UI chat paths, and the autonomous-tick loop — so retuning the whole fleet is a one-line change or a single env var, no per-agent edits. Agents that genuinely need more keep their explicit override (CodeAgent=100, EMR=50), and a typo'd env value now fails loudly instead of silently capping.
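
A minimal sketch of what such a helper could look like, assuming only the behaviour described above: default 50, GAIA_AGENT_MAX_STEPS override, loud failure on a non-integer or non-positive value. Treating an empty value as unset, the ValueError type, and the message wording are assumptions here; the actual code in base/agent.py may differ.

```python
import os

DEFAULT_MAX_STEPS = 50
ENV_VAR = "GAIA_AGENT_MAX_STEPS"


def default_max_steps() -> int:
    """Return the global step limit, honouring the env override if present."""
    raw = os.environ.get(ENV_VAR, "").strip()
    if not raw:
        # Unset or empty (assumption): fall back to the built-in default.
        return DEFAULT_MAX_STEPS
    try:
        value = int(raw)
    except ValueError:
        # Present but not an integer: fail loudly instead of silently capping.
        raise ValueError(
            f"{ENV_VAR}={raw!r} is not an integer; set a positive integer "
            f"or unset it to use the default of {DEFAULT_MAX_STEPS}."
        ) from None
    if value <= 0:
        raise ValueError(
            f"{ENV_VAR}={raw!r} must be a positive integer; "
            f"unset it to use the default of {DEFAULT_MAX_STEPS}."
        )
    return value
```

Reading the variable inside the function, rather than once at import, is what lets a value set later in the process still take effect.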

Test plan

  • pytest tests/unit/agents/test_default_max_steps.py — new helper tests (default, env override, empty, loud-failure on non-int/non-positive, configs inherit override)
  • pytest tests/unit/agents/ — existing config-default tests updated to assert against the helper; all green (pre-existing context-overflow + Windows-encoding failures are unrelated, confirmed failing on main)
  • python util/lint.py --black --isort clean; pyflakes clean on all touched files
  • Manual: launch the browser agent in the Agent UI and confirm it no longer stops at 10 steps; set GAIA_AGENT_MAX_STEPS=5 and confirm every agent caps at 5

Most agents capped at max_steps=10 (some 5/6) and the Agent UI hardcoded
10 in four more places, so multi-step agents (e.g. the browser agent) ran
out of steps mid-task while the CLI used 100 and background ticks 20 — the
limit was both too low and inconsistent across entry points.

Introduce default_max_steps() in base/agent.py as the single source of
truth (DEFAULT_MAX_STEPS=50, overridable at runtime via GAIA_AGENT_MAX_STEPS)
and have every agent config inherit it via field(default_factory=...),
plus the base Agent, the UI chat paths, and the autonomous-tick loop.
Agents that intentionally need more keep their explicit override
(CodeAgent=100, EMR=50). A present-but-invalid env value raises instead of
silently capping.
@github-actions github-actions Bot added the tests and agents labels Jun 3, 2026
@github-actions
Contributor

github-actions Bot commented Jun 3, 2026

Review: fix(agents): raise step limit and centralize on one global default

Summary

Solid, well-tested refactor that does most of what it promises: default_max_steps() in base/agent.py is now the single source of truth, every agent config inherits it via field(default_factory=default_max_steps), the base Agent falls back to it when max_steps=None, and the UI chat paths + autonomous-tick loop correctly stop pinning 10/20. The fail-loud handling of a typo'd env value is exactly right per the project's no-silent-fallbacks rule.

The one thing the author should know: the headline claim — "GAIA_AGENT_MAX_STEPS overrides every agent at once" — does not hold on the CLI, which is left untouched. The env var is silently ignored for gaia chat/gaia browse/gaia analyze and friends, so the manual test in the PR plan ("set GAIA_AGENT_MAX_STEPS=5 and confirm every agent caps at 5") will fail for CLI-launched agents. Details below.

Issues

🟡 Centralization is incomplete — GAIA_AGENT_MAX_STEPS is ignored on the whole CLI surface (src/gaia/cli.py:1273, :708, :805, :826)

cli.py isn't in this PR, but it's where the "one knob" claim breaks. The shared --max-steps argument has default=100, and kwargs is built from vars(args), so args.max_steps is always 100 when the flag is omitted. The three call sites then do max_steps=kwargs.get("max_steps", 100) — the .get default never fires, and the config's default_factory is overridden by an explicit 100. Net effect for every CLI agent that inherits parent_parser (chat, browse, analyze, …): the new 50 default and the env var are both bypassed.

Verified:

GAIA_AGENT_MAX_STEPS=5  →  kwargs["max_steps"] == 100   # env var ignored

Two clean options:

  1. Make the argparse default None and let it fall through — parent_parser.add_argument("--max-steps", type=int, default=None, ...), then max_steps=kwargs.get("max_steps") or default_max_steps() at the three sites; or
  2. Keep CLI behaviour as-is but scope the PR description/test plan down (drop "every agent" / "one knob") so the env-var promise isn't overstated.

Either is fine — but right now the docs and the code disagree, and the env override silently no-ops on the primary launch path (the very gaia browse the PR leads with).
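
The bypass and option 1 can be reproduced with argparse alone. This is a sketch under stated assumptions: build_parser is a hypothetical helper, and default_max_steps is a stand-in that returns the fixed default rather than the real helper from base/agent.py.

```python
import argparse


def default_max_steps() -> int:
    return 50  # stand-in for the real helper (assumption)


def build_parser(flag_default):
    # Hypothetical parser carrying only the shared --max-steps flag.
    parser = argparse.ArgumentParser()
    parser.add_argument("--max-steps", type=int, default=flag_default)
    return parser


# Current behaviour: the flag is omitted, but argparse still fills in 100,
# so the key always exists and the .get() fallback never fires.
kwargs = vars(build_parser(100).parse_args([]))
before = kwargs.get("max_steps", 100)

# Option 1: default=None lets an omitted flag fall through to the helper.
kwargs = vars(build_parser(None).parse_args([]))
after = kwargs.get("max_steps") or default_max_steps()

# An explicit flag still wins.
kwargs = vars(build_parser(None).parse_args(["--max-steps", "7"]))
explicit = kwargs.get("max_steps") or default_max_steps()

print(before, after, explicit)  # 100 50 7
```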

🟢 Blender scene creation lost its step headroom (src/gaia/agents/blender/agent.py:525)

create_interactive_scene previously ran at self.max_steps * 2 (scenes are more complex than a normal query); it now passes max_steps=None, so it just uses self.max_steps. With the new 50 default this is harmless in practice, but an explicit low override (e.g. max_steps=20) no longer gives scene-building any extra room. If the doubling was deliberate, keep it; if not, the simplification is fine — just confirm it's intended.
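
To make the difference concrete, here is a small sketch of the two budgets for an agent configured with a low override. The resolve function is hypothetical and only mimics the described rule that max_steps=None means "use the agent's own limit".

```python
def resolve(requested, agent_max_steps):
    """None means 'use the agent's own limit'; anything else is taken as-is."""
    return agent_max_steps if requested is None else requested


agent_max_steps = 20  # explicit low override on the agent

old_budget = resolve(agent_max_steps * 2, agent_max_steps)  # previous behaviour
new_budget = resolve(None, agent_max_steps)                 # behaviour in this PR

print(old_budget, new_budget)  # 40 20
```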

🟢 New env var isn't documented (docs/)

GAIA_AGENT_MAX_STEPS is a new user-facing knob with no entry in docs/reference/cli.mdx or an env-var reference. CLAUDE.md asks that new features be documented — a one-liner noting the var, the 50 default, and the loud-failure-on-typo behaviour would close this.

Strengths

  • Correct default_factory semantics. Resolving the env var at instantiation (not import) is the right call, and test_configs_inherit_the_override_at_construction pins exactly that — a config built after the env is set picks up the override.
  • Fail-loud on bad input matches the project's "No Silent Fallbacks" rule, with an actionable message naming the var, the bad value, and how to recover.
  • Test coverage is focused and meaningful — default, env override, empty string, non-int, non-positive, and config inheritance. Existing config tests were updated to assert against the helper instead of a magic 10, which keeps them honest as the default evolves.
  • UI chat paths (_chat_helpers.py) and the autonomous-tick loop (agent_loop.py) now delegate cleanly instead of hardcoding — the env var genuinely does work on those paths.
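
The instantiation-time behaviour noted in the first bullet can be illustrated with a plain dataclass. ExampleAgentConfig is a hypothetical config, and default_max_steps here is a simplified stand-in without the validation of the real helper.

```python
import os
from dataclasses import dataclass, field


def default_max_steps() -> int:
    # Simplified stand-in: env lookup only, no validation (assumption).
    return int(os.environ.get("GAIA_AGENT_MAX_STEPS") or 50)


@dataclass
class ExampleAgentConfig:  # hypothetical config, for illustration only
    # The factory runs each time a config is constructed, not at import.
    max_steps: int = field(default_factory=default_max_steps)


os.environ.pop("GAIA_AGENT_MAX_STEPS", None)
before = ExampleAgentConfig()

os.environ["GAIA_AGENT_MAX_STEPS"] = "5"
after = ExampleAgentConfig()                  # picks up the new env value
explicit = ExampleAgentConfig(max_steps=100)  # explicit value still wins

os.environ.pop("GAIA_AGENT_MAX_STEPS", None)  # leave the environment clean

print(before.max_steps, after.max_steps, explicit.max_steps)  # 50 5 100
```

A plain `max_steps: int = default_max_steps()` would instead freeze the value when the class body is evaluated, which is the import-time behaviour the bullet contrasts against.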

Verdict

Request changes — the core refactor is clean and well-tested, but the CLI gap means the PR's central promise (fleet-wide env override) is materially untrue on the primary launch path. Either thread the default through cli.py (preferred — it's a small change) or rescope the description and test plan so the claim matches the code. The two 🟢 items are non-blocking.

Ovtcharov added 2 commits June 3, 2026 16:45
The CLI's shared --max-steps defaulted to 100 and was always passed
explicitly, so the new global default and GAIA_AGENT_MAX_STEPS were ignored
for gaia chat/browse/analyze/blender — the env override silently no-op'd on
the primary launch path. Default the flag to None so it falls through to
default_max_steps() in Agent.__init__, making the env var work everywhere.
Document the knob (default 50, env override, fail-loud) in the CLI reference.

The Code Quality (Lint) job runs pylint over all of src/gaia and was
already failing on main: #1355 left a mutable-default (W0102) in
quality_metrics.py and several Protocol-parity unused-argument signatures
(W0613) in the Outlook/email-MCP backends. Fix the mutable default with a
None sentinel and annotate the interface-parity stubs, so the gate is green
for this PR and every other open PR.
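
For readers unfamiliar with W0102, a generic sketch of the mutable-default pitfall and the None-sentinel fix follows. The function names are hypothetical and are not taken from quality_metrics.py.

```python
def record_bad(metric, history=[]):  # W0102: one list shared across calls
    history.append(metric)
    return history


def record_good(metric, history=None):  # None sentinel: fresh list per call
    if history is None:
        history = []
    history.append(metric)
    return history


record_bad("a")
shared = record_bad("b")   # still holds "a" from the previous call

record_good("a")
fresh = record_good("b")   # starts from an empty list each time

print(shared, fresh)  # ['a', 'b'] ['b']
```
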
@github-actions github-actions Bot added the documentation, mcp, cli, eval, and performance labels Jun 3, 2026
@kovtcharov-amd
Collaborator Author

Thanks for the thorough review — addressed the 🟡 and both 🟢s, plus the two failing checks (neither was caused by the original change).

🟡 CLI gap — fixed (5b86aa2b). --max-steps now defaults to None and falls through to default_max_steps() in Agent.__init__, so GAIA_AGENT_MAX_STEPS genuinely works on gaia chat/browse/analyze (and gaia blender's --steps). The "one knob" claim now holds on the CLI too. The Code agent keeps its explicit 100.

🟢 Blender doubling — intentional simplification. The old self.max_steps * 2 only existed to compensate for the tiny old default of 5; with the global default at 50, create_interactive_scene has ample headroom from the normal limit, so the special-case is no longer warranted.

🟢 Env var docs — added (5b86aa2b, docs/reference/cli.mdx): the --max-steps/--steps flags, the GAIA_AGENT_MAX_STEPS default (50), and the fail-loud-on-typo behaviour.

On the red checks:

  • Code Quality (Lint) was already failing on main: the merge of #1355 (feat(email): create calendar events from email context) left W0102/W0613 pylint debt in quality_metrics.py and the Outlook/email-MCP backends. The gate scans all of src/gaia, so every open PR inherited it. Cleared in 9954a050; I verified that the latest main lint run reports exactly these 9 issues and nothing else.
  • Test GAIA CLI on Linux was a HuggingFace 429 Too Many Requests during model download — an infra flake, now re-running on the latest push.

@github-actions
Contributor

github-actions Bot commented Jun 3, 2026

🟡 src/gaia/agents/code/agent.py:119 — CodeAgent's 100-step override is bypassed by the CLI's new None sentinel.

CodeAgent.__init__ guards its 100-step default with if "max_steps" not in kwargs, but the CLI now passes max_steps=None explicitly. None satisfies "max_steps" in kwargs, so the check is skipped and Agent.__init__ resolves None to default_max_steps() = 50. gaia code silently dropped from 100 → 50 steps.

This contradicts the documentation added in this same PR (cli.mdx): "the Code agent uses 100 for multi-file generation."

Fix — widen the guard in CodeAgent.__init__:

        if "max_steps" not in kwargs or kwargs["max_steps"] is None:
            kwargs["max_steps"] = 100  # Increased for complex project generation

The same pattern should be applied to JiraAgent and DockerAgent, which use the same if "max_steps" not in kwargs guard and face the same bypass.
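
The bypass is easy to reproduce in isolation. In this sketch, old_guard and widened_guard are hypothetical functions that mirror only the guard logic, not the real agent constructors.

```python
def old_guard(**kwargs):
    # Membership test: an explicit None counts as "provided".
    if "max_steps" not in kwargs:
        kwargs["max_steps"] = 100
    return kwargs["max_steps"]


def widened_guard(**kwargs):
    # Treat both a missing key and an explicit None as "no override".
    if "max_steps" not in kwargs or kwargs["max_steps"] is None:
        kwargs["max_steps"] = 100
    return kwargs["max_steps"]


print(old_guard())                    # 100
print(old_guard(max_steps=None))      # None (later resolved to the global default)
print(widened_guard(max_steps=None))  # 100
print(widened_guard(max_steps=30))    # 30
```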

@kovtcharov-amd kovtcharov-amd self-assigned this Jun 4, 2026
@kovtcharov-amd kovtcharov-amd requested a review from itomek June 4, 2026 00:17
The CLI now passes max_steps=None to mean 'use the global default', but
CodeAgent/JiraAgent/DockerAgent/EMR guarded their override with
'if "max_steps" not in kwargs' (and EMR with setdefault), which treats an
explicit None as a provided value. That skipped the override and let
Agent.__init__ resolve None to the global default — e.g. CodeAgent's 100-step
budget would silently drop to 50. Treat None as 'no override' so intentional
per-agent limits survive the sentinel.
@kovtcharov-amd
Collaborator Author

Good catch — fixed in 8ba89f87. All four setdefault-style guards (CodeAgent, JiraAgent, DockerAgent, EMR) now treat an explicit max_steps=None as "no override":

if kwargs.get("max_steps") is None:
    kwargs["max_steps"] = 100   # CodeAgent; 50 for EMR; default_max_steps() for Jira/Docker

One nuance on reachability: gaia code runs through gaia-code (code/cli.py), which has its own --max-steps default=100, so CodeAgent wasn't actually dropping to 50 in practice today. But you're right that the None sentinel is now the documented contract, so the guard was a latent trap — hardened it (plus a regression test: CodeAgent(max_steps=None) asserts 100). Jira/Docker resolve to default_max_steps() either way, and EMR keeps its 50; the change just makes the intent explicit and uniform.
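
A self-contained sketch of how such a regression test could be shaped is shown here. The Agent and CodeAgent classes are simplified stand-ins written for illustration, and default_max_steps returns a fixed 50; the real classes and test live in the repository and may differ.

```python
def default_max_steps() -> int:
    return 50  # stand-in for the real helper (assumption)


class Agent:  # simplified stand-in for the base class
    def __init__(self, max_steps=None):
        # None means "use the global default".
        self.max_steps = default_max_steps() if max_steps is None else max_steps


class CodeAgent(Agent):  # simplified stand-in
    def __init__(self, **kwargs):
        # A missing key and an explicit None both mean "no override given".
        if kwargs.get("max_steps") is None:
            kwargs["max_steps"] = 100
        super().__init__(**kwargs)


def test_none_sentinel_keeps_code_agent_override():
    assert CodeAgent(max_steps=None).max_steps == 100
    assert CodeAgent().max_steps == 100
    assert CodeAgent(max_steps=7).max_steps == 7
    assert Agent(max_steps=None).max_steps == 50


test_none_sentinel_keeps_code_agent_override()
```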

Labels

agents, cli, documentation, eval, mcp, performance, tests

1 participant