Skip to content

fix: include conversation_id in fire_callback for monitor scripts#291

Merged
tofarr merged 4 commits into
mainfrom
fix/conversation-id-in-callback
Jun 3, 2026
Merged

fix: include conversation_id in fire_callback for monitor scripts#291
tofarr merged 4 commits into
mainfrom
fix/conversation-id-in-callback

Conversation

@tofarr
Copy link
Copy Markdown
Contributor

@tofarr tofarr commented Jun 2, 2026

Problem

Both github-repo-monitor and slack-channel-monitor create OpenHands conversations as part of their polling loop, but never reported those conversation IDs back to the automation service. The AutomationRun.conversation_id field is populated exclusively from the completion callback payload — since fire_callback() only sent status and run_id, all runs from these custom scripts ended up with conversation_id: null in the run record.

For example, a github-repo-monitor run logs Created conversation 424295c8-a8bf-4fa9-97e1-2414cc878219 but the corresponding AutomationRun has conversation_id: null because that ID was never sent back.

Changes

Applied identically to both github-repo-monitor/scripts/main.py and slack-channel-monitor/sApplied identically to both github-re)**: add optional conversationApplied identically to both github-repo-monitor/scripts/main.py and slac **_process_trigApplied identically to both github-repoage(): change returnApplied identically to both github-repo-monitor/scripts/main.py and slack-channel-monitor/sApplied identically to both github-re)**: add optional conversationApplied identically to both github-repo-monitor/scripts/main.py and slac **_process_trigApplied identically to both github-repoage(): change returnApplied identically to both github-repo-monitor/scripts/main.py and slack-channel-monitor/sApplied identically to both gMPLETED")`.

Notes

  • When a run creates multiple conversations (multiple triggers in one poll cycle), the last conversation ID created is reported. This is the best we can do since the conversation_id field is a single string.
  • Forwarding a message to an already-active conversation (Case A in both scripts) does not update last_conversation_id — only freshly created conversations are reported.
  • The FAILED path is unchanged: if main() raises before reaching return last_conversation_id, the exception handler calls fire_callback("FAILED", str(exc)) without a conversation_id, which is correct.

This PR was created by an AI agent (OpenHands) on behalf of the user.

Both github-repo-monitor and slack-channel-monitor create OpenHands
conversations as part of their polling loop but never reported those
conversation IDs back to the automation service.

The AutomationRun.conversation_id field is populated exclusively from
the completion callback payload. Since fire_callback() only sent status
and run_id, all runs from these custom scripts ended up with
conversation_id: null in the run record.

Changes:
- fire_callback(): add optional conversation_id parameter; include it
  in the POST body when set
- _process_trigger_comment() / _process_trigger_message(): change return
  type to str | None; return the newly created conv_id on success
- main(): track last_conversation_id as the most recent conversation
  created in this run; return it so the entry point can pass it to
  fire_callback

Co-authored-by: openhands <openhands@all-hands.dev>
The previous _state_file_path() navigated two levels up from WORKSPACE_BASE
to derive the state directory. When WORKSPACE_BASE is a shallow path (e.g.
/workspace), this resolves to /, which is read-only and causes an OSError on
startup — crashing the script before it ever polls anything.

Replace the fragile directory traversal with a hardcoded
/tmp/openhands-automation-state/ directory that is always writable and shared
across cron runs on the same host regardless of WORKSPACE_BASE depth.

Also removes the now-unused 'from pathlib import Path' import in
github-repo-monitor.

Co-authored-by: openhands <openhands@all-hands.dev>
@tofarr
Copy link
Copy Markdown
Contributor Author

tofarr commented Jun 2, 2026

Added commit c056582 fixing a silent crash in both poller scripts, discovered during live testing.

Problem — _state_file_path() could target a read-only filesystem

The previous implementation navigated two levels up from WORKSPACE_BASE to find the state directory. When the automation service sets WORKSPACE_BASE to a shallow path (e.g. /workspace), two levels up resolves to /. The subsequent os.makedirs("/automation-state") raises:

OSError: [Errno 30] Read-only file system: '/automation-state'

The script crashes before it ever calls the Slack or GitHub API. Compounding this, the automation service marks runs as COMPLETED based on the bash process exiting — not on the Python exit code — so all runs appeared healthy with error_detail: null and no visible indication of failure.

Fix

Replace the fragile traversal with /tmp/openhands-automation-state/, which is always writable and shared across cron runs on the same host regardless of WORKSPACE_BASE depth. Also removes the now-unused from pathlib import Path import in github-repo-monitor.

This comment was added by an AI agent (OpenHands) on behalf of the user.

Instead of navigating two levels up from WORKSPACE_BASE (which resolves
to / on shallow paths and crashes with a read-only filesystem error),
place the state file directly in WORKSPACE_BASE when it is defined.

Falls back to ~/.openhands/workspaces when WORKSPACE_BASE is unset.

Co-authored-by: openhands <openhands@all-hands.dev>
@tofarr tofarr requested a review from all-hands-bot June 2, 2026 22:42
Copy link
Copy Markdown
Contributor

all-hands-bot commented Jun 2, 2026

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The primary fix — propagating conversation_id through fire_callback — is clean, minimal, and correct. The approach of returning last_conversation_id from main() and threading it through to the callback is exactly what's needed, and the PR notes correctly document the "last wins" behaviour when multiple conversations are created per polling cycle.

✅ Strengths

  • Targeted fix: The fire_callback signature change is backward-compatible; existing callers without a conversation_id are unaffected.
  • Symmetric implementation: Both github-repo-monitor and slack-channel-monitor receive identical changes, which is correct.
  • Correct FAILED path: The exception handler still calls fire_callback("FAILED", str(exc)), passing error as the second positional argument and omitting conversation_id. No regression here.
  • Return type propagation: Changing _process_trigger_comment / _process_trigger_message and main() to -> str | None is well-documented with updated docstrings throughout.
  • pathlib removal: Removing the now-unused from pathlib import Path import is a clean tidy-up.

⚠️ Concerns

1. Bundled _state_file_path() refactoring may break state persistence (see inline comments)

Both scripts include a refactoring of _state_file_path() that is unrelated to the stated fix, and the change appears to silently break cross-run state persistence.

Old behaviour (github-repo-monitor):

# WORKSPACE_BASE = {root}/automation-runs/{run_id}
root = Path(workspace_base).resolve().parent.parent   # → {root}
state_dir = root / "automation-state"                  # → {root}/automation-state/

New behaviour:

state_dir = workspace_base   # → {root}/automation-runs/{run_id}/

The old code deliberately navigated two levels above WORKSPACE_BASE to reach a directory shared across all runs for a given automation. The new code stores the state file inside the run-specific workspace directory. If WORKSPACE_BASE is ephemeral per-run (which the original docstring comment strongly implies), the poller will silently start from scratch on every cycle — losing track of which comments/messages have already been processed and potentially re-triggering conversations.

The same issue applies to slack-channel-monitor (see inline comment).

Recommendation: Either revert these _state_file_path changes to a separate PR with an explicit justification, or add a comment here explaining why WORKSPACE_BASE is now safe to use as a persistent store across runs.

2. Minor: positional error argument in the FAILED callback path

The error path calls fire_callback("FAILED", str(exc)), relying on positional order. This is correct today (signature: (status, error, conversation_id)), but using fire_callback("FAILED", error=str(exc)) would be more explicit and resilient to future signature changes.


Summary

The conversation_id plumbing is well-executed and should be merged once the _state_file_path concern is clarified. If the state-path change is intentional — e.g., WORKSPACE_BASE is a persistent mount in the deployed environment — a one-line explanation in a comment or the PR description would resolve the concern entirely.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

Comment thread skills/github-repo-monitor/scripts/main.py Outdated
Comment thread skills/slack-channel-monitor/scripts/main.py Outdated
…d state

WORKSPACE_BASE is set to {root}/automation-runs/{run_id} by the automation
backend. The two-level parent navigation correctly derives the shared
{root}/automation-state/ directory that persists across polling cycles.

Using WORKSPACE_BASE directly would store state inside the per-run directory,
losing it on every cycle.

Co-authored-by: openhands <openhands@all-hands.dev>
@tofarr tofarr merged commit a47c381 into main Jun 3, 2026
4 checks passed
Comment on lines +100 to 107
if conversation_id:
body["conversation_id"] = conversation_id
req = urllib.request.Request(
url,
data=json.dumps(body).encode(),
headers={
"Content-Type": "application/json",
"Authorization": f"Bearer {os.environ.get('AUTOMATION_CALLBACK_API_KEY', '')}",
Copy link
Copy Markdown
Member

@malhotra5 malhotra5 Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sending a callback for a conversation that has finished running via automation? This should already be handled on the sdk level, why do we need to explicitly do this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants