Python: Refactor workflow as agent pending request handling#6259
Python: Refactor workflow as agent pending request handling#6259TaoChenOSU wants to merge 6 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors Python workflow-as-agent “pending request” handling so tool-approval flows (FunctionApprovalRequest/Response) round-trip correctly across WorkflowAgent, AgentExecutor, and the Foundry ResponsesHostServer HTTP pipeline. It also introduces a Workflow.status property to expose run-level state transitions.
Changes:
- Add
Workflow.status(mirrors emitted status events) and new tests covering status transitions, including pending-request scenarios. - Update
WorkflowAgent/AgentExecutorto route tool-approval requests viarequest_infousing the underlying approval request id (so pending request IDs match what the caller echoes back). - Update Foundry hosting response conversion to persist approval requests and rehydrate them when converting
mcp_approval_responseitems; add end-to-end hosting tests for workflow agents with approvals.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| python/samples/02-agents/tools/function_tool_with_approval.py | Disables the streaming demo invocation in main(). |
| python/packages/foundry_hosting/tests/test_responses.py | Adds end-to-end HTTP tests for hosting WorkflowAgent with tool approvals (streaming and non-streaming). |
| python/packages/foundry_hosting/agent_framework_foundry_hosting/_responses.py | Plumbs approval storage through input/output conversion so approval responses can be reconstructed. |
| python/packages/core/tests/workflow/test_workflow_status.py | New tests for the new Workflow.status property across success/failure/pending-request flows. |
| python/packages/core/tests/workflow/test_workflow_agent.py | Updates request-info/tool-approval expectations and adds tests ensuring raw tool-approval content forwards unchanged. |
| python/packages/core/tests/workflow/test_agent_executor.py | Adds regression tests for “double-emission” of approval payload and request-id matching. |
| python/packages/core/tests/core/test_skills.py | Minor parentheses cleanup in async awaits. |
| python/packages/core/agent_framework/_workflows/_workflow.py | Implements run-level status tracking (Workflow.status) and updates during execution. |
| python/packages/core/agent_framework/_workflows/_agent.py | Refactors WorkflowAgent continuation logic to depend on Workflow.status and adjusts request-info conversion. |
| python/packages/core/agent_framework/_workflows/_agent_executor.py | Adjusts approval request handling to avoid double-emitting approval payloads and passes explicit request_id. |
| python/packages/core/agent_framework/_skills.py | Minor formatting of tool descriptions. |
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| ) from exc | ||
| elif isinstance(arguments_payload, dict): | ||
| parsed_args = self.RequestInfoFunctionArgs.from_dict(arguments_payload) | ||
| if function_call.name != self.REQUEST_INFO_FUNCTION_NAME: |
There was a problem hiding this comment.
Old code raised AgentInvalidResponseException on not content.approved for every approval response. New code never reads approved. Raw-approval branch (line 736) forwards the full content, so the workflow still sees approved. But the request_info-derived branch (line 753) forwards parsed_args.data, the echoed request payload, so a rejected response (approved=False) is indistinguishable from approval and the workflow proceeds as fulfilled. Is the asymmetry intended? If a declined request_info should still short-circuit, should we read approved before line 753?
| for user_input_request in response.user_input_requests: | ||
| self._pending_agent_requests[user_input_request.id] = user_input_request # type: ignore[index] | ||
| await ctx.request_info(user_input_request, Content) | ||
| await ctx.request_info(user_input_request, Content, request_id=user_input_request.id) |
There was a problem hiding this comment.
What happens when a response carries a user_input_request plus other contents (model emits explanatory text alongside the approval request)? We return None below, so yield_output never runs and the non-request contents never reach workflow output. The warning above documents the drop but does not prevent it. Common case for chatty models. Could we yield the non-request contents before issuing the requests instead of log-and-drop?
| # Only yield output events for updates that do not contain user input requests. | ||
| # This is to avoid emitting two events of different types ('output' and 'request_info') | ||
| # that carry the same payload. | ||
| await ctx.yield_output(update) |
There was a problem hiding this comment.
Same drop on the streaming path. An update carrying both a user_input_request and content deltas takes the if branch, so the else yield_output(update) is skipped and the deltas are never emitted. The reconstructed response only feeds request extraction and returns None at line 544 when requests exist, so the text is lost here too. Worth splitting the request contents from the rest and yielding the remainder? Are we not supporting this scenario?
| # original tool name ('delete_file'), NOT 'request_info'. This exercises the | ||
| # branch in WorkflowAgent._extract_function_responses that routes raw | ||
| # tool-approval responses straight through using content.id. | ||
| approval_response = approval.to_function_approval_response(approved=True) # type: ignore[attr-defined] |
There was a problem hiding this comment.
Are we covering the rejection path? This is the only resume test and it asserts approved=True through the raw-approval branch. Couldn't find a test that sends approved=False, which is the contract that changed (old code raised, new code forwards). Also no test drives the request_info-derived branch at _agent.py:753 or the AgentException invalid-state raise in _run_core. A rejected-approval test pinning the new intended behavior would lock down the finding that I commented on above.
Motivation and Context
Function approval was not handled correctly when the approval comes from an agent within a workflow that is used as an agent.
Addresses #5654
Description
Old data flow:
New data flow:
Contribution Checklist