Skip to content

azure: don't let console-log capture mask deployment errors#4498

Open
LiliDeng wants to merge 1 commit into
mainfrom
lildeng/fix-deploy-error-masking
Open

azure: don't let console-log capture mask deployment errors#4498
LiliDeng wants to merge 1 commit into
mainfrom
lildeng/fix-deploy-error-masking

Conversation

@LiliDeng
Copy link
Copy Markdown
Collaborator

_save_console_log_and_check_panic is a best-effort diagnostic helper. When the boot diagnostics API returns a transient failure (e.g. 'Service Unavailable'), the HttpResponseError used to escape and either replace the original deployment timeout error or trigger an AssertionError in the outer 'except HttpResponseError' handler (where 'e.error' is None for body-less responses). Wrap the helper so any failure is logged and swallowed.

Description

Related Issue

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Refactoring
  • Documentation update

Checklist

  • Description is filled in above
  • No credentials, secrets, or internal details are included
  • Peer review requested (if not, add required peer reviewers after raising PR)
  • Tests executed and results posted below

Test Validation

Key Test Cases:

Impacted LISA Features:

Tested Azure Marketplace Images:

Test Results

Image VM Size Result
PASSED / FAILED / SKIPPED

_save_console_log_and_check_panic is a best-effort diagnostic helper. When the boot diagnostics API returns a transient failure (e.g. 'Service Unavailable'), the HttpResponseError used to escape and either replace the original deployment timeout error or trigger an AssertionError in the outer 'except HttpResponseError' handler (where 'e.error' is None for body-less responses). Wrap the helper so any failure is logged and swallowed.
@LiliDeng LiliDeng requested a review from johnsongeorge-w as a code owner May 27, 2026 04:26
Copilot AI review requested due to automatic review settings May 27, 2026 04:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes Azure console-log collection during deployment failures best-effort so transient boot diagnostics failures do not hide the original deployment error.

Changes:

  • Wraps serial console capture in exception handling.
  • Logs console-log capture failures per VM and at the resource-group level.
  • Preserves deployment flow when diagnostic capture fails.

Comment thread lisa/sut_orchestrator/azure/platform_.py
@github-actions
Copy link
Copy Markdown

✅ AI Test Selection — PASSED

1 test case(s) selected (view run)

Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-gen2 latest

Count
✅ Passed 1
❌ Failed 0
⏭️ Skipped 0
Total 1
Test case details
Test Case Status Time (s) Message
smoke_test (lisa_0_0) ✅ PASSED 33.178

saved_path.mkdir(parents=True, exist_ok=True)
for vm in vms:
try:
log_response_content = save_console_log(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should enhance exception handling in save_console_log() in common.py
except Exception as e: # Broad except: boot diagnostics can fail transiently with # HttpResponseError (e.g. ARM 503), ResourceNotFoundError, etc. # Any such failure must not mask the caller's original deployment # exception when this is invoked from a failure-handling path. log.debug(f"fail to get serial console log. {e}") return b""

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants