Skip to content

Guard StreamEngine handshake against null peer identity when RecvIdentity is enabled#1160

Merged
drewnoakes merged 6 commits into
masterfrom
copilot/fix-nullreferenceexception-streamengine
May 24, 2026
Merged

Guard StreamEngine handshake against null peer identity when RecvIdentity is enabled#1160
drewnoakes merged 6 commits into
masterfrom
copilot/fix-nullreferenceexception-streamengine

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 21, 2026

  • Restore TimeSpan.FromMilliseconds(1) in reader loop so the lock is held long enough to stress the concurrency race
  • Add catch (TerminatingException) { } to the reader task — the actual bug fix (this exception escapes when the socket is disposed while ProcessCommands is in progress)
  • Update XML doc comment to describe why TerminatingException must be caught
  • Build and test pass

Copilot AI changed the title [WIP] Fix unhandled NullReferenceException in StreamEngine Guard StreamEngine handshake against null peer identity when RecvIdentity is enabled May 21, 2026
Copilot AI requested a review from drewnoakes May 21, 2026 11:23
Msg identity = new Msg();
identity.InitPool(m_mechanism.PeerIdentity.Length);
identity.Put(m_mechanism.PeerIdentity, 0, m_mechanism.PeerIdentity.Length);
byte[] peerIdentity = m_mechanism.PeerIdentity;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot is this code not null annotated? If this can be null, then the old code should ideally have flagged a warning. Is it feasible to annotate it as part of this work?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I updated this with focused nullable annotations in StreamEngine.MechanismReady by enabling nullable context for that method and changing peerIdentity to byte[]?, so the potential null is explicit in the code path (f841e0c). Mechanism.PeerIdentity was already nullable (byte[]?).

Copilot AI requested a review from drewnoakes May 21, 2026 11:55
@drewnoakes
Copy link
Copy Markdown
Member

@copilot in what case would m_mechanism.PeerIdentity be null? Does it make sense to specify the option to receive identity, then not actually receive one?

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 22, 2026

@copilot in what case would m_mechanism.PeerIdentity be null? Does it make sense to specify the option to receive identity, then not actually receive one?

RecvIdentity is a local request to surface the peer’s metadata identity if present, but it’s not guaranteed to exist on every connection. PeerIdentity is only set when an Identity property is actually present in parsed metadata (Mechanism.ParseMetadata), and peers may omit it depending on socket type or implementation. In that case we treat it as an empty identity frame so handshake/setup remains robust instead of failing.

@drewnoakes drewnoakes marked this pull request as ready for review May 22, 2026 04:54
@drewnoakes
Copy link
Copy Markdown
Member

@copilot PTAL at the macos failure. Even if it's unrelated, see if we can fix it.

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 24, 2026

@copilot PTAL at the macos failure. Even if it's unrelated, see if we can fix it.

I tracked the macOS failure to a hang in ThreadSafeSocketPollerCleanupTests.ClosingServerWhileClientsConnectDoesNotCrash (testhost sat in that test until the 90s blame-hang timeout). I pushed 1fff707, which makes the reader loop non-blocking (TryReceive(..., TimeSpan.Zero)), and that removed the intermittent hang in repeated stress runs and targeted test runs.

…imeout to stress the race

Agent-Logs-Url: https://github.com/zeromq/netmq/sessions/6a257730-950c-4974-a634-78c12e549d5b

Co-authored-by: drewnoakes <350947+drewnoakes@users.noreply.github.com>
Comment thread src/NetMQ/Core/Transports/StreamEngine.cs Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 24, 2026

Codecov Report

❌ Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 19.73%. Comparing base (8109871) to head (0cc9b8c).
⚠️ Report is 9 commits behind head on master.

Files with missing lines Patch % Lines
src/NetMQ/Core/Transports/StreamEngine.cs 0.00% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #1160       +/-   ##
===========================================
- Coverage   65.98%   19.73%   -46.25%     
===========================================
  Files         145      148        +3     
  Lines        9149     9408      +259     
  Branches     1486     1544       +58     
===========================================
- Hits         6037     1857     -4180     
- Misses       2500     7270     +4770     
+ Partials      612      281      -331     
Files with missing lines Coverage Δ
src/NetMQ/Core/Transports/StreamEngine.cs 43.04% <0.00%> (-18.77%) ⬇️

... and 117 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@drewnoakes drewnoakes merged commit 69fc0a3 into master May 24, 2026
3 of 5 checks passed
@drewnoakes drewnoakes deleted the copilot/fix-nullreferenceexception-streamengine branch May 24, 2026 05:10
drewnoakes added a commit to drewnoakes/netmq that referenced this pull request May 24, 2026
Remove #nullable disable and annotate all nullable types:
- m_handle, m_session, m_socket, m_ioObject are lifecycle-nullable
- m_encoder, m_decoder, m_mechanism are nullable until handshake
- m_inpos, m_outpos are nullable buffer positions
- Use null-forgiving operator where state machine invariants guarantee
  non-null (post-Plug, post-handshake, post-V3-handshake)
- Remove redundant null assignments in constructor for nullable fields

This is the file that motivated the nullable annotation effort (PR zeromq#1160).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
drewnoakes added a commit to drewnoakes/netmq that referenced this pull request May 24, 2026
Remove #nullable disable and annotate all nullable types:
- m_handle, m_session, m_socket, m_ioObject are lifecycle-nullable
- m_encoder, m_decoder, m_mechanism are nullable until handshake
- m_inpos, m_outpos are nullable buffer positions
- Use null-forgiving operator where state machine invariants guarantee
  non-null (post-Plug, post-handshake, post-V3-handshake)
- Remove redundant null assignments in constructor for nullable fields

This is the file that motivated the nullable annotation effort (PR zeromq#1160).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unhandled NullReferenceException in StreamEngine.MechanismReady when PeerIdentity is null

2 participants