Handle empty list case in `mirrored_supervisor:child/2` #15229
Merged
michaelklishin merged 1 commit into `rabbitmq:main` on Jan 8, 2026
During production testing of `amazon-mq/rabbitmq-queue-migration`, a
`badmatch` exception was observed during shovel cleanup:
```
exit:{{{badmatch,[]},[{mirrored_supervisor,child,2,...}]},
{gen_server2,call,[<0.1346.0>,{delete_child,...},infinity]}}
```
The exception occurs in `mirrored_supervisor:child/2` when the list
comprehension returns an empty list instead of a single-element list.
The function uses pattern matching `[Pid] = [...]` which fails when no
matching child is found in the supervisor's children list.
This change updates `child/2` to use a `case` expression that returns
`undefined` when the list is empty, matching the behavior expected by
`check_stop/3`, which already handles `undefined` as "child not found".
The empty list case is safe to treat as `undefined` because it indicates
the child has already been removed from the supervisor, which is the
desired end state for deletion operations.
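The change can be sketched roughly as follows. This is not the actual `mirrored_supervisor` source: the module name `child_lookup_sketch` is hypothetical, and to keep the example self-contained, this `child/2` takes the list that `supervisor:which_children/1` returns rather than a supervisor reference. The `{Id, Child, Type, Modules}` tuple shape is the standard OTP `which_children/1` format.

```erlang
-module(child_lookup_sketch).
-export([child/2]).

%% Hypothetical sketch, not the upstream code. Children is assumed to be
%% the list returned by supervisor:which_children/1, i.e. elements of the
%% form {Id, Child, Type, Modules}.
child(Children, Id) ->
    %% The crashing form was a match like [Pid] = [Pid || ...], which
    %% raises {badmatch, []} when no child with this Id exists.
    case [Child || {Id1, Child, _Type, _Modules} <- Children, Id1 =:= Id] of
        []      -> undefined;   % child already removed: report "not found"
        [Child] -> Child        % exactly one child with this Id
    end.
```

Supervisor child ids are unique, so a result with more than one element should not occur; the sketch lets that case fail with `case_clause` rather than pick one arbitrarily.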
While we could not reliably reproduce the race condition in testing, the
fix is defensive and aligns with how `terminate_child` can return
`{error, not_found}` when a child doesn't exist. This change makes
`delete_child` operations more robust when children are removed through
other means (supervisor EXIT handling, distributed coordination, etc.).
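Treating an already-removed child as the desired end state can be illustrated with plain OTP `supervisor` calls, which report a missing child as `{error, not_found}`. This is an illustration under stated assumptions, not RabbitMQ code: the module `idempotent_delete_sketch` and its `delete_child/2` wrapper are hypothetical, and it uses a standard `supervisor` rather than `mirrored_supervisor`.

```erlang
-module(idempotent_delete_sketch).
-behaviour(supervisor).
-export([start_link/0, init/1, delete_child/2]).

%% Minimal supervisor with no static children, used only to exercise
%% the hypothetical delete_child/2 wrapper below.
start_link() ->
    supervisor:start_link(?MODULE, []).

init([]) ->
    {ok, {#{strategy => one_for_one}, []}}.

%% Hypothetical wrapper: stop and remove a child, treating "already gone"
%% as success so that repeated or racing deletions do not crash.
delete_child(Sup, Id) ->
    %% terminate_child/2 returns {error, not_found} for an unknown Id;
    %% the result is ignored because the child is then already stopped.
    _ = supervisor:terminate_child(Sup, Id),
    case supervisor:delete_child(Sup, Id) of
        ok -> ok;
        {error, not_found} -> ok;          % desired end state already reached
        {error, _Reason} = Error -> Error  % e.g. running or restarting
    end.
```

Calling the wrapper twice for the same (or an unknown) id returns `ok` both times, which is the idempotent behavior the fix aims for in the deletion path.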
michaelklishin approved these changes on Jan 8, 2026
the-mikedavis approved these changes on Jan 8, 2026
Author (Collaborator): Thank you @michaelklishin and @the-mikedavis
michaelklishin added a commit that referenced this pull request on Jan 9, 2026:
Handle empty list case in `mirrored_supervisor:child/2` (backport #15229)
Author (Collaborator): Just FYI, it turns out that if this
lukebakken added a commit to amazon-mq/rabbitmq-queue-migration that referenced this pull request on Mar 27, 2026:
HTTP_API.md:
- Fix 404 error response body: was {"error": "Object Not Found",
"reason": "Not Found"}, actually {"error": "Migration not found"}
- Add missing instance_id field to snapshot response examples and
field description list
- Document the all-vhost response shape for the check endpoint: it returns
{"vhost": "all", "vhost_results": [...]}, not the single-vhost shape
- Add active_alarms and memory_usage to system_checks response example
and System Check Types list
- Fix concurrent migration error: remove incorrect 409 status code row,
fix error body to {"error": "bad_request", "reason": "Migration
validation failed: in_progress"}
- Fix Validation Failed, No Eligible Queues, and Insufficient Disk
Space error bodies to match actual rqm_mgmt.erl output
- Remove invalid Parameter error example: batch_size=-10 is silently
ignored by the parser, not rejected with a 400
- Remove internal AGENTS.md link from See Also section
API_EXAMPLES.md:
- Add missing instance_id field to snapshot response example
- Add active_alarms and memory_usage to system_checks response example
- Replace the invalid unsynchronized queue issue type in compat checker
results with a queue_expires example (unsynchronized is a system-level
check, not a per-queue issue type)
- Fix unsuitable_overflow and too_many_queues reason strings to match
actual code output
- Add missing queue_expires and message_ttl to Skip Reasons list
- Fix concurrent migration error body
- Fix Migration Not Found 404 body
CONFIGURATION.md:
- Add missing usage example for shovel_prefetch_count
INTEGRATION_TESTING.md:
- Add missing quorum_queue.property_equivalence.relaxed_checks_on_redeclaration
and queue_migration.snapshot_mode to cluster configuration example
MIGRATION_GUIDE.md:
- Remove "gracefully" from connection closing description: connections
are closed by stopping TCP listeners, not via graceful handshake
SKIP_UNSUITABLE_QUEUES.md:
- Fix broken link: INTEGRATION_TESTS.md -> INTEGRATION_TESTING.md
TROUBLESHOOTING.md:
- Remove duplicate "Completed queues remain as quorum queues" line
- Document root cause of shovel noproc failure: race condition in
mirrored_supervisor:child/2 that can exhaust shovel supervisor
restart intensity; reference upstream fix
rabbitmq/rabbitmq-server#15229 (merged into 4.1.x+); note that
Amazon MQ for RabbitMQ includes this fix