Test Flake: helios / deploy job timing out due to NTP? #5598

@smklein

Seen on https://github.com/oxidecomputer/omicron/pull/5597/checks?check_run_id=24122021575

The failure I saw was "Failed to start at least one nexus zone after 300 seconds". See: https://buildomat.eng.oxide.computer/wg/0/details/01HW3Q3V4PVMDKPJE991EJ0394/Yh52tb5BDLgnIRQRXWCb94scPT4iiPZEHyXZA5Tn19Dnximl/01HW3Q48AKGGRVW4GKBCEMFK3W#S2111

It appears that NTP and internal DNS both came up successfully: both produced logs.

However, Sled Agent seems stuck claiming that "Time is not yet synchronized": https://buildomat.eng.oxide.computer/wg/0/artefact/01HW3Q3V4PVMDKPJE991EJ0394/Yh52tb5BDLgnIRQRXWCb94scPT4iiPZEHyXZA5Tn19Dnximl/01HW3Q48AKGGRVW4GKBCEMFK3W/01HW3TVQ923Z6Z0YBGX2GWPCBV/oxide-sled-agent:default.log?format=x-bunyan#L775

SledAgent (RSS): Time is not yet synchronized
    error = "Time is synchronized on 0/1 sleds"
    file = sled-agent/src/rack_setup/service.rs:654

This is enough to prevent the Nexus zone from being started. It's unclear to me why NTP failed to synchronize within 300 seconds.
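
For anyone trying to reason about the failure mode, here is a minimal sketch of the kind of gate described above: poll each sled's sync status and give up after a deadline. This is not the actual code in sled-agent/src/rack_setup/service.rs. The names `SyncWait` and `wait_for_time_sync`, the closure-based status check, and the polling interval are all assumptions made for illustration. Only the log message format is taken from the output quoted in this issue.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Outcome of waiting for time synchronization (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum SyncWait {
    /// Every sled reported that its time is synchronized.
    Synchronized,
    /// The deadline passed with only `synced` of `total` sleds synchronized.
    TimedOut { synced: usize, total: usize },
}

/// Polls `check` until every sled reports synchronized time or `timeout` elapses.
///
/// `check` is a stand-in for whatever asks the sleds for their status;
/// it returns one bool per sled (true = synchronized).
pub fn wait_for_time_sync<F>(
    mut check: F,
    timeout: Duration,
    poll_interval: Duration,
) -> SyncWait
where
    F: FnMut() -> Vec<bool>,
{
    let deadline = Instant::now() + timeout;
    loop {
        let statuses = check();
        let total = statuses.len();
        let synced = statuses.iter().filter(|s| **s).count();

        // Only proceed once there is at least one sled and all are in sync.
        if total > 0 && synced == total {
            return SyncWait::Synchronized;
        }
        if Instant::now() >= deadline {
            return SyncWait::TimedOut { synced, total };
        }

        // Message format modeled on the log line quoted in this issue.
        eprintln!("Time is synchronized on {}/{} sleds", synced, total);
        sleep(poll_interval);
    }
}
```

With a single sled that never reports sync, a gate like this returns `TimedOut { synced: 0, total: 1 }`, which matches the "0/1 sleds" symptom here: anything waiting on the gate (in this case, starting the Nexus zone) never gets to run.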

Labels: Test Flake
