Skip to content

Camera: try to make discovery more robust#13100

Merged
DonLakeFlyer merged 7 commits intomasterfrom
pr-camera-fixes
Jul 18, 2025
Merged

Camera: try to make discovery more robust#13100
DonLakeFlyer merged 7 commits intomasterfrom
pr-camera-fixes

Conversation

@julianoes
Copy link
Contributor

This is an attempt to fix some of the corner cases trying to discover cameras, namely:

  • Implement missing retries for CAMERA_INFORMATION, us old specific commands and REQUEST_MESSAGE.
  • Try to simplify CameraControl::_initWhenReady a bit.
  • Fix _requestStreamInfo() and _requestStreamStatus() using wrong retry variable.
  • Add a few missing timers.

I haven't been able to properly test each and every retry mechanism used but overall this consistently initializes correctly for my case when it previously does not.

@julianoes julianoes requested a review from DonLakeFlyer June 28, 2025 01:37
@julianoes julianoes marked this pull request as ready for review June 30, 2025 22:47
@julianoes julianoes requested a review from HTRamsey June 30, 2025 22:47
@julianoes
Copy link
Contributor Author

This does improve camera discovery for me but I want to walk through it once more and verify what is going on.

@julianoes
Copy link
Contributor Author

I tested this again and went through it once again. I believe it's in a better state overall but if you have concerns, let me know.

@DonLakeFlyer
Copy link
Collaborator

DonLakeFlyer commented Jul 10, 2025

I don't know if this would make your code more complicated with all the timers and whatnot but there is a method for the "try this and if that doesn't work do this instead" pattern:

    /// Sends the command and calls the fallback lambda function in
    /// case the command is MAV_RESULT_UNSUPPORTED
    void sendMavCommandWithLambdaFallback(
        std::function<void()> lambda,
        int compId, MAV_CMD command,
        bool showError,
        float param1 = 0.0f, float param2 = 0.0f, float param3 = 0.0f, float param4 = 0.0f, float param5 = 0.0f, float param6 = 0.0f, float param7 = 0.0f);

Not sure if that would simplify or complicate your code?

@julianoes
Copy link
Contributor Author

@DonLakeFlyer yes that would probably make sense to use. Does this command do internal retries or not? If it does, it wouldn't work because these need to happen outside where we can try both new and legacy commands.

@DonLakeFlyer
Copy link
Collaborator

Only these commands will retry:

bool Vehicle::_sendMavCommandShouldRetry(MAV_CMD command)
{
    switch (command) {
#ifdef QT_DEBUG
    // These MockLink command should be retried so we can create unit tests to test retry code
    case MockLink::MAV_CMD_MOCKLINK_ALWAYS_RESULT_ACCEPTED:
    case MockLink::MAV_CMD_MOCKLINK_ALWAYS_RESULT_FAILED:
    case MockLink::MAV_CMD_MOCKLINK_SECOND_ATTEMPT_RESULT_ACCEPTED:
    case MockLink::MAV_CMD_MOCKLINK_SECOND_ATTEMPT_RESULT_FAILED:
    case MockLink::MAV_CMD_MOCKLINK_NO_RESPONSE:
        return true;
#endif

        // In general we should not retry any commands. This is for safety reasons. For example you don't want an ARM command
        // to timeout with no response over a noisy link twice and then suddenly have the third try work 6 seconds later. At that
        // point the user could have walked up to the vehicle to see what is going wrong.
        //
        // We do retry commands which are part of the initial vehicle connect sequence. This makes this process work better over noisy
        // links where commands could be lost. Also these commands tend to just be requesting status so if they end up being delayed
        // there are no safety concerns that could occur.
    case MAV_CMD_REQUEST_AUTOPILOT_CAPABILITIES:
    case MAV_CMD_REQUEST_PROTOCOL_VERSION:
    case MAV_CMD_REQUEST_MESSAGE:
    case MAV_CMD_PREFLIGHT_STORAGE:
    case MAV_CMD_RUN_PREARM_CHECKS:
        return true;

    default:
        return false;
    }
}

@julianoes
Copy link
Contributor Author

So this would retry the MAV_CMD_REQUEST_MESSAGE command and that means we can't use it. Unless we add an argument that let's us specify whether to retry.

@DonLakeFlyer
Copy link
Collaborator

How come you don't want to retry the request message?

@julianoes
Copy link
Contributor Author

julianoes commented Jul 12, 2025

I am retrying it with the logic around it but I'm retrying like with the legacy command interleaved, e.g.:

  1. REQUEST_MESSAGE (CAMERA_INFORMATION)
  2. MAV_CMD_REQUEST_CAMERA_INFORMATION (deprecated)
  3. REQUEST_MESSAGE
    etc.

5 times each with exponential back off for slow cameras.

@DonLakeFlyer
Copy link
Collaborator

OK, understood on retries.

@julianoes I poked at this a bit and the crashes in unit tests are real. They are crashing in the camera retry logic. Run with --unittest and a debugger.

@julianoes
Copy link
Contributor Author

Ok, I will look into that. Thanks!

This is an attempt to fix some of the corner cases trying to discover
cameras, namely:

- Implement missing retries for CAMERA_INFORMATION, us old specific
  commands and REQUEST_MESSAGE.
- Try to simplify CameraControl::_initWhenReady a bit.
- Fix _requestStreamInfo() and _requestStreamStatus() using wrong retry
  variable.
- Add a few missing timers.
The logic was spread across 3 functions, we can actually remove a bit of
this duplication.
This makes us retry to discover a camera if it disappears after not
having been connected previously (for whatever reason).
In case a camera takes a while to properly respond, it's useful to wait
a while with retries while also not spamming the component with commands
the whole time. Therefore, retry with exponential backoff should do the
job.
- Fix missing retry counter reset in handleCaptureStatus()
- Fix missing timer stop in handleCaptureStatus()
- Consolidate inconsistent counter usage for REQUEST_CAMERA_CAPTURE_STATUS
- Standardize all retry limits to 6 attempts (was mixed 5/6)
- Standardize all retry timeouts to 1000ms (was mixed 500ms/1000ms)
- Standardize video stream info retry factor from 5 to 6 (_expectedCount * 6)
- Standardize _initWhenReady timing to consistent 500ms spacing
- Add consistent debug logging across all request functions
- Verify all success handlers properly stop timers and reset counters
- Ensure all requests use proper alternating message patterns
This fixes the unittests segfaulting.
@julianoes
Copy link
Contributor Author

I've added a commit to fix the unit tests crashing on destruction.

@DonLakeFlyer DonLakeFlyer merged commit 3e28205 into master Jul 18, 2025
20 checks passed
@DonLakeFlyer DonLakeFlyer deleted the pr-camera-fixes branch July 18, 2025 23:20
@DonLakeFlyer
Copy link
Collaborator

You should bring this across to Stable as well

@julianoes
Copy link
Contributor Author

julianoes commented Sep 1, 2025

@DonLakeFlyer I tried but it's a bit tricky with quite a few conflicts, mostly due to the changes in #11178.

Edit: oh, you mean Stable_V5.0, not V4.4!

@julianoes
Copy link
Contributor Author

@DonLakeFlyer backport: #13342

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants