Skip to content

Support one-time audio video reading#22930

Merged
alalek merged 15 commits intoopencv:4.xfrom
MaximMilashchenko:gstreamer_support
Dec 23, 2022
Merged

Support one-time audio video reading#22930
alalek merged 15 commits intoopencv:4.xfrom
MaximMilashchenko:gstreamer_support

Conversation

@MaximMilashchenko
Copy link
Copy Markdown
Contributor

@MaximMilashchenko MaximMilashchenko commented Dec 7, 2022

added to gsrtreamer:

  • audio + video capturing
  • added audio video samples synchronization
# Linux OpenCL
force_builders=linux,docs,Linux x64 Debug,Custom

build_image:Linux OpenCL=ubuntu:20.04
buildworker:Linux OpenCL=linux-4

build_image:Custom=gstreamer:16.04
buildworker:Custom=linux-1

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to fix compilation / related CI issues.

Comment on lines +239 to +242
#ifdef _WIN32
// validate audio frame size
EXPECT_NEAR(audioFrame.cols, samplesPerFrame, audioSamplesTolerance);
/EXPECT_NEAR(audioFrame.cols, samplesPerFrame, audioSamplesTolerance);
#endif
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check backend, not platform.

Comment on lines +327 to +328
gint nbVideoStream;
gint nbAudioStream;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not used?

class GStreamerCapture CV_FINAL : public IVideoCapture
{
public:
struct StreamData
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StreamData

This type is not really used in the current state (used once to replace videoStream / audioStream fields).

GSafePtr<GstSample> usedVideoSample;
GSafePtr<GstSample> impendingAudioSample;
GSafePtr<GstSample> audioSample;
std::vector<GstSample*> audioSamples;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::vector<GstSample*>

How is lifetime controlled?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memory clearing is done manually

for (uint i = 0; i < audioSamples.size(); i++)
{
    gst_sample_unref(audioSamples[i]);
}
audioSamples.clear();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work as many other error-prone attempts of manual memory management.

There are other places missed:

  • close()
  • destructor

Just add CV_CheckEQ(audioSamples.size(), (size_t)0, ""); into destructor and bailout from the middle of the capture loop. E.g.:

if (frame == 11)
    break;

paramCombination("test_audio.mp4", 1, 0.15, CV_8UC3, 240, 320, 90, 132299, 30, 30., cv::CAP_GSTREAMER)
#ifdef _WIN32
paramCombination("test_audio.mp4", 1, 0.15, CV_8UC3, 240, 320, 90, 131819, 30, 30., cv::CAP_MSMF)
, paramCombination("test_audio.mp4", 1, 0.15, CV_8UC3, 240, 320, 90, 131819, 30, 30., cv::CAP_MSMF),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, ... ,

One comma only.

INSTANTIATE_TEST_CASE_P(/**/, Media, testing::ValuesIn(mediaParams));
#endif // _WIN32

#ifdef _WIN32
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer backend check if possible.

(because GStreamer builds exist for Windows too)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_WIN32 check is only for MSMF

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I wrote above, GStreamer for Windows passes this check too.

is only for MSMF

it is not.

@alalek
Copy link
Copy Markdown
Member

alalek commented Dec 16, 2022

There are merge conflicts. Squash commits into one and then rebase onto upstream.

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update.
Please take a look on comments below.

GSafePtr<GstSample> usedVideoSample;
GSafePtr<GstSample> impendingAudioSample;
GSafePtr<GstSample> audioSample;
std::vector<GstSample*> audioSamples;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work as many other error-prone attempts of manual memory management.

There are other places missed:

  • close()
  • destructor

Just add CV_CheckEQ(audioSamples.size(), (size_t)0, ""); into destructor and bailout from the middle of the capture loop. E.g.:

if (frame == 11)
    break;

Comment on lines +369 to +370
gint videoStream;
gint audioStream;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to move them - they were defined previous on lines 327-328 before duration

INSTANTIATE_TEST_CASE_P(/**/, Media, testing::ValuesIn(mediaParams));
#endif // _WIN32

#ifdef _WIN32
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I wrote above, GStreamer for Windows passes this check too.

is only for MSMF

it is not.

ASSERT_TRUE(cap.grab());

if (frame == 0)
if (backend == cv::CAP_MSMF)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it applicable for MSMF only?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it applicable for MSMF only?

Because position tracking in gstreamer with the gst_element_query_position function is inaccurate. A longer video frame can be read anywhere as samples are captured, or not counted at all

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still could dump these properties without test failures.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get it. Fixed

case CAP_PROP_AUDIO_BASE_INDEX:
return audioBaseIndex;
case CAP_PROP_AUDIO_TOTAL_STREAMS:
CV_LOG_ONCE_WARNING(NULL, "OpenCV | GStreamer: CAP_PROP_AUDIO_TOTAL_STREAMS property not support");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"is not supported"

Comment on lines +338 to +347
gint64 bufferedAudioDuration;
gint64 requiredAudioTime;
gint64 impendingVideoSampleTime;
gint64 usedVideoSampleTime;
gint64 videoSampleDuration;
gint64 audioSampleDuration;
gint64 impendingAudioSampleTime;
gint64 audioSampleTime;
gint64 chunkLengthOfBytes;
gint64 givenAudioTime;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duration
time

All these fields must have the clear unit suffix, e.g. NS for nanoseconds. For audio, inSamples could be applicable too.

Comment on lines +341 to +347
gint64 usedVideoSampleTime;
gint64 videoSampleDuration;
gint64 audioSampleDuration;
gint64 impendingAudioSampleTime;
gint64 audioSampleTime;
gint64 chunkLengthOfBytes;
gint64 givenAudioTime;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usedVideo
givenAudio

Why are here different prefixes?


usedVideoSample / audioSample - the same

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usedVideo
givenAudio

Why are here different prefixes?

usedVideoSample / audioSample - the same

Two samples are buffered for video, the one has prefix used. The sample that will be used in the next iteration has prefix impending. The logic is the same with the video timestaps. This is not necessary for audio, for timestamps prefix is given (duration of the got audio)

audioSample.attach(gst_app_sink_pull_sample(GST_APP_SINK(audiosink.get())));
if (!audioSample)
{
CV_LOG_ERROR(NULL, "videoio(MSMF): gst_app_sink_pull_sample() method is not succeeded");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we see this error message in tests?

[ RUN      ] Audio.audio/1, where GetParam() = ("test_audio.mp4", 1, 132522, 0.15, GSTREAMER)
[ INFO:0@2.343] global cap_gstreamer.cpp:1378 open OpenCV | GStreamer: /home/alalek/projects/opencv/extra/testdata/highgui/audio/test_audio.mp4
[ INFO:0@2.343] global cap_gstreamer.cpp:1411 open OpenCV | GStreamer: mode - FILE
[ERROR:0@4.890] global cap_gstreamer.cpp:700 grabAudioFrame videoio(MSMF): gst_app_sink_pull_sample() method is not succeeded
[       OK ] Audio.audio/1 (2772 ms)

Why is it an error?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it an error?

The gst_app_sink_is_eos function does not return true in some cases until it reads an empty sample. Added a condition to track this case

#ifdef _WIN32
const paramCombination mediaParams[] =
{
paramCombination("test_audio.mp4", 1, 0.15, CV_8UC3, 240, 320, 90, 132299, 30, 30., cv::CAP_GSTREAMER)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have doubled "the same" videoTimestamp's on frames 85 and 86 (2866.67 ms)?

[ RUN      ] Media.audio/0, where GetParam() = ("test_audio.mp4", 1, 0.15, 16, 240, 320, 90, 132299, 30, 30, GSTREAMER)
[ INFO:0@5.116] global cap_gstreamer.cpp:1378 open OpenCV | GStreamer: /home/alalek/projects/opencv/extra/testdata/highgui/audio/test_audio.mp4
[ INFO:0@5.116] global cap_gstreamer.cpp:1411 open OpenCV | GStreamer: mode - FILE
[ WARN:0@5.607] global cap_gstreamer.cpp:1672 open OpenCV | GStreamer warning: unable to query duration of stream
[ WARN:0@5.610] global cap_gstreamer.cpp:1703 open OpenCV | GStreamer warning: Cannot query video position: status=1, value=0, duration=-1
[ WARN:0@5.630] global cap_gstreamer.cpp:1878 getProperty OpenCV | GStreamer: CAP_PROP_AUDIO_SHIFT_NSEC property is not supported
[ WARN:0@5.630] global cap_gstreamer.cpp:1792 getProperty OpenCV | GStreamer: CAP_PROP_POS_MSEC property result may be unrealiable: https://github.com/opencv/opencv/issues/19025
video0 timestamp: 0  audio0 timestamp: 0 (audio shift nanoseconds: 0 , seconds: 0)
frame=0:  audioFrameSize=1470  videoTimestamp=0 ms
frame=1:  audioFrameSize=1470  videoTimestamp=33.3333 ms
frame=2:  audioFrameSize=1470  videoTimestamp=66.6667 ms
frame=3:  audioFrameSize=1469  videoTimestamp=100 ms
frame=4:  audioFrameSize=1470  videoTimestamp=133.333 ms
frame...
frame=85:  audioFrameSize=0  videoTimestamp=2866.67 ms
frame=86:  audioFrameSize=1470  videoTimestamp=2866.67 ms
frame=87:  audioFrameSize=1470  videoTimestamp=2900 ms
frame=88:  audioFrameSize=1470  videoTimestamp=2933.33 ms
frame=89:  audioFrameSize=1470  videoTimestamp=2966.67 ms
Total audio samples=132299
[       OK ] Media.audio/0 (2821 ms)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I wrote, the gst_element_query_position function is not accurate, this is the reason

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some random in output:

frame=0:  audioFrameSize=1470  videoTimestamp=0 ms
frame=1:  audioFrameSize=2940  videoTimestamp=33.3333 ms
frame=2:  audioFrameSize=1470  videoTimestamp=100 ms
frame=3:  audioFrameSize=1469  videoTimestamp=133.333 ms
frame=4:  audioFrameSize=1470  videoTimestamp=166.667 ms
frame...
frame=85:  audioFrameSize=1470  videoTimestamp=2866.67 ms
frame=86:  audioFrameSize=1470  videoTimestamp=2900 ms
frame=87:  audioFrameSize=1470  videoTimestamp=2933.33 ms
frame=88:  audioFrameSize=0  videoTimestamp=2966.67 ms
frame=89:  audioFrameSize=1470  videoTimestamp=2966.67 ms
frame=0:  audioFrameSize=1470  videoTimestamp=0 ms
frame=1:  audioFrameSize=1470  videoTimestamp=33.3333 ms
frame=2:  audioFrameSize=1470  videoTimestamp=66.6667 ms
frame=3:  audioFrameSize=1469  videoTimestamp=100 ms
frame=4:  audioFrameSize=1470  videoTimestamp=133.333 ms
frame...
frame=85:  audioFrameSize=2940  videoTimestamp=2833.33 ms
frame=86:  audioFrameSize=1470  videoTimestamp=2900 ms
frame=87:  audioFrameSize=0  videoTimestamp=2933.33 ms
frame=88:  audioFrameSize=1470  videoTimestamp=2933.33 ms
frame=89:  audioFrameSize=1470  videoTimestamp=2966.67 ms

Below timestamps for generic video-only scan (ignore crc64):

frame=80  frame_size=[320 x 240]  videoTimestamp=2700 ms   crc64=0FF87E8865B65E85
frame=81  frame_size=[320 x 240]  videoTimestamp=2733.33 ms   crc64=5B0016D23525B13E
frame=82  frame_size=[320 x 240]  videoTimestamp=2766.67 ms   crc64=7ACED82BA667C494
frame=83  frame_size=[320 x 240]  videoTimestamp=2800 ms   crc64=5F393F9FF6515BEF
frame=84  frame_size=[320 x 240]  videoTimestamp=2833.33 ms   crc64=1DA3C18F7CC6254F
frame=85  frame_size=[320 x 240]  videoTimestamp=2866.67 ms   crc64=2D2D5461D2C20E89
frame=86  frame_size=[320 x 240]  videoTimestamp=2900 ms   crc64=568E4DC494CEBB11
frame=87  frame_size=[320 x 240]  videoTimestamp=2933.33 ms   crc64=C73EDB4F7625C01D
frame=88  frame_size=[320 x 240]  videoTimestamp=2966.67 ms   crc64=B7BE9F635F4ED293
frame=89  frame_size=[320 x 240]  videoTimestamp=3000 ms   crc64=9349423711753500
    VideoCapture capture(findDataFile("audio/test_audio.mp4"), CAP_GSTREAMER);

    Mat frame;
    for (int i = 0; ; i++)
    {
        if (!capture.grab())
            break;
        capture.retrieve(frame);
        std::cout << "frame=" << i << "  frame_size=" << frame.size() << "  videoTimestamp=" << capture.get(CAP_PROP_POS_MSEC) << " ms   " << cv::format("crc64=%016lX", crc64(frame)) << std::endl;
    }

@alalek
Copy link
Copy Markdown
Member

alalek commented Dec 21, 2022

There is failed test after the last update (reproduced locally):

[ RUN      ] Media.audio/0, where GetParam() = ("test_audio.mp4", 1, 0.15, 16, 240, 320, 90, 132299, 30, 30, GSTREAMER)

(opencv_test_videoio:33728): GStreamer-WARNING **: Name 'uridecodebin3' is not unique in bin 'pipeline8', not adding
[ INFO:0@48.940] global cap_gstreamer.cpp:1398 open OpenCV | GStreamer: /build/precommit_custom_linux/4.x/opencv_extra/testdata/highgui/audio/test_audio.mp4
[ INFO:0@48.940] global cap_gstreamer.cpp:1431 open OpenCV | GStreamer: mode - FILE
[ WARN:0@48.944] global cap_gstreamer.cpp:1692 open OpenCV | GStreamer warning: unable to query duration of stream
[ WARN:0@48.944] global cap_gstreamer.cpp:1723 open OpenCV | GStreamer warning: Cannot query video position: status=1, value=0, duration=-1
[ WARN:0@48.945] global cap_gstreamer.cpp:1898 getProperty OpenCV | GStreamer: CAP_PROP_AUDIO_SHIFT_NSEC property is not supported
[ WARN:0@48.945] global cap_gstreamer.cpp:1813 getProperty OpenCV | GStreamer: CAP_PROP_POS_MSEC property result may be unrealiable: https://github.com/opencv/opencv/issues/19025
video0 timestamp: 0  audio0 timestamp: 0 (audio shift nanoseconds: 0 , seconds: 0)
frame=0:  audioFrameSize=1470  videoTimestamp=0 ms
frame=1:  audioFrameSize=2940  videoTimestamp=33.3333 ms
frame=2:  audioFrameSize=1470  videoTimestamp=100 ms
/build/precommit_custom_linux/4.x/opencv/modules/videoio/test/test_audio.cpp:183: Failure
Value of: cap.grab()
  Actual: false
Expected: true
Google Test trace:
/build/precommit_custom_linux/4.x/opencv/modules/videoio/test/test_audio.cpp:181: frame=3
[  FAILED  ] Media.audio/0, where GetParam() = ("test_audio.mp4", 1, 0.15, 16, 240, 320, 90, 132299, 30, 30, GSTREAMER) (11 ms)

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you 👍

a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
Support one-time audio video reading

* stream switching functionality

* audio+video pipeline with switch stream functionality

* audio video sync

* fixed sync

* removed switch swtream functionality

* changed test for gstreamer audio

* fixed error

* fixed error

* fixed issue

* fixed issue

* fixed error

* fixed error

* fixed error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants