add audio support in cap_msmf#19721
Conversation
|
@alalek, I fixed the errors, but there is a warning because the patch size is exceeded. Also, the build failed on one of the Jenkins. |
alalek
left a comment
There was a problem hiding this comment.
patch size opencv_extra: 114698 KiB
Where is the link on opencv_extra's PR?
114 Mb is very large. Existed video test data is even smaller. Test data must be reduced.
@allnes do we have an issue about Audio support? Link should be added.
PR's description should not be empty. At least it should define the scope of work, how/what should be validated, prerequisites, etc.
|
|
||
| //Properties of audio VideoIO | ||
| CV_CAP_PROP_AUDIO_ENABLE = 1000, // Select audio or video | ||
| CV_CAP_PROP_BPS = 1001, // Change bit_per_sample parametr for audio |
There was a problem hiding this comment.
legacy/constants_c.h
Legacy files must not be touched without critical reasons.
There was a problem hiding this comment.
I changed this file to use the same style of property names inside the backend (CV_CAP_PROP_...). If I don't change the file then I need to use CAP_PROP_...
| CAP_PROP_HW_ACCELERATION=50, //!< (**open-only**) Hardware acceleration type (see #VideoAccelerationType). Setting supported only via `params` parameter in cv::VideoCapture constructor / .open() method. Default value is backend-specific. | ||
| CAP_PROP_HW_DEVICE =51, //!< (**open-only**) Hardware device index (select GPU if multiple available) | ||
| CAP_PROP_AUDIO_ENABLE =1000, | ||
| CAP_PROP_BPS =1001, |
There was a problem hiding this comment.
CAP_PROP_BPS
What does that mean in the context of VideoCapture?
"Bytes per second"? (the most incorrect case)
| CAP_PROP_HW_ACCELERATION=50, //!< (**open-only**) Hardware acceleration type (see #VideoAccelerationType). Setting supported only via `params` parameter in cv::VideoCapture constructor / .open() method. Default value is backend-specific. | ||
| CAP_PROP_HW_DEVICE =51, //!< (**open-only**) Hardware device index (select GPU if multiple available) | ||
| CAP_PROP_AUDIO_ENABLE =1000, | ||
| CAP_PROP_BPS =1001, |
There was a problem hiding this comment.
Proper documentation must be added.
| CAP_PROP_ORIENTATION_AUTO=49, //!< if true - rotates output frames of CvCapture considering video file's metadata (applicable for FFmpeg back-end only) (https://github.com/opencv/opencv/issues/15499) | ||
| CAP_PROP_HW_ACCELERATION=50, //!< (**open-only**) Hardware acceleration type (see #VideoAccelerationType). Setting supported only via `params` parameter in cv::VideoCapture constructor / .open() method. Default value is backend-specific. | ||
| CAP_PROP_HW_DEVICE =51, //!< (**open-only**) Hardware device index (select GPU if multiple available) | ||
| CAP_PROP_AUDIO_ENABLE =1000, |
There was a problem hiding this comment.
I wanted the value in this file to match the values in the constants_c.h file.
modules/videoio/test/test_audio.cpp
Outdated
| @@ -0,0 +1,150 @@ | |||
| #include <tuple> | |||
There was a problem hiding this comment.
Mandatory requirements of OpenCV library:
- proper license header
- test_precomp.hpp must be included first.
| CAP_PROP_ORIENTATION_AUTO=49, //!< if true - rotates output frames of CvCapture considering video file's metadata (applicable for FFmpeg back-end only) (https://github.com/opencv/opencv/issues/15499) | ||
| CAP_PROP_HW_ACCELERATION=50, //!< (**open-only**) Hardware acceleration type (see #VideoAccelerationType). Setting supported only via `params` parameter in cv::VideoCapture constructor / .open() method. Default value is backend-specific. | ||
| CAP_PROP_HW_DEVICE =51, //!< (**open-only**) Hardware device index (select GPU if multiple available) | ||
| CAP_PROP_AUDIO_ENABLE =1000, |
There was a problem hiding this comment.
// Select audio or video
We should support processing of both video and audio streams. This is why we are integrating into VideoCapture instead of adding dedicated Audio API.
The next requirement is that video and audio must be synchronized in a some way (need to define these details and write them somewhere. Other "backends" and user apps should follow there guidelines).
Also we need these properties:
- (Priority 0) CAP_PROP_VIDEO_STREAM - (open-only, 0-based index or -1 to disable) - Default value is 0.
- (Priority 0) CAP_PROP_AUDIO_STREAM - (open-only, 0-based index or -1 to disable) - Specify stream in multi-language media files. -1 - disable audio processing (default)
- (Priority 1) CAP_PROP_AUDIO_POS - (read-only, in samples) accurate audio sample timestamp of previous grabbed fragment. See CAP_PROP_AUDIO_SAMPLES_PER_SECOND.
- (Priority 1) CAP_PROP_AUDIO_DATA_DEPTH - (open-only, Mat::depth()) Default value is -1: use "native" file/codec information. Alternative definition to bits-per-sample, but with clear handling of 32F / 32S.
- (Priority 1) CAP_PROP_AUDIO_SAMPLES_PER_SECOND - (open-only) 0 - determine from file/codec input. If not specified through parameters or input, then selected audio sample rate is 44100. Note: resampling is not performed by OpenCV.
(Priority 2) CAP_PROP_AUDIO_CHANNELS - (open-only?, bitset) - to properly handle stereo or 5.1 streams. ? 0 - use average of all channels. Use -1 to grab all channels. Default value is 0 / -1 ?- (Priority 3) CAP_PROP_AUDIO_TOTAL_SAMPLES - (read-only) Number of samples in the file. Returns zero if information is not available (live streams).
- (Priority 3) CAP_PROP_AUDIO_TOTAL_CHANNELS - (read-only) Number of audio channels in the selected audio stream.
- (Priority 4) CAP_PROP_AUDIO_TOTAL_STREAMS - (read-only) Number of audio stream in the used media.
- 44100 has problem with 24 FPS video (1827.5 audio samples per frame). 48000 is good for 24,25,30 FPS media streams, but it has less support from codecs. See https://en.wikipedia.org/wiki/Sampling_(signal_processing)
Update 2021-04-30:
- drop
CAP_PROP_AUDIO_CHANNELSdue to confusion withCAP_PROP_AUDIO_TOTAL_CHANNELS. - will be replaced by
CAP_PROP_AUDIO_CHANNELS_RETRIEVE_MODElater (not in the scope of this PR)
TBD later
| int apiID = cv::CAP_MSMF; | ||
| //congigurate VideoCapture for video and audio | ||
| VideoCapture cap_video; | ||
| VideoCapture cap_audio; |
There was a problem hiding this comment.
Wrong way.
As said above, we are integrating into VideoCapture to handle both streams instead of adding a dedicated Audio API.
| return -1; | ||
| } | ||
| // open selected micro using selected API | ||
| std::vector<int> params { CAP_PROP_AUDIO_ENABLE , static_cast<int>(1) }; |
There was a problem hiding this comment.
I guided on creating a vector of parameters in a video writer (cap.cpp). There the bool cast to int, in my case it is not required. I will correct this.
samples/cpp/videocapture_audio.cpp
Outdated
| // open selected micro using selected API | ||
| cap.open(0, apiID, params); | ||
| if (!cap.isOpened()) { | ||
| cerr << "ERROR! Can't to open file\n"; |
|
jenkins cn please retry a build |
| enum DeviceSatus | ||
| { | ||
| NONDEVICE = -1, | ||
| CAMERA = 0, | ||
| MICROPHONE = 1 | ||
| }; |
There was a problem hiding this comment.
Why users need that in public API?
| CV_CAP_PROP_XI_SENSOR_FEATURE_SELECTOR = 585, // Selects the current feature which is accessible by XI_PRM_SENSOR_FEATURE_VALUE. | ||
| CV_CAP_PROP_XI_SENSOR_FEATURE_VALUE = 586, // Allows access to sensor feature value currently selected by XI_PRM_SENSOR_FEATURE_SELECTOR. | ||
|
|
||
|
|
There was a problem hiding this comment.
revert all changes in this file, including touching of empty lines.
modules/videoio/src/cap_msmf.cpp
Outdated
| res.majorType = MFMediaType_Audio; | ||
| res.subType = MFAudioFormat_PCM; | ||
| res.bit_per_sample = 32; | ||
| res.nChannels = 2; |
There was a problem hiding this comment.
I believe we should start from handling of 'Mono' audio streams
modules/videoio/src/cap_msmf.cpp
Outdated
| const double thisRateDiff = absDiff(getFramerate(), ref.getFramerate()); | ||
| const double otherRateDiff = absDiff(other.getFramerate(), ref.getFramerate()); | ||
| if (thisRateDiff < otherRateDiff) | ||
| if (width > other.width) |
There was a problem hiding this comment.
Please rename original function to handle "video"only data and create new one for audio and "generic".
Keep patches small, code history clear.
modules/videoio/src/cap_msmf.cpp
Outdated
| if (thisDiff < otherDiff) | ||
| return true; | ||
| if (thisDiff == otherDiff) | ||
| if(majorType == MFMediaType_Video) |
There was a problem hiding this comment.
Code format: if<space>()
if is not a function, it is C++ statement.
The same note is about for()
| const int audio_base_index = cap.get(cv::CAP_PROP_AUDIO_BASE_INDEX); | ||
| for (;;) | ||
| { | ||
| cap.read(video_frame); |
There was a problem hiding this comment.
Don't lost returned results.
Show how to handle them.
| CAP_PROP_ORIENTATION_AUTO=49, //!< if true - rotates output frames of CvCapture considering video file's metadata (applicable for FFmpeg back-end only) (https://github.com/opencv/opencv/issues/15499) | ||
| CAP_PROP_HW_ACCELERATION=50, //!< (**open-only**) Hardware acceleration type (see #VideoAccelerationType). Setting supported only via `params` parameter in cv::VideoCapture constructor / .open() method. Default value is backend-specific. | ||
| CAP_PROP_HW_DEVICE =51, //!< (**open-only**) Hardware device index (select GPU if multiple available) | ||
| CAP_PROP_ENABLE_MICROPHONE = 52, |
There was a problem hiding this comment.
CAP_PROP_ENABLE_MICROPHONE
Why do we need dedicated property for that?
| CAP_PROP_AUDIO_SAMPLES_PER_SECOND = 57, | ||
| CAP_PROP_AUDIO_CHANNELS = 58, | ||
| CAP_PROP_AUDIO_BASE_INDEX = 59, | ||
| CAP_PROP_AUDIO_TOTAL_CHANNELS = 60, |
There was a problem hiding this comment.
CAP_PROP_AUDIO_CHANNELS
CAP_PROP_AUDIO_TOTAL_CHANNELS
Documentation should be added.
modules/videoio/src/cap_msmf.cpp
Outdated
| else | ||
| duration = 0; | ||
| } | ||
| } |
There was a problem hiding this comment.
Read OpenCV contribution guidelines on Wiki and install necessary Git hooks.
modules/videoio/src/cap_msmf.cpp
Outdated
| MediaType captureFormat; | ||
| MediaType captureVideoFormat; | ||
| MediaType captureAudioFormat; | ||
| bool deviceType; // false - camera, true - audio |
There was a problem hiding this comment.
bool deviceType; // false - camera, true - audio
... and somewhere we have recorded streams too.
This looks overcomplicated.
alalek
left a comment
There was a problem hiding this comment.
Updated comment with properties proposals too.
modules/videoio/src/cap_msmf.cpp
Outdated
| else | ||
| { | ||
| cv::Mat(1, cursize, CV_8UC1, ptr, pitch).copyTo(frame); | ||
| //switch(index) |
There was a problem hiding this comment.
There is cv::extractChannel() call in OpenCV which can help to extract channel from interleaved data.
|
Can i get a common structure of the OpenCv development here, so i can know the idea forward, relative to this. i have been busy and could not follow a lot of trends on opencv. can one just make a simple summary? |
alalek
left a comment
There was a problem hiding this comment.
All CI builds are failed due to merge conflict.
Squash commits and then rebase on upstream branch (in that order).
samples/cpp/videocapture_audio.cpp
Outdated
| #include <fstream> | ||
| #include <stdio.h> |
samples/cpp/videocapture_audio.cpp
Outdated
| } | ||
|
|
||
| const int audioBaseIndex = cap.get(cv::CAP_PROP_AUDIO_BASE_INDEX); | ||
| const int numberOfChannels = cap.get(cv::CAP_PROP_AUDIO_TOTAL_CHANNELS); |
There was a problem hiding this comment.
cv::
we have using namespace cv in all samples to avoid using of cv:: prefix
samples/cpp/videocapture_audio.cpp
Outdated
| cout << "Number of samples: " << cap.get(cv::CAP_PROP_AUDIO_POS) << endl; | ||
| return 0; |
There was a problem hiding this comment.
How we can get here?
This is the "error" path.
.grab() should handle normal end-of-stream handling.
| if (audioFrame.empty() && videoFrame.empty()) | ||
| { | ||
| cerr << "ERROR! blank frame grabbed" << endl; | ||
| break; |
| cap.grab(); | ||
| cap.retrieve(frame, audio_base_index); |
There was a problem hiding this comment.
missing check of the returned results.
modules/videoio/test/test_audio.cpp
Outdated
| VideoCapture cap; | ||
| }; | ||
|
|
||
| class aud : public AudioTestFixture{}; |
There was a problem hiding this comment.
aud
What is the problem to use normal name?
modules/videoio/test/test_audio.cpp
Outdated
| void comparison() | ||
| { | ||
| for(int i = 0; i < validData.size(); i++) | ||
| ASSERT_TRUE(fabs(validData[i] - fileData[i]) < epsilon); |
There was a problem hiding this comment.
Google test provides ASSERT_NEAR with accurate error reporting.
Also use (...) << i; to dump index at least.
modules/videoio/test/test_audio.cpp
Outdated
| void comparison() | ||
| { | ||
| for(int i = 0; i < validData.size(); i++) | ||
| ASSERT_TRUE(fabs(validData[i] - fileData[i]) < epsilon); |
There was a problem hiding this comment.
fileData[i]
missing check for buffer overflow.
modules/videoio/test/test_audio.cpp
Outdated
| for(int j = 0; j < 44100*3; j += 44100) | ||
| { |
|
Hello, @MaximMilashchenko : Is it possible to attach a full diff (with master) ? I'd like to test the whole changes. Thanks in advance, |
|
@ebachard Check links at page footer: "ProTip! Add .patch or .diff to the end of URLs for Git’s plaintext views." |
|
@alalek : thanks for the tip. In fact, in meantime I created a full diff including new created files, and I'll give it a try. First, I'll do a (personal) code review, to understand how things work, and then I'll experiment. Thanks again ! |
86f5c78 to
a7e9d4d
Compare
| CAP_PROP_AUDIO_BASE_INDEX = 60, //!< Number of video channels | ||
| CAP_PROP_AUDIO_TOTAL_CHANNELS = 61, //!< Number of audio channels in the selected audio stream. | ||
| CAP_PROP_AUDIO_TOTAL_STREAMS = 62, //!< Number of audio stream in the used media. | ||
| CAP_PROP_SYNC_LAST_FRAME = 63, //!< Defult value is 1 (the last audio frame is synchronized with the video frame by duration), 0 is no audio and video last frames sync(the last audio frame will contain all remaining audio data. The duration of the received audio data may be longer than the duration of the received video data) |
There was a problem hiding this comment.
It makes sense to keep AUDIO prefix:
CAP_PROP_SYNC_LAST_FRAME => CAP_PROP_AUDIO_SYNC_LAST_FRAME
| CAP_PROP_AUDIO_POS = 57, //!< Audio position is measured in samples. Accurate audio sample timestamp of previous grabbed fragment. See CAP_PROP_AUDIO_SAMPLES_PER_SECOND | ||
| CAP_PROP_AUDIO_DATA_DEPTH = 58, //!< Alternative definition to bits-per-sample, but with clear handling of 32F / 32S | ||
| CAP_PROP_AUDIO_SAMPLES_PER_SECOND = 59, //!< determined from file/codec input. If not specified, then selected audio sample rate is 44100 | ||
| CAP_PROP_AUDIO_BASE_INDEX = 60, //!< Number of video channels |
There was a problem hiding this comment.
Number of video channels
(read-only) Index of first audio channel. Used as parameter for .retrieve() call.
Perhaps we should have: CAP_PROP_VIDEO_TOTAL_CHANNELS (currently with the same value)
There was a problem hiding this comment.
Agree, it's not clear why Audio_base_index sets number of video channels
There was a problem hiding this comment.
I would do both. (add total and rename to VIDEO_TOTAL_CHANNELS). Also, you need to define clearly in docs that audio channel number is not zero-indexed, it continues enumeration after video channels.
There was a problem hiding this comment.
Number of video channels
(read-only) Index of first audio channel. Used as parameter for
.retrieve()call.Perhaps we should have:
CAP_PROP_VIDEO_TOTAL_CHANNELS(currently with the same value)
What do I have to do here: to rename CAP_PROP_AUDIO_BASE_INDEX=>CAP_PROP_VIDEO_TOTAL_CHANNELS or to add a new CAP_PROP_VIDEO_TOTAL_CHANNELS property and to change the CAP_PROP_AUDIO_BASE_INDEX description?
There was a problem hiding this comment.
CAP_PROP_AUDIO_BASE_INDEXshould be here with updated comment (see above).- add new property
CAP_PROP_VIDEO_TOTAL_CHANNELS(for RGBD or stereo streams/cameras)
modules/videoio/src/cap_msmf.cpp
Outdated
|
|
||
| #include <mferror.h> | ||
|
|
||
| #include <fstream> |
modules/videoio/src/cap_msmf.cpp
Outdated
| bool grabVideoFrame(); | ||
| virtual bool grabFrame() CV_OVERRIDE; | ||
| bool retrieveAudioFrame(int, cv::OutputArray); | ||
| bool retrieveVideoFrame(cv::OutputArray); |
There was a problem hiding this comment.
cv::
Should not be used in OpenCV code.
Exception is cv::format() (to avoid problem with std::format())
There was a problem hiding this comment.
cv::Should not be used in OpenCV code.
Exception is
cv::format()(to avoid problem withstd::format())
In master, cv:: is used with this argument. Is it necessary to remove it?
There was a problem hiding this comment.
remove from modified/added code.
No need to change unmodified parts.
modules/videoio/src/cap_msmf.cpp
Outdated
| buf = NULL; | ||
| } | ||
| audioSamples.clear(); | ||
| int numberOfByte = (videoStream != -1) ? (int)((double)(curVideoTime/1e7)*captureAudioFormat.nSamplesPerSec*captureAudioFormat.nChannels*(captureAudioFormat.bit_per_sample)/8) : cursize; |
There was a problem hiding this comment.
curVideoTime/1e7
It is not accurate with integer numbers:
LONGLONG curVideoTime;
Division should be the last operation of expression (also avoid overflow - int64_t should be enough).
There was a problem hiding this comment.
numberOfByte is confusing. What is it - start of data chunk or chunk's length ? Please rename (using start or length).
What is magic number 1e7 - does it come from MSFT measurements in units of 0.1us ?
(double) cast is extra here, 1e7 should be double and will autocast everything else to double.
There was a problem hiding this comment.
curVideoTime/1e7
It is not accurate with integer numbers:
LONGLONG curVideoTime;Division should be the last operation of expression (also avoid overflow -
int64_tshould be enough).
Mat accepts arguments with the int type. If we take int64_t numberOfByte, then the conversion of int64_t types to int will occur in the Mat constructor.
There was a problem hiding this comment.
- compute as int64_t
- avoid doubles / floating-point computations
- check range of final result and cast to "int" if needed
| ASSERT_TRUE(cap.retrieve(videoFrame)); | ||
| ASSERT_TRUE(cap.retrieve(audioFrame, audioBaseIndex)); |
There was a problem hiding this comment.
Provide useful messages about failed checks. Add number of processed frame.
There are different cases with failures on frame 0 (nothing works here), and some frame 345 (something works).
Use GoogleTest SCOPED_TRACE for that.
| { | ||
| ASSERT_TRUE(cap.retrieve(videoFrame)); | ||
| ASSERT_TRUE(cap.retrieve(audioFrame, audioBaseIndex)); | ||
| ASSERT_EQ(audioFrame.cols/44100, 1/fps); // check if the duration of the received audio data satisfies one video frame |
There was a problem hiding this comment.
Yes. If the duration of the received audio data does not equal to one frame of the video, the check will not be passed. This is possible if synchronization does not work
There was a problem hiding this comment.
Provided code doesn't look valid. Looks like it always compare 0 vs 0.
Use this instead:
int samplesPerFrame = 1/fps*44100;
// 44100 should be checked and replaced to CAP_PROP_AUDIO_SAMPLES_PER_SECOND property.
int audioSamplesTolerance = samplesPerFrame / 2;
We should have these checks:
- position: abs(CAP_PROP_AUDIO_POS - CAP_PROP_POS_MSEC / 1000 * 44100) < audioSamplesTolerance
- duration: abs(audioFrame.cols - samplesPerFrame) < audioSamplesTolerance
There was a problem hiding this comment.
Provided code doesn't look valid. Looks like it always compare 0 vs 0.
Use this instead:
int samplesPerFrame = 1/fps*44100;
// 44100 should be checked and replaced toCAP_PROP_AUDIO_SAMPLES_PER_SECONDproperty.int audioSamplesTolerance = samplesPerFrame / 2;
We should have these checks:
- position: abs(CAP_PROP_AUDIO_POS - CAP_PROP_POS_MSEC / 1000 * 44100) < audioSamplesTolerance
- duration: abs(audioFrame.cols - samplesPerFrame) < audioSamplesTolerance
The CAP_PROP_POS_MSEC value is not always valid
| ASSERT_EQ(audioFrame.cols/44100, 1/fps); // check if the duration of the received audio data satisfies one video frame | ||
| if (!videoFrame.empty()) | ||
| videoData.push_back(videoFrame); | ||
| for (int i = 0; i < audioFrame.cols; i++) |
There was a problem hiding this comment.
Add check for audioFrame.type() as code below assumes CV_16SC1.
modules/videoio/test/test_audio.cpp
Outdated
| } | ||
| ASSERT_FALSE(fileData.empty()); | ||
| } | ||
| void comparison() |
There was a problem hiding this comment.
checkAudio
validateAudio
What does this mean?
There was a problem hiding this comment.
suggestion to use more clear name
| } | ||
| // check if 'this' is better than 'other' comparing to reference | ||
| bool isBetterThan(const MediaType& other, const MediaType& ref) const | ||
| bool VideoIsBetterThan(const MediaType& other, const MediaType& ref) const |
There was a problem hiding this comment.
Usually video quality is compared not only by resolution, but also by bitrate and FRC mode
modules/videoio/src/cap_msmf.cpp
Outdated
| buf = NULL; | ||
| } | ||
| audioSamples.clear(); | ||
| int numberOfByte = (videoStream != -1) ? (int)((double)(curVideoTime/1e7)*captureAudioFormat.nSamplesPerSec*captureAudioFormat.nChannels*(captureAudioFormat.bit_per_sample)/8) : cursize; |
There was a problem hiding this comment.
numberOfByte is confusing. What is it - start of data chunk or chunk's length ? Please rename (using start or length).
What is magic number 1e7 - does it come from MSFT measurements in units of 0.1us ?
(double) cast is extra here, 1e7 should be double and will autocast everything else to double.
modules/videoio/src/cap_msmf.cpp
Outdated
| } | ||
| } | ||
| _ComPtr<IMFMediaBuffer> buf = NULL; | ||
| BYTE* useAudioData = NULL; |
There was a problem hiding this comment.
audioDataBuffer or audioDataInUse ? useAudioData sounds like function name or boolean flag
modules/videoio/src/cap_msmf.cpp
Outdated
| for (int i = 0; i < numberOfByte; i++) | ||
| { | ||
| useAudioData[i] = bufferAudioData[i]; | ||
| } |
There was a problem hiding this comment.
why don't we use memcpy_s or std::copy (in case of using std::dequeue instead of vector for bufferAudioData)?
modules/videoio/src/cap_msmf.cpp
Outdated
| LONGLONG audioSamplePos; | ||
| DWORD numberOfAudioStreams; | ||
| Mat audioFrame; | ||
| std::vector<BYTE> bufferAudioData; |
There was a problem hiding this comment.
As long as this buffer will be used as queue for samples, I'd suggest to use ring queue or std::deque to speed it up. This will be noticeable on long queues only though.
modules/videoio/src/cap_msmf.cpp
Outdated
|
|
||
| if (!SUCCEEDED(buf->Lock(&ptr, &maxsize, &cursize))) | ||
| break; | ||
| for (unsigned int i = 0; i < cursize; i++) |
There was a problem hiding this comment.
do .reserve first to speed it up. Or do resize and just fill the data to the tail.
| } | ||
| if (!audioFrame.empty()) | ||
| { | ||
| audioData.push_back(audioFrame); |
There was a problem hiding this comment.
Same as above, it would be good to do something with decoded data. As long as we have live video,you may paint oscillogram of decoded chunk over the video frame.
There was a problem hiding this comment.
@fzhar Audio writer is out of scope of this PR. It would be added later.
| for (int nCh = 0; nCh < numberOfChannels; nCh++) | ||
| { | ||
| cap.retrieve(frame, audioBaseIndex+nCh); | ||
| audioData.push_back(frame); |
There was a problem hiding this comment.
Same as above. Writing audio to the file would be nice.
modules/videoio/src/cap_msmf.cpp
Outdated
| { | ||
| DWORD streamIndex, flags; | ||
| HRESULT hr; | ||
| std::vector<bool> outInstalFlag; |
There was a problem hiding this comment.
Why do we need vector of flags here ? At the end they are just and-ed, mybe better use single boolean flag instead of vector ?
modules/videoio/src/cap_msmf.cpp
Outdated
| } | ||
| else if (isOpen) | ||
| { | ||
| std::vector<bool> outInstalFlag; |
There was a problem hiding this comment.
vector is really extra here. jut use single bool
modules/videoio/src/cap_msmf.cpp
Outdated
| } | ||
| else | ||
| { | ||
| sampleTime += frameStep; |
There was a problem hiding this comment.
sampleTime += frameStep;
This may provide inaccurate values.
Use timestamp value from ReadSample. See code related to m_lastSampleTimestamp.
There was a problem hiding this comment.
sampleTime += frameStep;This may provide inaccurate values.
Use timestamp value from ReadSample. See code related tom_lastSampleTimestamp.
Timestamp value from Read Sample does not always indicate the end of the captured sample. Example, in the test case of {"mov", "H 264", 30. f, CAP_MSMF} (videoio_synthetic, write_read_position ) the time stamp in ReadSample is set at the beginning of the captured frame and for correct operation it is necessary to calculate SampleTime= + frameStep
There was a problem hiding this comment.
the end of the captured sample
end
This is not needed at all and can NOT be properly determined for VFR videos (without reading of the next frame).
Use provided value from the decoder/demuxer.
Compare results with FFmpeg backend.
modules/videoio/src/cap_msmf.cpp
Outdated
| case CV_CAP_PROP_POS_FRAMES: | ||
| return floor(((double)sampleTime / 1e7)* captureFormat.getFramerate() + 0.5); | ||
| return floor(((double)sampleTime / 1e7)* captureVideoFormat.getFramerate() + 0.5); |
There was a problem hiding this comment.
Use direct frame counter by counting of grab() / ReadSample() calls.
Update also ::setTime() method. Seek to zero should be accurate, others are not (perhaps we should not return property value after that)
d50197d to
31d3b96
Compare
modules/videoio/test/test_audio.cpp
Outdated
| typedef std::tuple<std::string, double, int, int, int, int, int, double, std::pair<std::string, int> > paramCombination; | ||
| typedef std::tuple<std::string, double, std::pair<std::string, int> > param; | ||
|
|
||
| class baseAudio |
There was a problem hiding this comment.
baseAudio must start with capital letter
baseAudio => AudioBaseTest
modules/videoio/test/test_audio.cpp
Outdated
| void getValidAudioData() | ||
| { | ||
| const double step = 3.14/22050; | ||
| double value = 0; | ||
| for (int j = 0; j < 3; j++) | ||
| { | ||
| value = 0; | ||
| for (int i = 0; i < 44100; i++) | ||
| { | ||
| validAudioData.push_back(sin(value)); | ||
| value += step; | ||
| } | ||
| } | ||
| } |
modules/videoio/test/test_audio.cpp
Outdated
| void comparisonAudio() | ||
| { | ||
| for (unsigned int i = 0; i < audioData.size(); i++) | ||
| { | ||
| EXPECT_LE(fabs(validAudioData[i] - audioData[i]), epsilon) << "sample index " << i; | ||
| } | ||
| } |
modules/videoio/test/test_audio.cpp
Outdated
| EXPECT_LE(fabs(validAudioData[i] - audioData[i]), epsilon) << "sample index " << i; | ||
| } | ||
| } | ||
| void comparisonVideo() |
There was a problem hiding this comment.
comparisonVideo => checkVideoFrames
modules/videoio/test/test_audio.cpp
Outdated
| void getValidAudioData() | ||
| { | ||
| const double step = 3.14/22050; | ||
| double value = 0; | ||
| for(int j = 0; j < 3; j++) | ||
| { | ||
| value = 0; | ||
| for(int i = 0; i < 44100; i++) | ||
| { | ||
| validAudioData.push_back(sin(value)); | ||
| value += step; | ||
| } | ||
| } | ||
| } |
samples/cpp/videocapture_audio.cpp
Outdated
| int numberOfSamles = 0; | ||
| for (auto item : audioData) | ||
| numberOfSamles+=item.cols; | ||
| cout << "Number of samples: " << numberOfSamles << endl; |
There was a problem hiding this comment.
move out post processing code from the for loop.
Keep for loops readable - minimal as possible.
| cap.open(0, CAP_MSMF, params); | ||
| if (!cap.isOpened()) | ||
| { | ||
| cerr << "ERROR! Can't to open file" << endl; |
| const double cvTickFreq = getTickFrequency(); | ||
| int64 sysTimeCurr = getTickCount(); | ||
| int64 sysTimePrev = sysTimeCurr; | ||
| while ((sysTimeCurr-sysTimePrev)/cvTickFreq < 10) |
There was a problem hiding this comment.
It makes sense to emit message that audio would be captured for the next 10 seconds.
To avoid confusion with program hang.
| if (cap.grab()) | ||
| { |
There was a problem hiding this comment.
"else" handling is missing.
What if user disconnect microphone device?
| } | ||
| if (!audioFrame.empty()) | ||
| { | ||
| audioData.push_back(audioFrame); |
There was a problem hiding this comment.
@fzhar Audio writer is out of scope of this PR. It would be added later.
modules/videoio/src/cap_msmf.cpp
Outdated
| try | ||
| { | ||
| if (chunkLengthOfBytes < INT_MIN || chunkLengthOfBytes > INT_MAX) | ||
| throw "The chunkLengthOfBytes is out of the allowed range"; | ||
| } | ||
| catch (const std::exception& e) | ||
| { | ||
| CV_LOG_WARNING(NULL, "MSMF: Exception is raised: " << e.what()); | ||
| return false; | ||
| } |
There was a problem hiding this comment.
How I can fix that?
There was a problem hiding this comment.
Look for similar code in the library. Use CV_Check.
modules/videoio/src/cap_msmf.cpp
Outdated
| } | ||
| copy(bufferAudioData.begin(), bufferAudioData.begin()+chunkLengthOfBytes, std::back_inserter(audioDataInUse)); | ||
| bufferAudioData.erase(bufferAudioData.begin(), bufferAudioData.begin()+chunkLengthOfBytes); | ||
| residualTime = (double)(bufferAudioData.size()/((captureAudioFormat.bit_per_sample/8)*captureAudioFormat.nChannels))/captureAudioFormat.nSamplesPerSec; |
There was a problem hiding this comment.
residualTime
Wrong logic is here.
.retrive() must be idempotent implementation. It can be called many times with same parameters or not called at all.
It is wrong to modify residualTime variable.
State must be changed during .grab() call only.
modules/videoio/src/cap_msmf.cpp
Outdated
| bool returnFlag = true; | ||
| if (videoStream != -1) | ||
| returnFlag &= grabVideoFrame(); | ||
| if (audioStream != -1) | ||
| returnFlag &= grabAudioFrame(); |
There was a problem hiding this comment.
If we fail to capture video, then there is no reason to capture audio. We already failed.
modules/videoio/src/cap_msmf.cpp
Outdated
| &flags, // Receives status flags. | ||
| &sampleTime, // Receives the time stamp. | ||
| &videoSample // Receives the sample or NULL. | ||
| NULL, // Receives the time stamp. |
There was a problem hiding this comment.
Why audio ignores timestamp?
This is why the current logic fails on sample_960x400_ocean_with_audio.mp4
Add debug dump of values from ReadSample:
CvCapture_MSMF::initStream Init stream 1 with MediaType (960x400 @ 23.976) MFVideoFormat_RGB32
CvCapture_MSMF::initStream Init stream 0 with MediaType (0x0 @ 1) ☺
CvCapture_MSMF::grabVideoFrame ReadSample(video): stream=1 time=1668332
CvCapture_MSMF::grabVideoFrame ReadSample(video): GetSampleDuration=417083
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=213333
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=206349
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=419682
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379
...
CvCapture_MSMF::grabVideoFrame ReadSample(video): stream=1 time=2085415
CvCapture_MSMF::grabVideoFrame ReadSample(video): GetSampleDuration=417083
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=633061
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=846441
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213152
...
CvCapture_MSMF::grabVideoFrame ReadSample(video): stream=1 time=2502498
CvCapture_MSMF::grabVideoFrame ReadSample(video): GetSampleDuration=417083
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=1059593
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379
CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=1272972
CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379
Audio stream starts 4+ frames before video. We need to capture all of this audio data and return with the first video frame.
There was a problem hiding this comment.
Why audio ignores timestamp?
This is why the current logic fails on
sample_960x400_ocean_with_audio.mp4Add debug dump of values from ReadSample:
CvCapture_MSMF::initStream Init stream 1 with MediaType (960x400 @ 23.976) MFVideoFormat_RGB32 CvCapture_MSMF::initStream Init stream 0 with MediaType (0x0 @ 1) ☺ CvCapture_MSMF::grabVideoFrame ReadSample(video): stream=1 time=1668332 CvCapture_MSMF::grabVideoFrame ReadSample(video): GetSampleDuration=417083 CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=213333 CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=206349 CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=419682 CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379 ... CvCapture_MSMF::grabVideoFrame ReadSample(video): stream=1 time=2085415 CvCapture_MSMF::grabVideoFrame ReadSample(video): GetSampleDuration=417083 CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=633061 CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379 CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=846441 CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213152 ... CvCapture_MSMF::grabVideoFrame ReadSample(video): stream=1 time=2502498 CvCapture_MSMF::grabVideoFrame ReadSample(video): GetSampleDuration=417083 CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=1059593 CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379 CvCapture_MSMF::grabAudioFrame ReadSample(audio): stream=0 time=1272972 CvCapture_MSMF::grabAudioFrame ReadSample(audio): GetSampleDuration=213379Audio stream starts 4+ frames before video. We need to capture all of this audio data and return with the first video frame.
Audio ignores timestamp because there was no need to use it. I will fix it.
| bool CvCapture_MSMF::setTime(int numberFrame) | ||
| { | ||
| if(videoStream == -1) | ||
| return false; | ||
| PROPVARIANT var; |
There was a problem hiding this comment.
We should forbid seeking in case of Video+Audio capturing, except seek to beginning of the media file.
Other values are just not accurate even for Video-only cases.
The SetCurrentPosition method does not guarantee exact seeking. The accuracy of the seek depends on the media content.
There was a problem hiding this comment.
We should forbid seeking in case of Video+Audio capturing, except seek to beginning of the media file.
Other values are just not accurate even for Video-only cases.
The SetCurrentPosition method does not guarantee exact seeking. The accuracy of the seek depends on the media content.
We can manually rewind to any frame by setting SetCurrentPosition to 0, call grab and release the frames until we reach the desired one
There was a problem hiding this comment.
call grab and release the frames until we reach the desired one
Then the next step will be receiving a bug report about why seeking implementation is so slow.
If we want to implement feature, then we should it implemented in the best way. Otherwise we would get unresolvable threads like these: #9053
There was a problem hiding this comment.
We should forbid seeking in case of Video+Audio capturing, except seek to beginning of the media file.
Then I can forbid setting CV_CAP_PROP_YPOS_FRAMES and CV_CAP_PROP_YPOS_MSC in case of Video+Audio capturing and allow setting CV_CAP_PROP_POS_AVI_RATIO for values 0 and 1
There was a problem hiding this comment.
except seek to beginning of the media file.
Seek to zero should be supported for all modes (this case is well defined and just works).
modules/videoio/test/test_audio.cpp
Outdated
| if (!videoFrame.empty()) | ||
| videoData.push_back(videoFrame); |
There was a problem hiding this comment.
videoFrame.empty()
Must be always false on .grab() success.
Current last frame synchronization logic doesn't work properly.
modules/videoio/test/test_audio.cpp
Outdated
|
|
||
| const paramCombination mediaParams[] = | ||
| { | ||
| paramCombination("mp4", 1, 0.15, CV_8UC3, 240, 320, 90, 30, 30., {"CAP_MSMF", cv::CAP_MSMF}) |
There was a problem hiding this comment.
"CAP_MSMF", cv::CAP_MSMF
Which other tests uses the same scheme? Why we need string here?
modules/videoio/test/test_audio.cpp
Outdated
| } | ||
| ASSERT_LT(abs(cap.get(CAP_PROP_AUDIO_POS) - (cap.get(CAP_PROP_POS_MSEC)/ 1000 + 1./fps) * samplePerSecond), audioSamplesTolerance); | ||
| ASSERT_LT(abs(audioFrame.cols - samplesPerFrame), audioSamplesTolerance); | ||
| ASSERT_EQ(CV_16SC1, audioFrame.type()); |
There was a problem hiding this comment.
Type check is placed after data usage.
modules/videoio/test/test_audio.cpp
Outdated
| } | ||
| } | ||
| ASSERT_LT(abs(cap.get(CAP_PROP_AUDIO_POS) - (cap.get(CAP_PROP_POS_MSEC)/ 1000 + 1./fps) * samplePerSecond), audioSamplesTolerance); | ||
| ASSERT_LT(abs(audioFrame.cols - samplesPerFrame), audioSamplesTolerance); |
There was a problem hiding this comment.
audioFrame.cols must be same across channels.
audioFrame.cols should not be checked for the first frame (or multiple no-audio frames in the beginning) and the last frame (due to the last frame synchronization, which reads audio till the end).
There was a problem hiding this comment.
audioFrame.colsmust be same across channels.
Do I need to check it?
modules/videoio/test/test_audio.cpp
Outdated
| audioData[nCh].push_back(f); | ||
| } | ||
| } | ||
| ASSERT_LT(abs(cap.get(CAP_PROP_AUDIO_POS) - (cap.get(CAP_PROP_POS_MSEC)/ 1000 + 1./fps) * samplePerSecond), audioSamplesTolerance); |
There was a problem hiding this comment.
To make this check to work we need one more read-only property which contains timestamp shift between audio and video streams (of the first captured data)
timestamp shift should be in nanoseconds.
|
How the pipeline should behave when the audio stream is shorter than the video stream? that is, the audio has an offset at the beginning and may end earlier |
|
Last frames should return empty audio data. Audio data for the first frame is larger that for others (due to audio starts before video) |
alalek
left a comment
There was a problem hiding this comment.
Please run test locally with
paramCombination("sample_960x400_ocean_with_audio.mp4", 2, 0.15, CV_8UC3, 400, 960, 1116, 30, 30., cv::CAP_MSMF)
Several test conditions still fail (besides of audio gold results).
modules/videoio/test/test_audio.cpp
Outdated
| for (unsigned int nCh = 0; nCh < audioData.size(); nCh++) | ||
| for (unsigned int i = 0; i < audioData[nCh].size(); i++) | ||
| { | ||
| EXPECT_LE(fabs(validAudioData[nCh][i] - audioData[nCh][i]), epsilon) << "sample index " << i; |
There was a problem hiding this comment.
Again, if readFile() fails or read more data then that code tried to perform out of buffer access.
nCh < audioData.size()
Checks constraints based on input data are logically incorrect. Gold values and test parameters must be used instead.
modules/videoio/test/test_audio.cpp
Outdated
| if (nCh == 0) | ||
| audioFrameCols = audioFrame.cols; | ||
| else | ||
| ASSERT_EQ(audioFrameCols, audioFrame.cols); |
There was a problem hiding this comment.
<< nCh; or use SCOPED_TRACE with nCh value.
modules/videoio/test/test_audio.cpp
Outdated
| if (!audioFrame.empty()) | ||
| if (frame != numberOfFrames-1) | ||
| { | ||
| ASSERT_LT(abs(cap.get(CAP_PROP_AUDIO_POS) + (cap.get(CAP_PROP_TIME_SHIFT_STREAMS)/ 1e6) * samplePerSecond - (cap.get(CAP_PROP_POS_MSEC)/ 1000 + 1./fps) * samplePerSecond), audioSamplesTolerance); |
There was a problem hiding this comment.
This condition is failed with:
paramCombination("sample_960x400_ocean_with_audio.mp4", 2, 0.15, CV_8UC3, 400, 960, 1116, 30, 30., cv::CAP_MSMF)
See audioSamplePos comment above.
1./fps
This is wrong. Both positions must point to be beginning of the data. You just can't estimate the end for VFR videos.
Check must look like this:
EXPECT_NEAR(
cap.get(CAP_PROP_AUDIO_POS),
((cap.get(CAP_PROP_POS_MSEC) - cap.get(CAP_PROP_TIME_SHIFT_STREAMS) / 1e6) / 1000) * samplePerSecond,
audioSamplesTolerance)
<< cap.get(CAP_PROP_POS_MSEC)=" << cap.get(CAP_PROP_POS_MSEC);
There was a problem hiding this comment.
Check must look like this:
EXPECT_NEAR( cap.get(CAP_PROP_AUDIO_POS), ((cap.get(CAP_PROP_POS_MSEC) - cap.get(CAP_PROP_TIME_SHIFT_STREAMS) / 1e6) / 1000) * samplePerSecond, audioSamplesTolerance) << cap.get(CAP_PROP_POS_MSEC)=" << cap.get(CAP_PROP_POS_MSEC);
This is how the check should look, only for the first frame. After reading the first frame, the timestamps of audio and video will not need to align and take into account the offset
if (frame == 0)
{
EXPECT_NEAR(
cap.get(CAP_PROP_AUDIO_POS),
((cap.get(CAP_PROP_POS_MSEC) + cap.get(CAP_PROP_AUDIO_SHIFT_NSEC) / 1e6) / 1000) * samplePerSecond,
audioSamplesTolerance)
<< "CAP_PROP_POS_MSEC = " << cap.get(CAP_PROP_POS_MSEC);
}
else
{
EXPECT_NEAR(cap.get(CAP_PROP_AUDIO_POS), (cap.get(CAP_PROP_POS_MSEC) / 1000) * samplePerSecond, audioSamplesTolerance) << "CAP_PROP_POS_MSEC = " << cap.get(CAP_PROP_POS_MSEC);
EXPECT_LT(abs(audioFrame.cols - samplesPerFrame), audioSamplesTolerance);
}
There was a problem hiding this comment.
No, CAP_PROP_AUDIO_SHIFT_NSEC must be applied for all frames.
frame == 0 check is EXPECT_EQ(0, cap.get(CAP_PROP_AUDIO_POS));
There was a problem hiding this comment.
No,
CAP_PROP_AUDIO_SHIFT_NSECmust be applied for all frames.
Calculations will be erroneous. From CAP_PROP_POS_MSEC, you need to subtract the audio offset relative to the zero of the media file, and not the time difference between the start of the audio stream and the video stream
There was a problem hiding this comment.
Check is updated. Please fetch the last commit.
modules/videoio/src/cap_msmf.cpp
Outdated
| nFrame++; | ||
| usedVideoSample->GetSampleDuration(&videoSampleDuration); | ||
| requiredAudioTime = usedVideoSampleTime + videoSampleDuration - allTime; | ||
| allTime += requiredAudioTime; |
There was a problem hiding this comment.
allTime
unclear name which is used for audio only.
modules/videoio/src/cap_msmf.cpp
Outdated
| if (residualTime*1e7 > requiredAudioTime) | ||
| return true; | ||
| while ((!vEOS) ? bufferedAudioDuration <= requiredAudioTime : !aEOS) |
There was a problem hiding this comment.
residualTime
This is some kind of duplicate for bufferedAudioDuration (but with unclear name)
modules/videoio/src/cap_msmf.cpp
Outdated
| if (returnFlag) | ||
| if (!audioSamples.empty() || !bufferAudioData.empty() && aEOS) | ||
| { |
There was a problem hiding this comment.
No need to indent such huge code blocks:
if (!returnFlag)
{
CV_LOG_DEBUG(...);
return false;
}
... code block ...
modules/videoio/src/cap_msmf.cpp
Outdated
| if (!audioSamples.back()) | ||
| audioSamples.pop_back(); |
There was a problem hiding this comment.
Just do not write into array directly through &audioSamples[numberOfSamples].
It is better to write into local variable and register it on success after all passed checks.
modules/videoio/src/cap_msmf.cpp
Outdated
| sampleTime += frameStep; | ||
| audioSamples[numberOfSamples]->GetSampleDuration(&audioSampleDuration); | ||
| CV_LOG_DEBUG(NULL, "videoio(MSMF): got audio frame with timestamp=" << audioSampleTime << " duration=" << audioSampleDuration); | ||
| bufferedAudioDuration += (LONGLONG)(audioSampleDuration + residualTime*1e7); |
There was a problem hiding this comment.
residualTime*1e7
Added multiple times in the loop (unexpected shift):
while ((!vEOS) ? bufferedAudioDuration <= requiredAudioTime : !aEOS)
modules/videoio/src/cap_msmf.cpp
Outdated
| if (videoStream == -1) | ||
| break; |
There was a problem hiding this comment.
"while()" condition should handle this?
modules/videoio/test/test_audio.cpp
Outdated
| void checkAudio() | ||
| { | ||
| for (unsigned int nCh = 0; nCh < audioData.size(); nCh++) | ||
| for (unsigned int i = 0; i < validAudioData[nCh].size(); i++) |
There was a problem hiding this comment.
audioData[nCh].size() check is missing
No need to have 2 identical checkAudio() implementations.
| #if 0 | ||
| // https://filesamples.com/samples/video/mp4/sample_960x400_ocean_with_audio.mp4 | ||
| , paramCombination("sample_960x400_ocean_with_audio.mp4", 2, 0.15, CV_8UC3, 400, 960, 1116, 0, 30, 30., cv::CAP_MSMF) | ||
| #endif |
There was a problem hiding this comment.
This test case has audio-video desync starting from frame 666 which increases.
There was a problem hiding this comment.
This test case has audio-video desync starting from frame 666 which increases.
Starting from frame 666, the audio position begins to outpace the video by more than (1.0 / fps) * 0.3, because for each output of audio data, it is necessary that the data size in bytes be a multiple of the dimension of the matrix.
chunkLengthOfBytes +=
((int)(captureAudioFormat.bit_per_sample)/8* (int)captureAudioFormat.nChannels) - chunkLengthOfBytes %
((int)(captureAudioFormat.bit_per_sample)/8* (int)captureAudioFormat.nChannels);
For this reason, with each iteration, the gap between the video and audio positions will increase and on sufficiently long media files, desynchronization of positions will be observed, but the amount of output audio data with a frame will be unchanged. Confirmed by verification:
EXPECT_NEAR(audio Frame.cols, samplesPerFrame, audioSamplesTolerance);
There was a problem hiding this comment.
It is not necessary to return chunks of audio of the same size.
but the amount of output audio data with a frame will be unchanged
This doesn't make any sense for videos with variable frame rate.
There was a problem hiding this comment.
It is not necessary to return chunks of audio of the same size.
but the amount of output audio data with a frame will be unchanged
This doesn't make any sense for videos with variable frame rate.
Chunks of audio of the same size are returned only if the frame rate is constant. Chunks of audio are calculated based on the duration of the frame
There was a problem hiding this comment.
Chunks of audio are calculated based on the duration of the frame
Chunks of audio should be calculated based on the timestamp of the next video frame.
add audio support in cap_msmf * audio msmf * fixed warnings * minor fix * fixed SampleTime MSMF * minor fix, fixed audio test, retrieveAudioFrame * fixed warnings * impelemented sync audio and video stream with start offset * fixed error * fixed docs * fixed audio sample * CAP_PROP_AUDIO_POS, minor fixed * fixed warnings * videoio(MSMF): update audio test checks, add debug logging * fixed * fixed desynchronization of time positions, warnings * fixed warnings * videoio(audio): tune tests checks * videoio(audio): update properties description * build warnings Co-authored-by: Alexander Alekhin <alexander.a.alekhin@gmail.com>
Merge with extra: opencv/opencv_extra#864
@alalek, @allnes, @fzhar
Issue #16394
I attach a link on opencv_extra's PR opencv/opencv_extra#864 .