Added Halide backend support for deep learning layers #8794
Merged
vpisarev merged 1 commit into opencv:master, Jun 21, 2017
Could you please rebase the commit from this branch instead: https://github.com/alalek/opencv/commits/halide_cmake
Usage (requires CMake 3.x):
Usage of
@alalek, thanks! I think I've done it correctly.
👍
DINGWeiDavid added a commit to DINGWeiDavid/opencv that referenced this pull request Jul 8, 2017
* Compile fix for circlesgrid in debug. * AVX and SSE optimizations for resize NN * photo(test): fix MergeRobertson test for AARCH64 build * Fixing buildbot's messages. * TBB: fix build on ARM * update convertFp16 using CV_CPU_CALL_FP16 * avoid link error (move the implementation of software version to header) * make getConvertFuncFp16 local (move from precomp.hpp to convert.hpp) * fix error on 32bit x86 * build: fix PCH stub files generation optimization * java: use module's public headers only * Modify the pyrlk.cl to support winSize from 8*8 to 24*24 for optical flow * cmake: add ENABLE_BUILD_HARDENING option * build: fix errors for MSVS2010-2013, reduce default softfloat scope * add tests for videostab; * build: fix "ambiguous call" (MSVS2010) * build: fix warning C4189: 'clImageUV' : local variable is initialized but not referenced * Merge pull request opencv#8816 from mshabunin:sprintf-fix Fixed snprintf for VS 2013 (opencv#8816) * Fixed snprintf for VS 2013 * snprintf: removed declaration from header, changed implementation * cv_snprintf corrected according to comments * update snprintf patch * photo: fix integer overflow There is no cast to wide integer type: std::numeric_limits<ST>::max() * std::numeric_limits<ST>::max() * Update doc build instructions for doxygen * 3rdparty: remove jinja2 source code It used for Matlab binding only * cmake: set minimal CPU instruction to SSE3 (x64) * update CPU detection on ANDROID patch * 3rdparty: cpufeatures workaround * suppress unreachable code warning - fix the define condition based on the comment * fixing models to resolve XML violation issue * added v_reduce_sum4() universal intrinsic; corrected number of threads in cv::getNumThreads() in the case of GCD * Updated alignment declarations to CV_DECL_ALIGNED macro * Updated fix for accumulate performance test in case of multiple iterations * calib3d: add CALIB_FIX_TANGENT_DIST flag * calib3d: use calibration flags from the new enums * flann: add normal assignment 
operator for cvflann::any * Added Python Docstrings * build: fix v_reduce_sum4 (requires SSE3) * photo: add assertion on empty image in denoising * removed MSVC warning suppression * licence updated * suppress warning on Jetson TK1 * suppress warning - check if compiler is Intel compiler - remove not referenced variables * float constant replaced by int hex representations * Refactor OpenCV Java Wrapping * video: add one more constructor for VideoWriter * Fixed gray window for gpu stereo BP and CSBP compute() for BP and CSBP output 32-bit floating-point mat, and in cv::imshow() 32-bit floating-point is recognized as [0,1] and maped to [0,255], that causes gray window for BP and CSBP. * Initial version of MediaSDK integration: - cmake dependencies search (WITH_MFX option) - raw H264, H265, MPEG2 encoding and decoding - tests for supported formats * build: added VERSIONINFO resource * build: update modules descriptions * Rewritten some tests in videoio and imgcodecs modules general: - all iterative tests have been replaced with parameterized tests - old-style try..catch tests have been modified to use EXPECT_/ASSERT_ gtest macros - added temporary files cleanup - modified MatComparator error message formatting imgcodecs: - test_grfmt.cpp split to test_jpg.cpp, test_png.cpp, test_tiff.cpp, etc. videoio: - added public HAVE_VIDEO_INPUT, HAVE_VIDEO_OUTPUT definitions to cvconfig.h - built-in MotionJPEG codec could not be tested on some platforms (read_write test was disabled if ffmpeg is off, encoding/decoding was handled by ffmpeg otherwise). 
- image-related tests moved to imgcodecs (Videoio_Image) - several property get/set tests have been combined into one - added MotionJPEG test video to opencv_extra * cmake: guard scanning of default MKL system-wide paths - WITH_MKL option is enabled - user doesn't specify MKLROOT/MKL_ROOT_DIR variables * core(test): added cv::sortIdx accuracy tests * core: fix IPP optimization for sortIdx * android: make optional "cpufeatures", build fixes for NDK r15 * remove ARC and auto synthesize assumptions in cocoa_window.mm * Merge pull request opencv#8869 from hrnr:akaze_part1 [GSOC] Speeding-up AKAZE, part #1 (opencv#8869) * ts: expand arguments before stringifications in CV_ENUM and CV_FLAGS added protective macros to always force macro expansion of arguments. This allows using CV_ENUM and CV_FLAGS with macro arguments. * feature2d: unify perf test use the same test for all detectors/descriptors we have. * added AKAZE tests * features2d: extend perf tests * add BRISK, KAZE, MSER * run all extract tests on AKAZE keypoints, so that the test si more comparable for the speed of extraction * feature2d: rework opencl perf tests use the same configuration as cpu tests * feature2d: fix descriptors allocation for AKAZE and KAZE fix crash when descriptors are UMat * feature2d: name enum to fix build with older gcc * Revert "ts: expand arguments before stringifications in CV_ENUM and CV_FLAGS" This reverts commit 19538ca. This wasn't a great idea after all. There is a lot of flags implemented as #define, that we don't want to expand. * feature2d: fix expansion problems with CV_ENUM in perf * expand arguments before passing them to CV_ENUM. This does not need modifications of CV_ENUM. 
* added include guards to `perf_feature2d.hpp` * feature2d: fix crash in AKAZE when using KAZE descriptors * out-of-bound access in Get_MSURF_Descriptor_64 * this happened reliably when running on provided keypoints (not computed by the same instance) * feature2d: added regression tests for AKAZE * test with both MLDB and KAZE keypoints * feature2d: do not compute keypoints orientation twice * always compute keypoints orientation, when computing keypoints * do not recompute keypoint orientation when computing descriptors this allows to test detection and extraction separately * features2d: fix crash in AKAZE * out-of-bound reads near the image edge * same as the bug in KAZE descriptors * feature2d: refactor invariance testing * split detectors and descriptors tests * rewrite to google test to simplify debugging * add tests for AKAZE and one test for ORB * stitching: add tests with AKAZE feature finder * added basic stitching cpu and ocl tests * fix bug in AKAZE wrapper for stitching pipeline causing lots of ! OPENCV warning: getUMat()/getMat() call chain possible problem. ! Base object is dead, while nested/derived object is still alive or processed. ! Please check lifetime of UMat/Mat objects! * cmake: add Halide support (opencv#8794) * .gitignore: added ".cache" directory back It is necessary for proper work of "git clean" command * cmake: additional messages on download errors * Catch SkipTestException in performance tests * More accurate condition to detect emulator Previous commit, 6f39f9a, tries to fix the color issue for emulator. But the condition for detecting emulator is incomplete, e.g. it stops working for emulators using Google Play, whose Build.BRAND=="google". https://stackoverflow.com/a/21505193 shows a more accurate condition for this. 
* Fix possible uninitialized memory in libtiff * 3rdparty: protobuf 3.1.0 sources without tests, testdata, .proto files * 3rdparty: update CMake scripts for protobuf * cmake: fix typo * fast_math.hpp: Use __asm__ rather than asm; fixes including with -std=c99 * videoio(macosx): fix array access exception in AVFoundation * Add a note to morphologyEx documentation to clarify the behavior when iterations > 1. * videoio: update VideoWriter apiPreference parameter position * videoio: drop changes from legacy C-API header * videoio: do not mix `CV_CAP` and `CAP_` APIs enum values * build: disable AVX512 Currently it is not supported. All builds are broken with enabled AVX512 option. * dnn: move module from opencv_contrib https://github.com/opencv/opencv_contrib/tree/e6f63c7a38ca40c5dc33e38736e3027e3528d6cb/modules/dnn * core: forbid handling of the case when src=dst in cv::repeat * dnn: fix public headers guards * dnn: move samples * dnn: remove unused README * dnn: fix documentation links * dnn: remove obsolete "build opencv_contrib with dnn module" tutorial * dnn: fix dnn python test files * Fixed clipLine evaluation for very long lines * 3rdparty: add ittnotify sources https://github.com/01org/IntelSEAPI/tree/master/ittnotify * viz: fix tests build * trace: initial support for code trace * dnn: fix build warnings * dnn: AVX2 fix invalid unaligned read * fixed typo * dnn: fix failed Torch tests "Torch invalid argument 2: position must be smaller than LLONG_MAX" These conditions are always true for "long position" argument. * Compiling the Java tutorials codes using Apache Ant. 
* build: fix viz tests removed test_precomp.cpp * build: eliminate warning * dnn: fix build - winpack - opencv_world * Fixing some static analysis issues * Fixed some bugs from Halide tests * flann: fix build with MSVC /sdl option * videoio: synchronize ffmpeg open() call * dispatch: added CV_TRY_${OPT} macro, fix dnn build - 1: OPT is available directly or via dispatcher - 0: optimization is not compiled at all * dnn: fix LayerFactory initialization * another round of dnn optimization (opencv#9011) * another round of dnn optimization: * increased malloc alignment across OpenCV from 16 to 64 bytes to make it AVX2 and even AVX-512 friendly * improved SIMD optimization of pooling layer, optimized average pooling * cleaned up convolution layer implementation * made activation layer "attacheable" to all other layers, including fully connected and addition layer. * fixed bug in the fusion algorithm: "LayerData::consumers" should not be cleared, because it desctibes the topology. * greatly optimized permutation layer, which improved SSD performance * parallelized element-wise binary/ternary/... 
ops (sum, prod, max) * also, added missing copyrights to many of the layer implementation files * temporarily disabled (again) the check for intermediate blobs consistency; fixed warnings from various builders * Removed usage of std::map in DetectionOutput layer * formating style and making changes accordingly to review * dnn: added trace macros * core: fix infinite recursion in compare * Fixed python sample for googlenet in dnn * Merge pull request opencv#8585 from tonyke1993:ap3p Enable p3p and ap3p in solvePnPRansac (opencv#8585) * add paper info * allow p3p and ap3p being RANSAC kernel * keep previous code * apply catrees comment * fix getMat * add comment * add solvep3p test * test return value * fix warnings * core: add an ability to use cxx11 lambda as a parallel_for_ body * core: add CV_CXX_11 flag to cvdef.h * Align convolutional layer weights separately from origin ones * Fixed several issues found by static analysis * build: remove #define to prevent unexpected impact on user applications * dnn: added "hidden" experimental namespace Main purpose of this namespace is to avoid using of incompatible binaries that will cause applications crashes. This additional namespace will not impact "Source code API". This change allows to maintain ABI checks (with easy filtering out). * dnn: cleanup torch integration code * gtk: check NULL before unref * calib3d(perf): disable SGBM tests in debug mode because they are too long (takes minutes) * dnn: fix compilation of Halide tests * Disabled logging in caffe parser in release * Fix WinRT build breaks in highgui and videoio. * Fixed some issues found by static analysis (4th round) * cmake: don't add vs_version.rc for static modules (ts) * hdr_parser: ignore lines with 'CV__' macros * binding: fix headers processing * a fix for open issue 4772 * Prevent crash when attempting to create training data without responses. This is at least useful when using an SVM one-class linear classifier, so there are valid use cases. 
* how_to_scan_images.markdown: fix grammer mistakes Improve the readability of the tutorial. * static build workaround * Removed extra dependencies from videoio library * canny: disallow broken inplace arguments * Fix style * Issues found by static analysis (5th round) * Fix error message fisheye CALIB_CHECK_COND The old error message was not giving any hint which input array (image) led to an ill conditioned matrix. This made it near impossible to identify poor images in a larger set. A better approach would be to implement a checker function which gives each image a rating before the real calibration is performed. This could also include some image properties like sharpness, etc. * Fix MKL linkage with enabled OpenMP * Add 64-bit imshow behavior in the documentation. * Fix ffmpeg detection with -D OPENCV_WARNINGS_ARE_ERRORS=ON option. * add java wrappers to dnn module * dnn: fix some tutorial links * dnn: fix links * Add a note about cxx11 range-based loop in Mat_ documentation * core: add a test of iteration through the Mat_ with range-based for * Fixed minor doc issue * cmake: update CXX11 compiler flag * version 3.3.0-rc * Fix wrong mat access. * Merge pull request opencv#9075 from TonyLianLong:master Remove unnecessary Non-ASCII characters from source code (opencv#9075) * Remove unnecessary Non-ASCII characters from source code Remove unnecessary Non-ASCII characters and replace them with ASCII characters * Remove dashes in the @param statement Remove dashes and place single space in the @param statement to keep coding style * misc: more fixes for non-ASCII symbols * misc: fix non-ASCII symbol in CMake file * cmake: fix linker flags * ocl: rework events handling with clSetEventCallback * ocl: async cl_buffer cleanup queue (for event callback)
This pullrequest changes
Added a search for the Halide library. It requires the flags `WITH_HALIDE` and `HALIDE_ROOT_DIR`.
If Halide is found, `HAVE_HALIDE` is defined as 1; otherwise it is defined as 0.
Because the macro is always defined, code should use `#if HAVE_HALIDE /*...*/ #endif` instead of `#ifdef HAVE_HALIDE /*...*/ #endif`.
An installation guide can be found at dnn/tutorials/tutorial_dnn_halide.markdown.
TODO: Change `HALIDE_ROOT_DIR` to something more suitable. It might be `HALIDE_BUILD_DIR`.
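To illustrate why the description asks for `#if` rather than `#ifdef`, here is a minimal sketch. It assumes the build defines `HAVE_HALIDE` as 0 when Halide is not found, and simulates that case with a local fallback `#define`. The helper names `halideEnabledByValue`, `halideEnabledByIfdef` and `checkHalideGuards` are hypothetical and are not part of OpenCV.

```cpp
#include <cassert>

// Assumption for this sketch: the build system always defines HAVE_HALIDE,
// as 1 when Halide is found and as 0 otherwise. If nothing defined it,
// simulate the "Halide not found" case here.
#ifndef HAVE_HALIDE
#define HAVE_HALIDE 0
#endif

// Hypothetical helper: tests the macro's value, as the description recommends.
constexpr bool halideEnabledByValue()
{
#if HAVE_HALIDE
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: tests only whether the macro is defined.
constexpr bool halideEnabledByIfdef()
{
#ifdef HAVE_HALIDE
    return true;   // taken even when HAVE_HALIDE is 0
#else
    return false;
#endif
}

// Hypothetical demo: the two guards agree only when HAVE_HALIDE is nonzero.
inline void checkHalideGuards()
{
    assert(halideEnabledByValue() == (HAVE_HALIDE != 0));
    assert(halideEnabledByIfdef());  // always true once the macro is defined
}
```

With `HAVE_HALIDE` defined as 0, the `#ifdef` guard would still compile the Halide-only branch, which is the failure mode the `#if` form avoids.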