* remove duplicated newline (Tencent#4187)
* remove duplicated newline (Tencent#4188)
* optimize softmax arm neon (Tencent#4171)
* [docs] Fix typo (Tencent#4201)
* [Prelu x86] Finish intrinsic with elempack merged (Tencent#4177)
* changed size of images for pretty formatting of page (Tencent#4193)
* [Gelu x86] Finish intrinsic with elempack merged (fast version) (Tencent#4144)
* Finish the gelu x86 intrinsics
* Finish the fast tanh x86 simd impl
* Ignore .xmake directory (Tencent#4212)
* Bump pypa/cibuildwheel from 2.9.0 to 2.10.1 (Tencent#4207)
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.9.0 to 2.10.1.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](pypa/cibuildwheel@v2.9.0...v2.10.1)
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* style: space alignment (Tencent#4217)
* Ignore CMakeSettings.json, the Visual Studio CMake schema file (Tencent#4228)
* RVV: use new interface for segment load/store, change word_type to size_t, and add clang ci (part Tencent#4100) (Tencent#4118)
* RVV: use size_t for vl
* RVV: replace vsseg.v tuple type by using regex
-----
search:
vsseg([1-9])e(8|16|32)_v_(f|i|u)\2m(1|2|4|8)x\1\(([ -~]+), vcreate_\3\2m\4x\1\(([ -~]+)\), vl\);
substitute by:
vsseg$1e$2_v_$3$2m$4($5, $6, vl);
* RVV: replace vssseg.v tuple types by using regex
---
search:
vssseg([1-9])e(8|16|32)_v_f\2m1x\1\(([ -~]+), vcreate_f\2m1x\1\(([ -~]+)\), vl\);
substitute by:
vssseg$1e$2_v_f$2m1($3, $4, vl);
* RVV: replace vlseg.v tuple types in load/store
* RVV: replace vloxseg2ei32.v tuple types
* RVV: add a wrapper for old compilers
* RVV: add segment load/store wrapper in packing
* RVV: fix cmake test
* RVV: make clang happy by dropping VLAs in sgemm
* RVV: add clang cmake toolchain configure
* RVV: add clang ci, riscv64-unknown-linux-gnu
Co-authored-by: thelastlin <thelastlin@users.noreply.github.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
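The two regex rewrites listed in the RVV entry above can be replayed with a short script. This is a minimal sketch, not the tooling actually used for the change: the patterns are copied from the notes, the `$N` substitutions are translated to Python's `\g<N>` syntax, and the function name and the sample identifiers (`outptr`, `stride`, `_a`, `_b`) are hypothetical.

```python
import re

# Unit-stride segment store: pattern copied from the commit notes.
VSSEG_PATTERN = re.compile(
    r"vsseg([1-9])e(8|16|32)_v_(f|i|u)\2m(1|2|4|8)x\1"
    r"\(([ -~]+), vcreate_\3\2m\4x\1\(([ -~]+)\), vl\);"
)
# "$1".."$6" from the notes, written in Python's \g<N> form.
VSSEG_REPLACEMENT = r"vsseg\g<1>e\g<2>_v_\g<3>\g<2>m\g<4>(\g<5>, \g<6>, vl);"

# Strided segment store: pattern copied from the commit notes.
VSSSEG_PATTERN = re.compile(
    r"vssseg([1-9])e(8|16|32)_v_f\2m1x\1"
    r"\(([ -~]+), vcreate_f\2m1x\1\(([ -~]+)\), vl\);"
)
VSSSEG_REPLACEMENT = r"vssseg\g<1>e\g<2>_v_f\g<2>m1(\g<3>, \g<4>, vl);"


def rewrite_segment_stores(source: str) -> str:
    """Apply both substitutions to a piece of source text (hypothetical helper)."""
    source = VSSEG_PATTERN.sub(VSSEG_REPLACEMENT, source)
    return VSSSEG_PATTERN.sub(VSSSEG_REPLACEMENT, source)


if __name__ == "__main__":
    # Sample lines are made up for illustration only.
    print(rewrite_segment_stores(
        "vsseg2e32_v_f32m1x2(outptr, vcreate_f32m1x2(_a, _b), vl);"
    ))
    print(rewrite_segment_stores(
        "vssseg2e16_v_f16m1x2(outptr, stride, vcreate_f16m1x2(_a, _b), vl);"
    ))
```

The backreferences (`\1`, `\2`, ...) require the segment count and element width in the intrinsic name to agree with those in the `vcreate_*` tuple constructor, so calls with mismatched types are left untouched.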
* Bump pypa/cibuildwheel from 2.10.1 to 2.10.2 (Tencent#4220)
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.10.1 to 2.10.2.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](pypa/cibuildwheel@v2.10.1...v2.10.2)
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* add c906 build ci (Tencent#4232)
* Add benchmark result of T-Head TH1520 (Tencent#4240)
`cpuinfo`:
```
isa : rv64imafdcvsu
mmu : sv39
cpu-freq : 1.848Ghz
cpu-icache : 64KB
cpu-dcache : 64KB
cpu-l2cache : 1MB
cpu-tlb : 1024 4-ways
cpu-cacheline : 64Bytes
cpu-vector : 0.7.1
```
Compiled with `-DCMAKE_TOOLCHAIN_FILE=../toolchains/c910-v240.toolchain.cmake -DCMAKE_BUILD_TYPE=release -DNCNN_OPENMP=OFF -DNCNN_THREADS=OFF -DNCNN_RUNTIME_CPU=OFF -DNCNN_RVV=ON -DNCNN_SIMPLEOCV=ON -DNCNN_BUILD_EXAMPLES=ON`
Seems much worse than expected 🤔
* fix param parsing issue when layer/blob name exceeds 255 (Tencent#4236)
* fix param parsing issue when layer/blob name exceeds 255
* apply code-format changes
Co-authored-by: ZhangGe6 <ZhangGe6@users.noreply.github.com>
* Memory Pool Improvement For Variadic Sized Inputs (Tencent#4190)
* Simple miss count for better space efficiency
* Simple double ended greedy;
* Add size drop threshold setter;
* set workspace allocator cr to zero as we had some sort of recycling capability :P
Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
* docs: disable fp16 when wrong results encountered caused by overflow (Tencent#4248)
* pnnx math operation (Tencent#4251)
* stricter armv7 fp16 and armv84 bf16 compiler check, fix Tencent#4147 fix Tencent#4222 (Tencent#4247)
* modified the param axes of expanddims in modelwriter (Tencent#4259)
* Add TH1520 (4*C910V) toolchain support. (Tencent#4267)
* implement lstm proj_size (Tencent#4263)
* Optimize x86 DeformableConv2D (Tencent#4128)
* fix compile warning with gcc 9.1.0 including simplestl.h file (Tencent#4274)
* fix compile warning with gcc 9.1.0 including simplestl.h file
* apply code-format changes
Co-authored-by: veahow <veahow@users.noreply.github.com>
* add benchmark for rk3588 on rock5b (Tencent#4275)
* linux-x64-cpu-gcc on tencent ci
* implement layer feature disabled bit (Tencent#4278)
* add elu vulkan operator (Tencent#4280)
* fix tencent ci (Tencent#4277)
* implement GLU and pnnx conversion (Tencent#4283)
* Bump pypa/cibuildwheel from 2.10.2 to 2.11.1 (Tencent#4271)
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.10.2 to 2.11.1.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](pypa/cibuildwheel@v2.10.2...v2.11.1)
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* fix pnnx softmax/normalize/slice negative axis conversion to ncnn (Tencent#4284)
* pnnx glu batchindex aware conversion (Tencent#4285)
* Fix typo in readme (Tencent#4287)
* x86 sse2/avx2 optimization for convolution sgemm/winograd int8 family (Tencent#4286)
* pnnx skip dynamic size evaluation (Tencent#4291)
* Fix linux build error (Tencent#4265) (Tencent#4294)
Co-authored-by: wangyu <786794414@qq.com>
* general cpu feature detection on macos/ios, enable bf16 and i8mm on a15 a16 and m2 (Tencent#4300)
* x86 unified fc fp32/fp16s (Tencent#4303)
* more fma
* more transpose utility function
* Bump pypa/cibuildwheel from 2.11.1 to 2.11.2 (Tencent#4308)
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.11.1 to 2.11.2.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](pypa/cibuildwheel@v2.11.1...v2.11.2)
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* pnnx pytorch 1.13 (Tencent#4314)
* fix Tencent#4315 (Tencent#4316)
* get_physical_cpu_count api family (Tencent#4302)
* get_physical_cpu_count api family
* set default to physical big cpu
* always treat smt core as big core
* is_smt_cpu
* get max freq mhz on windows
* windows thread affinity
* groupnorm 1d/2d/4d (Tencent#4312)
* fix slice end index, fix fp16 model weight alignment (Tencent#4317)
* tencent ci test-coverage pnnx (Tencent#4305)
* RVV: BatchNorm with fp16s(a) support (Tencent#4075)
* RVV: InstanceNorm with fp16s(a) support (Tencent#4078)
* fix ci pnnx build
* fold new_full and full_like (Tencent#4323)
* pnnx convert nn.Softmax2d (Tencent#4324)
* pnnx convert fold unfold (Tencent#4325)
* support yolov5 6.2 (Tencent#4328)
* implement ncnn fold and unfold (Tencent#4326)
* pnnx load gpu torchscript and reset device (Tencent#4330)
* fix:pnnx-softmax (Tencent#4333)
* pnnx save onnx zero (Tencent#4077)
* save foldable constants in file for reducing memory usage (Tencent#4337)
* match inplace slice copy pattern, rewrite copy uses (Tencent#4338)
* add vector optimization for loongarch64 (Tencent#4242)
* ci loongarch64 lsx (Tencent#4344)
* gridsample op support (Tencent#4288)
Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
* squeeze and expanddims 4d (Tencent#4346)
* implement MultiheadAttention kdim vdim (Tencent#4347)
* pnnx convert torch bitwise left_shift right_shift (Tencent#4349)
* pnnx fp16 option for ncnn and onnx weight type (Tencent#4350)
* pnnx fuse more function to module (Tencent#4351)
* pnnx fuse more function to module
* rename some pass name
* fuse adjacent reshape, fuse pad conv2d
* fuse pad conv1d
* split tests (Tencent#4354)
* Support mat.numpy() in Python (Tencent#4356)
* Fix typo in stb_image.h (Tencent#4358)
exitting -> exiting
* Fix windows-arm64 build for non-neon case (Tencent#4227)
* update release ci (Tencent#4359)
* update release ci
* find modern glslang
* parallel jobs on windows
* Fix c api allocator (Tencent#4360)
* add some c_api interfaces related to allocator setup.
* fix errors in allocator parameters in c_api.
* test c api allocator
Co-authored-by: zhangtongshe <yuyuyezi@vip.qq.com>
* update glslang (Tencent#4361)
* disable out-of-line atomics since ndk23+ for resolving linking issue with old ndk (Tencent#4362)
* I added one more project to the list of examples. (Tencent#4205)
* Dedicated to coloring black and white photographs.
* add example project link (Tencent#4365)
* fix(pybind11): build error (Tencent#4368)
* fix openmp affinity abort when cpu goes offline (Tencent#4370)
* Update release-python.yml
* small fixes
* unpack list input
* Remove LSTM2
* fix LSTM
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Menci <huanghaorui301@gmail.com>
Co-authored-by: luqiang guo <702572275@qq.com>
Co-authored-by: Lry89757 <77330637+LRY89757@users.noreply.github.com>
Co-authored-by: magicse <magicse@users.noreply.github.com>
Co-authored-by: Zhuo Zhang <imzhuo@foxmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: 汤圆奶昔 <47135403+tonori@users.noreply.github.com>
Co-authored-by: Xavier Hsinyuan <me@lstlx.com>
Co-authored-by: thelastlin <thelastlin@users.noreply.github.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
Co-authored-by: 柚木鉉 <740291272@qq.com>
Co-authored-by: Zhang Ge <sjtu.zg123@gmail.com>
Co-authored-by: ZhangGe6 <ZhangGe6@users.noreply.github.com>
Co-authored-by: LinHe <LinHe.Lurking@gmail.com>
Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
Co-authored-by: MisakaBit <MisakaBit@gmail.com>
Co-authored-by: LiuYi-Up <73060646+LiuYi-Up@users.noreply.github.com>
Co-authored-by: 陸 言 <robinluaa@outlook.com>
Co-authored-by: miemie2013 <53960695+miemie2013@users.noreply.github.com>
Co-authored-by: Eahow Chen <15228088+veahow@users.noreply.github.com>
Co-authored-by: veahow <veahow@users.noreply.github.com>
Co-authored-by: li mengyang <hwdefcom@outlook.com>
Co-authored-by: Yoh <wpz_yoh@163.com>
Co-authored-by: Caize Wu <zepanwucai@gmail.com>
Co-authored-by: bestpower <wangyu117136@gmail.com>
Co-authored-by: wangyu <786794414@qq.com>
Co-authored-by: shaoshengsong <30892500+shaoshengsong@users.noreply.github.com>
Co-authored-by: WuJinxuan <2456510228@qq.com>
Co-authored-by: junchao-loongson <68935141+junchao-loongson@users.noreply.github.com>
Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com>
Co-authored-by: Ikko Ashimine <eltociear@gmail.com>
Co-authored-by: zhangtongshe <yuyuyezi@vip.qq.com>
Co-authored-by: tpoisonooo <khj.application@aliyun.com>