per-layer feature mask #4273
Create a new param entry with id 31 (uint).
Use its bits for per-layer feature masking.
```cpp
bool use_fp16_packed;
bool use_fp16_storage;
bool use_fp16_arithmetic;
```

Sample use case
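For context, these flags already exist globally on ncnn::Option. A minimal sketch of enabling them for a whole network is shown below; the per-layer mask proposed here (param id 31) would then override them for individual layers. The file names are placeholders.

```cpp
#include "net.h"

int main()
{
    ncnn::Net net;

    // global fp16 options from ncnn::Option; the proposed per-layer
    // feature mask (param id 31) would override these for single layers
    net.opt.use_fp16_packed = true;
    net.opt.use_fp16_storage = true;
    net.opt.use_fp16_arithmetic = true;

    // placeholder file names for illustration
    net.load_param("model.param");
    net.load_model("model.bin");

    return 0;
}
```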
```
7767517
6 6
Input data 0 1 data 0=224 1=224 2=3
Convolution conv1_1 1 1 data conv1_1 0=64 1=3 4=1 5=1 6=1728 9=1
Convolution conv1_2 1 1 conv1_1 conv1_2 0=64 1=3 4=1 5=1 6=36864 9=1
Pooling pool1 1 1 conv1_2 pool1 1=2 2=2
Convolution conv2_1 1 1 pool1 conv2_1 0=128 1=3 4=1 5=1 6=73728 9=1
Convolution conv2_2 1 1 conv2_1 output 0=128 1=3 4=1 5=1 6=147456 9=1
```

Typically, we use fp16 computation to improve inference speed.
Because the weight values of conv2_1 are large, fp16 accumulation may cause numerical overflow, so fp16 needs to be disabled for conv2_1 individually while the other layers keep using fp16 mode.

Add 31=1, i.e. (1<<0), as the disable bit to turn off fp16 for that layer:

```
Convolution conv2_1 1 1 pool1 conv2_1 0=128 1=3 4=1 5=1 6=73728 9=1 31=1
```

It is also possible to control num_threads for each layer individually, but it is not very useful, so no more precious bits are spent on it.
| mask | bit | rationale |
|---|---|---|
| no fp16 arithmetic | 1<<0 | precision concern |
| no fp16 storage | 1<<1 | precision concern |
| no bf16 storage | 1<<2 | precision concern |
| no int8 | 1<<3 | debug dynamic quantized model |
| no vulkan | 1<<4 | reduce overhead for cpu op - gpu split - cpu op |
| no sgemm | 1<<5 | reduce some memory |
| no winograd | 1<<6 | reduce some memory |
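To combine several masks for one layer, OR the bit values from the table together and write the result as the value of param id 31. A minimal sketch assuming the bit layout above; the enum names are illustrative, not an existing ncnn API:

```cpp
// illustrative names for the proposed mask bits, matching the table above
enum FeatMaskBit
{
    FM_NO_FP16_ARITHMETIC = 1 << 0,
    FM_NO_FP16_STORAGE    = 1 << 1,
    FM_NO_BF16_STORAGE    = 1 << 2,
    FM_NO_INT8            = 1 << 3,
    FM_NO_VULKAN          = 1 << 4,
    FM_NO_SGEMM           = 1 << 5,
    FM_NO_WINOGRAD        = 1 << 6,
};

// e.g. disable fp16 arithmetic and winograd for one layer
int featmask = FM_NO_FP16_ARITHMETIC | FM_NO_WINOGRAD; // = 65, written as 31=65
```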
These masks will be implemented, and more bits can be added for other needs in the future.