
per-layer feature mask #4273

@nihui

Description


Create a new param entry with id 31 (a uint value).
Use its bits for per-layer feature masking of options such as:

bool use_fp16_packed;
bool use_fp16_storage;
bool use_fp16_arithmetic;
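A minimal sketch of how such a mask could override these options per layer, assuming the bit assignments proposed in the table further down (bit 0 = no fp16 arithmetic, bit 1 = no fp16 storage); the helper name `apply_feature_mask` is hypothetical, not ncnn's actual implementation:

```cpp
#include <cstdint>

// Subset of the layer options affected by the feature mask
// (field names taken from ncnn's Option).
struct Option
{
    bool use_fp16_packed = true;
    bool use_fp16_storage = true;
    bool use_fp16_arithmetic = true;
};

// Derive a per-layer Option from the net-wide one,
// clearing the features whose mask bits are set.
Option apply_feature_mask(Option opt, uint32_t mask)
{
    if (mask & (1u << 0)) // no fp16 arithmetic
    {
        opt.use_fp16_arithmetic = false;
    }
    if (mask & (1u << 1)) // no fp16 storage
    {
        opt.use_fp16_storage = false;
        opt.use_fp16_packed = false;
    }
    return opt;
}
```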

Sample use case

7767517
6 6
Input            data         0 1 data 0=224 1=224 2=3
Convolution      conv1_1      1 1 data conv1_1 0=64 1=3 4=1 5=1 6=1728 9=1
Convolution      conv1_2      1 1 conv1_1 conv1_2 0=64 1=3 4=1 5=1 6=36864 9=1
Pooling          pool1        1 1 conv1_2 pool1 1=2 2=2
Convolution      conv2_1      1 1 pool1 conv2_1 0=128 1=3 4=1 5=1 6=73728 9=1
Convolution      conv2_2      1 1 conv2_1 output 0=128 1=3 4=1 5=1 6=147456 9=1

Typically, we use fp16 computation to improve inference speed.
Because the weight values of conv2_1 are large, fp16 accumulation may cause numerical overflow, so fp16 needs to be disabled for conv2_1 individually, while the other layers continue to run in fp16 mode.

Add 31=1, i.e. (1<<0), as the disable bit to turn off fp16 for that layer:

Convolution      conv2_1      1 1 pool1 conv2_1 0=128 1=3 4=1 5=1 6=73728 9=1 31=1
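To illustrate how the entry would be picked up from a param line, here is a standalone sketch (a hypothetical `parse_feature_mask` helper, not ncnn's actual param loader) that scans a layer's `key=value` tokens for id 31:

```cpp
#include <cstdint>
#include <cstdlib>
#include <sstream>
#include <string>

// Extract the value of param id 31 (the feature mask) from a layer's
// whitespace-separated "key=value" tokens; returns 0 when absent.
uint32_t parse_feature_mask(const std::string& layer_line)
{
    std::istringstream iss(layer_line);
    std::string token;
    while (iss >> token)
    {
        // match the "31=" prefix only
        if (token.rfind("31=", 0) == 0)
            return (uint32_t)strtoul(token.c_str() + 3, nullptr, 10);
    }
    return 0;
}
```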

It would also be possible to control num_threads for each layer individually, but that is not very useful, so no precious bits are spent on it.

| mask | bit | rationale |
| --- | --- | --- |
| no fp16 arithmetic | 1<<0 | precision concern |
| no fp16 storage | 1<<1 | precision concern |
| no bf16 storage | 1<<2 | precision concern |
| no int8 | 1<<3 | debug dynamically quantized model |
| no vulkan | 1<<4 | reduce overhead of cpu op - gpu split - cpu op |
| no sgemm | 1<<5 | reduce some memory |
| no winograd | 1<<6 | reduce some memory |
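The table above can be captured as bit constants; the enumerator names below are illustrative, not necessarily the identifiers the implementation will use. Since the bits combine with bitwise OR, disabling fp16 arithmetic, fp16 storage, and winograd together would be `31=67` (1 + 2 + 64):

```cpp
#include <cstdint>

// Proposed feature-mask bits for param id 31, mirroring the table above.
enum FeatMask : uint32_t
{
    FEAT_MASK_NO_FP16_ARITHMETIC = 1u << 0, // precision concern
    FEAT_MASK_NO_FP16_STORAGE    = 1u << 1, // precision concern
    FEAT_MASK_NO_BF16_STORAGE    = 1u << 2, // precision concern
    FEAT_MASK_NO_INT8            = 1u << 3, // debug dynamically quantized model
    FEAT_MASK_NO_VULKAN          = 1u << 4, // avoid cpu op - gpu split - cpu op overhead
    FEAT_MASK_NO_SGEMM           = 1u << 5, // reduce some memory
    FEAT_MASK_NO_WINOGRAD        = 1u << 6, // reduce some memory
};

// True when the given feature is disabled by the layer's mask.
bool feature_disabled(uint32_t mask, FeatMask bit)
{
    return (mask & bit) != 0;
}
```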

These masks will be implemented first; more bits can be assigned to cover other needs in the future.
