HI-APP Implementation

Abstract

Vision–language models (VLMs) have recently delivered state-of-the-art multi-modal accuracy, yet deploying them on FPGA/ASIC accelerators remains costly: once general matrix multiply (GEMM) is quantized to low-bit integer math, nonlinear functions (NLFs) such as GELU and LayerNorm dominate resources and power. Prior NLF approximations either use high-precision piecewise-linear (PWL) models that still demand sizable logic/DSP budgets, or low-precision integer surrogates that require fine-tuning to recover accuracy. In this paper, we present hardware-efficient, training-free approximations for two representative NLFs, namely GELU and LayerNorm. First, for GELU, we propose a power-of-two (PoT) PWL scheme: we analytically study the LUT-entry/accuracy trade-off under input clipping, introduce automatic clipping-point selection to meet a target error, and quantize segment slopes to PoT values so that multipliers are replaced with shifts. Second, for LayerNorm, we eliminate floating-point operations in quantized pipelines via a PoT-based mean estimator and a log-based shift-LUT approximation of the reciprocal square root for variance normalization. Both designs map to a common shift-add datapath and co-optimize naturally with quantized GEMM. On quantized CLIP-ViT models, our approach is plug-and-play (no additional training) and incurs at most a 0.93% Top-1 drop on ImageNet. A prototype on a Xilinx FPGA reduces DSP usage by up to 100%, LUTs by 69.8%, and FFs by 96.0%. These results indicate that simple, PoT-driven approximations can cap NLF overheads and enable practical, resource-aware VLM acceleration on reconfigurable and custom silicon.
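To make the PoT PWL idea for GELU concrete, here is a minimal floating-point sketch. It is not the repository's implementation: the function names, the uniform segmentation, the fixed clipping point of 4.0, the log-domain rounding of slopes, and the midpoint-matched intercepts are all assumptions chosen for illustration. The automatic clipping-point selection mentioned in the abstract is not reproduced here.

```python
import math

def gelu(x):
    """Exact GELU, used here only as the reference to approximate."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def nearest_pot(slope):
    """Round a slope to the nearest signed power of two (in the log domain)."""
    if slope == 0.0:
        return 0.0
    exponent = round(math.log2(abs(slope)))
    return math.copysign(2.0 ** exponent, slope)

def build_table(clip=4.0, segments=16):
    """Build (pot_slope, intercept) pairs over [-clip, clip] (assumed layout)."""
    width = 2.0 * clip / segments
    table = []
    for i in range(segments):
        x0 = -clip + i * width
        x1 = x0 + width
        secant = (gelu(x1) - gelu(x0)) / width
        pot = nearest_pot(secant)
        # Assumption: pick the intercept so the line matches GELU at the midpoint.
        xm = 0.5 * (x0 + x1)
        table.append((pot, gelu(xm) - pot * xm))
    return table, width

def gelu_pot_pwl(x, table, width, clip):
    """Evaluate the PWL approximation; outside the clip range GELU is ~0 or ~x."""
    if x <= -clip:
        return 0.0
    if x >= clip:
        return x
    index = min(int((x + clip) / width), len(table) - 1)
    pot, intercept = table[index]
    # In hardware, pot * x would be a bit shift because pot is a power of two.
    return pot * x + intercept
```

Calling `build_table()` once and then `gelu_pot_pwl(x, table, width, 4.0)` per activation mimics a LUT lookup followed by a shift and an add. With these assumed settings the approximation is coarse; more segments or a different clipping point would trade LUT entries for accuracy.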

Model List

ViT-B/32   
ViT-L/14

Baseline Inference

python clip_test.py --model {model_name} --train_set {ImageNet Trainset Directory} --test_set {ImageNet Testset Directory}

Quantized Model Inference

python clip_test.py --model {model_name} --train_set {ImageNet Trainset Directory} --test_set {ImageNet Testset Directory} --quant --calib

Approximated Model Inference

python clip_test.py --model {model_name} --train_set {ImageNet Trainset Directory} --test_set {ImageNet Testset Directory} --quant --calib --int_norm --int_act --int_softmax
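The abstract describes the LayerNorm path as using a log-based shift-LUT approximation of the reciprocal square root. The sketch below illustrates that general idea on integer inputs; it is not taken from clip_test.py, and the names `FRAC_BITS`, `LUT_BITS`, `LUT`, and `rsqrt_shift_lut`, as well as the table layout, are hypothetical. It writes the variance as v = 2^e * (1 + m), looks up 1/sqrt(2^p * (1 + m)) from a small table indexed by the parity p of e and the top mantissa bits, and applies the remaining factor 2^-(e // 2) as a right shift.

```python
import math

FRAC_BITS = 16  # assumed fixed-point fraction bits of the result
LUT_BITS = 4    # assumed number of mantissa bits used to index the LUT

# LUT[p][k] ~ 2^FRAC_BITS / sqrt(2^p * (1 + (k + 0.5) / 2^LUT_BITS)),
# where p is the parity of the exponent and k the top mantissa bits.
LUT = [
    [
        round((1 << FRAC_BITS) / math.sqrt((1 << p) * (1 + (k + 0.5) / (1 << LUT_BITS))))
        for k in range(1 << LUT_BITS)
    ]
    for p in (0, 1)
]

def rsqrt_shift_lut(v: int) -> int:
    """Approximate 2^FRAC_BITS / sqrt(v) for an integer v >= 1 with a LUT and shifts."""
    if v < 1:
        raise ValueError("v must be a positive integer")
    e = v.bit_length() - 1  # position of the leading one, i.e. floor(log2(v))
    mask = (1 << LUT_BITS) - 1
    if e >= LUT_BITS:
        k = (v >> (e - LUT_BITS)) & mask  # top mantissa bits below the leading one
    else:
        k = (v << (LUT_BITS - e)) & mask  # pad short mantissas with zeros
    # Odd exponents are absorbed by the LUT row; the even part becomes a shift.
    return LUT[e & 1][k] >> (e >> 1)
```

Dividing the returned integer by `2 ** FRAC_BITS` gives the approximate value of 1/sqrt(v). A wider `LUT_BITS` reduces the mantissa quantization error at the cost of more table entries.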

References

This project is based on the following repository:
- https://github.com/openai/CLIP
  (Used as the base implementation; the model weights were also downloaded from this repository.)
