KnowBias is a lightweight and conceptually distinct framework that mitigates bias by strengthening, rather than suppressing, the neurons that encode "knowing bias".
run_.py calculates the attribution score of each neuron in each layer of the given LLM and saves the scores to a .npy file. The input data are in 📂 data/, which contains the constructed simple, abstract-level questions.

analysis_res.py reads the .npy file produced in the previous step and outputs the neuron information as (layer, index) pairs; for example, (1,2) means the neuron at index 2 in layer 1.
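The (layer, index) notation can be illustrated with a minimal sketch. It does not reproduce analysis_res.py: the array layout (one row per layer, one column per neuron), the top-k selection rule, the helper name top_neurons, and the file name toy_scores.npy are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np


def top_neurons(scores: np.ndarray, k: int) -> list[tuple[int, int]]:
    """Return the k highest-scoring neurons as (layer, index) pairs.

    Assumes scores has shape (num_layers, num_neurons); this layout is
    hypothetical and may differ from the file written by run_.py.
    """
    # Sort the flattened scores in descending order and keep the first k.
    flat_order = np.argsort(scores, axis=None)[::-1][:k]
    # Map flat positions back to (layer, index) coordinates.
    layers, indices = np.unravel_index(flat_order, scores.shape)
    return [(int(l), int(i)) for l, i in zip(layers, indices)]


# Toy stand-in for a saved attribution file: 3 layers, 4 neurons each.
scores = np.zeros((3, 4))
scores[1, 2] = 0.9  # neuron at index 2 in layer 1
scores[2, 0] = 0.5  # neuron at index 0 in layer 2

path = os.path.join(tempfile.mkdtemp(), "toy_scores.npy")
np.save(path, scores)

selected = top_neurons(np.load(path), k=2)
print(selected)  # [(1, 2), (2, 0)]
```

Here (1, 2) is read exactly as in the description above: layer 1, neuron index 2.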
Run evaluation_bias_all.py to evaluate the debiasing performance and general capabilities of the LLM when these know-bias neurons are enhanced by λ.
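How λ is applied inside evaluation_bias_all.py is not stated here. As a rough sketch only, the example below assumes that enhancement means multiplying the activations of the selected neurons by λ; the function name enhance, the array layout, and the multiplicative rule are hypothetical and are not taken from the repository.

```python
import numpy as np


def enhance(activations: np.ndarray,
            neurons: list[tuple[int, int]],
            lam: float) -> np.ndarray:
    """Return a copy of activations with the selected neurons scaled by lam.

    Assumes activations has shape (num_layers, num_neurons) and that
    enhancement is multiplicative; both are illustrative assumptions.
    """
    out = activations.copy()  # leave the input untouched
    for layer, index in neurons:
        out[layer, index] *= lam
    return out


# Toy activations: 3 layers, 4 neurons each, all equal to 1.
acts = np.ones((3, 4))
boosted = enhance(acts, [(1, 2), (2, 0)], lam=2.0)
print(boosted[1, 2], boosted[0, 0])  # 2.0 1.0
```

Under this assumption, λ > 1 strengthens the selected neurons, λ = 1 leaves the model unchanged, and all other neurons keep their original activations.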