
[Feature] Add Model Hooks for Accessing and Customizing Model Activations #3266

@shuyhere

Motivation

Description

It would be beneficial to introduce model hooks that allow users to access and modify model activations. This feature would enable greater flexibility for tasks such as visualization, debugging, and custom processing of intermediate representations.
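The sketch below is a minimal, hypothetical illustration of what such a hook API could look like. It assumes the model is a sequence of named layers. `HookedModel`, `register_forward_hook` and the hook signature are illustrative names, not an existing API of this project. The convention that a hook returns `None` to only observe, or a new value to replace the layer output, is borrowed from PyTorch's `torch.nn.Module.register_forward_hook`. Plain Python lists stand in for tensors.

```python
from typing import Callable, Dict, List, Optional

Activation = List[float]
# Hypothetical hook signature: receives (layer_name, output) and returns None
# to leave the output unchanged, or a new activation to replace it.
Hook = Callable[[str, Activation], Optional[Activation]]


class HookedModel:
    """Toy stack of named layers with forward hooks (hypothetical API)."""

    def __init__(self, layers: Dict[str, Callable[[Activation], Activation]]):
        self.layers = layers  # dicts preserve insertion order (Python 3.7+)
        self._hooks: Dict[str, List[Hook]] = {}

    def register_forward_hook(self, layer_name: str, hook: Hook) -> Callable[[], None]:
        if layer_name not in self.layers:
            raise KeyError(layer_name)
        self._hooks.setdefault(layer_name, []).append(hook)

        def remove() -> None:
            self._hooks[layer_name].remove(hook)

        return remove  # call this to detach the hook

    def forward(self, x: Activation) -> Activation:
        for name, layer in self.layers.items():
            x = layer(x)
            # Layers with no registered hooks cost only a dict lookup here.
            for hook in self._hooks.get(name, ()):
                replacement = hook(name, x)
                if replacement is not None:
                    x = replacement
        return x


model = HookedModel({
    "layer0": lambda x: [2.0 * v for v in x],
    "layer1": lambda x: [v + 1.0 for v in x],
})

cache: Dict[str, Activation] = {}


def cache_hook(name: str, out: Activation) -> None:
    cache[name] = list(out)  # copy so a later in-place edit can't alter the cache
    return None


remove_cache = model.register_forward_hook("layer0", cache_hook)
print(model.forward([1.0, 2.0]))  # [3.0, 5.0]
print(cache["layer0"])            # [2.0, 4.0]

# Zero-ablate layer0's output, so layer1 sees [0.0, 0.0].
remove_ablate = model.register_forward_hook("layer0", lambda name, out: [0.0] * len(out))
print(model.forward([1.0, 2.0]))  # [1.0, 1.0]

remove_ablate()
remove_cache()
print(model.forward([1.0, 2.0]))  # [3.0, 5.0]
```

Returning a removal callable lets callers detach hooks once they are done. With this design, the extra cost is limited to runs that actually register hooks.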

Use case

  • Extract intermediate outputs for interpretability analysis, such as LogitLens-style investigations.
  • Expose internal activations so that users can cache them and implement functions that edit, remove, or replace them dynamically during inference (for example, for representation engineering).
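Both use cases can be sketched on a toy residual stream. This assumes hidden states are cached after every layer and that the final unembedding matrix is available. `run_residual_stream`, `logit_lens` and the `edit` callback are hypothetical names used only for illustration. A real logit lens usually applies the model's final normalization before unembedding, and this sketch omits that step. The steering step adds a scaled direction to one layer's hidden state, which is a simple form of the activation editing used in representation-engineering work.

```python
from typing import Callable, List, Optional

Vector = List[float]


def matvec(matrix: List[Vector], v: Vector) -> Vector:
    """Multiply a (rows x d) matrix by a length-d vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in matrix]


def run_residual_stream(
    x: Vector,
    layer_updates: List[Vector],
    edit: Optional[Callable[[int, Vector], Vector]] = None,
) -> List[Vector]:
    """Toy residual stream in which each layer adds a fixed update vector.

    Returns the hidden state after every layer (the activation cache).
    `edit` (hypothetical hook) may rewrite the hidden state after each layer.
    """
    hidden = list(x)
    cache: List[Vector] = []
    for i, update in enumerate(layer_updates):
        hidden = [h + u for h, u in zip(hidden, update)]
        if edit is not None:
            hidden = edit(i, hidden)
        cache.append(list(hidden))
    return cache


def logit_lens(cache: List[Vector], unembed: List[Vector], vocab: List[str]) -> List[str]:
    """Decode each cached hidden state with the final unembedding (no final norm)."""
    tokens = []
    for hidden in cache:
        logits = matvec(unembed, hidden)
        tokens.append(vocab[max(range(len(logits)), key=logits.__getitem__)])
    return tokens


vocab = ["cat", "dog"]
unembed = [[1.0, 0.0], [0.0, 1.0]]  # vocab_size x hidden_size
updates = [[1.0, 0.0], [0.0, 2.0]]  # two toy "layers"

cache = run_residual_stream([0.0, 0.0], updates)
print(logit_lens(cache, unembed, vocab))  # ['cat', 'dog']

# Steering: add a scaled "cat" direction to the hidden state after layer 1.
cat_direction = [1.0, 0.0]


def steer(layer: int, hidden: Vector) -> Vector:
    if layer != 1:
        return hidden
    return [h + 3.0 * d for h, d in zip(hidden, cat_direction)]


steered = run_residual_stream([0.0, 0.0], updates, edit=steer)
print(logit_lens(steered, unembed, vocab))  # ['cat', 'cat']
```

Decoding every cached layer shows where a prediction emerges. The `edit` callback shows how a hook could change the model's output at inference time without touching its weights.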

While hooks may introduce some performance overhead, they would support interpretability research and enable efficient model editing.

Related resources

model hook resources

related issues and use case
