Improve block weighting with uniform and hat functions #147

Merged
markus583 merged 1 commit into segment-any-text:main from lsorber:main
Jan 18, 2025

@lsorber
Contributor

@lsorber lsorber commented Jan 2, 2025

This PR makes the current uniform weighting scheme explicit, and adds an improved hat weighting scheme.

The rationale behind hat weighting is that predictions for tokens near the beginning or end of the block will be less accurate than predictions for tokens near the middle of the block, where the model has maximal context.

For instance, let's say we use stride=128 and block_size=256 and compare the predictions for the token with index 128:

  1. With uniform weighting, its prediction will be 0.5 * first_block[128] + 0.5 * second_block[0].
  2. With hat weighting, its prediction will (approximately) be 1 * first_block[128] + 1/256 * second_block[0].

In this example, hat weighting is preferable because the first token of the second block is likely to be much less accurate than the middle token of the first block.
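
To make the arithmetic above concrete, here is a minimal sketch in plain Python. It is not the code from this PR: `block_weights` and `combine` are hypothetical helpers, and the triangular formula is an assumption chosen so that the edge weight is 1/256 and the near-centre weight is 255/256 for block_size=256, matching the approximate figures quoted above.

```python
def block_weights(block_size: int, scheme: str = "uniform") -> list[float]:
    """Per-position weights for one block (hypothetical helper, not the wtpsplit API)."""
    if scheme == "uniform":
        return [1.0] * block_size
    if scheme == "hat":
        # Assumed triangular shape: largest near the block centre,
        # smallest (1 / block_size) at both edges.
        return [
            1.0 - abs(2 * i - (block_size - 1)) / block_size
            for i in range(block_size)
        ]
    raise ValueError(f"unknown scheme: {scheme}")


def combine(block_preds, starts, total_len, scheme="uniform"):
    """Weighted average of overlapping block predictions, one value per token."""
    num = [0.0] * total_len  # sum of weight * prediction
    den = [0.0] * total_len  # sum of weights
    for preds, start in zip(block_preds, starts):
        weights = block_weights(len(preds), scheme)
        for i, (p, w) in enumerate(zip(preds, weights)):
            num[start + i] += w * p
            den[start + i] += w
    # Tokens not covered by any block get 0.0 instead of dividing by zero.
    return [n / d if d else 0.0 for n, d in zip(num, den)]


# stride=128, block_size=256: two blocks overlapping on tokens 128..255.
# Toy predictions: the first block always says 1.0, the second always 0.0.
first_block = [1.0] * 256
second_block = [0.0] * 256
starts = [0, 128]

uniform = combine([first_block, second_block], starts, 384, "uniform")
hat = combine([first_block, second_block], starts, 384, "hat")

print(uniform[128])  # 0.5
print(hat[128])      # 0.99609375 (= 255/256)
```

With these toy inputs, the combined value at token 128 shows directly how much each block contributes: uniform weighting splits it evenly, while the assumed hat weighting lets the first block dominate.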

Anecdotally, I've also observed that hat weighting improves output quality on test data.

@markus583
Collaborator

Hi! Thanks a lot for implementing this. Interesting idea, cool stuff! It intuitively makes sense, but I'm unsure if it makes a practical difference. It would be interesting to test it on some benchmarks. For the time being, I'd be happy to add it as a feature and leave the default to uniform. Would you agree @bminixhofer?

@bminixhofer
Collaborator

LGTM!

I tried this idea a while ago (when I was working on the original WtP) and didn't see improvements on benchmarks, but maybe it helps on other model / benchmark combinations. I agree that intuitively it makes total sense.

So let's add it and leave the default to uniform as you suggested @markus583.

@lsorber
Contributor Author

lsorber commented Jan 13, 2025

Thanks for the reviews @markus583 @bminixhofer! For your information, I created an inference-only version of wtpsplit called wtpsplit-lite with minimal dependencies to make it easier to integrate SaT into projects that only need inference. Thanks for your work!

@markus583
Collaborator

Cool, thanks for letting us know. Thanks and keep up the good work! :)

@markus583 markus583 merged commit 5902e7e into segment-any-text:main Jan 18, 2025
3 participants