What would you like to be added?
A very fast context compression process that only trims unwanted data from the chat, without using an AI/LLM.
Optionally, give the user a multiple-choice prompt to select what to remove.
For example:
Choose what to remove:
1. Tool calls + thinking.
2. Everything except the AI's last response.
3. Cancel
Basically, from my observation, the AI context consists of:
- user input
- thinking
- tool calls
- final reply to user
The final reply generally summarizes the results of the thinking and tool calls. For a less accurate summary, simply pruning those two is enough in some use cases. It helps speed things up when compressing manually.
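To illustrate the idea, here is a minimal sketch of what the two pruning options could look like. I don't know the tool's internal message format, so the `kind` field and its values (`user`, `thinking`, `tool_call`, `final_reply`) are assumptions that just mirror the list above, and `fast_compress` is a hypothetical name, not an existing function.

```python
from typing import Dict, List

# Hypothetical message shape: {"kind": "...", "text": "..."}.
# The real chat history format is likely different.
Message = Dict[str, str]


def prune_tools_and_thinking(history: List[Message]) -> List[Message]:
    """Option 1: drop thinking and tool calls, keep user input and replies."""
    return [m for m in history if m["kind"] not in ("thinking", "tool_call")]


def keep_last_reply(history: List[Message]) -> List[Message]:
    """Option 2: keep only the most recent final reply from the AI."""
    for message in reversed(history):
        if message["kind"] == "final_reply":
            return [message]
    return []  # no final reply in the history yet


def fast_compress(history: List[Message], choice: int) -> List[Message]:
    """Apply the selected option without calling any model."""
    if choice == 1:
        return prune_tools_and_thinking(history)
    if choice == 2:
        return keep_last_reply(history)
    return list(history)  # 3 (or anything else): cancel, history unchanged
```

Both options are a single pass over the history, so they should finish almost instantly no matter how slow the model is.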
Why is this needed?
The current compression is too slow because it uses AI to summarize the chat. Most of the time, I find I don't really need a high-quality summary to continue the work.
Sometimes I just need the conclusion, which resides in the final response in the chat, to start the next phase. In this case, I can just use option 2 suggested above to start a new round of work.
Other times I just want a little more context window for a little more work. Instead of waiting a few minutes for /compress to summarize, I can use option 1 to start right away.
Additional context
It is especially helpful for locally hosted model setups, as local models generally run slower than cloud services. A choice for the user to do a faster but less accurate summary would be very welcome.