Inspiration
The open web is dying (or rather, being killed): useful information from online communities is being hidden in silos. As a long-time contributor to open online communities, I want to make them sustainable again. To do that, I want to automatically take useful information, usually question-and-answer exchanges, from closed platforms like Discord and publish it on the open web (with the original posters' approval, of course).
What it does
To make this happen, we need a way to do complicated NLP work: finding conversation threads in unstructured chats, extracting metadata about the exchanges (who started the thread, what it is about, whether it was answered successfully), and converting those exchanges into static web pages. Claude 2 is great for that with its huge context window of 100k tokens. After generating the content, we can serve it using Hugo.
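The last step, turning an extracted exchange into a page Hugo can serve, could look roughly like the sketch below. It is only an illustration: the `thread` dictionary layout and the field names (`title`, `started_by`, `answered`, `messages`) are assumptions made for this example, not the project's actual schema. It relies on Hugo accepting JSON front matter at the top of a content file.

```python
import json


def render_hugo_page(thread: dict) -> str:
    """Render one extracted Q&A thread as a Hugo content file.

    The keys of `thread` are hypothetical and used for illustration only.
    """
    # Hugo accepts front matter written as a JSON object at the top of the file.
    front_matter = {
        "title": thread["title"],
        "author": thread["started_by"],
        "answered": thread["answered"],
    }
    # One Markdown paragraph per chat message, with the author name in bold.
    paragraphs = [
        f"**{msg['author']}:** {msg['text']}" for msg in thread["messages"]
    ]
    body = "\n\n".join(paragraphs)
    return json.dumps(front_matter, indent=2) + "\n\n" + body + "\n"


example_thread = {
    "title": "How do I mount a volume in Docker?",
    "started_by": "alice",
    "answered": True,
    "messages": [
        {"author": "alice", "text": "How do I mount a volume in Docker?"},
        {"author": "bob", "text": "Use the -v flag with docker run."},
    ],
}
page = render_hugo_page(example_thread)
```

The resulting string can be written to a file under Hugo's `content/` directory, one file per thread.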
How we built it
A Discord chat downloader running in Docker, testing prompts in the web app, asking Claude 2 for jq programs, and tuning prompts for Markdown text formatting. Several tools and a lot of testing in JupyterLab.
Challenges we ran into
I wanted an analog of OpenAI function calls in Claude 2. The only way I found to do that was the langchain-experimental package, and it didn't work. I had to find workarounds by parsing the model's completion output in Python.
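A parsing workaround of this kind might look like the sketch below. It is not the project's actual code: it assumes the prompt asks the model to answer with a single JSON object, and the helper name `extract_json` is made up for this example. The idea is to tolerate chatty text or a code fence around the JSON and pull out only the object.

```python
import json
import re

# Built this way to avoid writing a literal code fence inside the example.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL
)


def extract_json(completion: str) -> dict:
    """Pull the first JSON object out of a model completion (hypothetical helper)."""
    # Prefer the contents of a fenced block if the model produced one.
    fenced = FENCED_BLOCK.search(completion)
    candidate = fenced.group(1) if fenced else completion
    start = candidate.find("{")
    if start == -1:
        raise ValueError("no JSON object found in completion")
    # raw_decode parses one JSON value and ignores any trailing text.
    obj, _end = json.JSONDecoder().raw_decode(candidate[start:])
    return obj


reply = 'Sure, here is the metadata: {"topic": "docker", "answered": true} Hope it helps!'
metadata = extract_json(reply)
```

Malformed JSON raises `json.JSONDecodeError`, which is a subclass of `ValueError`, so a caller can catch one exception type and retry the prompt.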
Accomplishments that we're proud of
It kind of works.
What we learned
Testing the output data schema is not easy, even with framework packages like LangChain.
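For a sense of what such a check involves, here is a minimal plain-Python sketch with no framework. The required fields (`started_by`, `topic`, `answered`) are assumptions based on the metadata described earlier, not a schema taken from the project.

```python
# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"started_by": str, "topic": str, "answered": bool}


def validate_thread(record: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the record passes."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    return errors


problems = validate_thread({"started_by": "alice", "answered": "yes"})
```

Collecting every problem instead of stopping at the first one makes it easier to see which part of a prompt needs tuning.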
What's next for Knowhow-Liberator
More testing and more prompt tuning for reliability. The script can be deployed as a workflow automation on GitHub or Pipedream / Zapier.