Qwen3 0.6B was removed because it could not handle large inputs, such as when it had to review the votes of other models. Some Qwen outputs may still be in the early lines of the dataset.
The Synthetic Conversations dataset is a set of inputs and outputs generated entirely by AI language models through a fully automated process. I used AI models such as DeepSeek R1 Llama 70B Distill, Google's Gemini 2.0 Flash, Microsoft's Phi 3, and Qwen3-0.6B.
This is a fully automated dataset, built by having Google's Gemini 2.0 Flash model ask complex questions and other AI models answer them. I used:
DeepSeek R1 Llama 70B Distill
Gemini 2.0 Flash
Phi 4 Reasoning
Qwen3 0.6B
Only the best responses are selected and added to the dataset. All of the AI models vote on which output they think is best, and no model can vote for its own output.
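The voting rule can be sketched roughly as follows. This is a minimal illustration, not the logic in dataset_builder.py: the function names, the dictionary shape, and the model labels are assumptions made for the example, and ties are returned as-is rather than resolved.

```python
from collections import Counter


def tally_votes(votes):
    """Count votes, ignoring any model that voted for itself.

    `votes` maps a voter's name to the name of the model it voted for
    (a hypothetical structure chosen for this sketch).
    """
    return Counter(choice for voter, choice in votes.items() if voter != choice)


def top_candidates(tally):
    """Return every model tied for the highest vote count, sorted by name."""
    if not tally:
        return []
    best = max(tally.values())
    return sorted(name for name, count in tally.items() if count == best)


# Hypothetical round: "phi" votes for itself, so that vote is discarded.
example_votes = {"gemini": "deepseek", "deepseek": "gemini", "phi": "phi"}
example_tally = tally_votes(example_votes)
```

Returning all tied candidates leaves the tie-breaking policy open, since the description does not say how ties are handled.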
The main dataset file is new_dataset.jsonl. The inputs and outputs are labeled as:
"user_input:"
"output:"
If you are looking for all of the prompts that were asked to generate the outputs, look in the asked.txt file.
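To read new_dataset.jsonl, something like the sketch below may work. It assumes each line is one JSON object and that the key names match the labels listed above exactly; whether the real keys include the trailing colon is not confirmed here, so both names are parameters. The function name parse_pairs is made up for this example.

```python
import json


def parse_pairs(lines, input_key="user_input:", output_key="output:"):
    """Turn JSONL lines into (input, output) tuples.

    The default key names copy the labels from the description verbatim;
    adjust them if the file differs. Records missing either key are skipped.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if input_key in record and output_key in record:
            pairs.append((record[input_key], record[output_key]))
    return pairs


# Typical use (not executed here):
#     with open("new_dataset.jsonl", encoding="utf-8") as f:
#         pairs = parse_pairs(f)
```

Skipping records that lack either key is a choice made for this sketch; it also makes it easy to filter out any differently shaped early lines.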
The Resources folder contains static image and GIF assets, along with the files for the tools that were used to create them. Resources/synthetic-conversations.gif and Resources/cluster-logo.png are used as the title for this file.
The outputs folder is used to store outputs from all the AI models after they reply to the question input generated by Gemini 2.0 Flash.
vote.txt is where the AI models write their votes. dataset_builder.py reads this file to find the winner.
dataset_builder.py is the main file. When run, it begins building the dataset.
Each AI has its own .py file, which is used to interact with it via its API. (Be sure to create your own .env and add your API keys as they appear in their corresponding AI file!)
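Before a long run, it can help to confirm that every key is actually set. The sketch below is one way to do that, assuming the .env values have already been loaded into the process environment. The variable names in REQUIRED_KEYS are hypothetical placeholders, not the names used by this project; replace them with the names that appear in each AI's .py file.

```python
import os

# Hypothetical variable names for illustration only; use the names that
# actually appear in the per-model .py files.
REQUIRED_KEYS = ["GEMINI_API_KEY", "DEEPSEEK_AGENT_KEY", "OPENROUTER_API_KEY"]


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling missing_keys(REQUIRED_KEYS) and stopping early when the result is not empty avoids starting a build that would fail on the first API call.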
The AIs included in this version of the program are:
gemini.py: Uses gemini-2.0-flash through the official Gemini API and the google-genai SDK.
deepseek.py: Uses DeepSeek R1 Llama 70B Distill running as a DigitalOcean AI Agent. Interact with the agent through the openai SDK.
phi.py: Uses an OpenRouter API and the openai SDK.
All dependencies can be found in and installed from requirements.txt.
