🎭 CharacterGPT: Evaluating LLM Memorization on Friends Dataset

This project tests whether Large Language Models (LLMs) can mimic characters from the TV show Friends, and evaluates how much of their output reflects memorization of the transcripts versus generalization.

We start from a cleaned transcript dataset, format the dialogue into conversational contexts, and benchmark model outputs with semantic-similarity metrics (Sentence-BERT, BERTScore) and a lexical-overlap metric (ROUGE).
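
As a rough illustration of the context-formatting step, the sketch below pairs each line spoken by a target character with the turns that precede it. The `speaker` and `text` field names, the `build_examples` helper, the `context_size` parameter, and the sample lines are all assumptions made for this example; the actual dataset schema and preprocessing may differ.

```python
from typing import Dict, List

# Hypothetical row format: the real dataset's field names may differ.
Line = Dict[str, str]


def build_examples(lines: List[Line], character: str,
                   context_size: int = 3) -> List[Dict[str, object]]:
    """Pair each line spoken by `character` with the preceding turns."""
    examples = []
    for i, line in enumerate(lines):
        # Skip other speakers, and skip a first line that has no context.
        if line["speaker"] != character or i == 0:
            continue
        context = lines[max(0, i - context_size):i]
        examples.append({
            "context": [f'{turn["speaker"]}: {turn["text"]}' for turn in context],
            "reference": line["text"],
        })
    return examples


# Invented placeholder dialogue, not taken from the transcripts.
scene = [
    {"speaker": "Monica", "text": "Has anyone seen my keys?"},
    {"speaker": "Chandler", "text": "Have you checked the last place you looked?"},
    {"speaker": "Joey", "text": "I checked the fridge."},
    {"speaker": "Chandler", "text": "That explains the sandwich on the key hook."},
]

examples = build_examples(scene, "Chandler", context_size=2)
for example in examples:
    print(example["context"], "->", example["reference"])
```

Each `context` list can then be sent to the model as the prompt, with `reference` held back as the ground-truth line to score against.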

🔎 Key Insights

  • LLMs do not replicate exact lines (low ROUGE).

  • LLMs capture style and semantics fairly well (higher BERTScore).

  • Together, these results indicate generalization rather than rote memorization of the transcripts.
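
To make the ROUGE side of this concrete, here is a minimal pure-Python sketch of a ROUGE-L style F-measure based on the longest common subsequence of tokens. It is an illustrative re-implementation, not the `rouge-score` package listed in the tech stack: it uses plain whitespace tokenization and no stemming, so its numbers will not match that library exactly. The two sentences are invented for the example.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L style F1 over lowercased, whitespace-split tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Invented sentences: a reference line and a paraphrase of it.
reference = "i could really use a coffee right now"
paraphrase = "a cup of coffee would be great at the moment"

verbatim_score = rouge_l_f1(reference, reference)
paraphrase_score = rouge_l_f1(reference, paraphrase)
print(f"verbatim: {verbatim_score:.2f}, paraphrase: {paraphrase_score:.2f}")
```

A verbatim copy scores 1.0, while the paraphrase shares only two tokens in order and scores far lower even though it says much the same thing. That gap is why an embedding-based metric such as BERTScore is reported alongside ROUGE.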

⚙️ Tech Stack

  • Python (Pandas, NumPy, Matplotlib, tqdm)

  • LLM APIs (OpenAI GPT-4o-mini)

  • Embedding Models: Sentence-BERT (all-MiniLM-L6-v2)

  • Evaluation Libraries:

    • sentence-transformers

    • bert-score

    • rouge-score

    • evaluate

📌 Next Steps

  • Expand tests to all ~2,300 conversations.

  • Compare across multiple LLMs (GPT-4, Claude, LLaMA).

  • Introduce memorization checks by holding out episodes.
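
One possible way to set up the episode hold-out is to split at the episode level, so that no episode contributes conversations to both groups. The sketch below is only an assumption about how such a check could be organized: the `episode` field, the `split_by_episode` helper, and the 20% hold-out fraction are hypothetical and not part of the current project.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical record format: each conversation carries an episode identifier.
Conversation = Dict[str, str]


def split_by_episode(conversations: List[Conversation],
                     holdout_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[Conversation], List[Conversation]]:
    """Split conversations so that whole episodes are held out together."""
    episodes = sorted({c["episode"] for c in conversations})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(episodes)
    n_holdout = max(1, round(len(episodes) * holdout_fraction))
    holdout_ids = set(episodes[:n_holdout])
    kept = [c for c in conversations if c["episode"] not in holdout_ids]
    held_out = [c for c in conversations if c["episode"] in holdout_ids]
    return kept, held_out


# Synthetic placeholder data: 5 episodes with 2 conversations each.
conversations = [
    {"episode": f"S01E0{e}", "conversation_id": f"S01E0{e}-{k}"}
    for e in range(1, 6)
    for k in range(2)
]

kept, held_out = split_by_episode(conversations, holdout_fraction=0.2)
print(len(kept), "kept;", len(held_out), "held out")
```

Splitting by episode rather than by individual conversation avoids leaking lines from the same scene into both groups.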

🙌 Acknowledgments

  • Friends dataset adapted from public transcripts.

  • Inspired by ongoing research in LLM memorization and character simulation.


I hope you found this project interesting! For a more detailed write-up, see the full report: https://github.com/bhavya632/CharacterGPT/blob/8702d38dc9aa3c894afc9864e1d8648c9245825b/CharacterGPT/Final_Report.pdf

About

Persona-based dialogue generation with ChatGPT using the Friends dataset, evaluated with semantic and stylistic metrics.
