Some Context, Please?

Sarcasm Detection Model

A demo of the 2D Basic and 2D LSTM models is available at https://some-context-please.streamlit.app/

Background

Why Detect Sarcasm?

Sentiment analysis is an effective method for turning text into meaningful feedback that can be addressed without manual review.
While accurate sentiment analysis tools exist, they often break down when applied to sarcastic text.

Difficulty

Sarcasm detection is difficult because it depends on three key elements:

The words that make up a statement
The tone of voice in which the statement was made
The context prompting the statement

Purpose and Hypothesis

Because sarcasm depends so heavily on context, identical sentences written in different situations can have wildly different meanings, and deciding whether a sentence is sarcastic is at times challenging even for humans.

Word clouds from different subreddits

r/politics Sarcastic	r/politics Sincere

r/ProgrammerHumor Sarcastic	r/ProgrammerHumor Sincere

To address this, we provide our model with that much-needed context in the form of the title of the post a comment is replying to. Furthermore, we limit predictions to one subject area (politics) so that the model knows the subjects and phrases specific to that area. With this setup, we hope the model will vary its confidence in sarcasm depending on the title a comment is made on. Serious titles will likely attract more serious comments, and laughable titles will most likely attract more sarcastic ones. This makes our data much closer to real-world situations than expecting a model to determine sarcasm from a sentence alone.

Materials and Methods

Data Collection

To limit our text to one subject area, and to provide context in the form of a title, we used text from Reddit. We collected it with the Python Reddit API Wrapper (PRAW) and restricted our search to r/politics. A comment was labeled sarcastic if it contained the ‘/s’ tag. We assumed that every comment containing ‘/s’ was sarcastic and that the majority of the remaining text on the subreddit was not. Because of PRAW's limitations, only 100 posts could be scraped at a time, which yielded about 2,000 sarcastic comments. To maximize training data (as well as to isolate in-the-wild testing data), we used the ChatGPT API to generate sarcastic titles and comments with this prompt:

" Please generate X examples of sarcastic text, along with the title of the post they commented on, like that found on Reddit in the politics subreddit (r/politics). Do this in the format: Title: ... Comment: … "

By looping these requests, we obtained another 2,000 artificial sarcastic comments for training.
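The two labeling steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repository's actual code: the names `label_comment` and `parse_generated_pairs` are hypothetical, removing the ‘/s’ marker from the text before training is an assumption (the source does not say whether the tag was removed), and the parser assumes each generated response follows the "Title: ... Comment: ..." format exactly. For scraped data, `body` would be the text of a Reddit comment (for example, the `body` attribute of a PRAW comment object).

```python
import re
from typing import List, Tuple

# A "/s" marker that is not part of a path, URL, or longer word.
SARCASM_TAG = re.compile(r"(?<![\w/])/s\b", re.IGNORECASE)

# One "Title: ... Comment: ..." pair, ending at the next "Title:" or end of text.
PAIR_PATTERN = re.compile(
    r"Title:\s*(?P<title>.+?)\s*Comment:\s*(?P<comment>.+?)(?=\s*Title:|\s*$)",
    re.DOTALL,
)


def label_comment(body: str) -> Tuple[str, int]:
    """Return (cleaned_text, label); label is 1 if the comment carried a '/s' tag."""
    is_sarcastic = SARCASM_TAG.search(body) is not None
    # Assumption: drop the tag so a model cannot read the label off the marker.
    cleaned = SARCASM_TAG.sub("", body).strip()
    return cleaned, int(is_sarcastic)


def parse_generated_pairs(response_text: str) -> List[Tuple[str, str]]:
    """Split one generated response into (title, comment) pairs."""
    return [
        (match.group("title").strip(), match.group("comment").strip())
        for match in PAIR_PATTERN.finditer(response_text)
    ]


if __name__ == "__main__":
    print(label_comment("Great plan, what could go wrong /s"))
    sample = (
        "Title: New budget passes\n"
        "Comment: Oh great, more money well spent.\n\n"
        "Title: Senator tweets again\n"
        "Comment: Truly the statesmanship we deserve."
    )
    for title, comment in parse_generated_pairs(sample):
        print(title, "|", comment)
```

If a generated response strays from the requested format (numbered items, missing labels), the parser will return fewer or messier pairs, so generated examples are worth spot-checking before training.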

Model and Implementation

We tried three different input configurations for training our sarcasm detection models:

1D concatenation of Title + comment
2D input of Title, comment
Comment alone

We also compared a basic Embedding model against an LSTM model to see whether the sequence-recognizing capabilities of the LSTM truly provided an advantage in detecting sarcasm.
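To make the three input configurations listed above concrete, here is a small sketch of how each could be built from one (title, comment) pair. It is an illustration only: the whitespace tokenizer, the toy vocabulary, the padding scheme, and the names `build_vocab`, `encode`, and `make_inputs` are all assumptions, and reading "2D input" as one padded row for the title and one for the comment is an interpretation rather than something the source confirms.

```python
from typing import Dict, List

PAD_ID = 0  # reserved for padding
OOV_ID = 1  # reserved for words missing from the vocabulary


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign an integer id to each lowercase whitespace token, starting at 2."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab) + 2)
    return vocab


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids, truncate to max_len, and right-pad with PAD_ID."""
    ids = [vocab.get(token, OOV_ID) for token in text.lower().split()][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def make_inputs(title: str, comment: str, vocab: Dict[str, int], max_len: int) -> dict:
    """Build the three input variants for one (title, comment) pair."""
    return {
        # 1D: title and comment joined into a single sequence.
        "1d_title_plus_comment": encode(f"{title} {comment}", vocab, 2 * max_len),
        # 2D: title and comment kept as two separate rows of equal length.
        "2d_title_comment": [
            encode(title, vocab, max_len),
            encode(comment, vocab, max_len),
        ],
        # Comment alone: no title context at all.
        "comment_only": encode(comment, vocab, max_len),
    }


if __name__ == "__main__":
    title, comment = "Senate passes bill", "what a surprise"
    vocab = build_vocab([title, comment])
    for name, value in make_inputs(title, comment, vocab, max_len=4).items():
        print(name, value)
```

Keeping the title and comment in separate rows lets a model treat them as distinct inputs, while the concatenated form leaves it to the model to work out where the title ends and the comment begins.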

Results

Configuration	Model Type	Train Accuracy	Validation Accuracy	Test Accuracy	In-the-wild Test Accuracy
1D Title + Comment	Basic	1	0.98	0.977	0.669
1D Title + Comment	LSTM	0.988	0.997	0.962	0.766
2D Title, Comment	Basic	1	0.98	0.967	0.613
2D Title, Comment	LSTM	0.99	0.932	0.903	0.57
Just Comment	Basic	0.999	0.909	0.995	0.561
Just Comment	LSTM	0.994	0.926	0.921	0.565
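One way to read the table is to compare each model's held-out test accuracy with its in-the-wild accuracy. The short sketch below does that arithmetic using only the numbers reported above; the name `generalization_gaps` and the choice of "test accuracy minus in-the-wild accuracy" as the measure are illustrative assumptions, not part of the original analysis.

```python
from typing import Dict, List, Tuple

# (configuration, model type, test accuracy, in-the-wild test accuracy),
# copied from the results table.
RESULTS: List[Tuple[str, str, float, float]] = [
    ("1D Title + Comment", "Basic", 0.977, 0.669),
    ("1D Title + Comment", "LSTM", 0.962, 0.766),
    ("2D Title, Comment", "Basic", 0.967, 0.613),
    ("2D Title, Comment", "LSTM", 0.903, 0.57),
    ("Just Comment", "Basic", 0.995, 0.561),
    ("Just Comment", "LSTM", 0.921, 0.565),
]


def generalization_gaps(rows: List[Tuple[str, str, float, float]]) -> Dict[Tuple[str, str], float]:
    """Return test accuracy minus in-the-wild accuracy for each model."""
    return {
        (config, model): round(test_acc - wild_acc, 3)
        for config, model, test_acc, wild_acc in rows
    }


if __name__ == "__main__":
    gaps = generalization_gaps(RESULTS)
    # Print models from smallest to largest drop in accuracy.
    for (config, model), gap in sorted(gaps.items(), key=lambda item: item[1]):
        print(f"{config:20s} {model:6s} gap = {gap:.3f}")
```

By this measure, the 1D Title + Comment LSTM has both the highest in-the-wild accuracy (0.766) and the smallest drop from test accuracy, while the Just Comment Basic model shows the largest drop.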

2D Basic Model	2D LSTM Model

Areas of Improvement

Lack of Data

With only 2,000 sentences per class used for training, simply collecting more data could improve results considerably.

Overfitting

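The models need modification to prevent overfitting while still producing consistent results: training accuracy is at least 0.988 for every configuration, while in-the-wild accuracy ranges from 0.561 to 0.766.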
