Market manipulation is not a victimless crime. It erodes the foundational trust that financial markets depend on to function efficiently, and its consequences ripple far beyond the trading floor.
- Retail investors lose billions annually to pump-and-dump schemes, where coordinated social media campaigns inflate stock prices before insiders sell off, leaving ordinary investors holding worthless shares. The U.S. Securities and Exchange Commission estimates that microcap fraud alone costs investors $3–5 billion per year.
- The GameStop saga (January 2021) demonstrated how social media-driven momentum on platforms like Reddit's r/WallStreetBets can create extreme volatility. While some profited, many retail investors who bought at the peak lost over 80% of their investment within weeks.
- Cryptocurrency pump-and-dump schemes have become rampant, with researchers at the University of Texas finding that 80%+ of ICOs in 2017–2018 showed signs of manipulation, resulting in estimated losses exceeding $1 billion.
- Confidence erosion: A CFA Institute survey found that 56% of retail investors believe markets are rigged against them, leading to reduced market participation and a widening wealth gap.

Traditional manipulation detection relies on regulatory bodies with limited resources. By the time enforcement actions are taken, the damage is already done. Real-time, AI-powered screening tools like Sussy Scanner aim to give everyday investors the ability to see the same red flags that institutional risk managers monitor, before they become casualties.
"Sunlight is said to be the best of disinfectants." - Louis Brandeis, U.S. Supreme Court Justice
Sussy Scanner is a full-stack market manipulation detector that cross-references real-time stock data with social media activity to flag suspicious behavior patterns. Built for the Hack Brooklyn hackathon, it combines multi-source data ingestion, statistical feature engineering, and Google Gemini-powered AI analysis to produce an explainable, composite risk score for any publicly traded stock.
| Feature | Description |
|---|---|
| Multi-Score Risk Engine | Five independent sub-scores (Pump Risk, Social Hype, Liquidity Stress, Technical Fragility, Squeeze Pressure) combined into a weighted composite |
| AI-Powered Analysis | Google Gemini analyzes social media posts for promotional language, hype signals, and coordinated campaigns |
| Interactive Charts | TradingView lightweight charts for candlestick/line visualization + Recharts for risk signals and narrative breakdowns |
| Similarity Engine | Compares current stock behavior against a database of known historical pump events (GME, AMC, DWAC, BBBY, SMCI) |
| Explainable Scoring | Every score includes top contributing features and AI-generated natural language narratives |
| News Cross-Referencing | Checks whether price moves are backed by credible news or appear "unexplained" |
| Social Media Aggregation | Aggregates posts from multiple sources (Tavily search, Reddit) with sentiment and hype classification |
| Smart Caching | SQLite-backed caching layer with TTL to avoid redundant API calls and speed up repeat analyses |
| Preset Historical Events | One-click analysis of famous manipulation events (GameStop squeeze, AMC rally, Trump SPAC, etc.) |

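
The caching row in the table above can be illustrated with a minimal in-memory sketch. The project itself stores entries in SQLite (`server/services/cache.js`); the `createTtlCache` and `cached` helpers, their method names, and the injectable `now` clock below are hypothetical and only show the time-to-live idea, not the project's implementation.

```javascript
// Hypothetical in-memory stand-in for a TTL cache; the project itself uses SQLite.
function createTtlCache(ttlMs, now = () => Date.now()) {
  const entries = new Map(); // key -> { value, expiresAt }

  return {
    get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (now() >= entry.expiresAt) {
        entries.delete(key); // expired: drop it so the caller refetches
        return undefined;
      }
      return entry.value;
    },
    set(key, value) {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
  };
}

// Wrap an async fetcher so repeat calls inside the TTL reuse the cached result.
async function cached(cache, key, fetcher) {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await fetcher();
  cache.set(key, value);
  return value;
}
```

Passing the clock as a parameter keeps the expiry logic easy to test without waiting for real time to pass.
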
The following diagram traces the complete data flow from the moment a user enters a stock ticker to the final rendered analysis dashboard:
flowchart TB
User["User"]
subgraph Frontend ["Frontend: React + Vite"]
direction LR
FE1["Ticker Search"]
FE2["API Client<br/>(Axios)"]
FE3["Dashboard Grid<br/>Charts + Panels"]
FE4["Risk Scores<br/>& Narratives"]
FE5["TradingView<br/>Candlestick Charts"]
FE1 -->|"build request"| FE2 -->|"render components"| FE3 -->|"display scores"| FE4
FE3 -->|"render chart"| FE5
end
subgraph Backend ["Backend: Express + Node.js"]
direction LR
BE1["API Router<br/>+ SQLite Cache"]
BE2["Market Data Fetcher<br/>Yahoo Finance · Finnhub"]
BE3["Social Aggregator<br/>Tavily · Reddit · News"]
BE4["Feature Vector Builder<br/>Price · Tech · Liquidity · Social · Squeeze"]
BE5["Scoring Engine<br/>5 Sub-Scores → Composite"]
BE6["Gemini Post Analyzer<br/>Hype · Sentiment · Narratives"]
BE7["Narrative Generator<br/>Score Explanations"]
BE8["Similarity Engine<br/>Historical Pump Matching"]
BE1 -->|"fetch market data"| BE2
BE2 -->|"fetch social posts"| BE3
BE3 -->|"compute features"| BE4
BE4 -->|"generate scores"| BE5
BE5 -->|"classify posts"| BE6
BE6 -->|"explain scores"| BE7
BE7 -->|"match history"| BE8
BE8 -->|"assembled analysis"| BE1
end
subgraph External ["External APIs"]
direction TB
EX1["Yahoo Finance<br/>Price · Volume · Stats"]
EX2["Tavily Search<br/>Social Media Posts"]
EX3["Reddit<br/>r/wallstreetbets"]
EX4["Google Gemini<br/>AI Analysis"]
EX5["Finnhub<br/>Quotes · News"]
EX1 --- EX2 --- EX3 --- EX4 --- EX5
end
%% User → Frontend
User -->|"enters ticker<br/>e.g. GME"| FE1
FE3 -.->|"views dashboard"| User
%% Frontend → Backend
FE2 -->|"GET /api/analysis/:symbol"| BE1
BE1 -->|"JSON scores + narratives"| FE3
%% Backend → External
BE2 -.->|"price, volume, stats"| EX1
BE3 -.->|"social media posts"| EX2
BE3 -.->|"reddit discussions"| EX3
BE2 -.->|"quotes, news"| EX5
BE6 -.->|"classify posts, sentiment"| EX4
BE7 -.->|"generate narratives"| EX4
%% Styling
style Frontend fill:#dbeafe,color:#1e3a5f,stroke:#7aa2f7
style Backend fill:#dcfce7,color:#14532d,stroke:#4ade80
style External fill:#fee2e2,color:#7f1d1d,stroke:#f87171
The composite manipulation risk score is a weighted average of five independent sub-scores, each ranging from 0–100:

| Sub-Score | Weight | Key Features |
|---|---|---|
| Pump Risk | 30% | Volume z-scores, price gaps, ROC accel. |
| Social Hype | 25% | Mention velocity, hype score, spam ratio |
| Liquidity Stress | 20% | Float, ADV, market cap, inst. ownership |
| Technical Fragility | 15% | RSI, SMA distance, Bollinger breaches |
| Squeeze Pressure | 10% | Short % float, days-to-cover, P/C ratio |

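
As one illustration of the features listed for Pump Risk, a volume z-score measures how many standard deviations the latest day's volume sits from its recent average. This is a minimal sketch under stated assumptions: the `volumeZScore` name, the use of the sample standard deviation, and the choice to compare the last value against all preceding values are illustrative and not taken from `priceVolumeFeatures.js`.

```javascript
// Hypothetical helper: z-score of the latest volume against the preceding window.
function volumeZScore(volumes) {
  if (volumes.length < 3) return 0; // not enough history for a meaningful baseline

  const history = volumes.slice(0, -1);
  const latest = volumes[volumes.length - 1];

  const mean = history.reduce((sum, v) => sum + v, 0) / history.length;
  // Sample variance (divides by n - 1); an assumption, the project may differ.
  const variance =
    history.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (history.length - 1);
  const std = Math.sqrt(variance);

  // A flat history has no spread, so report 0 instead of dividing by zero.
  return std === 0 ? 0 : (latest - mean) / std;
}
```

For example, `volumeZScore([90, 110, 90, 110, 200])` is roughly 8.66: the history averages 100 with a sample standard deviation of about 11.5, so a day at 200 stands far outside the usual range.
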
| Band | Score Range | Color |
|---|---|---|
| LOW | 0–25 | Green |
| MEDIUM | 25–50 | Yellow |
| HIGH | 50–75 | Orange |
| CRITICAL | 75–100 | Red |

The composite applies a stacking floor (when 2+ sub-scores are HIGH, the composite can't fall below their average) and a correlation cap (no single outlier can push the composite more than 15 points above the max sub-score) to prevent dilution or exaggeration.
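
The weighting, floor, cap, and banding described above can be sketched as follows. The weights and band edges come from the tables; everything else is an assumption made for illustration: the property names, treating "HIGH" as any sub-score of 50 or more, and assigning boundary values (25, 50, 75) to the upper band. Read literally, the correlation cap never binds in this sketch, because neither the weighted average nor the floor can exceed the largest sub-score; it is included only to mirror the description and is not a claim about `compositeScore.js`.

```javascript
// Weights copied from the sub-score table; everything else is an illustrative assumption.
const WEIGHTS = {
  pumpRisk: 0.3,
  socialHype: 0.25,
  liquidityStress: 0.2,
  technicalFragility: 0.15,
  squeezePressure: 0.1,
};

const HIGH_THRESHOLD = 50; // assumed: "HIGH" means a sub-score of 50 or more
const CAP_MARGIN = 15; // from the description: at most 15 points above the max sub-score

function compositeScore(subScores) {
  const keys = Object.keys(WEIGHTS);
  const values = keys.map((key) => subScores[key]);

  // Weighted average of the five sub-scores (weights sum to 1).
  let composite = keys.reduce((sum, key) => sum + WEIGHTS[key] * subScores[key], 0);

  // Stacking floor: with 2+ HIGH sub-scores, never fall below their average.
  const high = values.filter((v) => v >= HIGH_THRESHOLD);
  if (high.length >= 2) {
    const floor = high.reduce((sum, v) => sum + v, 0) / high.length;
    composite = Math.max(composite, floor);
  }

  // Correlation cap: never exceed the largest sub-score by more than the margin.
  composite = Math.min(composite, Math.max(...values) + CAP_MARGIN);

  return Math.min(100, Math.max(0, composite));
}

// Map a 0-100 score to its band; boundary handling is an assumption.
function riskBand(score) {
  if (score >= 75) return "CRITICAL";
  if (score >= 50) return "HIGH";
  if (score >= 25) return "MEDIUM";
  return "LOW";
}
```

With sub-scores of 80 (Pump Risk), 70 (Social Hype), and 10 for the rest, the plain weighted average is 46, but the stacking floor lifts the composite to 75, the average of the two HIGH sub-scores.
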
sussy-scanner/
├── client/                             # React + Vite frontend
│   ├── src/
│   │   ├── components/
│   │   │   ├── AnalysisHeader.jsx      # Symbol, date, score badge
│   │   │   ├── DashboardGrid.jsx       # Main analysis layout
│   │   │   ├── TradingViewChart.jsx    # Candlestick / line chart
│   │   │   ├── SignalsPanel.jsx        # Expandable risk scores
│   │   │   ├── NarrativeMixChart.jsx   # Social narrative donut
│   │   │   ├── SocialPanel.jsx         # Social media posts feed
│   │   │   ├── NewsPanel.jsx           # News articles panel
│   │   │   ├── KeyStatsPanel.jsx       # Feature value table
│   │   │   ├── ScoreCard.jsx           # Individual score display
│   │   │   ├── SimilarityCallout.jsx   # Historical match display
│   │   │   └── ...
│   │   ├── routes/
│   │   │   ├── Landing.jsx             # Home page with search
│   │   │   └── Analysis.jsx            # Analysis dashboard
│   │   ├── services/
│   │   │   └── api.js                  # Axios API client
│   │   └── styles/
│   │       └── tokens.css              # Design tokens
│   └── package.json
│
├── server/                             # Express backend
│   ├── index.js                        # App entry + route definitions
│   ├── db.js                           # SQLite connection
│   │
│   ├── ingestion/                      # Data source adapters
│   │   ├── marketData.js               # Unified market data facade
│   │   ├── yahooFinance.js             # Yahoo Finance quote/stats
│   │   ├── yahooChart.js               # Yahoo Finance OHLCV history
│   │   ├── finnhub.js                  # Finnhub quote/news adapter
│   │   ├── newsService.js              # News aggregation
│   │   ├── tavilySearch.js             # Tavily social media search
│   │   ├── redditSearch.js             # Reddit post search
│   │   └── timestampEnricher.js        # Post timestamp normalization
│   │
│   ├── features/                       # Feature engineering modules
│   │   ├── featureVector.js            # Feature assembly + ordering
│   │   ├── priceVolumeFeatures.js      # Volume z-scores, returns
│   │   ├── technicalFeatures.js        # RSI, SMA, Bollinger Bands
│   │   ├── liquidityFeatures.js        # Float, ADV, market cap
│   │   ├── squeezeFeatures.js          # Short interest, options
│   │   ├── socialFeatures.js           # Mentions, hype, sentiment
│   │   └── stats.js                    # Statistical helpers
│   │
│   ├── services/                       # Business logic
│   │   ├── explainability.js           # Main analysis orchestrator
│   │   ├── compositeScore.js           # Weighted score aggregation
│   │   ├── pumpRiskScore.js            # Pump & dump detection
│   │   ├── socialHypeScore.js          # Social media hype scoring
│   │   ├── liquidityScore.js           # Liquidity stress scoring
│   │   ├── techFragilityScore.js       # Technical fragility scoring
│   │   ├── squeezeScore.js             # Short squeeze scoring
│   │   ├── scoringHelpers.js           # Severity + band utilities
│   │   ├── geminiClient.js             # Gemini API client
│   │   ├── geminiAnalyzer.js           # AI post classification
│   │   ├── geminiNarrator.js           # AI narrative generation
│   │   ├── similarityEngine.js         # Historical event matching
│   │   └── cache.js                    # SQLite TTL cache
│   │
│   ├── data/                           # Reference datasets
│   │   ├── pump_anchors.json           # Known pump event features
│   │   └── preset_squeeze.json         # Preset analysis events
│   │
│   └── package.json
│
├── .env.example                        # Environment template
└── package.json                        # Root workspace config

- Node.js ≥ 18
- npm ≥ 9
- API keys for:
  - Tavily (required, for social media search)
  - Google AI Studio (required, for the Gemini API)
  - Finnhub (optional, for enhanced market data)

# 1. Clone the repository
git clone <repo-url>
cd "Hack Brooklyn"
# 2. Install all dependencies (root + server + client)
npm run install:all
# 3. Configure environment variables
cp .env.example .env
# Edit .env with your API keys
# 4. Start development servers (both client + server)
npm run dev

The app will be available at:

- Frontend: http://localhost:5173 (Vite dev server)
- Backend: http://localhost:3001 (Express API)

| Variable | Required | Description |
|---|---|---|
| `TAVILY_API_KEY` | Yes | Tavily search API key for social media ingestion |
| `GEMINI_API_KEY` | Yes | Google AI Studio API key for Gemini/Gemma model |
| `GEMINI_MODEL` | No | Model ID (default: `gemma-3-27b-it`) |
| `FINNHUB_API_KEY` | No | Finnhub API key for enhanced market data |
| `PORT` | No | Server port (default: `3001`) |
| `CACHE_DB_PATH` | No | SQLite cache path (default: `./cache.sqlite`) |

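
To show how these variables and their defaults fit together, here is a minimal sketch of a config loader. The `loadConfig` function and the returned property names are hypothetical; the variable names and default values come from the table above, and the Tavily and Gemini keys are treated as required because the prerequisites list them that way.

```javascript
// Hypothetical config loader; variable names and defaults come from the table above.
function loadConfig(env) {
  const required = ["TAVILY_API_KEY", "GEMINI_API_KEY"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  return {
    tavilyApiKey: env.TAVILY_API_KEY,
    geminiApiKey: env.GEMINI_API_KEY,
    geminiModel: env.GEMINI_MODEL || "gemma-3-27b-it",
    finnhubApiKey: env.FINNHUB_API_KEY || null, // optional: null when not set
    port: Number(env.PORT) || 3001, // falls back when PORT is unset or not numeric
    cacheDbPath: env.CACHE_DB_PATH || "./cache.sqlite",
  };
}
```

In a Node.js process this would typically be called once at startup as `loadConfig(process.env)`, so a missing key fails fast instead of surfacing later as a failed API call.
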
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/health` | Health check + model info |
| `GET` | `/api/analysis/:symbol` | Full analysis: scores, features, AI narratives, similarity |
| `GET` | `/api/similarity/:symbol` | Historical pump event similarity match |
| `GET` | `/api/timeline/:symbol` | Price + volume + social velocity time series |
| `GET` | `/api/presets` | List of preset historical events |

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/stock/quote/:symbol` | Real-time quote |
| `GET` | `/api/stock/history/:symbol` | OHLCV history (`?period1=&period2=&interval=1d`) |
| `GET` | `/api/stock/stats/:symbol` | Key statistics (market cap, float, etc.) |
| `GET` | `/api/stock/search?q=` | Symbol search / autocomplete |

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/social/:symbol` | Social media posts (`?date=&window=7`) |
| `GET` | `/api/news/:symbol` | News articles (`?date=&window=7`) |

All endpoints support an optional date query parameter for historical analysis (e.g., ?date=2021-01-27).
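
As a usage sketch, the analysis endpoint could be called from Node.js 18+ (which ships a global `fetch`) like this. The `analysisUrl` and `fetchAnalysis` helpers are hypothetical, the base URL assumes the local dev server from the setup steps, and the response is returned as-is because its JSON shape is not documented here.

```javascript
// Assumed base URL: the Express dev server from the setup steps.
const API_BASE = "http://localhost:3001";

// Build the analysis URL, adding ?date= only when a historical date is given.
function analysisUrl(symbol, date) {
  const url = new URL(`/api/analysis/${encodeURIComponent(symbol)}`, API_BASE);
  if (date) url.searchParams.set("date", date);
  return url.toString();
}

// Fetch the analysis JSON; the payload is passed through without assumptions about its fields.
async function fetchAnalysis(symbol, date) {
  const response = await fetch(analysisUrl(symbol, date));
  if (!response.ok) {
    throw new Error(`Analysis request failed with status ${response.status}`);
  }
  return response.json();
}
```

For instance, `analysisUrl("GME", "2021-01-27")` produces `http://localhost:3001/api/analysis/GME?date=2021-01-27`, which matches the GameStop preset.
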
The app ships with pre-configured famous market events for instant analysis:
| Ticker | Date | Event |
|---|---|---|
| GME | 2021-01-27 | GameStop short squeeze |
| AMC | 2021-06-02 | AMC ape rally |
| DWAC | 2021-10-22 | Trump SPAC pump |
| BBBY | 2022-08-16 | Bed Bath & Beyond meme revival |
| SMCI | 2024-03-08 | Super Micro AI spike |

- Social Post Collection: Tavily and Reddit APIs gather recent social media posts mentioning the stock
- Pre-filtering: Posts are pre-screened using regex patterns for clearly benign (earnings, SEC filings) or clearly suspicious ("to the moon", "diamond hands") language
- Gemini Batch Analysis: Remaining posts are sent to Google Gemini in batches of 5, classified for:
  - Promotional language detection
  - Hype score (0–1)
  - Urgency signals
  - Sentiment (bullish/bearish/neutral)
  - Narrative categorization (meme_hype, short_squeeze, coordination_signal, etc.)
- Narrative Generation: Gemini generates one-line explanations for each risk sub-score
- Similarity Matching: The assembled feature vector is compared against known historical pump events using cosine similarity

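
The pre-filtering, batching, and similarity stages of this pipeline can be sketched in a few small functions. This is an illustration under stated assumptions: the regex patterns only cover the example phrases quoted above, suspicious matches are checked before benign ones, posts are assumed to carry a `text` field, and all function names are hypothetical. The project's actual pattern lists and Gemini calls are not shown in this README.

```javascript
// Illustrative patterns only, built from the example phrases in the list above.
const BENIGN = /\b(earnings|sec filings?)\b/i;
const SUSPICIOUS = /\b(to the moon|diamond hands)\b/i;

// Pre-filtering: label obvious posts locally; everything else goes to the model.
function preFilter(posts) {
  const result = { benign: [], suspicious: [], needsModel: [] };
  for (const post of posts) {
    if (SUSPICIOUS.test(post.text)) result.suspicious.push(post);
    else if (BENIGN.test(post.text)) result.benign.push(post);
    else result.needsModel.push(post);
  }
  return result;
}

// Batching: split the remaining posts into groups of 5 for the model.
function toBatches(items, size = 5) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Similarity matching: cosine similarity between two equal-length feature vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Cosine similarity compares the direction of two vectors and ignores their magnitude, so a stock whose features are proportionally similar to a known pump event scores close to 1 even if the absolute values differ.
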
This project was built for Hack Brooklyn and is intended for educational and research purposes. Market data is sourced from public APIs; this tool does not constitute financial advice.
Built by the Sussy Scanner team @ Hack Brooklyn

