Read subscribed channels -> apply a Markdown profile -> ship a self-contained HTML report.
Built for job leads, airdrop watchlists, market/news tracking, and any Telegram workflow where the problem is too many channels and too little signal.
Chinese Docs · Demo · Quick Start · Report Output · Roadmap · Safety
| Profile-driven | Cutoff-aware | Report-ready |
|---|---|---|
| Plain Markdown profiles define what counts as a match, reject, or follow-up. | Telethon reads through MTProto and stops as soon as messages fall outside your time window. | Generate a single HTML file with semantic labels, source links, raw context, and stats. |
49-second walkthrough preview (demo-optimized.mp4). Source MP4: docs/demo.mp4.
- Python 3.12+
- Telegram account (phone number)
- Telegram API credentials (api_id + api_hash from my.telegram.org/apps)
git clone https://github.com/Sapientropic/tg-channel-scanner.git
cd tg-channel-scanner
chmod +x setup.sh tgcs scripts/scan.sh
./setup.sh

# 0. Try the offline demo first (no Telegram login or LLM key required)
./tgcs demo
# Writes output/demo-report.html
# 1. Edit config with your Telegram API credentials
# (setup.sh created it at ~/.config/tgcli/config.toml)
nano ~/.config/tgcli/config.toml
# 2. Check first-run prerequisites for the developer-opportunity starter
# setup.sh already initialized local .tgcs defaults with --starter jobs
./tgcs quickstart jobs
./tgcs doctor --profile jobs
# 3. Complete Telegram login once
./tgcs login
# 4. Run the jobs-fast monitor once without sending alerts
./tgcs monitor run --profile-id jobs-fast --delivery-mode dry-run

On Windows, use tgcs.bat instead of ./tgcs. The human facade defaults to the
local .tgcs/config.toml profile, .tgcs/sources.json, the output/ directory, HTML
output, and v0.4 local decision memory at .tgcs/state. setup.* initializes
the jobs starter by default; use tgcs run --no-state when you need a stateless
daily-report run.
The v0.5-alpha monitor keeps the CLI-first workflow and adds repeated-run state, alert events, and a local review inbox:
# Write .tgcs/profiles.toml if you want an editable monitor config
./tgcs monitor init-config
# For the developer-opportunity lane, initialize directly from channel_lists/jobs.txt
# Existing .tgcs/sources.json files are kept and merged with the jobs topic tag.
./tgcs init --starter jobs
# Show the one current next action for the jobs starter
./tgcs quickstart jobs
# Run one profile monitor; dry-run delivery is the safe default
./tgcs monitor run --profile-id market-news --delivery-mode dry-run
# Fast developer opportunity alerts: run this from Task Scheduler/cron every 15 minutes
./tgcs monitor run --profile-id jobs-fast --delivery-mode live
# Import real opportunity channels into the jobs-fast lane
./tgcs sources import channel_lists/jobs.txt --topic jobs
# Review or export only one profile lane's sources
./tgcs sources list --topic jobs
./tgcs sources export --topic jobs --output output/jobs-sources.txt
# Print a scheduler command without installing it
./tgcs schedule print --profile-id jobs-fast --interval-minutes 15 --delivery-mode live
# Serve the optional localhost dashboard; first launch auto-builds dashboard/dist
./tgcs dashboard
# Export reviewed dashboard decisions back into reusable report feedback
./tgcs feedback export

Monitor runs write artifacts under output/runs/<run_id>/, update a
run_manifest_v1, and store dashboard state in .tgcs/tgcs.db. High-priority
new or changed items become alert candidates and pending review cards. Telegram
Bot delivery reads TGCS_TELEGRAM_BOT_TOKEN from the environment and never
stores the token in SQLite, manifests, or docs.
The built-in jobs-fast monitor keeps developer opportunity alerts separate from the daily audit
report. It scans a 2-hour catch-up window but only interrupts for high-priority
new or changed roles, contracts, freelance gigs, or Mini Apps/TON projects whose
source message is within the last 60 minutes. The high-frequency path first
applies a local keyword prefilter, so runs with no opportunity-signal keywords
skip the report/LLM stage entirely. The dashboard can switch each profile
between work-hours alerts, all-day alerts, and muted delivery. Its Yield History
and Source Actions panels help you review which job channels produce fresh
messages, which sources yield high-value leads, which need more observation,
and which noisy sources are prune candidates.
Import real opportunity sources with ./tgcs sources import <channel-list> --topic jobs;
the import also adds the topic to existing matching sources, so jobs-fast
will keep using a topic-filtered registry instead of silently falling back to
placeholder sources.
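The keyword prefilter idea can be sketched in a few lines (the keyword set and function name here are illustrative assumptions, not the project's actual list):

```python
# Naive substring prefilter: if no message mentions an opportunity-signal
# keyword, the run can skip the report/LLM stage entirely.
OPPORTUNITY_KEYWORDS = {"hiring", "remote", "contract", "freelance", "mini app"}


def prefilter(messages: list[str]) -> list[str]:
    """Return only messages that contain at least one signal keyword."""
    return [
        m for m in messages
        if any(kw in m.lower() for kw in OPPORTUNITY_KEYWORDS)
    ]
```

An empty result from the prefilter means there is nothing worth paying LLM latency or tokens for on this run.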
./tgcs doctor also checks whether dashboard assets are already built. Missing
assets are only a warning because ./tgcs dashboard can build them on first
launch when Node/npm is available.
Dashboard keep/skip/false-positive decisions can be exported from Settings or
as note-free tgcs-feedback-v1 JSONL with ./tgcs feedback export, then reused
by the decision-memory path through --feedback-jsonl output/feedback/review-feedback.jsonl.
When the latest run has actionable review cards, the first screen opens directly
on the queue and triage controls. Otherwise the board keeps the latest-run
summary compact: one human task label, one action/all-clear/source-fix headline,
and a scanned -> matched -> cards -> action funnel instead of repeating the full
report in prose.
The Runs tab also opens generated reports through a local-only artifact route
restricted to report Markdown/HTML files under workspace-local runs/
directories. Monitor reports use human-readable names such as
developer-opportunity-signal-report-2026-05-09-1225.html, while the dashboard
displays the profile report title/category instead of raw absolute paths or the
high-frequency lane id. When a run has both Markdown and HTML reports, the
dashboard opens HTML by default for phone-friendly reading; Markdown-only
reports are rendered through the same local route instead of shown as raw text.
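The local-only artifact route described above amounts to a path check. A sketch of that check (the runs root and function name are assumptions for illustration):

```python
from pathlib import Path

# Assumed workspace-local runs directory; the real layout is output/runs/<run_id>/.
RUNS_ROOT = Path("output/runs").resolve()


def is_allowed_artifact(requested: str) -> bool:
    """Serve only report Markdown/HTML files that resolve under runs/."""
    p = (RUNS_ROOT / requested).resolve()
    if p.suffix not in {".md", ".html"}:
        return False
    # Rejects ../ traversal out of the runs directory.
    return p.is_relative_to(RUNS_ROOT)
```

Checking the resolved path, not the raw string, is what closes the traversal hole: `../../etc/x.html` resolves outside the runs root and is refused.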
Dashboard state projects runs down to counts, health, a human task label, and
one report artifact, and projects profiles down to display labels plus
alert/source limits; raw scan artifacts, full profile config, registry paths,
hashes, and error files stay in local manifests for debugging. The Dashboard is
kept ADHD-friendly: top metrics are compact readouts, repository operations stay
in Settings instead of every board, Inbox uses a triage distribution bar, and
Runs use a fixed seven-day health chart plus a capped evidence ledger instead of
an ever-growing row of run cells or repeated report titles.
For the interrupt lane, jobs-fast caps semantic extraction at 20 matched
messages and 2000 output tokens. Keep the daily audit/backfill lane for
exhaustive review over larger windows.
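Those interrupt-lane caps are simple truncation before extraction. A sketch (constant and function names are illustrative; the limits mirror the documented 20/2000 values):

```python
# Documented interrupt-lane limits for jobs-fast.
MAX_MATCHED_MESSAGES = 20
MAX_OUTPUT_TOKENS = 2000


def cap_for_extraction(matched: list[dict]) -> list[dict]:
    """Bound extraction cost on the high-frequency lane by truncating
    the matched-message batch; the daily audit lane stays uncapped."""
    return matched[:MAX_MATCHED_MESSAGES]
```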
./tgcs schedule print only prints a Windows Task Scheduler or cron command for
review; it does not create a system task by itself.
When OPENAI_API_KEY is not configured and DEEPSEEK_API_KEY is present,
semantic extraction defaults to deepseek-v4-flash with thinking disabled and
JSON output requested, even if MiniMax is also configured. MiniMax M2.7 is also
supported through the official OpenAI-compatible endpoint: set
MINIMAX_TOKEN_PLAN_KEY for a Token Plan key or MINIMAX_API_KEY for a
standard platform key. Token Plan keys default to the China-region endpoint
https://api.minimaxi.com/v1; standard platform keys default to
https://api.minimax.io/v1. Set MINIMAX_BASE_URL when your account needs an
explicit endpoint override. Use the local eval to compare provider latency,
JSON reliability, and aggregate token
usage on your history without copying raw Telegram text into the result
artifact. Workspace-local input paths are stored as relative paths; external
input paths are reduced to file names and disambiguated with a short hash only
when duplicate basenames would collide:
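That path reduction might look like the following sketch (function name and hash scheme are assumptions, not the eval script's actual code):

```python
import hashlib
from pathlib import Path


def reduce_input_path(path: Path, workspace: Path, colliding: set[str]) -> str:
    """Workspace-local paths stay relative; external paths shrink to a
    basename, gaining a short hash only when basenames would collide."""
    try:
        return str(path.relative_to(workspace))
    except ValueError:
        name = path.name
        if name in colliding:
            digest = hashlib.sha256(str(path).encode()).hexdigest()[:8]
            return f"{name}#{digest}"
        return name
```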
python scripts/eval_deepseek_cache.py --sample-size 20 --repeat 3 --format json
python scripts/eval_deepseek_cache.py --sample-sizes 10,20,30 \
  --models deepseek-v4-flash,MiniMax-M2.7 --repeat 1 --max-tokens 1000 --format json

The repository also ships a root SKILL.md and a structured agent CLI contract.
The short tgcs command is for humans. Agents should prefer the explicit JSON
contract and the private source registry at .tgcs/sources.json:
python scripts/source_registry.py import-list channel_lists/example.txt \
--source-registry .tgcs/sources.json --format json
python scripts/source_registry.py import-list channel_lists/jobs.txt \
--source-registry .tgcs/sources.json --topic jobs --format json
python scripts/doctor.py --source-registry .tgcs/sources.json \
--profile profiles/templates/market-news.md --output-dir output --format json
python scripts/scan.py --source-registry .tgcs/sources.json --hours 24 \
--output output/scan.jsonl --format json
python scripts/report.py --input output/scan.jsonl \
--profile profiles/templates/market-news.md \
--output output/report.md --html-output output/report.html \
--source-registry .tgcs/sources.json --format json
# Optional v0.4 decision memory and feedback import
python scripts/report.py --input output/scan.jsonl \
--profile profiles/templates/market-news.md \
--items-json output/extracted-items.json \
--output output/report.md --html-output output/report.html \
--source-registry .tgcs/sources.json \
--state-dir .tgcs/state \
--feedback-jsonl output/report-feedback.jsonl \
--format json
# Optional v0.5-alpha monitor state, manifest, inbox, and alert events
python scripts/monitor.py run --profile-id market-news \
--delivery-mode dry-run --format json
python scripts/monitor.py feedback-export \
  --db .tgcs/tgcs.db --output output/feedback/review-feedback.jsonl --format json

If no LLM provider key exists, report.py --extractor auto returns
agent_extraction_required; the agent can read the local extraction request,
write semantic_items_v1, then rerun report.py with --items-json.
Passing --state-dir .tgcs/state turns on local decision intelligence:
items are marked as new, seen, changed, recurring, or expired across runs.
The state file stores only item keys, source refs, counters, fingerprints,
rating history, and feedback counts. It does not store raw Telegram message
text or feedback note bodies.
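The cross-run item lifecycle can be sketched with a fingerprint map (function names and fingerprint fields are illustrative assumptions; recurring/expired additionally need run counters, omitted here):

```python
import hashlib


def fingerprint(item: dict) -> str:
    """Stable fingerprint over normalized item fields (fields assumed)."""
    basis = f"{item.get('title', '')}|{item.get('company', '')}"
    return hashlib.sha256(basis.lower().encode()).hexdigest()[:16]


def classify(item: dict, state: dict) -> str:
    """Label an item across runs: new, changed, or seen."""
    key, fp = item["key"], fingerprint(item)
    prev = state.get(key)
    state[key] = fp
    if prev is None:
        return "new"
    return "changed" if prev != fp else "seen"
```

Note the state holds only keys and fingerprints, matching the no-raw-text guarantee above.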
# Past 24 hours (default)
./scripts/scan.sh channel_lists/example.txt
# Past 7 days
./scripts/scan.sh channel_lists/example.txt 168
# From a precise ISO-8601 cutoff
./scripts/scan.sh channel_lists/example.txt --since 2026-05-06T07:30:00Z

The scanner uses Telethon (MTProto) with iter_messages and early termination: it stops as soon as it hits a message older than your cutoff. No over-fetching.
Environment variables
SCAN_INITIAL_LIMIT=200 # initial read limit per channel
SCAN_MAX_LIMIT=5000 # hard cap before reporting incomplete
SCAN_DELAY=1 # seconds between channels
SCAN_MAX_FLOOD_WAIT_SECONDS=300
TG_SCANNER_CONFIG_DIR=~/.config/tgcli

python scripts/export_folder.py --list
python scripts/export_folder.py --folder "Jobs" --output channel_lists/jobs.txt

# Human default: market-news + HTML + .tgcs/state
./tgcs run
# Human alternate profile
./tgcs run --profile jobs --hours 72
# Markdown + HTML report
python scripts/daily_report.py channel_lists/example.txt \
--profile profiles/example.md --html
# Custom LLM endpoint (DeepSeek, Ollama, etc.)
# If only DEEPSEEK_API_KEY is set, these DeepSeek defaults are selected automatically.
python scripts/report.py --input output/scan_XXXX.jsonl \
--profile profiles/example.md \
--base-url https://api.deepseek.com/v1 --model deepseek-chat
# Redact contact info before sending to LLM
python scripts/report.py --input output/scan_XXXX.jsonl \
--profile profiles/example.md --redact-contact-info
# Preview prompt without calling LLM
python scripts/report.py --input output/scan_XXXX.jsonl \
  --profile profiles/example.md --dry-run-prompt output/prompt-preview.md

The generated report is designed to be read as a decision surface: what matters, why it matched, where it came from, and whether it deserves action.
Screenshot: retro-pixel masthead, scan metadata, and dashboard counters.
Screenshot: ranked cards with action labels, rationale, source chips, and raw message access.
The HTML report is a single portable file with inline CSS, JS, and icon assets: premium retro-pixel styling, light/dark themes, dashboard counters, scroll-parallax cards, expandable raw messages, and Telegram deep links. Web fonts are an optional enhancement; system fallbacks keep the report readable offline.
Scheduling examples
# cron: every day at 09:00
0 9 * * * cd /path/to/tg-channel-scanner && .venv/bin/python scripts/daily_report.py channel_lists/example.txt --profile profiles/example.md

REM Windows Task Scheduler
cmd /c "cd /d C:\path\to\tg-channel-scanner && .venv\Scripts\python.exe scripts\daily_report.py channel_lists\example.txt --profile profiles\example.md"

Free-form AI summary & Media OCR

Free-form summary (no fixed layout, just a digest):

python scripts/summarize.py --input output/scan_XXXX.jsonl --profile profiles/example.md

Media OCR/STT (off by default):
# xAI vision
export XAI_API_KEY=your-key
./scripts/scan.sh channel_lists/example.txt --ocr --ocr-provider xai
# OpenAI vision
export OPENAI_API_KEY=sk-your-key
./scripts/scan.sh channel_lists/example.txt --ocr --ocr-provider openai
# Custom endpoint
./scripts/scan.sh channel_lists/example.txt --ocr --ocr-provider custom \
  --ocr-base-url http://localhost:11434/v1 --ocr-model your-vision-model

Video OCR is thumbnail-first by default, including standalone reprocessing with
python scripts/ocr_media.py. Use --ocr-full-video during scans, or
--full-video with ocr_media.py, only when you explicitly want full-video
processing. Full-video mode requires ffmpeg and can send extracted frames,
audio, or transcripts to the selected OCR/STT provider, so review privacy and
cost before enabling it.
graph LR
  A["📱 Telegram<br>Channels"] -->|MTProto| B["🔍 Scanner<br>scan.py"]
  B -->|"JSONL + meta"| C["🤖 LLM or Agent<br>Semantic Extraction"]
  C -->|"Structured JSON"| D["📊 Report<br>report.py"]
  D --> E["📝 Markdown"]
  D --> F["🎨 HTML Report"]
style A fill:#26A5E4,color:#fff
style B fill:#3776AB,color:#fff
style C fill:#14B8A6,color:#fff
style D fill:#22C55E,color:#fff
style E fill:#64748B,color:#fff
style F fill:#F59E0B,color:#fff
- Read → Telethon reads messages from your subscribed channels
- Filter → Precise timestamp cutoff with early termination
- Save → JSONL + .meta.json sidecar
- Report → LLM or agent semantic extraction -> Python renders stats + Markdown/HTML
Data contract: each scanned message carries a stable message_ref (channel + id).
Reports ask the LLM for source_message_refs and use that channel-scoped key for raw
message lookup; source_message_ids is kept only for older JSONL/report compatibility.
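A sketch of that channel-scoped key (the `channel:id` string format shown here is an illustrative assumption, not necessarily the exact serialization):

```python
def message_ref(channel: str, message_id: int) -> str:
    """Channel-scoped key, e.g. 'dev_jobs_remote:1042'."""
    return f"{channel}:{message_id}"


def build_lookup(scanned: list[dict]) -> dict[str, dict]:
    """Index scanned messages so source_message_refs resolve without
    collisions: the same numeric id can occur in two different channels."""
    return {message_ref(m["channel"], m["id"]): m for m in scanned}
```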
The daily pipeline passes an explicit scan --output path into report.py, so a report
cannot silently reuse an older scan_*.jsonl from the output directory.
If no LLM key is configured, the same report flow can hand semantic extraction to the
calling agent through the local agent_extraction_request_v1 / semantic_items_v1
contract documented in docs/agent-cli-contract.md.
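In rough shape, the handoff looks like the sketch below; the real envelope fields are defined in docs/agent-cli-contract.md, and the names here are illustrative assumptions:

```python
def needs_agent_extraction(envelope: dict) -> bool:
    """True when report.py asks the calling agent to do extraction itself."""
    return envelope.get("status") == "agent_extraction_required"


def make_items_payload(extracted: list[dict]) -> dict:
    """Wrap agent-extracted items as a semantic_items_v1 document,
    ready to be written to disk and passed back via --items-json."""
    return {"schema": "semantic_items_v1", "items": extracted}
```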
Start from a built-in template, or copy profiles/example.md for the legacy job-focused sample:
cp profiles/templates/jobs.md profiles/my-profile.md
cp profiles/templates/airdrops.md profiles/my-airdrops.md
cp profiles/templates/market-news.md profiles/my-market-news.md

Available templates: jobs, airdrops, market/news, research leads, and competitor monitoring.
Edit the copied profile:
## Candidate
- Role: Frontend Developer
- Stack: React, TypeScript, Next.js
- Level: Middle/Senior
- Location: Remote preferred
## Filter Rules
- Only include jobs from last 24 hours
- Remove duplicates (same company + title)
- Exclude: Backend-only, Mobile, DevOps...

Custom modes (airdrops, news, events) add ## Extraction Schema, ## Extraction Prompt, and ## Report Labels sections. See profiles/example-airdrop.md.
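Structurally, a profile is just Markdown split on its `##` headings. A sketch of such a parser (this is not the project's actual implementation):

```python
def parse_profile_sections(markdown: str) -> dict[str, str]:
    """Split a profile into its '## ...' sections; text before the first
    heading is ignored in this simplified sketch."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections
```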
Create a .txt in channel_lists/ with Telegram usernames (not display names), one per line:
remote_italic
dev_jobs_remote
react_jobs
Find a channel's username: open it in Telegram → tap the channel name → look for @username.
Or export directly from Telegram: python scripts/export_folder.py --folder "Jobs" --output channel_lists/jobs.txt
For agent-maintained source operations, prefer a private source registry over
editing channel lists in place. .tgcs/ is gitignored by default, so real
source notes and priorities stay local:
python scripts/source_registry.py import-list channel_lists/example.txt \
--source-registry .tgcs/sources.json --format json --dry-run
python scripts/source_registry.py import-list channel_lists/example.txt \
--source-registry .tgcs/sources.json --format json
python scripts/source_registry.py list \
  --source-registry .tgcs/sources.json --format json

Legacy channel_lists/*.txt commands remain supported. See
docs/source-registry.example.json for the schema shape.
tg-channel-scanner/
├── SKILL.md                     # Agent-facing operating guide
├── agents/openai.yaml           # Skill metadata for agent installers
├── tgcs / tgcs.bat              # Human-friendly command facade
├── config.example.toml          # Template (actual config at ~/.config/tgcli/)
├── requirements.txt             # telethon
├── requirements-llm.txt         # optional summarizer deps
├── setup.sh / setup.bat         # One-command installer
├── dashboard/                   # Optional Vite React localhost dashboard
├── profiles/                    # Filter profiles
│   └── templates/               # Built-in starter profiles
├── channel_lists/               # Channel name lists
├── scripts/
│   ├── agent_cli.py             # JSON envelope and exit-code helpers
│   ├── tgcs.py                  # Human-friendly command facade implementation
│   ├── scan.py                  # Scanner core (Telethon)
│   ├── source_registry.py       # Source registry import/list/export/validate
│   ├── export_folder.py         # Export from Telegram folders
│   ├── report.py                # Report generator (Markdown + HTML)
│   ├── report_diagnostics.py    # Empty-result and scan-health diagnostics
│   ├── doctor.py                # First-run environment checks
│   ├── daily_report.py          # Scan + report pipeline
│   ├── monitor.py               # v0.5-alpha profile monitor runner
│   ├── monitor_state.py         # SQLite state for inbox/alerts/profile diffs
│   ├── delivery.py              # Delivery adapters
│   ├── dashboard_server.py      # Localhost dashboard API/static server
│   └── summarize.py             # Free-form LLM summary
├── templates/
│   ├── report-job.html          # Job report HTML shell
│   ├── report-generic.html      # Custom mode HTML shell
│   ├── report-shared.css        # Shared inline report styling
│   └── report-theme.js          # Shared inline theme/motion behavior
├── output/                      # gitignored
└── docs/
    ├── agent-cli-contract.md    # Agent JSON contract and fallback schemas
    ├── demo.mp4                 # Full product demo video, kept under 10 MB for GitHub uploads
    ├── demo/                    # HyperFrames demo source and maintenance notes
    ├── licensing.md             # AGPL + commercial licensing policy
    ├── report-design-context.md # Report UI design constraints
    └── screenshots/             # Report screenshots
- Reads only from channels you've subscribed to
- Respects FloodWaitError (no API abuse)
- Use your real account, not a new/virtual number
- Do not use Telegram data for AI training, resale, or bulk harvesting

See docs/tos-risk-analysis.md for details.
| Problem | Fix |
|---|---|
| ModuleNotFoundError: telethon | source .venv/bin/activate |
| .sh scripts: Permission denied | chmod +x setup.sh scripts/scan.sh |
| my.telegram.org shows ERROR | See docs/getting-api-credentials.md |
| 0 messages collected | Check output/*.errors.log |
| Session expired | Run ./tgcs login again, or delete ~/.config/tgcli/session and rerun |
TG Channel Scanner is dual-licensed:
- Community License: AGPL-3.0-only
- Commercial License: available separately from Sapientropic

See docs/licensing.md for community, commercial, hosted service, and contribution rules.


