feat: exclude specific topics in forum supergroups#118
Conversation
Adds support for excluding individual topics inside forum-enabled supergroups without excluding the entire chat. Configured via SKIP_TOPIC_IDS env var with chat_id:topic_id format. Filtering applied in scheduled backup, gap-fill, forum topic metadata, and real-time listener (new messages + edits). Deletion handler intentionally excluded since MessageDeleted events lack topic info. Closes #117
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 41 minutes and 51 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (10)
📝 WalkthroughWalkthroughAdded forum-topic-level filtering to exclude specific topics within chats during backup. Users configure Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
🐳 Dev images published!
The dev/test instance will pick up these changes automatically (Portainer GitOps). To test locally: docker pull drumsergio/telegram-archive:dev
docker pull drumsergio/telegram-archive-viewer:dev |
- Remove redundant `if topic_id and` guard in backup loops for consistency with listener (should_skip_topic already handles None) - Move new_messages_received stat after topic filter to avoid inflating counts for skipped messages - Fix 4 pre-existing checkpoint tests broken by MagicMock truthiness (mock config.should_skip_topic returned truthy MagicMock) - Add 8 edge-case tests for config parsing (duplicates, trailing commas, extra colons, large IDs, zero topic_id, whitespace-only) - Add 3 integration tests for topic filtering in _backup_dialog - Add CHANGELOG entry under [Unreleased] - Add topic filtering section to README Chat Filtering docs - Document that topic-creating service messages may pass through filter
|
🐳 Dev images published!
The dev/test instance will pick up these changes automatically (Portainer GitOps). To test locally: docker pull drumsergio/telegram-archive:dev
docker pull drumsergio/telegram-archive-viewer:dev |
Consolidates 5 duplicated topic extraction blocks into a single extract_topic_id() function in message_utils.py. Both telegram_backup.py and listener.py now import and call this shared utility. Adds 5 unit tests for extract_topic_id covering all branches.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
🐳 Dev images published!
The dev/test instance will pick up these changes automatically (Portainer GitOps). To test locally: docker pull drumsergio/telegram-archive:dev
docker pull drumsergio/telegram-archive-viewer:dev |
1 similar comment
|
🐳 Dev images published!
The dev/test instance will pick up these changes automatically (Portainer GitOps). To test locally: docker pull drumsergio/telegram-archive:dev
docker pull drumsergio/telegram-archive-viewer:dev |
|
🐳 Dev images published!
The dev/test instance will pick up these changes automatically (Portainer GitOps). To test locally: docker pull drumsergio/telegram-archive:dev
docker pull drumsergio/telegram-archive-viewer:dev |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/telegram_backup.py (1)
663-696:⚠️ Potential issue | 🟠 MajorAdvance the sync cursor even for skipped topics.
Right now skipped messages never update
running_max_id; if the newest messages are all in excluded topics, every scheduled backup will refetch and re-skip the same range.Proposed fix
async for message in self.client.iter_messages(entity, min_id=last_message_id, reverse=True): + running_max_id = max(running_max_id, message.id) + # Skip messages belonging to excluded forum topics if self.config.should_skip_topic(chat_id, extract_topic_id(message)): continue msg_data = await self._process_message(message, chat_id) batch_data.append(msg_data) - running_max_id = max(running_max_id, message.id) @@ - if uncheckpointed_count > 0: + if uncheckpointed_count > 0 or running_max_id > last_message_id: await self.db.update_sync_status(chat_id, running_max_id, uncheckpointed_count)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/telegram_backup.py` around lines 663 - 696, Skipped messages currently bypass updating the sync cursor so running_max_id never advances; inside the loop over iter_messages (in the async for message ... block) update running_max_id for every message (use running_max_id = max(running_max_id, message.id)) before continuing when self.config.should_skip_topic(chat_id, extract_topic_id(message)) is true so that skipped forum-topic messages still advance the cursor while still not being added to batch_data or processed by _process_message/_commit_batch.tests/test_telegram_backup.py (1)
419-422:⚠️ Potential issue | 🟡 MinorFix pipeline lint failure on the
# noqadirective.CI flags
# noqa: unreachableas invalid — the code after a comma must be a lint rule code (e.g.,F841), not a free-form word. Since the intent is just to make the function an async generator, drop the code or use a comment instead.Proposed fix
- return - yield # noqa: unreachable - makes this an async generator + if False: + yield # pragma: no cover - makes this an async generator + returnOr simply:
- return - yield # noqa: unreachable - makes this an async generator + return + yield # noqa (makes this an async generator)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_telegram_backup.py` around lines 419 - 422, The async helper fake_iter currently uses "return" followed by a dummy "yield" with an invalid "# noqa: unreachable" comment; change fake_iter to be a real async generator without the bogus noqa by removing the early return and using a harmless conditional yield (e.g., "if False: yield") or placing the "yield" before returning so the function is an async generator; update the function fake_iter in tests/test_telegram_backup.py accordingly and drop the invalid "# noqa: unreachable" directive.
🧹 Nitpick comments (1)
tests/test_config.py (1)
518-519: Prefer pytest-style tests for new coverage.This new suite extends
unittest.TestCase; use pytest functions/fixtures for new test additions.As per coding guidelines,
**/*test*.py: Use pytest as the testing framework with 80% coverage target.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_config.py` around lines 518 - 519, The new test suite uses unittest.TestCase (TestSkipTopicIds) but our project requires pytest-style tests; convert the class-based TestSkipTopicIds into one or more pytest functions using pytest fixtures instead of class setup/teardown and remove the unittest.TestCase inheritance and any self.* usage; replace methods like setUp/tearDown with fixtures (or module/class-level fixtures) and rename test methods to plain functions prefixed with test_ so pytest discovers them (e.g., test_skip_topic_ids_behavior), ensuring assertions use plain assert and fixtures are referenced as function args.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker-compose.yml`:
- Line 34: Uncomment or add the SKIP_TOPIC_IDS environment mapping in
docker-compose.yml so the container receives the .env value by default;
specifically ensure the service's environment includes the line setting
SKIP_TOPIC_IDS to ${SKIP_TOPIC_IDS:-} (or ${SKIP_TOPIC_IDS} if you prefer no
default) so values from .env are propagated into the container at startup.
In `@src/config.py`:
- Around line 371-377: The per-chat debug lines leak chat/topic identifiers (cid
and topics) via logger.debug; remove or replace the loop that logs f" Chat
{cid}: skipping topics {topics}" in the skip_topic_ids handling. Instead keep
only the aggregated logger.info summary (using total_topics and
len(self.skip_topic_ids)) or log non-identifying metrics (e.g., number of chats
affected and distribution counts) and ensure references to skip_topic_ids remain
but never emit cid or topic IDs.
In `@src/listener.py`:
- Around line 800-817: The metric semantics are inconsistent: when
self.config.listen_new_messages is False the code increments
stats["new_messages_received"] before filtering by should_skip_topic, but when
True it increments after the topic skip check; to unify semantics (count only
messages that pass topic filtering) move the early-return branch for
listen_new_messages (including the stats increment and return) to after
reply_to_top_id is computed and after the should_skip_topic check so that
extract_topic_id(message), self.config.should_skip_topic(chat_id,
reply_to_top_id) and the skip-return logic run first, then increment
stats["new_messages_received"] and return when listen_new_messages is False;
update or add a brief comment near self.config.listen_new_messages,
extract_topic_id, should_skip_topic and stats["new_messages_received"]
explaining the chosen semantics.
In `@src/message_utils.py`:
- Around line 4-18: The function extract_topic_id is missing a type annotation
for its message parameter; add a type hint to the signature (e.g., change def
extract_topic_id(message) -> int | None: to def extract_topic_id(message: Any)
-> int | None:) and import Any from typing at the top of the module, or replace
Any with the concrete message type used in the project (e.g., Telegram Message
class) if available; keep the existing logic and return type unchanged and
update any imports accordingly.
In `@src/telegram_backup.py`:
- Around line 749-752: The loop over messages using self.client.iter_messages
skips messages in excluded forum topics (config.should_skip_topic +
extract_topic_id) but does not record that skipped ranges are intentionally
ignored, so detect_message_gaps() will repeatedly rediscover them; fix by, when
you hit a skipped topic inside that iter_messages loop, explicitly mark the gap
as ignored (e.g. call whatever gap-ignore API your codebase uses—insert an
ignored placeholder row or call self.mark_gap_ignored / self.db.mark_gap_ignored
with chat_id, gap_start, gap_end and topic_id) so detect_message_gaps() will not
report that range again; ensure the marking uses the same identifiers
detect_message_gaps expects.
- Around line 1665-1667: The debug log in the skip logic currently includes
user-controlled topic titles; update the logger call inside the block that
checks should_skip_topic(chat_id, topic.id) to remove any PII by only logging
the topic identifier (topic.id) and not topic.title—i.e., change the
logger.debug that references topic.title to a message that mentions only
topic.id (and any static context text) in the method where the loop performs
should_skip_topic checks.
In `@tests/test_config.py`:
- Around line 585-598: The nested context managers in
test_skip_topic_ids_invalid_format_non_integer (the two nested with statements
that use patch.dict(os.environ, env_vars, clear=True) and
self.assertRaises(ValueError)) should be collapsed into a single with statement
so Ruff SIM117 is not triggered; replace the nested withs around the Config()
instantiation with a single combined context manager that includes both
patch.dict(...) and self.assertRaises(ValueError) so the test still asserts the
exception while satisfying the linter.
---
Outside diff comments:
In `@src/telegram_backup.py`:
- Around line 663-696: Skipped messages currently bypass updating the sync
cursor so running_max_id never advances; inside the loop over iter_messages (in
the async for message ... block) update running_max_id for every message (use
running_max_id = max(running_max_id, message.id)) before continuing when
self.config.should_skip_topic(chat_id, extract_topic_id(message)) is true so
that skipped forum-topic messages still advance the cursor while still not being
added to batch_data or processed by _process_message/_commit_batch.
In `@tests/test_telegram_backup.py`:
- Around line 419-422: The async helper fake_iter currently uses "return"
followed by a dummy "yield" with an invalid "# noqa: unreachable" comment;
change fake_iter to be a real async generator without the bogus noqa by removing
the early return and using a harmless conditional yield (e.g., "if False:
yield") or placing the "yield" before returning so the function is an async
generator; update the function fake_iter in tests/test_telegram_backup.py
accordingly and drop the invalid "# noqa: unreachable" directive.
---
Nitpick comments:
In `@tests/test_config.py`:
- Around line 518-519: The new test suite uses unittest.TestCase
(TestSkipTopicIds) but our project requires pytest-style tests; convert the
class-based TestSkipTopicIds into one or more pytest functions using pytest
fixtures instead of class setup/teardown and remove the unittest.TestCase
inheritance and any self.* usage; replace methods like setUp/tearDown with
fixtures (or module/class-level fixtures) and rename test methods to plain
functions prefixed with test_ so pytest discovers them (e.g.,
test_skip_topic_ids_behavior), ensuring assertions use plain assert and fixtures
are referenced as function args.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0727a16c-c604-4ad7-b274-b2d424a93d84
📒 Files selected for processing (10)
.env.exampleREADME.mddocker-compose.ymldocs/CHANGELOG.mdsrc/config.pysrc/listener.pysrc/message_utils.pysrc/telegram_backup.pytests/test_config.pytests/test_telegram_backup.py
| async for message in self.client.iter_messages(entity, min_id=gap_start, max_id=gap_end, reverse=True): | ||
| # Skip messages belonging to excluded forum topics | ||
| if self.config.should_skip_topic(chat_id, extract_topic_id(message)): | ||
| continue |
There was a problem hiding this comment.
Avoid rediscovering intentionally skipped gaps.
Skipped topic messages are not inserted or marked ignored, so detect_message_gaps() can keep finding the same excluded-topic gaps on every gap-fill run.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/telegram_backup.py` around lines 749 - 752, The loop over messages using
self.client.iter_messages skips messages in excluded forum topics
(config.should_skip_topic + extract_topic_id) but does not record that skipped
ranges are intentionally ignored, so detect_message_gaps() will repeatedly
rediscover them; fix by, when you hit a skipped topic inside that iter_messages
loop, explicitly mark the gap as ignored (e.g. call whatever gap-ignore API your
codebase uses—insert an ignored placeholder row or call self.mark_gap_ignored /
self.db.mark_gap_ignored with chat_id, gap_start, gap_end and topic_id) so
detect_message_gaps() will not report that range again; ensure the marking uses
the same identifiers detect_message_gaps expects.
- Advance sync cursor for skipped topic messages (prevents re-fetching) - Uncomment SKIP_TOPIC_IDS in docker-compose.yml - Remove PII from logs (chat/topic IDs, topic titles) - Unify listener stats semantics across listen_new_messages branches - Add type hint to extract_topic_id - Fix Ruff SIM117 nested with blocks - Fix invalid noqa directive in test helper
|
🐳 Dev images published!
The dev/test instance will pick up these changes automatically (Portainer GitOps). To test locally: docker pull drumsergio/telegram-archive:dev
docker pull drumsergio/telegram-archive-viewer:dev |
- Fix ruff format issues in config.py and test_telegram_backup.py - Fix gap_fill test failures (MagicMock truthiness with topic filtering) - Add 28 config tests (81% → 96%) - Add 37 listener tests (17% → 43%) - Add 71 telegram_backup tests (29% → 49%) - Overall coverage: 18% → 33%
|
🐳 Dev images published!
The dev/test instance will pick up these changes automatically (Portainer GitOps). To test locally: docker pull drumsergio/telegram-archive:dev
docker pull drumsergio/telegram-archive-viewer:dev |
- Fix license: MIT → GPL-3.0 (matches actual LICENSE file) - Add module structure and backup/listener flow documentation - Add forum topic filtering architecture notes - Add PII logging rules - Add CI pipeline details (ruff check + format, pytest + codecov) - Add CodeRabbit rate limit handling guide - Add MagicMock truthiness pitfall and test patterns - Move testing section with updated frameworks and patterns
|
🐳 Dev images published!
The dev/test instance will pick up these changes automatically (Portainer GitOps). To test locally: docker pull drumsergio/telegram-archive:dev
docker pull drumsergio/telegram-archive-viewer:dev |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #118 +/- ##
==========================================
+ Coverage 34.97% 41.90% +6.93%
==========================================
Files 20 21 +1
Lines 5799 5847 +48
==========================================
+ Hits 2028 2450 +422
+ Misses 3771 3397 -374 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
🐳 Dev images published!
The dev/test instance will pick up these changes automatically (Portainer GitOps). To test locally: docker pull drumsergio/telegram-archive:dev
docker pull drumsergio/telegram-archive-viewer:dev |
Summary
SKIP_TOPIC_IDSenv var to exclude specific forum topics from backup without excluding the entire chatchat_id:topic_id,chat_id:topic_id,...(e.g.-1001234567890:42,-1001234567890:1337)MessageDeletedevents lack topic infoChanges
src/config.py: New_parse_topic_skip_list()parser,skip_topic_idsproperty,should_skip_topic()method, and startup loggingsrc/telegram_backup.py: Topic-level filtering in_backup_dialog,_fill_gap_range, and_backup_forum_topics(both API and fallback paths)src/listener.py: Topic-level filtering inon_new_messageandon_message_editedhandlers, plus startup loggingtests/test_config.py: 12 new tests covering parsing, validation, andshould_skip_topic()behaviorREADME.md,docker-compose.yml,.env.example: Documentation for the new env varTest plan
pytest tests/test_config.py -v)Closes #117
Summary by CodeRabbit
Release Notes
New Features
SKIP_TOPIC_IDSenvironment variable (format:chat_id:topic_id,...). Applies to both scheduled backups and real-time updates.Documentation