fix: resolve dangling dedup symlink infinite redownload loop#119
fix: resolve dangling dedup symlink infinite redownload loop#119
Conversation
Use os.path.lexists() instead of os.path.exists() to detect dangling symlinks during VERIFY_MEDIA cleanup. Capture Telethon download_media() return value to use actual on-disk filename for symlink targets. Remove stale symlinks before recreation to prevent Errno 17 (file exists). Applies to both scheduled backup and real-time listener flows. Closes #115
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 50 minutes and 23 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughFixes dangling symlink handling in media deduplication and verification workflows by detecting broken links with Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
🐳 Dev images published!
The dev/test instance will pick up these changes automatically (Portainer GitOps). To test locally: docker pull drumsergio/telegram-archive:dev
docker pull drumsergio/telegram-archive-viewer:dev |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #119 +/- ##
==========================================
+ Coverage 41.90% 44.56% +2.66%
==========================================
Files 21 21
Lines 5847 5865 +18
==========================================
+ Hits 2450 2614 +164
+ Misses 3397 3251 -146 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Cover all changed lines in listener.py _download_media (dangling symlink pre-unlink, copy2 fallback, download return capture for both dedup and non-dedup paths) and telegram_backup.py _process_media (non-dedup return capture, dedup symlink-fail fallback return capture).
|
🐳 Dev images published!
The dev/test instance will pick up these changes automatically (Portainer GitOps). To test locally: docker pull drumsergio/telegram-archive:dev
docker pull drumsergio/telegram-archive-viewer:dev |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/telegram_backup.py (1)
1349-1362:⚠️ Potential issue | 🟠 MajorPre-unlink stale chat symlinks in the shared-file branch too.
This branch still calls
os.symlink()while a danglingfile_pathmay already exist, so scheduled backups can still hitEEXIST. Mirror the listener/first-download path and remove the stale link before recreating it.🐛 Proposed fix
if os.path.exists(shared_file_path): # File exists in shared - create symlink try: # Use relative symlink for portability rel_path = os.path.relpath(shared_file_path, chat_media_dir) + if os.path.lexists(file_path): + os.unlink(file_path) os.symlink(rel_path, file_path) logger.debug(f"Created symlink for deduplicated media: {file_name}") except OSError as e: - # Symlink failed (e.g., Windows), copy reference instead - logger.warning(f"Symlink failed, downloading copy: {e}") - actual_path = await self.client.download_media(message, file_path) - if actual_path and isinstance(actual_path, str): - file_path = actual_path + # Symlink not supported (e.g., Windows), copy shared file instead + logger.warning(f"Symlink not supported, using direct path: {e}") + import shutil + + shutil.copy2(shared_file_path, file_path)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/telegram_backup.py` around lines 1349 - 1362, The shared-file branch should pre-unlink any stale/dangling symlink at file_path before calling os.symlink to avoid EEXIST; update the block handling shared_file_path so that if os.path.exists(file_path) is True and os.path.islink(file_path) and not os.path.exists(os.readlink(file_path) if os.path.islink(file_path) else ""), or more simply if os.path.islink(file_path) and not os.path.exists(file_path), call os.unlink(file_path) (mirroring the listener/first-download path) prior to creating the relative symlink with os.symlink(rel_path, file_path); keep the existing exception handling that falls back to self.client.download_media(message, file_path) and the same logger calls (logger.debug/logger.warning) and references to file_name.
🧹 Nitpick comments (1)
tests/test_symlink_dedup.py (1)
145-155: Setreply_to = Noneon MagicMock messages.Add this to the message helpers/ad-hoc message mocks to avoid accidental topic-filtering behavior as these tests evolve. As per coding guidelines, “Always set
msg.reply_to = Noneon MagicMock message objects to prevent false-positive topic filtering in tests”.Also applies to: 194-204, 520-528, 716-725, 779-788
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_symlink_dedup.py` around lines 145 - 155, The MagicMock message instances (e.g., the local variable msg used in tests/test_symlink_dedup.py and its helper mocks) must explicitly set msg.reply_to = None to avoid accidental topic-filtering; update each mock block where msg is created (the block with msg.id, msg.date, msg.media and related return-value mocks and any similar blocks at the other noted locations) to include msg.reply_to = None immediately after instantiating msg.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/listener.py`:
- Around line 638-644: The final return currently ignores the possibly-updated
file_path from the download step; update the return to use the captured
file_path (or its relative path under self.config.media_path) instead of always
constructing f"{self.config.media_path}/{chat_id}/{file_name}"; specifically,
after calling self.client.download_media(...) and possibly getting an
actual_path string, return that actual_path (or os.path.relpath(actual_path,
self.config.media_path) if you must store a path relative to the media root) so
the DB records the real file Telethon wrote rather than the originally requested
file_name.
In `@tests/test_symlink_dedup.py`:
- Around line 162-166: The test currently only conditionally checks the symlink
on disk (using os.path.lexists) and compares the resolved path, which allows
download_media() to be ignored; instead assert the function's returned/stored
path directly: remove the if os.path.lexists(...) guard and replace with a
direct equality assertion that actual_returned_path (the value returned by
download_media()) equals the expected resolved path for chat_link (use
os.path.realpath on both sides if needed). Apply the same change pattern to the
other occurrences referenced (around lines 230-234, 619-623, 686-688, 748-752,
822-824) so each test tightens the assertion to validate the captured/returned
path instead of only checking existence.
- Around line 373-380: Add an assertion that the dangling symlink was actually
removed during _verify_and_redownload_media: after the existing assertions that
_process_media and db.insert_media were awaited, assert that os.path.lexists was
called for the problematic symlink and that os.remove was called with that
symlink path (i.e. verify os.path.lexists(...)=True check and
os.remove(symlink_path) invocation), referencing the test's call to
self.backup._verify_and_redownload_media(), self.backup._process_media,
self.backup.db.insert_media, os.path.lexists, and os.remove to ensure the
cleanup step ran rather than only the re-download.
---
Outside diff comments:
In `@src/telegram_backup.py`:
- Around line 1349-1362: The shared-file branch should pre-unlink any
stale/dangling symlink at file_path before calling os.symlink to avoid EEXIST;
update the block handling shared_file_path so that if os.path.exists(file_path)
is True and os.path.islink(file_path) and not
os.path.exists(os.readlink(file_path) if os.path.islink(file_path) else ""), or
more simply if os.path.islink(file_path) and not os.path.exists(file_path), call
os.unlink(file_path) (mirroring the listener/first-download path) prior to
creating the relative symlink with os.symlink(rel_path, file_path); keep the
existing exception handling that falls back to
self.client.download_media(message, file_path) and the same logger calls
(logger.debug/logger.warning) and references to file_name.
---
Nitpick comments:
In `@tests/test_symlink_dedup.py`:
- Around line 145-155: The MagicMock message instances (e.g., the local variable
msg used in tests/test_symlink_dedup.py and its helper mocks) must explicitly
set msg.reply_to = None to avoid accidental topic-filtering; update each mock
block where msg is created (the block with msg.id, msg.date, msg.media and
related return-value mocks and any similar blocks at the other noted locations)
to include msg.reply_to = None immediately after instantiating msg.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8aac0c38-766e-42b8-9fea-a588b2c0bc37
📒 Files selected for processing (4)
docs/CHANGELOG.mdsrc/listener.pysrc/telegram_backup.pytests/test_symlink_dedup.py
| if not os.path.exists(file_path): | ||
| await self.client.download_media(message, file_path) | ||
| actual_path = await self.client.download_media(message, file_path) | ||
| if actual_path and isinstance(actual_path, str): | ||
| file_path = actual_path | ||
|
|
||
| # Return the path as stored in DB (relative to media root) | ||
| return f"{self.config.media_path}/{chat_id}/{file_name}" |
There was a problem hiding this comment.
Return file_path after capturing Telethon’s actual path.
Line 644 discards the updated file_path from Lines 639-641. In non-dedup mode, if Telethon writes photo.mp4 instead of the requested photo, the DB still stores the non-existent requested path.
🐛 Proposed fix
# Return the path as stored in DB (relative to media root)
- return f"{self.config.media_path}/{chat_id}/{file_name}"
+ return file_path📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if not os.path.exists(file_path): | |
| await self.client.download_media(message, file_path) | |
| actual_path = await self.client.download_media(message, file_path) | |
| if actual_path and isinstance(actual_path, str): | |
| file_path = actual_path | |
| # Return the path as stored in DB (relative to media root) | |
| return f"{self.config.media_path}/{chat_id}/{file_name}" | |
| if not os.path.exists(file_path): | |
| actual_path = await self.client.download_media(message, file_path) | |
| if actual_path and isinstance(actual_path, str): | |
| file_path = actual_path | |
| # Return the path as stored in DB (relative to media root) | |
| return file_path |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/listener.py` around lines 638 - 644, The final return currently ignores
the possibly-updated file_path from the download step; update the return to use
the captured file_path (or its relative path under self.config.media_path)
instead of always constructing
f"{self.config.media_path}/{chat_id}/{file_name}"; specifically, after calling
self.client.download_media(...) and possibly getting an actual_path string,
return that actual_path (or os.path.relpath(actual_path, self.config.media_path)
if you must store a path relative to the media root) so the DB records the real
file Telethon wrote rather than the originally requested file_name.
| self._run(self.backup._verify_and_redownload_media()) | ||
|
|
||
| # The cleanup code should have removed the dangling symlink before re-download | ||
| # (lexists check + os.remove in the verification loop) | ||
| # Verify _process_media was called (re-download attempted) | ||
| self.backup._process_media.assert_awaited_once() | ||
| # Verify db.insert_media was called (successful re-download) | ||
| self.backup.db.insert_media.assert_awaited_once() |
There was a problem hiding this comment.
Assert the dangling symlink cleanup happened.
This test currently only proves redownload was attempted. If the os.path.lexists() cleanup regresses, the mocked _process_media() still lets the test pass.
🧪 Proposed assertion
self._run(self.backup._verify_and_redownload_media())
+ self.assertFalse(os.path.lexists(dangling_link))
# The cleanup code should have removed the dangling symlink before re-download
# (lexists check + os.remove in the verification loop)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| self._run(self.backup._verify_and_redownload_media()) | |
| # The cleanup code should have removed the dangling symlink before re-download | |
| # (lexists check + os.remove in the verification loop) | |
| # Verify _process_media was called (re-download attempted) | |
| self.backup._process_media.assert_awaited_once() | |
| # Verify db.insert_media was called (successful re-download) | |
| self.backup.db.insert_media.assert_awaited_once() | |
| self._run(self.backup._verify_and_redownload_media()) | |
| self.assertFalse(os.path.lexists(dangling_link)) | |
| # The cleanup code should have removed the dangling symlink before re-download | |
| # (lexists check + os.remove in the verification loop) | |
| # Verify _process_media was called (re-download attempted) | |
| self.backup._process_media.assert_awaited_once() | |
| # Verify db.insert_media was called (successful re-download) | |
| self.backup.db.insert_media.assert_awaited_once() |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_symlink_dedup.py` around lines 373 - 380, Add an assertion that
the dangling symlink was actually removed during _verify_and_redownload_media:
after the existing assertions that _process_media and db.insert_media were
awaited, assert that os.path.lexists was called for the problematic symlink and
that os.remove was called with that symlink path (i.e. verify
os.path.lexists(...)=True check and os.remove(symlink_path) invocation),
referencing the test's call to self.backup._verify_and_redownload_media(),
self.backup._process_media, self.backup.db.insert_media, os.path.lexists, and
os.remove to ensure the cleanup step ran rather than only the re-download.
Add lexists+unlink guard before os.symlink in the shared-file-exists branch of _process_media (was missing, could still hit EEXIST). Change fallback from re-downloading to shutil.copy2 for consistency with listener. Set msg.reply_to=None on all test mock messages per project convention.
|
🐳 Dev images published!
The dev/test instance will pick up these changes automatically (Portainer GitOps). To test locally: docker pull drumsergio/telegram-archive:dev
docker pull drumsergio/telegram-archive-viewer:dev |
`_verify_and_redownload_media` checks every previously recorded media file with `os.path.exists`, which follows symlink targets. For archived layouts where the actual content is held in an external store (git-annex `objects/`, lazy mount, separate filesystem) the symlink may be intact while its target is unreachable from the running process. The old behavior flagged such entries as "missing" and re-downloaded, atomic-renaming a freshly-downloaded regular file on top of the user's symlink and mutating the working tree (issue #143). Two-part fix in the Phase 1 detection loop: 1. Use `os.path.lexists` to detect "truly missing" entries. A symlink in place -- even a broken one -- is no longer added to the missing-files list. 2. After the lexists check, short-circuit on `os.path.islink`. Symlink contents are managed externally, so we cannot meaningfully verify size or emptiness without following the link, and we will not replace a symlink we did not write. Truly absent paths (no entry on disk at all) are still re-downloaded. The cleanup loop in Phase 2 already used `os.path.lexists` since #119, so the two phases are now consistent. Refs: #143 Co-Authored-By: Claude Code 2.1.128 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
VERIFY_MEDIA+DEDUPLICATE_MEDIAinteraction where dangling symlinks cause infinite redownload loops withErrno 17(file exists) andErrno 2(file not found) errorsos.path.exists()follows symlinks (nowlexists()), Telethondownload_media()return value was discarded (now captured), stale symlinks not removed before recreation (now pre-unlinked)Type of Change
Database Changes
scripts/Data Consistency Checklist
N/A — no database operations modified.
Testing
python -m pytest tests/ -v)ruff check .)ruff format --check .)9 new tests in
tests/test_symlink_dedup.pycovering:lexistsvsexists)_process_mediaintegration with dedup enabled_verify_and_redownload_mediacleanup of dangling symlinks_download_mediasymlink handlingshutil.copy2fallback when shared file already existsSecurity Checklist
Deployment Notes
DEDUPLICATE_MEDIA+VERIFY_MEDIAusersCloses #115
Summary by CodeRabbit
Documentation
Bug Fixes
DEDUPLICATE_MEDIAandVERIFY_MEDIAare enabled. The system now properly detects dangling symlinks, removes stale links before recreation, and prevents "file exists" errors in both scheduled backup and real-time listener flows.Tests