x-scraper is a focused CLI for exporting X longform articles to PDF. It reuses a locally saved X browser session, loads the article in Playwright, extracts the article body into a print-safe standalone document, inlines user-added media, and renders the result to PDF.
uv is a fast Python package and environment manager. It handles dependency installation, virtualenv management, and command execution for this project. The recommended workflow uses uv.
Install uv first:
curl -LsSf https://astral.sh/uv/install.sh | sh
Then set up the project:
uv sync
uv run playwright install chromium
uv run x-scraper login
This opens a real browser window to https://x.com/login. After you complete login, the tool saves the browser state to auth/x_browser_state.json.
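To check what the login step captured, you can inspect the saved file directly. The sketch below assumes the file follows Playwright's storage_state layout, a JSON object with "cookies" and "origins" lists. The helper names load_state and x_cookie_names are hypothetical and are not part of x-scraper.

```python
import json
from pathlib import Path


def load_state(path: Path = Path("auth/x_browser_state.json")) -> dict:
    """Read a saved Playwright storage_state file (hypothetical helper)."""
    return json.loads(path.read_text(encoding="utf-8"))


def x_cookie_names(state: dict) -> set:
    """Return the names of cookies scoped to x.com or one of its subdomains."""
    names = set()
    for cookie in state.get("cookies", []):
        # Cookie domains may be stored with or without a leading dot.
        domain = cookie.get("domain", "").lstrip(".")
        if domain == "x.com" or domain.endswith(".x.com"):
            names.add(cookie["name"])
    return names
```

For example, sorted(x_cookie_names(load_state())) lists the X cookies that were saved. An empty result suggests the login step did not complete.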
uv run x-scraper article-pdf "https://x.com/i/articles/your-article-id"
If you do not pass --output, PDFs are written to out/ automatically.
You can also pass an explicit output path:
uv run x-scraper article-pdf "https://x.com/handle/article/123" -o ./article.pdf
Supported URL shapes:
- https://x.com/i/articles/...
- https://x.com/<handle>/article/...
- https://x.com/<handle>/status/..., when the loaded page actually renders longform article content
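Here is a minimal sketch of matching these shapes by URL alone, using only urllib.parse. The function name classify_article_url is hypothetical, and so is the choice to accept only https URLs on x.com. A "status" result is only a candidate, because the exporter still validates the rendered DOM.

```python
from __future__ import annotations

from urllib.parse import urlparse


def classify_article_url(url: str) -> str | None:
    """Classify a URL against the documented shapes.

    Returns "article" for /i/articles/... and /<handle>/article/..., "status"
    for /<handle>/status/... (a candidate only; the rendered page decides),
    and None for anything else, including hosts other than x.com.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname != "x.com":
        return None
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) < 3:
        return None
    if parts[0] == "i" and parts[1] == "articles":
        return "article"
    if parts[1] == "article":
        return "article"
    if parts[1] == "status":
        return "status"
    return None
```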
article-pdf follows a narrow pipeline:
- x_scraper.cli validates that a saved X session exists and forwards the URL to the exporter.
- x_scraper.x_scraper.x_article_to_pdf() launches Chromium with the saved storage_state.
- The exporter opens the target URL and rejects it if X redirects to login or if the rendered page does not look like longform content.
- _prepare_media_for_print() scrolls the page and waits for lazy assets to hydrate before extraction.
- _extract_article_document() clones the highest-signal article container from the live DOM, removes X chrome, filters out avatars/icons/emoji, and inlines the remaining article media as data URLs.
- A second Playwright page receives the standalone HTML document and prints it to PDF.
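The extraction step picks its container inside the page. The idea behind "highest-signal" can still be sketched in Python over per-container statistics. Candidate, score, and pick_container are hypothetical names, and the weights are illustrative assumptions, not the heuristics in src/x_scraper/x_scraper.py.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical statistics gathered for one candidate container."""
    selector: str
    text_length: int
    paragraph_count: int
    media_count: int
    chrome_count: int  # buttons, nav bars, avatars and similar interface nodes


def score(c: Candidate) -> float:
    # Illustrative weights: reward prose and media, penalize X interface chrome.
    return (
        c.text_length
        + 200 * c.paragraph_count
        + 500 * c.media_count
        - 300 * c.chrome_count
    )


def pick_container(candidates: list[Candidate]) -> Candidate | None:
    """Return the highest-scoring candidate, or None if there are none."""
    return max(candidates, key=score, default=None)
```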
src/x_scraper/
├── __main__.py
├── cli.py
├── config.py
└── x_scraper.py
Responsibilities:
- src/x_scraper/cli.py: Defines the Typer app and the public CLI commands, login and article-pdf.
- src/x_scraper/config.py: Holds project paths and browser/login timeouts.
- src/x_scraper/x_scraper.py: Implements X session management, article detection, media preparation, DOM extraction, asset inlining, and PDF rendering.
- src/x_scraper/__main__.py: Module entrypoint, so the package can also be run with python -m x_scraper.
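To illustrate what config.py centralizes, the sketch below derives the documented paths (auth/x_browser_state.json, auth/x_profile/, out/) from a project root. The Settings class, the default_pdf_path helper, and the timeout values are hypothetical. Only the paths come from this README.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    root: Path = Path(".")
    # Hypothetical timeouts in milliseconds, the unit Playwright uses.
    login_timeout_ms: int = 300_000
    navigation_timeout_ms: int = 60_000

    @property
    def state_path(self) -> Path:
        return self.root / "auth" / "x_browser_state.json"

    @property
    def profile_dir(self) -> Path:
        return self.root / "auth" / "x_profile"

    @property
    def out_dir(self) -> Path:
        return self.root / "out"

    def default_pdf_path(self, stem: str) -> Path:
        """Create out/ on demand and return a PDF path inside it."""
        self.out_dir.mkdir(parents=True, exist_ok=True)
        return self.out_dir / f"{stem}.pdf"
```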
- Login is interactive by design. The exporter does not automate credentials.
- Session state is stored in auth/x_browser_state.json.
- The browser profile directory lives under auth/x_profile/.
- Default PDF output goes to out/, which is created on demand.
- If X redirects to /login or /i/flow/login, the command fails fast and asks you to run x-scraper login.
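Below is a minimal sketch of the fail-fast check. It assumes the check runs on the URL the page settled on after navigation; in Playwright's Python API, that is page.url after page.goto(). The helper names and the error message are hypothetical.

```python
from urllib.parse import urlparse

# Login routes named in this README.
LOGIN_PATHS = {"/login", "/i/flow/login"}


def is_login_redirect(final_url: str) -> bool:
    """Return True if navigation ended on one of X's login routes."""
    path = urlparse(final_url).path.rstrip("/")
    return path in LOGIN_PATHS


def ensure_logged_in(final_url: str) -> None:
    """Raise with the same remedy the CLI suggests."""
    if is_login_redirect(final_url):
        raise RuntimeError("X session expired or missing; run `x-scraper login` first.")
```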
The exporter does not trust the URL path alone. It accepts article-like URLs, then validates the rendered DOM:
- waits for article or main article content to exist
- checks for longform signals such as an Article label, substantial text length, or enough paragraph nodes
- rejects ordinary tweet/status pages that do not render longform content
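The signal check can be sketched as a simple "any signal passes" rule over values read from the rendered page. PageSignals, looks_like_longform, and both thresholds are hypothetical illustrations, not the project's actual values.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    """Hypothetical values read from the rendered DOM."""
    has_article_label: bool
    text_length: int
    paragraph_count: int


# Illustrative thresholds only.
MIN_TEXT_LENGTH = 1500
MIN_PARAGRAPHS = 5


def looks_like_longform(signals: PageSignals) -> bool:
    """Accept the page if any one longform signal is present."""
    return (
        signals.has_article_label
        or signals.text_length >= MIN_TEXT_LENGTH
        or signals.paragraph_count >= MIN_PARAGRAPHS
    )
```

An ordinary status page, with a short text and a single paragraph, fails every signal and is rejected.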
This is why /status/... links can work when X uses that route for longform pages.
The PDF is not generated from X's live layout directly. Instead, the exporter builds its own print document:
- forces lazy images to load
- walks the article subtree and selects the highest-value content container
- keeps likely article images, primarily large pbs.twimg.com/media/... assets and tweetPhoto media
- removes non-content assets such as avatars, emoji sprites, SVG icons, and other X interface chrome
- fetches remote media through the logged-in browser context and converts them to data URLs so the PDF renderer has self-contained assets
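The URL-based part of the media filter and the data-URL conversion can be sketched with the standard library. is_article_media and to_data_url are hypothetical names. The pbs.twimg.com/media/ rule comes from the list above. The real filter also uses DOM context such as tweetPhoto containers, which this sketch ignores.

```python
import base64
from urllib.parse import urlparse


def is_article_media(src: str) -> bool:
    """Keep pbs.twimg.com/media/ assets; avatars, emoji, and icons fall through."""
    parsed = urlparse(src)
    return parsed.hostname == "pbs.twimg.com" and parsed.path.startswith("/media/")


def to_data_url(body: bytes, content_type: str) -> str:
    """Encode fetched bytes so the PDF renderer needs no network access."""
    encoded = base64.b64encode(body).decode("ascii")
    return f"data:{content_type};base64,{encoded}"
```

The bytes and content type would come from a fetch made through the logged-in context. With Playwright, one option is response = context.request.get(src), followed by to_data_url(response.body(), response.headers.get("content-type", "application/octet-stream")).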
This architecture is why article images survive printing more reliably than a plain browser printToPDF call against the live X page.
The command intentionally fails early in a few cases:
- no saved X session
- expired X session
- target URL is not on x.com
- rendered page does not appear to be an X longform article
- X DOM changes enough that the article container or media selectors are no longer valid
When the last case happens, the fix usually belongs in the heuristics inside src/x_scraper/x_scraper.py.