This is still a work in progress and it's not usable for now.
unmerdify
Get the content, only the content: unenshittificator for the web.
Installation
Get the ftr-site-config rules
First you will need to clone https://github.com/fivefilters/ftr-site-config locally (it contains the rules needed for this code to work).
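To give an idea of what those rules look like, here is a minimal sketch of reading one site config file. It assumes the usual `directive: value` line format of ftr-site-config, with `#` comments; the sample rules and the `parse_site_config` helper are made up for illustration and are not unmerdify's actual parser.

```python
from collections import defaultdict

# Hypothetical rules for example.org, written for this example.
# This is not a real file from the ftr-site-config repository.
SAMPLE_CONFIG = """\
# Rules for example.org
title: //h1[@class='entry-title']
body: //div[@class='entry-content']
strip: //div[@class='share']
strip: //aside
test_url: https://example.org/blog/some-post/
"""


def parse_site_config(text):
    """Parse 'directive: value' lines into a dict mapping directive -> list of values."""
    rules = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only: XPath values and URLs contain colons too.
        directive, separator, value = line.partition(":")
        if not separator:
            continue  # not a 'directive: value' line, ignore it
        rules[directive.strip()].append(value.strip())
    return dict(rules)


rules = parse_site_config(SAMPLE_CONFIG)
print(rules["body"])
print(rules["strip"])
```

Values are kept in lists because a directive such as `strip` can appear several times in the same file. Splitting on the first colon is a simplification: a directive that carries its own argument in parentheses would need extra care if that argument contained a colon.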
With uv
Be sure to have uv installed, then:
uv sync
Why uv? Because after using cargo for Rust, I'm tired of Python's broken version management. requirements.txt sucks, managing multiple venvs sucks, managing multiple Python versions sucks; pyproject.toml and uv are the way to go. If you still want to manage all this mess yourself, I'll try to keep the requirements.txt file up to date.
Why not rye? Because uv can do almost everything rye used to do: https://github.com/astral-sh/rye/discussions/1342.
With pip
Manage your Python version and your virtualenv however you like, then install the requirements:
pip install -r requirements.txt
Keep requirements up to date with uv
uv pip compile pyproject.toml -o requirements.txt
Usage
With uv
curl -s https://vincent.jousse.org/blog/fr/perso/pourquoi-abandonner-une-croyance-est-si-difficile/ | uv run unmerdify -l DEBUG tests/fixtures/fixtures_site_config/vincent.jousse.org.txt
With pip
curl -s https://vincent.jousse.org/blog/fr/perso/pourquoi-abandonner-une-croyance-est-si-difficile/ | python src/unmerdify -l DEBUG tests/fixtures/fixtures_site_config/vincent.jousse.org.txt
Tests
uv run pytest
Useful links
- Site configs: https://github.com/fivefilters/ftr-site-config
- Test reports for the FiveFilters rules: https://siteconfig.fivefilters.org/test/
- PHP code using the FiveFilters files: github.com/j0k3r/graby@1281bf3d70/src/Extractor/ContentExtractor.php (L142)
Ideas
Should we clean URLs in the parsed HTML? See https://github.com/ClearURLs/Rules
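As a rough idea of what that could mean for a single link, here is a minimal sketch that drops a few well-known tracking parameters using only the standard library. The `clean_url` helper and its short parameter list are assumptions picked for illustration; they are not the ClearURLs rule set, which is far more extensive, and nothing like this exists in unmerdify yet.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hand-picked for illustration only, not taken from the ClearURLs rules.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"fbclid", "gclid"}


def clean_url(url):
    """Return url without the query parameters listed above."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS and not key.startswith(TRACKING_PREFIXES)
    ]
    # Rebuild the URL with the filtered query string; other parts are untouched.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(clean_url("https://example.org/post?id=42&utm_source=rss&fbclid=abc#top"))
```

Applying this to parsed HTML would mean running it over every `href` in the extracted content. Note that rebuilding the query string with `urlencode` may normalize how the remaining parameters are percent-encoded.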