Don't treat sites with 403 status codes as broken links? #1157

@mre

In a recent test of the top 1000 websites, I found the following 403 status errors:

✗ [403] https://cell.com/ | Failed: Network error: Forbidden
✗ [403] https://ticketmaster.com/ | Failed: Network error: Forbidden
✗ [403] https://kraken.com/ | Failed: Network error: Forbidden
✗ [403] https://media.giphy.com/ | Failed: Network error: Forbidden
✗ [403] https://yummly.com/ | Failed: Network error: Forbidden
✗ [403] https://bhphotovideo.com/ | Failed: Network error: Forbidden
✗ [403] https://inc.com/ | Failed: Network error: Forbidden
✗ [403] https://science.sciencemag.org/ | Failed: Network error: Forbidden
✗ [403] https://support.cloudflare.com/ | Failed: Network error: Forbidden
✗ [403] https://docs.wixstatic.com/ | Failed: Network error: Forbidden
✗ [403] https://deviantart.com/ | Failed: Network error: Forbidden
✗ [403] https://use.fontawesome.com/ | Failed: Network error: Forbidden
✗ [403] https://yoursite.com/ | Failed: Network error: Forbidden
✗ [403] https://upwork.com/ | Failed: Network error: Forbidden
✗ [403] https://gleam.io/ | Failed: Network error: Forbidden
✗ [403] https://kickstarter.com/ | Failed: Network error: Forbidden
✗ [403] https://static.wixstatic.com/ | Failed: Network error: Forbidden
✗ [403] https://onlinelibrary.wiley.com/ | Failed: Network error: Forbidden
✗ [403] https://ericsson.com/ | Failed: Network error: Forbidden
✗ [403] https://journals.sagepub.com/ | Failed: Network error: Forbidden
✗ [403] https://kstatic.googleusercontent.com/ | Failed: Network error: Forbidden
✗ [403] https://coinbase.com/ | Failed: Network error: Forbidden
✗ [403] https://zazzle.com/ | Failed: Network error: Forbidden
✗ [403] https://thelancet.com/ | Failed: Network error: Forbidden
✗ [403] https://bluehost.com/ | Failed: Network error: Forbidden
✗ [403] https://tandfonline.com/ | Failed: Network error: Forbidden
✗ [403] https://event.on24.com/ | Failed: Network error: Forbidden
✗ [403] https://hostgator.com/ | Failed: Network error: Forbidden
✗ [403] https://pixabay.com/ | Failed: Network error: Forbidden
✗ [403] https://avvo.com/ | Failed: Network error: Forbidden
✗ [403] https://pexels.com/ | Failed: Network error: Forbidden
✗ [403] https://canva.com/ | Failed: Network error: Forbidden
✗ [403] https://sciencemag.org/ | Failed: Network error: Forbidden
✗ [403] https://codepen.io/ | Failed: Network error: Forbidden
✗ [403] https://fastcompany.com/ | Failed: Network error: Forbidden
✗ [403] https://homedepot.com/ | Failed: Network error: Forbidden
✗ [403] https://udemy.com/ | Failed: Network error: Forbidden
✗ [403] https://oecd.org/ | Failed: Network error: Forbidden
✗ [403] https://capterra.com/ | Failed: Network error: Forbidden
✗ [403] https://nejm.org/ | Failed: Network error: Forbidden
✗ [403] https://creativemarket.com/ | Failed: Network error: Forbidden
✗ [403] https://s-media-cache-ak0.pinimg.com/ | Failed: Network error: Forbidden

These are websites that block lychee for one reason or another: for example, they use browser fingerprinting or bot detection, or they reject unknown user agents. The list is long.

Given that most of these links work, I vote for treating websites with 403 status codes as non-failing links to reduce the number of false positives. Any objections?
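To make the proposal concrete, here is a minimal sketch of what treating 403 as non-failing could look like. It is not lychee's actual implementation (lychee is written in Rust); the names `LinkStatus`, `SOFT_FAIL_CODES`, `classify`, `exit_code`, and the `strict` switch are all hypothetical and only illustrate the idea.

```python
from enum import Enum


class LinkStatus(Enum):
    OK = "ok"
    BLOCKED = "blocked"  # server answered, but refused the checker
    FAILED = "failed"


# Hypothetical default: status codes that are reported but do not fail the run.
SOFT_FAIL_CODES = frozenset({403})


def classify(status_code: int, strict: bool = False) -> LinkStatus:
    """Map a final HTTP status code (redirects already followed) to a result."""
    if 200 <= status_code < 300:
        return LinkStatus.OK
    if status_code in SOFT_FAIL_CODES and not strict:
        return LinkStatus.BLOCKED
    return LinkStatus.FAILED


def exit_code(status_codes, strict: bool = False) -> int:
    """Return 1 if any link counts as FAILED, otherwise 0."""
    failed = any(classify(code, strict) is LinkStatus.FAILED for code in status_codes)
    return 1 if failed else 0
```

Keeping a separate `BLOCKED` state, rather than folding 403 into `OK`, would still let the report show which sites refused the request, and a `strict` option would give users who want the old behavior a way to get it back.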
