    About Clew (clew.se) [web]

      I found it interesting that they maintain a list of domains owned by big media corporations[1] (at the moment it seems to be mostly Disney, which goes to show just how big Disney is).

      They also rank sites by how sustainable they are[2] (Lobste.rs gets an A+!).

      It also supports DuckDuckGo-like bangs[3].

      [1] https://codeberg.org/Clew/big-media-domains/src/branch/main/listings.toml

      [2] https://clew.se/green/

      [3] https://clew.se/bangs/

        I really like indieweb projects.

        https://marginalia-search.com/ is also a good search engine.

          One more to the bucket: https://mwmbl.org

            > We cache your old /robots.txt for up to 24 hours, to avoid spamming your server with new requests for it every time we want to check a page

            This makes sense as a default, but it may be worth honoring the Cache-Control header on the robots.txt response, since some people want to react to new crawlers faster than 24h. I would probably still clamp the refresh interval (maybe between 5 min and 7 days), and the 24h default sounds reasonable when the site doesn't indicate a value, but it would be nice to follow the site's expressed wishes.
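            A minimal sketch of what that could look like, in Python, assuming the crawler has the Cache-Control header of the robots.txt response as a string. The function name `robots_cache_ttl` and the choice to treat `no-cache`/`no-store` as "refresh at the minimum interval" are my own assumptions, not anything Clew actually does; the 5 min / 7 day bounds and the 24h default are the ones suggested above.

```python
from datetime import timedelta
from typing import Optional

# Illustrative bounds: clamp whatever the site asks for into [5 min, 7 days].
MIN_TTL = timedelta(minutes=5)
MAX_TTL = timedelta(days=7)
# Used when the site gives no usable caching hint.
DEFAULT_TTL = timedelta(hours=24)


def robots_cache_ttl(cache_control: Optional[str]) -> timedelta:
    """Hypothetical helper: how long to cache robots.txt for one site."""
    if not cache_control:
        return DEFAULT_TTL

    # Parse "public, max-age=3600" into {"public": "", "max-age": "3600"}.
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.strip().lower()] = value.strip().strip('"')

    # Assumption: the site wants fresh copies, so re-fetch as often as allowed.
    if "no-store" in directives or "no-cache" in directives:
        return MIN_TTL

    max_age = directives.get("max-age")
    if max_age is not None and max_age.isdigit():
        ttl = timedelta(seconds=int(max_age))
        # Respect the site's wish, but never outside the clamp range.
        return max(MIN_TTL, min(ttl, MAX_TTL))

    # Header present but no usable max-age: fall back to the default.
    return DEFAULT_TTL
```

            With this, a site sending `max-age=3600` gets its robots.txt re-fetched hourly, `max-age=0` is bumped up to 5 minutes, and a site sending nothing keeps the 24h behaviour.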