Would it be worth using an existing HTML parser such as BeautifulSoup, so that the linkchecker package no longer has to include C code? This might lower the long-term maintenance burden, since keeping C extensions working across platforms is not trivial.
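
To make the idea concrete, here is a rough sketch of link extraction without any C code. It uses the standard library's `html.parser` as a stand-in for whichever existing parser is chosen; BeautifulSoup would be used in a similar way. The names `LINK_ATTRS`, `LinkCollector`, and `find_links` are hypothetical and not taken from linkchecker, and the tag list is deliberately incomplete.

```python
from html.parser import HTMLParser

# Hypothetical map of tags to the attribute that carries a URL.
# A real link checker would need to cover many more cases.
LINK_ATTRS = {"a": "href", "link": "href", "img": "src", "script": "src"}


class LinkCollector(HTMLParser):
    """Collect (tag, url) pairs from the start tags of a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before this call.
        wanted = LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append((tag, value))


def find_links(html):
    """Return the (tag, url) pairs found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

Whether a pure-Python parser copes with badly broken markup as well as the current C parser does would need testing before any switch.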