linkchecker

v0.2.2
Published: Dec 4, 2025 License: MIT

README

Concurrently checks links found at a URL or in local HTML files, throttling requests to the same host to avoid overloading servers.


linkchecker is a command-line tool written in Go for checking links found within a specified URL or a local HTML file.

Its key features include:

  • Concurrency with Respect: It checks links concurrently using multiple workers while strictly observing a customizable wait period (-wait) between requests to the same host, preventing accidental DoS or excessive load on target servers.
  • Flexible Link Source: It can crawl links from both remote URLs and local HTML files.
  • Customizable Behavior: You can easily configure the HTTP timeout, User-Agent, and selectively ignore internal links or specific hostnames via an ignore file.

Installation

If you have Go installed on your system, you can install the tool using the following command:

go install github.com/sekika/linkchecker/cmd/linkchecker@latest

Usage

After installation, you can run the tool using the linkchecker command.

Basic Use

Specify the target URL or local HTML file path using the -u flag.

# Check links on a website
linkchecker -u https://example.com/page.html

# Check links in a local file
linkchecker -u path/to/local/file.html

Filtering Results (Displaying Only Failures)

Since linkchecker outputs [OK] or [NG] for each checked link, you can easily filter for only the failing links using grep:

linkchecker -u https://example.com/page.html | grep "\[NG\]"

Script for Local Directory Check

To recursively check all .html files in a local directory, use the provided shell script:

./check_local_files.sh [directory]

This script runs linkchecker -u <file> for every .html file and only prints the file name and the [NG] results.

Options
Flag          Description                                                       Default
-u            Target URL or local HTML file (required)                          ""
-no-internal  Do not check internal links (links within the same host/domain)   false
-ignore       Path to a file listing hosts/domains to ignore                    ""
-timeout      HTTP request timeout in seconds                                   10
-wait         Wait time in seconds between requests to the same host;           3
              controls the crawling interval
-user-agent   User-Agent string to use for HTTP requests                        github.com/sekika/linkchecker

Examples
  • Exclude internal links and set the timeout to 5 seconds:
linkchecker -u https://example.com -no-internal -timeout 5

As a Library (Advanced)

Although this repository's primary focus is the command-line tool, you can use the core functionality by importing the relevant package.

Importing Core Functionality

To use the link extraction logic programmatically, import the crawler package:

package main

import (
    "fmt"
    "log"

    "github.com/sekika/linkchecker/pkg/crawler"
)

func main() {
    url := "https://example.com"
    timeoutSec := 10
    userAgent := "MyCustomApp/1.0"

    // Extract links from a URL
    links, err := crawler.ExtractLinksFromURL(url, timeoutSec, userAgent)
    if err != nil {
        log.Fatalf("Error extracting links: %v", err)
    }

    fmt.Printf("Found %d links on %s\n", len(links), url)

    // Example of running workers (Note: RunWorkers needs a list of absolute links)
    // crawler.RunWorkers(links, url, false, make(map[string]bool), timeoutSec, 3, userAgent)
}

Directories

Path              Synopsis
cmd
  linkchecker     (command) linkchecker is a command-line tool written in Go for checking link health within a specified URL or a local HTML file.
pkg
