gotor

This program provides efficient web scraping services for Tor and non-Tor sites. It has both a CLI and a REST API.

Results 15 gotor issues

Should there be an option to run without using socks5 proxy?

question

There are two methods of interacting with a node:
- Crawl: traverses a node and its children without storing them in memory
- Load: stores a node and its children in...
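The difference between the two access patterns can be sketched as follows. This is a minimal illustration, not gotor's actual API: `Node`, `Crawl`, `Load`, and the in-memory `links` map (standing in for the network) are all hypothetical, and cycles are not handled beyond the depth limit.

```go
package main

import "fmt"

// links is a hypothetical stand-in for the network: it maps a URL to the
// URLs it links to.
var links = map[string][]string{
	"root": {"a", "b"},
	"a":    {"c"},
}

// Node is a hypothetical in-memory tree node.
type Node struct {
	URL      string
	Children []*Node
}

// Crawl visits url and its descendants down to the given depth, calling
// visit for each one. Nothing is retained once visit returns.
func Crawl(url string, depth int, visit func(string)) {
	visit(url)
	if depth == 0 {
		return
	}
	for _, child := range links[url] {
		Crawl(child, depth-1, visit)
	}
}

// Load builds the whole tree in memory and returns its root.
func Load(url string, depth int) *Node {
	n := &Node{URL: url}
	if depth == 0 {
		return n
	}
	for _, child := range links[url] {
		n.Children = append(n.Children, Load(child, depth-1))
	}
	return n
}

func main() {
	count := 0
	Crawl("root", 2, func(string) { count++ })
	fmt.Println("visited:", count)

	tree := Load("root", 2)
	fmt.Println("root children:", len(tree.Children))
}
```

Under these assumptions, `Crawl` keeps memory bounded by the recursion depth, while `Load` trades memory for the ability to revisit the tree later.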

enhancement
Not Available
performance
Hacktoberfest

It should be possible to open multiple tor connections using different SOCKS/CONTROL ports. It may provide a performance boost to execute requests using different connections. How to open multiple connections:...
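One way this might look on the client side is a small pool of HTTP clients, each routed through a different SOCKS5 port and handed out round-robin. This is a sketch only: `ClientPool` and its methods are hypothetical, the ports `9050` and `9052` are assumed, and the Tor instances themselves would have to be started separately with distinct `SocksPort`/`ControlPort` settings.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"sync/atomic"
	"time"
)

// ClientPool hands out HTTP clients in round-robin order, each routed
// through a different SOCKS5 proxy. Hypothetical type, for illustration.
type ClientPool struct {
	clients []*http.Client
	next    atomic.Uint64
}

// NewClientPool builds one client per SOCKS5 address (host:port).
func NewClientPool(addrs []string) (*ClientPool, error) {
	if len(addrs) == 0 {
		return nil, fmt.Errorf("at least one proxy address is required")
	}
	p := &ClientPool{}
	for _, addr := range addrs {
		proxyURL, err := url.Parse("socks5://" + addr)
		if err != nil {
			return nil, fmt.Errorf("parse proxy address %q: %w", addr, err)
		}
		p.clients = append(p.clients, &http.Client{
			Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)},
			Timeout:   30 * time.Second,
		})
	}
	return p, nil
}

// Next returns the next client in round-robin order. Safe for concurrent use.
func (p *ClientPool) Next() *http.Client {
	i := p.next.Add(1) - 1
	return p.clients[i%uint64(len(p.clients))]
}

func main() {
	// Assumed local Tor SOCKS ports; no connection is made here.
	pool, err := NewClientPool([]string{"127.0.0.1:9050", "127.0.0.1:9052"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	first, second := pool.Next(), pool.Next()
	fmt.Println("distinct clients:", first != second)
}
```

Whether this actually improves throughput would need measuring; the sketch only shows how requests could be spread across connections.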

Low Priority

A 429 response indicates too many requests and may carry a `Retry-After` header indicating when the request should be retried. This could be used to pull those requests into a...

enhancement
Medium Priority
performance
concurrency

I found this wonderful snippet in `gocolly`; this file could make random headers rather simple: https://github.com/gocolly/colly/blob/master/extensions/random_user_agent.go

## Summary
Split docs into a small docs site with quickstarts, configuration reference, API reference, and an ops guide (metrics/pprof).

## Motivation
- Faster onboarding
- Clear runbooks for operating...

## Summary
Adopt streamed parsing for HTML to reduce allocations, and do early content-type sniffing to skip binary/large content unless configured.

## Motivation
- Lower memory usage during large crawls...

## Summary
Introduce structured logging with fields per fetch and level control via flags.

## Motivation
- Parseable logs for pipelines/SIEM
- Easier debugging at scale

## Scope
- One...

## Summary
Expose Prometheus metrics and pprof to operate and profile the crawler.

## Motivation
- Visibility into throughput, latency, errors
- Ability to capture CPU/heap profiles

## Scope
-...

## Summary
Add per-host concurrency caps and optional inter-request delay to avoid bans and reduce tail latencies.

## Motivation
- Be a good citizen to hosts
- Smooth out heavy-tailed...