Automatic discovery of llms.txt or sitemap.xml (#430)

## Summary
Implement automatic discovery and parsing of `llms.txt`, `sitemap.xml`, and related files, so that archons/crawl4ai can use content that sites publish for AI consumption and can crawl sites more comprehensively.

## Problem Statement
As LLM-driven content consumption becomes common, site owners increasingly publish files that help models and crawlers understand a site. `llms.txt` provides a curated, LLM-friendly overview, and `sitemap.xml` lists the URLs available for crawling. Currently, crawl4ai uses these files only when they are specified manually. As a result, crawls miss structured information that could improve both crawling efficiency and content-extraction quality.

## Proposed Solution

### Core Discovery Features

#### 1. File Types to Discover

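```python
# Candidate paths relative to the site root. Entries containing '*' are
# patterns, not fetchable URLs: HTTP offers no directory listing, so they must
# be expanded to concrete guesses or matched against URLs found elsewhere.
DISCOVERY_TARGETS = {
    'llm_files': [
        '/llms.txt',           # Primary LLM documentation
        '/llms-full.txt',      # Comprehensive version
        '/llms.md',            # Markdown variant
        '/llms-ctx.txt',       # Context-optimized version
    ],
    'sitemap_files': [
        '/sitemap.xml',
        '/sitemap_index.xml',
        '/sitemap-*.xml',      # Pattern: numbered/dated variants
        '/sitemaps/*.xml',     # Pattern: subdirectory variants
    ],
    'metadata_files': [
        '/robots.txt',         # May contain Sitemap: directives
        '/.well-known/*',      # Pattern: RFC 8615 well-known URIs
        '/humans.txt',
        '/.well-known/security.txt',  # Canonical location (RFC 9116)
        '/security.txt',       # Legacy root location
    ]
}
```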
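Because wildcard entries cannot be requested directly, one option is to treat the table as two things: fixed paths to probe, and patterns to match against URLs discovered by other means (for example, sitemap URLs listed in `robots.txt`). The following is a minimal sketch, assuming Python 3.9+ and the standard library only. `split_targets`, `candidate_urls`, and `classify` are hypothetical helper names, not part of crawl4ai's API.

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from urllib.parse import urljoin, urlparse


def split_targets(targets: dict[str, list[str]]) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Separate fixed paths (fetchable as-is) from wildcard patterns."""
    fixed: dict[str, list[str]] = {}
    patterns: dict[str, list[str]] = {}
    for category, paths in targets.items():
        for path in paths:
            bucket = patterns if "*" in path else fixed
            bucket.setdefault(category, []).append(path)
    return fixed, patterns


def candidate_urls(base_url: str, targets: dict[str, list[str]]) -> list[str]:
    """Absolute URLs to probe directly; root-relative paths resolve against the site origin."""
    fixed, _ = split_targets(targets)
    return [urljoin(base_url, path) for paths in fixed.values() for path in paths]


def classify(url: str, targets: dict[str, list[str]]) -> str | None:
    """Return the first category whose fixed path or wildcard pattern matches the URL's path."""
    path = urlparse(url).path
    for category, patterns in targets.items():
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return category
    return None


if __name__ == "__main__":
    # Hypothetical, trimmed target table for illustration.
    targets = {"llm_files": ["/llms.txt"], "sitemap_files": ["/sitemap.xml", "/sitemap-*.xml"]}
    print(candidate_urls("https://example.com/docs/", targets))
    # ['https://example.com/llms.txt', 'https://example.com/sitemap.xml']
    print(classify("https://example.com/sitemap-2024.xml", targets))  # sitemap_files
```

Using `fnmatchcase` rather than `fnmatch` keeps matching case-sensitive on every platform, which is how URL paths are normally compared.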

#### 2. Discovery Methods

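**Priority Order:**
1. Parse `robots.txt` for `Sitemap:` directives
2. Check standard URL patterns in the root directory
3. Parse HTML `<meta>` and `<link>` elements on the homepage
4. Probe known paths under `/.well-known/` (directories cannot be listed over HTTP)
5. Try common filename variations that match the wildcard patterns (e.g. `/sitemap-1.xml`)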
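The priority order above can be sketched as a single function that tries each method in turn and deduplicates what it finds. This sketch rests on several assumptions. `fetch` is a hypothetical callable that returns a body for an HTTP 200 and `None` otherwise; a real crawler would likely use HEAD requests and respect rate limits. `rel="sitemap"` is an informal convention some sites use, not a registered link relation. The `.well-known` and variation paths are illustrative guesses. Only `<link>` elements are inspected in step 3; `<meta>` tags are left out. Sitemap directives are read with the standard library's `urllib.robotparser.RobotFileParser.site_maps()`, which requires Python 3.8+.

```python
from __future__ import annotations

from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical fetcher: returns the body for an HTTP 200 response, otherwise None.
Fetch = Callable[[str], Optional[str]]

# Step 2: standard root paths (a subset of the discovery targets).
ROOT_PATHS = ["/llms.txt", "/llms-full.txt", "/sitemap.xml", "/sitemap_index.xml"]
# Step 4: assumption; no standard .well-known name exists for llms.txt, so this probe is speculative.
WELL_KNOWN_PATHS = ["/.well-known/llms.txt"]
# Step 5: assumption; a small bounded guess list standing in for the wildcard patterns.
VARIATION_PATHS = ["/sitemap-1.xml", "/sitemaps/sitemap.xml"]


class LinkCollector(HTMLParser):
    """Collects <link> hrefs whose rel includes 'sitemap' or whose filename starts with 'llms'."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        rels = (attr.get("rel") or "").lower().split()
        # rel="sitemap" is an informal convention, not a registered link relation.
        if href and ("sitemap" in rels or href.rsplit("/", 1)[-1].startswith("llms")):
            self.hrefs.append(href)


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Step 1: extract Sitemap: directives using the standard library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none


def discover(base_url: str, fetch: Fetch) -> list[str]:
    """Run the discovery steps in priority order; return unique URLs in the order found."""
    found: list[str] = []

    def add(url: str) -> None:
        if url not in found:
            found.append(url)

    def probe(paths: list[str]) -> None:
        # Keeps a path only if it resolves; a real crawler would prefer HEAD requests.
        for path in paths:
            url = urljoin(base_url, path)
            if fetch(url) is not None:
                add(url)

    # Step 1: Sitemap directives in robots.txt (recorded without being fetched).
    robots = fetch(urljoin(base_url, "/robots.txt"))
    if robots:
        for url in sitemaps_from_robots(robots):
            add(url)

    probe(ROOT_PATHS)  # Step 2

    # Step 3: homepage <link> elements (recorded without being fetched).
    html = fetch(base_url)
    if html:
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            add(urljoin(base_url, href))

    probe(WELL_KNOWN_PATHS)  # Step 4
    probe(VARIATION_PATHS)   # Step 5
    return found
```

Injecting `fetch` keeps the discovery logic independent of any particular HTTP client, so the same function can be exercised against a plain dictionary in tests, for example `discover("https://example.com/", pages.get)`.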
