[Feature]: Improve PDF parsing structure preservation and directory organization #393

## Problem Statement

When parsing large PDFs (e.g., a 463-page Chinese Pharmacopoeia), the current `PDFParser` loses all chapter/heading structure. The local pdfplumber strategy extracts raw text per page, marked only with HTML comments, but `MarkdownParser._find_headings()` excludes these markers, and the extracted text contains no markdown headings of its own. As a result:

1. **Chapter/TOC structure is lost**: The MarkdownParser finds zero headings and falls back to paragraph-based splitting, producing 538 flat numbered files (`name_1.md` through `name_538.md`) with no semantic organization.
2. **Too many files in a single directory**: All 538 slices land in one flat directory, making it hard to browse, search, or manage. The filesystem structure provides no information about the document's logical organization.

## Proposed Solution

### 1. PDF Bookmark/Outline Extraction (Primary)

Extract the PDF's built-in bookmarks/outlines via `pdfplumber`'s underlying pdfminer (`pdf.doc.get_outlines()`). Convert bookmark entries to markdown headings (`#`, `##`, etc.) and inject them at the correct page positions before passing to `MarkdownParser`. This allows the existing heading-based splitting logic to naturally build a hierarchical directory structure.
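A minimal sketch of this step, assuming bookmark destinations are explicit page references (named destinations are skipped) and that each heading goes at the top of its target page, since an outline entry gives a page but not a vertical position. `read_outline()` and `inject_bookmark_headings()` are hypothetical helper names, not existing `PDFParser` methods.

```python
from collections import defaultdict


def read_outline(pdf):
    """Return (level, title, page_index) tuples from an open pdfplumber PDF.

    Hypothetical helper. Only destinations that point directly at a page
    object are resolved; named destinations are skipped in this sketch.
    """
    from pdfminer.pdfdocument import PDFNoOutlines
    from pdfminer.pdftypes import resolve1

    # Map pdfminer page object ids to 0-based page indices.
    index_by_objid = {p.page_obj.pageid: i for i, p in enumerate(pdf.pages)}
    try:
        outline = list(pdf.doc.get_outlines())
    except PDFNoOutlines:
        return []

    entries = []
    for level, title, dest, action, _se in outline:
        if dest is None and action is not None:
            # GoTo actions carry their destination under /D.
            action = resolve1(action)
            dest = action.get("D") if isinstance(action, dict) else None
        dest = resolve1(dest)
        if isinstance(dest, dict):
            dest = resolve1(dest.get("D"))
        # An explicit destination is an array whose first item is a page reference.
        if isinstance(dest, list) and dest and hasattr(dest[0], "objid"):
            page_index = index_by_objid.get(dest[0].objid)
            if page_index is not None:
                entries.append((level, title, page_index))
    return entries


def inject_bookmark_headings(pages, outline, max_level=6):
    """Prepend '#'-style headings to the pages that bookmarks point at."""
    headings_by_page = defaultdict(list)
    for level, title, page_index in outline:
        # Clamp to the 1..max_level range that markdown headings support.
        hashes = "#" * min(max(level, 1), max_level)
        headings_by_page[page_index].append(f"{hashes} {title.strip()}")

    parts = []
    for i, text in enumerate(pages):
        parts.extend(headings_by_page.get(i, []))  # bookmark order is preserved
        parts.append(text)
    return "\n\n".join(part for part in parts if part)
```

Inside a `pdfplumber.open(...)` block, usage would be roughly `inject_bookmark_headings([p.extract_text() or "" for p in pdf.pages], read_outline(pdf))`. pdfminer reports top-level bookmarks as level 1, so they become `#` headings.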

### 2. Font-Size Heading Detection (Fallback)

When a PDF has no bookmarks, analyze character-level font information from `page.chars` to detect headings:
- Identify body text size (most frequent font size)
- Classify text whose size exceeds the body size by at least a configurable delta (`font_heading_min_delta`) as headings
- Map font size tiers to heading levels (up to 4 levels)
- Group consecutive same-sized large characters into heading text
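The steps above could look roughly like this, operating on the character dicts that pdfplumber exposes in `page.chars` (each has `text` and `size` keys). `detect_font_headings()` is a hypothetical name; rounding sizes to 0.1 and the `min_delta=2.0` default are assumptions, with `min_delta` standing in for the proposed `font_heading_min_delta` setting.

```python
from collections import Counter
from itertools import groupby


def detect_font_headings(chars, min_delta=2.0, max_levels=4):
    """Return (level, text) headings found in a sequence of pdfplumber chars."""
    if not chars:
        return []
    # Round to absorb float noise such as 11.9999 vs 12.0.
    sizes = [round(c["size"], 1) for c in chars]

    # Body text size = the most frequent size, counted per character.
    body_size = Counter(sizes).most_common(1)[0][0]

    # Distinct sizes at least min_delta above body text, largest first.
    tiers = sorted({s for s in sizes if s - body_size >= min_delta}, reverse=True)
    # Largest tier -> level 1; tiers past max_levels collapse into the last level.
    level_of = {size: min(rank + 1, max_levels) for rank, size in enumerate(tiers)}

    headings = []
    # Consecutive characters of the same heading size form one heading.
    for size, run in groupby(zip(sizes, chars), key=lambda pair: pair[0]):
        if size not in level_of:
            continue
        text = "".join(ch["text"] for _, ch in run).strip()
        if text:
            headings.append((level_of[size], text))
    return headings
```

A fuller version would also record where each run starts (for example, the first character's `top`) so the heading can be placed at the right spot in the page text rather than only listed.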

### 3. Directory Auto-Grouping (Generic)

Add a `MAX_CHILDREN_PER_DIR` threshold (default 50) to `MarkdownParser`. When any single directory level would contain more files than the threshold:
- **No-heading path**: Group into numbered subdirectories (`doc_001-050/`, `doc_051-100/`)
- **Heading path**: Group consecutive same-level sections into subdirectories named by first/last section
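Both rules reduce to chunking an ordered list into runs of at most `MAX_CHILDREN_PER_DIR` items. A sketch under that reading, where the helper names and the `first-last` naming for heading groups are hypothetical, and three-digit zero padding follows the `doc_001-050/` example:

```python
def numbered_groups(stem, count, max_children=50):
    """No-heading path: split items 1..count into numbered subdirectories.

    Returns (dirname, first_index, last_index) tuples, or [] when the
    count already fits under the threshold and no grouping is needed.
    """
    if count <= max_children:
        return []
    groups = []
    for start in range(1, count + 1, max_children):
        end = min(start + max_children - 1, count)
        groups.append((f"{stem}_{start:03d}-{end:03d}", start, end))
    return groups


def heading_groups(titles, max_children=50):
    """Heading path: group consecutive same-level sections by first/last title.

    Returns (dirname, member_titles) tuples, or [] when no grouping is needed.
    """
    if len(titles) <= max_children:
        return []
    groups = []
    for start in range(0, len(titles), max_children):
        members = titles[start:start + max_children]
        first, last = members[0], members[-1]
        name = first if len(members) == 1 else f"{first}-{last}"
        groups.append((name, members))
    return groups
```

If the number of groups itself exceeds the threshold (more than 2,500 items at the default of 50), the same grouping would need to be applied again one level up.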

## Alternatives Considered

- **LLM-based structure inference**: Too expensive for parsing phase, and the current architecture deliberately avoids LLM calls during parsing.
- **Page-range based grouping only**: Simple but loses semantic meaning — doesn't leverage the document's actual structure.
- **Relying solely on MinerU**: Not always available; the local pdfplumber path should work well independently.

## Feature Area

Core (Client/Engine)

## Use Case

When users ingest large structured PDFs (textbooks, standards documents, legal codes, pharmacopoeias, technical manuals), the parsed output should preserve the document's chapter/section hierarchy as a directory tree. This makes it possible to:
- Browse resources by chapter
- Search within specific sections
- Load context at the right granularity (L0/L1/L2)
- Avoid filesystem performance issues from hundreds of files in one directory

## Implementation Plan

### Files to modify

| File | Changes |
|------|---------|
| `openviking/parse/parsers/pdf.py` | Add `_extract_bookmarks()`, `_detect_headings_by_font()`, modify `_convert_local()` to inject headings |
| `openviking/parse/parsers/markdown.py` | Add `MAX_CHILDREN_PER_DIR`, `_auto_group_sections()`, modify no-heading branch and `_process_sections_with_merge()` |
| `openviking_cli/utils/config/parser_config.py` | Add config fields: `heading_detection`, `font_heading_min_delta`, `max_children_per_dir` |

### Phases

1. **Bookmark extraction** — Extract PDF outlines, inject as markdown headings
2. **Font-size detection** — Fallback heading detection via character font analysis
3. **Directory auto-grouping** — Generic threshold-based subdirectory creation
4. **Configuration** — Wire new config fields into PDFConfig and ParserConfig
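Phases 1, 2, and 4 meet in how `heading_detection` selects a strategy. A minimal sketch, assuming hypothetical mode values (`auto`, `bookmarks`, `font`, `off`) and a hypothetical `font_heading_min_delta` default of 2.0; only the field names and the default of 50 for `max_children_per_dir` come from this proposal. `HeadingOptions` is a hypothetical stand-in for whichever of `PDFConfig`/`ParserConfig` ends up holding each field.

```python
from dataclasses import dataclass

VALID_MODES = ("auto", "bookmarks", "font", "off")  # hypothetical mode values


@dataclass
class HeadingOptions:
    """Hypothetical stand-in for the proposed PDFConfig/ParserConfig fields."""

    heading_detection: str = "auto"
    font_heading_min_delta: float = 2.0  # assumed default, in PDF points
    max_children_per_dir: int = 50  # default taken from this proposal

    def __post_init__(self):
        if self.heading_detection not in VALID_MODES:
            raise ValueError(f"heading_detection must be one of {VALID_MODES}")
        if self.font_heading_min_delta <= 0:
            raise ValueError("font_heading_min_delta must be positive")
        if self.max_children_per_dir < 2:
            raise ValueError("max_children_per_dir must be at least 2")


def pick_strategy(opts, has_bookmarks):
    """Bookmarks are primary; font-size detection is the fallback in 'auto'."""
    mode = opts.heading_detection
    if mode == "off":
        return "none"
    if mode == "bookmarks":
        return "bookmarks" if has_bookmarks else "none"
    if mode == "font":
        return "font"
    return "bookmarks" if has_bookmarks else "font"  # mode == "auto"
```

Validating in `__post_init__` surfaces a bad setting when the config is loaded instead of halfway through parsing a large PDF.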

## Additional Context

Example of current broken output for a 463-page PDF:
```
data/viking/my-team/resources/tmppi5lkjtk/
├── tmppi5lkjtk_1.md      # Pages 2-3 raw text
├── tmppi5lkjtk_2.md      # Page 4 raw text
├── ...
└── tmppi5lkjtk_538.md    # Last chunk
```

Expected output after fix:
```
data/viking/my-team/resources/中华人民共和国药典/
├── 第一部_药材/
│   ├── 川木通.md
│   ├── 川贝母.md
│   └── ...
├── 第二部_化学药/
│   └── ...
└── ...
```
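The expected tree follows from the heading sequence with a simple stack: each heading's path is the chain of currently open ancestors, sections with sub-headings become directories, and leaf sections become `.md` files. This is a sketch, not `MarkdownParser`'s actual logic; the `heading_paths()` name and replacing whitespace and `/` with `_` (suggested by `第一部_药材`) are assumptions.

```python
import re


def heading_paths(headings):
    """Map ordered (level, title) pairs to relative output paths."""

    def safe(title):
        # Whitespace and path separators become underscores.
        return re.sub(r"[\s/]+", "_", title.strip())

    paths = []
    stack = []  # (level, safe_title) for the currently open ancestors
    for i, (level, title) in enumerate(headings):
        # Close every open section at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, safe(title)))
        has_children = i + 1 < len(headings) and headings[i + 1][0] > level
        path = "/".join(name for _, name in stack)
        paths.append(path + "/" if has_children else path + ".md")
    return paths
```

A directory-level section's own introductory text would still need a file inside that directory; how that file is named is left to the implementation.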
