InterSphinx + DirHTML results in 'index' ref in objects.inv not loading correctly #7095
Description
Describe the bug
When loading from an objects.inv from a project built with the dirhtml builder, the index ref has an incorrect URL associated with it.
To Reproduce
$ git clone https://github.com/pypa/pip
$ cd pip
$ pip install -r tools/requirements/docs.txt
$ sphinx-build -W -d /tmp/doctrees/html -b dirhtml docs/html docs/build/html
$ python -m sphinx.ext.intersphinx ./docs/build/html/objects.inv | grep "\tindex"
index - Python Package Installer : pip
index-overview Overview : development/architecture/package-finding/#index-overview
Notice how the first line has neither the correct title (pip - Python Package Installer) nor the correct URI (the empty string).
Expected behavior
The correct URI is associated with the index page (e.g. # or / or the empty string).
Your project
pip
Screenshots
N/A
Environment info
(does not matter, see below)
- OS: MacOS
- Python version: 3.8.0
- Sphinx version: 2.3.1
- Sphinx extensions: sphinx.ext.extlinks, pip_sphinxext (repo local file), sphinx.ext.intersphinx
- Extra tools: not needed. :)
Additional context
My investigation into this bug: pypa/pip#7130
This issue occurs due to:
- the dirhtml builder providing uri = '' for the index page
- the use of greedy \s+ in the regex used during loading of lines in the intersphinx objects.inv format.
During writing, InventoryFile.dump writes a line containing <priority><space><space><heading> -- notice the two spaces back to back, one on either side of the empty ("") uri. This extra whitespace is greedily consumed by the regex, since it uses \s+ for matching the whitespace and it does not allow for an empty uri (it uses \S+ for matching uri).
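The mis-parse described above can be reproduced in isolation. The following is a minimal sketch, not Sphinx's actual code: the pattern is reconstructed from the description in this issue (greedy \s+, non-empty \S+ for the uri), and dump_line plus the std:doc object type are hypothetical stand-ins used only for illustration.

```python
import re

# Assumption: loader pattern reconstructed from this issue's description
# (greedy \s+ between fields, \S+ for the uri). Not copied from Sphinx.
OLD_LINE_RE = re.compile(r'(?x)(.+?)\s+(\S*:\S*)\s+(-?\d+)\s+(\S+)\s+(.*)')


def dump_line(name, objtype, priority, uri, title):
    """Hypothetical helper: join the five fields with single spaces."""
    return f"{name} {objtype} {priority} {uri} {title}"


# An empty uri leaves two adjacent spaces between the priority and the title.
line = dump_line("index", "std:doc", -1, "", "pip - Python Package Installer")

name, objtype, priority, uri, title = OLD_LINE_RE.match(line).groups()

print(repr(line))
print("uri   =", repr(uri))    # first word of the title ends up here
print("title =", repr(title))  # the title loses its first word
```

The greedy \s+ swallows both spaces, so \S+ has to take the first word of the title as the uri, which matches the broken output shown in the reproduction steps.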
AFAICT, there are two changes that can be made to fix this:
1. Stop returning "" as the uri for index in dirhtml's builder
   - is / or # a good option?
2. Update regex to:
   - use \s+? instead of \s+ for matching whitespace non-greedily
   - allow for the uri part to be empty (\S* instead of \S+).
(?x)(.+?)\s+?(\S*:\S*)\s+?(-?\d+)\s+?(\S*)\s+?(.*)
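As a quick check of the proposed pattern, the sketch below runs it against one line with an empty uri and one ordinary line. The parse helper, the dictionary keys, and the std:doc / std:label object types are assumptions made for illustration; only the regex itself comes from this issue.

```python
import re

# The pattern proposed in this issue: lazy whitespace, uri may be empty.
NEW_LINE_RE = re.compile(r'(?x)(.+?)\s+?(\S*:\S*)\s+?(-?\d+)\s+?(\S*)\s+?(.*)')


def parse(line):
    """Hypothetical helper: split one inventory line into named fields."""
    match = NEW_LINE_RE.match(line.rstrip())
    if match is None:
        return None
    name, objtype, priority, uri, title = match.groups()
    return {"name": name, "type": objtype, "priority": int(priority),
            "uri": uri, "title": title}


# Entry with an empty uri (two adjacent spaces after the priority).
empty_uri = parse("index std:doc -1  pip - Python Package Installer")

# Ordinary entry with a non-empty uri.
normal = parse(
    "index-overview std:label -1 "
    "development/architecture/package-finding/#index-overview Overview"
)

print(empty_uri)
print(normal)
```

Because each \s+? takes only the single separating space, the empty uri is preserved and the title stays whole, while lines with a non-empty uri parse as before.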
Change 1 lets older Sphinx versions read an objects.inv produced by newer Sphinx versions. Change 2 lets newer Sphinx versions read an objects.inv produced by older Sphinx versions. Implementing both would likely be best for interoperability between Sphinx versions. In the table below, each cell names the change that makes that builder/loader combination work.
| builder ↓ \ loader | old | new |
|---|---|---|
| old | forever broken | 2 |
| new | 1 | 1 or 2 |
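The compatibility table can be checked with a small simulation. This is a sketch under stated assumptions: the "old" pattern is reconstructed from this issue's description, "#" is only a placeholder for whichever non-empty uri change 1 would settle on, and round_trips and the std:doc type are hypothetical.

```python
import re

# Assumption: "old" loader pattern reconstructed from this issue's description.
OLD_RE = re.compile(r'(?x)(.+?)\s+(\S*:\S*)\s+(-?\d+)\s+(\S+)\s+(.*)')
# "new" loader pattern as proposed in this issue (change 2).
NEW_RE = re.compile(r'(?x)(.+?)\s+?(\S*:\S*)\s+?(-?\d+)\s+?(\S*)\s+?(.*)')

TITLE = "pip - Python Package Installer"
# Hypothetical: "#" stands in for the non-empty uri a fixed builder would emit.
BUILDER_URI = {"old": "", "new": "#"}
LOADER_RE = {"old": OLD_RE, "new": NEW_RE}


def round_trips(builder, loader):
    """Return True if the loader recovers the uri and title the builder wrote."""
    uri = BUILDER_URI[builder]
    line = f"index std:doc -1 {uri} {TITLE}"
    groups = LOADER_RE[loader].match(line).groups()
    return groups[3] == uri and groups[4] == TITLE


results = {
    (builder, loader): round_trips(builder, loader)
    for builder in ("old", "new")
    for loader in ("old", "new")
}

for (builder, loader), ok in results.items():
    print(f"builder={builder:<3} loader={loader:<3} -> {'ok' if ok else 'broken'}")
```

Only the old builder paired with the old loader fails, which is the "forever broken" cell: neither change can repair an inventory that is already published and read by an already released Sphinx.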