InterSphinx + DirHTML results in 'index' ref in objects.inv not loading correctly #7095

@pradyunsg

Describe the bug
When loading an objects.inv from a project built with the dirhtml builder, the index ref has an incorrect URI associated with it.

To Reproduce

$ git clone https://github.com/pypa/pip
$ cd pip
$ pip install -r tools/requirements/docs.txt
$ sphinx-build -W -d /tmp/doctrees/html -b dirhtml docs/html docs/build/html
$ python -m sphinx.ext.intersphinx ./docs/build/html/objects.inv | grep "\tindex"
        index                                    - Python Package Installer              : pip
        index-overview                           Overview                                : development/architecture/package-finding/#index-overview

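Notice that the first line has neither the correct title (pip - Python Package Installer) nor the correct URI (the empty string, ``): the title has lost its first word, which is reported as the URI instead.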

Expected behavior
The correct URI is associated with the index page (e.g. # or / or ``).

Your project
pip

Screenshots
N/A

Environment info

(does not matter, see below)

  • OS: macOS
  • Python version: 3.8.0
  • Sphinx version: 2.3.1
  • Sphinx extensions: sphinx.ext.extlinks, pip_sphinxext (repo local file), sphinx.ext.intersphinx
  • Extra tools: not needed. :)

Additional context

My investigation into this bug: pypa/pip#7130

This issue occurs due to:

  • the dirhtml builder providing uri = '' for the index page
  • the use of greedy \s+ in the regex used during loading of lines in the intersphinx objects.inv format.

During writing, InventoryFile.dump writes a line containing <priority><space><space><heading>. Notice the two spaces back to back, one on either side of the empty ("") uri. During loading, the regex matches whitespace with a greedy \s+ and does not allow an empty uri (it uses \S+ for matching uri), so both spaces are consumed as a single separator and the first word of the heading is taken as the uri.
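
The mis-parse can be reproduced in isolation. The sketch below is not Sphinx's actual code: OLD_PATTERN is reconstructed from the description in this issue (greedy \s+, non-empty \S+ uri) by reverting the two regex changes proposed under option 2, and format_line is a hypothetical helper that assumes the line layout "name domain:role priority uri dispname".

```python
import re

# Reconstructed from the issue text, not copied from Sphinx: whitespace is
# matched greedily (\s+) and the uri must be non-empty (\S+).
OLD_PATTERN = re.compile(r'(?x)(.+?)\s+(\S*:\S*)\s+(-?\d+)\s+(\S+)\s+(.*)')


def format_line(name, role, priority, uri, dispname):
    """Hypothetical helper: join the five fields with single spaces."""
    return f"{name} {role} {priority} {uri} {dispname}"


# An empty uri leaves two consecutive spaces between the priority and the
# display name.
line = format_line("index", "std:doc", -1, "", "pip - Python Package Installer")
assert "-1  pip" in line

name, role, priority, uri, dispname = OLD_PATTERN.match(line).groups()
print(repr(uri))       # the first word of the title ends up as the uri
print(repr(dispname))  # only the remainder of the title is left
```

The parsed uri and title line up with the grep output in the reproduction steps, where the index entry shows "- Python Package Installer" as its title and "pip" as its location.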


AFAICT, there are two changes that can be made to fix this:

  1. Stop returning "" as the uri for index in dirhtml's builder
    • is / or # a good option?
  2. Update regex to:
    • use \s+? instead of \s+ for matching whitespace non-greedily
    • allow for the uri part to be empty (\S* instead of \S+).
    (?x)(.+?)\s+?(\S*:\S*)\s+?(-?\d+)\s+?(\S*)\s+?(.*)
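
A quick check of the pattern proposed in option 2, as a sketch only. The two sample lines are made up for illustration (the roles and priorities are assumptions, and the second line simply reuses the index-overview entry from the output above), and re.match is assumed to be how the loader applies the pattern.

```python
import re

# The pattern proposed in this issue: lazy whitespace (\s+?) and an
# optional uri (\S*).
NEW_PATTERN = re.compile(r'(?x)(.+?)\s+?(\S*:\S*)\s+?(-?\d+)\s+?(\S*)\s+?(.*)')

# Illustrative lines; the roles and priorities are assumptions.
empty_uri_line = "index std:doc -1  pip - Python Package Installer"
normal_line = ("index-overview std:label -1 "
               "development/architecture/package-finding/#index-overview Overview")

empty_fields = NEW_PATTERN.match(empty_uri_line).groups()
normal_fields = NEW_PATTERN.match(normal_line).groups()

print(empty_fields)   # the uri field is the empty string, the title stays whole
print(normal_fields)  # a line with a non-empty uri still parses as expected
```

Because \S* may match nothing, the second of the two adjacent spaces is left over for the final separator, so the title is no longer split.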
    

Change 1 lets older Sphinx versions read an objects.inv written by newer Sphinx. Change 2 lets newer Sphinx versions read an objects.inv written by older Sphinx. Implementing both would likely be best for interoperability between Sphinx versions.

builder ↓ \ loader    old               new
old                   forever broken    2
new                   1                 1 or 2
