Add a new references builder#12190
Conversation
|
(cc also @webknjaz, as I can't add you as a reviewer) |
|
(test failure is likely because of a side effect) |
yeh hmm works locally (when calling the singular test), but I perhaps I can't "piggy-back" on the existing |
|
anyway, whilst I fix that, interested to hear your thoughts |
picnixz
left a comment
There was a problem hiding this comment.
A bit of comments (I'll be less available from now)
There was a problem hiding this comment.
For new files like that, could we have explicit __all__ (empty by default if possible, since we don't really know what should be public or not).
|
|
||
| class LocalReference(TypedDict, total=False): | ||
| type: Literal['local'] | ||
| document: str |
There was a problem hiding this comment.
In general, we use document for something else I think. Would it be possible to use docname instead? (I didn't see yet but is it a full path or not?). If so, I'd suggest path instead of filepath. Because document is generally.. the document node.
There was a problem hiding this comment.
Fair, its the full path as thats more helpful for users, so yeh could change to filepath
| from __future__ import annotations | ||
|
|
||
| import json | ||
| from os import path |
There was a problem hiding this comment.
(Actually, it's essentially to reduce the possibility of having a variable shadowing the import)
| data[domainname][otype_name] = {'items': {}} | ||
| if otype := domain.object_types.get(otype_name): | ||
| data[domainname][otype_name]['roles'] = list(otype.roles) | ||
| data.setdefault(domainname, {}).setdefault(otype_name, {})['items'].setdefault( |
There was a problem hiding this comment.
| data.setdefault(domainname, {}).setdefault(otype_name, {})['items'].setdefault( | |
| data[domainname].setdefault(otype_name, {})['items'].setdefault( |
There was a problem hiding this comment.
Or better: use intermediate variables here (because it's a bit hard to parse).
| 'url': url, | ||
| } | ||
| # only add dispname if it is set and not the same as name | ||
| if not (dispname == name or not dispname or dispname == '-'): |
| otype := local_domain.object_types.get(otype_name) | ||
| ): | ||
| data[domainname][otype_name]['roles'] = list(otype.roles) | ||
| data.setdefault(domainname, {}).setdefault(otype_name, {})[ |
| # test the content of the reference file | ||
| content = (app.outdir / 'references.json').read_text('utf-8') | ||
| data = json.loads(content) | ||
| assert data == { |
There was a problem hiding this comment.
Ok, this one might need a factory for the sake of readability.
| # test the content of the reference file | ||
| content = (app.outdir / 'references.json').read_text('utf-8') | ||
| data = json.loads(content) | ||
| assert data == { |
If you are worried about that, use |
Thanks for the review @picnixz, but perhaps I could nudge you for some quick general feedback on the concept 😅 |
|
Read through the new But, here are some other thoughts. Reaction to the 'generate a complete local & remote references list' idea --- +0.25.It might be helpful having all targets, local or intersphinx, in one artifact? But after thinking about it, I don't think it's very important to me, personally---and, it seems to me the bigger problem is the object-type lossiness of the current v2 I think I would rather have better/more accurate information about the targets in my intersphinx-referenced docsets---which would require a new inventory format, as best I figure---than a list of all local and remote references, where the info I get on the remote references in that all-in-one artifact requires as much work to transform into a working cross-reference as the info I can get out of If I'm trying to reference something in another project, I know which project it is, and I don't mind pointing a single-docset tool at that project's docs. (And, there's a good chance I might prefer that single-docset tool if I don't have to mess with an intermediate data file as part of the process.) The 'all in one place' aspect of this may have a broad appeal, but it's less important to me, personally. Reaction to the layout of
|
|
I'll comment tomorrow (for this one, I need a bit of sleep) |
|
A core problem is the use of Currently |
|
Since we are talking about a new Intersphinx format, I would like you to also think about how to serialize the entries in the inventories, especially concerning #11932. After reading Jakob's argument, I also think that domains should be responsible for serializing their intersphinx part however they see fit. It could also solve multiple issues that I could not necessarily find when implementing #11932 but if each domain knows how to properly represent their references in intersphinx, it would be better. In addition, we could change the format of a specific domain (e.g., if there are bugs) without affecting the format of other domains. I suggest using the same approach as for ELF where there is a header section containing the location of each program section. Then each domain would serialize its own intersphinx inventory and intersphinx would only be responsible for merging the parts together. Then, each domain would deserialize its dedicated section and recover its references mapping. The references builder you are suggesting would be responsible to normalize each domain output in a more human-readable format. In the JSON output, you would include "human-readable" entries + an offset and buffer size to the serialized data in the objects.inv binary file. That way, you can use it to recover a single referencable entity, and using in a standalone manner as well. |
|
This PR add a new builder
references, to build a singlereferences.json, which provides a mapping for almost* all targets available to reference in the project, including:objects.invconfigured viaintersphinx_mapping(when using thesphinx.ext.intersphinxextension)* I say almost, because this assumes the objects returned from
domain.get_objectsaccount for the majority of referencable items in a project, but there are currently some notable exceptions, like themathdomain not returning any (but that is for another PR to fix)This partialy addresses #12152, to allow for a clear way for users to understand:
Crucially, the
references.jsonincludes the mapping ofobject typetorole names(this can be one-to-many),since a role name is required for the reference syntax, not the object type.
I would also invisage other tools (like VS Code extension) could utilise this, to provide things like auto-completions, and "jump to target/references"
Some considerations:
I feel a builder is really the only way to do this comprehensively; having a standalone CLI (like the current
python -m sphinx.ext.intersphinx) can only get you so far, and then you will have to start re-implementing features of a normal sphinx build (like reading configuration, etc)Perhaps in a follow up PR I could introduce a complimentary CLI, that reads the
references.jsonand allows users to quickly generate references. Something likesphinx-ref find 're.Match'returning:class:`~re.Match`(i.e https://github.com/orgs/sphinx-doc/discussions/12152#discussioncomment-8862652)There are cases where an
object typehas no matchingrole names, this PR is not addressing that (although I want to eventually)As I mention in Make intersphinx (a.k.a. external references) more user friendly #12152, it would be ideal for this to include, not just the document path where a local target is defined, but also the line number (if available). But this is not within the scope of this PR
Creating a singular
references.jsonis probably the simplest way to do this. But, it could get rather large, for a large project, or one with lots of intersphinx mappings.Is this ok, or do we think another format would be better, like one JSON file per domain / object type, or even something like an sqlite database file?
The other non included in this PR, is any additions to the documention, I could do this here or in a follow-up PR