Skip to content

Better file gathering and validation for the info plugin#6826

Merged
squidfunk merged 6 commits intosquidfunk:masterfrom
kamilkrzyskow:fix-info-plugin
Mar 5, 2024
Merged

Better file gathering and validation for the info plugin#6826
squidfunk merged 6 commits intosquidfunk:masterfrom
kamilkrzyskow:fix-info-plugin

Conversation

@kamilkrzyskow
Copy link
Copy Markdown
Collaborator

@kamilkrzyskow kamilkrzyskow commented Feb 24, 2024

Fixes #6750

Took a bit longer than I expected to make a minimal viable improvement, or perhaps I was running circles in my head for too long instead of actually coding via trial and error 😅

I also played a bit too much with the validation for different things relative to the root directory.
I'll simplify it, but I don't want to remove it completely.

As for the info.gitignore I hope the capabilities aren't too limiting compared to real .gitignore logic.
It's likely I'm missing something when it comes to fnmatch.

Once I'm closer to finishing, I'll also squash some of the commits to keep only relevant final changes.

It's worth commenting that Snippets or any markdown_extension use the default working directory,
so the base_path is always relative to mkdocs build command, not relative to the config file. By default it also
doesn't crash when a include snippet path isn't resolved, just quietly omits it.

Task list:

  • Handle INHERIT configs
  • Read every file, not only the ones that MkDocs sees
  • Exclude paths we don't want
  • Basic validation for paths to assure mkdocs build at the top root directory.
    Currently it's not basic enough, rather convoluted.
  • Improve fnmatch handling
  • Test with the community multi-lang project Setting up multi-language installations #2346
  • Test with a projects plugin docs, like the examples repository
  • Fix styling (rough 80 lines), naming consistency etc.
  • Squash commits into categorized ones
  • Switch to regex from fnmatch

Copy link
Copy Markdown
Owner

@squidfunk squidfunk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I see that you went down the 'more specific' road. As I argued in #6750 (comment), I don't think this is what we should do, because we'll be adding exceptions, exceptions and more exceptions. For instance, mkdocstrings usage is not covered in this solution, as far as I can tell. Additionally, the site package exclusion is based on git, which we should do programmatically, because we'll be adding more and more entries to the .gitignore as we discover other preferences for naming folders for virtual environments.

Moreover, configuration inheritance will likely not catch all cases, as we discussed in #6750. Collections of paths is based on a curated list of names of parameters, which is too labor intense, given that packing up of bug reports should 'just work'. Thus, as outlined in my last comment, I regard it as a better approach to go down the following road:

  • Collect all files recursively from current directory
  • Filter files that are in our .gitignore or in site-packages (this is essential)
  • Iterate and refine our .gitignore if we find other files that we always want to exclude

This should also allow for the projects plugin and other multi-configuration plugins to be accurately collected + is a solution that does not require maintenance from our side as the ecosystem grows.

@kamilkrzyskow
Copy link
Copy Markdown
Collaborator Author

kamilkrzyskow commented Feb 25, 2024

I see that you went down the 'more specific' road.

Yes, I did, but I want to limit the specificity to MkDocs itself, as I mentioned:

I also played a bit too much with the validation for different things relative to the root directory.
I'll simplify it, but I don't want to remove it completely.

which wasn't clear enough, sorry. I want to keep validation for:

  • docs_dir,
  • INHERIT
  • docs_dir in the inherited config(s)
  • INHERIT in the inherited configs(s)

I believe this is a good bare minimum to detect running mkdocs build from the wrong directory, it might be enough to check docs_dir, but INHERIT can influence docs_dir therefore I opted into all 4. EDIT: All 2, as it's enough to check the loaded MkDocsConfig because the loaded config.docs_dir is already the final merged docs_dir after the inheritance, no need to be so thorough and check recursively.

In the code currently there is also the validation for markdown_extensions, which I checked mostly use 3 generic names for path options, but it will be removed soon, as by default, running mkdocs build --config-file=configs/mkdocs.yml will break Snippets if they were setup to work with mkdocs build --config-file=mkdocs.yml, so if Snippets don't support it, then we also don't have to.

I agree that mkdocsstrings shouldn't be validated via the info plugin as it's not a part of the python dependencies of mkdocs-material, Python Markdown Extensions are, so initially I wanted to support Snippets as well but it turned out unnecessary.

If you still insist that this simple validation is out of scope I'll remove it completely ✌️


Additionally, the site package exclusion is based on git

Not completely accurate. In the info.gitignore I added the mostly used generic venv names, as per the Python.gitignore, however it's not the only place where it's checked:

# Exclude the site-packages directory
for path in site.getsitepackages():
if currentpath.startswith(path):
log.debug(f"Excluded Case #1: {path}")
return True

Note to self: I should probably add a comment about that in the info.gitignore 🤔

I think the user might have different virtual environments, perhaps one with mkdocs-material and another with mkdocs-material-insiders, and they would often switch between the 2, so the site.getsitepackages() would not detect an inactive venv, but it would be detected by info.gitignore. I can see that this might be "overprotective" for the user, but I don't see downsides with this approach, unless the user decides to add a docs/venv directory with actual contents for reproduction...

... which we should do programmatically,

Yes, I understand that there might be more custom generated paths in the project directory. I already add a site_dir exclusion pattern.

self.exclusion_patterns.append(os.path.basename(config.site_dir) + "/")

Note to self: Now that I think about it, I forgot about the .cache directory, also since we don't support hooks *.py would also be reasonable 🤔

I'm a bit busy today, but I will still try to clean up the PR before "tomorrow" comes 😅

EDIT: After some more pondering I think that docs/.../site/ or docs/.../env/ isn't such an unreasonable setup, so the current site/ or env/ exclusion would perhaps impact the zip creation negatively 😞 I will have to make sure that we can check for /site in the exclusion patterns for root/site, which is not possible currently.

@squidfunk
Copy link
Copy Markdown
Owner

Checking for docs_dir is a good idea. However, I think supporting INHERIT would require potentially going up a very long chain of inheritance, or we'll loose information. Letting mkdocs resolve INHERIT and only packing up the result will make it impossible for us to troubleshoot whether the problem actually stems from configuration inheritance. It's safer to just ignore it and just include all files that look like mkdocs.yml configurations. We could also include the command that was used for running mkdocs build when packing up the example, in order to disambiguate in case of multiple configurations. We're trying to make it less error prone than it currently is, and from my experience writing and maintaining the plugin, it means that we should do less validation and pack up more selectively.

So, to sum up: the aim should be: pack up all files that are used to run mkdocs build without losing any information. If we pack up resolved stuff, we'll loose information, which might involve more time for debugging and troubleshooting.

I agree that mkdocsstrings shouldn't be validated via the info plugin as it's not a part of the python dependencies of mkdocs-material, Python Markdown Extensions are, so initially I wanted to support Snippets as well but it turned out unnecessary.

Yet, packing up an example with mkdocstrings should work. We should not limit the info plugin to the plugins we officially support, because users might report bugs with plugins that we might want to support by learning about them through bug reports. Thus, less validation, and a more forgiving packing up procedure is key, IMHO.

I think the user might have different virtual environments, perhaps one with mkdocs-material and another with mkdocs-material-insiders, and they would often switch between the 2, so the site.getsitepackages() would not detect an inactive venv, but it would be detected by info.gitignore. I can see that this might be "overprotective" for the user, but I don't see downsides with this approach, unless the user decides to add a docs/venv directory with actual contents for reproduction...

Yet, only one environment is active. I doubt that users use two virtual environments in one project. It sounds like an anti-pattern to me, and IMHO, we should optimize for the common case where only a single environment is present, i.e., the active environment, which we should resolve as discussed.

Yes, I understand that there might be more custom generated paths in the project directory. I already add a site_dir exclusion pattern.

Sounds good! We likely need to repeat this for all projects, i.e, all mkdocs.yml files we find. For completeness: we cannot exclude HTML files, because they can be customizations (which you can also force the plugin to pack with another flag), or static templates in docs_dir.

I'm a bit busy today, but I will still try to clean up the PR before "tomorrow" comes 😅

No hurry! I hope my reasoning makes sense. If not, don't hesitate to ask. I hope I could make it clearer why we need to make the plugin more resilient and less strict, yet keep it simple and powerful. Too much magic is like not a good idea for something that should work with minimal effort from a usage and maintenance perspective.

EDIT: After some more pondering I think that docs/.../site/ or docs/.../env/ isn't such an unreasonable setup, so the current site/ or env/ exclusion would perhaps impact the zip creation negatively 😞 I will have to make sure that we can check for /site in the exclusion patterns for root/site, which is not possible currently.

The venv case should be caught by excluding site-packages, but the site directory inside the docs directory sounds like an anti-pattern to me. Is that even supported by MkDocs? Additionally, resolving the entire paths before removing files from the packing procedure should resolve both cases, shouldn't it? I mean: pack up docs, but don't pack up docs/.../site.

@kamilkrzyskow
Copy link
Copy Markdown
Collaborator Author

kamilkrzyskow commented Feb 26, 2024

I feel like there is some misunderstanding, because

Letting mkdocs resolve INHERIT and only packing up the result will make it impossible for us to troubleshoot whether the problem actually stems from configuration inheritance. It's safer to just ignore it and just include all files that look like mkdocs.yml configurations.

indicates to me that you believe I'm using resolve INHERIT to get the parent.yml file and then include it in the zip file. If I'm wrong then sorry. I resolve INHERIT because MkDocs doesn't include it in MkDocsConfig, and I want to validate that it is inside the mkdocs build directory. If it's not I report it to the user, similar to the version or customization check.

Apart of that I don't use the INHERIT config, and use everything that is provided by the already validated and resolved MkDocsConfig and everything inside the current working directory of mkdocs build. So this should meet the criteria for:

So, to sum up: the aim should be: pack up all files that are used to run mkdocs build without losing any information

I just want to avoid missing any files, so if there is any invalid path outside of CWD the user gets notified and the program ends.

I ended up with checking the following paths, only to inform the user that the info plugin won't find the files:

# Validate different MkDocs paths to assure that
# they're children of the current working directory.
paths_to_validate = [
config.config_file_path,
config.docs_dir,
projects_dir,
*[cfg.get("INHERIT", "") for cfg in loaded_config]
]

# Print help on not in current working directory and exit
def _help_on_not_in_cwd(self, bad_paths):
print(Fore.RED)
print(" The current working (root) directory:\n")
print(f" {os.getcwd()}\n")
print(" is not a parent of the following paths:")
print(Style.NORMAL)
for path in bad_paths:
print(f" {path}\n")
print(" To assure that all project files are found")
print(" please adjust your config or file structure and")
print(" put everything within the root directory of the project.\n")
print(" Please also make sure `mkdocs build` is run in")
print(" the actual root directory of the project.")
print(Style.RESET_ALL)
# Exit, unless explicitly told not to
if self.config.archive_stop_on_violation:
sys.exit(1)


We could also include the command that was used for running mkdocs build when packing up the example, in order to disambiguate in case of multiple configurations.

Good idea. I will also consider adding some kind of output/log copy in the zip. But perhaps that's a feature for a future PR.

we should optimize for the common case where only a single environment is present, i.e., the active environment, which we should resolve as discussed.

OK, I removed the static names, I also modified the exclusion logic, so that exclusion patterns for venvs are generated.

We likely need to repeat this for all projects, i.e, all mkdocs.yml files we find. For completeness: we cannot exclude HTML files, because they can be customizations (which you can also force the plugin to pack with another flag), or static templates in docs_dir.

I considered excluding site_dir based on the inclusion of sitemap.xml because that's not a file that one can easily override.
However, I stuck to the exclusion pattern generation, so I needed to resolve the mkdocs.yml files for each project. This way I resovle each custom name a user could have set.

Note to self: I forgot to add custom_dir handling for the customizations😰

No hurry!

Thanks, but not only did I completely miss both deadlines I set in the PR, I suggested I would fix the plugin some months ago 🥶 So I'm trying to focus on this PR for now.

The venv case should be caught by excluding site-packages, but the site directory inside the docs directory sounds like an anti-pattern to me. Is that even supported by MkDocs? Additionally, resolving the entire paths before removing files from the packing procedure should resolve both cases, shouldn't it? I mean: pack up docs, but don't pack up docs/.../site.

I meant paths that are called docs/.../site and docs/.../venv that have actual documentation index.md contents. Before the paths would be falsely excluded, now they won't.


Here is a slightly modified examples repository with the projects plugin:
7-projects-examples.zip
if you like you can run it with the PR branch and mkdocs build -v to see the debug info about the excluded paths:

The <-- is added here in the comment

DEBUG   -  Excluded pattern '*.cache/': /.cache/ <-- static inside of our .gitignore
DEBUG   -  Excluded pattern '/site/': /site/ <-- dynamic added in code 
DEBUG   -  Excluded pattern '/venv/': /venv/ <-- dynamic added in code 
DEBUG   -  Excluded pattern '*.zip': /9.5.11+insiders.4.53.0-q.zip <-- static inside of our .gitignore
DEBUG   -  Excluded pattern '/examples/fonts_site_dir/': /examples/fonts_site_dir/ <-- dynamic added in code 
DEBUG   -  Excluded pattern '*__pycache__/': /examples/__pycache__/ <-- static inside of our .gitignore
DEBUG   -  Excluded pattern '/examples/blog-basic/site/': /examples/blog-basic/site/ <-- dynamic added in code 
DEBUG   -  Excluded pattern '*__pycache__/': /hooks/__pycache__/ <-- static inside of our .gitignore
DEBUG   -  Excluded pattern '*[!__init__].py': /hooks/archive.py <-- static inside of our .gitignore

The gathered project zip (inside of the zip above) isn't runnable because the archive.py hook is excluded and the config file still includes it.

I think it works fine, perhaps the exclusion is too lax.
I still need to test the community multi-language setup
And add the command info, fix styling.

@squidfunk
Copy link
Copy Markdown
Owner

squidfunk commented Feb 27, 2024

indicates to me that you believe I'm using resolve INHERIT to get the parent.yml file and then include it in the zip file. If I'm wrong then sorry. I resolve INHERIT because MkDocs doesn't include it in MkDocsConfig, and I want to validate that it is inside the mkdocs build directory. If it's not I report it to the user, similar to the version or customization check.

Apart of that I don't use the INHERIT config, and use everything that is provided by the already validated and resolved MkDocsConfig and everything inside the current working directory of mkdocs build. So this should meet the criteria for:

I think I misread the implementation then, sorry! In that case, yes, we can try including the INHERIT validation and test it in the wild ☺️

Good idea. I will also consider adding some kind of output/log copy in the zip. But perhaps that's a feature for a future PR.

Also a great idea, albeit it might be a catch-22?

Thanks, but not only did I completely miss both deadlines I set in the PR, I suggested I would fix the plugin some months ago 🥶 So I'm trying to focus on this PR for now.

Again, no hurry. Albeit we're trying to fix bugs in the plugin, they happen only very rarely with most bug reports being complete and runnable (if users stick to our bug reporting guidelines and include the reproduction in the first place).

I meant paths that are called docs/.../site and docs/.../venv that have actual documentation index.md contents. Before the paths would be falsely excluded, now they won't.

Ah yes, because of the ignore patterns. Yes! I read that this is another reason to be very careful when using exclusion patterns, so doing exclusion as programmatically as possible sounds like a safer approach.

The gathered project zip (inside of the zip above) isn't runnable because the archive.py hook is excluded and the config file still includes it.

Hmm, so the hook should be included I think, albeit the plugin should warn you about using hooks. We also need to make sure that mkdocstrings works as expected. Here's an example that was manually packed because the info plugin did not include all files. We need to make sure that the packed archive has exactly the same contents as this:
9.5.9+insiders.4.52.2-link-style.zip

I think it works fine, perhaps the exclusion is too lax.
I still need to test the community multi-language setup
And ad the command info, fix styling.

We'll learn along the way ☺️

@kamilkrzyskow
Copy link
Copy Markdown
Collaborator Author

I think I misread the implementation then, sorry! In that case, yes, we can try including the INHERIT validation and test it in the wild ☺️

It's good we're finally on the same page. The confusion probably came from the first commit I left for the time being and it will be squashed later..., it included the INHERIT path inclusion statically, but in a further commit I changed to the recursive path lookup. It's definitely not the cleanest PR (as usual), but I wanted to show that I started working on it, instead of make "assurances" in the Issue comments. Your comments definitely keep me within bounds 😄

Hmm, so the hook should be included I think, albeit the plugin should warn you about using hooks.

I forgot to mention that it did warn me about using hooks, so I enabled the config option that allows to create the archive despite bad validation. But when the user sees the warning they can just remove the hooks section from the config file, so I can't use config.hooks to dynamically ignore them from inclusion.

We also need to make sure that mkdocstrings works as expected. Here's an example that was manually packed because the info plugin did not include all files. We need to make sure that the packed archive has exactly the same contents as this:

From the example zip I see it uses a __init__.py script just like the file in the examples repository. The current exclusion pattern would include it, but I'll need to think about it some more, because there is surely a case where a random_name.py is used inside of the project that isn't a hook.

9.5.11+insiders.4.53.0-strings.zip

The contents are the same, but it contains 2 requirements.lock.txt and 2 platform.json 🤯 I didn't know a zip file can do that.

LOG
INFO    -  Started archive creation for bug report

Please name your bug report (2-4 words): strings
DEBUG   -  Excluded pattern '*.zip': /9.5.11+insiders.4.53.0-strings.zip
INFO    -  UserWarning: Duplicate name: '9.5.11+insiders.4.53.0-strings/requirements.lock.txt'
  File "C:\MyFiles\Python312\Lib\zipfile\__init__.py", line 1684, in _open_to_write
    self._writecheck(zinfo)
  File "C:\MyFiles\Python312\Lib\zipfile\__init__.py", line 1786, in _writecheck
    warnings.warn('Duplicate name: %r' % zinfo.filename, stacklevel=3)

INFO    -  UserWarning: Duplicate name: '9.5.11+insiders.4.53.0-strings/platform.json'
  File "C:\MyFiles\Python312\Lib\zipfile\__init__.py", line 1684, in _open_to_write
    self._writecheck(zinfo)
  File "C:\MyFiles\Python312\Lib\zipfile\__init__.py", line 1786, in _writecheck
    warnings.warn('Duplicate name: %r' % zinfo.filename, stacklevel=3)

INFO    -  Archive successfully created:

  9.5.11+insiders.4.53.0-strings/docs/index.md 404.0 B
  9.5.11+insiders.4.53.0-strings/mkdocs.yml 342.0 B
  9.5.11+insiders.4.53.0-strings/platform.json 282.0 B
  9.5.11+insiders.4.53.0-strings/platform.json 79.0 B
  9.5.11+insiders.4.53.0-strings/requirements.lock.txt 350.0 B
  9.5.11+insiders.4.53.0-strings/requirements.lock.txt 395.0 B
  9.5.11+insiders.4.53.0-strings/src/package/__init__.py 288.0 B

  9.5.11+insiders.4.53.0-strings.zip 3.4 kB

Perhaps the best approach would be to allow the user to set config.hooks, and later use the paths as exclusion patterns. This would not include hooks, but would include all other python scripts.
The project wouldn't be runnable because the config file expects the existence of the hooks, but as we have to remove the info plugin from the config inside the zip too, then perhaps this wouldn't be this big of an issue.
Some warning should probably stay to make the user aware that we won't troubleshoot their custom hooks in a bug report. 🤔

Another approach is a bit risky exclusion pattern again */hooks/*.py where I expect the user to save the hooks indie of a directory called hooks, but I've seen cases where hooks are next to mkdocs.yml


Actually all of the above applies to theme.custom_dir as well. Because by default it prevents the plugin from running, but the user might just comment out custom_dir from their config, and leave the files next to the config... So the files will be included.

I guess the previous warnings were created because there was no exclusion logic to prevent files from being added to the zip. Since there is such a thing now, then perhaps the warnings shouldn't be as strict.

@squidfunk
Copy link
Copy Markdown
Owner

I forgot to mention that it did warn me about using hooks, so I enabled the config option that allows to create the archive despite bad validation. But when the user sees the warning they can just remove the hooks section from the config file, so I can't use config.hooks to dynamically ignore them from inclusion.

Perhaps the best approach would be to allow the user to set config.hooks, and later use the paths as exclusion patterns. This would not include hooks, but would include all other python scripts.

The project wouldn't be runnable because the config file expects the existence of the hooks, but as we have to remove the info plugin from the config inside the zip too, then perhaps this wouldn't be this big of an issue.

Another approach is a bit risky exclusion pattern again /hooks/.py where I expect the user to save the hooks indie of a directory called hooks, but I've seen cases where hooks are next to mkdocs.yml

Actually all of the above applies to theme.custom_dir as well. Because by default it prevents the plugin from running, but the user might just comment out custom_dir from their config, and leave the files next to the config... So the files will be included.

The idea of the "ignore violation" flag is: warn the user about customizations and hooks and terminate unless the flag is set, so if the plugin progresses beyond the termination because of the flag, customizations and hooks should be packaged up as well. This might be necessary in very rare cases, e.g. for the customizations we advertise on our docs. Thus, customizations and hooks should never be ignored. The inline comments or our previous conversation might not have made this clear.

Additionally, there's no need to remove info from plugins. We shouldn't make mkdocs.yml transformations, as they might introduce new bugs which makes the plugin less resilient. This plugin needs to be maximally resilient.

From the example zip I see it uses a init.py script just like the file in the examples repository. The current exclusion pattern would include it, but I'll need to think about it some more, because there is surely a case where a random_name.py is used inside of the project that isn't a hook.

We should include it, e.g., as we advertise something similar for the projects plugin transform (to make usage simpler).

The contents are the same, but it contains 2 requirements.lock.txt and 2 platform.json 🤯 I didn't know a zip file can do that.

That is very, very weird. I'm not sure why this is the case!?

@squidfunk
Copy link
Copy Markdown
Owner

As an addendum: I wouldn't handle the case where customizations are packaged up that are not used. This would mandate resolving things from mkdocs.yml. The current iteration of the plugin is too restrictive, and we should now try opening up those restrictions to see "how bad it gets" when we pack up more stuff. The plugin will also warn about the archive being too large, but we should test whether quality and resilience of reproductions improves with making it less strict.

@kamilkrzyskow
Copy link
Copy Markdown
Collaborator Author

Additionally, there's no need to remove info from plugins. We shouldn't make mkdocs.yml transformations, as they might introduce new bugs which makes the plugin less resilient. This plugin needs to be maximally resilient.

I didn't mean in the plugin. I just stated the fact that after unpacking a packed zip, before running mkdocs build locally, we need to remove the info plugin manually, as otherwise it will create yet another zip. So removing hooks or anything else shouldn't be a problem either.

That is very, very weird. I'm not sure why this is the case!?

I don't know why the Python ZipFile allows for that, or why doesn't 7-zip crash when I open the zip later.
However, it happened because I kept the unpacked requirements.lock.txt and platform.json files from your file inside the directory. As this is a very edgy edge case, I don't think there is need to prevent it.

Thus, customizations and hooks should never be ignored.

I wouldn't handle the case where customizations are packaged up that are not used.

Happy days, I don't have to do 4D logic in my head to handle this branching logic.

I finally tested #2346 - (9-community-multi-lang.zip)

And it turned out that we have to exclude site_dir based on the existence of sitemap.xml and sitemap.xml.gz, because the separate mkdocs.yml only saw their own site_dir and not of the other languages.

Note to self: Perhaps checking sitemap.xml.gz is enough 🤔


So I think I'm finished, I only need to fixup the commits and squash them into 2 or 3 properly categorized names.
I also noticed that the plugin class or the on_config function specifically grew quite large, so perhaps I should extract the validation and exclusion logic out of the main class into other files 🤔
Any thoughts?

@squidfunk
Copy link
Copy Markdown
Owner

I didn't mean in the plugin. I just stated the fact that after unpacking a packed zip, before running mkdocs build locally, we need to remove the info plugin manually, as otherwise it will create yet another zip. So removing hooks or anything else shouldn't be a problem either.

Only if you run mkdocs build instead of mkdocs serve or changed the settings. In my daily work, I find it to be quite convenient that it is usually disabled when serving, which is probably 80% of cases. Thus I understand that you're talking about manually removing hooks when we inspect the reproduction. Yes, that's okay, but we should still warn the user and the archive should only proceed packing if forced, like it currently is implemented.

I don't know why the Python ZipFile allows for that, or why doesn't 7-zip crash when I open the zip later.
However, it happened because I kept the unpacked requirements.lock.txt and platform.json files from your file inside the directory. As this is a very edgy edge case, I don't think there is need to prevent it.

Agreed.

And it turned out that we have to exclude site_dir based on the existence of sitemap.xml and sitemap.xml.gz, because the separate mkdocs.yml only saw their own site_dir and not of the other languages.

I'm not sure I entirely understand. So, you essentially want to use sitemap.xml as a criterion to determine if something is a site directory or not in order to not open the can of worms that is trying to resolve the site directory? #2346 is quite special because it keeps configuration files somewhere else entirely – different from what we now recommend with the projects plugin (more detailed tutorials need to be written, yes) where users should just keep them in the projects. Thus, I'm not sure #2346 is a case we should be optimizing for. Would that change the solution? Sorry if I'm missing something, I'm on and off this topic as I'm currently working on entirely different layers of this project 😅

So I think I'm finished, I only need to fixup the commits and squash them into 2 or 3 properly categorized names.
I also noticed that the plugin class or the on_config function specifically grew quite large, so perhaps I should extract the validation and exclusion logic out of the main class into other files 🤔
Any thoughts?

I've started modernizing and professionalizing our Python code base with the latest iteration of the tags plugin in Insiders. Eventually, I want to migrate all plugins to those patterns, since they are now properly documented with Docstrings and cut much, much smaller. If you find a good abstraction of things in the same style, you can definitely go ahead. We can also discuss the conventions I used, as Python is still like Spanish for me – still learning a lot.

@squidfunk
Copy link
Copy Markdown
Owner

To add on the last part: if this is too much of an ask, please don't bother. I see it as my obligation to do that, but since you asked for refactoring opportunities that could be taken, I wanted to point out my latest advancements in this area. I will definitely refactor the info plugin at some point, but if you want to go ahead and help with that, you are very much invited.

Again, and I can't stress this enough: thanks for being such an awesome part of our inner community! Your time investments into Material for MkDocs are very valued and much appreciated. Great work!

@kamilkrzyskow
Copy link
Copy Markdown
Collaborator Author

Only if you run mkdocs build instead of mkdocs serve [...]

🙄I forgot that serve won't run info, so yeah my idea was short-sighted.

I'm not sure I entirely understand. So, you essentially want to use sitemap.xml as a criterion to determine if something is a site directory or not in order to not open the can of worms that is trying to resolve the site directory? #2346 is quite special because it keeps configuration files somewhere else entirely – different from what we now recommend with the projects plugin (more detailed tutorials need to be written, yes) where users should just keep them in the projects. [...]

Recommending something new and better is fine, not supporting something old and "widely" used isn't.
In-part I agree that making a special behaviour in the plugin specifically for this setup might be a bit much. However, the only other reasonable solution is to provide the user with the exclude option in the plugin to let them control what is excluded.
So I think I will just stick to the current setup, exclude resolved site_url paths with dynamic patterns, and additionally check for the sitemap files for any folder which wasn't detected.

If you find a good abstraction of things in the same style, you can definitely go ahead

The most I can do, is to split the on_config into different modules for validation, zip creation, and exclusion logic, so if that will be deleted in the future for a more standardized plugin structure, then I won't bother, as it will be just git blame spam 👌.

I'm also more of a "fan" of flat package structures, and your tags plugin implementation is deeply nested, so I would probably need some diagram to see the whole picture 😅I'm just a simple coder, so deep abstraction structures scare me, reminds me of Java-like business grade structures which are the devil 😆
Jokes aside my skill level isn't up to par for me to split up the info plugin into reasonable micro components.

I've started modernizing and professionalizing our Python code base with the latest iteration of the tags plugin in Insiders.

This is very much welcome, once the Pylint, the tests, and some auto formatter is introduced (like black) it will be way easier to work in the codebase (as currently I have like 10+ warnings in my IDE because the code is styled like TypeScript 😅). Tough the contribution workflow might be more demanding of the users, like installing all of the dependencies then running them before the commit to clean up styles, or people won't like the idea of having to write tests for the PR to be accepted haha

Your time investments into Material for MkDocs are very valued and much appreciated.

I appreciate your time investment as well, this back and forth in the PR really helped me stay on track and not spiral into endless edge case dimension ✌️

@kamilkrzyskow kamilkrzyskow marked this pull request as ready for review February 29, 2024 21:30
@kamilkrzyskow
Copy link
Copy Markdown
Collaborator Author

I've managed to squash 17 commits into 6.
Here are the final generated result zip archives.

9.5.12+insiders.4.53.0-final-examples-projects.zip
9.5.12+insiders.4.53.0-final-mkdocstrings.zip
9.5.12+insiders.4.53.0-final-multi-lang-community.zip

@kamilkrzyskow
Copy link
Copy Markdown
Collaborator Author

Sorry about the last force push I decided to only check for sitemap.xml.gz as the check is done for almost every directory, so it's better to have 2x less checks.

9.5.11-final-multi-lang-community-single-sitemap-check.zip

@kamilkrzyskow
Copy link
Copy Markdown
Collaborator Author

The issue template for bugs asks for a minimal reproduction and links to the page where it says to use the info plugin.
But people create the zip files on their own.
Perhaps the task list entry should also mention the the info plugin?

I have attached a __.zip file__ with a [minimal reproduction](https://squidfunk.github.io/mkdocs-material/guides/creating-a-reproduction/).

@squidfunk
Copy link
Copy Markdown
Owner

Recommending something new and better is fine, not supporting something old and "widely" used isn't.

Can you somehow back up that this is widely used? From what I've seen it isn't. Also, "widely" might be hard to define here. We can keep this approach for now, but if we don't receive any reproductions using it in the coming months, we'll consider removing it again. We have it in the history, so we can go back at any time, but in general, less code to maintain is always something to aim for.

The issue template for bugs asks for a minimal reproduction and links to the page where it says to use the info plugin. But people create the zip files on their own. Perhaps the task list entry should also mention the the info plugin?

I have attached a __.zip file__ with a [minimal reproduction](https://squidfunk.github.io/mkdocs-material/guides/creating-a-reproduction/).

CC @katharinalisalin

@katharinalisalin
Copy link
Copy Markdown
Collaborator

Great idea, added in ecac4e7.

@kamilkrzyskow
Copy link
Copy Markdown
Collaborator Author

kamilkrzyskow commented Mar 2, 2024

Can you somehow back up that this is widely used?

I said "widely" based on the fact it's the most popular discussion in the Show and tell category and discussion board in general:
https://github.com/squidfunk/mkdocs-material/discussions?discussions_q=sort%3Atop

Privacy Guides uses it, though it's my low hanging fruit to support my claim😆
https://github.com/privacyguides/privacyguides.org

I also checked FastAPI's docs, and they use a similar structure (closer to the projects plugin's structure), but they do it more in-house with custom python script that builds their page in different docs. There each config is still only aware of their own site_dir, so the custom sitemap handling is useful here too. (I also noticed some other thing, read note at the bottom)

I also checked that sitemap.xml and sitemap.xml.gz are generated for the most minimal MkDocs build:

site_name: My Docs
plugins: []

Perhaps I should remove the dynamic exclusion pattern generation for site_dir, as checking for sitemap.xml.gz is just as robust, but the projects plugin might one day do more sitemap trickery, so best to keep both methods.
I didn't check performance at all, since minimal reproduction shouldn't really have that much directories to begin with, so optimizing the speed with limiting the amount of patterns didn't really occur to me as necessary.

Note:
When looking at FastAPI I noticed that they do have hooks in ../ paths, so perhaps I should move the file path validation after the customization validation, and if the stop_on_validation_error setting is False then add the hooks and custom_dir paths to the paths_to_validate list 🤔

EDIT: Yes, I think this is the best way 😮‍💨 I'll draft the PR, sorry again for the lack of foresight


Cool that #6859 got merged, but the fixes for info plugin aren't merged yet, so people could create more reproductions with missing files 😅

@kamilkrzyskow kamilkrzyskow marked this pull request as draft March 2, 2024 14:05
@squidfunk
Copy link
Copy Markdown
Owner

Cool that #6859 got merged, but the fixes for info plugin aren't merged yet, so people could create more reproductions with missing files 😅

I'm afraid I'm going to need a little more time to review this. The changes are quite fundamental and #6859 only included documentation changes. Please give me some more time.

@kamilkrzyskow
Copy link
Copy Markdown
Collaborator Author

kamilkrzyskow commented Mar 2, 2024

Please give me some more time.

I didn't want to come off as rushing you, I just made the observation that the issue template got modified before the plugin was updated, and that the order of things was not optimal.

Anyways, more time in the PR pipeline allows me to spiral in the edge case dimension a bit longer, so I hope my force pushes don't break your review flow 😅

@kamilkrzyskow kamilkrzyskow marked this pull request as ready for review March 2, 2024 15:04
@kamilkrzyskow
Copy link
Copy Markdown
Collaborator Author

kamilkrzyskow commented Mar 2, 2024

After some more thought I believe that any .dotfile in the root directory should be excluded.
The reproduction doesn't need any .gitignore .vscode files, so the exclusion part needs to be extended 🤔
And here the limitations of fnmatch get back at me again, as the wildcard * will catch everything, so I would have to set the names for files in the root manually...

So I'm inclined to replace the fnmatch patterns with regex patterns which would allow to use ^ or $ to detect the ends of the path 🤔

@kamilkrzyskow kamilkrzyskow marked this pull request as draft March 2, 2024 23:47
@squidfunk
Copy link
Copy Markdown
Owner

I didn't want to come off as rushing you, I just made the observation that the issue template got modified before the plugin was updated, and that the order of things was not optimal.

Anyways, more time in the PR pipeline allows me to spiral in the edge case dimension a bit longer, so I hope my force pushes don't break your review flow 😅

I wasn't sure whether this PR was ready, because you did not re-request a review + it appeared to me that you were still working on it. As we made sure we're aligned with the general direction, and I trust you testing the examples we talked about, my work will be mainly to check it out some of the examples to learn about the new UX and glaze over the code base, maybe do some formatting. Just request a review when you're done and I look into it ☺️

I don't think that the merge of #6859 is out of order, since using the info plugin is mandatory as mentioned in the description of the Reproduction section of our issue template, so the PR just tries to catch users not having read that.

So I'm inclined to replace the fnmatch patterns with regex patterns which would allow to use ^ or $ to detect the ends of the path 🤔

I'd say we keep it simple first and not try to overthink this. We already changed so much on a very critical component. When we release this with the next version of Material for MkDocs, all people that update will use the new implementation when reporting bugs, so we should try and keep it as minimal as possible while fixing the known issues and aim to reach stability quickly. If we run into problems, we fix them (as we're doing here now).

@squidfunk
Copy link
Copy Markdown
Owner

Additionally, please move the PR out of draft state, once it is stable from your perspective ☺️

@kamilkrzyskow
Copy link
Copy Markdown
Collaborator Author

kamilkrzyskow commented Mar 3, 2024

I wasn't sure whether this PR was ready, because you did not re-request a review + it appeared to me that you were still working on it.

Your assessment turned out to be true, I didn't press the button to re-request the review, but I provided the "final" examples here: #6826 (comment)
So my idea was, that I finished then and there, but as per usual a force push crept in and another followed.

However, this time I hope it's truly final 💪 The change from fnmatch to regex allowed for a finer control over the patterns.
I decided to exclude all .dotfiles in the root of the reproduction, this includes any .gitignore or other .config files.
I want to allow .github and .devcontainer, as this might be related to the issues people are having etc.
So I had to add separate exclusions for IDE directories like .vscode or .idea.

DEBUG   -  Excluded pattern '.*\.cache/?': /.cache/
DEBUG   -  Excluded pattern '^/\.vscode/': /.vscode/
DEBUG   -  Excluded pattern '/site/': /site/
DEBUG   -  Excluded pattern '/venv/': /venv/
DEBUG   -  Excluded pattern '^/\.[^/]+$': /.editorconfig
DEBUG   -  Excluded pattern '^/\.[^/]+$': /.gitignore
DEBUG   -  Excluded pattern '^/.*\.zip': /7-projects-examples.zip
DEBUG   -  Excluded pattern '/examples/fonts_site_dir/': /examples/fonts_site_dir/
DEBUG   -  Excluded pattern '.*__pycache__/': /examples/__pycache__/
DEBUG   -  Excluded pattern '/examples/blog-basic/site/': /examples/blog-basic/site/
DEBUG   -  Excluded pattern '.*__pycache__/': /hooks/__pycache__/

So in the examples repository, .vscode, .cache, .editorconfig, and .gitignore got excluded apart of the other things.
9.5.12+insiders.4.53.0-regex.zip

I use the regex module included in MkDocs dependencies (?), because the docs claim it's more robust when it comes to handling Unicode than the built-in re.
EDIT: Actually you have added regex in the requirements, so all is well ✌️

@kamilkrzyskow kamilkrzyskow marked this pull request as ready for review March 3, 2024 05:03
@kamilkrzyskow kamilkrzyskow requested a review from squidfunk March 3, 2024 05:03
Copy link
Copy Markdown
Owner

@squidfunk squidfunk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a functional perspective, this is top-notch! I only have primarily formatting- and comment-related remarks that we should address before merging. I hope that it's not too pendantic, but I always try to put myself in my own shoes 3 months from now when I have forgotten what and why I did specific things. Especially when I jump between TypeScript and Python, extensive comments on design and architecture decisions make it much easier to rebuild an overview of the project after some time has passed ☺️

Add more elaborate comments
Fix lines to span to 80 characters
Fix import order
Fix variable naming
@squidfunk squidfunk merged commit 63de275 into squidfunk:master Mar 5, 2024
@squidfunk
Copy link
Copy Markdown
Owner

Thanks again for your awesome work! We'll release it in the coming days!

@squidfunk
Copy link
Copy Markdown
Owner

I just updated the dependencies in dc808ca and had to make a change to the build process, because the info.gitignore file would otherwise be deleted. I'm not sure why it built for you, but for me it would be gone, breaking the plugin 😅

@kamilkrzyskow
Copy link
Copy Markdown
Collaborator Author

I got caught red handed I didn't completely follow the contribution guide. At some point I stopped rebuilding the project, but instead just copied the material/plugins/info directory to the src/plugins directory, because I noticed it doesn't make a difference 😅

Seeing the code responsible for copying the config files and assets perhaps naming the file .gitignore wasn't the best idea.

Especially now after I stopped using fnmatch and started using regex, I think this should be it's own patterns.py file with a list of patterns in python which is later simply imported.

Of course I didn't think of this sooner because it worked, and I didn't stop to think of the build process.

From an IDE perspective the current file is recognised as a .gitignore file while it's not such file 😵‍💫

If you won't do it sooner with the plugin rewrite, I will revisit this again in the future, and fixup the methodology of storing patterns.

Unless I should do it immediately?

@squidfunk
Copy link
Copy Markdown
Owner

Especially now after I stopped using fnmatch and started using regex, I think this should be it's own patterns.py file with a list of patterns in python which is later simply imported.

Makes sense! The next release will take 2-3 days at least, so if you can squeeze it in, you can go ahead. If not, no problem, but then we should add a comment with a @todo that this should be done in the future.

@kamilkrzyskow
Copy link
Copy Markdown
Collaborator Author

Great, will do it today ✌️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The info plugin does not includes inherited configurations

3 participants