Incrementally rebuild when a data file is changed#8771
Incrementally rebuild when a data file is changed#8771jekyllbot merged 18 commits intojekyll:masterfrom
Conversation
|
This seems like a great idea! I haven't been able to dig into the code yet but it's a great plan. Thanks for tackling this! |
|
Pinging @parkr for a fresh set of 👀 |
parkr
left a comment
There was a problem hiding this comment.
Thanks for doing this, @ashmaroli! Overall, it looks good. I left a few inline comments about code readability / structure and maybe one or two gotchas. Let me know what you think.
lib/jekyll/site_data/directory.rb
Outdated
| class Directory | ||
| # Delegate given (zero-arity) method(s) to the Hash object stored in instance | ||
| # variable `@meta`. | ||
| def self.delegate_to_meta(*symbols) |
There was a problem hiding this comment.
We do a lot of delegation throughout the Jekyll codebase. Is the "Forwardable" module insufficient for this?
There was a problem hiding this comment.
If I remember correctly, the main motivation behind this was to avoid initialising interim objects. At the time, using Forwardable's def_delegators method resulted in allocating an Array object on every invocation of the delegated method. Anyways, this method has been designated as a private_class_method so that it can be changed or removed with minimal side-effects.
lib/jekyll/site_data/directory.rb
Outdated
| attr_accessor :context | ||
|
|
||
| def initialize | ||
| @meta = {} |
There was a problem hiding this comment.
@meta is a very generic name which doesn't explain what this variable is used for. Could we use a more descriptive name that tells us what it holds?
There was a problem hiding this comment.
The Hash instance assigned to the variable contains the data of constituent data files. I had therefore considered naming the variable @data, then @metadata and finally just @meta.
Naming things is hard 😢
I will try to come up with better names..
There was a problem hiding this comment.
It's @files right? in DataReader, the code does data[key] = DataFile.new... which stores into this hash by method_missing. https://github.com/jekyll/jekyll/pull/8771/files#diff-ae0ebeb5a3a44f07ee61a0e5851d165c646ddab160824029c467bffea3717e7bR51
I'd call it @files or something like that since that's what it's storing? Is it storing anything else?
There was a problem hiding this comment.
@parkr, @files makes me think of an Array of file objects like how we have an array of staticfiles. But DataFile isn't technically a file object either, only a container for the "parsed data" of the actual data file on disk. The closest I think that will express the intention correctly is calling this variable a @registry and DataFile to be DataEntry instead.
lib/jekyll/site_data/directory.rb
Outdated
| end | ||
|
|
||
| def to_liquid | ||
| self |
There was a problem hiding this comment.
to_liquid is generally a means of sanitizing output, converting into a Drop, or returning a simpler data structure. I don't see a private below here so we'd be exposing #merge, #merge!, and all other Hash methods (from method_missing). It seems like we'd rather simply expose the #[](key) method to Liquid? I'd rather see a simple Drop here which grants basic access to the underlying Hash of File objects.
There was a problem hiding this comment.
This was intentional to maintain backwards-compatibility.
As of Jekyll 4.2.x, the top-level objects of site_data are Hash instances. Hash#to_liquid returns the current Hash instance. So all of the public methods should continue to be invokable in the future.
lib/jekyll/site_data/file.rb
Outdated
| attr_accessor :context | ||
| attr_reader :content | ||
|
|
||
| def initialize(site, path, content) |
There was a problem hiding this comment.
This needs a method comment describing the inputs, since it's a bit vague (e.g. is "path" relative or absolute?)
lib/jekyll/site_data/file.rb
Outdated
| def initialize(site, path, content) | ||
| @site = site | ||
| @path = path | ||
| @content = content |
There was a problem hiding this comment.
@content throughout the Jekyll codebase generally refers to plain text. Perhaps @data or @structured_content would help indicate that this is generally not a text value, but rather False, Array, or Hash.
lib/jekyll/site_data/file.rb
Outdated
|
|
||
| module Jekyll | ||
| module SiteData | ||
| class File |
There was a problem hiding this comment.
In generally, I try to not use class names which match stdlib names, like File. It can just confuse things a bit. We could try Jekyll::DataFile and Jekyll::DataDirectory, or something which doesn't match the stdlib name File.
lib/jekyll/site_data/file.rb
Outdated
|
|
||
| module Jekyll | ||
| module SiteData | ||
| class File |
There was a problem hiding this comment.
There are no unit tests for this anywhere – is that intentional? I would be happy to see simple unit tests for creating an instance and checking equality, where you have some custom code.
|
Thank you for the constructive feedback, @parkr. I shall make the requested improvements over a few days. |
parkr
left a comment
There was a problem hiding this comment.
Thanks for addressing my feedback, @ashmaroli! This is looking pretty good. I would just like to see a unit test for DataFile, especially testing for equality (<=>).
| And I should see "John Doe -- Admin" in "_site/about.html" | ||
| And I should see "Rendering: index.html" in the build output | ||
| And I should see "Rendering: _posts/2009-03-27-wargames.markdown" in the build output | ||
| When I wait 1 second |
There was a problem hiding this comment.
The cucumber test suite is quite slow as it is. An optimization in a later PR could be to limit these waits to a smaller value and/or use a more granular mtime in our incremental meantime. 🤔
lib/jekyll/data_file.rb
Outdated
| attr_accessor :context | ||
| attr_reader :data | ||
|
|
||
| def initialize(site, abs_path, data) |
There was a problem hiding this comment.
A comment here would still be nice, since I have to make certain assumptions about these variables. Something like:
# Create a DataFile.
# site - the Jekyll::Site this file belongs to
# abs_path - the absolute path to the file
# data - the parsed contents of the data fileIf the comment is wrong, then it only helps to illustrate why a comment is helpful 😜
There was a problem hiding this comment.
Your assumptions here are correct. I'll add the suggested comment to avoid assumptions. 🙂
lib/jekyll/site_data/directory.rb
Outdated
| attr_accessor :context | ||
|
|
||
| def initialize | ||
| @meta = {} |
There was a problem hiding this comment.
It's @files right? in DataReader, the code does data[key] = DataFile.new... which stores into this hash by method_missing. https://github.com/jekyll/jekyll/pull/8771/files#diff-ae0ebeb5a3a44f07ee61a0e5851d165c646ddab160824029c467bffea3717e7bR51
I'd call it @files or something like that since that's what it's storing? Is it storing anything else?
|
|
||
| def [](key) | ||
| @registry[key].tap do |value| | ||
| value.context = context if value.respond_to?(:context=) |
There was a problem hiding this comment.
Is the only reason we have DataHash for this one line?
There was a problem hiding this comment.
Primarily yes. But also to avoid manipulating the Ruby Core class Hash.
Ultimately, we need a way to detect what entities are dependent on a given data file or directory of data files to trigger the entity's re-rendering when data file(s) change. The most common way end-users directly consume Jekyll::Site#site_data is via Liquid drops: {{ site.data[my_var] }}. Therefore, the only access to the parent entity's path attribute is via the Liquid context. So, we need an object that behaves like Ruby's own Hash yet be aware of render context and remain the same subclass after execution of certain methods (that return a Hash object irrespective of the caller class).
The last point is debatable because I don't quite remember which methods exactly and therefore cannot assert whether modern Ruby versions have addressed that oddity.
There was a problem hiding this comment.
Gotcha, thanks for the reiteration.
|
@jekyllbot: merge +minor |
Ashwin Maroli: Incrementally rebuild when a data file is changed (#8771) Merge pull request 8771
This reverts commit 160a681.
…" (jekyll#9170) Merge pull request 9170
* master: (27 commits) Remove noise in `features/highlighting.feature` Release 💎 v4.3.1 Update history to reflect merge of jekyll#9171 [ci skip] Release post for v4.3.1 (jekyll#9171) Update history to reflect merge of jekyll#9170 [ci skip] Revert "Incrementally rebuild when a data file is changed (jekyll#8771)" (jekyll#9170) Update history to reflect merge of jekyll#9167 [ci skip] Respect user-defined name attribute in documents (jekyll#9167) Revert back to developing 4.3.x Mark initiation of v5.0 development Disable critical GH Actions on `master` Fix spelling errors in History document Release 💎 v4.3.0 Update history to reflect merge of jekyll#9157 [ci skip] Release post for v4.3.0 (jekyll#9157) Clean up HEAD section in History document Document xz dependency on macOS (jekyll#9098) Fix URL to Liquid documentation (jekyll#9158) Bump RuboCop to `v1.37.x` Update history to reflect merge of jekyll#9132 [ci skip] ...
Summary
Currently, data files are read and loaded directly as Ruby primitives. This prevents us from registering data files as dependencies of a page or document.
Therefore, initialize data files into dedicated Jekyll objects.
Context
Resolves #7682