Incrementally rebuild when a data file is changed by ashmaroli · Pull Request #8771 · jekyll/jekyll

ashmaroli · 2021-08-19T12:58:25Z

This is an 🙋 enhancement.
I've added tests.
The test suite passes locally.

Summary

Currently, data files are read and loaded directly as Ruby primitives. This prevents us from registering data files as dependencies of a page or document.
Therefore, initialize data files into dedicated Jekyll objects.
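
To illustrate why a dedicated object helps, here is a minimal sketch and not the code from this pull request. `DependencyLog` and `TrackedData` are hypothetical names invented for the example. A bare Hash or Array cannot notice who reads it, whereas a wrapper can record the reader each time it hands out the parsed data.

```ruby
# Hypothetical tracker: maps a reader's path to the data files it has read.
class DependencyLog
  def initialize
    @deps = Hash.new { |hash, key| hash[key] = [] }
  end

  # Record that `reader_path` consumed `data_path` (once per pair).
  def add(reader_path, data_path)
    @deps[reader_path] << data_path unless @deps[reader_path].include?(data_path)
  end

  # All readers that would need re-rendering if `data_path` changed.
  def readers_of(data_path)
    @deps.select { |_, paths| paths.include?(data_path) }.keys
  end
end

# Hypothetical wrapper around the parsed contents of one data file.
class TrackedData
  def initialize(path, data, log)
    @path = path
    @data = data
    @log = log
  end

  # Hand out the parsed data and note who asked for it.
  def read(reader_path)
    @log.add(reader_path, @path)
    @data
  end
end

log = DependencyLog.new
authors = TrackedData.new("_data/authors.yml", { "john" => "John Doe" }, log)
authors.read("about.md")
affected = log.readers_of("_data/authors.yml")
```

Here `affected` lists the pages that read the data file, which is the information an incremental build would need in order to decide what to re-render.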

Context

Resolves #7682

parkr · 2021-11-22T19:09:20Z

This seems like a great idea! I haven't been able to dig into the code yet but it's a great plan. Thanks for tackling this!

ashmaroli · 2022-04-01T17:24:36Z

@parkr @mattr- Will you be able to look into this over the weekend? I'd like to include this in 4.3.0.
Thanks.

features/incremental_rebuild.feature

ashmaroli · 2022-08-28T09:29:25Z

Pinging @parkr for a fresh set of 👀

parkr

Thanks for doing this, @ashmaroli! Overall, it looks good. I left a few inline comments about code readability / structure and maybe one or two gotchas. Let me know what you think.

parkr · 2022-08-28T18:34:44Z

lib/jekyll/site_data/directory.rb

+    class Directory
+      # Delegate given (zero-arity) method(s) to the Hash object stored in instance
+      # variable `@meta`.
+      def self.delegate_to_meta(*symbols)


We do a lot of delegation throughout the Jekyll codebase. Is the "Forwardable" module insufficient for this?

If I remember correctly, the main motivation behind this was to avoid initialising interim objects. At the time, using Forwardable's def_delegators method resulted in allocating an Array object on every invocation of the delegated method. In any case, this method has been designated as a private_class_method so that it can be changed or removed with minimal side effects.
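
For comparison, the two delegation styles discussed here can be sketched side by side. This is an illustration under assumptions, not the pull request's implementation: the class names and the sample `@meta` contents are made up, and only the `delegate_to_meta` name and its "zero-arity, forward to `@meta`" contract come from the excerpt above.

```ruby
require "forwardable"

# Variant 1: the standard library's Forwardable module.
class ForwardableDirectory
  extend Forwardable
  def_delegators :@meta, :keys, :length, :empty?

  def initialize
    @meta = { "authors" => %w(jane john) } # sample data for the sketch
  end
end

# Variant 2: a hand-rolled helper in the spirit of `delegate_to_meta`.
class HandRolledDirectory
  # Define a zero-arity instance method for each symbol that calls the
  # same-named method on the Hash stored in `@meta`.
  def self.delegate_to_meta(*symbols)
    symbols.each do |sym|
      define_method(sym) { @meta.public_send(sym) }
    end
  end
  # Keep the helper out of the public API so it can change or go away later.
  private_class_method :delegate_to_meta

  delegate_to_meta :keys, :length, :empty?

  def initialize
    @meta = { "authors" => %w(jane john) } # sample data for the sketch
  end
end

forwardable = ForwardableDirectory.new
hand_rolled = HandRolledDirectory.new
```

Both objects answer `keys`, `length` and `empty?` identically. The sketch does not measure allocations, so it says nothing either way about the Array allocation mentioned above.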

parkr · 2022-08-28T18:34:59Z

lib/jekyll/site_data/directory.rb

+      attr_accessor :context
+
+      def initialize
+        @meta = {}


@meta is a very generic name which doesn't explain what this variable is used for. Could we use a more descriptive name that tells us what it holds?

The Hash instance assigned to the variable contains the data of constituent data files. I had therefore considered naming the variable @data, then @metadata and finally just @meta.
Naming things is hard 😢
I will try to come up with better names.

It's @files, right? In DataReader, the code does data[key] = DataFile.new... which stores into this hash by method_missing. https://github.com/jekyll/jekyll/pull/8771/files#diff-ae0ebeb5a3a44f07ee61a0e5851d165c646ddab160824029c467bffea3717e7bR51

I'd call it @files or something like that since that's what it's storing? Is it storing anything else?

@parkr, @files makes me think of an Array of file objects, like our array of static files. But DataFile isn't technically a file object either, only a container for the "parsed data" of the actual data file on disk. The closest names I can think of that express the intention are @registry for this variable and DataEntry instead of DataFile.

parkr · 2022-08-28T18:39:52Z

lib/jekyll/site_data/directory.rb

+      end
+
+      def to_liquid
+        self


to_liquid is generally a means of sanitizing output, converting into a Drop, or returning a simpler data structure. I don't see a private below here so we'd be exposing #merge, #merge!, and all other Hash methods (from method_missing). It seems like we'd rather simply expose the #[](key) method to Liquid? I'd rather see a simple Drop here which grants basic access to the underlying Hash of File objects.

This was intentional to maintain backwards-compatibility.
As of Jekyll 4.2.x, the top-level objects of site_data are Hash instances. Hash#to_liquid returns the current Hash instance. So all of the public methods should continue to be invokable in the future.
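
The trade-off in this exchange can be sketched without Liquid itself. The following is a hedged illustration with hypothetical class names (`ContainerSketch`, `LookupOnlyView`); it is neither a real Liquid drop nor the code under review. It only contrasts "return self, keep every public method reachable" with "expose key lookup only".

```ruby
# Hypothetical stand-in for the Hash-like container discussed above.
class ContainerSketch
  def initialize(registry)
    @registry = registry
  end

  def [](key)
    @registry[key]
  end

  # Returns a new container so the result keeps the same class.
  def merge(other)
    self.class.new(@registry.merge(other))
  end

  # Backwards-compatible option: templates receive the same object, so
  # every public method stays reachable, as with a plain Hash.
  def to_liquid
    self
  end
end

# Hypothetical narrow wrapper: only key lookup is exposed.
class LookupOnlyView
  def initialize(container)
    @container = container
  end

  def [](key)
    @container[key]
  end
end

container = ContainerSketch.new("title" => "About")
view = LookupOnlyView.new(container)
```

With the first option, `container.to_liquid` still responds to `merge`. With the second, `view` answers `view["title"]` but nothing else, which is safer but would break templates that relied on other Hash methods.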

parkr · 2022-08-28T18:41:26Z

lib/jekyll/site_data/file.rb

+      attr_accessor :context
+      attr_reader :content
+
+      def initialize(site, path, content)


This needs a method comment describing the inputs, since it's a bit vague (e.g. is "path" relative or absolute?)

parkr · 2022-08-28T18:42:00Z

lib/jekyll/site_data/file.rb

+      def initialize(site, path, content)
+        @site = site
+        @path = path
+        @content = content


@content throughout the Jekyll codebase generally refers to plain text. Perhaps @data or @structured_content would help indicate that this is generally not a text value, but rather False, Array, or Hash.

parkr · 2022-08-28T18:48:16Z

lib/jekyll/site_data/file.rb

+
+module Jekyll
+  module SiteData
+    class File


In general, I try not to use class names which match stdlib names, like File. It can just confuse things a bit. We could try Jekyll::DataFile and Jekyll::DataDirectory, or something which doesn't match the stdlib name File.

parkr · 2022-08-28T18:50:54Z

lib/jekyll/site_data/file.rb

+
+module Jekyll
+  module SiteData
+    class File


There are no unit tests for this anywhere – is that intentional? I would be happy to see simple unit tests for creating an instance and checking equality, where you have some custom code.

ashmaroli · 2022-08-29T10:08:32Z

Thank you for the constructive feedback, @parkr. I shall make the requested improvements over a few days.

parkr

Thanks for addressing my feedback, @ashmaroli! This is looking pretty good. I would just like to see a unit test for DataFile, especially testing for equality (<=>).

parkr · 2022-09-06T04:59:05Z

features/incremental_rebuild.feature

+    And I should see "John Doe -- Admin" in "_site/about.html"
+    And I should see "Rendering: index.html" in the build output
+    And I should see "Rendering: _posts/2009-03-27-wargames.markdown" in the build output
+    When I wait 1 second


The cucumber test suite is quite slow as it is. An optimization in a later PR could be to limit these waits to a smaller value and/or use a more granular mtime in our incremental meantime. 🤔
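
As a rough illustration of the granularity point, assume (the thread does not confirm this) that modification times are compared at whole-second precision. Two writes within the same second then look identical, which is why a test would need to wait a full second between them. The timestamps below are arbitrary values chosen for the sketch.

```ruby
# Two modification times 250 ms apart within the same second.
first_write  = Time.at(1_660_000_000, 100_000) # second argument: microseconds
second_write = Time.at(1_660_000_000, 350_000)

# A whole-second comparison cannot tell the two writes apart.
same_at_second_precision = first_write.to_i == second_write.to_i

# Comparing the full Time values keeps the sub-second part.
newer_at_full_precision = second_write > first_write
```

If sub-second precision were available end to end, the wait in the scenario could in principle shrink accordingly. Whether the filesystem and the stored metadata preserve that precision is a separate question.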

parkr · 2022-09-06T05:06:01Z

lib/jekyll/data_file.rb

+    attr_accessor :context
+    attr_reader :data
+
+    def initialize(site, abs_path, data)


A comment here would still be nice, since I have to make certain assumptions about these variables. Something like:

# Create a DataFile.
# site - the Jekyll::Site this file belongs to
# abs_path - the absolute path to the file
# data - the parsed contents of the data file

If the comment is wrong, then it only helps to illustrate why a comment is helpful 😜

Your assumptions here are correct. I'll add the suggested comment to avoid assumptions. 🙂

parkr · 2022-09-06T05:10:25Z

lib/jekyll/site_data/directory.rb

+      attr_accessor :context
+
+      def initialize
+        @meta = {}


It's @files, right? In DataReader, the code does data[key] = DataFile.new... which stores into this hash by method_missing. https://github.com/jekyll/jekyll/pull/8771/files#diff-ae0ebeb5a3a44f07ee61a0e5851d165c646ddab160824029c467bffea3717e7bR51

I'd call it @files or something like that since that's what it's storing? Is it storing anything else?

parkr · 2022-09-25T19:24:04Z

lib/jekyll/data_hash.rb

+
+    def [](key)
+      @registry[key].tap do |value|
+        value.context = context if value.respond_to?(:context=)


Is the only reason we have DataHash for this one line?

Primarily yes. But also to avoid manipulating the Ruby Core class Hash.
Ultimately, we need a way to detect which entities depend on a given data file or directory of data files, so that those entities are re-rendered when the data file(s) change. The most common way end-users directly consume Jekyll::Site#site_data is via Liquid drops: {{ site.data[my_var] }}. Therefore, the only access to the parent entity's path attribute is via the Liquid context. So we need an object that behaves like Ruby's own Hash, is aware of the render context, and remains the same subclass after certain methods are executed (methods that return a Hash object irrespective of the caller class).
The last point is debatable because I don't quite remember which methods exactly and therefore cannot assert whether modern Ruby versions have addressed that oddity.
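
The context hand-off described in this reply can be sketched in isolation. The `[]` body mirrors the excerpt quoted above; everything else (`EntrySketch`, `DataHashSketch`, the `[]=` writer and the sample context Hash) is a hypothetical stand-in added so the example runs on its own.

```ruby
# Hypothetical entry that can carry a render context.
class EntrySketch
  attr_accessor :context
  attr_reader :data

  def initialize(data)
    @data = data
  end
end

# Hypothetical Hash-like container that passes its context down on lookup.
class DataHashSketch
  attr_accessor :context

  def initialize
    @registry = {}
  end

  def []=(key, value)
    @registry[key] = value
  end

  # Look up the value and, if it can hold a context, give it ours.
  def [](key)
    @registry[key].tap do |value|
      value.context = context if value.respond_to?(:context=)
    end
  end
end

site_data = DataHashSketch.new
site_data["authors"] = EntrySketch.new("john" => "John Doe")
site_data["motto"] = "plain string" # values without `context=` pass through untouched

site_data.context = { page_path: "about.md" } # stand-in for a Liquid context
entry = site_data["authors"]
```

After the lookup, `entry.context` holds the container's context, so the entry knows which page is reading it. Plain values and missing keys are returned unchanged.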

Gotcha, thanks for the reiteration.

ashmaroli · 2022-09-29T12:46:23Z

@jekyllbot: merge +minor

Incrementally rebuild when a data file is changed

801d3c5

ashmaroli added the enhancement label Aug 19, 2021

ashmaroli added 2 commits August 19, 2021 20:27

Override some methods for backwards compatibility

def2379

Conditionally inject context in SiteDrop

864fb43

ashmaroli requested review from a team, mattr- and parkr and removed request for a team August 19, 2021 16:50

ashmaroli and others added 3 commits September 3, 2021 22:41

Improve Jekyll::SiteData::File

3111739

Merge branch 'master' into site_data-module

da4582c

Delegate to instead of inheriting Ruby core class

b6ba77c

ashmaroli added 5 commits November 24, 2021 20:46

Merge branch 'master' into site_data-module

cc7f316

Designate SiteData::Directory and File as mergable

27c136b

Add private delegate method helper

02eabf6

Revert change to Gemfile

35e4155

Add test coverage for SiteData::Directory

91107b6

mattr- reviewed Apr 2, 2022

features/incremental_rebuild.feature

Simplify cucumber scenario to reduce confusion

6a2d3cc

parkr reviewed Aug 28, 2022

ashmaroli and others added 4 commits September 4, 2022 20:40

Dissolve Jekyll::SiteData module

dbf8590

Rectify reference to called public_method

4a59941

Improve comprehensibility of DataDirectory class

2d7536b

Fix reference to instance variable

be306cd

parkr reviewed Sep 6, 2022

ashmaroli added 2 commits September 7, 2022 15:27

Merge branch 'master' into site_data-module

c779634

Rename new classes as DataHash and DataEntry

9fe2d71

ashmaroli requested a review from parkr September 25, 2022 15:22

parkr reviewed Sep 25, 2022

parkr approved these changes Sep 27, 2022

jekyllbot merged commit 160a681 into jekyll:master Sep 29, 2022

jekyllbot added a commit that referenced this pull request Sep 29, 2022

Update history to reflect merge of #8771 [ci skip]

2cc51e6

github-actions bot pushed a commit that referenced this pull request Sep 29, 2022

Deploy docs from 160a681

8225da2

Ashwin Maroli: Incrementally rebuild when a data file is changed (#8771) Merge pull request 8771

ashmaroli added a commit that referenced this pull request Oct 26, 2022

Revert "Incrementally rebuild when a data file is changed (#8771)"

137ba30

This reverts commit 160a681.

ashmaroli mentioned this pull request Oct 26, 2022

Revert "Incrementally rebuild when a data file is changed" #9170

Merged

jekyllbot pushed a commit that referenced this pull request Oct 26, 2022

Revert "Incrementally rebuild when a data file is changed (#8771)" (#…

5367a02

…9170) Merge pull request 9170

ashmaroli deleted the site_data-module branch October 26, 2022 16:38

github-actions bot pushed a commit that referenced this pull request Oct 26, 2022

Deploy docs from 5367a02

09d544e

Ashwin Maroli: Revert "Incrementally rebuild when a data file is changed (#8771)" (#9170) Merge pull request 9170

ashmaroli added a commit that referenced this pull request Oct 27, 2022

Revert "Incrementally rebuild when a data file is changed (#8771)" (#…

590d0b5

…9170) Merge pull request 9170

lylo pushed a commit to lylo/jekyll that referenced this pull request Oct 27, 2022

Revert "Incrementally rebuild when a data file is changed (jekyll#8771)…

52a16c1

…" (jekyll#9170) Merge pull request 9170

maul-esel mentioned this pull request Apr 15, 2023

Consider page dependencies? gjtorikian/jekyll-last-modified-at#94

Open

jekyll locked and limited conversation to collaborators Oct 26, 2023

jekyllbot added the frozen-due-to-age label Oct 26, 2023
