Skip to content

refactor(gw/dir-index-html): remove gobindata#8834

Closed
frankywahl wants to merge 5 commits intoipfs:masterfrom
OpenSourceProjects:remove-go-bin-data
Closed

refactor(gw/dir-index-html): remove gobindata#8834
frankywahl wants to merge 5 commits intoipfs:masterfrom
OpenSourceProjects:remove-go-bin-data

Conversation

@frankywahl
Copy link
Copy Markdown
Contributor

Replaces https://github.com/go-bindata/go-bindata with the standard library embed

This remove a dependency on go-bindata and makes the tooling flow less complex

@welcome

This comment was marked as resolved.

Comment on lines -90 to -96
if assets.BindataVersionHash != "" {
dirEtag := `"DirIndex-` + assets.BindataVersionHash + `_CID-` + resolvedPath.Cid().String() + `"`
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slight change in behavior appear here. We could put into our tooling a hash of all the files we're after if we want to allow for caching of the index file across versions (as it use to be). Or we could actually compute the hash (md5?) of the file we're after directly here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should restore old behavior. Perhaps:

  • hash dir embed.FS on startup and store it somewhere?
    • we dont need to keep the entire hash, only ~7 characters will be more than enough.
    • fyi we use https://github.com/OneOfOne/xxhash in places where we need a fast non-cryptographic hash

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a really interesting aspect anw why I highlighted it. Etag was actually debated as part of the embed implementation here

The way it was before this change was containing an "flaw" (we could actually file it as a separate issue) in that if there are two files foo.html and bar.html, either of the two files being changed would bust the cache of the other one too despite it being unchanged.

I have implemented it using sha256 in the following commit.

It has the same behavior as before, with a slight impact on startup (computing the hash). We can try to optimize it after with some function call, caching and sync.Once to avoid computing the hash until it is needed at a later stage. I believe the impact would be quite minimal given there are not many assets to hash and the efficiency of sha256 so I defer that to a separate concern. This also avoids bringing in another dependency (as xxxhash would).

Another option would also be to use something like https://github.com/benbjohnson/hashfs - this would allow a per-asset caching. (This definitely could be done as another pull-request).

Given the concern of each one, I though just busting the cache between version changes would be an okay solution.

Let me know what you think, happy to remove the commit or attempt another implementation

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has the same behavior as before, with a slight impact on startup (computing the hash). We can try to optimize it after with some function call, caching and sync.Once to avoid computing the hash until it is needed at a later stage.

That sounds like a good idea, however I think it's better to try the easy stuff first (using a cheaper hash function). And see about that later, probably that xxhash is fast enough anyway and that this isn't needed.

An other option similar to sync.Once is we could have init starting a go Hash.Do(hashfunc) so it is still done at init but asynchronously in the background. And then consumers would also try to do Hash.Do(hashfunc) and be stuck until the once return.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jorropo Not sure I fully understand the last part of your comment (though I get it's hoping to do that in a goroutine), but I think it's not that big of a bother anyhow as we both think that such an optimization can come later if it is needed and doesn't need to be part of this PR I think.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

later if it is needed

I agree with that too, so this point is solved. 🙂

What I proposed, is something like this: (pseudo code)

func init() {
  go etagHashLock.Do(func() {
    etagHash = computeEtagHash()
  })
}

var etagHash string
var etagHashLock sync.Once

func GetETagHash() string {
  etagHashLock.Do(func(){}) // Here we do nothing, basically we just use the once as a way to wait on the goroutine that init started
  return etagHash
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah - got it. Thank for the explanation

@frankywahl frankywahl marked this pull request as ready for review March 31, 2022 13:20
@frankywahl frankywahl force-pushed the remove-go-bin-data branch from 9712935 to 3558855 Compare April 7, 2022 19:16
@frankywahl frankywahl requested a review from lidel as a code owner April 7, 2022 19:16
Copy link
Copy Markdown
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for cleaning this, up! 👍

Before we continue, we need to tweak the Etag behavior (see comment inline)

Comment on lines -90 to -96
if assets.BindataVersionHash != "" {
dirEtag := `"DirIndex-` + assets.BindataVersionHash + `_CID-` + resolvedPath.Cid().String() + `"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should restore old behavior. Perhaps:

  • hash dir embed.FS on startup and store it somewhere?
    • we dont need to keep the entire hash, only ~7 characters will be more than enough.
    • fyi we use https://github.com/OneOfOne/xxhash in places where we need a fast non-cryptographic hash

@lidel lidel changed the title Remove gobindata refactor(gw/dir-index-html): remove gobindata Apr 10, 2022
@lidel lidel added this to the Best Effort Track milestone Apr 10, 2022
Since go1.16, there are built in tools that allow for embeding
filesystem inside the binary. We now make use of the `embed` package to
have all files put into the binary, removing the need to generate the
files and removes dependencies
Removes go-bindata dependency
@frankywahl frankywahl force-pushed the remove-go-bin-data branch 2 times, most recently from 2736d5a to 5e1e630 Compare April 11, 2022 13:05
Copy link
Copy Markdown
Contributor

@Jorropo Jorropo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still like the old Asset API to be removed, but this can be done later (altho I don't see any reason to do so) and is really solid. Thx for your PR 🙂

Copy link
Copy Markdown
Contributor

@Jorropo Jorropo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thx for the patch 🙂

@Jorropo
Copy link
Copy Markdown
Contributor

Jorropo commented Apr 11, 2022

@frankywahl the CI is stuck and wont run, I can't unstuck it.
Apparently it might be if you have done something to your fork circleci config that makes run on your circleci account ? That doesn't work because the run approval I've did is on that repo not yours. (that just github authorisatyion mess)
Or maybe try pushing a new commit so it retry to rerun it ?

(else I'll just move your branch to a new one on this repo and merge from this) 🙂

frankywahl and others added 3 commits April 11, 2022 20:35
Co-authored-by: Jorropo <jorropo.pgm@gmail.com>
This removes the delegation to the function and requires all callers
that used the `asset.Asset` func to access to asset via the `embed.FS`
@frankywahl
Copy link
Copy Markdown
Contributor Author

@Jorropo I've squashed some commits and forced push - it should re-trigger the CI. I'm not sure what the issue is with Circle as I don't even have a CircleCI account - so I definitely did not change any configurations there. 🤞 it works this time.

@lidel
Copy link
Copy Markdown
Member

lidel commented Apr 11, 2022

CircleCI is broken when your fork comes from an org.

@Jorropo fyi the way to fix this is to push the changes as a new, upstream branch.
that way commits will come from our repo and CI will run ok.

Let's continue in #8872

@lidel lidel closed this Apr 11, 2022
@lidel lidel removed this from the Best Effort Track milestone Apr 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants