refactor(gw/dir-index-html): remove gobindata#8834
refactor(gw/dir-index-html): remove gobindata#8834frankywahl wants to merge 5 commits intoipfs:masterfrom
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
| if assets.BindataVersionHash != "" { | ||
| dirEtag := `"DirIndex-` + assets.BindataVersionHash + `_CID-` + resolvedPath.Cid().String() + `"` |
There was a problem hiding this comment.
Slight change in behavior appear here. We could put into our tooling a hash of all the files we're after if we want to allow for caching of the index file across versions (as it use to be). Or we could actually compute the hash (md5?) of the file we're after directly here.
There was a problem hiding this comment.
We should restore old behavior. Perhaps:
- hash
dir embed.FSon startup and store it somewhere?- we dont need to keep the entire hash, only ~7 characters will be more than enough.
- fyi we use https://github.com/OneOfOne/xxhash in places where we need a fast non-cryptographic hash
There was a problem hiding this comment.
I think this is a really interesting aspect anw why I highlighted it. Etag was actually debated as part of the embed implementation here
The way it was before this change was containing an "flaw" (we could actually file it as a separate issue) in that if there are two files foo.html and bar.html, either of the two files being changed would bust the cache of the other one too despite it being unchanged.
I have implemented it using sha256 in the following commit.
It has the same behavior as before, with a slight impact on startup (computing the hash). We can try to optimize it after with some function call, caching and sync.Once to avoid computing the hash until it is needed at a later stage. I believe the impact would be quite minimal given there are not many assets to hash and the efficiency of sha256 so I defer that to a separate concern. This also avoids bringing in another dependency (as xxxhash would).
Another option would also be to use something like https://github.com/benbjohnson/hashfs - this would allow a per-asset caching. (This definitely could be done as another pull-request).
Given the concern of each one, I though just busting the cache between version changes would be an okay solution.
Let me know what you think, happy to remove the commit or attempt another implementation
There was a problem hiding this comment.
It has the same behavior as before, with a slight impact on startup (computing the hash). We can try to optimize it after with some function call, caching and
sync.Onceto avoid computing the hash until it is needed at a later stage.
That sounds like a good idea, however I think it's better to try the easy stuff first (using a cheaper hash function). And see about that later, probably that xxhash is fast enough anyway and that this isn't needed.
An other option similar to sync.Once is we could have init starting a go Hash.Do(hashfunc) so it is still done at init but asynchronously in the background. And then consumers would also try to do Hash.Do(hashfunc) and be stuck until the once return.
There was a problem hiding this comment.
@Jorropo Not sure I fully understand the last part of your comment (though I get it's hoping to do that in a goroutine), but I think it's not that big of a bother anyhow as we both think that such an optimization can come later if it is needed and doesn't need to be part of this PR I think.
There was a problem hiding this comment.
later if it is needed
I agree with that too, so this point is solved. 🙂
What I proposed, is something like this: (pseudo code)
func init() {
go etagHashLock.Do(func() {
etagHash = computeEtagHash()
})
}
var etagHash string
var etagHashLock sync.Once
func GetETagHash() string {
etagHashLock.Do(func(){}) // Here we do nothing, basically we just use the once as a way to wait on the goroutine that init started
return etagHash
}There was a problem hiding this comment.
Ah - got it. Thank for the explanation
9712935 to
3558855
Compare
lidel
left a comment
There was a problem hiding this comment.
Thank you for cleaning this, up! 👍
Before we continue, we need to tweak the Etag behavior (see comment inline)
| if assets.BindataVersionHash != "" { | ||
| dirEtag := `"DirIndex-` + assets.BindataVersionHash + `_CID-` + resolvedPath.Cid().String() + `"` |
There was a problem hiding this comment.
We should restore old behavior. Perhaps:
- hash
dir embed.FSon startup and store it somewhere?- we dont need to keep the entire hash, only ~7 characters will be more than enough.
- fyi we use https://github.com/OneOfOne/xxhash in places where we need a fast non-cryptographic hash
Since go1.16, there are built in tools that allow for embeding filesystem inside the binary. We now make use of the `embed` package to have all files put into the binary, removing the need to generate the files and removes dependencies
Removes go-bindata dependency
2736d5a to
5e1e630
Compare
Jorropo
left a comment
There was a problem hiding this comment.
LGTM, thx for the patch 🙂
|
@frankywahl the CI is stuck and wont run, I can't unstuck it. (else I'll just move your branch to a new one on this repo and merge from this) 🙂 |
Co-authored-by: Jorropo <jorropo.pgm@gmail.com>
This removes the delegation to the function and requires all callers that used the `asset.Asset` func to access to asset via the `embed.FS`
9b65315 to
201f4ce
Compare
|
@Jorropo I've squashed some commits and forced push - it should re-trigger the CI. I'm not sure what the issue is with Circle as I don't even have a CircleCI account - so I definitely did not change any configurations there. 🤞 it works this time. |
Replaces https://github.com/go-bindata/go-bindata with the standard library
embedThis remove a dependency on go-bindata and makes the tooling flow less complex