Conversation
core/commands/files/files.go
Outdated
| Options: []cmdkit.Option{ | ||
| cmdkit.BoolOption("only-summary", "s", "Only print the summary of the tree's information."), | ||
| // TODO: "local" is already used by the root ipfs command, what do ? | ||
| cmdkit.BoolOption("local2", "l", "Don't request data from the network."), |
There was a problem hiding this comment.
This is a bit of a problem right now. You can leave this option out here, and still use local, since its defined globally. The annoying part is that it then doesnt show up in the helptext.
This PR: #4479 should address the issue.
There was a problem hiding this comment.
The root --local option means 'bypass the daemon, run directly on the repository', but here it means 'run on the daemon, just don't do any network request'. Is it ok for you to bypass the daemon ?
Otherwise we could rename the option to --no-network or something.
There was a problem hiding this comment.
@MichaelMure My intent with the global --local flag was to mean "don't do any network requests". Unless others disagree, I would like to move towards that definition everywhere.
core/commands/files/files.go
Outdated
| // try to decode unixfs | ||
| if pn, ok := state.node.(*dag.ProtoNode); ok { | ||
| unixfs, err := ft.FromBytes(pn.Data()) | ||
| if err == nil { |
There was a problem hiding this comment.
I feel like we should try to rework the logic here to avoid nesting too deeply.
There was a problem hiding this comment.
I can merge the test in one line if unixfs, err := ft.FromBytes(pn.Data()); err == nil { which make it more clearer, but that doesn't change the nesting depth.
I could also use a goto to jump out of the unixfs section when the decoding fail. What do you think ?
core/commands/files/files.go
Outdated
| } | ||
| } | ||
|
|
||
| summary, err := walkBlockStart(res, nd.Context(), dagserv, root, isDirectory, onlySummary) |
core/commands/files/files.go
Outdated
| return nil | ||
| } | ||
|
|
||
| return fmt.Errorf("unknow output type") |
| } | ||
|
|
||
| if item.Depth > 0 { | ||
| fmt.Fprint(w, "└── ") |
There was a problem hiding this comment.
Should use ├ unless this is the last item at this depth. Unfortunately, you'll probably want to use a PostRun function (like the add command) instead of an encoder to do this (as you'll need to keep state).
|
As request by @Stebalien, the ouput is neater now: It's still not fully identical to the unix command |
|
@MichaelMure I think you can do it with the information you have already, heres the code for It looks like this: |
|
Though totally fine to just skip that for now. flashy UI is a bit out of scope for what youre trying to build ;) |
License: MIT Signed-off-by: Michael Muré <batolettre@gmail.com>
License: MIT Signed-off-by: Michael Muré <batolettre@gmail.com>
License: MIT Signed-off-by: Michael Muré <batolettre@gmail.com>
License: MIT Signed-off-by: Michael Muré <batolettre@gmail.com>
License: MIT Signed-off-by: Michael Muré <batolettre@gmail.com>
License: MIT Signed-off-by: Michael Muré <batolettre@gmail.com>
|
@whyrusleeping Can we merge ? |
|
@whyrusleeping I would like a chance to look this over, I will have that done in a day or two |
Stebalien
left a comment
There was a problem hiding this comment.
My primary feedback is that this should better separate the data from the visual representation. We should be thinking about this as both a CLI command and as an API endpoint.
| item := output.Item | ||
|
|
||
| if item.Depth > 1 { | ||
| fmt.Fprint(w, strings.Repeat("│ ", item.Depth-1)) |
There was a problem hiding this comment.
I think this'll have issues in the following case:
.
├── bar
│ └── a
│ └── b
│ └── c
│ └── d
└── foo
I really think we need a PostRun to create this graph correctly.
| return nil | ||
| } | ||
|
|
||
| if output.Type == summaryType { |
There was a problem hiding this comment.
If we had a post-run, we wouldn't have to explicitly include this. We could just include the stats for every entry and aggregate them (makes the interface flexible and reusable and cleanly separates the data from the representation).
There was a problem hiding this comment.
Not including the summary stats in the JSON output would make retrieving this information using a simple tool like jq more difficult. You would probably need to write a small script to compute the summary information. Including extra information that is cheap to compute, in my view, is not a bad thing to include as part of the JSON API.
There was a problem hiding this comment.
I just feel like it's mixing concerns. I'm not exactly against having this information here, I just think it's implied so therefore unnecessary.
There was a problem hiding this comment.
They may be another way (that I just thought of) see #4596 (comment).
| unixType := unixfs.GetType() | ||
|
|
||
| // To distinguish files from chunk | ||
| isFile := unixType == ft.TFile && state.parentIsDirectory |
There was a problem hiding this comment.
Given that we care about different information if we're traversing through a file, we should probably just have a separate walk function. It should significantly reduce the complexity.
| result.Local = false | ||
|
|
||
| if !state.onlySummary { | ||
| state.res.Emit(&treeOutput{ |
There was a problem hiding this comment.
This will make missing blocks in a file look like missing files, if we do display them, I'd just display them as a list of missing blocks (no depth, although you can still pass the depth through if you want).
We should also explicitly mark them as missing blocks. As a matter of fact, we should probably include the types of directories and files as well. That is, I'd expect these "events" to be of the form:
{
"type": "directory|file|missing",
"size": 200, // recursive
"depth": 1,
"lastChild": true|false,
"hash": "QmId",
"name": "thing" // only applies to named things (files and directories, not missing blocks).
}
Note the lack of summary etc, we can compute that in PostRun.
There was a problem hiding this comment.
I would separate the "type" from the status (missing or non-missing).
Even if a file is missing we still know some information about it from the directory info. When the new unixfs lands we should also know its exact type.
I would use directory|file|block|unknown for the possible types. unknown is used when we have a directory entry but we don't know if it is a file or directory. block is used when we know its a part of a file.
There was a problem hiding this comment.
OK, I completely failed to explain this properly.
@kevina file "entries" won't actually be missing because we'll know their names, only blocks will. However, the inability to distinguish between a file and a directory will be annoying. We'll probably have to use the term "unknown" for that.
Maybe "missing-block" is more accurate for missing information?
Basically, I was thinking that the server would send a stream of information like the following (basically, a trace of the dag traversal).
{
"type": "directory",
"size": 800,
"depth": 0,
"lastChild": true, // root
"hash": "QmFoo",
"name": "foo"
}
{
"type": "directory",
"size": 600,
"depth": 1,
"lastChild": true, // root
"hash": "QmBar",
"name": "bar"
}
{
"type": "directory",
"size": 400,
"depth": 2,
"lastChild": true, // root
"hash": "QmBaz",
"name": "baz"
}
{
"type": "file",
"size": 400,
"depth": 2,
"hash": "QmFile1",
"name": "file-1"
}
{
"type": "file",
"size": 400,
"depth": 2,
"hash": "QmFile2",
"name": "file-2"
}
{
"type": "unknown",
"size": 400,
"depth": 2,
"hash": "QmSomething",
"name": "file-or-directory"
"lastChild": true,
}
// Note: we could also collapse this into the above event.
// However, I think making missing blocks separate would be more consistent.
{
"type": "missing-block",
"size": 400,
"depth": 3,
"hash": "QmSomething", // same as above. This represents the fact that we're *missing* the sole block of file-or-directory.
"lastChild": true,
}
There was a problem hiding this comment.
I would collapse it, and instead of using the missing-block use
{
"type": "unknown",
"size": 400,
"depth": 2,
"hash": "QmSomething",
"name": "file-or-directory"
"lastChild": true,
"status": "missing"
}
or something close to that.
There was a problem hiding this comment.
My issue with that is that we might have missing child-blocks and I'd rather not have two different ways to indicate that we have missing information.
On the other hand, missing the top-level block (currently) means that we don't even know if we're dealing with a file or a directory so I guess it is a bit of a special case (at the moment).
There was a problem hiding this comment.
If by missing child block you mean part of a file then I would use:
{
"type": "block",
"size": 400,
"depth": 2,
"hash": "QmSomething",
"status": "missing"
}
I am not sure what the purpose of the lastChild field is.
There was a problem hiding this comment.
if my missing child block you mean part of a file then I would use...
I understand, I'm just trying to reduce the number of cases we have to handle (although I don't fell strongly about this). This is easier to see when assuming that we already encoded filetype information in directory entries:
Missing all blocks (including the root the root):
{
"type": "file",
"size": 400,
"depth": 2,
"hash": "QmFile",
"name": "myFile"
"status": "missing"
}
Missing a child block:
{
"type": "file",
"size": 400,
"depth": 2,
"hash": "QmFileRoot",
"name": "myFile"
}
{
"type": "block",
"size": 400,
"depth": 3,
"hash": "QmFileBlock1",
"name": "file-or-directory",
"status": "missing",
"lastChild": true
}
From a filesystem perspective, both cases are missing (at least one) block of the file. However, we end up representing this differently in both cases.
On the other hand, this is nice because the first case makes it clear that we're missing the entire file.
I am not sure what the purpose of the lastChild field is.
Sorry, that wasn't really necessary for that example. lastChild means the last child at that depth. It allows us to draw the tree without buffering (the current implementation has it for this reason).
There was a problem hiding this comment.
On the other hand, this is nice because the first case makes it clear that we're missing the entire file.
And I think that is an important distinction. For the case when some of the children are missing I would use "status": "incomplete".
(I have thought through many of these case when working with the original filestore, that never made it into ipfs, verify commands. You can find the documentation to that here https://github.com/ipfs-filestore/go-ipfs/blob/master/core/commands/filestore.go#L547. Many of the possible codes don't make since but the 'ok', 'missing' and 'incomplete' codes do.)
Since the original goal of the p.r. was to include information on how much of a tree is downloaded I would also include a field that is the cumulative size found locally (although not sure what to call it). This will work nicely to avoid having to include summary information in the API as the important information will be available by just looking at the root node.
There was a problem hiding this comment.
And I think that is an important distinction. For the case when some of the children are missing I would use
"status": "incomplete".
My point is that thinking of some blocks as "children" and others not is kind of pointless. From the perspective of the filesystem, a file has an array of blocks (no parents and children).
Since the original goal of the p.r. was to include information on how much of a tree is downloaded I would also include a field that is the cumulative size found locally (although not sure what to call it). This will work nicely to avoid having to include summary information in the API as the important information will be available by just looking at the root node.
Unfortunately, we can't compute that until we've processed all the leaves (well after we've sent this event).
| cmdkit.StringArg("path", true, false, "Path to print information about."), | ||
| }, | ||
| Options: []cmdkit.Option{ | ||
| cmdkit.BoolOption("only-summary", "s", "Only print the summary of the tree's information."), |
There was a problem hiding this comment.
That's not a tree. IMO, this doesn't belong in this command.
There was a problem hiding this comment.
Than maybe this command is misnamed. See other comment (#4596 (comment)).
| Hash: link.Cid.String(), | ||
| Type: itemType, | ||
| Item: treeItem{ | ||
| Name: "[missing]", |
There was a problem hiding this comment.
Good example of where this is a reasonable CLI command but a bad API endpoint. This is not the name of the file (also, confusing if I named a file [missing]).
kevina
left a comment
There was a problem hiding this comment.
I agree with all of @Stebalien comments.
The summary does not give a clear indication of missing blocks, in this case one file was missing:
0 directories, 7 files present
20 kB present of 23 kB (88.81%)
I would including something like
1 out of at least 9 blocks MISSING
on a separate line. I use the phrasing "at least" because I am not sure it is possible to get an accurate count of the number of blocks if part of the tree is missing.
I have some additional comments on the purpose of this command that I will leave separately.
|
Guys, I'm struggling to do whatever you want (this PR and #4541 before) so I can actually push forward my own project built on top of IPFS. Without being able to easily and efficiently tell if something is here or not, I'm just stuck. A bit of help would be greatly appreciated, this is a little discouraging .... |
|
If this command is just a fancy way to recursively display a directory then I am okay with it. However, if this command will be the preferred way to determine if a block exists or if a merkle dag has missing blocks than I have serious problems with the presentation method. If the later is the case I would prefer an output that is easier to parse by other standard unix tools such as grep and sed and an option for a flat non-tree view. In particular I think the command should be named something like @MichaelMure I appolgize for the conflicting advice, I think the best thing to do is to wait for @whyrusleeping guidence. |
|
@MichaelMure first, I apologize for any conflicting advice. The issue here is that we have to maintain any patches/features we accept and we have very limited bandwidth to do so. If we add something that has known bugs (not just limitations), we likely won't have time to fix them. Additionally, if we add a bunch (one at a time) of commands that don't generalize well, we'll have to maintain a bunch of independent commands that all do effectively the same thing (we already have this problem, really). To avoid this in the future, I recommend coming up with a complete design in an issue first. #4028 was a good start but the command described there (
This is distinctly not a verify command. It's for getting figuring out how much of a directory tree we have. It should not be under the dag subcommand because it has nothing to do with DAGs (it operates at the unixfs level). At this level, we don't really care about the DAG tree, just the filesystem tree. Also, parsing the output of go-ipfs commands is not a good idea. If you want to parse the output, pass |
|
However, this does raise a good point. We're getting to the point where we need to assign PR shepherds to avoid situations like this... (I've been on the receiving end of reviews like this and I know they're extremely frustrating). |
As I see it it, that was not the original intention based on the original goal of
All I care about the the DAG tree and if all the parts are available locally, not the filesystem tree.
I am not really talking about parsing, but more quick one offs. Doing something like |
The original goal was to determine how much of a DAG we have and how much we still need to download. Unfortunately, we can't do that for general purpose IPLD DAGs because only unixfs dags keep track of cumulative size. That's why we moved to a unixfs oriented command.
We can always add a new command (or an additional flag) but that wouldn't be the Note, I do recommend that you try to get comfortable with |
That seams an overly complicated way to arrive at the original requirements. It also seams weird to me that in order to get the useful stats you have to use the the global If And yes I know about |
These are just two ways to solve similar issues and I wouldn't be opposed to adding a command like that as well. However, this is the approach I suggested on the other PR and it's the approach that's implemented here (and it's something we'll want anyways, that's why I suggested it). Note: You currently have to pass the |
|
@MichaelMure I hate to be the bearer of bad news but... we have this thing called "sharded directories" (we split large directories into multiple blocks) that you may have to deal with. As-is, this command will choke on them. Unfortunately, you'll have to ask @whyrusleeping how they work. |
Being a maintainer myself I certainly understand that. I also understand that good design and UX is critical.
The first design make more sense to me but if you want another design, I'll do another design as long as I can efficiently know how much of a tree is present. One thing that might be confusing about the current PR is that in a normal run, both
The current PR will work with a non-unixfs DAG. The difference is that you will not get the cumulative size, but it's still useful IMHO because you know of much you have. If you want a progress, you can transmit the cumulative size another way. |
You're completely right. For some reason, I thought that PR was spitting out a list of missing blocks (recursively). I'm not sure where I got that impression but it's blatantly wrong and I completely dropped the ball here; I apologize for that. Given that, this command is overkill.
I still worry about having a command with UnixFS logic under an IPLD command. It's kind of like making an HTTP library that understands HTML. However, I'll leave this up to @whyrusleeping; I've already derailed this enough. |
|
So, the requirements from @MichaelMure here are as follows:
Constraints from @Stebalien
Overall, I think this command is pretty close to what we want. The only difficult part now is reconciling the 'summary' and the 'tree' outputs. @Stebalien and @kevina claim that this should be a different command. I'm not entirely sure on that, as then we would end up with something like Anyways, thats my thoughts so far. What do you all think? |
I never made that claim, I don't think @Stebalien did either, he for some reason didn't want the |
I still like |
|
@kevina its not verifying anything though. Its just listing whats local vs whats not. |
|
@whyrusleeping by reading the hashes from the datastore you are verifying that they exists locally. |
|
@dignifiedquire said he like Now i just don't know if it feels weird to have |
|
I don't like |
|
@MichaelMure For now, the best thing for you to do will be to use a version of ipfs with a command that works for you. Its going to take a bit of time to come to consensus on our end on what all this should look like |
|
Proposal: If that flag is set, we compute the amount of that dag we have local, and if possible the total size. That data should be added to the output of the stat command (alongside the other info). We should probably make a new struct type |
|
@whyrusleeping I was thinking something along the same line. |
|
@whyrusleeping could you clarify by "make a new struct type StatOutput instead of augmenting the existing Object" ? Is it this ? type Object struct {
// removed Hash
Size uint64
CumulativeSize uint64
Blocks int
}
type localInfo struct {
Local bool
SizeLocal uint64 `json:",omitempty"`
SizeTotal uint64 `json:",omitempty"`
}
const statType = "stat"
const localityType = "locality"
type StatOutput struct {
Hash string
Type string
Object Object
Locality localInfo
}This would be a breaking change. |
|
@MichaelMure I think the idea was to leave the existing Object untouched and do something like type StatOutput struct {
Object
WithLocality bool `json:",omitempty"`
SizeLocal uint64 `json:",omitempty"`
SizeTotal uint64 `json:",omitempty"`
}However, @whyrusleeping the existing |
|
I think this PR was superseded by #4638. Closing and feel free to reopen if I got that wrong. |
Following #4541, this PR add a
ipfs files treecommandOnly partial tree local: