Add a 'ipfs files tree' command #4596

Closed

MichaelMure wants to merge 6 commits into ipfs:master from MichaelMure:tree

Conversation

@MichaelMure
Contributor

Following #4541, this PR adds an ipfs files tree command

USAGE
  ipfs files tree <path> - Show information about a tree of files

SYNOPSIS
  ipfs files tree [--only-summary | -s] [--local2 | -l] [--] <path>

ARGUMENTS

  <path> - Path to print information about.

OPTIONS

  -s, --only-summary bool - Only print the summary of the tree's information.
  -l, --local2       bool - Don't request data from the network.

DESCRIPTION

  'ipfs files tree' displays the tree structure of directories/files
  
  The path can be inside of MFS or not.
  
  Examples:
  
  	$ ipfs files tree /ipfs/QmQLXHs7K98JNQdWrBB2cQLJahPhmupbDjRuH1b9ibmwVa/locale
  	QmPA5R3e7FJZpFAT5NYRcYuLJay1qS3enu4zUHAkQMu5uW
  	└── webui-cs.json
  	└── webui-de.json
  	└── webui-en.json
  	└── webui-fr.json
  	└── webui-nl.json
  	└── webui-pl.json
  	└── webui-th.json
  	└── webui-zh.json
  	0 directories, 8 files
  	23 kB present of 23 kB (100.00%)

With only part of the tree available locally:

$ ipfs files tree --local2 /ipfs/QmQLXHs7K98JNQdWrBB2cQLJahPhmupbDjRuH1b9ibmwVa
QmQLXHs7K98JNQdWrBB2cQLJahPhmupbDjRuH1b9ibmwVa
└── [missing]
└── 5da47f31372103568f04118f8d7053f5.png
└── [missing]
└── 735208c91b7b43c74e258505965a6cb4.png
└── 7c550f337e6126b155c8efcc98441540.png
└── [missing]
└── 8f70e85e85c6a23c75c6862292d871c0.png
└── [missing]
└── [missing]
└── [missing]
└── [missing]
└── [missing]
└── [missing]
└── [missing]
└── [missing]
└── [missing]
└── img
│   └── [missing]
│   └── [missing]
└── [missing]
└── ipfs-webui.0.3.0.css
└── ipfs-webui.0.3.0.js
│   └── [missing]
│   └── [missing]
│   └── [missing]
│   └── [missing]
│   └── [missing]
│   └── [missing]
│   └── [missing]
│   └── [missing]
│   └── [missing]
│   └── [missing]
│   └── [missing]
│   └── [missing]
│   └── [missing]
└── locale
│   └── [missing]
│   └── [missing]
│   └── webui-en.json
│   └── [missing]
│   └── [missing]
│   └── [missing]
│   └── [missing]
│   └── [missing]
2 directories, 7 files
284 kB present of 6.9 MB (4.14%)

Options: []cmdkit.Option{
cmdkit.BoolOption("only-summary", "s", "Only print the summary of the tree's information."),
// TODO: "local" is already used by the root ipfs command, what do ?
cmdkit.BoolOption("local2", "l", "Don't request data from the network."),
Member

This is a bit of a problem right now. You can leave this option out here and still use local, since it's defined globally. The annoying part is that it then doesn't show up in the helptext.

This PR: #4479 should address the issue.

Contributor Author

The root --local option means 'bypass the daemon, run directly on the repository', but here it means 'run on the daemon, just don't do any network request'. Is it OK for you to bypass the daemon?

Otherwise we could rename the option to --no-network or something.

Member

@MichaelMure My intent with the global --local flag was to mean "don't do any network requests". Unless others disagree, I would like to move towards that definition everywhere.

// try to decode unixfs
if pn, ok := state.node.(*dag.ProtoNode); ok {
unixfs, err := ft.FromBytes(pn.Data())
if err == nil {
Member

I feel like we should try to rework the logic here to avoid nesting too deeply.

Contributor Author

I can merge the test into one line as if unixfs, err := ft.FromBytes(pn.Data()); err == nil {, which makes it clearer, but that doesn't change the nesting depth.
I could also use a goto to jump out of the unixfs section when the decoding fails. What do you think?
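For illustration, a third option is to invert the condition and return early when decoding fails, which flattens the happy path without a goto. A minimal sketch with a toy stand-in for ft.FromBytes (names here are illustrative, not the PR's actual code):

```go
package main

import (
	"errors"
	"fmt"
)

// fromBytes is a toy stand-in for ft.FromBytes: it "decodes" a payload
// and fails on empty input.
func fromBytes(data []byte) (string, error) {
	if len(data) == 0 {
		return "", errors.New("not a unixfs node")
	}
	return "unixfs", nil
}

// describe bails out early when decoding fails, so the unixfs-specific
// logic stays at the top indentation level instead of nesting deeper.
func describe(data []byte) string {
	unixfs, err := fromBytes(data)
	if err != nil {
		return "raw" // early return: no extra nesting for the error path
	}
	return unixfs
}

func main() {
	fmt.Println(describe([]byte{1})) // unixfs
	fmt.Println(describe(nil))       // raw
}
```

The early-return form only works if the unixfs handling can be pulled into its own function (or the loop body can continue); otherwise the one-line if is the best available.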

}
}

summary, err := walkBlockStart(res, nd.Context(), dagserv, root, isDirectory, onlySummary)
Member

Use req.Context

return nil
}

return fmt.Errorf("unknow output type")
Member

typo

}

if item.Depth > 0 {
fmt.Fprint(w, "└── ")
Member

Should use ├── unless this is the last item at this depth. Unfortunately, you'll probably want to use a PostRun function (like the add command) instead of an encoder to do this (as you'll need to keep state).

@MichaelMure
Contributor Author

As requested by @Stebalien, the output is neater now:

...
├── coreunix
│   ├── add.go
│   ├── add_test.go
│   ├── cat.go
│   ├── metadata.go
│   ├── metadata_test.go
│   └── test_data
│   │   ├── colors
│   │   │   └── orange
│   │   ├── corps
│   │   │   └── apple
│   │   └── fruits
│   │   │   ├── apple
│   │   │   └── orange
├── mock
│   └── mock.go
├── pathresolver.go
└── pathresolver_test.go

It's still not fully identical to the unix tree command (see apple and orange), but that would require outputting, for each item, whether each of its parent directories is terminal or not. Is that acceptable?

@whyrusleeping
Member

@MichaelMure I think you can do it with the information you have already; here's the code for gx deps --tree: https://github.com/whyrusleeping/gx/blob/master/ui.go#L118

It looks like this:

├─ go-log                     QmSpJByNKFX1sCsHBEp3R73FL4NF6FnQTEGyNAXHm2GS52 1.2.0
│  ├─ go-logging              QmQvJiADDe7JR4m968MwXobTCCzUqQkP87aRHe29MEBGHV 0.0.0
│  └─ opentracing-go          QmWLWmRVSiagqP15jczsGME1qpob6HDbtbHAY2he9W5iUo 0.0.3
├─ cors                       QmPG2kW5t27LuHgHnvhUwbHCNHAt2eUcb4gPHqofrESUdB 0.0.0
│  └─ xhandler                QmXMPT8tVwqKHZKWxnrfbYqUEaDJ5SF3Yh3fnFthaSrU37 0.0.0
├─ go-ipfs-util               QmPsAfmDBnZN3kZGSuNwvCNDZiHneERSKmRcFyG3UkvcT3 1.2.6
│  ├─ go-multihash            QmYeKnKpubCMRiq3PGZcTREErthbb5Q9cXsCoSkD9bjEBd 1.0.6
│  │  ├─ go-crypto            QmaPHkZLbQQbvcyavn8q1GFHg6o6yeceyHFSJ3Pjf3p3TQ 0.0.0
│  │  ├─ go-base58            QmT8rehPR3F6bmwL6zjUN8XpiDBFFpMP2myPdC6ApsWfJf 0.0.0
│  │  ├─ keccakpg             QmQPWTeQJnJE7MYu6dJTiNTQRNuqBr41dis6UgY6Uekmgd 0.0.0
│  │  └─ murmur3              QmfJHywXQu98UeZtGJBQrPAR6AtmDjjbe3qjTo9piXHPnx 0.0.0
│  └─ go-base58               QmT8rehPR3F6bmwL6zjUN8XpiDBFFpMP2myPdC6ApsWfJf 0.0.0
├─ go-ipfs-cmdkit             QmQp2a2Hhb7F6eK2A5hN8f9aJy4mtkEikL9Zj4cgB7d1dD 0.3.6
│  ├─ go-ipfs-util            QmPsAfmDBnZN3kZGSuNwvCNDZiHneERSKmRcFyG3UkvcT3 1.2.6
│  │  ├─ go-multihash         QmYeKnKpubCMRiq3PGZcTREErthbb5Q9cXsCoSkD9bjEBd 1.0.6
│  │  │  ├─ go-crypto         QmaPHkZLbQQbvcyavn8q1GFHg6o6yeceyHFSJ3Pjf3p3TQ 0.0.0
│  │  │  ├─ go-base58         QmT8rehPR3F6bmwL6zjUN8XpiDBFFpMP2myPdC6ApsWfJf 0.0.0
│  │  │  ├─ keccakpg          QmQPWTeQJnJE7MYu6dJTiNTQRNuqBr41dis6UgY6Uekmgd 0.0.0
│  │  │  └─ murmur3           QmfJHywXQu98UeZtGJBQrPAR6AtmDjjbe3qjTo9piXHPnx 0.0.0
│  │  └─ go-base58            QmT8rehPR3F6bmwL6zjUN8XpiDBFFpMP2myPdC6ApsWfJf 0.0.0
│  └─ sys                     QmPXvegq26x982cQjSfbTvSzZXn7GiaMwhhVPHkeTEhrPT 0.1.0
├─ go-libp2p-loggables        QmSvcDkiRwB8LuMhUtnvhum2C851Mproo75ZDD19jx43tD 1.1.9
│  ├─ go-libp2p-peer          QmWNY7dV54ZDYmTA1ykVdwNCqC11mpU4zSUp6XDpLTH9eG 2.2.1
│  │  ├─ go-log               QmSpJByNKFX1sCsHBEp3R73FL4NF6FnQTEGyNAXHm2GS52 1.2.0
│  │  │  ├─ go-logging        QmQvJiADDe7JR4m968MwXobTCCzUqQkP87aRHe29MEBGHV 0.0.0
│  │  │  └─ opentracing-go    QmWLWmRVSiagqP15jczsGME1qpob6HDbtbHAY2he9W5iUo 0.0.3
│  │  ├─ go-libp2p-crypto     QmaPbCnUMBohSGo3KnxEa2bHqyJVVeEEcwtqJAYxerieBo 1.5.0
│  │  │  ├─ gogo-protobuf     QmZ4Qi3GaRbjcx28Sme5eMH7RQjGkt8wHxt2a65oLaeFEV 0.0.0
│  │  │  ├─ ed25519           QmQ51pHe6u7CWodkUGDLqaCEMchkbMt7VEZnECF5mp6tVb 0.0.0
│  │  │  └─ btcec             QmWq5PJgAQKDWQerAijYUVKW8mN5MDatK5j7VMp8rizKQd 0.0.1
│  │  ├─ go-multihash         QmYeKnKpubCMRiq3PGZcTREErthbb5Q9cXsCoSkD9bjEBd 1.0.6
│  │  │  ├─ go-crypto         QmaPHkZLbQQbvcyavn8q1GFHg6o6yeceyHFSJ3Pjf3p3TQ 0.0.0
│  │  │  ├─ go-base58         QmT8rehPR3F6bmwL6zjUN8XpiDBFFpMP2myPdC6ApsWfJf 0.0.0
│  │  │  ├─ keccakpg          QmQPWTeQJnJE7MYu6dJTiNTQRNuqBr41dis6UgY6Uekmgd 0.0.0
│  │  │  └─ murmur3           QmfJHywXQu98UeZtGJBQrPAR6AtmDjjbe3qjTo9piXHPnx 0.0.0
│  │  ├─ go-multicodec-packed QmNhVCV7kgAqW6oh6n8m9myxT2ksGPhVZnHkzkBvR5qg2d 0.1.0
│  │  └─ go-base58            QmT8rehPR3F6bmwL6zjUN8XpiDBFFpMP2myPdC6ApsWfJf 0.0.0
│  ├─ go-log                  QmSpJByNKFX1sCsHBEp3R73FL4NF6FnQTEGyNAXHm2GS52 1.2.0
│  │  ├─ go-logging           QmQvJiADDe7JR4m968MwXobTCCzUqQkP87aRHe29MEBGHV 0.0.0
│  │  └─ opentracing-go       QmWLWmRVSiagqP15jczsGME1qpob6HDbtbHAY2he9W5iUo 0.0.3
│  ├─ go-multiaddr            QmW8s4zTsUoX1Q6CeYxVKPyqSKbF7H1YDUyTostBtZ8DaG 1.2.5
│  │  └─ go-multihash         QmYeKnKpubCMRiq3PGZcTREErthbb5Q9cXsCoSkD9bjEBd 1.0.6
│  │     ├─ go-crypto         QmaPHkZLbQQbvcyavn8q1GFHg6o6yeceyHFSJ3Pjf3p3TQ 0.0.0
│  │     ├─ go-base58         QmT8rehPR3F6bmwL6zjUN8XpiDBFFpMP2myPdC6ApsWfJf 0.0.0
│  │     ├─ keccakpg          QmQPWTeQJnJE7MYu6dJTiNTQRNuqBr41dis6UgY6Uekmgd 0.0.0
│  │     └─ murmur3           QmfJHywXQu98UeZtGJBQrPAR6AtmDjjbe3qjTo9piXHPnx 0.0.0
│  └─ go.uuid                 QmcyaFHbyiZfoX5GTpcqqCPYmbjYNAhRDekXSJPFHdYNSV 1.0.0
└─ opentracing-go             QmWLWmRVSiagqP15jczsGME1qpob6HDbtbHAY2he9W5iUo 0.0.3

@whyrusleeping
Member

Though it's totally fine to just skip that for now. Flashy UI is a bit out of scope for what you're trying to build ;)

License: MIT
Signed-off-by: Michael Muré <batolettre@gmail.com>
@MichaelMure
Contributor Author

@whyrusleeping Can we merge?

@kevina kevina self-requested a review January 23, 2018 12:27
@kevina
Contributor

kevina commented Jan 23, 2018

@whyrusleeping I would like a chance to look this over; I will have that done in a day or two.

Member

@Stebalien Stebalien left a comment

My primary feedback is that this should better separate the data from the visual representation. We should be thinking about this as both a CLI command and as an API endpoint.

item := output.Item

if item.Depth > 1 {
fmt.Fprint(w, strings.Repeat("│   ", item.Depth-1))
Member

I think this'll have issues in the following case:

.
├── bar
│   └── a
│       └── b
│           └── c
│               └── d
└── foo

I really think we need a PostRun to create this graph correctly.
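As an illustration of that approach, a PostRun could keep, for every ancestor depth, a flag saying whether that ancestor was the last child at its level, and build each line's prefix from those flags: closed levels print four spaces, open ones print a vertical guide. A minimal, hypothetical Go sketch (not the PR's code):

```go
package main

import (
	"fmt"
	"strings"
)

// prefix builds the indentation for one entry. ancestorsLast holds, for
// each ancestor depth, whether that ancestor was the last child at its
// level: last children print "    " (their branch is closed), others
// print "│   " (the branch continues below).
func prefix(ancestorsLast []bool, lastChild bool) string {
	var b strings.Builder
	for _, last := range ancestorsLast {
		if last {
			b.WriteString("    ")
		} else {
			b.WriteString("│   ")
		}
	}
	if lastChild {
		b.WriteString("└── ")
	} else {
		b.WriteString("├── ")
	}
	return b.String()
}

func main() {
	// Reproduces the problematic shape above: once we are inside bar's
	// last-child chain, no "│" guide should be drawn for those levels.
	fmt.Println(prefix(nil, false) + "bar")
	fmt.Println(prefix([]bool{false}, true) + "a")
	fmt.Println(prefix([]bool{false, true}, true) + "b")
	fmt.Println(prefix(nil, true) + "foo")
}
```

This only needs the depth and a lastChild flag per emitted item, which is why streaming that flag (rather than buffering the whole tree) is enough.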

return nil
}

if output.Type == summaryType {
Member

If we had a post-run, we wouldn't have to explicitly include this. We could just include the stats for every entry and aggregate them (makes the interface flexible and reusable and cleanly separates the data from the representation).

Contributor

Not including the summary stats in the JSON output would make retrieving this information with a simple tool like jq more difficult. You would probably need to write a small script to compute the summary information. Including extra information that is cheap to compute is, in my view, not a bad thing in a JSON API.

Member

I just feel like it's mixing concerns. I'm not exactly against having this information here, I just think it's implied so therefore unnecessary.

Contributor

There may be another way (that I just thought of); see #4596 (comment).

unixType := unixfs.GetType()

// To distinguish files from chunk
isFile := unixType == ft.TFile && state.parentIsDirectory
Member

Given that we care about different information if we're traversing through a file, we should probably just have a separate walk function. It should significantly reduce the complexity.
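To make the suggestion concrete, here is a hedged sketch (illustrative types and names, not the PR's) of splitting the traversal: a directory walker that dispatches to a much simpler file walker, so neither function carries the other's bookkeeping:

```go
package main

import "fmt"

// node is a toy stand-in for a dag node in this sketch.
type node struct {
	name     string
	isDir    bool
	children []*node
}

// walkDir handles directory entries and hands files off to walkFile,
// keeping directory-specific logic (names, nesting) in one place.
func walkDir(n *node, visit func(string)) {
	visit("dir:" + n.name)
	for _, c := range n.children {
		if c.isDir {
			walkDir(c, visit)
		} else {
			walkFile(c, visit)
		}
	}
}

// walkFile only cares about chunk-level information, so it needs none
// of the directory bookkeeping.
func walkFile(n *node, visit func(string)) {
	visit("file:" + n.name)
}

func main() {
	root := &node{name: "root", isDir: true, children: []*node{
		{name: "a.txt"},
		{name: "sub", isDir: true, children: []*node{{name: "b.txt"}}},
	}}
	walkDir(root, func(s string) { fmt.Println(s) })
}
```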

result.Local = false

if !state.onlySummary {
state.res.Emit(&treeOutput{
Member

This will make missing blocks in a file look like missing files. If we do display them, I'd just display them as a list of missing blocks (no depth, although you can still pass the depth through if you want).

We should also explicitly mark them as missing blocks. As a matter of fact, we should probably include the types of directories and files as well. That is, I'd expect these "events" to be of the form:

{
  "type": "directory|file|missing",
  "size": 200, // recursive
  "depth": 1,
  "lastChild": true|false,
  "hash": "QmId",
  "name": "thing" // only applies to named things (files and directories, not missing blocks).
}

Note the lack of summary etc, we can compute that in PostRun.

Contributor

I would separate the "type" from the status (missing or non-missing).

Even if a file is missing we still know some information about it from the directory info. When the new unixfs lands we should also know its exact type.

I would use directory|file|block|unknown for the possible types. unknown is used when we have a directory entry but we don't know if it is a file or directory. block is used when we know it's part of a file.

Member

OK, I completely failed to explain this properly.

@kevina file "entries" won't actually be missing because we'll know their names, only blocks will. However, the inability to distinguish between a file and a directory will be annoying. We'll probably have to use the term "unknown" for that.

Maybe "missing-block" is more accurate for missing information?


Basically, I was thinking that the server would send a stream of information like the following (basically, a trace of the dag traversal).

{
  "type": "directory",
  "size": 800,
  "depth": 0,
  "lastChild": true, // root
  "hash": "QmFoo",
  "name": "foo"
}

{
  "type": "directory",
  "size": 600,
  "depth": 1,
  "lastChild": true,
  "hash": "QmBar",
  "name": "bar"
}

{
  "type": "directory",
  "size": 400,
  "depth": 2,
  "lastChild": true,
  "hash": "QmBaz",
  "name": "baz"
}

{
  "type": "file",
  "size": 400,
  "depth": 2,
  "hash": "QmFile1",
  "name": "file-1"
}

{
  "type": "file",
  "size": 400,
  "depth": 2,
  "hash": "QmFile2",
  "name": "file-2"
}

{
  "type": "unknown",
  "size": 400,
  "depth": 2,
  "hash": "QmSomething",
  "name": "file-or-directory",
  "lastChild": true
}

// Note: we could also collapse this into the above event.
// However, I think making missing blocks separate would be more consistent.
{
  "type": "missing-block",
  "size": 400,
  "depth": 3,
  "hash": "QmSomething", // same as above. This represents the fact that we're *missing* the sole block of file-or-directory.
  "lastChild": true
}

Contributor

I would collapse it, and instead of using the missing-block use

{
  "type": "unknown",
  "size": 400,
  "depth": 2,
  "hash": "QmSomething",
  "name": "file-or-directory",
  "lastChild": true,
  "status": "missing"
}

or something close to that.

Member

My issue with that is that we might have missing child-blocks and I'd rather not have two different ways to indicate that we have missing information.

On the other hand, missing the top-level block (currently) means that we don't even know if we're dealing with a file or a directory so I guess it is a bit of a special case (at the moment).

Contributor

@kevina kevina Jan 24, 2018

If by missing child block you mean part of a file then I would use:

{
  "type": "block",
  "size": 400,
  "depth": 2,
  "hash": "QmSomething",
  "status": "missing"
}

I am not sure what the purpose of the lastChild field is.

Member

If by missing child block you mean part of a file then I would use...

I understand, I'm just trying to reduce the number of cases we have to handle (although I don't feel strongly about this). This is easier to see when assuming that we already encoded filetype information in directory entries:

Missing all blocks (including the root):

{
  "type": "file",
  "size": 400,
  "depth": 2,
  "hash": "QmFile",
  "name": "myFile",
  "status": "missing"
}

Missing a child block:

{
  "type": "file",
  "size": 400,
  "depth": 2,
  "hash": "QmFileRoot",
  "name": "myFile"
}
{
  "type": "block",
  "size": 400,
  "depth": 3,
  "hash": "QmFileBlock1",
  "name": "file-or-directory",
  "status": "missing",
  "lastChild": true
}

From a filesystem perspective, both cases are missing (at least one) block of the file. However, we end up representing this differently in both cases.

On the other hand, this is nice because the first case makes it clear that we're missing the entire file.

I am not sure what the purpose of the lastChild field is.

Sorry, that wasn't really necessary for that example. lastChild means the last child at that depth. It allows us to draw the tree without buffering (the current implementation has it for this reason).

Contributor

On the other hand, this is nice because the first case makes it clear that we're missing the entire file.

And I think that is an important distinction. For the case when some of the children are missing I would use "status": "incomplete".

(I have thought through many of these cases when working on the original filestore verify commands, which never made it into ipfs. You can find the documentation for that here: https://github.com/ipfs-filestore/go-ipfs/blob/master/core/commands/filestore.go#L547. Many of the possible codes don't make sense here, but the 'ok', 'missing' and 'incomplete' codes do.)

Since the original goal of the PR was to include information on how much of a tree is downloaded, I would also include a field that is the cumulative size found locally (although I am not sure what to call it). This would work nicely to avoid having to include summary information in the API, as the important information would be available by just looking at the root node.

Member

And I think that is an important distinction. For the case when some of the children are missing I would use "status": "incomplete".

My point is that thinking of some blocks as "children" and others not is kind of pointless. From the perspective of the filesystem, a file has an array of blocks (no parents and children).

Since the original goal of the p.r. was to include information on how much of a tree is downloaded I would also include a field that is the cumulative size found locally (although not sure what to call it). This will work nicely to avoid having to include summary information in the API as the important information will be available by just looking at the root node.

Unfortunately, we can't compute that until we've processed all the leaves (well after we've sent this event).
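To illustrate, a client (or a PostRun hook) could fold the stream into the summary after the fact, once all events have arrived; a toy sketch with made-up event fields:

```go
package main

import "fmt"

// event is a minimal stand-in for the streamed entries discussed above.
type event struct {
	size    uint64
	present bool // whether the block is available locally
}

// summarize aggregates the cumulative local and total sizes. The point
// is that this can only be computed after the whole stream has been
// seen, well after the root event was emitted.
func summarize(events []event) (local, total uint64) {
	for _, ev := range events {
		total += ev.size
		if ev.present {
			local += ev.size
		}
	}
	return local, total
}

func main() {
	evs := []event{{100, true}, {200, false}, {50, true}}
	local, total := summarize(evs)
	fmt.Printf("%d B present of %d B (%.2f%%)\n", local, total, 100*float64(local)/float64(total))
}
```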

cmdkit.StringArg("path", true, false, "Path to print information about."),
},
Options: []cmdkit.Option{
cmdkit.BoolOption("only-summary", "s", "Only print the summary of the tree's information."),
Member

That's not a tree. IMO, this doesn't belong in this command.

Contributor

@kevina kevina Jan 24, 2018

Then maybe this command is misnamed. See other comment (#4596 (comment)).

Hash: link.Cid.String(),
Type: itemType,
Item: treeItem{
Name: "[missing]",
Member

Good example of where this is a reasonable CLI command but a bad API endpoint. This is not the name of the file (also, confusing if I named a file [missing]).

Contributor

@kevina kevina left a comment

I agree with all of @Stebalien's comments.

The summary does not give a clear indication of missing blocks, in this case one file was missing:

0 directories, 7 files present
20 kB present of 23 kB (88.81%)

I would include something like

1 out of at least 9 blocks MISSING

on a separate line. I use the phrasing "at least" because I am not sure it is possible to get an accurate count of the number of blocks if part of the tree is missing.

I have some additional comments on the purpose of this command that I will leave separately.

@MichaelMure
Contributor Author

Guys, I'm struggling to do whatever you want (this PR and #4541 before it) so I can actually push forward my own project built on top of IPFS. Without being able to easily and efficiently tell if something is here or not, I'm just stuck.

A bit of help would be greatly appreciated; this is a little discouraging...

@kevina
Contributor

kevina commented Jan 24, 2018

If this command is just a fancy way to recursively display a directory then I am okay with it.

However, if this command will be the preferred way to determine if a block exists or if a merkle dag has missing blocks, then I have serious problems with the presentation method.

If the latter is the case, I would prefer output that is easier to parse with standard unix tools such as grep and sed, and an option for a flat non-tree view. In particular I think the command should be named something like ipfs dag verify and the output should be closer to ipfs filestore verify: that is, a command that recursively lists all the blocks, verifies whether they exist locally, and outputs in a way that can easily be filtered with standard unix tools such as grep with meaningful results.

@MichaelMure I apologize for the conflicting advice; I think the best thing to do is to wait for @whyrusleeping's guidance.

@Stebalien
Member

@MichaelMure first, I apologize for any conflicting advice.

The issue here is that we have to maintain any patches/features we accept and we have very limited bandwidth to do so. If we add something that has known bugs (not just limitations), we likely won't have time to fix them. Additionally, if we add a bunch (one at a time) of commands that don't generalize well, we'll have to maintain a bunch of independent commands that all do effectively the same thing (we already have this problem, really).

To avoid this in the future, I recommend coming up with a complete design in an issue first. #4028 was a good start, but the command described there (ipfs repo has) is nothing like the command implemented in #4541; the former spat out a single answer and the latter built a tree. This command spits out a tree but has a few noticeable bugs and, more importantly, some design issues that will make it difficult to extend and maintain (it does too much UI work on the server instead of just giving the client what it needs and letting the client decide how to format it).

@kevina

If this command is just a fancy way to recursively display a directory then I am okay with it.

This is distinctly not a verify command. It's for figuring out how much of a directory tree we have.

It should not be under the dag subcommand because it has nothing to do with DAGs (it operates at the unixfs level). At this level, we don't really care about the DAG tree, just the filesystem tree.

Also, parsing the output of go-ipfs commands is not a good idea. If you want to parse the output, pass --encoding=json. This is why I care about what the server responds with.

@Stebalien
Member

Stebalien commented Jan 24, 2018

However, this does raise a good point. We're getting to the point where we need to assign PR shepherds to avoid situations like this... (I've been on the receiving end of reviews like this and I know they're extremely frustrating).

@kevina
Contributor

kevina commented Jan 24, 2018

This is distinctly not a verify command. It's for figuring out how much of a directory tree we have.

As I see it, that was not the original intention, based on the original goal of ipfs repo has.

It should not be under the dag subcommand because it has nothing to do with DAGs (it operates at the unixfs level). At this level, we don't really care about the DAG tree, just the filesystem tree.

All I care about is the DAG tree and whether all the parts are available locally, not the filesystem tree.

Also, parsing the output of go-ipfs commands is not a good idea. If you want to parse the output, pass --encoding=json. This is why I care about what the server responds with.

I am not really talking about parsing, but more about quick one-offs. Doing something like ipfs cmd | grep MISSING and having that produce useful results is simpler than parsing the JSON response. This output does not lend itself well to that.

@Stebalien
Member

Stebalien commented Jan 24, 2018

As I see it, that was not the original intention, based on the original goal of ipfs repo has.
...
All I care about is the DAG tree and whether all the parts are available locally, not the filesystem tree.

The original goal was to determine how much of a DAG we have and how much we still need to download. Unfortunately, we can't do that for general purpose IPLD DAGs because only unixfs dags keep track of cumulative size. That's why we moved to a unixfs oriented command.

I am not really talking about parsing, but more about quick one-offs. Doing something like ipfs cmd | grep MISSING and having that produce useful results is simpler than parsing the JSON response. This output does not lend itself well to that.

We can always add a new command (or an additional flag) but that wouldn't be the tree command. This command is the equivalent of unix tree (and satisfies the original requirement because one can just use the API to extract the desired information).

Note, I do recommend that you try to get comfortable with jq. It will change your life. The command here would be ipfs files tree --encoding=json | jq -r 'select(.Type == "missing") | .Hash'.

@kevina
Contributor

kevina commented Jan 24, 2018

This command is the equivalent of unix tree (and satisfies the original requirement because one can just use the API to extract the desired information).

That seems an overly complicated way to arrive at the original requirements. It also seems weird to me that in order to get the useful stats you have to use the global --local option.

If dag verify won't work, how about files verify to directly get the information we want? The --tree option to that command could produce the output of files tree --local. I consider this a verify command: we are verifying/checking that the blocks are available locally.

And yes, I know about jq, but it is not a standard tool that is automatically installed in most environments just yet, unlike say grep/sed/awk or even perl and python.

@Stebalien
Member

That seems an overly complicated way to arrive at the original requirements. It also seems weird to me that in order to get the useful stats you have to use the global --local option.

These are just two ways to solve similar issues and I wouldn't be opposed to adding a command like that as well. However, this is the approach I suggested on the other PR and it's the approach that's implemented here (and it's something we'll want anyways, that's why I suggested it).

Note: You currently have to pass the --local flag; however, in a followup PR, I'd like to change that such that not passing the --local flag simply causes this command to download the directory structure but not the files themselves. Personally, I think that would be the most useful invocation for this particular usecase. This was my original proposal but I haven't pushed it here as we can easily fix it later.

@Stebalien
Member

@MichaelMure I hate to be the bearer of bad news but... we have this thing called "sharded directories" (we split large directories into multiple blocks) that you may have to deal with. As-is, this command will choke on them. Unfortunately, you'll have to ask @whyrusleeping how they work.

@MichaelMure
Contributor Author

@Stebalien

The issue here is that we have to maintain any patches/features we accept and we have very limited bandwidth to do so. If we add something that has known bugs (not just limitations), we likely won't have time to fix them. Additionally, if we add a bunch (one at a time) of commands that don't generalize well, we'll have to maintain a bunch of independent commands that all do effectively the same thing (we already have this problem, really).

Being a maintainer myself I certainly understand that. I also understand that good design and UX is critical.

To avoid this in the future, I recommend coming up with a complete design in an issue first. #4028 was a good start but the command described there (ipfs repo has) is nothing like the command implemented in #4541: the former spat out a single answer and the latter built a tree. This command spits out a tree but has a few noticeable bugs and, more importantly, has some design issues that will make it difficult to extend and maintain (it does too much UI work on the server instead of just giving the client what it needs and letting the client decide how to format it).

ipfs repo has was a direct implementation of what was briefly discussed in #4028 (at least in my understanding). You and @whyrusleeping asked me to change the design to what this PR is now (with some delta; there is always some delta).

The first design makes more sense to me, but if you want another design, I'll implement it, as long as I can efficiently know how much of a tree is present.

One thing that might be confusing about the current PR is that in a normal run, both treeItem and treeSummary are output. The original design had them separated, but since I gather all the information in a single pass, I thought I might as well output the summary to show relevant information just after the tree. That's why there is now an inverted --only-summary option.

The original goal was to determine how much of a DAG we have and how much we still need to download. Unfortunately, we can't do that for general-purpose IPLD DAGs because only unixfs DAGs keep track of cumulative size. That's why we moved to a unixfs-oriented command.

The current PR will work with a non-unixfs DAG. The difference is that you will not get the cumulative size, but it's still useful IMHO because you know how much you have. If you want progress information, you can transmit the cumulative size another way.
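To make that accounting concrete, here is a minimal, self-contained sketch (using hypothetical types, not the actual go-ipfs blockstore API) of how "how much of a DAG is local" can be computed for an arbitrary, non-unixfs DAG: walk from the root, sum the sizes of blocks we have, and count links whose targets are missing. A real implementation would also deduplicate shared blocks so they aren't counted twice.

```go
package main

import "fmt"

// Node is a simplified stand-in for an IPLD node: its own block size
// plus links (by CID) to child blocks.
type Node struct {
	Size  uint64
	Links []string
}

// localStore maps CIDs to the nodes present in the local blockstore.
type localStore map[string]*Node

// walkLocal returns the number of bytes present locally under root and
// how many referenced blocks are missing, without touching the network.
// A missing block also hides its children, so they go uncounted.
func walkLocal(store localStore, root string) (sizeLocal uint64, missing int) {
	node, ok := store[root]
	if !ok {
		return 0, 1 // block not local: we cannot even see its children
	}
	sizeLocal = node.Size
	for _, child := range node.Links {
		s, m := walkLocal(store, child)
		sizeLocal += s
		missing += m
	}
	return sizeLocal, missing
}

func main() {
	store := localStore{
		"root": {Size: 10, Links: []string{"a", "b"}},
		"a":    {Size: 5},
		// "b" is not in the local store
	}
	size, missing := walkLocal(store, "root")
	fmt.Println(size, missing) // 15 1
}
```

This is exactly why the total is unknowable for general IPLD: a missing block's subtree contributes nothing, so only unixfs's stored cumulative size lets you report a percentage.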

@Stebalien
Member

ipfs repo has was a direct implementation of what was briefly discussed in #4028 (at least in my understanding). You and @whyrusleeping asked me to change the design to what this PR is now (with some delta; there is always some delta).

You're completely right. For some reason, I thought that PR was spitting out a list of missing blocks (recursively). I'm not sure where I got that impression but it's blatantly wrong and I completely dropped the ball here; I apologize for that. Given that, this command is overkill.

The current PR will work with a non-unixfs DAG. The difference is that you will not get the cumulative size, but it's still useful IMHO because you know how much you have. If you want progress information, you can transmit the cumulative size another way.

I still worry about having a command with UnixFS logic under an IPLD command. It's kind of like making an HTTP library that understands HTML. However, I'll leave this up to @whyrusleeping; I've already derailed this enough.

@whyrusleeping
Member

So, the requirements from @MichaelMure here are as follows:

  • Given a hash, return how much of it is local
  • Ideally operates over unixfs (for cumulative size support)

Constraints from @Stebalien

  • API needs to be nice AND CLI needs to be nice
  • shouldn't bork on non-unixfs graphs
  • API shouldn't have ambiguous fields (the "[missing]" thing)

Overall, I think this command is pretty close to what we want. The only difficult part now is reconciling the 'summary' and the 'tree' outputs. @Stebalien and @kevina claim that this should be a different command. I'm not entirely sure about that, as we would then end up with something like ipfs files tree and ipfs files tree-summary. The original need that spun into this PR didn't even call for a tree at all. The original idea was a has command. I don't think has is the right verb; the thing we're really after here is essentially 'progress' information about a transfer, but that still doesn't feel quite right because it implies an active transfer of data, when you might just want to query what's there right now, kind of like getting that subtree's status. So maybe it fits in ipfs files stat; it could be an extra (specifically named) flag like --local-data-info? The tree stuff can all stay in ipfs files tree (and we can punt on that for now, since @MichaelMure doesn't even really need it).

Anyway, those are my thoughts so far. What do you all think?

@kevina
Contributor

kevina commented Jan 24, 2018

@whyrusleeping

Overall, I think this command is pretty close to what we want. The only difficult part now is reconciling the 'summary' and the 'tree' outputs. @Stebalien and @kevina claim that this should be a different command.

I never made that claim, and I don't think @Stebalien did either; for some reason he just didn't want the --only-summary option for the ipfs files tree command that this PR implements.

@kevina
Contributor

kevina commented Jan 24, 2018

So maybe it fits in ipfs files stat; it could be an extra (specifically named) flag like --local-data-info?

I still like ipfs files verify; after all, we are verifying how much of the file is available locally.

@whyrusleeping
Member

@kevina it's not verifying anything, though. It's just listing what's local vs what's not.

@kevina
Contributor

kevina commented Jan 24, 2018

@whyrusleeping by reading the hashes from the datastore you are verifying that they exist locally.
Now that I think about it, stat would be okay also. That is assuming stat will only read the internal nodes of the tree and (possibly, as an optimization) use has on leaves to figure out what percentage of the tree is available locally.

@whyrusleeping
Member

@dignifiedquire said he likes ipfs files status; other suggestions were ipfs progress or ipfs files progress, but progress seems to imply too strongly that there's something ongoing.

Now I just don't know if it feels weird to have both ipfs files stat and ipfs files status...

@kevina
Contributor

kevina commented Jan 24, 2018

I don't like files progress at all. My first vote would be for files verify, my second for files stat.

@whyrusleeping
Member

@MichaelMure For now, the best thing for you to do will be to use a version of ipfs with a command that works for you. It's going to take a bit of time to come to consensus on our end about what all this should look like.

@whyrusleeping
Member

whyrusleeping commented Jan 28, 2018

Proposal:

ipfs files stat --with-local

If that flag is set, we compute how much of that DAG we have locally and, if possible, the total size. That data should be added to the output of the stat command (alongside the other info). We should probably make a new struct type StatOutput instead of augmenting the existing Object type: https://github.com/ipfs/go-ipfs/blob/master/core/commands/files/files.go#L305

@whyrusleeping
Member

@kevina
Contributor

kevina commented Jan 28, 2018

@whyrusleeping I was thinking something along the same line.

@MichaelMure
Contributor Author

@whyrusleeping could you clarify what you mean by "make a new struct type StatOutput instead of augmenting the existing Object"? Is it this?

type Object struct {
	// removed Hash
	Size           uint64
	CumulativeSize uint64
	Blocks         int
}

type localInfo struct {
	Local     bool
	SizeLocal uint64 `json:",omitempty"`
	SizeTotal uint64 `json:",omitempty"`
}

const statType = "stat"
const localityType = "locality"

type StatOutput struct {
	Hash string
	Type string

	Object   Object
	Locality localInfo
}

This would be a breaking change.

@kevina
Contributor

kevina commented Jan 29, 2018

@MichaelMure I think the idea was to leave the existing Object untouched and do something like

type StatOutput struct {
	Object
	WithLocality bool   `json:",omitempty"`
	SizeLocal    uint64 `json:",omitempty"`
	SizeTotal    uint64 `json:",omitempty"`
}

However, @whyrusleeping, the existing Object struct is only used locally within the same file, so I am not sure of the value of this. It may be better to just rename it to StatOutput and use WithLocality to determine if locality information is present.

@eingenito
Contributor

I think this PR was superseded by #4638. Closing and feel free to reopen if I got that wrong.

@eingenito eingenito closed this Dec 5, 2018