Skip to content

Add tracking a recursion depth in System.IO.FileSystem#48148

Merged
jozkee merged 11 commits intodotnet:mainfrom
iSazonov:add-depth-tracking
Mar 31, 2021
Merged

Add tracking a recursion depth in System.IO.FileSystem#48148
jozkee merged 11 commits intodotnet:mainfrom
iSazonov:add-depth-tracking

Conversation

@iSazonov
Copy link
Contributor

Close #43748

/cc @carlossanlop

@ghost
Copy link

ghost commented Feb 11, 2021

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@iSazonov iSazonov changed the title Add tracking a depth recursion in System.IO.FileSystem Add tracking a recursion depth in System.IO.FileSystem Feb 12, 2021
Copy link
Contributor

@carlossanlop carlossanlop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR, @iSazonov! I left some comments.

Comment on lines +100 to +103
/// <summary>
/// Stop recursion if current depth exceeds the value.
/// Default is int.MaxValue.
/// </summary>
Copy link
Contributor

@carlossanlop carlossanlop Feb 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are working on making triple slash comments the source of truth for documentation, we need to give a proper style and language review here before commiting, so it's ready to get ported to MS Docs. So below is my suggestion.

cc @gewarren in case she thinks my suggestion can be improved:

Suggested change
/// <summary>
/// Stop recursion if current depth exceeds the value.
/// Default is int.MaxValue.
/// </summary>
/// <summary>Gets or sets a value that indicates the maximum directory depth to recurse while enumerating, when <see cref="RecurseSubdirectories" /> is set to <see langword="true" />.</summary>
/// <value>A number that represents the maximum directory depth to recurse while enumerating. The default value is <see cref="int.MaxValue" />.</value>
/// <remarks>If <see cref="MaxRecursionDepth" /> is set to a negative number, the default value <see cref="int.MaxValue" /> is used.
/// If <see cref="MaxRecursionDepth" /> is set to zero, returns the contents of the initial directory.</remarks>

Copy link
Contributor Author

@iSazonov iSazonov Feb 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add in remarks about zero value too:

        /// <remarks>If <see cref="MaxRecursionDepth" /> is set to a negative number, the default value <see cref="int.MaxValue" /> is used. If <see cref="MaxRecursionDepth" /> is set to zero, returns the contents of the initial directory.</remarks>

Waiting @gewarren ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iSazonov I added your extra zero value suggestion to my comment. You can commit the suggestion if you agree. @gewarren is OOF but she can let us know later if we should modify it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sharing experience. In PowerShell repo we do the work async to exclude PR merge delaying. We create and reference in PR description new doc issue (or new PR in PowerShell-Docs repo) so that add new docs after functional PR merged.
This approach resolve follow issues:

  1. English is not native for the contributor
  2. The contributor can not know all rules of creating docs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd change the second remark to something like:

If <see cref="MaxRecursionDepth" /> is set to zero, enumeration returns the contents of the initial directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 41 to +44
_options = options ?? EnumerationOptions.Default;

_remainingDepth = _options.MaxRecursionDepth < 0 ? EnumerationOptions.Default.MaxRecursionDepth : _options.MaxRecursionDepth;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamsitnik do you know if it's costly to generate a default EnumerationOptions instance just to retrieve the default value of MaxRecursionDepth, as opposed to simply using the underlying default value?

@iSazonov I don't expect we will ever change the default value of MaxRecursionDepth, mainly because it's type (int) would never be changed.

I also noticed that line 41 is already consuming EnumerationOptions.Default, so there are two things we could do here:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A) Save EnumerationOptions.Default to a variable and then consume it in both places:

Suggested change
_options = options ?? EnumerationOptions.Default;
_remainingDepth = _options.MaxRecursionDepth < 0 ? EnumerationOptions.Default.MaxRecursionDepth : _options.MaxRecursionDepth;
var defaultOptions = EnumerationOptions.Default;
_options = options ?? defaultOptions;
_remainingDepth = _options.MaxRecursionDepth < 0 ? defaultOptions.MaxRecursionDepth : _options.MaxRecursionDepth;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or B) Use the default value directly:

Suggested change
_options = options ?? EnumerationOptions.Default;
_remainingDepth = _options.MaxRecursionDepth < 0 ? EnumerationOptions.Default.MaxRecursionDepth : _options.MaxRecursionDepth;
_options = options ?? EnumerationOptions.Default;
_remainingDepth = _options.MaxRecursionDepth < 0 ? int.MaxValue : _options.MaxRecursionDepth;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iSazonov I don't expect we will ever change the default value of MaxRecursionDepth, mainly because it's type (int) would never be changed.

I believe it makes the code more readable if we use a sort of named constants.
Also I guess/hope the runtime will optimize the case.

Waiting @adamsitnik ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to use a named constant (perhaps called DefaultMaxRecursionDepth) inside EnumerationOptions and use it where its necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if (isDirectory && !isSpecialDirectory)
{
if (_options.RecurseSubdirectories && ShouldRecurseIntoEntry(ref entry))
if (_options.RecurseSubdirectories && _remainingDepth > 0 && ShouldRecurseIntoEntry(ref entry))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know from experience that any changes in FileSystemEnumerator.MoveNext could cause ugly performance regressions.

Can you please run the enumeration performance benchmarks, before and after your changes, to see if there's an impact?

In case you've not run benchmarks before, here are the nstructions: dotnet/performance/docs/benchmarking-workflow-dotnet-runtime.md#preventing-regressions

These are the relevant benchmarks: dotnet/performance/src/benchmarks/micro/libraries/System.IO.FileSystem/Perf.Directory.cs.

I am particularly interested in the results for RecursiveDepthData() and EnumerateFiles().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to run these tests, but it may take a long time since I have never done this.
Perhaps @adamsitnik will get ahead of me. :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After my first attempt no regression was found. I am not sure I did everything is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please share the results here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do but I am not still sure I make right steps to make the work. Docs are not so clear about iterations. If an expert like @adamsitnik published a short (5 minutes) video with demo steps how clean start, make a base and iterate with changes it would be great help (because each iteration takes some hours on my notebook).

Currently I can share only "No differences found between the benchmark results with threshold 1%.". Do you want to get an archive with result files?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is very good documentation in the dotnet/performance repo. You should end up with a file containing markdown tables that you can paste directly into your PR here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it should take no more than a few minutes to run the relevant benchmarks. You can just filter to the file enumeration ones.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted your zip file and compared the results using ResultComparer from the performance repo:

D:\performance\src\tools\ResultsComparer> dotnet run --base "C:\Users\carlos\Desktop\New folder\before" --diff "C:\Users\carlos\Desktop\New folder\after" --threshold 0.00001%
summary:
better: 1, geomean: 1.016
total diff: 1

No Slower results for the provided threshold = 0.00001% and noise filter = 0.3ns.

| Faster                                        | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| --------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| System.IO.Tests.Perf_Directory.EnumerateFiles |      1.02 |       8712455.56 |       8576044.64 |         |

@iSazonov can you run a benchmark test for the other enumeration-related method RecursiveDepthData()? Or even better: just run all the methods in that perf class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlossanlop I don't see regressions but I can not trust myself and you can not too because I never run perf tests before (I am not sure I do all right). In any case we need a confirmation from more experienced developer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All results for System.IO.Tests.Perf_Directory.*
2.zip

@carlossanlop carlossanlop added this to the Future milestone Feb 16, 2021
iSazonov and others added 3 commits February 18, 2021 11:31
…tions.cs

Co-authored-by: Carlos Sanchez <1175054+carlossanlop@users.noreply.github.com>
Base automatically changed from master to main March 1, 2021 09:07
@iSazonov
Copy link
Contributor Author

@adamsitnik Could you please review?

_rootDirectory = Path.TrimEndingDirectorySeparator(path);
_options = options ?? EnumerationOptions.Default;

_remainingDepth = _options.MaxRecursionDepth < 0 ? EnumerationOptions.Default.MaxRecursionDepth : _options.MaxRecursionDepth;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should throw for negatives rather than take it as "use max value".

Copy link
Contributor Author

@iSazonov iSazonov Mar 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems strange to me that we define it as Int and then throw if negative. If we only want non-negative values ​​then we should either be UInt or just use the default for them.
But I don't know that is a best practice in Runtime.

Copy link
Member

@jozkee jozkee Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One reason for using signed over unsigned is CLS-compliance. Another one is that signed integers are already the standard across many .NET APIs e.g:

FileStream.SetLength(long value)
List<T>.Capacity
JsonSerializerOptions.MaxDepth

Notice that the APIs I linked guard against negatives. We should do the same here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for samples! I added a value check in MaxRecursionDepth.

Done.

Comment on lines +100 to +104
/// <summary>Gets or sets a value that indicates the maximum directory depth to recurse while enumerating, when <see cref="RecurseSubdirectories" /> is set to <see langword="true" />.</summary>
/// <value>A number that represents the maximum directory depth to recurse while enumerating. The default value is <see cref="int.MaxValue" />.</value>
/// <remarks>If <see cref="MaxRecursionDepth" /> is set to a negative number, the default value <see cref="int.MaxValue" /> is used.
/// If <see cref="MaxRecursionDepth" /> is set to zero, enumeration returns the contents of the initial directory.</remarks>
public int MaxRecursionDepth { get; set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indicates the maximum directory depth to recurse while enumerating, when is set to <see langword="true"

Can we consider throw when someone tries to set a value to this property and RecurseSubdirectories is false? It would make more clear that MaxRecursionDepth is not honored in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, it would make sense if we did not have the default, but we have regardless of the value of RecurseSubdirectories.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can think of a few options to deal with this:

  • Keep MaxRecursionDepth as int.MaxValue and throw when set_MaxRecursionDepth is called if RecurseSubdirectories == false (what I already proposed in my previous comment).
  • Keep MaxRecursionDepth as default (zero) and change it to int.MaxValue when RecurseSubdirectories is set to true. Then we can throw when someone tries to call set_MaxRecursionDepth if RecurseSubdirectories == false (I feel that is not very intuitive that a property changes if another one does).
  • Do nothing and expect that the users have read the documentation so they can know that MaxRecursionDepth doesn't work on its own and that is only honored when RecurseSubdirectories == true (what we already have).

cc @carlossanlop @adamsitnik

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer third case. The API allows flipping RecurseSubdirectories on the fly so we would have very complex check code. Recursion is common scenario, checking depth is temporary one. If users turn off recursion and set MaxRecursionDepth to a value, they will not get a destructive result - again, there is always a default value, which also implies that there is nothing destructive there.

Copy link
Member

@jozkee jozkee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iSazonov It seems that the tests need to be updated based on the negative check you just introduced.
PS: I pushed to your branch in order to fix the build issues in CI.

{
public class EnumerationOptions
{
private int _maxRecursionDepth = DefaultMaxRecursionDepth;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency can you please set this field in the ctor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it is already in ctor :-)

Dup initialization removed.

@iSazonov
Copy link
Contributor Author

@iSazonov It seems that the tests need to be updated based on the negative check you just introduced.
PS: I pushed to your branch in order to fix the build issues in CI.

I fixed the test.
Make sense to move the negative test to new file for EnumerationOptions class?

@jozkee
Copy link
Member

jozkee commented Mar 30, 2021

@iSazonov not to a new file but to a separate unit test. Can you please do that?

Copy link
Member

@jozkee jozkee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks good to me, with the exception of a few nits. Only pending item is splitting the negative scenario to a separate test.

{
public unsafe abstract partial class FileSystemEnumerator<TResult> : CriticalFinalizerObject, IEnumerator<TResult>
{
private int _remainingDepth;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we rename this as _remainingRecursionDepth?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// </summary>
internal static EnumerationOptions Default { get; } = new EnumerationOptions();

internal const int DefaultMaxRecursionDepth = int.MaxValue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please move this constant to the beginning of the class (line 15)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +42 to +43

_remainingDepth = _options.MaxRecursionDepth;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: whitespace.

Suggested change
_remainingDepth = _options.MaxRecursionDepth;
_remainingDepth = _options.MaxRecursionDepth;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@jozkee jozkee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that CI passes this LGTM, thanks!

@jozkee jozkee merged commit cee9df8 into dotnet:main Mar 31, 2021
@iSazonov
Copy link
Contributor Author

@jozkee Thanks for review! Could you please explain what Preview will get the API?

@iSazonov iSazonov deleted the add-depth-tracking branch March 31, 2021 08:24
@jozkee
Copy link
Member

jozkee commented Mar 31, 2021

@iSazonov it will be available in Preview 4.

@ghost ghost locked as resolved and limited conversation to collaborators Apr 30, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhance FileSystemEnumerable to track depth recursion

5 participants