Add tracking a recursion depth in System.IO.FileSystem#48148
Add tracking a recursion depth in System.IO.FileSystem#48148jozkee merged 11 commits intodotnet:mainfrom
Conversation
|
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
carlossanlop
left a comment
There was a problem hiding this comment.
Thanks for your PR, @iSazonov! I left some comments.
| /// <summary> | ||
| /// Stop recursion if current depth exceeds the value. | ||
| /// Default is int.MaxValue. | ||
| /// </summary> |
There was a problem hiding this comment.
Since we are working on making triple slash comments the source of truth for documentation, we need to give a proper style and language review here before commiting, so it's ready to get ported to MS Docs. So below is my suggestion.
cc @gewarren in case she thinks my suggestion can be improved:
| /// <summary> | |
| /// Stop recursion if current depth exceeds the value. | |
| /// Default is int.MaxValue. | |
| /// </summary> | |
| /// <summary>Gets or sets a value that indicates the maximum directory depth to recurse while enumerating, when <see cref="RecurseSubdirectories" /> is set to <see langword="true" />.</summary> | |
| /// <value>A number that represents the maximum directory depth to recurse while enumerating. The default value is <see cref="int.MaxValue" />.</value> | |
| /// <remarks>If <see cref="MaxRecursionDepth" /> is set to a negative number, the default value <see cref="int.MaxValue" /> is used. | |
| /// If <see cref="MaxRecursionDepth" /> is set to zero, returns the contents of the initial directory.</remarks> |
There was a problem hiding this comment.
We could add in remarks about zero value too:
/// <remarks>If <see cref="MaxRecursionDepth" /> is set to a negative number, the default value <see cref="int.MaxValue" /> is used. If <see cref="MaxRecursionDepth" /> is set to zero, returns the contents of the initial directory.</remarks>
Waiting @gewarren ...
There was a problem hiding this comment.
For sharing experience. In PowerShell repo we do the work async to exclude PR merge delaying. We create and reference in PR description new doc issue (or new PR in PowerShell-Docs repo) so that add new docs after functional PR merged.
This approach resolve follow issues:
- English is not native for the contributor
- The contributor can not know all rules of creating docs
There was a problem hiding this comment.
I'd change the second remark to something like:
If <see cref="MaxRecursionDepth" /> is set to zero, enumeration returns the contents of the initial directory.
| _options = options ?? EnumerationOptions.Default; | ||
|
|
||
| _remainingDepth = _options.MaxRecursionDepth < 0 ? EnumerationOptions.Default.MaxRecursionDepth : _options.MaxRecursionDepth; | ||
|
|
There was a problem hiding this comment.
@adamsitnik do you know if it's costly to generate a default EnumerationOptions instance just to retrieve the default value of MaxRecursionDepth, as opposed to simply using the underlying default value?
@iSazonov I don't expect we will ever change the default value of MaxRecursionDepth, mainly because it's type (int) would never be changed.
I also noticed that line 41 is already consuming EnumerationOptions.Default, so there are two things we could do here:
There was a problem hiding this comment.
A) Save EnumerationOptions.Default to a variable and then consume it in both places:
| _options = options ?? EnumerationOptions.Default; | |
| _remainingDepth = _options.MaxRecursionDepth < 0 ? EnumerationOptions.Default.MaxRecursionDepth : _options.MaxRecursionDepth; | |
| var defaultOptions = EnumerationOptions.Default; | |
| _options = options ?? defaultOptions; | |
| _remainingDepth = _options.MaxRecursionDepth < 0 ? defaultOptions.MaxRecursionDepth : _options.MaxRecursionDepth; | |
There was a problem hiding this comment.
or B) Use the default value directly:
| _options = options ?? EnumerationOptions.Default; | |
| _remainingDepth = _options.MaxRecursionDepth < 0 ? EnumerationOptions.Default.MaxRecursionDepth : _options.MaxRecursionDepth; | |
| _options = options ?? EnumerationOptions.Default; | |
| _remainingDepth = _options.MaxRecursionDepth < 0 ? int.MaxValue : _options.MaxRecursionDepth; | |
There was a problem hiding this comment.
@iSazonov I don't expect we will ever change the default value of
MaxRecursionDepth, mainly because it's type (int) would never be changed.
I believe it makes the code more readable if we use a sort of named constants.
Also I guess/hope the runtime will optimize the case.
Waiting @adamsitnik ...
There was a problem hiding this comment.
+1 to use a named constant (perhaps called DefaultMaxRecursionDepth) inside EnumerationOptions and use it where its necessary.
| if (isDirectory && !isSpecialDirectory) | ||
| { | ||
| if (_options.RecurseSubdirectories && ShouldRecurseIntoEntry(ref entry)) | ||
| if (_options.RecurseSubdirectories && _remainingDepth > 0 && ShouldRecurseIntoEntry(ref entry)) |
There was a problem hiding this comment.
I know from experience that any changes in FileSystemEnumerator.MoveNext could cause ugly performance regressions.
Can you please run the enumeration performance benchmarks, before and after your changes, to see if there's an impact?
In case you've not run benchmarks before, here are the nstructions: dotnet/performance/docs/benchmarking-workflow-dotnet-runtime.md#preventing-regressions
These are the relevant benchmarks: dotnet/performance/src/benchmarks/micro/libraries/System.IO.FileSystem/Perf.Directory.cs.
I am particularly interested in the results for RecursiveDepthData() and EnumerateFiles().
There was a problem hiding this comment.
I will try to run these tests, but it may take a long time since I have never done this.
Perhaps @adamsitnik will get ahead of me. :-)
There was a problem hiding this comment.
After my first attempt no regression was found. I am not sure I did everything is correct.
There was a problem hiding this comment.
Can you please share the results here?
There was a problem hiding this comment.
I will do but I am not still sure I make right steps to make the work. Docs are not so clear about iterations. If an expert like @adamsitnik published a short (5 minutes) video with demo steps how clean start, make a base and iterate with changes it would be great help (because each iteration takes some hours on my notebook).
Currently I can share only "No differences found between the benchmark results with threshold 1%.". Do you want to get an archive with result files?
There was a problem hiding this comment.
There is very good documentation in the dotnet/performance repo. You should end up with a file containing markdown tables that you can paste directly into your PR here.
There was a problem hiding this comment.
Also it should take no more than a few minutes to run the relevant benchmarks. You can just filter to the file enumeration ones.
There was a problem hiding this comment.
I extracted your zip file and compared the results using ResultComparer from the performance repo:
D:\performance\src\tools\ResultsComparer> dotnet run --base "C:\Users\carlos\Desktop\New folder\before" --diff "C:\Users\carlos\Desktop\New folder\after" --threshold 0.00001%
summary:
better: 1, geomean: 1.016
total diff: 1
No Slower results for the provided threshold = 0.00001% and noise filter = 0.3ns.
| Faster | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| --------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| System.IO.Tests.Perf_Directory.EnumerateFiles | 1.02 | 8712455.56 | 8576044.64 | |
@iSazonov can you run a benchmark test for the other enumeration-related method RecursiveDepthData()? Or even better: just run all the methods in that perf class.
There was a problem hiding this comment.
@carlossanlop I don't see regressions but I can not trust myself and you can not too because I never run perf tests before (I am not sure I do all right). In any case we need a confirmation from more experienced developer.
There was a problem hiding this comment.
All results for System.IO.Tests.Perf_Directory.*
2.zip
…tions.cs Co-authored-by: Carlos Sanchez <1175054+carlossanlop@users.noreply.github.com>
|
@adamsitnik Could you please review? |
src/libraries/System.IO.FileSystem/src/System/IO/EnumerationOptions.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/tests/System.IO.FileSystem.Tests.csproj
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEnumerator.Unix.cs
Show resolved
Hide resolved
| _rootDirectory = Path.TrimEndingDirectorySeparator(path); | ||
| _options = options ?? EnumerationOptions.Default; | ||
|
|
||
| _remainingDepth = _options.MaxRecursionDepth < 0 ? EnumerationOptions.Default.MaxRecursionDepth : _options.MaxRecursionDepth; |
There was a problem hiding this comment.
I wonder if we should throw for negatives rather than take it as "use max value".
There was a problem hiding this comment.
It seems strange to me that we define it as Int and then throw if negative. If we only want non-negative values then we should either be UInt or just use the default for them.
But I don't know that is a best practice in Runtime.
There was a problem hiding this comment.
One reason for using signed over unsigned is CLS-compliance. Another one is that signed integers are already the standard across many .NET APIs e.g:
FileStream.SetLength(long value)
List<T>.Capacity
JsonSerializerOptions.MaxDepth
Notice that the APIs I linked guard against negatives. We should do the same here.
There was a problem hiding this comment.
Thanks for samples! I added a value check in MaxRecursionDepth.
Done.
src/libraries/System.IO.FileSystem/tests/Enumeration/DepthTests.cs
Outdated
Show resolved
Hide resolved
| /// <summary>Gets or sets a value that indicates the maximum directory depth to recurse while enumerating, when <see cref="RecurseSubdirectories" /> is set to <see langword="true" />.</summary> | ||
| /// <value>A number that represents the maximum directory depth to recurse while enumerating. The default value is <see cref="int.MaxValue" />.</value> | ||
| /// <remarks>If <see cref="MaxRecursionDepth" /> is set to a negative number, the default value <see cref="int.MaxValue" /> is used. | ||
| /// If <see cref="MaxRecursionDepth" /> is set to zero, enumeration returns the contents of the initial directory.</remarks> | ||
| public int MaxRecursionDepth { get; set; } |
There was a problem hiding this comment.
indicates the maximum directory depth to recurse while enumerating, when is set to <see langword="true"
Can we consider throw when someone tries to set a value to this property and RecurseSubdirectories is false? It would make more clear that MaxRecursionDepth is not honored in that case.
There was a problem hiding this comment.
In my opinion, it would make sense if we did not have the default, but we have regardless of the value of RecurseSubdirectories.
There was a problem hiding this comment.
I can think of a few options to deal with this:
- Keep MaxRecursionDepth as int.MaxValue and throw when set_MaxRecursionDepth is called if RecurseSubdirectories == false (what I already proposed in my previous comment).
- Keep MaxRecursionDepth as default (zero) and change it to int.MaxValue when RecurseSubdirectories is set to true. Then we can throw when someone tries to call set_MaxRecursionDepth if RecurseSubdirectories == false (I feel that is not very intuitive that a property changes if another one does).
- Do nothing and expect that the users have read the documentation so they can know that MaxRecursionDepth doesn't work on its own and that is only honored when RecurseSubdirectories == true (what we already have).
There was a problem hiding this comment.
I prefer third case. The API allows flipping RecurseSubdirectories on the fly so we would have very complex check code. Recursion is common scenario, checking depth is temporary one. If users turn off recursion and set MaxRecursionDepth to a value, they will not get a destructive result - again, there is always a default value, which also implies that there is nothing destructive there.
| { | ||
| public class EnumerationOptions | ||
| { | ||
| private int _maxRecursionDepth = DefaultMaxRecursionDepth; |
There was a problem hiding this comment.
For consistency can you please set this field in the ctor.
There was a problem hiding this comment.
Ah, it is already in ctor :-)
Dup initialization removed.
I fixed the test. |
|
@iSazonov not to a new file but to a separate unit test. Can you please do that? |
jozkee
left a comment
There was a problem hiding this comment.
Implementation looks good to me, with the exception of a few nits. Only pending item is splitting the negative scenario to a separate test.
| { | ||
| public unsafe abstract partial class FileSystemEnumerator<TResult> : CriticalFinalizerObject, IEnumerator<TResult> | ||
| { | ||
| private int _remainingDepth; |
There was a problem hiding this comment.
Could we rename this as _remainingRecursionDepth?
| /// </summary> | ||
| internal static EnumerationOptions Default { get; } = new EnumerationOptions(); | ||
|
|
||
| internal const int DefaultMaxRecursionDepth = int.MaxValue; |
There was a problem hiding this comment.
Can you please move this constant to the beginning of the class (line 15)?
|
|
||
| _remainingDepth = _options.MaxRecursionDepth; |
There was a problem hiding this comment.
nit: whitespace.
| _remainingDepth = _options.MaxRecursionDepth; | |
| _remainingDepth = _options.MaxRecursionDepth; |
jozkee
left a comment
There was a problem hiding this comment.
Assuming that CI passes this LGTM, thanks!
|
@jozkee Thanks for review! Could you please explain what Preview will get the API? |
|
@iSazonov it will be available in Preview 4. |
Close #43748
/cc @carlossanlop