Skip to content

Fix parsing of large arrays#182

Merged
rupertford merged 7 commits intostfc:masterfrom
pelson:fix_array_assignment
Mar 29, 2019
Merged

Fix parsing of large arrays#182
rupertford merged 7 commits intostfc:masterfrom
pelson:fix_array_assignment

Conversation

@pelson
Copy link

@pelson pelson commented Mar 1, 2019

Closes #167

The order defined by R701 wasn't honoured in subclass_names, hence fparser.two was trying to recursively parse the contents of the brackets before it could figure out that it should try treating the expression as an array assignment.

Happy to add a test if desired. It would look like adding a large-ish source file with 400+ reals in an array. I figured that the order of the subclass_names shouldn't now change, so was 50/50 as to whether a test was really needed.

@pelson pelson mentioned this pull request Mar 1, 2019
@codecov-io
Copy link

codecov-io commented Mar 1, 2019

Codecov Report

Merging #182 into master will increase coverage by 0.6%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #182     +/-   ##
=========================================
+ Coverage   89.16%   89.76%   +0.6%     
=========================================
  Files          33       34      +1     
  Lines       11728    11842    +114     
=========================================
+ Hits        10457    10630    +173     
+ Misses       1271     1212     -59
Impacted Files Coverage Δ
src/fparser/two/Fortran2003.py 92.19% <ø> (+0.02%) ⬆️
src/fparser/common/sourceinfo.py 100% <0%> (ø) ⬆️
src/fparser/common/tests/test_utils.py 100% <0%> (ø) ⬆️
src/fparser/scripts/fparser2.py 100% <0%> (ø) ⬆️
src/fparser/one/tests/test_scripts.py 100% <0%> (ø)
src/fparser/two/utils.py 92.24% <0%> (+0.01%) ⬆️
src/fparser/common/readfortran.py 91.52% <0%> (+0.06%) ⬆️
src/fparser/common/base_classes.py 68.98% <0%> (+0.27%) ⬆️
src/fparser/common/utils.py 78.63% <0%> (+2.9%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e279d5d...e62b3cd. Read the comment docs.

Copy link
Collaborator

@rupertford rupertford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simple change that addresses an existing issue that manifests itself in UM code with very long data statements. As @pelson states, the specified order was inconsistent with the specification, although I must admit I am unsure whether the order in the specification indicates a suggested matching order.

We have a PR rule that any class that is changed needs to be updated and appropriate tests added. So could you please do this for the Primary class. For this class it involves restructuring the docstring (see other refactored classes) and adding a new test file with appropriate tests for this rule in tests/fortran2003. Again please see other tests in this directory to see examples. You don't need to check the recursion error, just have a simple test for each of the options and perhaps one error check for invalid input.

FYI, you can find PR rules that are applied in the wiki.

@rupertford rupertford added the reviewed with actions PR has been reviewed and is back with developer label Mar 10, 2019
@pelson
Copy link
Author

pelson commented Mar 11, 2019

although I must admit I am unsure whether the order in the specification indicates a suggested matching order

The answer is that the order of subclass_names is important. https://github.com/stfc/fparser/blob/0.0.8/src/fparser/two/utils.py#L437 iterates through them, and only increments the iteration if there is no match with the previous type https://github.com/stfc/fparser/blob/0.0.8/src/fparser/two/utils.py#L455-L459. The Parenthesis type still causes a recursion error, its just we now never get there because the Array constructor type is matched first.

You don't need to check the recursion error, just have a simple test for each of the options and perhaps one error check for invalid input.

Turns out to have been quite challenging... in a isolated context it isn't actually possible to check all of the subclass_types of Primary - for example, it isn't possible to distinguish a Person (12, 'Jones') structure-constructor with a F( X, Y ) function-reference. There is quite some detail that I'm yet to figure out regarding Part_Refs and how these things are resolved. I've also found it somewhat challenging to find out how things like the Type_Param_Name type is actually defined... 🔎

For this class it involves restructuring the docstring (see other refactored classes)

It wasn't immediately obvious what kind of refactoring was needed wrt. docstrings. I've taken the example of the Program type (https://github.com/stfc/fparser/blob/0.0.8/src/fparser/two/Fortran2003.py#L188-L192) as it appears to have a separate test module (https://github.com/stfc/fparser/blob/0.0.8/src/fparser/two/tests/fortran2003/test_program_r201.py)

@rupertford
Copy link
Collaborator

rupertford commented Mar 11, 2019

The answer is that the order of subclass_names is important. https://github.com/stfc/fparser/blob/0.0.8/src/fparser/two/utils.py#L437 iterates through them, and only increments the iteration if there is no match with the previous type https://github.com/stfc/fparser/blob/0.0.8/src/fparser/two/utils.py#L455-L459. The Parenthesis type still causes a recursion error, its just we now never get there because the Array constructor type is matched first.

Thanks @pelson. Yes, I agree that the order that the rules are tested matters in fparser (at least from a tree pruning perspective). What I was questioning was whether there was an inbuilt ordering of the rules in the Fortran specification. I've looked through the (Fortran2003) spec in more detail and can't find any text that states that a earlier clause in an "or" rule should be computed before a later clause. My reading of this is that, from the point of view of the spec, the ordering is undefined i.e. it does not matter which order the rules are tested.

@rupertford
Copy link
Collaborator

Turns out to have been quite challenging... in a isolated context it isn't actually possible to check all of the subclass_types of Primary - for example, it isn't possible to distinguish a Person (12, 'Jones') structure-constructor with a F( X, Y ) function-reference. There is quite some detail that I'm yet to figure out regarding Part_Refs and how these things are resolved. I've also found it somewhat challenging to find out how things like the Type_Param_Name type is actually defined... 🔎

@pelson, sorry if this is complex. There are currently many rules and sub-rules in fparser2 that have not been looked at, so ambiguities, issues and errors will exist. It is not necessary to fix all of these in a PR, just to document any problems with failing tests (xfail) and/or developer documentation and/or new issues.

@rupertford
Copy link
Collaborator

It wasn't immediately obvious what kind of refactoring was needed wrt. docstrings. I've taken the example of the Program type (https://github.com/stfc/fparser/blob/0.0.8/src/fparser/two/Fortran2003.py#L188-L192) as it appears to have a separate test module (https://github.com/stfc/fparser/blob/0.0.8/src/fparser/two/tests/fortran2003/test_program_r201.py)

Yes @pelson, that is the right thing to do. The docstrings in the new tests are meant to more closely follow the rules in the specification document.

The tests in tests/fortran2XXX/test_yyy.py are meant to (gradually) replace the tests in tests/test_fortran2003.py. As new fparser classes are modified and fully tested there should be a migration from tests in tests/test_fortran2003.py to tests/fortran2XXX/test_YYY.py.

@pelson
Copy link
Author

pelson commented Mar 12, 2019

I've looked through the (Fortran2003) spec in more detail and can't find any text that states that a earlier clause in an "or" rule should be computed before a later clause. My reading of this is that, from the point of view of the spec, the ordering is undefined i.e. it does not matter which order the rules are tested.

Seems like a reasonable interpretation which is supported loosely by the statement:

§1.4.4 (2) : "The rules on statement ordering are defined rigorously in the definition of program-unit (R202). Expression hierarchy is described rigorously in the definition of expr (R722)".

This does then beg the question of whether the correct fix here would be to make the Parenthesis type more robust.

@pelson
Copy link
Author

pelson commented Mar 12, 2019

@rupertford - given the information you've provided, I think the changes I pushed yesterday are ready for re-review. Thanks!

Copy link
Collaborator

@rupertford rupertford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see suggested changes inline. Once you've addressed these could you please change the PR status label to ready for review. Thanks.

@pelson
Copy link
Author

pelson commented Mar 21, 2019

Ok, I've iterated some more on this and think it is ready for further review. I haven't yet opened issues for the xfails - I'd like to get confirmation from you that they are desirable tests before I do.

@rupertford rupertford added ready for review and removed reviewed with actions PR has been reviewed and is back with developer labels Mar 21, 2019
Copy link
Collaborator

@rupertford rupertford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Thanks for adding the tests, I know it is a lot to do for a small change but it makes for much more robust code and an understanding where the code is not robust!

I just have a minor request for the xfailing tests. Once that is done I'll perform a final proper check, rather than checking by inspection. I think I'll add your comment about constraints to our developer documentation as well so that it is captured properly.

@rupertford rupertford added reviewed with actions PR has been reviewed and is back with developer and removed ready for review labels Mar 21, 2019
@pelson pelson added ready for review and removed reviewed with actions PR has been reviewed and is back with developer labels Mar 25, 2019
@pelson pelson assigned rupertford and unassigned pelson Mar 25, 2019
Copy link
Collaborator

@rupertford rupertford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only found 3 minor issue and one small issue associated with searching for subclasses. Other than this the pr conforms to all checks so it can go on master once these issues are addressed.

@rupertford rupertford added reviewed with actions PR has been reviewed and is back with developer and removed under review labels Mar 27, 2019
@rupertford
Copy link
Collaborator

Just these last few issues then it can go on master. Back to @pelson.

@pelson pelson added ready for review and removed reviewed with actions PR has been reviewed and is back with developer labels Mar 28, 2019
@pelson pelson requested a review from rupertford March 28, 2019 10:52
Copy link
Collaborator

@rupertford rupertford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a minor error in test_primary_r701.py line 92.
This file also fails pycodestyle with a spurious space at line 91.
I've also just noticed that some function names are capitalised which pylint is not happy about.

Rather than bouncing this back I will accept and make the changes myself. However this must be done on master after the merge (with the update to changelog) as this merge is from a remote repo.

@rupertford rupertford merged commit fee2240 into stfc:master Mar 29, 2019
rupertford added a commit that referenced this pull request Mar 29, 2019
@pelson pelson deleted the fix_array_assignment branch April 1, 2019 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants