Update QC suite#117
Conversation
|
What do we want the QC checks to return? In the original code there are a number of states in the code: QC checks generally return 0 or 1. This could be a boolean as you suggest in the code comments, but we would need to handle the case where a QC flag is not set. The bayesian buddy check is also different (I think) as it returns a value between 0 and 9, but this could be converted to a pass/fail by choosing a threshold. |
|
My first approach was:
But we could keep the original number as well. I think,, for the first step we could keep it as it is. |
|
I'll re-write the format of the docstrings to something like this. Could you please add a docstring to every new function. I'll implement this in the documentation's API later. |
No problem. That's numpy-style docstrings, right?
OK. I will stick to the current scheme which is 0=pass 1=fail for each test. |
|
As I go through, I'm just converting every function for the time being. There are some questions in the comments about whether some functions are needed because they are already done while mapping to the CDM. We can deal with those later, if they are unnecessary. |
|
Some of the QC checks set multiple flags. For example, do_base_mat_qc does four separate checks:
Each of these should be a separate function, which is how I will write it. but those separate functions are very similar for SST, AT, DPT etc. so there will be lots of small functions. for example, the second check (checks if the air temperature anomaly is within acceptable bounds) takes the same form for a range of different variables: the value minus climatology is compared to the upper and lower bounds. |
|
There are some routines that have been copied in to the next_level_qc.py script that run a set of qc routines. These don't work the same as individual checks, In the original code, each qc check set a named qc flag, which was stored in the marine report. The routines like "perform_base_qc" called a group of QC checks to set lots of different flags. There's no single QC decision that comes out of the "perform_base_qc" routine. The idea is to set lots of flags which a user can then use to make their own selection. It also means it is easy to add extra QC checks because you can just add new flags. In the marine_qc.py script, the QC Filters are used to do this kind of selection. |
Right.
That good. |
Ok. First we need the base functions. We go step by step. |
Lots of functions are pretty good. I think, we don't need functions that set multiple flags but call the functions we need one after the other. That makes the entire code more readable. |
Maybe we should delete the "perform_base_qc" routine and replace it with several function calls one after the other. |
|
|
||
| def sun_position(time): | ||
| """Find position of sun in celestial sphere, assuming circular orbit (radians).""" | ||
| def sun_position(time: float) -> float: |
There was a problem hiding this comment.
Here are some "astronomical" functions. Maybe we could import that from other routines:
E.g.
https://github.com/Ouranosinc/xclim/blob/main/src/xclim/indices/helpers.py
There was a problem hiding this comment.
Maybe we can find some functions here: https://www.astropy.org/
|
@jjk-code-otter: In general, I think that in order to work together successfully on the code, it makes sense to push all changes (minor and major) as soon as possible. This makes it easier to discuss. Then, you can review my code and vice versa. |
If I'm working from your repository then I will commit and push more frequently. We can optimise the rate of pushes as we work. |
That's good. I wrote this comment before I recognized your first PR. |
|
In 460551a you can see how I imagined the base_qc_functions: You can see this in the skeleton db tests too: https://github.com/ludwiglierhammer/glamod-marine-processing/blob/next_level_qc/tests/test_qc_on_db.py#L33-L44 I think, we have to split
Do you think my thoughts are reasonable? If so could you please fill the functions, add docstrings and maybe give them new names? |
|
OK, I see. In fact, we have lots of QC flags, so not just I had foreseen something more like:
The idea was to make the system more flexible, in which case we need some way to separate the code from the choice of which QC tests to run. If we hard code which tests are to be used (and in which combination) then someone has to modify the code to run in different configurations. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #117 +/- ##
==========================================
- Coverage 16.97% 15.93% -1.05%
==========================================
Files 42 15 -27
Lines 7063 797 -6266
==========================================
- Hits 1199 127 -1072
+ Misses 5864 670 -5194
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
The function tests are running 🎉. And you pushed the code coverage from 16 to 24% The next step is to apply this on a Databundle. |
Is there a target for coverage? |
No, but the more the better. I hope that the badge will turn from red to green at some point. |
Next level qc tidy up
for more information, see https://pre-commit.ci
|
@jjk-code-otter: I'll tidy up this PR (remove all QC-related stuff) and merge it. Then we can simply import |
There was a problem hiding this comment.
@jjk-code-otter: Do you think this is enough information in the CHANGELOG?
There was a problem hiding this comment.
Is it a good idea to mention that the old marine_qc_scripts are now effectively part of the standard processing levels (if they are)?
|
@jjk-code-otter: I would merge this PR. Do you agree or do you have any further suggestions/comments? |
I don't have any objections (but see my small comment about CHANGE.rst) |
|
@jjk-code-otter: We are finally getting closer to success 🎉 |
This PR is based on #108.
Hi @jjk-code-otter, I started to implement a new QC structure:
next_level_qc.py: This is the main QC routine (see routines frommodules/Extended_IMMA_sb.py)test_qc_functions.py: skeleton tests for QC functionstest_qc_on_db.py: skeleton tests for QC functions applied on aDataBundleYou can fork this branch and create a new PR for this:
next_level_qc.pytest_qc_functions.pyIn the meantime, I try to adjust the QC scripts (
qc_suite/scripts).If we are happy with the row-by-row QC, we can proceed to the next step.
Tasks
qc_suitetoobs_suite.level1aobs_suite.level1a