Update QC suite by ludwiglierhammer · Pull Request #117 · glamod/glamod-marine-processing

ludwiglierhammer · 2025-04-17T10:01:13Z

This PR is based on #108.

Hi @jjk-code-otter, I started to implement a new QC structure:

next_level_qc.py: This is the main QC routine (see routines from modules/Extended_IMMA_sb.py)
test_qc_functions.py: skeleton tests for QC functions
test_qc_on_db.py: skeleton tests for QC functions applied on a DataBundle

You can fork this branch and create a new PR for this:

fill in the functions in next_level_qc.py
fill in the skeleton tests in test_qc_functions.py

In the meantime, I will try to adjust the QC scripts (qc_suite/scripts).

If we are happy with the row-by-row QC, we can proceed to the next step.

jjk-code-otter · 2025-04-17T11:11:25Z

What do we want the QC checks to return?

In the original code, a QC flag can take one of several states:
0 = pass
1 = fail
9 = QC flag not set (neither pass nor fail)

QC checks generally return 0 or 1.

This could be a boolean as you suggest in the code comments, but we would need to handle the case where a QC flag is not set.

The Bayesian buddy check is also different (I think), as it returns a value between 0 and 9, but this could be converted to a pass/fail by choosing a threshold.
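The thresholding idea mentioned above can be sketched as follows. This is only an illustration, not the project's implementation: the function name `graded_flag_to_pass_fail`, the default threshold of 4, and the assumption that larger graded values mean a more suspect observation are all made up for the example. Only the 0 = pass / 1 = fail convention comes from the discussion.

```python
PASS = 0
FAIL = 1


def graded_flag_to_pass_fail(graded_value: int, threshold: int = 4) -> int:
    """Collapse a graded QC value (0 to 9) into a pass/fail flag.

    Assumption for this sketch: larger graded values indicate a more
    suspect observation, so values above the threshold fail.
    """
    if not 0 <= graded_value <= 9:
        raise ValueError("graded_value must be between 0 and 9")
    return FAIL if graded_value > threshold else PASS


# Every possible graded value mapped with the default threshold.
flags = [graded_flag_to_pass_fail(v) for v in range(10)]
```

With the default threshold, graded values 0 to 4 map to pass and 5 to 9 map to fail; a different threshold only moves that boundary.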

ludwiglierhammer · 2025-04-17T11:20:04Z

My first approach was:

True = passed
False = failed
None = not checked

But we could keep the original number as well.

I think, for the first step we could keep it as it is.
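If both representations are needed side by side, the two schemes can be translated into each other. The following is a minimal sketch under that assumption; the helper names `bool_to_flag` and `flag_to_bool` are hypothetical and do not exist in the code base.

```python
from typing import Optional

PASSED = 0    # original scheme: 0 = pass
FAILED = 1    # original scheme: 1 = fail
UNTESTED = 9  # original scheme: 9 = QC flag not set


def bool_to_flag(result: Optional[bool]) -> int:
    """Map True/False/None onto the numeric 0/1/9 scheme."""
    if result is None:
        return UNTESTED
    return PASSED if result else FAILED


def flag_to_bool(flag: int) -> Optional[bool]:
    """Map the numeric 0/1/9 scheme back onto True/False/None."""
    mapping = {PASSED: True, FAILED: False, UNTESTED: None}
    if flag not in mapping:
        raise ValueError(f"unknown QC flag: {flag}")
    return mapping[flag]
```

Because the mapping is one-to-one, a round trip through both helpers returns the starting value, so either scheme could be stored without losing the "not checked" state.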

ludwiglierhammer · 2025-04-17T11:22:53Z

I'll re-write the format of the docstrings to something like this.

Could you please add a docstring to every new function. I'll implement this in the documentation's API later.

jjk-code-otter · 2025-04-17T12:47:19Z

> Could you please add a docstring to every new function. I'll implement this in the documentation's API later.

No problem. That's numpy-style docstrings, right?

> I think, for the first step we could keep it as it is.

OK. I will stick to the current scheme which is 0=pass 1=fail for each test.
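For reference, a numpy-style docstring on a check that follows the 0 = pass / 1 = fail scheme could look like the sketch below. The function `missing_value_check` is a hypothetical example written for this illustration, not one of the functions in next_level_qc.py.

```python
import math


def missing_value_check(value):
    """Check whether a value is present.

    Parameters
    ----------
    value : float or None
        The value to be tested.

    Returns
    -------
    int
        0 if the value is present (pass), 1 if it is None or NaN (fail).
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return 1
    return 0
```

The `Parameters` and `Returns` sections with underlined headings are the parts that documentation tools look for when rendering numpy-style docstrings.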

jjk-code-otter · 2025-04-17T13:09:50Z

As I go through, I'm just converting every function for the time being. There are some questions in the comments about whether some functions are needed because they are already done while mapping to the CDM. We can deal with those later, if they are unnecessary.

jjk-code-otter · 2025-04-17T15:03:00Z

Some of the QC checks set multiple flags. For example, do_base_mat_qc does four separate checks:

checks if air temperature is missing
checks if the air temperature anomaly is within acceptable bounds
checks if a climatology value is present
checks that air temperature is within a set range (hard limits).

Each of these should be a separate function, which is how I will write it. But those separate functions are very similar for SST, AT, DPT etc., so there will be lots of small functions.

For example, the second check (checks if the air temperature anomaly is within acceptable bounds) takes the same form for a range of different variables: the value minus climatology is compared to the upper and lower bounds.
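One way to keep the number of near-identical functions down is to write each check once, independent of the variable, and pass the variable-specific bounds in. The sketch below shows the four checks in that form. All function names, the treatment of missing input as a fail, and the numeric limits are assumptions for illustration; only the 0 = pass / 1 = fail convention and the "value minus climatology" comparison come from the discussion.

```python
import math

PASS, FAIL = 0, 1


def _is_missing(x):
    """True for None or NaN."""
    return x is None or math.isnan(x)


def value_check(value):
    """Fail if the value is missing."""
    return FAIL if _is_missing(value) else PASS


def climatology_check(climatology):
    """Fail if no climatology value is present."""
    return value_check(climatology)


def anomaly_check(value, climatology, lower, upper):
    """Fail if value minus climatology lies outside [lower, upper]."""
    if _is_missing(value) or _is_missing(climatology):
        return FAIL  # assumption: missing input counts as a fail
    return PASS if lower <= value - climatology <= upper else FAIL


def hard_limit_check(value, minimum, maximum):
    """Fail if the value lies outside the hard limits."""
    if _is_missing(value):
        return FAIL  # assumption: missing input counts as a fail
    return PASS if minimum <= value <= maximum else FAIL


# The same functions serve any variable; only the bounds change.
at, at_climatology = 15.2, 14.0  # illustrative air temperature values
flags = {
    "at_missing": value_check(at),
    "at_no_climatology": climatology_check(at_climatology),
    "at_anomaly": anomaly_check(at, at_climatology, -10.0, 10.0),
    "at_hard_limit": hard_limit_check(at, -80.0, 65.0),
}
```

Calling the checks one after the other, as in the `flags` dictionary, sets one named flag per check without needing a wrapper that sets several flags at once.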

jjk-code-otter · 2025-04-18T13:30:57Z

There are some routines that have been copied into the next_level_qc.py script that run a set of QC routines. These don't work the same as individual checks.

In the original code, each qc check set a named qc flag, which was stored in the marine report. The routines like "perform_base_qc" called a group of QC checks to set lots of different flags. There's no single QC decision that comes out of the "perform_base_qc" routine.

The idea is to set lots of flags which a user can then use to make their own selection. It also means it is easy to add extra QC checks because you can just add new flags. In the marine_qc.py script, the QC Filters are used to do this kind of selection.
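The "set many flags, select later" idea can be illustrated with a small sketch. This is not the QC Filters implementation from marine_qc.py; the `select_reports` helper and the example reports are hypothetical, and the flag names `sst_freeze` and `position_check` are only borrowed from this thread as examples.

```python
# Each report carries several independent QC flags (0 = pass, 1 = fail, 9 = not set).
reports = [
    {"id": "A", "sst_freeze": 0, "position_check": 0},
    {"id": "B", "sst_freeze": 1, "position_check": 0},
    {"id": "C", "sst_freeze": 0, "position_check": 9},
]


def select_reports(reports, required):
    """Keep reports whose flags match every (flag name, wanted value) pair."""
    return [
        report
        for report in reports
        if all(report.get(name) == wanted for name, wanted in required.items())
    ]


# Two users can make different selections from the same set of flags.
strict = select_reports(reports, {"sst_freeze": 0, "position_check": 0})
lenient = select_reports(reports, {"sst_freeze": 0})
```

Adding a new QC check then only means adding a new key to each report; existing selections keep working because they only look at the flags they name.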

ludwiglierhammer · 2025-04-22T10:33:15Z

> Could you please add a docstring to every new function. I'll implement this in the documentation's API later.

> No problem. That's numpy-style docstrings, right?

Right.

> I think, for the first step we could keep it as it is.

> OK. I will stick to the current scheme which is 0=pass 1=fail for each test.

That's good.

ludwiglierhammer · 2025-04-22T10:34:13Z

> As I go through, I'm just converting every function for the time being. There are some questions in the comments about whether some functions are needed because they are already done while mapping to the CDM. We can deal with those later, if they are unnecessary.

Ok. First we need the base functions. We go step by step.

ludwiglierhammer · 2025-04-22T10:35:56Z

> Some of the QC checks set multiple flags. For example, do_base_mat_qc does four separate checks:

> checks if air temperature is missing

> checks if the air temperature anomaly is within acceptable bounds

> checks if a climatology value is present

> checks that air temperature is within a set range (hard limits).

> Each of these should be a separate function, which is how I will write it. But those separate functions are very similar for SST, AT, DPT etc., so there will be lots of small functions.

> For example, the second check (checks if the air temperature anomaly is within acceptable bounds) takes the same form for a range of different variables: the value minus climatology is compared to the upper and lower bounds.

Having lots of small functions is fine. I think we don't need functions that set multiple flags; instead, we can call the functions we need one after the other. That makes the code more readable.

ludwiglierhammer · 2025-04-22T10:37:17Z

> There are some routines that have been copied into the next_level_qc.py script that run a set of QC routines. These don't work the same as individual checks.

> In the original code, each qc check set a named qc flag, which was stored in the marine report. The routines like "perform_base_qc" called a group of QC checks to set lots of different flags. There's no single QC decision that comes out of the "perform_base_qc" routine.

> The idea is to set lots of flags which a user can then use to make their own selection. It also means it is easy to add extra QC checks because you can just add new flags. In the marine_qc.py script, the QC Filters are used to do this kind of selection.

Maybe we should delete the "perform_base_qc" routine and replace it with several function calls one after the other.

ludwiglierhammer · 2025-04-22T10:40:35Z


-def sun_position(time):
-    """Find position of sun in celestial sphere, assuming circular orbit (radians)."""
+def sun_position(time: float) -> float:


Here are some "astronomical" functions. Maybe we could import them from other packages:

E.g.
https://github.com/Ouranosinc/xclim/blob/main/src/xclim/indices/helpers.py

Maybe we can find some functions here: https://www.astropy.org/

ludwiglierhammer · 2025-04-22T13:30:02Z

@jjk-code-otter: In general, I think that in order to work together successfully on the code, it makes sense to push all changes (minor and major) as soon as possible. This makes it easier to discuss. Then, you can review my code and vice versa.

jjk-code-otter · 2025-04-24T08:30:31Z

> @jjk-code-otter: In general, I think that in order to work together successfully on the code, it makes sense to push all changes (minor and major) as soon as possible. This makes it easier to discuss. Then, you can review my code and vice versa.

If I'm working from your repository then I will commit and push more frequently. We can optimise the rate of pushes as we work.

ludwiglierhammer · 2025-04-24T08:33:29Z

> @jjk-code-otter: In general, I think that in order to work together successfully on the code, it makes sense to push all changes (minor and major) as soon as possible. This makes it easier to discuss. Then, you can review my code and vice versa.

> If I'm working from your repository then I will commit and push more frequently. We can optimise the rate of pushes as we work.

That's good. I wrote this comment before I noticed your first PR.

ludwiglierhammer · 2025-04-24T08:56:11Z

In 460551a you can see how I imagined the base_qc_functions:

from cdm_reader_mapper import read_tables

db = read_tables(path_to_cdm_tables)

db[("header", "report_quality")] = db.apply(lambda row: do_base_qc_header(
    list_of_parameters,
    do_first_step=True,
    ...,
    do_last_step=True,
))

db[("observations-at", "quality_flag")] = db.apply(lambda row: do_base_qc_observation(
    list_of_parameters,
    do_first_step=True,
    ...,
    do_last_step=True,
))

You can see this in the skeleton db tests too: https://github.com/ludwiglierhammer/glamod-marine-processing/blob/next_level_qc/tests/test_qc_on_db.py#L33-L44

I think we have to split do_base_qc_header, since we have three quality flags in the header file:

report_quality
location_quality
report_time_quality

Do you think this approach is reasonable? If so, could you please fill in the functions, add docstrings and maybe give them new names?

jjk-code-otter · 2025-04-24T09:13:21Z

OK, I see. In fact, we have lots of QC flags, so not just report_quality, location_quality, and report_time_quality but things like sst_freeze, position_check, super_saturation etc.

I had foreseen something more like:

tests_to_run = load_config(config_filename)

def test_header_all(tests_to_run):
    db_header = read_tables(cache_dir, cdm_tables="header")
    for test in tests_to_run:
        db_header[test.name] = db_header.apply(
            lambda row: test.function(row, parameters={}),
            axis=1,
        )

load_config would take the config_filename and get the qc flag name and function and then load the named functions from the next_level_qc.py script.

The idea was to make the system more flexible, in which case we need some way to separate the code from the choice of which QC tests to run. If we hard code which tests are to be used (and in which combination) then someone has to modify the code to run in different configurations.
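A minimal, standard-library sketch of this config-driven approach is shown below. Everything in it is an assumption made for illustration: the JSON layout, the `QC_FUNCTIONS` registry, the two example checks, and the `time_check` flag name. Unlike the `load_config(config_filename)` described above, this version takes the config as text so that the example stays self-contained, and it loops over plain dictionaries where the real code would apply the functions to the rows of a table.

```python
import json

PASS, FAIL = 0, 1


def latitude_check(row, parameters):
    """Hypothetical check: latitude lies within the configured range."""
    lat = row.get("latitude")
    if lat is None:
        return FAIL
    return PASS if parameters["min"] <= lat <= parameters["max"] else FAIL


def hour_check(row, parameters):
    """Hypothetical check: hour of day lies in [0, 24)."""
    hour = row.get("hour")
    if hour is None:
        return FAIL
    return PASS if 0 <= hour < 24 else FAIL


# Registry linking the function names used in the config to real callables.
QC_FUNCTIONS = {"latitude_check": latitude_check, "hour_check": hour_check}


def load_config(config_text):
    """Return a list of (flag name, function, parameters) tuples."""
    entries = json.loads(config_text)["tests"]
    return [
        (entry["flag"], QC_FUNCTIONS[entry["function"]], entry.get("parameters", {}))
        for entry in entries
    ]


def run_tests(rows, tests_to_run):
    """Apply every configured test to every row and store one flag per test."""
    for row in rows:
        for flag_name, function, parameters in tests_to_run:
            row[flag_name] = function(row, parameters)
    return rows


CONFIG = """
{
  "tests": [
    {"flag": "position_check", "function": "latitude_check",
     "parameters": {"min": -90.0, "max": 90.0}},
    {"flag": "time_check", "function": "hour_check"}
  ]
}
"""

rows = run_tests(
    [{"latitude": 45.0, "hour": 12.0}, {"latitude": 123.0, "hour": 25.0}],
    load_config(CONFIG),
)
```

Changing which tests run, or with which parameters, then only requires editing the config; the code that applies the tests stays the same.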

codecov-commenter · 2025-04-24T09:31:47Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 15.93%. Comparing base (30eda72) to head (a23d96f).
⚠️ Report is 960 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #117      +/-   ##
==========================================
- Coverage   16.97%   15.93%   -1.05%     
==========================================
  Files          42       15      -27     
  Lines        7063      797    -6266     
==========================================
- Hits         1199      127    -1072     
+ Misses       5864      670    -5194

Flag	Coverage Δ
unittests	`15.93% <100.00%> (-1.05%)`	⬇️

ludwiglierhammer · 2025-04-24T09:32:49Z

The function tests are running 🎉.

And you pushed the code coverage from 16 to 24%
https://app.codecov.io/gh/glamod/glamod-marine-processing/tree/ludwiglierhammer%3Anext_level_qc

The next step is to apply this on a DataBundle.

jjk-code-otter · 2025-04-24T11:19:09Z

> And you pushed the code coverage from 16 to 24%

Is there a target for coverage?

ludwiglierhammer · 2025-04-24T11:45:45Z

> And you pushed the code coverage from 16 to 24%

> Is there a target for coverage?

No, but the more the better. I hope that the badge will turn from red to green at some point.

ludwiglierhammer · 2025-08-07T07:43:48Z

@jjk-code-otter: I'll tidy up this PR (remove all QC-related stuff) and merge it. Then we can simply import marine_qc instead. I'll adjust some scripts in another PR. ludwiglierhammer#43 should be merged directly into the main branch.

ludwiglierhammer · 2025-08-07T08:30:01Z

@jjk-code-otter: Do you think this is enough information in the CHANGELOG?

> Is it a good idea to mention that the old marine_qc_scripts are now effectively part of the standard processing levels (if they are)?

Thanks for the hint. I'll add this in another PR (maybe in #156 or in #140) since the old marine_qc_scripts are currently not part of the standard processing levels.

ludwiglierhammer · 2025-08-07T08:31:22Z

@jjk-code-otter: I would merge this PR. Do you agree or do you have any further suggestions/comments?

jjk-code-otter · 2025-08-07T08:41:51Z

> @jjk-code-otter: I would merge this PR. Do you agree or do you have any further suggestions/comments?

I don't have any objections (but see my small comment about CHANGE.rst)

ludwiglierhammer · 2025-08-07T09:05:10Z

@jjk-code-otter: We are finally getting closer to success 🎉

github-actions Bot added the qc_suite label Apr 17, 2025

ludwiglierhammer assigned ludwiglierhammer and jjk-code-otter Apr 17, 2025

ludwiglierhammer added this to GLAMOD Arrivals Server Feedback Apr 17, 2025

ludwiglierhammer linked an issue Apr 17, 2025 that may be closed by this pull request

Update quality control suite for GLAMOD Release 8.0 #108

Closed

22 tasks

ludwiglierhammer commented Apr 22, 2025

Comment thread tests/test_qc_functions.py Outdated

This was referenced Apr 22, 2025

adding tests for ICOADS_R3.0.0T data #77

Open

Next level qc ludwiglierhammer/glamod-marine-processing#1

Merged

ludwiglierhammer commented Apr 24, 2025

jjk-code-otter and others added 19 commits July 25, 2025 12:13

removing month match

1cf32d1

removing month match

503735e

Removing yesterday. We're done with it anyway

f0d77ac

Removing seasons. No more summers!

28a12a9

removing next and last month was functions

c5a1c9e

Removing year month gen

c222720

reformat

3d865a0

Removing angle_diff

c928c68

Removed BackgroundField.py

87d207f

Rename next_level_trackqc.py

cdde86c

More modest module monikers

b9bed98

Removing interpolation

5abd071

rename track_check.py

21e5a1b

Tidying minor things

6085978

Merge pull request #40 from ludwiglierhammer/next_level_qc_tidy_up

fbd9a3b

Next level qc tidy up

[pre-commit.ci] auto fixes from pre-commit.com hooks

35c231a

for more information, see https://pre-commit.ci

Merge branch 'glamod:main' into next_level_qc

7120df8

fix minor pre-commit issues

3c5d1b6

Merge branch 'main' into next_level_qc

eedf8c3

ludwiglierhammer added 2 commits August 7, 2025 10:09

delete QC suite

fb5cb80

update CHANGELOG

f2172d1

github-actions Bot added the information label Aug 7, 2025

ludwiglierhammer added 2 commits August 7, 2025 10:15

remove qc_suite import

0a92208

remove notebook-related hooks

a23d96f

ludwiglierhammer commented Aug 7, 2025

ludwiglierhammer merged commit eaee98d into glamod:main Aug 7, 2025
16 checks passed
