Outcomes from OHBM Hackathon 2025 by Lestropie · Pull Request #15 · Lestropie/IP-freely

Lestropie · 2025-06-23T05:41:03Z

This content is being posted as a draft Pull Request in order to demonstrate the volume of changes generated during the OHBM 2025 Hackathon.
The code still requires further modification before the tool can be considered ready for broader uptake.

The following is a list of items flagged within the code base that must either be addressed prior to this first merge to master, or have a standalone Issue created so that they can be addressed later.

Check expected behaviour of metadata files other than JSON
- Do all such files obey the "use only the nearest" rule under the Inheritance Principle,
  or are there some where it would be a violation for there to be more than one potential match
  regardless of what is stated within the Inheritance Principle?
- Does this behaviour change as a function of Inheritance Principle ruleset?
- Are there metadata file types outside of the MRI space I'm not aware of?
Refactor detection of IP violations between full-graph and standalone-file functions

Currently, the code responsible for generating a list of applicable metadata files per data file takes partial responsibility for detecting violations of the specified Inheritance Principle ruleset while the list is being generated.
This however turned out to be the wrong location for much of that code.
What I want instead is:
1. A function that generates, for a specified data file, all potentially applicable metadata files, without enforcing any specific IP ruleset.
2. A function that, for a specified data file:
  1. Generates the list of candidate metadata files (step 1).
  2. Checks for violations of the IP for just that mapping.
  3. Prunes the list if necessary: depending on the metadata file extension, only the "closest" metadata file may ultimately be deemed applicable.
3. Class Graph, which:
  1. Generates the list of candidate metadata files for all data files (step 1).
  2. Generates the full inverse mapping.
  3. Checks for all violations of the IP, since step 1 will no longer perform a partial check of such.
  4. Provides an option to prune the whole graph in place.
This will have the added benefit of better distinguishing between errors that relate to general parsing of the BIDS dataset, and those that relate specifically to the Inheritance Principle.
Currently these are distinguished by their respective Exception classes, but I think it would be better if they occurred at different points during processing.

This may also come with corresponding changes to function names; e.g. those referring to metadata "applicability".
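
The three-part structure above could look roughly like the following minimal Python sketch. It is not the package's actual code: `parse_name()`, `candidate_metafiles()`, `find_violations()`, `prune_candidates()`, `applicable_metafiles()`, `IPViolation` and `Graph` are hypothetical names, paths are assumed to be relative to the dataset root, only JSON is assumed to be mergeable, and the single violation rule shown (two equally near metadata files of the same extension) merely stands in for a full ruleset.

```python
# Hypothetical sketch of the proposed three-step structure; all names are
# illustrative and paths are assumed relative to the dataset root.
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: only key-value (JSON) metadata may combine across levels.
MERGEABLE_EXTENSIONS = {".json"}


class IPViolation(Exception):
    """Raised when a data file's metadata mapping breaks the nominated ruleset."""


def parse_name(path):
    """Split a BIDS-style file name into (entities dict, suffix, extension)."""
    stem, _, ext = PurePosixPath(path).name.partition(".")
    *pairs, suffix = stem.split("_")
    return dict(pair.split("-", 1) for pair in pairs), suffix, "." + ext


def nearness(metafile):
    """Sort key: deeper directory first, then more entities first."""
    return (-len(PurePosixPath(metafile).parts), -len(parse_name(metafile)[0]))


def candidate_metafiles(datafile, metafiles):
    """Step 1: all potentially applicable metadata files; no ruleset applied."""
    d_ents, d_suffix, _ = parse_name(datafile)
    d_dir = PurePosixPath(datafile).parent
    hits = []
    for metafile in metafiles:
        m_ents, m_suffix, _ = parse_name(metafile)
        m_dir = PurePosixPath(metafile).parent
        if (
            m_suffix == d_suffix
            and (m_dir == d_dir or m_dir in d_dir.parents)
            and m_ents.items() <= d_ents.items()
        ):
            hits.append(metafile)
    return sorted(hits, key=nearness)


def find_violations(datafile, candidates):
    """Example rule only: two same-extension candidates that are equally near."""
    seen, problems = {}, []
    for metafile in candidates:
        key = (parse_name(metafile)[2], nearness(metafile))
        if key in seen:
            problems.append((datafile, seen[key], metafile))
        seen[key] = metafile
    return problems


def prune_candidates(candidates):
    """Keep every mergeable file, but only the nearest file of other extensions."""
    kept, seen_ext = [], set()
    for metafile in candidates:
        ext = parse_name(metafile)[2]
        if ext in MERGEABLE_EXTENSIONS or ext not in seen_ext:
            kept.append(metafile)
        seen_ext.add(ext)
    return kept


def applicable_metafiles(datafile, metafiles):
    """Step 2: candidates for one data file, checked against the rule, then pruned."""
    candidates = candidate_metafiles(datafile, metafiles)
    if find_violations(datafile, candidates):
        raise IPViolation(datafile)
    return prune_candidates(candidates)


class Graph:
    """Step 3: full forward and inverse mappings, with validation deferred."""

    def __init__(self, datafiles, metafiles):
        self.forward = {d: candidate_metafiles(d, metafiles) for d in datafiles}
        self._rebuild_inverse()

    def _rebuild_inverse(self):
        self.inverse = defaultdict(list)
        for datafile, metas in self.forward.items():
            for metafile in metas:
                self.inverse[metafile].append(datafile)

    def violations(self):
        """All violations, checked only once the whole graph exists."""
        return [v for d, ms in self.forward.items() for v in find_violations(d, ms)]

    def prune(self):
        """Prune the whole graph in place, then rebuild the inverse mapping."""
        self.forward = {d: prune_candidates(ms) for d, ms in self.forward.items()}
        self._rebuild_inverse()
```

Keeping step 1 free of any ruleset means the same candidate lists can feed both the per-file function and `Graph`, so that ruleset-specific errors are raised at a distinct point from general parsing errors.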
Implement check for a currently overlooked Inheritance Principle rule

A metadata file must not be potentially applicable to some data file based solely on their respective file names while not residing in either the same directory as that data file or one of its parent directories.
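
One way to express a check for this rule, as a sketch: apply a name-only applicability test, and flag any pair where the names match but the metadata file lies outside the data file's directory and its ancestors. The helper names below are hypothetical (deliberately not the package's `is_applicable_nameonly()`), paths are assumed relative to the dataset root, and entity parsing is simplified.

```python
# Hypothetical sketch; names are illustrative, not the package API.
from pathlib import PurePosixPath


def parse_name(path):
    """Split a BIDS-style file name into (entities dict, suffix)."""
    stem = PurePosixPath(path).name.partition(".")[0]
    *pairs, suffix = stem.split("_")
    return dict(pair.split("-", 1) for pair in pairs), suffix


def name_matches(datafile, metafile):
    """Name-only test: same suffix, and metadata entities are a subset."""
    d_ents, d_suffix = parse_name(datafile)
    m_ents, m_suffix = parse_name(metafile)
    return m_suffix == d_suffix and m_ents.items() <= d_ents.items()


def in_same_or_parent_dir(datafile, metafile):
    """True if the metadata file sits alongside or above the data file."""
    d_dir = PurePosixPath(datafile).parent
    m_dir = PurePosixPath(metafile).parent
    return m_dir == d_dir or m_dir in d_dir.parents


def misplaced_metafiles(datafiles, metafiles):
    """(data file, metadata file) pairs that match by name but not by location."""
    return [
        (d, m)
        for d in datafiles
        for m in metafiles
        if name_matches(d, m) and not in_same_or_parent_dir(d, m)
    ]
```

For example, a subject-agnostic `T1w.json` placed under `sub-02/anat/` matches `sub-01/anat/sub-01_T1w.nii.gz` by name alone, and would therefore be reported.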
Improve checks of metadata association graph equivalency

For proposal ruleset "I1195", where highly complex inheritance is permitted, it is possible for two metadata files that reside in the same directory and have an equal number of entities to both be applicable to the same data file (as long as they do not have any metadata key-value clashes).
This however poses a problem when comparing file association graphs for equivalence, since the metadata files associated with a data file are interpreted as an ordered list.
A more robust check would involve either:
- Enforcing an equivalent order of these lists when that order is consequential, and not enforcing such an order when it is not.
- For all lists, enforce equivalent order of entries only when that order can be disambiguated; for entries in the same directory and with the same number of entities, do not interpret permutation of those entries as inequivalence of the graphs.
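
The second option could be sketched as follows, assuming each metadata list is already sorted nearest-first: consecutive entries sharing a directory and entity count are collapsed into an unordered group, and the resulting sequences of groups are compared in order. The function names are hypothetical.

```python
# Hypothetical sketch; names are illustrative, not the package API.
from itertools import groupby
from pathlib import PurePosixPath


def tie_key(metafile):
    """Files sharing a directory and an entity count have no defined order."""
    path = PurePosixPath(metafile)
    n_entities = len(path.name.partition(".")[0].split("_")) - 1
    return (path.parent, n_entities)


def ordered_groups(metafiles):
    """Collapse a nearest-first list into a sequence of unordered tie groups."""
    return [(key, frozenset(group)) for key, group in groupby(metafiles, key=tie_key)]


def lists_equivalent(a, b):
    """Order is enforced between groups, but not within a group."""
    return ordered_groups(a) == ordered_groups(b)


def graphs_equivalent(g1, g2):
    """g1, g2: dicts mapping each data file to its nearest-first metadata list."""
    return g1.keys() == g2.keys() and all(lists_equivalent(g1[k], g2[k]) for k in g1)
```

Since only ancestor directories can hold applicable metadata files, "same directory" and "same depth" coincide here; grouping on the parent directory simply matches the wording above.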
More elegant solution for loading non-JSON metadata

For instance, among the other attributes ascribed to the different metadata file extensions could be the property that files of a given extension can be read as plaintext matrix data (e.g. .bvec / .bval).
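
As a sketch of that idea, each extension could carry a small record of attributes, including how it is loaded, so that loading dispatches on those attributes rather than on per-extension branches. The `MetadataFormat` fields and `FORMATS` table are hypothetical, and the plaintext matrix parser is a minimal stand-in for something like `numpy.loadtxt`.

```python
# Hypothetical sketch; the attribute names and table are illustrative.
import json
from dataclasses import dataclass
from typing import Callable


def load_matrix(text):
    """Whitespace-delimited rows of numbers, e.g. .bvec / .bval content."""
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]


@dataclass(frozen=True)
class MetadataFormat:
    mergeable: bool                  # key-value content that may combine across levels
    loader: Callable[[str], object]  # how to parse the file's text content


FORMATS = {
    ".json": MetadataFormat(mergeable=True, loader=json.loads),
    ".bvec": MetadataFormat(mergeable=False, loader=load_matrix),
    ".bval": MetadataFormat(mergeable=False, loader=load_matrix),
}


def load_metadata(extension, text):
    """Dispatch on the extension's declared attributes."""
    return FORMATS[extension].loader(text)
```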
Change how key-value metadata overrides are identified?

Currently this is done through a stand-alone function.
That works, and is technically faster and uses less RAM than a full metadata load, but isn't as general.
Having a function return a tuple containing firstly all of the metadata, and secondly the set of overrides, would allow both of these steps to be done in one go.
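
A minimal sketch of such a combined function, operating on already-parsed dictionaries ordered nearest-first. It assumes that an "override" means a nearer file supplying a different value for a key already set by a more distant file; both the function name and that definition are assumptions.

```python
# Hypothetical sketch; not the package's actual function.
def load_with_overrides(dicts_nearest_first):
    """Merge key-value dicts (nearer wins); return (merged, overridden keys)."""
    merged = {}
    overrides = set()
    # Apply the most distant file first so that nearer files take precedence.
    for metadata in reversed(dicts_nearest_first):
        for key, value in metadata.items():
            if key in merged and merged[key] != value:
                overrides.add(key)
            merged[key] = value
    return merged, overrides
```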
Implement checks comparing behaviour of full graph representation and individual functions

While evaluation of the full graph makes sense for validation of datasets and for evaluating IP rulesets, it is likely to be a common use case of this package to simply query what metadata are associated with a nominated data file. The testing therefore needs to ensure that the outcomes from running these functions individually are exactly the same as what is captured by the full graph.
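
A consistency check of that kind might look like the following sketch, written independently of any particular graph class: the forward and inverse mappings are plain dictionaries, and the two per-file query functions are passed in as callables. It assumes that forward lists must match exactly, including order, whereas inverse lists are compared irrespective of order; all names are hypothetical.

```python
# Hypothetical sketch; names are illustrative, not the package API.
def check_consistency(forward, inverse, query_forward, query_inverse):
    """Return mismatches between the full graph and per-file query results.

    forward: data file -> ordered list of metadata files
    inverse: metadata file -> list of data files (order assumed irrelevant)
    """
    mismatches = []
    for datafile, expected in forward.items():
        got = query_forward(datafile)
        if list(got) != list(expected):
            mismatches.append(("metafiles_of", datafile, expected, got))
    for metafile, expected in inverse.items():
        got = query_inverse(metafile)
        if sorted(got) != sorted(expected):
            mismatches.append(("datafiles_of", metafile, expected, got))
    return mismatches
```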
Check ability of individual fetch functions to detect IP violations

Checking whether a dataset violates the IP ruleset in some way is much easier when the full association graph is stored in memory. It is however desired that if a developer chooses to make direct use of the functions that access the metadata per data file / the data files to which a metadata file applies, any violations of the IP under some nominated ruleset will nevertheless be captured.

My expectation is that it won't be possible for those functions to catch all such violations. I do however want to confirm that the set of violations that they fail to identify is acceptable.



- In 1.1.x, make sure that metadata files that are subject-specific are not placed in the subject-agnostic root directory, and that metadata files that are subject-agnostic (by name) are not placed in a subject-specific directory.
- In 1.7.0, ensure that rule 3 of the Inheritance Principle is not violated.
- Replace ruleset "1.x" with "1.1.x" and "1.7.x", since those two versions of the specification differ in the exact criteria applied.
- Perform testing of new example datasets that demonstrate these properties.
- Fix the comparison of data file and metadata file suffixes, which was performed in utils.applicability.is_applicable() but not in utils.applicability.is_applicable_nameonly().
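
The 1.1.x placement criterion in the first item could be sketched as follows, taking paths relative to the dataset root and treating a metadata file as subject-specific if its name carries a `sub-` entity. The function name and returned messages are hypothetical.

```python
# Hypothetical sketch; names and messages are illustrative.
from pathlib import PurePosixPath


def placement_errors(metafile):
    """Return a description of a 1.1.x placement problem, or None."""
    path = PurePosixPath(metafile)
    stem = path.name.partition(".")[0]
    # Entities precede the final "_"-separated suffix in the file name.
    has_sub_entity = any(part.startswith("sub-") for part in stem.split("_")[:-1])
    in_subject_dir = any(part.startswith("sub-") for part in path.parts[:-1])
    if has_sub_entity and not in_subject_dir:
        return "subject-specific metadata file outside any subject directory"
    if not has_sub_entity and in_subject_dir:
        return "subject-agnostic metadata file inside a subject directory"
    return None
```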




This collapses the graph information for those metadata files that do not have key-value dictionaries that can be merged, instead choosing only the nearest metadata file given the filesystem hierarchy and BIDS file names. Some test data have been correspondingly updated so that the comparison is performed against the pruned graph.


Previously, two data file - metadata file association graphs were compared simply by ensuring that all files in each list were present in the other. This however failed to account for the fact that in many circumstances these lists must be interpreted in an ordered way, and so ensuring that their orders are equivalent is important. The lists cannot, however, simply be tested for exact equality. A ruleset may permit a single data file to load key-value metadata from multiple JSON files that are equivalent in both filesystem directory location and number of entities. In this scenario, the order within that set of equidistant metadata files is of no consequence in a test of equivalence. This commit replaces the earlier placeholder implementation with this more robust algorithm.

- Multiple fixes to datafiles_for_metafile() when a ruleset is specified.
- During testing, evaluate whether associating a data file with metadata files (or vice versa) is capable of detecting potential IP issues. Not all IP issues can reasonably be caught when operating in this way, so a small number of such tests are skipped.

- De-escalate the severity with which key-value overrides are reported for rulesets 1.1.x and 1.7.x; while these specifications recommend against using such overrides extensively, they do not recommend against the practice itself.
- New ruleset 1.11.x, which reflects the change in #1834 of the BIDS specification, wherein it will be recommended to never use key-value overriding; this should therefore result in a warning being issued.
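
A sketch of how ruleset-dependent severity might be expressed with the logging module. The mapping is an assumption: the exact level to which overrides are de-escalated for 1.1.x and 1.7.x is not stated here, so INFO is used purely for illustration, and the logger and function names are hypothetical.

```python
# Hypothetical sketch; levels for 1.1.x / 1.7.x are assumed, not confirmed.
import logging

OVERRIDE_LEVEL = {
    "1.1.x": logging.INFO,
    "1.7.x": logging.INFO,
    "1.11.x": logging.WARNING,  # overriding to be recommended against entirely
}

logger = logging.getLogger("override_sketch")  # hypothetical logger name


def report_overrides(ruleset, overridden_keys):
    """Log each overridden key at the severity the ruleset calls for."""
    level = OVERRIDE_LEVEL[ruleset]
    for key in sorted(overridden_keys):
        logger.log(level, "Key %r overridden by a nearer metadata file (ruleset %s)", key, ruleset)
    return level
```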


First implementation

70ed741

Lestropie self-assigned this Jun 23, 2025

Lestropie added 9 commits June 23, 2025 20:21

Fix first implementation of Docker container

79a752f

The 3.9-slim image omitted numpy, and installing it would have attempted to bump Python itself to 3.11, so switched to the non-slim version.

Ensure Python version is adequate

51af9da

Precludes getting an unhandled Exception regarding subscripting of types in Python 3.8 and earlier.
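
A minimal sketch of such an early version check. The 3.9 floor is inferred from the commit message (runtime subscripting of builtin types such as `list[str]` requires Python 3.9, per PEP 585) rather than taken from the package itself, and the function name is hypothetical.

```python
# Hypothetical sketch; the 3.9 minimum is an inference, not the package's setting.
import sys

MIN_PYTHON = (3, 9)  # builtin generics such as list[str] need 3.9 (PEP 585)


def check_python_version(version_info=sys.version_info):
    """Fail early with a clear message rather than a TypeError deep in the code."""
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(v) for v in version_info[:3])
        raise RuntimeError(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or later is required; found {found}"
        )
```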

Fix loading of version info file

e13c517

Add testing of key-value override reporting

d276603

README: Update following originating Hackathon

c4bce15

Some moving and renaming of key-value metadata functions

f459cc8

Minor implementation refactoring

405a5d0

Remove "Singularity"

d251209

The version of this file provided in the BIDS Apps example repository is dependent on bootstrapping from a Docker container of a fixed name inside the "bids" organisation.

Lestropie mentioned this pull request Jun 23, 2025

Summarization principle implementation #20

Open

Lestropie added 18 commits June 24, 2025 08:56

Initial code for loading .tsv metadata into full metadata graph

c4ce576

Add black pre-commit hook

3beab12

Add requirements.txt and requirements-dev.txt

cdd0592

Better error messages on malformed metadata files

b432323

First work on refactoring of evaluation

448e790

Evaluation of the validity of the data file - metadata file association graph is deferred until after the full graph is generated.

Testing: Restructure

67feffc

For each sample dataset, the graph is constructed only once; that graph is then considered immutable as it is tested under different rulesets.
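
One way to make "considered immutable" enforceable rather than a convention is to hand the tests a read-only view of the graph. This is a sketch with hypothetical names, where `evaluate` stands in for whatever per-ruleset evaluation the test suite actually performs.

```python
# Hypothetical sketch; names are illustrative, not the package's test code.
from types import MappingProxyType


def freeze_graph(forward):
    """Read-only view of a data file -> metadata files mapping."""
    return MappingProxyType({d: tuple(metas) for d, metas in forward.items()})


def evaluate_rulesets(graph, rulesets, evaluate):
    """Run every ruleset against the same, once-built graph."""
    return {name: evaluate(graph, name) for name in rulesets}
```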

Addition of test dataset ipdwi003 and corresponding fixes

36f5fa4

Fixes to non-full-graph-based access functions

67380c1

Lots of other changes along the way, including defining class BIDSFilePathList to have ability to encapsulate functions that operate exclusively on such data.

Rectifications for pylint

65fd694

README.md: Include description of testing

7f7ca3f

Set up auto-detection of BIDS spec version for future 1.11

c6a2144

Utilise logging module

0344c49

Closes #16.

Logger changes and new test datasets

ce70578

Dedicated container for testing

654422d

Lestropie added 4 commits July 7, 2025 23:18

Test both legacy and schema-based validators

c035935

Testing: Remove comments RE current validator outcomes from source code

a87b2ea

First working version of dataset conversion

b35f3e9

Set first tagged version as 0.1.0

5476a11

Lestropie marked this pull request as ready for review July 8, 2025 02:22

Lestropie merged commit 2858239 into master Jul 8, 2025

Lestropie deleted the hackathon branch July 8, 2025 02:22

Lestropie merged 32 commits into master from hackathon


1 participant
