
Initial Pseudonymization #10776

Merged: koppor merged 11 commits into main from add-anonymization on Jan 14, 2024

@koppor (Member) commented Jan 13, 2024

This adds an initial anonymization of BibTeX libraries.

The use case is to make .bib files available inside JabRef's repository for reproducing issues: on the one hand performance issues, on the other hand certain quality checks.

The current implementation is very basic, but sufficient for my current needs.

I propose to merge this now and refine it afterwards; the TODOs are in the code. Further future work is to make this functionality available in the CLI (and the UI). In the UI, it could work similarly to library-based-on-aux-generation.

No CHANGELOG entry, because the functionality is accessible through tests only.
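The basic idea of pseudonymization can be illustrated with a minimal sketch. It assumes that each distinct value of a field is replaced by `<field>-<n>` (the naming suggested by the `date-1, date-2, ...` comment in the CSV code reviewed in this PR) and that the pseudonym-to-original mapping is kept so it can be exported. The class `ValuePseudonymizer` and its methods are hypothetical and model entries as plain maps rather than JabRef's `BibEntry`; this is not the merged implementation.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not JabRef's implementation): replaces each distinct value
// of a field by "<field>-<n>" and records the mapping back to the original value.
class ValuePseudonymizer {
    // pseudonym -> original value; this is what would be exported, e.g. as CSV
    private final Map<String, String> valueMapping = new LinkedHashMap<>();
    // field name -> (original value -> pseudonym), so equal values get the same pseudonym
    private final Map<String, Map<String, String>> pseudonymsPerField = new HashMap<>();

    String pseudonymize(String fieldName, String originalValue) {
        Map<String, String> known = pseudonymsPerField.computeIfAbsent(fieldName, name -> new HashMap<>());
        String existing = known.get(originalValue);
        if (existing != null) {
            return existing;
        }
        // Numbering starts at 1 and is counted separately for each field
        String pseudonym = fieldName + "-" + (known.size() + 1);
        known.put(originalValue, pseudonym);
        valueMapping.put(pseudonym, originalValue);
        return pseudonym;
    }

    // Pseudonymizes all fields of an entry, modeled here as a map from field name to value
    Map<String, String> pseudonymizeEntry(Map<String, String> fields) {
        Map<String, String> result = new LinkedHashMap<>();
        fields.forEach((name, value) -> result.put(name, pseudonymize(name, value)));
        return result;
    }

    Map<String, String> getValueMapping() {
        return valueMapping;
    }
}
```

For example, pseudonymizing the `author` values "Alice", "Bob", "Alice" in that order yields `author-1`, `author-2`, `author-1`. Reusing the same pseudonym for equal values preserves the duplicate structure of the original library, which quality checks may rely on.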

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@Siedlerchr (Member) commented:

Do you know that we have a Python script for quickly generating .bib files with thousands of entries? https://github.com/JabRef/jabref/blob/main/scripts/bib-file-generator.py

In addition, for reproducing issues or so, it does not make any sense.
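For comparison, generating synthetic data, as the linked script does, can be sketched in Java. This is not a port of `bib-file-generator.py` (its internals are not shown here); the class name `SyntheticBibGenerator`, the entry type, and the field formats are assumptions for illustration.

```java
import java.util.Random;

// Hypothetical sketch: generates `count` synthetic BibTeX entries with meaningless values.
// Not a port of scripts/bib-file-generator.py; entry type and field formats are assumptions.
class SyntheticBibGenerator {
    static String generate(int count, long seed) {
        Random random = new Random(seed); // fixed seed -> identical output on every run
        StringBuilder bib = new StringBuilder();
        for (int i = 1; i <= count; i++) {
            bib.append("@Article{key").append(i).append(",\n")
               .append("  author = {Author ").append(random.nextInt(1000)).append("},\n")
               .append("  title  = {Title ").append(i).append("},\n")
               .append("  year   = {").append(1950 + random.nextInt(75)).append("}\n")
               .append("}\n\n");
        }
        return bib.toString();
    }
}
```

Such a generator reproduces only the properties that are coded into it, whereas pseudonymizing a real library keeps its actual structure, such as which fields are used and which values repeat.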

@koppor (Member, Author) commented Jan 13, 2024

> In addition, for reproducing issues or so, it does not make any sense.

I will implement a feature to check a BibTeX file in the context of a paper. For this, I need mappings of real examples. For me, this is easier than replicating the properties of a real example in a Python script. I am thinking of end-to-end tests that are closer to reality.

I can also include this code in a follow-up pull request.

Additionally:

- .bib files now need to have `\n` as line ending
- Use `\n` as newline separator in the MEDLINE and RIS imports
- Use `@ParameterizedTest`
- Improve code of `RisImporter`
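Requiring `\n` line endings implies normalizing text that may contain `\r\n` (Windows) or a lone `\r` before parsing or comparing it. A minimal sketch of such normalization, assuming a hypothetical helper class `LineEndings` (the actual changes to the MEDLINE and RIS importers are not shown here):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: converts CRLF and lone CR line endings to LF ("\n").
class LineEndings {
    static String toLf(String text) {
        // Replace CRLF first; otherwise each "\r\n" would become two newlines.
        return text.replace("\r\n", "\n").replace("\r", "\n");
    }

    // Reads a file as UTF-8 and returns its content with LF line endings only.
    static String readWithLf(Path path) throws IOException {
        return toLf(Files.readString(path, StandardCharsets.UTF_8));
    }
}
```

The order of the two replacements is the important design choice: handling the two-character sequence before the single character keeps each Windows line break as exactly one `\n`.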
@koppor changed the title from "Initial anonymization" to "[WIP] Initial Pseudonymization" on Jan 13, 2024
@koppor (Member, Author) commented Jan 13, 2024

This PR should also fix issues with the .bib files maintained in this repository.

@koppor (Member, Author) commented Jan 13, 2024

Also fixes `equals` for `BibDatabaseContext`. The `eventBus` object (created for each context) was also compared. Therefore, two contexts were never equal, even if all other fields were equal.
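The `equals` problem can be illustrated with a simplified, hypothetical class: because the event bus is created per instance and compares by identity, including it in `equals` makes any two separately constructed objects unequal. `DatabaseContextSketch` and `EventBusStub` are illustrative stand-ins, not JabRef's actual classes.

```java
import java.util.Objects;

// Illustrative stand-in for an event bus: it does not override equals(),
// so two instances are equal only if they are the same object.
class EventBusStub {
}

// Simplified, hypothetical context class; not JabRef's BibDatabaseContext.
class DatabaseContextSketch {
    private final String libraryContent;
    private final String metaData;
    // A new bus is created for every context instance
    private final EventBusStub eventBus = new EventBusStub();

    DatabaseContextSketch(String libraryContent, String metaData) {
        this.libraryContent = libraryContent;
        this.metaData = metaData;
    }

    EventBusStub getEventBus() {
        return eventBus;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        DatabaseContextSketch other = (DatabaseContextSketch) o;
        // eventBus is deliberately not compared: including it would make two
        // separately created contexts unequal even if all other fields match
        return Objects.equals(libraryContent, other.libraryContent)
                && Objects.equals(metaData, other.metaData);
    }

    @Override
    public int hashCode() {
        // Same fields as equals(), so equal contexts have equal hash codes
        return Objects.hash(libraryContent, metaData);
    }
}
```

Restricting `hashCode` to the same fields as `equals` keeps the `equals`/`hashCode` contract intact, so equal contexts also behave correctly as keys in hash-based collections.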

@koppor changed the title from "[WIP] Initial Pseudonymization" to "Initial Pseudonymization" on Jan 13, 2024
@koppor (Member, Author) commented Jan 13, 2024

The image in the PR description of #10778 shows my use case.

Comment on lines +65 to +66
```java
// TODO: Anonymize metadata
// TODO: Anonymize strings
```

Member


Follow-up?

Comment on lines +34 to +48
```java
public void writeValuesMappingAsCsv(Path path) throws IOException {
    try (
        OutputStreamWriter writer = new OutputStreamWriter(Files.newOutputStream(path), StandardCharsets.UTF_8);
        CSVPrinter csvPrinter = new CSVPrinter(writer, CSVFormat.DEFAULT)
    ) {
        csvPrinter.printRecord("pseudonymized", "original value");
        valueMapping.entrySet().stream()
                    // We have date-1, date-2, ..., date-10, date-11. That should be sorted accordingly.
                    .sorted(Comparator.comparing((Map.Entry<String, String> entry) -> getKeyPrefix(entry.getKey()))
                                      .thenComparingInt(entry -> extractNumber(entry.getKey())))
                    .forEach(Unchecked.consumer(entry -> {
                        csvPrinter.printRecord(entry.getKey(), entry.getValue());
                    }));
    }
}
```

Member


Nope. ;-)
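The comparator in the `writeValuesMappingAsCsv` snippet calls `getKeyPrefix` and `extractNumber`, whose bodies are not part of the reviewed lines. A plausible sketch, assuming keys always have the form `<prefix>-<number>`; the helper implementations and the class `PseudonymOrdering` are assumptions, and the CSV is assembled by hand with the JDK only instead of with Commons CSV's `CSVPrinter`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical reconstruction of the ordering helpers; not the merged code.
class PseudonymOrdering {
    // "date-11" -> "date"; keys without a dash are returned unchanged
    static String getKeyPrefix(String key) {
        int dash = key.lastIndexOf('-');
        return dash < 0 ? key : key.substring(0, dash);
    }

    // "date-11" -> 11; keys without a numeric suffix map to 0 and sort first within their prefix
    static int extractNumber(String key) {
        int dash = key.lastIndexOf('-');
        if (dash < 0) {
            return 0;
        }
        try {
            return Integer.parseInt(key.substring(dash + 1));
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    // Orders by prefix, then numerically: date-2 comes before date-10
    static final Comparator<Map.Entry<String, String>> BY_PSEUDONYM =
            Comparator.comparing((Map.Entry<String, String> entry) -> getKeyPrefix(entry.getKey()))
                      .thenComparingInt(entry -> extractNumber(entry.getKey()));

    // Renders the mapping as CSV with a header row, quoting every field
    static String toCsv(Map<String, String> valueMapping) {
        List<String> lines = new ArrayList<>();
        lines.add(quote("pseudonymized") + "," + quote("original value"));
        valueMapping.entrySet().stream()
                    .sorted(BY_PSEUDONYM)
                    .forEach(entry -> lines.add(quote(entry.getKey()) + "," + quote(entry.getValue())));
        return String.join("\n", lines) + "\n";
    }

    // Wraps a field in double quotes and doubles any embedded quote characters
    private static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }
}
```

A plain string comparison would place `date-10` before `date-2`; comparing the numeric suffix as an `int` yields the order described in the code comment. The hand-written CSV quotes every field and separates records with `\n`, so its output may differ in detail from what `CSVFormat.DEFAULT` produces.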

@github-actions (Bot, Contributor) commented Jan 14, 2024

The build for this PR is no longer available. Please visit https://builds.jabref.org/main/ for the latest build.

@koppor added this pull request to the merge queue on Jan 14, 2024
Merged via the queue into main with commit 6582752 on Jan 14, 2024
@koppor deleted the add-anonymization branch on January 14, 2024, 19:30
@Siedlerchr added a commit that referenced this pull request on Jan 15, 2024
* upstream/main:
  Bump org.apache.lucene:lucene-queries from 9.9.0 to 9.9.1 (#10795)
  Bump com.google.guava:guava from 32.1.3-jre to 33.0.0-jre (#10793)
  Bump com.dlsc.gemsfx:gemsfx from 1.90.0 to 1.92.0 (#10796)
  Bump org.mockito:mockito-core from 5.8.0 to 5.9.0 (#10794)
  Bump lycheeverse/lychee-action from 1.9.0 to 1.9.1 (#10791)
  refactor: Transform calls to `Objects.isNull(..)` and `Objects.nonNull(..)` (#10788)
  refactor: Prefer `String#formatted(Object...)` (#10787)
  refactor: Adopt `SequencedCollection` (#10786)
  Update CSL styles (#10785)
  Initial Pseudonymization (#10776)
  Use StringUtil.intValueOf instead of StringUtil.intValueOfOptional or custom code (#10779)
  Refine loading code (#10780)
  Add workaround for theme detector issue (#10777)
  Fix enablement of Aux dialog's "Generate" button (#10775)
  Fix package of TypedBibEntry (#10774)
  Fix labeling of PRs for newcomers (#10773)
  Update add-greeting-to-issue.yml
  Update add-greeting-to-issue.yml
  Update add-greeting-to-issue.yml
@koppor mentioned this pull request on Jun 20, 2024