Initial Pseudonymization #10776
You know that we have a bib file generator Python script to quickly generate thousands of entries? https://github.com/JabRef/jabref/blob/main/scripts/bib-file-generator.py In addition, for reproducing issues or the like, this does not make any sense.
I will implement the feature to check a BibTeX file in the context of a paper. For this, I need mappings of real examples. This is (for me) easier than replicating the properties of the real example in a Python script. I am thinking of end-to-end tests closer to reality. I can also include this code in a follow-up pull request.
Additionally:
- .bib files now need to have \n as line ending
- Use \n as newline separator at medline and ris imports
- use @ParameterizedTests
- Improve code of RisImporter
This PR (should) also fix issues with

Also fixes

The image in the PR description of #10778 is my use case.
    // TODO: Anonymize metadata
    // TODO: Anonymize strings
    public void writeValuesMappingAsCsv(Path path) throws IOException {
        try (
                OutputStreamWriter writer = new OutputStreamWriter(Files.newOutputStream(path), StandardCharsets.UTF_8);
                CSVPrinter csvPrinter = new CSVPrinter(writer, CSVFormat.DEFAULT)
        ) {
            csvPrinter.printRecord("pseudonymized", "original value");
            valueMapping.entrySet().stream()
                        // We have date-1, date-2, ..., date-10, date-11. That should be sorted accordingly.
                        .sorted(Comparator.comparing((Map.Entry<String, String> entry) -> getKeyPrefix(entry.getKey()))
                                          .thenComparingInt(entry -> extractNumber(entry.getKey())))
                        .forEach(Unchecked.consumer(entry -> {
                            csvPrinter.printRecord(entry.getKey(), entry.getValue());
                        }));
        }
    }
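
The snippet above relies on `getKeyPrefix` and `extractNumber`, which are not shown here. As a rough sketch of how such helpers might behave (the class name `PseudonymKeyOrder`, the `sortKeys` method, and the split-at-last-dash rule are assumptions for illustration, not the PR's actual code), the following shows why sorting by prefix first and by number second puts `date-10` after `date-2`, which a plain string sort would not do:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class; names and parsing rules are assumptions.
class PseudonymKeyOrder {

    // Everything before the last '-' (e.g. "date" for "date-10").
    static String getKeyPrefix(String key) {
        int dash = key.lastIndexOf('-');
        return dash < 0 ? key : key.substring(0, dash);
    }

    // The integer after the last '-' (e.g. 10 for "date-10"); 0 if absent or not numeric.
    static int extractNumber(String key) {
        int dash = key.lastIndexOf('-');
        if (dash < 0) {
            return 0;
        }
        try {
            return Integer.parseInt(key.substring(dash + 1));
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    // Returns a sorted copy: alphabetically by prefix, then numerically by suffix.
    static List<String> sortKeys(List<String> keys) {
        Comparator<String> byPrefix = Comparator.comparing(PseudonymKeyOrder::getKeyPrefix);
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(byPrefix.thenComparingInt(PseudonymKeyOrder::extractNumber));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortKeys(List.of("date-10", "date-2", "author-1", "date-1")));
    }
}
```

With a purely lexicographic comparison, `date-10` would sort before `date-2`; comparing the numeric suffix as an `int` avoids that.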
* upstream/main:
  Bump org.apache.lucene:lucene-queries from 9.9.0 to 9.9.1 (#10795)
  Bump com.google.guava:guava from 32.1.3-jre to 33.0.0-jre (#10793)
  Bump com.dlsc.gemsfx:gemsfx from 1.90.0 to 1.92.0 (#10796)
  Bump org.mockito:mockito-core from 5.8.0 to 5.9.0 (#10794)
  Bump lycheeverse/lychee-action from 1.9.0 to 1.9.1 (#10791)
  refactor: Transform calls to `Objects.isNull(..)` and `Objects.nonNull(..)` (#10788)
  refactor: Prefer `String#formatted(Object...)` (#10787)
  refactor: Adopt `SequencedCollection` (#10786)
  Update CSL styles (#10785)
  Initial Pseudonymization (#10776)
  Use StringUtil.intValueOf instead of StringUtil.intValueOfOptional or custom code (#10779)
  Refine loading code (#10780)
  Add wokraround for theme detector issue (#10777)
  Fix enablement of Aux dialog's "Generate" button (#10775)
  Fix package of TypedBibEntry (#10774)
  Fix labeling of PRs for newcomers (#10773)
  Update add-greeting-to-issue.yml
  Update add-greeting-to-issue.yml
  Update add-greeting-to-issue.yml
This adds an initial anonymization of BibTeX libraries.
The use case is to make `.bib` files available inside JabRef's repository for reproducing issues. One class of issues is performance problems; another is certain quality checks.
The current implementation is very basic, but enough for me currently.
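
To illustrate what a very basic pseudonymization with a value mapping could look like, here is a minimal sketch. It is not the code of this PR: the class name `ValuePseudonymizer`, the method names, and the choice to reuse a pseudonym when the same field value appears again are all assumptions made for the example. Only the `field-N` key shape (e.g. `date-1`, `date-2`) is taken from the comment in the CSV-writing code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not JabRef's actual implementation.
class ValuePseudonymizer {

    // pseudonym -> original value, kept in insertion order
    private final Map<String, String> valueMapping = new LinkedHashMap<>();
    // (field + separator + original value) -> pseudonym, so repeated values reuse one pseudonym (assumption)
    private final Map<String, String> assigned = new HashMap<>();
    // field name -> last number handed out for that field
    private final Map<String, Integer> counters = new HashMap<>();

    // Replaces a field value by "<field>-<n>" and records the mapping.
    String pseudonymize(String field, String original) {
        String lookupKey = field + '\u0000' + original;
        return assigned.computeIfAbsent(lookupKey, unused -> {
            int next = counters.merge(field, 1, Integer::sum);
            String pseudonym = field + "-" + next;
            valueMapping.put(pseudonym, original);
            return pseudonym;
        });
    }

    Map<String, String> getValueMapping() {
        return valueMapping;
    }
}
```

Keeping the pseudonym-to-original mapping separate from the pseudonymized library means the mapping can stay private while the pseudonymized `.bib` file is shared.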
I would propose to merge this in and refine afterwards. TODOs are inside. Other future work is to include this functionality in the CLI (and the UI). In the UI similar to library-based-on-aux-generation.
NO CHANGELOG entry, because functionality accessible through tests only.
Mandatory checks
`CHANGELOG.md` described in a way that is understandable for the average user (if applicable)