Skip to content

Add an importer for Citavi backup files#8848

Merged
Siedlerchr merged 45 commits into
JabRef:mainfrom
lefeimei:citavixml
Jun 1, 2022
Merged

Add an importer for Citavi backup files#8848
Siedlerchr merged 45 commits into
JabRef:mainfrom
lefeimei:citavixml

Conversation

@lishangyu9

@lishangyu9 lishangyu9 commented May 25, 2022

Copy link
Copy Markdown
Contributor

Fixes #8322 .
This importer for Citavi support ‘ctv5bak’ and ‘ctv6bak’ files.

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

What we have added

  • The Citavi file format support

    Add the definition of Citavi file in the “StandardFileType.java”.
  • The XSD file defined the Citavi XML file

    Add the file “scitavi.xsd” for generating the java class “CitaviExchangeData.java”.
  • The importer for Citavi file “ctv5bak” and “ctv6bak”

    • Add the class “src/main/java/org/jabref/logic/importer/fileformat/CitaviXmlImporter.java”.
      Because the “ctv6bak” and “ctv5bak” are compressed files. We add the function “getReaderFromZip” to unzip the file before parsing.
    • The XML file inside the “ctv6bak” is UTF-8 BOM format. This format cannot be recognized by the default process of XML. Therefore We create a function “checkForUtf8BOMAndDiscardIfAny” to delete the byte at the beginning of the XML file.
    • The process of parsing XML data is similar to “EndnoteImporter”. However, the date structures of Endnote and Citavi are different. The primary different methods are “getAuthorName”, “getEditorName”, “getKeywords” and “getPublisher”. These data are located in different places with reference data. For example, we need to go through the author’s matching list and match the references ID. If the id is matched we got the author’s ID list. Finally, we can get the author’s name.
  • The test of Citavi importer

    The test class of Citavi importer is “CitaviXmlImporterTestFiles.java”.
    There are 4 test cases:
    “CitaviXmlImporterTest1.ctv6bak”, “CitaviXmlImporterTest1.bib”
    “CitaviXmlImporterTest2.ctv6bak”, “CitaviXmlImporterTest2.bib”
    “CitaviXmlImporterTest3.ctv6bak”, “CitaviXmlImporterTest3.bib”
    “CitaviXmlImporterTest4.ctv6bak”, “CitaviXmlImporterTest4.bib”

How to use the Citavi Importer

  1. Click the “File” → ”Import” → ”Import into new library”.
    1
  2. Select the file. Click “Open”.
    2
  3. The file is imported
    3

Siedlerchr and others added 30 commits April 14, 2022 18:21
Add generated citavi xsd schema
Fix some resouces leaks

Use AuthorListParser
@Siedlerchr

Copy link
Copy Markdown
Member

Thank you very much for your work! Impressive 👍 I took a more in-depth look at the author handling code, that's really ugly but I couldn't find a general better approach. However, you do not have to iterate over the whole String, the ids are all UUIDs that have a fixed length of 36 characters. So you can improve your loops.

Second thing: I implemented an example of using JabRef's Author and AuthorLIst class for adding new authors. You just need to adjust your test bib files (Lastname, Firstname), can be done in JabRef's GUI: (Quality-> Cleanup -> Normalize in persons) and maybe the same for other persons fields.

@Siedlerchr

Copy link
Copy Markdown
Member

@lishangyu9 I came up with a more efficient solution for the authors and editors.
In essential, the oneToN is similar to a relational database schema where the first part as you already identified is the referenced and the other ones are the person ids.

Instead of iterating through the whole author + person list for every reference, I create a mapping between referenceId and Persons. In addition, I introduced a second Map for storing Persons we already know (if the same author with the same uuid comes up multiple times, so we don't have to through all Person objects everytime.

Antoher thing we can use for our advantage: We know that all UUIDs have a length of 36 chars and we know that the first only the first UUID is the referenceID. So we can safely assume that the rest of the string are the Person ids. So we can use split to get an array of all person ids if we just split at the semicolon, after we removed the refId from it.

This can be adapted for other oneToN relationships as well

@lishangyu9

Copy link
Copy Markdown
Contributor Author

Thank you @Siedlerchr . We will check and fix it today.

@lishangyu9

Copy link
Copy Markdown
Contributor Author

Hi @Siedlerchr, we fixed other's OneToN relationships, "Keywords" and "Publishers". We followed the method you have made and created the "buildKeywordList" and "buildPublisherList" functions for mapping the data and reference Id. We appreciate your help.

@Siedlerchr Siedlerchr added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label May 28, 2022

@Siedlerchr Siedlerchr left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor changes, but overall looks very good to me!

Comment thread src/main/java/org/jabref/logic/importer/fileformat/CitaviXmlImporter.java Outdated
Comment thread src/main/java/org/jabref/logic/importer/fileformat/CitaviXmlImporter.java Outdated
Comment thread src/main/java/org/jabref/logic/importer/fileformat/CitaviXmlImporter.java Outdated
Comment thread CHANGELOG.md Outdated
@Siedlerchr

Copy link
Copy Markdown
Member

@claell You were using citavi previously, it would be nice if you could testt this PR.
(best is using github cli tools, gh pr checkout 8848 and then ./gradlew run should assemble & run jabref

@claell

claell commented May 31, 2022

Copy link
Copy Markdown
Contributor

@Siedlerchr when I find the time, I'll try that. I don't have GitHub's cli tools installed. Is there another easy way or should I just use them? Also, I likely cannot do too in depth testing due to constrained time.

@calixtus

Copy link
Copy Markdown
Member

gh cli is supposed to be the easy way... you can of course also follow the "command line instructions" on the bottom of this page

@calixtus

Copy link
Copy Markdown
Member

grafik

@koppor koppor left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, it is OK. Some nitpicks, but we can fix them for ourselves later.

The thing I see here, that there was strong test-driven development somehow:

buildPersonList, buildKeywordList, buildPublisherList should all have been package private and tested by CitaviXmlImporter.

However, I think, this project is over. Think about using JUnit and the play / debug button also for methods themselves the next java project.

Comment thread src/main/java/org/jabref/logic/importer/fileformat/CitaviXmlImporter.java Outdated
@calixtus

calixtus commented Jun 1, 2022

Copy link
Copy Markdown
Member

I just mentioned you are mixing the old io and the new nio file apis. Is there a chance to switch completely to the new nio API? Provides thread safety i.a....

@Siedlerchr

Copy link
Copy Markdown
Member

I just mentioned you are mixing the old io and the new nio file apis. Is there a chance to switch completely to the new nio API? Provides thread safety i.a....

@calixtus Ehm they are meant to be interoperable, for example Files.newInputStream returns an InputStream. And Inputstream is just the base class in java.io. There is not all in nio.

@Siedlerchr Siedlerchr merged commit 943489e into JabRef:main Jun 1, 2022
Siedlerchr added a commit that referenced this pull request Jun 1, 2022
* upstream/main:
  Add an importer for Citavi backup files (#8848)
  Reviewdoc: Comment on PRs (#8878)
  Squashed 'buildres/csl/csl-styles/' changes from 649aac4..e740261
  Use JDK 15 text blocks to improve injected languages readability (#8874)
  Fix fetcher tests (#8877)
  Fix #8390 by allowing multiple group deletion for Remove groups > Kee… (#8875)
  Add restart warning on SSL configuration change (#8871)
  Update to lucene 9.2 (#8868)
  Fix for removing several groups deletes only one of them (#8801)
  Disable Write XMP Button in General tab of Entry-Editor when action is in progress (#8728)
  Bump jsoup from 1.14.3 to 1.15.1 (#8864)
  Bump unirest-java from 3.13.8 to 3.13.10 (#8869)
  Bump unoloader from 7.3.2 to 7.3.3 (#8863)
  Bump pascalgn/automerge-action from 0.15.2 to 0.15.3 (#8860)
  Bump classgraph from 4.8.146 to 4.8.147 (#8861)
  Bump mockito-core from 4.5.1 to 4.6.0 (#8862)
  Lucence dir checkers should only delete lucence dirs (#8854)
  Update README.md (#8858)
  Update adr.md
  Update adr.md
@Siedlerchr

Copy link
Copy Markdown
Member

Thanks again for your valuable contribution and your quick follow up! 👍
We just noticed by chance that the pages field contained unicode minus signs instead of the normal ascii ones. So I created a follow up PR which calls the pages formatter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component: import-load status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create an importer for Citavi

7 participants