Skip to content

Write unmodified entries to bib file in the same format as they were read#391

Merged
koppor merged 52 commits into
masterfrom
stable-serialization
Dec 6, 2015
Merged

Write unmodified entries to bib file in the same format as they were read#391
koppor merged 52 commits into
masterfrom
stable-serialization

Conversation

@lenhard

@lenhard lenhard commented Nov 25, 2015

Copy link
Copy Markdown
Member

As discussed in #116, we want to write entries that are not modified during a session back in the exact same format as they were read. This PR is WIP in this direction and not complete yet. I will continue working on it.

To be able to write back an entry in the same fashion as it is read, we need to store it upon reading. I added a field to BibtexEntry for storing this and tried to modify BibtexParser to store the file content, but to no avail. The current code of that class is pure hell (uses a global PushBackReader to read the file in a very confusing way) and this PR is hopefully a step towards its replacement.

Instead of modifying the logic in BibtexParser, I extend it with additional methods that perform the new functionality on top. Current status:

  1. On initialization the complete file is read into a List<String> of its lines before handing it over to the old parser.
  2. After an entry has been parsed, the original represenation of the entry is looked up in the List<String> and stored in the entry. Blank lines following the entry belong to the entry.

That means we read the file twice, which is terrible for large files, but as this is WIP...

Next step will be to detect when an entry has been changed and the modification of the writing logic.

The handling of newlines currently is not consistent.

@simonharrer

Copy link
Copy Markdown
Contributor

Nice. Hm, what to do when two entries share a line? E.g. @article{key1, title={a}}@article{key2, title={b}}?

@lenhard

lenhard commented Nov 26, 2015

Copy link
Copy Markdown
Member Author

In that case, the current implementation fails.

But that gives me an idea: Isn't @ a valid delimiter that is only used for setting bibtex entry types, but forbidden in other parts of the file? If so, I could use it for reading the file, instead of the newline character. This is very easy to do with a Scanner and would make things way easier.

@oscargus

Copy link
Copy Markdown
Contributor

There is the issue of comments, which potentially can contain @. But as far as I know, everything outside of @xxx{..} is a comment.

@tobiasdiez

Copy link
Copy Markdown
Member

The @ symbol can also occur in an entry, for example in an email-address. The current implementation has also the problem that it fails if the entry has no bibtex key. I fear there is no easy way to find an entry in a bibtex file without actually parsing it completely.

@koppor

koppor commented Nov 26, 2015

Copy link
Copy Markdown
Member

Isn't it possible to add a reference to the source line and column during parsing? This should be offered by ANTLR somehow, isn't it?

I assume that JBibTeX. Doesn't support that. Refs #123.

@lenhard

lenhard commented Nov 26, 2015

Copy link
Copy Markdown
Member Author

@oscargus: that would not be a problem. It's more of a problem if an @ is within an entry

@koppor: This is work in progress. I'll start commenting when I have something closer to finalization. Do we want to parse the file twice? If so, then jbibtex might be an option. Can you tell me if jbibtex writes back an entry in exactly the same format (up to every line feed character) as it was read?

@tobiasdiez Thanks, I am aware of that. This is work in progress. I'd first like to arrive at a solution that does the round-trip for a well-structured database. Then, I'll modify it to match incomplete entries.

@lenhard

lenhard commented Nov 27, 2015

Copy link
Copy Markdown
Member Author

Ok, now this works nicely with my large local database that doesn't include analomies.

Now comes the hard part: Properly reading and writing files with strange content. This is also the reason why some tests are failing at the moment. I might need to discuss some of these aspects and will ping you in this thread if I am unsure.

@lenhard

lenhard commented Nov 27, 2015

Copy link
Copy Markdown
Member Author

Dammit, I wanted to get some work done today and now I am only coding JabRef...

Anyways, this seems to work (and passes the tests)! I ditched my previous parsing logic and rewrote it in integration with the BibtexParser, which wasn't as hard as I claimed above (my bad).

I am now able to store each entry exactly as it is in the file. Each entry serialization contains all text from its end up to the end of previous entry. The main problem with this is that with this logic, the first entry also stores the file header and then the header is serialized twice when rewriting the file. I was able to get around that with some hacking, but there might still be edge cases I am not yet aware of.

To sum up, this needs some more manual testing, but should be pretty close. Feedback is welcome :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this (and the corresponding methods, of course) as it is always used as "if not changed then..." to "useParsedSerialization" or even to "hasChanged".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll do some renaming. On second thought, I will also replace the Deque with a Stack, because this is the way in which it is used.

@matthiasgeiger

Copy link
Copy Markdown
Member

Tested it manually with some entries and it works smoothly!

Good job! 👍

@matthiasgeiger matthiasgeiger added this to the v3.0 milestone Nov 27, 2015
@lenhard

lenhard commented Nov 27, 2015

Copy link
Copy Markdown
Member Author

Should we really introduce this in v3.0 already? I'm not entirely sure it works on all occaison. For instance, I haven't tested it with strings in the bib file so far.

@matthiasgeiger

Copy link
Copy Markdown
Member

@lenhard

lenhard commented Dec 2, 2015

Copy link
Copy Markdown
Member Author

@matthiasgeiger But that does not really depend on the version information. The version is checked, but also every file written by JabRef 3.0 is marked as problematic. The file links upgrade really depends on whether the file contains fields named pdf or ps. It wouldn't make a difference isActionNecessary always returned true, regardless of the version.

Anyways, if you really really want the version parsing, we can keep it.

@matthiasgeiger

Copy link
Copy Markdown
Member

okay... I have to confess that I only performed a "find usages" search without truly checking whether the usages are really needed.

So feel free to remove those lines ;-)

@lenhard

lenhard commented Dec 2, 2015

Copy link
Copy Markdown
Member Author

I removed the old meta flag and version parsing. Version headers are now simply removed on rewrite. File upgrade functionality ultimately depends on preferences (and on whether JabRef finds errors in the file to begin with).

@lenhard

lenhard commented Dec 4, 2015

Copy link
Copy Markdown
Member Author

So, from my point of view this PR is complete and ready to merge.

@koppor and I did some more testing with more advanced things like meta-data and it seems fine. I'm not guaranteeing the absence of bugs, but there should be no more obvious mistakes.

Somebody else can take a final look and merge this with master.

@lenhard lenhard changed the title [WIP] Write unmodified entries to bib file in the same format as they were read Write unmodified entries to bib file in the same format as they were read Dec 4, 2015
@simonharrer

Copy link
Copy Markdown
Contributor

Looks good to me. It worked with my large .bib file without any issues.

@koppor

koppor commented Dec 6, 2015

Copy link
Copy Markdown
Member

@simonharrer Could you check what happens if you change the sort ordering of the file? Change the ordering. If everything is OK, change the ordering and change an entry. 🐇 I it doesn't work, maybe we drop the ordering possibilities of entries within JabRef now or leave that as open issue.

@simonharrer

Copy link
Copy Markdown
Contributor

What do you think is the problem here? When I change the sort ordering, the order of the entries is changed in the file. But nothing else is. And if I also change an entry, this entry is changed as well. I do not get what you imply.

koppor added a commit that referenced this pull request Dec 6, 2015
Write unmodified entries to bib file in the same format as they were read
@koppor koppor merged commit 741c40e into master Dec 6, 2015
@koppor koppor deleted the stable-serialization branch December 6, 2015 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants