Process figures, tables and equations from back/annex section #1215
minor indentation issue
…com/elifesciences/grobid into elifesciences-back-section-figure-tables-upstream
…pstream # Conflicts: # grobid-core/src/main/java/org/grobid/core/engines/FullTextParser.java
…pstream # Conflicts: # grobid-core/src/main/java/org/grobid/core/data/Figure.java # grobid-core/src/main/java/org/grobid/core/document/TEIFormatter.java # grobid-core/src/main/java/org/grobid/core/engines/FullTextParser.java
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull Request Overview
This PR extends the processing of figures, tables, and equations to the back/annex section of documents; previously they were processed only in the main body text. The changes also fix table and figure identifier numbering by keeping IDs sequential across document sections.
- Adds figure, table, and equation processing to the annex section using existing processing methods
- Refactors code to use consistent variable naming and extract reusable methods
- Implements sequential ID assignment for figures and tables across body and annex sections
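
The sequential numbering described above can be sketched as follows. This is a minimal illustration and not the code from this PR: `LabeledItem`, `SequentialIdSketch`, `assignIds`, `bodyThenAnnex`, and the `fig_` prefix are hypothetical names, and the sketch assumes the annex counter simply continues from the number of items found in the body.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a figure or table; this is not GROBID's Figure class. */
final class LabeledItem {
    final String id;
    final String section;

    LabeledItem(String id, String section) {
        this.id = id;
        this.section = section;
    }
}

final class SequentialIdSketch {

    /** Creates `count` items with ids prefix+startId, prefix+(startId+1), and so on. */
    static List<LabeledItem> assignIds(String prefix, String section, int count, int startId) {
        List<LabeledItem> items = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            items.add(new LabeledItem(prefix + (startId + i), section));
        }
        return items;
    }

    /** Body items start at 0; annex items continue from the body count (assumed behavior). */
    static List<LabeledItem> bodyThenAnnex(String prefix, int bodyCount, int annexCount) {
        List<LabeledItem> all = new ArrayList<>(assignIds(prefix, "body", bodyCount, 0));
        all.addAll(assignIds(prefix, "annex", annexCount, bodyCount));
        return all;
    }
}
```

With two body figures and two annex figures, `SequentialIdSketch.bodyThenAnnex("fig_", 2, 2)` yields the ids `fig_0` through `fig_3`, so no annex identifier collides with a body identifier.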
Reviewed Changes
Copilot reviewed 3 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| GrobidRestProcessFiles.java | Removes unused imports |
| FullTextParser.java | Main processing logic changes to handle annex figures/tables/equations and improve code organization |
| TEIFormatter.java | Updates toTEIAnnex method signature to accept figures, tables, and equations parameters |

    @@ -2257,158 +2319,163 @@ private static boolean testClosingTag(StringBuilder buffer,
     * Process figures identified by the full text model
     */
    protected List<Figure> processFigures(String rese, List<LayoutToken> layoutTokens) {
        return processFigures(rese, layoutTokens,0);

Magic number 0 should be replaced with a named constant for the default start figure ID.

    String rese,
    List<LayoutToken> tokenizations,
    Document doc) {
        return processTables(rese, tokenizations, doc, 0);

Magic number 0 should be replaced with a named constant for the default start table ID.
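
One way to act on the two review comments above is sketched below. It is only an illustration: the constant names `DEFAULT_START_FIGURE_ID` and `DEFAULT_START_TABLE_ID`, the class `StartIdDefaults`, and the simplified `figureIds`/`tableIds` methods are assumptions, standing in for the real `processFigures` and `processTables` overloads, whose bodies are not shown in this excerpt.

```java
import java.util.ArrayList;
import java.util.List;

final class StartIdDefaults {

    /** Hypothetical constant names; the reviewed code passes the literal 0 instead. */
    static final int DEFAULT_START_FIGURE_ID = 0;
    static final int DEFAULT_START_TABLE_ID = 0;

    /** Convenience overload: delegates with the named default rather than a bare 0. */
    static List<String> figureIds(int count) {
        return figureIds(count, DEFAULT_START_FIGURE_ID);
    }

    /** Simplified stand-in for the overload that accepts an explicit start ID. */
    static List<String> figureIds(int count, int startId) {
        return ids("fig_", count, startId);
    }

    static List<String> tableIds(int count) {
        return tableIds(count, DEFAULT_START_TABLE_ID);
    }

    static List<String> tableIds(int count, int startId) {
        return ids("tab_", count, startId);
    }

    /** Shared helper producing prefix+startId, prefix+(startId+1), and so on. */
    private static List<String> ids(String prefix, int count, int startId) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            result.add(prefix + (startId + i));
        }
        return result;
    }
}
```

The named constants document the intent at the call site and give a single place to change the default if the numbering scheme is ever revised.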
Continuation of pull request #738, without conflicts. Also aims to fix the table/figure identifiers, as discussed in #738 (comment).