Process figures, tables and equations from back/annex section #1215

Merged
lfoppiano merged 16 commits into master from elifesciences-back-section-figure-tables-upstream on Sep 5, 2025
@lfoppiano

Member

Continuation of pull request #738, rebased without conflicts. It also aims to fix the table/figure identifiers, as discussed in #738 (comment).

de-code and others added 2 commits April 14, 2021 14:53
@coveralls

coveralls commented Dec 26, 2024


Coverage Status

coverage: 40.394% (+0.1%) from 40.266%
when pulling a90e5b8 on elifesciences-back-section-figure-tables-upstream
into 960a8d5 on master.

@lfoppiano lfoppiano changed the title Elifesciences back section figure tables upstream Process figures,tables and equations from back/annex section Dec 26, 2024
…pstream

# Conflicts:
#	grobid-core/src/main/java/org/grobid/core/engines/FullTextParser.java
…pstream

# Conflicts:
#	grobid-core/src/main/java/org/grobid/core/data/Figure.java
#	grobid-core/src/main/java/org/grobid/core/document/TEIFormatter.java
#	grobid-core/src/main/java/org/grobid/core/engines/FullTextParser.java
@lfoppiano lfoppiano added this to the 0.9.0 milestone Aug 1, 2025
@lfoppiano lfoppiano marked this pull request as ready for review September 2, 2025 03:04
@lfoppiano lfoppiano requested a review from Copilot September 2, 2025 03:05

This comment was marked as outdated.

lfoppiano and others added 2 commits September 2, 2025 04:30
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@lfoppiano lfoppiano requested a review from Copilot September 2, 2025 03:52

Copilot AI left a comment

Contributor

Pull Request Overview

This PR extends the processing of figures, tables, and equations to the back/annex section of documents; previously they were processed only in the main body text. The changes also fix table/figure identifier numbering by keeping IDs sequential across document sections.

  • Adds figure, table, and equation processing to the annex section using existing processing methods
  • Refactors code to use consistent variable naming and extract reusable methods
  • Implements sequential ID assignment for figures and tables across body and annex sections
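
The sequential ID assignment described above can be illustrated with a minimal sketch. This is not the GROBID implementation: `LabelledItem`, `SequentialIdSketch`, `assignIds`, and the `fig_` prefix are hypothetical names chosen for illustration. The only idea taken from the PR is that the annex numbering starts where the body numbering stopped.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a figure or table; not GROBID's Figure class. */
class LabelledItem {
    final String id;
    final String caption;

    LabelledItem(String id, String caption) {
        this.id = id;
        this.caption = caption;
    }
}

class SequentialIdSketch {
    /** Assigns ids prefix+startId, prefix+(startId+1), ... in caption order. */
    static List<LabelledItem> assignIds(String prefix, List<String> captions, int startId) {
        List<LabelledItem> items = new ArrayList<>();
        for (int i = 0; i < captions.size(); i++) {
            items.add(new LabelledItem(prefix + (startId + i), captions.get(i)));
        }
        return items;
    }

    public static void main(String[] args) {
        // Body items start from 0 (assumed default).
        List<LabelledItem> body = assignIds("fig_", List.of("Body A", "Body B"), 0);
        // The annex continues from the body count, so ids never collide.
        List<LabelledItem> annex = assignIds("fig_", List.of("Annex A"), body.size());

        // prints fig_0, fig_1, fig_2 (one per line)
        for (LabelledItem item : body) System.out.println(item.id);
        for (LabelledItem item : annex) System.out.println(item.id);
    }
}
```

Passing the start offset explicitly keeps the numbering logic stateless, which is one plausible reason to thread a start ID through the processing methods rather than hold a shared counter.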

Reviewed Changes

Copilot reviewed 3 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| GrobidRestProcessFiles.java | Removes unused imports |
| FullTextParser.java | Main processing logic changes to handle annex figures/tables/equations and improve code organization |
| TEIFormatter.java | Updates toTEIAnnex method signature to accept figures, tables, and equations parameters |

@@ -2257,158 +2319,163 @@ private static boolean testClosingTag(StringBuilder buffer,
     * Process figures identified by the full text model
     */
    protected List<Figure> processFigures(String rese, List<LayoutToken> layoutTokens) {
        return processFigures(rese, layoutTokens,0);

Copilot AI Sep 2, 2025


Magic number 0 should be replaced with a named constant for the default start figure ID.
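
One way to act on this suggestion is sketched below, under stated assumptions: `FigureIdDefaults`, `DEFAULT_START_FIGURE_ID`, and `figureIds` are hypothetical names, not identifiers from the PR or from GROBID. The sketch only shows the pattern of a convenience overload delegating to the full overload through a named constant instead of a literal `0`.

```java
import java.util.ArrayList;
import java.util.List;

class FigureIdDefaults {
    /** Hypothetical constant name; the PR does not define it. */
    static final int DEFAULT_START_FIGURE_ID = 0;

    /** Convenience overload: numbering starts at the named default. */
    static List<String> figureIds(int count) {
        return figureIds(count, DEFAULT_START_FIGURE_ID);
    }

    /** Full overload: the caller chooses where numbering starts. */
    static List<String> figureIds(int count, int startId) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add("fig_" + (startId + i)); // "fig_" prefix is an assumption
        }
        return ids;
    }
}
```

The same pattern would apply to a table start ID with its own constant.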

        String rese,
        List<LayoutToken> tokenizations,
        Document doc) {
        return processTables(rese, tokenizations, doc, 0);

Copilot AI Sep 2, 2025


Magic number 0 should be replaced with a named constant for the default start table ID.

@lfoppiano lfoppiano merged commit 1cd4ca3 into master Sep 5, 2025
13 checks passed
@lfoppiano lfoppiano deleted the elifesciences-back-section-figure-tables-upstream branch September 5, 2025 16:01