processHeaderDocument returns BibTeX by default instead of TEI #1093

@michamos

Description

Hi, I noticed that, at least since v0.7.3, GROBID returns BibTeX by default for /api/processHeaderDocument. This contradicts the documentation at https://grobid.readthedocs.io/en/latest/Grobid-service/#apiprocessheaderdocument, which states that the default is TEI XML and that BibTeX requires an explicit Accept: application/x-bibtex header.

Note that it's possible to get an XML response by using Accept: application/xml.
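For anyone hitting this, a minimal sketch of the workaround as a shell helper. The function name grobid_header and the GROBID_URL variable are made up for illustration; the endpoint and both Accept values are the ones mentioned in this report. It assumes curl is installed and that the server honours the Accept header as described above.

```shell
# Base URL of the GROBID service (hypothetical variable; defaults to the demo used in this report).
GROBID_URL="${GROBID_URL:-https://kermitt2-grobid.hf.space}"

# Send a PDF to processHeaderDocument with an explicit Accept header.
# Usage: grobid_header <pdf> [accept]
grobid_header() {
  local pdf="$1"
  local accept="${2:-application/xml}"   # ask for TEI XML unless told otherwise
  curl --silent --show-error \
    --header "Accept: ${accept}" \
    --form "input=@${pdf}" \
    "${GROBID_URL}/api/processHeaderDocument"
}

# Example calls (commented out so that sourcing this file makes no network request):
# grobid_header Downloads/2212.12604v1.pdf                        # explicit XML
# grobid_header Downloads/2212.12604v1.pdf application/x-bibtex   # explicit BibTeX
```

Setting the header on every call keeps the result independent of whatever the server's default happens to be in a given version.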

Steps to reproduce

  1. Get a PDF (I used https://arxiv.org/pdf/2212.12604v1.pdf but anything will do)
  2. Make a request against the GROBID API. I used the HuggingFace demo API:
    curl https://kermitt2-grobid.hf.space/api/processHeaderDocument --form input=@Downloads/2212.12604v1.pdf
  3. See that the output contains BibTeX and not TEI XML:
@misc{-1,
  author = {},
  title = {Search for new physics in the τ lepton plus missing transverse momentum final state in proton-proton collisions at √ s = 13 TeV The CMS Collaboration},
  date = {2022-12-23},
  year = {2022},
  month = {12},
  day = {23},
  eprint = {arXiv:2212.12604v1[hep-ex]},
  abstract = {A search for physics beyond the standard model (SM) in the final state with a hadronically decaying tau lepton and a neutrino is presented. This analysis is based on data recorded by the CMS experiment from proton-proton collisions at a center-ofmass energy of 13 TeV at the LHC, corresponding to a total integrated luminosity of 138 fb-1. The transverse mass spectrum is analyzed for the presence of new physics. No significant deviation from the SM prediction is observed. Limits are set on the production cross section of a W boson decaying into a tau lepton and a neutrino. Lower limits are set on the mass of the sequential SM-like heavy charged vector boson and the mass of a quantum black hole. Upper limits are placed on the couplings of a new boson to the SM fermions. Constraints are put on a nonuniversal gauge interaction model and an effective field theory model. For the first time, upper limits on the cross section of t-channel leptoquark (LQ) exchange are presented. These limits are translated into exclusion limits on the LQ mass and on its coupling in the t-channel. The sensitivity of this analysis extends into the parameter space of LQ models that attempt to explain the anomalies observed in B meson decays. The limits presented for the various interpretations are the most stringent to date. Additionally, a model-independent limit is provided.}
}
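To make step 3 checkable without reading the output by eye, here is a rough sketch that classifies a response body by its first non-blank line. The helper name detect_format is hypothetical, and the heuristic is an assumption: BibTeX entries start with "@" (as in the output above) and XML starts with "<". It does not validate either format.

```shell
# Classify a response body read from stdin as "xml", "bibtex", or "unknown".
detect_format() {
  local first
  # Take the first non-blank line and strip leading whitespace.
  first="$(grep -m 1 -v '^[[:space:]]*$' | sed 's/^[[:space:]]*//')"
  case "$first" in
    '<'*) echo "xml" ;;
    '@'*) echo "bibtex" ;;
    *)    echo "unknown" ;;
  esac
}

# Example (commented out so that sourcing this file makes no network request):
# curl --silent https://kermitt2-grobid.hf.space/api/processHeaderDocument \
#   --form input=@Downloads/2212.12604v1.pdf | detect_format
```

With the behaviour reported here, the commented example would be expected to print bibtex, whereas the documentation implies xml.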

Requested info

Linux amd64, via the lfoppiano/grobid:0.7.3 Docker image, and whatever the HuggingFace demo is running.

  • What is your Java version (java --version)?

openjdk 17.0.2 2022-01-18
OpenJDK Runtime Environment (build 17.0.2+8-86)
OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing)

  • In case of build or run errors, please submit the error while running gradlew with --stacktrace and --info for better log traces (e.g. ./gradlew run --stacktrace --info) or attach the log file logs/grobid-service.log.

Labels: bug, need help
