CERCA – Citation Extraction & Reference Checking Assistant

CERCA is an open-source research tool that supports verification of bibliographic references in scientific manuscripts. It extracts references from PDF files and checks their existence and consistency against authoritative metadata sources, producing explainable diagnostics, audit logs, and reproducible reports.

Key Features

📄 Flexible Reference Input:
- Drag-and-Drop: Parse references automatically from PDF files.
- Manual Entry: Paste reference lists directly for quick checks.
🔍 Reference verification using Crossref, OpenAlex, Zenodo, and Semantic Scholar metadata
📊 Match scores based on title, authors, and DOI similarity
Interactive Dashboard:
- View real-time Pass/Fail statistics and verification rates.
- Color-coded status badges for quick visual assessment.
📁 Export Data: Save verification reports for further analysis.
- 🧾 CSV export for analysis
- 🧾 Diagnosis report (TXT)
🪵 Audit log for transparency and reproducibility
🔎 Right-click search (Google / Google Scholar) for manual inspection
🔒 Local privacy by design — PDFs never leave your machine

📦 How to Run

Windows

Download Cerca_windows.zip.
Unzip the file.
Double-click Cerca-1.0-alpha.jar.

If Windows shows a security warning, choose More info → Run anyway.

macOS

Download Cerca_mac.zip.
Unzip it.
Right-click Cerca-1.0-alpha.jar and select Open.
- Note: Since this is an unverified alpha app, you may need to go to System Settings > Privacy & Security to allow it to run.

Linux

Download Cerca_linux.zip.
Unzip it.
Open a terminal in that folder and run:
```
java -jar Cerca-1.0-alpha.jar
```

🛠 Requirements

CERCA is a Java desktop application built with JavaFX.

To run CERCA, you need:

Java 17 or newer
A Java Runtime Environment that includes JavaFX
I recommend installing Azul Zulu JRE with JavaFX
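
To check whether a Java installation meets these requirements, a small probe along the following lines may help. It is an illustrative sketch and not part of CERCA: the class name `EnvCheck` and its methods are hypothetical. The JavaFX check only looks for the standard `javafx.application.Application` class, so it may report `false` when JavaFX is supplied separately on a module path.

```java
// EnvCheck.java: hypothetical helper, not shipped with CERCA.
final class EnvCheck {

    /** True if the running JVM's feature version (e.g. 17, 21) is at least the required one. */
    static boolean javaAtLeast(int requiredFeature) {
        return Runtime.version().feature() >= requiredFeature;
    }

    /** True if the named class can be located (without initializing it). */
    static boolean classAvailable(String className) {
        try {
            Class.forName(className, false, EnvCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Java version:     " + Runtime.version());
        System.out.println("Java 17 or newer: " + javaAtLeast(17));
        System.out.println("JavaFX found:     "
                + classAvailable("javafx.application.Application"));
    }
}
```

Save it as `EnvCheck.java` and run `java EnvCheck.java` with the same Java installation you plan to use for CERCA.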

🔒 Privacy & Local Processing

CERCA is designed with researcher privacy in mind.

All PDF parsing and reference extraction are performed locally
Manuscripts are never uploaded, stored, or shared
Only reference metadata (e.g., DOI, title, authors) is sent to external services for lookups

How It Works

A PDF file is parsed locally to extract bibliographic references
Each reference is queried against:
- Crossref
- Zenodo
- OpenAlex
- Semantic Scholar
Metadata fields (title, authors, DOI) are compared
CERCA assigns:
- A match score
- A status (PASS / CHECK / FAIL)
- A short diagnostic explanation
Results can be saved as:
- TXT report (diagnosis)
- CSV table
Audit logs
- Logs are saved for transparency and reproducibility
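
To illustrate the lookup step, the sketch below builds a query against the public Crossref REST API using its `query.bibliographic` parameter. This is an assumption about how such a lookup could look, not CERCA's actual code: the class `CrossrefQuery` and its methods are hypothetical. Note that only the reference text goes into the request, in line with the metadata-only lookups described above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a metadata lookup; not taken from the CERCA source.
final class CrossrefQuery {

    /** Builds a Crossref works query for a free-text reference string. */
    static URI forReference(String referenceText, int rows) {
        String encoded = URLEncoder.encode(referenceText.trim(), StandardCharsets.UTF_8);
        return URI.create("https://api.crossref.org/works?query.bibliographic="
                + encoded + "&rows=" + rows);
    }

    /** Sends the query and returns the raw JSON response body. */
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        // Prints the query URL only; call fetch(uri) to perform the request.
        URI uri = forReference(
                "CERMINE: automatic extraction of structured metadata from scientific literature",
                1);
        System.out.println(uri);
    }
}
```

The returned JSON would still need to be parsed and its title, author, and DOI fields compared with the extracted reference.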

Status Definitions

PASS – Strong metadata agreement with authoritative sources
CHECK – Partial or ambiguous match; manual inspection recommended
FAIL – No reliable metadata match found at time of verification
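
As a rough illustration of how a match score could map to these statuses, the sketch below combines title, author, and DOI agreement into a single score and applies two cut-offs. Everything specific here is an assumption made for the example: the token-based Jaccard similarity, the 0.5 / 0.3 / 0.2 weights, and the 0.8 / 0.5 thresholds are hypothetical and are not CERCA's actual heuristics.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical scoring sketch; weights and thresholds are illustrative only.
final class MatchSketch {

    enum Status { PASS, CHECK, FAIL }

    /** Lower-cases the text, drops punctuation, and returns the set of word tokens. */
    static Set<String> tokens(String text) {
        String cleaned = text.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}]+", " ")
                .trim();
        Set<String> out = new HashSet<>();
        if (!cleaned.isEmpty()) {
            out.addAll(Arrays.asList(cleaned.split(" ")));
        }
        return out;
    }

    /** Jaccard similarity in [0, 1]; two empty sets count as no evidence (0). */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    /** Weighted combination of title, author, and DOI agreement (assumed weights). */
    static double score(String refTitle, String refAuthors, String refDoi,
                        String metaTitle, String metaAuthors, String metaDoi) {
        double title = jaccard(tokens(refTitle), tokens(metaTitle));
        double authors = jaccard(tokens(refAuthors), tokens(metaAuthors));
        double doi = (!refDoi.isBlank() && refDoi.equalsIgnoreCase(metaDoi)) ? 1.0 : 0.0;
        return 0.5 * title + 0.3 * authors + 0.2 * doi;
    }

    /** Maps a score to a status using assumed cut-offs. */
    static Status classify(double score) {
        if (score >= 0.8) {
            return Status.PASS;
        }
        if (score >= 0.5) {
            return Status.CHECK;
        }
        return Status.FAIL;
    }
}
```

With these assumed settings, a reference whose title, authors, and DOI all agree with the retrieved metadata is classified as PASS, while one with no overlap at all is classified as FAIL.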

CERCA is an experimental tool. It does not replace manual verification.

Outputs

CERCA generates the following artifacts:

TXT report – Summary and per-reference diagnostics
CSV file – Structured results for analysis or editorial review
Audit log – Timestamped record of verification steps
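
For readers who want to post-process or reproduce the CSV output, the sketch below shows one way a results row could be serialized with RFC 4180 style quoting. It is illustrative only: the class `CsvSketch` and the example columns (index, status, score, title) are hypothetical, and the actual column layout of CERCA's CSV file is not specified here.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical CSV helper; the real CERCA column layout may differ.
final class CsvSketch {

    /** Quotes a field if it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        // Embedded double quotes are doubled, then the whole field is wrapped in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins already-stringified fields into one CSV line (without a line terminator). */
    static String row(List<String> fields) {
        return fields.stream().map(CsvSketch::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Example columns (assumed): index, status, score, title.
        System.out.println(row(List.of("1", "CHECK", "0.62", "A title, with a comma")));
    }
}
```

Quoting matters here because reference titles frequently contain commas, which would otherwise shift columns when the file is opened in a spreadsheet.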

These outputs support reproducibility, transparency, and review documentation.

Intended Use

CERCA is intended for:

Researchers performing final manuscript checks
Reviewers assessing reference consistency
Editors supporting editorial quality control
Meta-research and reproducibility workflows

Limitations

Verification depends on availability and correctness of external metadata
Some valid references (e.g., books, technical reports, older works) may not be indexed
Match scores are heuristic and intended to support human analysis

License

This project is licensed under the
GNU Affero General Public License, Version 3.0 (AGPL-3.0).

See the LICENSE.txt file for details.

Third-Party Credits

This software uses the CERMINE library, licensed under GNU AGPL v3.

Dominika Tkaczyk, Paweł Szostek, Mateusz Fedoryszak,
Piotr Jan Dendek, Łukasz Bolikowski

CERMINE: automatic extraction of structured metadata from scientific literature.
International Journal on Document Analysis and Recognition, 2015,
Vol. 18, No. 4, pp. 317–335, DOI: 10.1007/s10032-015-0249-8

💻 Contributing

CERCA is an open-source initiative, and contributions are welcome.

How You Can Help

🐛 Report Bugs: If you encounter a bug, please open an issue. Describe what happened and what you expected to happen so that it can be reproduced easily.
💡 Suggest Features: Have an idea to improve the tool? Open an issue to start a discussion!
🔧 Development: We gladly welcome Pull Requests (PRs) for new features and bug fixes.

Citation

If you use CERCA in your research, please cite it as research software.

Author

Lidiany Cerqueira, PhD
Computer Science Researcher

Acknowledgments

CERCA was developed to support rigorous, transparent, and responsible research practices.
