arc-library

A CLI tool to organize, tag, and annotate your research library. Works with documents of any type: research papers, books, articles, videos, notes, and more. Designed for personal knowledge management, research, and learning.

Features

Multiple document types: papers, books, articles, videos, notes, code repos
Flexible import: from directories (with meta.yaml), direct PDF import, DOI resolution
Full-text search: SQLite FTS5 for fast text search (SQL backend)
PDF text extraction: optional full-text indexing (requires pdftotext)
Metadata enrichment: auto-fill metadata from Crossref via DOI
Tags & collections: organize documents by topic or project
Annotations: highlights, notes, bookmarks with page/position
Reading sessions: track time spent reading, pages per session
Statistics: overview of your library usage
Multiple storage backends: SQL (default), KV (JSON), or memory (stateless)
CLI-first, composable: integrates with other arc tools

Installation

go install github.com/mtreilly/arc-library@latest

Or build from source:

git clone https://github.com/mtreilly/arc-library.git
cd arc-library
go build -o arc-library .

Quick Start

Import Documents

From a meta directory (created by arc-arxiv)

arc-library import ~/papers/2304.00067
arc-library import ~/papers --tag ml --collection "thesis"

Import a PDF directly

arc-library import paper.pdf --title "My Paper" --authors "Alice, Bob" --tag ml

With full-text extraction and DOI resolution:

arc-library import paper.pdf --extract-text --doi 10.1234/5678 --resolve-doi

You can also import all PDFs in a directory:

arc-library import ~/downloads --extract-text --tag unread

Organize

# Tag documents
arc-library tag add <doc-id> ml nlp attention
arc-library tag remove <doc-id> obsolete

# Create and manage collections
arc-library collection create "project-x" --description "Papers for project X"
arc-library collection add "project-x" <doc-id>
arc-library collection show "project-x"

Search & Discover

# List documents
arc-library list --tag ml --source arxiv

# Full-text search (SQL backend)
arc-library search "transformer attention" --limit 20

# Find documents by metadata
arc-library list --type book

Annotate

# Add an annotation
arc-library annotate add <doc-id> "Important insight" --page 12 --color "#ff0000"

# List annotations for a document
arc-library annotate list <doc-id>

# Delete annotation
arc-library annotate delete <annotation-id>

Track Reading

# Start a reading session
arc-library session start <doc-id>

# End the session (record pages read, notes)
arc-library session end <session-id> --pages 10 --notes "Read intro"

# List sessions
arc-library session list --document <doc-id>
arc-library session list --limit 10

Statistics

arc-library stats

Shows document counts by type, the number of unique tags, and totals for collections, annotations, reading sessions, and pages read.

Document Types

paper: arXiv, conference, journal articles (default)
book: textbooks, monographs
article: web articles, blog posts
video: lecture videos, tutorials
note: user-created notes (Markdown, text)
repo: git repositories
other: anything else

Specify with --type flag when importing.

PDF Import Options

--extract-text: extract full text using pdftotext (poppler-utils). Enables full-text search.
--doi <doi>: assign a DOI to the document (e.g., 10.1234/5678)
--resolve-doi: fetch metadata from Crossref (requires --doi)
--title, --authors, --abstract: set metadata manually (if omitted, the filename is used)
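
To see roughly what --extract-text depends on, the sketch below calls pdftotext from a script. It is an illustration only, not arc-library's own code: the function names are hypothetical, and the only thing taken from the text above is that extraction relies on pdftotext from poppler-utils.

```python
import shutil
import subprocess
from typing import List, Optional


def pdftotext_command(pdf_path: str) -> List[str]:
    """Build the pdftotext invocation; the trailing '-' writes text to stdout."""
    return ["pdftotext", pdf_path, "-"]


def extract_text(pdf_path: str) -> Optional[str]:
    """Return the PDF's text, or None when pdftotext is not on PATH.

    Hypothetical helper: it mirrors the optional nature of the dependency
    by degrading to None instead of failing.
    """
    if shutil.which("pdftotext") is None:
        return None
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")
```

If extract_text returns None, install poppler-utils before importing with --extract-text.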

Storage Backends

Control with ARC_LIBRARY_STORAGE environment variable:

sql (default): Relational SQLite schema with FTS5. Best performance for large libraries (>10k docs).
kv: JSON documents in a key-value store. Simpler, portable, good for small libraries (<1k docs).
memory: In-memory only. Useful for quick queries or when persistence is not needed.

The default SQLite file is at ~/.local/share/arc/arc.db.
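
A wrapper script that needs to know which backend is active could resolve it along these lines. This is a sketch under assumptions: the README does not say how arc-library treats empty, mixed-case, or unknown values, so the normalization and the ValueError here are choices made for the example.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

VALID_BACKENDS = ("sql", "kv", "memory")
# Default SQLite location as documented above.
DEFAULT_DB = Path.home() / ".local" / "share" / "arc" / "arc.db"


def resolve_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the backend named by ARC_LIBRARY_STORAGE, defaulting to 'sql'."""
    env = os.environ if env is None else env
    value = env.get("ARC_LIBRARY_STORAGE", "").strip().lower()
    if not value:
        return "sql"  # documented default
    if value not in VALID_BACKENDS:
        # Assumption: reject unknown names rather than silently falling back.
        raise ValueError(f"unknown storage backend: {value!r}")
    return value
```

For example, resolve_backend({"ARC_LIBRARY_STORAGE": "kv"}) returns "kv", and an empty mapping yields "sql".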

Data Model

Documents: core entity, with flexible metadata (type, source, source_id, title, authors, abstract, full_text, tags, notes, rating, status, meta)
Collections: named groups of documents (many-to-many)
Annotations: per-document highlights/notes with position and color
ReadingSessions: start/end timestamps, pages read, notes
Tags: simple string tags, counted across library
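
The entities above can be pictured as the following Python dataclasses. The field names for Document come from the list above; every type, default, and the remaining field names are assumptions made for illustration, and the tool's real schema may differ.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Iterable, List, Optional


@dataclass
class Document:
    id: str
    type: str = "paper"  # paper is the documented default type
    source: str = ""
    source_id: str = ""
    title: str = ""
    authors: List[str] = field(default_factory=list)
    abstract: str = ""
    full_text: str = ""
    tags: List[str] = field(default_factory=list)
    notes: str = ""
    rating: Optional[int] = None
    status: str = ""
    meta: Dict[str, str] = field(default_factory=dict)


@dataclass
class Collection:
    name: str
    description: str = ""
    # Many-to-many: a document id may appear in several collections.
    document_ids: List[str] = field(default_factory=list)


@dataclass
class Annotation:
    id: str
    document_id: str
    text: str
    page: Optional[int] = None
    color: str = ""


@dataclass
class ReadingSession:
    id: str
    document_id: str
    started_at: datetime
    ended_at: Optional[datetime] = None
    pages_read: int = 0
    notes: str = ""


def count_tags(documents: Iterable[Document]) -> Dict[str, int]:
    """Count how often each tag occurs across the library."""
    counts: Counter = Counter()
    for doc in documents:
        counts.update(doc.tags)
    return dict(counts)
```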

Example Workflows

Literature review

# Import all papers from a directory (with meta.yaml)
arc-library import ~/arxiv-papers --tag literature-review

# Search for relevant papers
arc-library search "neural networks" --tag literature-review

# Create a collection for the review
arc-library collection create "lit-review-2025"
arc-library collection add "lit-review-2025" <paper-id>

# Add notes as you read
arc-library annotate add <paper-id> "Key contribution: ..." --page 3

Student learning

# Import lecture PDFs
arc-library import ~/lectures --extract-text --tag course

# Track reading progress
arc-library session start <lecture-id>
# ... read ...
arc-library session end <session-id> --pages 5 --notes "Understood main concepts"

# Generate flashcards (future)
# arc-library flashcard generate --from-annotations

# View stats to see progress
arc-library stats

Research with books

# Import a book (PDF or epub)
arc-library import book.pdf --type book --title "Deep Learning" --authors "Goodfellow et al." --tag reference

# Create a project collection
arc-library collection create "thesis-chapter-2"

# Add relevant chapters or notes as you read

Advanced Usage

Full-text search

After importing PDFs with --extract-text, use the search command to find content anywhere in the full text:

arc-library search "backpropagation" --type paper

This uses SQLite FTS5 for fast, relevance-ranked search across titles, abstracts, notes, and full text.
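
For readers unfamiliar with FTS5, the sketch below shows relevance-ranked search over the same four fields with Python's built-in sqlite3 module. The table and column layout are assumptions for the example, not arc-library's actual schema, and it needs a SQLite build with FTS5 enabled.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_index(docs: Iterable[Tuple[str, str, str, str, str]]) -> sqlite3.Connection:
    """Create an in-memory FTS5 index.

    Each row is (doc_id, title, abstract, notes, full_text); doc_id is stored
    but not tokenized.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE docs USING fts5("
        "doc_id UNINDEXED, title, abstract, notes, full_text)"
    )
    conn.executemany("INSERT INTO docs VALUES (?, ?, ?, ?, ?)", docs)
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 20) -> List[Tuple[str, str]]:
    """Return (doc_id, title) pairs, best match first.

    'ORDER BY rank' uses FTS5's built-in BM25 ranking.
    """
    cursor = conn.execute(
        "SELECT doc_id, title FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cursor.fetchall()
```

In FTS5 query syntax, several bare words such as transformer attention are combined with an implicit AND, so a row must contain both terms to match.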

Duplicate detection

Find potential duplicates using title similarity and source IDs:

arc-library duplicates --threshold 0.75

Pairs with matching DOIs or arXiv IDs are flagged automatically. Raise the threshold to require closer title matches, or lower it to surface more candidates.
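
The idea can be sketched in a few lines of Python. The README does not name the similarity metric arc-library uses, so this example substitutes difflib's SequenceMatcher ratio. The dictionary keys (id, title, source_id) are likewise assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(title.lower().split())


def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the normalized titles are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(docs: List[Dict[str, str]], threshold: float = 0.75) -> List[Tuple[str, str, float]]:
    """Return (id_a, id_b, score) for every pair that looks like a duplicate."""
    pairs = []
    for a, b in combinations(docs, 2):
        # A shared, non-empty source ID (DOI, arXiv ID) always counts as a match.
        same_id = bool(a.get("source_id")) and a.get("source_id") == b.get("source_id")
        score = 1.0 if same_id else title_similarity(a["title"], b["title"])
        if score >= threshold:
            pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs
```

Comparing every pair is quadratic in library size, which is acceptable for a sketch but would need blocking or indexing for very large libraries.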

Crossref DOI resolution

If you have a DOI, you can auto-populate metadata:

arc-library import paper.pdf --doi 10.1234/5678 --resolve-doi

This fetches title, authors, abstract, and publication year.
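
The same lookup can be reproduced against the public Crossref REST API. The sketch below is not the tool's implementation: the function names and the shape of the returned dictionary are assumptions, while the endpoint and the title, author, abstract, and issued fields are part of Crossref's works response.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

CROSSREF_WORKS = "https://api.crossref.org/works/"


def parse_crossref_message(message: Dict[str, Any]) -> Dict[str, Any]:
    """Pick title, authors, abstract, and year out of a Crossref 'message' object."""
    titles = message.get("title") or []
    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in message.get("author", [])
    ]
    # 'issued' holds date-parts like [[2021, 5, 3]]; the year comes first.
    date_parts = (message.get("issued") or {}).get("date-parts") or []
    year = date_parts[0][0] if date_parts and date_parts[0] else None
    return {
        "title": titles[0] if titles else "",
        "authors": authors,
        "abstract": message.get("abstract", ""),
        "year": year,
    }


def fetch_crossref(doi: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Fetch and parse metadata for one DOI (performs a network request)."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_crossref_message(json.load(response)["message"])
```

Many Crossref records have no abstract, so that field is often empty even when the lookup succeeds.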

Flashcards (Spaced Repetition)

Transform your annotations or create new cards for active recall learning:

# Create a basic flashcard
arc-library flashcard add --document <doc-id> --front "What is the capital of France?" --back "Paris" --tag geography

# Create a cloze deletion card
arc-library flashcard add --document <doc-id> --type cloze --cloze "The capital of France is {{c1::Paris}}" --tag geography

# List all due cards
arc-library flashcard due

# Review a card (rate recall 0-5)
arc-library flashcard review <card-id> --quality 4

# List all cards for a document
arc-library flashcard list --document <doc-id>

# Delete a card
arc-library flashcard delete <card-id>

The flashcard system schedules reviews with the SM-2 algorithm, the algorithm that Anki's classic scheduler is based on. After each review, the card's due date is updated automatically from the quality rating you give.
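
For reference, here is a minimal sketch of one SM-2 update step as the algorithm is commonly described. It is not arc-library's code: the Card fields are assumptions, and this variant adjusts the ease factor on every review, which some SM-2 descriptions skip for failed cards.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Card:
    repetitions: int = 0   # consecutive successful reviews
    interval_days: int = 0
    ease: float = 2.5      # SM-2 starting ease factor
    due: Optional[date] = None


def review(card: Card, quality: int, today: date) -> Card:
    """Apply one SM-2 step for a recall rating of 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality >= 3:
        # Successful recall: intervals run 1 day, 6 days, then previous * ease.
        if card.repetitions == 0:
            interval = 1
        elif card.repetitions == 1:
            interval = 6
        else:
            interval = round(card.interval_days * card.ease)
        repetitions = card.repetitions + 1
    else:
        # Failed recall: start the card over.
        repetitions, interval = 0, 1
    # Standard SM-2 ease adjustment, with a floor of 1.3.
    ease = card.ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    ease = max(1.3, ease)
    return Card(repetitions, interval, ease, today + timedelta(days=interval))
```

With a steady rating of 4 the ease factor stays at 2.5, so a new card comes due after 1, 6, and then 15 days.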

AI Analysis

Use the arc-ai daemon, backed by the Pi coding agent, to summarize your documents and answer questions about them:

# Generate a summary of a document
arc-library ai summary <doc-id>

# Optionally store the summary in document metadata
arc-library ai summary <doc-id> --store

# Ask a question about a document
arc-library ai qna <doc-id> "What is the main contribution of this paper?"

# Combine with full-text extraction
arc-library import paper.pdf --extract-text
arc-library ai summary <doc-id>

Make sure arc-ai is running in daemon mode: arc-ai start

Reading goals

Use arc-library stats to see how much you've been reading:

Documents:     142
By type:       paper: 120, book: 15, article: 7
Tags:          23 unique
Collections:   5
Annotations:   87
Reading sessions: 42
Pages read:    1234

Export formats

Export your library data to interchange formats:

# BibTeX for LaTeX/BibLaTeX
arc-library export --format bibtex > library.bib

# Markdown (Obsidian, note apps)
arc-library export --format markdown > library.md

# RIS for Zotero, EndNote, Mendeley
arc-library export --format ris > library.ris

# JSON for custom processing
arc-library export --format json > library.json

# Filter exports by tag, collection, source, type
arc-library export --format bibtex --tag "to-read" > toread.bib

The Markdown export includes annotations and can be imported into Obsidian or other PKM tools.
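
As an example of custom processing, the sketch below tallies tags from a JSON export. The export schema is not documented here, so this assumes a top-level JSON array of document objects that each carry a tags list. Adjust the field names to match the real output.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable, List


def load_export(path: str) -> List[Dict[str, Any]]:
    """Load a file written by 'arc-library export --format json'."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        # Assumption check: this sketch only handles a top-level array.
        raise ValueError("expected a JSON array of documents")
    return data


def tag_counts(docs: Iterable[Dict[str, Any]]) -> Counter:
    """Count tag occurrences, skipping documents without a tags field."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(doc.get("tags") or [])
    return counts
```

For instance, tag_counts(load_export("library.json")).most_common(10) lists the ten most used tags.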

Back up your library

With the SQL backend, the whole library lives in a single SQLite file. Copy it to create a backup:

cp ~/.local/share/arc/arc.db ~/backups/arc-$(date +%F).db

Your actual document files remain on the filesystem; the library only stores metadata and indexes.
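
A plain cp can capture a half-written file if the library is being modified at that moment. One alternative is SQLite's online backup API, sketched here with Python's standard sqlite3 module. The function name and the arc-YYYY-MM-DD.db naming are choices made for this example, mirroring the cp command above.

```python
import sqlite3
from datetime import date
from pathlib import Path
from typing import Optional, Union


def backup_database(src: Union[str, Path], dest_dir: Union[str, Path],
                    today: Optional[date] = None) -> Path:
    """Copy a SQLite database to dest_dir/arc-YYYY-MM-DD.db and return that path."""
    src = Path(src)
    if not src.is_file():
        # sqlite3.connect would silently create an empty database otherwise.
        raise FileNotFoundError(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"arc-{(today or date.today()).isoformat()}.db"
    source = sqlite3.connect(str(src))
    target = sqlite3.connect(str(dest))
    try:
        source.backup(target)  # consistent snapshot even during writes
    finally:
        target.close()
        source.close()
    return dest
```

Example: backup_database(Path.home() / ".local/share/arc/arc.db", Path.home() / "backups").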

Related Tools

arc-arxiv - Fetch papers from arXiv with meta.yaml
arc-ai - AI-powered summarization and Q&A via the Pi coding agent
arc-db - Database tools for arc libraries

Design Principles

Stateless modules: Storage is optional; can run entirely in-memory
CLI-first: All features accessible from the command line
Composable: Works with standard Unix tools (grep, find, etc.)
Offline-first: No mandatory cloud APIs; your data stays on your machine
Minimal dependencies: Uses system tools (pdftotext) optionally

License

MIT
