Documentation
Overview ¶
Package goarxiv provides an idiomatic Go SDK for the arXiv API.
Terms of Use and Attribution ¶
All usage of this package must comply with the arXiv API Terms of Use: https://arxiv.org/help/api/tou. When redistributing work derived from arXiv metadata, include the attribution "Thank you to arXiv for use of its open access interoperability." Article PDFs/source files retain their original licenses; check individual articles for redistribution policies.
Rate Limiting ¶
The arXiv API Terms of Use require a delay of at least 3 seconds between consecutive requests, and `Client` enforces this automatically. For local testing against mock servers you may opt into `WithDebugMode`, but never use that mode against the real API: doing so violates the ToU.
Caching Guidance ¶
arXiv refreshes metadata once per day, so cache search responses for at least 24 hours to reduce load on the API. The SDK does not include a cache layer, which leaves you free to integrate your preferred caching solution.
Quick Start ¶
ctx := context.Background()
// New applies arXiv's 3-second rate limit by default.
client, err := goarxiv.New()
if err != nil {
	log.Fatal(err)
}
results, err := client.Search(ctx, "all:quantum", nil)
if err != nil {
	log.Fatal(err)
}
for _, article := range results.Articles {
	fmt.Printf("%s: %s\n", article.ID, article.Title)
}
Pagination ¶
Use `Client.SearchAll`, `Client.StreamResults`, or `Client.Iterate` to traverse multiple pages. The client enforces arXiv's global limit of 30,000 total results per query.
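The page arithmetic behind automatic pagination can be sketched as follows, assuming the limits from the Constants section (at most 2,000 results per request and 30,000 per query). `planPages` and `page` are hypothetical; the SDK's own paging logic may differ, for instance by also stopping once it reaches the `TotalResults` reported in the first response.

```go
package main

import "fmt"

const (
	maxResultsPerRequest = 2000
	maxResultsTotal      = 30000
)

// page describes one API request: its start offset and max_results value.
type page struct {
	Start, Size int
}

// planPages splits maxTotal results into requests of at most pageSize
// results each, clamping both values to arXiv's limits.
func planPages(maxTotal, pageSize int) []page {
	if maxTotal <= 0 || pageSize <= 0 {
		return nil
	}
	if maxTotal > maxResultsTotal {
		maxTotal = maxResultsTotal
	}
	if pageSize > maxResultsPerRequest {
		pageSize = maxResultsPerRequest
	}
	var pages []page
	for start := 0; start < maxTotal; start += pageSize {
		size := pageSize
		if remaining := maxTotal - start; remaining < size {
			size = remaining // the last page may be short
		}
		pages = append(pages, page{Start: start, Size: size})
	}
	return pages
}

func main() {
	fmt.Println(planPages(4500, 2000))       // [{0 2000} {2000 2000} {4000 500}]
	fmt.Println(len(planPages(50000, 5000))) // 15
}
```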
Downloads & Exports ¶
`DownloadPDF` respects rate limiting and provides optional progress callbacks. `Article` helpers convert metadata to BibTeX, JSON, or CSV for downstream workflows.
Observability ¶
Register hooks via `WithRequestHook`/`WithResponseHook` to integrate logging, metrics, or tracing without forking the SDK.
Index ¶
- Constants
- Variables
- func ArticlesToCSV(articles []*Article) ([]byte, error)
- func GetCategoryField(code string) string
- func GetFieldCategories(field string) []string
- func IsValidArxivID(id string) bool
- func IsValidCategory(code string) bool
- func NormalizeArxivID(id string) (string, error)
- func ParseArxivID(id string) (string, int, error)
- func RequireCategory(code string) error
- func SearchCategories(keyword string) []string
- type Article
- type ArticleResult
- type Author
- type CategoryInfo
- type Client
- func (c *Client) BaseURL() string
- func (c *Client) DownloadPDF(ctx context.Context, article *Article, opts *DownloadOptions) error
- func (c *Client) DownloadPDFs(ctx context.Context, articles []*Article, opts *DownloadOptions) error
- func (c *Client) GetByID(ctx context.Context, id string) (*Article, error)
- func (c *Client) GetByIDs(ctx context.Context, ids []string) ([]*Article, error)
- func (c *Client) IsDebugMode() bool
- func (c *Client) Iterate(query string, pageSize int, opts *SearchOptions) *Iterator
- func (c *Client) Search(ctx context.Context, query string, opts *SearchOptions) (*SearchResults, error)
- func (c *Client) SearchAll(ctx context.Context, query string, maxTotal int, opts *SearchOptions) ([]*Article, error)
- func (c *Client) StreamResults(ctx context.Context, query string, pageSize int, opts *SearchOptions) <-chan ArticleResult
- type DownloadOptions
- type Error
- type FixedWindowLimiter
- type Iterator
- type Link
- type Option
- func WithBaseURL(url string) Option
- func WithDebugMode() Option
- func WithHTTPClient(c *http.Client) Option
- func WithRateLimit(delay time.Duration) Option
- func WithRateLimiter(limiter RateLimiter) Option
- func WithRequestHook(hook RequestHook) Option
- func WithResponseHook(hook ResponseHook) Option
- func WithRetries(maxRetries int) Option
- func WithTimeout(timeout time.Duration) Option
- func WithUserAgent(extra string) Option
- type Query
- func (q *Query) Encode() url.Values
- func (q *Query) IDs(ids ...string) *Query
- func (q *Query) MaxResults(limit int) *Query
- func (q *Query) Search(clause string) *Query
- func (q *Query) Sort(by SortBy, order SortOrder) *Query
- func (q *Query) Start(start int) *Query
- func (q *Query) Where(builder *QueryBuilder) *Query
- type QueryBuilder
- func (b *QueryBuilder) Abstract(text string) *QueryBuilder
- func (b *QueryBuilder) AllFields(text string) *QueryBuilder
- func (b *QueryBuilder) And() *QueryBuilder
- func (b *QueryBuilder) AndNot() *QueryBuilder
- func (b *QueryBuilder) Author(name string) *QueryBuilder
- func (b *QueryBuilder) Build() string
- func (b *QueryBuilder) Category(code string) *QueryBuilder
- func (b *QueryBuilder) Comment(text string) *QueryBuilder
- func (b *QueryBuilder) HasClauses() bool
- func (b *QueryBuilder) JournalRef(ref string) *QueryBuilder
- func (b *QueryBuilder) Or() *QueryBuilder
- func (b *QueryBuilder) Raw(clause string) *QueryBuilder
- func (b *QueryBuilder) ReportNumber(rn string) *QueryBuilder
- func (b *QueryBuilder) SubmittedAfter(t time.Time) *QueryBuilder
- func (b *QueryBuilder) SubmittedBefore(t time.Time) *QueryBuilder
- func (b *QueryBuilder) SubmittedBetween(start, end time.Time) *QueryBuilder
- func (b *QueryBuilder) Title(text string) *QueryBuilder
- func (b *QueryBuilder) TitleExact(text string) *QueryBuilder
- func (b *QueryBuilder) Validate() error
- type RateLimiter
- type RequestHook
- type ResponseHook
- type SearchOptions
- type SearchResults
- type SortBy
- type SortOrder
Constants ¶
const (
	MaxResultsPerRequest = 2000
	MaxResultsTotal      = 30000
)
const MinRateLimit = 3 * time.Second
MinRateLimit is the minimum delay between requests required by the arXiv API Terms of Use.
Variables ¶
var (
	// Categories maps a category code to its metadata.
	Categories map[string]CategoryInfo
	// FieldCategories lists category codes grouped by high-level field.
	FieldCategories map[string][]string
)
var (
	// ErrInvalidID indicates the provided identifier failed validation.
	ErrInvalidID = errors.New("arxiv: invalid ID format")
	// ErrRateLimit indicates the API's rate limit has been exceeded.
	ErrRateLimit = errors.New("arxiv: rate limit exceeded")
	// ErrMaxResults indicates a request exceeded the maximum total items allowed.
	ErrMaxResults = errors.New("arxiv: max_results cannot exceed 30000")
	// ErrNetworkTimeout captures transport timeouts when reaching arXiv.
	ErrNetworkTimeout = errors.New("arxiv: network timeout")
	// ErrNotImplemented marks APIs that are still under construction.
	ErrNotImplemented = errors.New("goarxiv: not implemented")
)
Functions ¶
func ArticlesToCSV ¶
func ArticlesToCSV(articles []*Article) ([]byte, error)
ArticlesToCSV converts the provided article slice into CSV bytes.
func GetCategoryField ¶
func GetCategoryField(code string) string
GetCategoryField returns the high-level field for a category code.
func GetFieldCategories ¶
func GetFieldCategories(field string) []string
GetFieldCategories returns a copy of the category codes for the specified field; the field name is matched case-insensitively.
func IsValidArxivID ¶
func IsValidArxivID(id string) bool
IsValidArxivID validates both legacy and modern arXiv identifiers.
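The two identifier schemes can be checked with regular expressions. This sketch assumes the modern form `YYMM.NNNN` or `YYMM.NNNNN` and the legacy form `archive[.SUBJECT]/YYMMNNN`, each with an optional `vN` version suffix; `isValidArxivID` is hypothetical, and the SDK's exact rules (for example, whether it accepts an `arXiv:` prefix) may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Modern IDs: YYMM, a dot, then 4 digits (2007-2014) or 5 digits (2015 onward).
	modernID = regexp.MustCompile(`^\d{4}\.\d{4,5}(v\d+)?$`)
	// Legacy IDs: archive name, optional subject class, a slash, then YYMMNNN.
	legacyID = regexp.MustCompile(`^[a-z]+(-[a-z]+)*(\.[A-Z]{2})?/\d{7}(v\d+)?$`)
)

func isValidArxivID(id string) bool {
	return modernID.MatchString(id) || legacyID.MatchString(id)
}

func main() {
	for _, id := range []string{"2101.00001v2", "0704.0001", "hep-th/9901001", "math.GT/0309136", "not-an-id"} {
		fmt.Println(id, isValidArxivID(id))
	}
}
```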
func IsValidCategory ¶
func IsValidCategory(code string) bool
IsValidCategory verifies that the code exists in the taxonomy.
func NormalizeArxivID ¶
func NormalizeArxivID(id string) (string, error)
NormalizeArxivID trims whitespace and validates the identifier format.
func ParseArxivID ¶
func ParseArxivID(id string) (string, int, error)
ParseArxivID splits the identifier into base ID and version number.
func RequireCategory ¶
func RequireCategory(code string) error
RequireCategory returns an error if the code does not exist in the taxonomy.
func SearchCategories ¶
func SearchCategories(keyword string) []string
SearchCategories performs a case-insensitive search across code, name, and description.
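A case-insensitive search of this kind reduces to lowercasing both sides and testing each field with `strings.Contains`. This sketch uses a hypothetical three-entry table and sorts the matching codes so results are deterministic; the real `Categories` map is much larger, and the SDK's result ordering is not documented.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type categoryInfo struct {
	Code, Name, Description, Field string
}

// A tiny illustrative subset of the arXiv taxonomy.
var categories = map[string]categoryInfo{
	"cs.AI":   {Code: "cs.AI", Name: "Artificial Intelligence", Field: "Computer Science"},
	"cs.LG":   {Code: "cs.LG", Name: "Machine Learning", Field: "Computer Science"},
	"math.GT": {Code: "math.GT", Name: "Geometric Topology", Field: "Mathematics"},
}

// searchCategories returns the sorted codes whose code, name, or
// description contains keyword, ignoring case.
func searchCategories(keyword string) []string {
	kw := strings.ToLower(strings.TrimSpace(keyword))
	var codes []string
	for code, info := range categories {
		if strings.Contains(strings.ToLower(info.Code), kw) ||
			strings.Contains(strings.ToLower(info.Name), kw) ||
			strings.Contains(strings.ToLower(info.Description), kw) {
			codes = append(codes, code)
		}
	}
	sort.Strings(codes) // map iteration order is random
	return codes
}

func main() {
	fmt.Println(searchCategories("LEARNING")) // [cs.LG]
	fmt.Println(searchCategories("cs."))      // [cs.AI cs.LG]
}
```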
Types ¶
type Article ¶
type Article struct {
	ID              string
	Title           string
	Summary         string
	Authors         []Author
	Published       time.Time
	Updated         time.Time
	PrimaryCategory string
	Categories      []string
	Links           []Link
	Comment         *string
	JournalRef      *string
	DOI             *string
}
Article represents a single entry returned by the arXiv API.
func (Article) AbstractURL ¶
AbstractURL returns the abstract URL for the article.
type ArticleResult ¶
ArticleResult is the element type of the channel returned by Client.StreamResults.
type CategoryInfo ¶
type CategoryInfo struct {
	Code        string `json:"code"`
	Name        string `json:"name"`
	Description string `json:"description"`
	Field       string `json:"field"`
}
CategoryInfo describes an arXiv subject classification entry.
func GetCategoryInfo ¶
func GetCategoryInfo(code string) (*CategoryInfo, error)
GetCategoryInfo returns metadata for the provided code.
func ListCategories ¶
func ListCategories() []CategoryInfo
ListCategories returns a copy of all category infos sorted by code.
type Client ¶
type Client struct {
	// contains filtered or unexported fields
}
Client exposes high-level methods for interacting with the arXiv API.
func (*Client) DownloadPDF ¶
func (c *Client) DownloadPDF(ctx context.Context, article *Article, opts *DownloadOptions) error
DownloadPDF downloads the PDF for a single article.
func (*Client) DownloadPDFs ¶
func (c *Client) DownloadPDFs(ctx context.Context, articles []*Article, opts *DownloadOptions) error
DownloadPDFs downloads PDFs for multiple articles sequentially.
func (*Client) IsDebugMode ¶
func (c *Client) IsDebugMode() bool
IsDebugMode reports whether the client is bypassing rate limiting safeguards.
func (*Client) Iterate ¶
func (c *Client) Iterate(query string, pageSize int, opts *SearchOptions) *Iterator
Iterate constructs an Iterator that pages through results, up to arXiv's limit of 30,000 articles per query (MaxResultsTotal).
func (*Client) Search ¶
func (c *Client) Search(ctx context.Context, query string, opts *SearchOptions) (*SearchResults, error)
Search executes a query with the provided options and returns a single page.
func (*Client) SearchAll ¶
func (c *Client) SearchAll(ctx context.Context, query string, maxTotal int, opts *SearchOptions) ([]*Article, error)
SearchAll returns up to maxTotal results for a query by automatically paging through responses.
func (*Client) StreamResults ¶
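func (c *Client) StreamResults(ctx context.Context, query string, pageSize int, opts *SearchOptions) <-chan ArticleResult
StreamResults pages through the results for query and delivers them as ArticleResult values on the returned channel.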
type DownloadOptions ¶
type DownloadOptions struct {
	OutputDir      string
	AllowOverwrite bool
	Progress       func(downloaded, total int64)
}
DownloadOptions configures PDF download behavior.
type Error ¶
Error wraps arXiv-specific metadata around a root cause while remaining compatible with errors.Is/As.
type FixedWindowLimiter ¶
type FixedWindowLimiter struct {
	// contains filtered or unexported fields
}
FixedWindowLimiter enforces a minimum delay between requests.
func NewRateLimiter ¶
func NewRateLimiter(interval time.Duration, debug bool) *FixedWindowLimiter
NewRateLimiter returns a limiter configured with the desired interval and debug mode.
func (*FixedWindowLimiter) IsDebugMode ¶
func (l *FixedWindowLimiter) IsDebugMode() bool
IsDebugMode reports whether the limiter bypasses waiting.
type Iterator ¶
type Iterator struct {
	// contains filtered or unexported fields
}
Iterator streams articles across paginated API responses.
func (*Iterator) TotalResults ¶
TotalResults returns the total number of matches reported by arXiv.
type Option ¶
type Option func(*config) error
Option mutates Client configuration.
func WithBaseURL ¶
func WithBaseURL(url string) Option
WithBaseURL overrides the default arXiv endpoint.
func WithDebugMode ¶
func WithDebugMode() Option
WithDebugMode disables rate limiting safeguards. WARNING: violates arXiv ToU; testing only.
func WithHTTPClient ¶
func WithHTTPClient(c *http.Client) Option
WithHTTPClient injects a custom HTTP client.
func WithRateLimit ¶
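func WithRateLimit(delay time.Duration) Option
WithRateLimit sets the minimum delay between requests. The effective delay is never lower than arXiv's 3-second minimum (MinRateLimit).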
func WithRateLimiter ¶
func WithRateLimiter(limiter RateLimiter) Option
WithRateLimiter supplies a custom rate limiter implementation (advanced usage).
func WithRequestHook ¶
func WithRequestHook(hook RequestHook) Option
WithRequestHook registers a hook invoked before each HTTP request.
func WithResponseHook ¶
func WithResponseHook(hook ResponseHook) Option
WithResponseHook registers a hook invoked after each HTTP response.
func WithRetries ¶
func WithRetries(maxRetries int) Option
WithRetries sets the number of retry attempts for transient errors.
func WithTimeout ¶
func WithTimeout(timeout time.Duration) Option
WithTimeout configures the HTTP client timeout used for requests.
func WithUserAgent ¶
func WithUserAgent(extra string) Option
WithUserAgent appends identifying text to the default SDK User-Agent string.
type Query ¶
type Query struct {
	// contains filtered or unexported fields
}
Query represents a fluent builder for arXiv query parameters.
func (*Query) MaxResults ¶
func (q *Query) MaxResults(limit int) *Query
MaxResults caps the number of results returned.
func (*Query) Where ¶
func (q *Query) Where(builder *QueryBuilder) *Query
Where attaches the result of a QueryBuilder to this query.
type QueryBuilder ¶
type QueryBuilder struct {
	// contains filtered or unexported fields
}
QueryBuilder provides a fluent API for constructing search_query strings with field prefixes.
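Conceptually, each clause pairs an arXiv field prefix (`ti:`, `au:`, `cat:`, and so on) with a term, and the operators join clauses with `AND`, `OR`, or `ANDNOT`. The sketch below is a hypothetical simplification, not the SDK's `QueryBuilder`: `miniBuilder` assumes that multi-word terms are wrapped in double quotes and that adjacent clauses default to `AND`. In this sketch `Build` returns the raw query string and `url.Values` handles percent-encoding.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// miniBuilder (hypothetical) joins prefix:term clauses with boolean operators.
type miniBuilder struct {
	parts   []string
	pending string // operator to place before the next clause
}

func (b *miniBuilder) clause(prefix, term string) *miniBuilder {
	if strings.ContainsAny(term, " \t") {
		term = `"` + term + `"` // quote phrases so they match as a unit
	}
	if len(b.parts) > 0 {
		op := b.pending
		if op == "" {
			op = "AND" // assumption: adjacent clauses are ANDed
		}
		b.parts = append(b.parts, op)
	}
	b.parts = append(b.parts, prefix+":"+term)
	b.pending = ""
	return b
}

func (b *miniBuilder) Title(t string) *miniBuilder    { return b.clause("ti", t) }
func (b *miniBuilder) Author(a string) *miniBuilder   { return b.clause("au", a) }
func (b *miniBuilder) Category(c string) *miniBuilder { return b.clause("cat", c) }
func (b *miniBuilder) Or() *miniBuilder               { b.pending = "OR"; return b }
func (b *miniBuilder) AndNot() *miniBuilder           { b.pending = "ANDNOT"; return b }
func (b *miniBuilder) Build() string                  { return strings.Join(b.parts, " ") }

func main() {
	q := (&miniBuilder{}).Title("quantum error correction").AndNot().Category("cs.CR").Build()
	fmt.Println(q) // ti:"quantum error correction" ANDNOT cat:cs.CR
	// url.Values percent-encodes spaces, quotes, and colons for the request.
	fmt.Println(url.Values{"search_query": {q}}.Encode())
}
```

Grouping with parentheses is omitted here for brevity.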
func NewQueryBuilder ¶
func NewQueryBuilder() *QueryBuilder
NewQueryBuilder constructs a builder with no clauses.
func (*QueryBuilder) Abstract ¶
func (b *QueryBuilder) Abstract(text string) *QueryBuilder
Abstract searches within the abstract field (abs: prefix).
func (*QueryBuilder) AllFields ¶
func (b *QueryBuilder) AllFields(text string) *QueryBuilder
AllFields searches across all searchable fields at once (all: prefix).
func (*QueryBuilder) And ¶
func (b *QueryBuilder) And() *QueryBuilder
And inserts a logical AND between clauses.
func (*QueryBuilder) AndNot ¶
func (b *QueryBuilder) AndNot() *QueryBuilder
AndNot inserts a logical AND NOT between clauses.
func (*QueryBuilder) Author ¶
func (b *QueryBuilder) Author(name string) *QueryBuilder
Author searches within the author field (au: prefix).
func (*QueryBuilder) Build ¶
func (b *QueryBuilder) Build() string
Build returns the final query string (already URL-safe).
func (*QueryBuilder) Category ¶
func (b *QueryBuilder) Category(code string) *QueryBuilder
Category filters results by arXiv subject category (cat: prefix).
func (*QueryBuilder) Comment ¶
func (b *QueryBuilder) Comment(text string) *QueryBuilder
Comment searches within the comments field (co: prefix).
func (*QueryBuilder) HasClauses ¶
func (b *QueryBuilder) HasClauses() bool
HasClauses reports whether the builder contains any query clauses.
func (*QueryBuilder) JournalRef ¶
func (b *QueryBuilder) JournalRef(ref string) *QueryBuilder
JournalRef searches within the journal reference field (jr: prefix).
func (*QueryBuilder) Or ¶
func (b *QueryBuilder) Or() *QueryBuilder
Or inserts a logical OR between clauses.
func (*QueryBuilder) Raw ¶
func (b *QueryBuilder) Raw(clause string) *QueryBuilder
Raw appends a literal clause, useful for advanced filters.
func (*QueryBuilder) ReportNumber ¶
func (b *QueryBuilder) ReportNumber(rn string) *QueryBuilder
ReportNumber searches within the report number field (rn: prefix).
func (*QueryBuilder) SubmittedAfter ¶
func (b *QueryBuilder) SubmittedAfter(t time.Time) *QueryBuilder
SubmittedAfter restricts results to articles submitted after t.
func (*QueryBuilder) SubmittedBefore ¶
func (b *QueryBuilder) SubmittedBefore(t time.Time) *QueryBuilder
SubmittedBefore restricts results to articles submitted before t.
func (*QueryBuilder) SubmittedBetween ¶
func (b *QueryBuilder) SubmittedBetween(start, end time.Time) *QueryBuilder
SubmittedBetween restricts results to articles submitted between start and end.
func (*QueryBuilder) Title ¶
func (b *QueryBuilder) Title(text string) *QueryBuilder
Title searches within the title field (ti: prefix).
func (*QueryBuilder) TitleExact ¶
func (b *QueryBuilder) TitleExact(text string) *QueryBuilder
TitleExact searches for an exact phrase within the title.
func (*QueryBuilder) Validate ¶
func (b *QueryBuilder) Validate() error
Validate checks for empty queries, invalid categories, and malformed date ranges.
type RateLimiter ¶
RateLimiter gates outbound requests to honor API constraints.
type RequestHook ¶
RequestHook runs before an HTTP request is sent.
type ResponseHook ¶
ResponseHook runs after an HTTP response is received.
type SearchOptions ¶
SearchOptions controls pagination and sorting for arXiv queries.
type SearchResults ¶
type SearchResults struct {
	Articles     []Article
	TotalResults int
	StartIndex   int
	ItemsPerPage int
	Query        string
}
SearchResults wraps paginated metadata returned from the API.
func ParseSearchResults ¶
func ParseSearchResults(r io.Reader) (SearchResults, error)
ParseSearchResults converts Atom XML into typed SearchResults structures.