Documentation
Index ¶
- Constants
- Variables
- func DebugResultWriter(result *Result, _ *Options) error
- func DefaultResultWriter(result *Result, opts *Options) error
- func IsValidService(name string) bool
- type Doer
- type Grobid
- func (g *Grobid) Ping() error
- func (g *Grobid) Pingmoji() string
- func (g *Grobid) ProcessDirRecursive(dir, service string, numWorkers int, rf ResultFunc, opts *Options) error
- func (g *Grobid) ProcessPDF(filename, service string, opts *Options) (*Result, error)
- func (g *Grobid) ProcessPDFContext(ctx context.Context, filename, service string, opts *Options) (*Result, error)
- func (g *Grobid) ProcessText(filename, service string, opts *Options) (*Result, error)
- type Options
- type Result
- type ResultFunc
Constants ¶
const DefaultExt = "grobid.tei.xml"
DefaultExt is the default file extension for structured metadata outputs.
Variables ¶
var DefaultOptions = &Options{
GenerateIDs: true,
ConsolidateHeader: true,
ConsolidateCitations: true,
IncludeRawCitations: true,
IncludeRawAffiliations: true,
TEICoordinates: []string{"ref", "figure", "persName", "formula", "biblStruct"},
SegmentSentences: true,
Force: false,
Verbose: false,
OutputDir: "",
CreateHashSymlinks: false,
}
DefaultOptions to send to GROBID.
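Because DefaultOptions is a package-level pointer, changing a field through it affects every caller that shares it. The following is a minimal sketch of one way to derive a private copy first. The Options type and defaultOptions variable are redeclared locally so the example compiles on its own, and cloneOptions is a hypothetical helper, not part of grobidclient.

```go
package main

import "fmt"

// Options mirrors the documented grobidclient.Options fields; it is
// redeclared here only so that this sketch is self-contained.
type Options struct {
	GenerateIDs            bool
	ConsolidateHeader      bool
	ConsolidateCitations   bool
	IncludeRawCitations    bool
	IncludeRawAffiliations bool
	TEICoordinates         []string
	SegmentSentences       bool
	Force                  bool
	Verbose                bool
	OutputDir              string
	CreateHashSymlinks     bool
}

// defaultOptions stands in for the package-level DefaultOptions variable.
var defaultOptions = &Options{
	GenerateIDs:            true,
	ConsolidateHeader:      true,
	ConsolidateCitations:   true,
	IncludeRawCitations:    true,
	IncludeRawAffiliations: true,
	TEICoordinates:         []string{"ref", "figure", "persName", "formula", "biblStruct"},
	SegmentSentences:       true,
}

// cloneOptions (hypothetical helper) returns a copy that shares no memory
// with o: the struct is copied by value and the slice is duplicated.
func cloneOptions(o *Options) *Options {
	c := *o
	c.TEICoordinates = append([]string(nil), o.TEICoordinates...)
	return &c
}

func main() {
	opts := cloneOptions(defaultOptions)
	opts.Force = true
	opts.OutputDir = "out"
	opts.TEICoordinates[0] = "head"

	// The shared defaults are untouched by the changes above.
	fmt.Println(defaultOptions.Force, defaultOptions.OutputDir == "", defaultOptions.TEICoordinates[0])
}
```

A plain struct copy alone would still share the TEICoordinates backing array, which is why the slice is duplicated as well.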
var ErrInvalidService = errors.New("invalid service")
ErrInvalidService signals that the service name is not known.
var ValidServices = []string{
"processFulltextDocument",
"processHeaderDocument",
"processReferences",
"processCitationList",
"processCitationPatentST36",
"processCitationPatentPDF",
}
ValidServices, see also: https://grobid.readthedocs.io/en/latest/Grobid-service/#grobid-web-services
var Version = "0.2.5"
Version of grobidclient.
Functions ¶
func DebugResultWriter ¶
DebugResultWriter is a dummy result writer, which only logs the result.
func DefaultResultWriter ¶
DefaultResultWriter is a ResultFunc that writes out a single file with the result. It also handles writing out error results, similar to the Python GROBID client library.
func IsValidService ¶
IsValidService reports whether the service name is valid.
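As an illustration, a check of this kind can be sketched as a membership test over the list of valid services. The code below uses local lowercase stand-ins for ValidServices and ErrInvalidService. The checkService helper and the way it wraps the error are assumptions for this sketch, not the library's confirmed behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidService stands in for grobidclient.ErrInvalidService.
var errInvalidService = errors.New("invalid service")

// validServices repeats the service names listed under ValidServices.
var validServices = []string{
	"processFulltextDocument",
	"processHeaderDocument",
	"processReferences",
	"processCitationList",
	"processCitationPatentST36",
	"processCitationPatentPDF",
}

// isValidService reports whether name appears in validServices.
func isValidService(name string) bool {
	for _, s := range validServices {
		if s == name {
			return true
		}
	}
	return false
}

// checkService (hypothetical helper) turns the boolean check into an error
// that callers can match with errors.Is.
func checkService(name string) error {
	if !isValidService(name) {
		return fmt.Errorf("%w: %q", errInvalidService, name)
	}
	return nil
}

func main() {
	for _, name := range []string{"processHeaderDocument", "processEverything"} {
		err := checkService(name)
		fmt.Printf("%s invalid=%v\n", name, errors.Is(err, errInvalidService))
	}
}
```

Validating the name before any request is sent keeps a typo from turning into a failed HTTP call per file.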
Types ¶
type Grobid ¶
Grobid is a client for the GROBID API; it carries its own HTTP client for flexibility.
func New ¶ added in v0.2.0
New creates a new Grobid client with a recommended, resilient HTTP client.
func (*Grobid) ProcessDirRecursive ¶
func (g *Grobid) ProcessDirRecursive(dir, service string, numWorkers int, rf ResultFunc, opts *Options) error
ProcessDirRecursive recursively walks the directory "dir" and runs parsing using "service" on each file. The numWorkers argument sets how many workers are started, and rf is a ResultFunc that gets called for each result, e.g. to write debug output to stderr or to write a file with the structured metadata to disk. The opts argument holds the options to be passed to the GROBID API; defaults are used if they are not set.
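The walk-plus-workers pattern described here can be sketched with the standard library alone. This is not the library's implementation: walkAndProcess, processFunc and demo are hypothetical names, the per-file callback stands in for a call such as ProcessPDF, and filtering on a ".pdf" suffix is an assumption made for the example.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
	"sync"
	"sync/atomic"
)

// processFunc is a hypothetical stand-in for per-file processing.
type processFunc func(path string) error

// walkAndProcess walks dir recursively and hands every *.pdf path to one of
// numWorkers goroutines. It returns the walk error, if any, otherwise the
// first processing error encountered.
func walkAndProcess(dir string, numWorkers int, process processFunc) error {
	if numWorkers < 1 {
		numWorkers = 1 // avoid blocking forever on an unread channel
	}
	paths := make(chan string)
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range paths {
				if err := process(p); err != nil {
					mu.Lock()
					if firstErr == nil {
						firstErr = err
					}
					mu.Unlock()
				}
			}
		}()
	}
	walkErr := filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && strings.HasSuffix(strings.ToLower(path), ".pdf") {
			paths <- path
		}
		return nil
	})
	close(paths) // lets the workers drain and exit
	wg.Wait()
	if walkErr != nil {
		return walkErr
	}
	return firstErr
}

// demo builds a small temporary tree and counts the files that get processed.
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "walkdemo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	if err := os.Mkdir(filepath.Join(dir, "nested"), 0o755); err != nil {
		return 0, err
	}
	for _, name := range []string{"a.pdf", "notes.txt", filepath.Join("nested", "b.PDF")} {
		if err := os.WriteFile(filepath.Join(dir, name), []byte("x"), 0o644); err != nil {
			return 0, err
		}
	}
	var n int64
	err = walkAndProcess(dir, 4, func(string) error {
		atomic.AddInt64(&n, 1)
		return nil
	})
	return int(n), err
}

func main() {
	n, err := demo()
	fmt.Println(n, err)
}
```

An unbuffered channel keeps the walker from running far ahead of the workers, so memory use stays flat even for very large directory trees.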
func (*Grobid) ProcessPDF ¶
ProcessPDF processes a single PDF with given options. Result contains the HTTP status code, indicating success or failure.
type Options ¶
type Options struct {
GenerateIDs bool
ConsolidateHeader bool
ConsolidateCitations bool
IncludeRawCitations bool
IncludeRawAffiliations bool
TEICoordinates []string // https://grobid.readthedocs.io/en/latest/Coordinates-in-PDF/
SegmentSentences bool
Force bool
Verbose bool
OutputDir string
CreateHashSymlinks bool
}
Options are GROBID API options. Full documentation can be found at https://grobid.readthedocs.io/en/latest/Grobid-service/#grobid-web-services.
type Result ¶
type Result struct {
Filename string
SHA1Hex string
StatusCode int
Body []byte
Err error
ProcessingTime time.Duration
}
Result wraps a server response, not necessarily successful. If processing failed, Err will contain the first error encountered.
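Since a Result may describe a failed request, callers typically need to look at both Err and StatusCode. The sketch below shows one way to do that. The Result type is redeclared locally, check is a hypothetical helper, and treating HTTP 200 as the only success status is an assumption of this example.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// Result mirrors the documented grobidclient.Result fields; it is redeclared
// here only so that this sketch is self-contained.
type Result struct {
	Filename       string
	SHA1Hex        string
	StatusCode     int
	Body           []byte
	Err            error
	ProcessingTime time.Duration
}

// check (hypothetical helper) reduces a Result to a single error value.
// Assumption: any status other than 200 counts as a failure.
func check(r *Result) error {
	switch {
	case r.Err != nil:
		return fmt.Errorf("%s: %w", r.Filename, r.Err)
	case r.StatusCode != http.StatusOK:
		return fmt.Errorf("%s: unexpected HTTP status %d", r.Filename, r.StatusCode)
	}
	return nil
}

func main() {
	results := []*Result{
		{Filename: "a.pdf", StatusCode: http.StatusOK},
		{Filename: "b.pdf", StatusCode: http.StatusServiceUnavailable},
		{Filename: "c.pdf", Err: errors.New("connection refused")},
	}
	for _, r := range results {
		if err := check(r); err != nil {
			fmt.Println("failed:", err)
			continue
		}
		fmt.Println("ok:", r.Filename)
	}
}
```

Checking Err before StatusCode matters because a transport-level failure may leave StatusCode at its zero value.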
func (*Result) StringBody ¶
StringBody returns the response body as string.
type ResultFunc ¶ added in v0.2.3
ResultFunc is a function invoked on the result of the processing.
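A custom ResultFunc can collect statistics instead of writing files. The sketch below assumes the signature func(*Result, *Options) error, inferred from DebugResultWriter and DefaultResultWriter in the index. The types are trimmed local mirrors, and statusCounter is a hypothetical name. The mutex reflects the assumption that the function may be called from several workers at once.

```go
package main

import (
	"fmt"
	"sync"
)

// Trimmed local mirrors of the documented types, kept to the fields this
// sketch needs.
type Options struct {
	Verbose bool
}

type Result struct {
	Filename   string
	StatusCode int
}

// ResultFunc uses the signature inferred from DebugResultWriter.
type ResultFunc func(*Result, *Options) error

// statusCounter (hypothetical) tallies results per HTTP status code.
type statusCounter struct {
	mu     sync.Mutex
	counts map[int]int
}

// Record has the ResultFunc signature and is safe for concurrent use.
func (c *statusCounter) Record(r *Result, _ *Options) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.counts == nil {
		c.counts = make(map[int]int)
	}
	c.counts[r.StatusCode]++
	return nil
}

// Count returns how many results carried the given status code.
func (c *statusCounter) Count(code int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[code]
}

func main() {
	c := &statusCounter{}
	var rf ResultFunc = c.Record // a method value satisfies the func type
	for _, code := range []int{200, 200, 503} {
		if err := rf(&Result{Filename: "x.pdf", StatusCode: code}, nil); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(c.Count(200), c.Count(503))
}
```

Using a method value keeps the state in the receiver, so the same counter can be inspected after a directory run finishes.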