Whitespace issues in extract_text()

The text returned by extract_text() is unreadable because formatting and spaces are not preserved during extraction:

PreemptiveInformationExtractionusingUnrestrictedRelationDiscoveryYusukeShinyamaSatoshiSekineNewYorkUniversity715,Broadway,7thFloorNewYork,NY,10003fyusuke,sekineg@cs.nyu.eduAbstractWearetryingtoextendtheboundaryofInformationExtraction(IE)systems.Ex-istingIEsystemsrequirealotoftimeandhumanefforttotuneforanewscenario.

Is it true that PyPDF2 is not format-aware, as claimed here? http://victorwyee.com/python/convert-pdf-to-text-pypdf-pdfminer-first-impression/
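For anyone trying to reproduce or quantify this, here is a minimal sketch of a check for whether extracted text has lost its spaces. The helper names (`whitespace_ratio`, `looks_unspaced`, `extract_first_page`) and the 5% threshold are my own assumptions, not part of PyPDF2. The extraction helper assumes PyPDF2 2.0 or later, where `PdfReader` and `extract_text()` are available; older releases used different names.

```python
def whitespace_ratio(text: str) -> float:
    """Return the fraction of characters in `text` that are whitespace."""
    if not text:
        return 0.0
    return sum(ch.isspace() for ch in text) / len(text)


def looks_unspaced(text: str, threshold: float = 0.05) -> bool:
    """Heuristic check for text whose word spacing was lost.

    Ordinary English prose is very roughly one-sixth whitespace, so a
    ratio far below that suggests the spaces were dropped. The default
    threshold is an arbitrary assumption; tune it for your documents.
    """
    return bool(text) and whitespace_ratio(text) < threshold


def extract_first_page(path: str) -> str:
    """Extract text from the first page of the PDF at `path`.

    Assumes PyPDF2 >= 2.0 is installed. The import is local so the
    helpers above can be used without PyPDF2.
    """
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return reader.pages[0].extract_text()
```

Calling `looks_unspaced(extract_first_page("paper.pdf"))` (with a hypothetical file name) gives a quick yes/no that can be attached to a report like this one, and the same check can be run on the output of another extractor for comparison.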


Whitespace issues in extract_text() #42
