
Use both lattice and columns options #117

@jscottNRG

Is your feature request related to a problem? Please describe.
Setting lattice=True does not work with one particular document because its table has no visible vertical column lines. I'm using the area and columns options to specify the portion of the page to consider and the x coordinates of the column boundaries, and that works well for the columns. However, I think text that wraps inside a cell is being treated as separate rows, so each table row is split into multiple rows.

Describe the solution you'd like
I'd like to use the lattice option to allow tabula to detect row boundaries, while also using the columns option to specify where the column boundaries are. I'm open to alternatives that would achieve what I need, though.
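One possible alternative, until both options can be combined, is to extract with the columns option and then fold the extra rows back together afterwards. The sketch below is only an illustration: `merge_wrapped_rows` and `key_index` are hypothetical names, and it assumes that one column (here the first) is always filled on the first line of a real row and empty on lines created by word wrapping. That assumption may not hold for every table.

```python
def _clean(cell):
    """Return the cell as stripped text; None and NaN become ''."""
    if cell is None or (isinstance(cell, float) and cell != cell):
        return ""
    return str(cell).strip()


def merge_wrapped_rows(rows, key_index=0):
    """Fold continuation rows into the row above them.

    ASSUMPTION: a row whose key cell is empty is the wrapped
    remainder of the previous row, not a row of its own.
    """
    merged = []
    for row in rows:
        cells = [_clean(c) for c in row]
        if merged and not cells[key_index]:
            previous = merged[-1]
            # Append each non-empty fragment to the matching cell above.
            for i, text in enumerate(cells):
                if text:
                    previous[i] = (previous[i] + " " + text).strip()
        else:
            merged.append(cells)
    return merged


# Made-up rows, shaped like the output of the columns option.
example = [
    ["P-1", "Road", "resurfacing"],
    ["", "repair", ""],
    ["P-2", "Bridge", "inspection"],
]
print(merge_wrapped_rows(example))
```

With a dataframe from tabula, the rows could be passed in as `df.values.tolist()` and the result turned back into a dataframe with the original column names.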

Describe alternatives you've considered
I've tried using just the lattice option, which works as designed to detect rows but combines all columns: I get an extracted dataframe that is a single column with the correct number of rows. If I use the columns option instead of lattice, I get the right number of columns but a lot of extra rows.
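For reference, the two attempts described above correspond roughly to the calls sketched here. `build_options` and `extract` are hypothetical helper names, the page number and all coordinates are placeholders rather than the values used for Projects.pdf, and depending on the tabula-py version `read_pdf` may return a single dataframe or a list of dataframes.

```python
def build_options(area, columns=None):
    """Build keyword arguments for tabula.read_pdf.

    area is [top, left, bottom, right] in PDF points; columns is a
    list of x coordinates for the column boundaries.
    ASSUMPTION: the table is on page 1.
    """
    options = {"pages": 1, "area": area, "guess": False}
    if columns is None:
        # Attempt 1: rely on ruling lines to find cell boundaries.
        options["lattice"] = True
    else:
        # Attempt 2: stream mode with explicit column boundaries.
        options["stream"] = True
        options["columns"] = columns
    return options


def extract(pdf_path, area, columns=None):
    """Run tabula-py with the options above (requires Java)."""
    import tabula  # imported here so the helper above works without it

    return tabula.read_pdf(pdf_path, **build_options(area, columns))


# Placeholder coordinates, for illustration only.
print(build_options([50, 20, 700, 580]))
print(build_options([50, 20, 700, 580], columns=[100, 250, 400]))
```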

Additional context
Python 3.6.5
Java 1.8.0_181

PDF document I'm working on is here:
Projects.pdf

lattice option method: [image: lattice option]

columns option method: [image: columns option]
