pageGetTextLayout and pageGetTextLayoutForArea #289

@mjepronk

Hi,
I'm trying to parse a PDF document using the gi-poppler bindings. The following simple program segfaults after reading a few pages:

{-# LANGUAGE OverloadedStrings #-}
module Main where

import GI.Poppler
import Data.Foldable (for_)

data BBox = BBox
    { x1 :: !Double
    , y1 :: !Double
    , x2 :: !Double
    , y2 :: !Double
    } deriving (Eq, Show)

fromRectangle :: Rectangle -> IO BBox
fromRectangle r = do
    x1 <- getRectangleX1 r
    y1 <- getRectangleY1 r
    x2 <- getRectangleX2 r
    y2 <- getRectangleY2 r
    pure (BBox x1 y1 x2 y2)

main :: IO ()
main = do
    doc <- documentNewFromFile "file:///home/mp/Downloads/hitchhikers_guide.pdf" Nothing
    num_pages <- documentGetNPages doc
    for_ [0..num_pages-1] $ \p -> do
        putStrLn $ "Page #" <> show p
        page <- documentGetPage doc p

        (_, xs) <- pageGetTextLayout page
        bbs <- traverse fromRectangle xs
        print bbs

I think that the problem is with memory management. The documentation tells me that I need to free the array given back by pageGetTextLayout, but I've no idea how to do that. Any help greatly appreciated. TIA!
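To make the ownership question concrete, here is a minimal sketch of the "copy the values out, then free the array once" pattern, written against base's Foreign modules only. It is not how gi-poppler marshals the result of pageGetTextLayout. `allocRects` is a hypothetical stand-in for a C function that returns a caller-owned array, and the flat layout of four doubles per rectangle is an assumption made for illustration.

```haskell
module Main where

import Foreign.Marshal.Alloc (free)
import Foreign.Marshal.Array (mallocArray, peekArray, pokeArray)
import Foreign.Ptr (Ptr)

-- Hypothetical stand-in for a C function that hands back a heap array
-- owned by the caller. The buffer holds flat doubles: x1, y1, x2, y2
-- for each rectangle (an assumed layout, for illustration only).
allocRects :: IO (Ptr Double, Int)
allocRects = do
    let coords = [0, 0, 10, 20, 10, 0, 25, 20] :: [Double]
    buf <- mallocArray (length coords)
    pokeArray buf coords
    pure (buf, length coords `div` 4)

-- Copy every coordinate into Haskell values first, then free the
-- buffer exactly once. Nothing reads from the buffer after 'free'.
copyThenFree :: IO [(Double, Double, Double, Double)]
copyThenFree = do
    (buf, n) <- allocRects
    flat <- peekArray (n * 4) buf
    free buf
    pure (quads flat)
  where
    quads (a:b:c:d:rest) = (a, b, c, d) : quads rest
    quads _              = []

main :: IO ()
main = copyThenFree >>= print
```

If the values were read lazily, or the buffer were freed twice or by the wrong owner, a crash after a few iterations like the one described above would be a plausible symptom, though that remains a guess about the cause here.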

Versions used:

  • ghc 8.8.3
  • gi-poppler-0.18.21
  • haskell-gi-0.23.1
