Added lookahead combinators that allow conditional consumption of tags. #96

Closed
merijn wants to merge 1 commit into snoyberg:master from merijn:lookahead

Contributor

merijn commented Feb 3, 2017

When dealing with messier XML scraped from the web, I occasionally don't know exactly which tags I need to accept, so I find it useful to write catch-all combinators that output "unexpected" tags. There is a takeAllTreesContent, but it consumes anything, which makes it a rather poor fallback in, for example, uses of many.

The newly added lookahead and friends allow me to conditionally consume an entire tree/tag. One example would be something like:


debugParser :: (MonadIO m, MonadThrow m) => Name -> ConduitM Event o m ()
debugParser name = lookaheadPredicateIgnoreAttrs (/= name) $ do
    -- Render the matched tree back to text, then print it.
    t <- renderText def .| fold
    liftIO (T.putStrLn t)

parseAll = do
    manyYield $ choose [fooParser, barParser, debugParser "quux"]
    tagName "quux"  {- ... -}

Here debugParser consumes any tag that is not "quux" and plugs nicely into renderText from Text.XML.Stream.Render to display the offending tags.
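
To make the fallback problem concrete, here is a minimal sketch using only base. It does not use xml-conduit: the Event type, Parser alias, treeIf, manyP and chooseP are all hypothetical stand-ins that model the stream as a plain list. It compares an unconditional catch-all with a predicate-guarded one inside a many-style loop.

```haskell
module Main where

import Control.Applicative ((<|>))

-- Hypothetical, simplified event type; NOT xml-conduit's Event.
data Event = Begin String | End String
  deriving (Eq, Show)

-- A parser either fails without consuming anything (Nothing) or returns
-- a result together with the remaining input.
type Parser a = [Event] -> Maybe (a, [Event])

-- Consume one whole tree, but only if its root tag name satisfies the
-- predicate. The result is the root tag name.
treeIf :: (String -> Bool) -> Parser String
treeIf ok (Begin n : rest)
  | ok n = Just (n, skip (1 :: Int) rest)
  where
    -- Drop events until the number of open tags falls back to zero.
    skip 0 es = es
    skip _ [] = []
    skip d (Begin _ : es) = skip (d + 1) es
    skip d (End _ : es) = skip (d - 1) es
treeIf _ _ = Nothing

-- Apply a parser until it fails, collecting the results (a model of many).
manyP :: Parser a -> Parser [a]
manyP p es = case p es of
  Nothing -> Just ([], es)
  Just (x, es') -> do
    (xs, rest) <- manyP p es'
    Just (x : xs, rest)

-- Try each alternative in order (a model of choose).
chooseP :: [Parser a] -> Parser a
chooseP ps es = foldr (\p acc -> p es <|> acc) Nothing ps

main :: IO ()
main = do
  let input =
        [ Begin "foo", End "foo"
        , Begin "oops", End "oops"
        , Begin "quux", End "quux"
        ]
      foo = treeIf (== "foo")
      catchAll = treeIf (const True) -- accepts every tree, "quux" included
      guarded = treeIf (/= "quux")   -- refuses "quux", so the loop stops there
  print (manyP (chooseP [foo, catchAll]) input)
  print (manyP (chooseP [foo, guarded]) input)
```

With the unconditional fallback the loop also swallows the quux tree, so nothing is left for the parser that follows. With the guarded fallback the loop stops in front of quux and the remaining input still starts with that tag.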

Contributor Author

merijn commented Feb 3, 2017

Actually, this patch is broken as-is: it depends on takeAllTreesContent working as described in the docs, which it doesn't, as mentioned in #98.

Collaborator

k0ral commented Feb 5, 2017

That makes sense; I don't think there's a way to do that with the existing parsers/combinators. I will just try to find a different name for the new functions, as lookahead wrongly suggests that nothing is consumed from the stream. I'll probably also rename takeAllTreesContent accordingly, since it clearly falls into the same category of parsers.
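
The naming concern can be illustrated with a small base-only sketch. The stream is modelled as a list of tag names, and peekTag and takeTagIf are hypothetical helpers, not part of xml-conduit. A true lookahead leaves the input untouched, while the combinators under discussion remove the tag when the predicate holds.

```haskell
module Main where

-- Hypothetical model: the stream is just a list of tag names.
type Parser a = [String] -> (a, [String])

-- True lookahead: report the next tag name and leave the input unchanged.
peekTag :: Parser (Maybe String)
peekTag [] = (Nothing, [])
peekTag es@(n : _) = (Just n, es)

-- Conditional consumption: remove the next tag name only when the
-- predicate accepts it; otherwise leave the input unchanged.
takeTagIf :: (String -> Bool) -> Parser (Maybe String)
takeTagIf ok (n : rest)
  | ok n = (Just n, rest)
takeTagIf _ es = (Nothing, es)

main :: IO ()
main = do
  let input = ["foo", "quux"]
  print (peekTag input)                        -- input is returned whole
  print (takeTagIf (/= "quux") input)          -- "foo" is removed
  print (takeTagIf (/= "quux") (drop 1 input)) -- "quux" is left in place
```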

Collaborator

k0ral commented Feb 8, 2017

After thinking about it, I intend to implement the following:

-- Consume-and-yield events of a single tag tree, as long as it matches given name and attribute parsers
takeTreesContent :: MonadThrow m => NameMatcher a -> AttrParser b -> ConduitM Event Event m (Maybe ())

takeAllTreesContent = takeTreesContent anyName ignoreAttrs

(I've introduced NameMatcher in 48e68c2)

This should cover your use case, as follows:

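-- Assumes: import qualified Data.ByteString.Char8 as B
-- (renderBytes may need a further constraint, depending on the xml-conduit version)
debugParser :: (MonadIO m, MonadThrow m) => Name -> ConduitM Event o m (Maybe ())
debugParser name = do
  -- t is empty when no tree was consumed
  t <- takeTreesContent (matching (/= name)) ignoreAttrs .| renderBytes def .| foldC
  if B.null t
    then return Nothing
    else liftIO (B.putStrLn t) >> return (Just ())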

parseAll = do
    manyYield $ choose [fooParser, barParser, debugParser "quux"]
    tagName "quux"  {- ... -}

(I didn't typecheck it, but you get the idea)
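
As a rough illustration of the intended semantics only, here is a pure, base-only model. It is not the library implementation: Event and takeTreeIf are hypothetical stand-ins, and a list split replaces the conduit. If the next event opens a tag whose name matches, the events of that whole tree are handed downstream; otherwise nothing is consumed and the result is Nothing.

```haskell
module Main where

-- Hypothetical, simplified event type; NOT xml-conduit's Event.
data Event
  = Begin String   -- opening tag with its name
  | End String     -- closing tag with its name
  | Content String -- text node
  deriving (Eq, Show)

-- If the input starts with an opening tag whose name satisfies the
-- predicate, return the events of that whole tree (nested tags included)
-- paired with the remaining input. Otherwise return Nothing, which leaves
-- the input untouched for whichever parser runs next.
takeTreeIf :: (String -> Bool) -> [Event] -> Maybe ([Event], [Event])
takeTreeIf ok (Begin n : rest)
  | ok n = Just (go (1 :: Int) [Begin n] rest)
  where
    -- The first argument counts open tags; the tree ends when it hits zero.
    go 0 acc es = (reverse acc, es)
    go _ acc [] = (reverse acc, [])
    go d acc (e : es) = case e of
      Begin _   -> go (d + 1) (e : acc) es
      End _     -> go (d - 1) (e : acc) es
      Content _ -> go d (e : acc) es
takeTreeIf _ _ = Nothing

main :: IO ()
main = do
  let input =
        [ Begin "foo", Content "x", Begin "b", End "b", End "foo"
        , Begin "quux", End "quux"
        ]
  -- The "foo" tree is split off; the "quux" tree remains in the rest.
  print (takeTreeIf (/= "quux") input)
  -- Starting at "quux", the predicate rejects the tag and nothing is taken.
  print (takeTreeIf (/= "quux") (drop 5 input))
```

Returning Nothing without consuming is what lets such a parser sit in a choose list as a fallback.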

If you're okay with that approach, I'll implement it and make a release out of everything you've contributed.

Contributor Author

merijn commented Feb 9, 2017

Yeah, that looks fine for my use case(s).

Collaborator

k0ral commented Feb 12, 2017

Implemented in release 1.5.0 (not yet on Hackage). Thank you!

k0ral closed this Feb 12, 2017
Collaborator

k0ral commented Feb 13, 2017

@merijn Release 1.5.0 is now available on Hackage. You can start working with it by setting it explicitly as a dependency of your project; otherwise cabal will select a version <1.5 by default.
Please let me know if you experience any issues.
