Parsing with Haskell

kubb · on July 7, 2022

This is one way to do it, but there is another way, which has some advantages, namely using the parsec combinator library. The latter allows you to write a lexerless parser and avoid any code generators. It's the more interesting one for sure.

aroccoli · on July 7, 2022

Yes, you can do it that way!

We also cover parser combinators on our blog for the interested: https://serokell.io/blog/parser-combinators-in-haskell

And if you want to see what are the benefits of this approach over parser combinators, the second part of this series covers a little bit of that: https://serokell.io/blog/parsing-with-happy#happy-vs.-megapa...

massysett · on July 7, 2022

These days megaparsec might be a better way to get started as it has built on Parsec to improve the speed and documentation.

https://markkarpov.com/tutorial/megaparsec.html https://hackage.haskell.org/package/megaparsec

However it does rely on some GHC extensions, which makes things a bit more complicated for beginners, so for simplicity's sake it might be easier to start with Parsec and one of the old papers that goes with it.

sondr3 · on July 7, 2022

It doesn't need any extensions to work, the only reason `OverloadedStrings` is enabled in the tutorial is because the parser uses `Text` over `String` to parse the input.

johnday · on July 7, 2022

In fact, I would recommend against enabling OverloadedStrings while writing parsers, since it can be confusing. Some bright spark added an IsString instance for IsString a => Parser a, meaning that "Hello" is a parser for the string "Hello". It also means that `many "Hello"` will parse "HelloHelloHello", which is kinda cool but confusing for a beginner.

massysett · on July 7, 2022

From Text.Megaparsec:

    {-# LANGUAGE FlexibleContexts #-}
    {-# LANGUAGE FlexibleInstances #-}
    {-# LANGUAGE MultiParamTypeClasses #-}
    {-# LANGUAGE RankNTypes #-}
    {-# LANGUAGE Safe #-}
    {-# LANGUAGE ScopedTypeVariables #-}
    {-# LANGUAGE TypeFamilies #-}
    {-# LANGUAGE UndecidableInstances #-}

And you do have to understand TypeFamilies to use Megaparsec.

nimih · on July 7, 2022

> And you do have to understand TypeFamilies to use Megaparsec.

This is only the case if you're using custom input streams, right? Writing a parser that just operates on ByteString/Text seems like a pretty gentle/reasonable introduction to this style of parsing and doesn't require any understanding of TypeFamilies as far as I can tell (I certainly didn't understand TypeFamilies when I learned to use megaparsec).

massysett · on July 8, 2022

The types of many basic and essential functions, including ‘satisfy’ and ‘token’, use type families with associated types, but yes, it is possible to use megaparsec without actually understanding the types.

simonmic · on July 11, 2022

No you don't.

brennvin · on July 7, 2022

I have never understood why (some) people want to write lexerless parsers. Lexing lets you approach two different concerns separately. What is not to like?

kadoban · on July 7, 2022

For haskell, I think a lot of it is just that parser combinators are _very_ satisfying and memorable. They're what made the whole Functor/Applicative/Monad thing really click for a lot of people as an interesting/useful thing beyond just IO.

egl2021 · on July 7, 2022

Two reasons in the wild:

1. You might not know a priori that the lexical analysis can be done without any knowledge of the surrounding grammatical context.

2. You may be able to give more informative error messages if you know the grammatical context.

simonmic · on July 11, 2022

Agreed, and I would state it more strongly: it's unclear how / significant extra work to provide good error messages if you have a separate lexer stage. And sooner or later you will want good error messages.

bitmapper · on July 7, 2022

For simple languages, it can be a bit of a pain to write and maintain a lexer and a parser. For example, if I'm gonna parse JSON and nothing more, I think it's a bit silly to use a two external DSLs when I could use a simple eDSL. I think Happy and Alex both have their places, absolutely. For parsing full on programming languages I can imagine using parser combinators to be a bit painful. But parser combinators work great for a lot of cases.

nimih · on July 7, 2022

You can still have a lexing step when using a parsing combinator approach, and in practice you probably should for complex parsing tasks. Unfortunately, I'm not sure I've ever read a tutorial which really walks through doing such a thing.