Implementation differences between parser combinators and packrat algorithm
In order to have a better understanding of packrat I've tried to have a look at the provided implementation coming with the paper (I'm focusing on the bind
):
instance Derivs d => Monad (Parser d) where
-- Sequencing combinator
(Parser p1) >>= f = Parser parse
where parse dvs = first (p1 dvs)
first (Parsed val rem err) =
let Parser p2 = f val
in second err (p2 rem)
first (NoParse err) = NoParse err
second err1 (Parsed val rem err) =
Parsed val rem (joinErrors err1 err)
second err1 (NoParse err) =
NoParse (joinErrors err1 err)
-- Result-producing combinator
return x = Parser (dvs -> Parsed x dvs (nullError dvs))
-- Failure combinator
fail = Parser (dvs -> NoParse (nullError dvs))
fail msg = Parser (dvs -> NoParse (msgError (dvPos dvs) msg))
For me it looks like (errors handling aside) to parser combinators (such as this simplified version of Parsec):
bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ s -> concatMap ((a, s') -> parse (f a) s') $ parse p s
I'm quite confused because before that I thought that the big difference was that packrat was a parser generator with a memoization part.
Sadly it seems that this concept is not used in this implementation.
What is the big difference between parser combinators and packrat at implementation level?
PS: I also have had a look at Papillon but it seems to be very different from the implementation coming with the paper.
parsing haskell parsec parser-combinators peg
add a comment |
In order to have a better understanding of packrat I've tried to have a look at the provided implementation coming with the paper (I'm focusing on the bind
):
instance Derivs d => Monad (Parser d) where
-- Sequencing combinator
(Parser p1) >>= f = Parser parse
where parse dvs = first (p1 dvs)
first (Parsed val rem err) =
let Parser p2 = f val
in second err (p2 rem)
first (NoParse err) = NoParse err
second err1 (Parsed val rem err) =
Parsed val rem (joinErrors err1 err)
second err1 (NoParse err) =
NoParse (joinErrors err1 err)
-- Result-producing combinator
return x = Parser (dvs -> Parsed x dvs (nullError dvs))
-- Failure combinator
fail = Parser (dvs -> NoParse (nullError dvs))
fail msg = Parser (dvs -> NoParse (msgError (dvPos dvs) msg))
For me it looks like (errors handling aside) to parser combinators (such as this simplified version of Parsec):
bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ s -> concatMap ((a, s') -> parse (f a) s') $ parse p s
I'm quite confused because before that I thought that the big difference was that packrat was a parser generator with a memoization part.
Sadly it seems that this concept is not used in this implementation.
What is the big difference between parser combinators and packrat at implementation level?
PS: I also have had a look at Papillon but it seems to be very different from the implementation coming with the paper.
parsing haskell parsec parser-combinators peg
The implementation you linked to mixes tabs and spaces, and uses a different tab width than Stack Overflow does, so your copy/paste looks mis-indented. Obviously fixing this will not answer your question, but it will make it easier to read.
– amalloy
Nov 21 '18 at 23:09
1
"Parser generators vs. parser combinators vs. hand-written parsers" and "packrat vs. predictive parsing vs. recursive descent with backtracking vs. bottom-up algorithms" are two completely separate classifications. One's about the tools you use to write the parser, the other's about the algorithm the parser uses. Your linked implementation is a parser combinator implementing packrat parsing. That's not a contradiction.
– sepp2k
Nov 22 '18 at 2:06
1
@sepp2k you're right that his question should have said "what's the difference between the packrat library implementation and recursive descent or predictive parser combinator libraries?". However, the implementation he links to does not actually implement packrat parsing: you can use it, but you have to apply the packrat trick yourself in your own code. See my answer below.
– Dominique Devriese
Jan 25 at 13:59
add a comment |
In order to have a better understanding of packrat I've tried to have a look at the provided implementation coming with the paper (I'm focusing on the bind
):
instance Derivs d => Monad (Parser d) where
-- Sequencing combinator
(Parser p1) >>= f = Parser parse
where parse dvs = first (p1 dvs)
first (Parsed val rem err) =
let Parser p2 = f val
in second err (p2 rem)
first (NoParse err) = NoParse err
second err1 (Parsed val rem err) =
Parsed val rem (joinErrors err1 err)
second err1 (NoParse err) =
NoParse (joinErrors err1 err)
-- Result-producing combinator
return x = Parser (dvs -> Parsed x dvs (nullError dvs))
-- Failure combinator
fail = Parser (dvs -> NoParse (nullError dvs))
fail msg = Parser (dvs -> NoParse (msgError (dvPos dvs) msg))
For me it looks like (errors handling aside) to parser combinators (such as this simplified version of Parsec):
bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ s -> concatMap ((a, s') -> parse (f a) s') $ parse p s
I'm quite confused because before that I thought that the big difference was that packrat was a parser generator with a memoization part.
Sadly it seems that this concept is not used in this implementation.
What is the big difference between parser combinators and packrat at implementation level?
PS: I also have had a look at Papillon but it seems to be very different from the implementation coming with the paper.
parsing haskell parsec parser-combinators peg
In order to have a better understanding of packrat I've tried to have a look at the provided implementation coming with the paper (I'm focusing on the bind
):
instance Derivs d => Monad (Parser d) where
-- Sequencing combinator
(Parser p1) >>= f = Parser parse
where parse dvs = first (p1 dvs)
first (Parsed val rem err) =
let Parser p2 = f val
in second err (p2 rem)
first (NoParse err) = NoParse err
second err1 (Parsed val rem err) =
Parsed val rem (joinErrors err1 err)
second err1 (NoParse err) =
NoParse (joinErrors err1 err)
-- Result-producing combinator
return x = Parser (dvs -> Parsed x dvs (nullError dvs))
-- Failure combinator
fail = Parser (dvs -> NoParse (nullError dvs))
fail msg = Parser (dvs -> NoParse (msgError (dvPos dvs) msg))
For me it looks like (errors handling aside) to parser combinators (such as this simplified version of Parsec):
bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ s -> concatMap ((a, s') -> parse (f a) s') $ parse p s
I'm quite confused because before that I thought that the big difference was that packrat was a parser generator with a memoization part.
Sadly it seems that this concept is not used in this implementation.
What is the big difference between parser combinators and packrat at implementation level?
PS: I also have had a look at Papillon but it seems to be very different from the implementation coming with the paper.
parsing haskell parsec parser-combinators peg
parsing haskell parsec parser-combinators peg
edited Nov 22 '18 at 6:44
GlinesMome
asked Nov 21 '18 at 20:02
GlinesMomeGlinesMome
48411024
48411024
The implementation you linked to mixes tabs and spaces, and uses a different tab width than Stack Overflow does, so your copy/paste looks mis-indented. Obviously fixing this will not answer your question, but it will make it easier to read.
– amalloy
Nov 21 '18 at 23:09
1
"Parser generators vs. parser combinators vs. hand-written parsers" and "packrat vs. predictive parsing vs. recursive descent with backtracking vs. bottom-up algorithms" are two completely separate classifications. One's about the tools you use to write the parser, the other's about the algorithm the parser uses. Your linked implementation is a parser combinator implementing packrat parsing. That's not a contradiction.
– sepp2k
Nov 22 '18 at 2:06
1
@sepp2k you're right that his question should have said "what's the difference between the packrat library implementation and recursive descent or predictive parser combinator libraries?". However, the implementation he links to does not actually implement packrat parsing: you can use it, but you have to apply the packrat trick yourself in your own code. See my answer below.
– Dominique Devriese
Jan 25 at 13:59
add a comment |
The implementation you linked to mixes tabs and spaces, and uses a different tab width than Stack Overflow does, so your copy/paste looks mis-indented. Obviously fixing this will not answer your question, but it will make it easier to read.
– amalloy
Nov 21 '18 at 23:09
1
"Parser generators vs. parser combinators vs. hand-written parsers" and "packrat vs. predictive parsing vs. recursive descent with backtracking vs. bottom-up algorithms" are two completely separate classifications. One's about the tools you use to write the parser, the other's about the algorithm the parser uses. Your linked implementation is a parser combinator implementing packrat parsing. That's not a contradiction.
– sepp2k
Nov 22 '18 at 2:06
1
@sepp2k you're right that his question should have said "what's the difference between the packrat library implementation and recursive descent or predictive parser combinator libraries?". However, the implementation he links to does not actually implement packrat parsing: you can use it, but you have to apply the packrat trick yourself in your own code. See my answer below.
– Dominique Devriese
Jan 25 at 13:59
The implementation you linked to mixes tabs and spaces, and uses a different tab width than Stack Overflow does, so your copy/paste looks mis-indented. Obviously fixing this will not answer your question, but it will make it easier to read.
– amalloy
Nov 21 '18 at 23:09
The implementation you linked to mixes tabs and spaces, and uses a different tab width than Stack Overflow does, so your copy/paste looks mis-indented. Obviously fixing this will not answer your question, but it will make it easier to read.
– amalloy
Nov 21 '18 at 23:09
1
1
"Parser generators vs. parser combinators vs. hand-written parsers" and "packrat vs. predictive parsing vs. recursive descent with backtracking vs. bottom-up algorithms" are two completely separate classifications. One's about the tools you use to write the parser, the other's about the algorithm the parser uses. Your linked implementation is a parser combinator implementing packrat parsing. That's not a contradiction.
– sepp2k
Nov 22 '18 at 2:06
"Parser generators vs. parser combinators vs. hand-written parsers" and "packrat vs. predictive parsing vs. recursive descent with backtracking vs. bottom-up algorithms" are two completely separate classifications. One's about the tools you use to write the parser, the other's about the algorithm the parser uses. Your linked implementation is a parser combinator implementing packrat parsing. That's not a contradiction.
– sepp2k
Nov 22 '18 at 2:06
1
1
@sepp2k you're right that his question should have said "what's the difference between the packrat library implementation and recursive descent or predictive parser combinator libraries?". However, the implementation he links to does not actually implement packrat parsing: you can use it, but you have to apply the packrat trick yourself in your own code. See my answer below.
– Dominique Devriese
Jan 25 at 13:59
@sepp2k you're right that his question should have said "what's the difference between the packrat library implementation and recursive descent or predictive parser combinator libraries?". However, the implementation he links to does not actually implement packrat parsing: you can use it, but you have to apply the packrat trick yourself in your own code. See my answer below.
– Dominique Devriese
Jan 25 at 13:59
add a comment |
2 Answers
2
active
oldest
votes
The point here is really that this Packrat parser combinator library is not a full implementation of the Packrat algorithm, but more like a set of definitions that can be reused between different packrat parsers.
The real trick of the packrat algorithm (namely the memoization of parse results) happens elsewhere.
Look at the following code (taken from Ford's thesis):
data Derivs = Derivs {
dvAdditive :: Result Int,
dvMultitive :: Result Int,
dvPrimary :: Result Int,
dvDecimal :: Result Int,
dvChar :: Result Char}
pExpression :: Derivs -> Result ArithDerivs Int
Parser pExpression = (do char ’(’
l <- Parser dvExpression
char ’+’
r <- Parser dvExpression
char ’)’
return (l + r))
</> (do Parser dvDecimal)
Here, it's important to notice that the recursive call of the expression parser to itself is broken (in a kind of open-recursion fashion) by simply projecting the appropriate component of the Derivs structure.
This recursive knot is then tied in the "recursive tie-up function" (again taken from Ford's thesis):
parse :: String -> Derivs
parse s = d where
d = Derivs add mult prim dec chr
add = pAdditive d
mult = pMultitive d
prim = pPrimary d
dec = pDecimal d
chr = case s of
(c:s’) -> Parsed c (parse s’)
-> NoParse
These snippets are really where the packrat trick happens.
It's important to understand that this trick cannot be implemented in a standard way in a traditional parser combinator library (at least in a pure programming language like Haskell), because it needs to know the recursive structure of the grammar.
There are experimental approaches to parser combinator libraries that use a particular representation of the recursive structure of the grammar, and there it is possible to provide a standard implementation of Packrat.
For example, my own grammar-combinators library (not maintained atm, sorry) offers an implementation of Packrat.
Thanks for your answer, but I'm not sure to understand how this co-recursion make the memoization work. I have done some tests withfix
(on a fibonacci sequence) for example and it have to impact on performances. If you can add some link/insights about it, thanks.
– GlinesMome
Nov 25 '18 at 21:53
Sorry, but it's not clear to me what you're asking. The memoization is the core trick of the packrat algorithm. Essentially, the only values of type Derivs that will ever be generated for a given input, is the initial result ofparse "input"
, as well as the ones generated in the chr case of parse above. For those latter ones, only one value of type Derivs is generated for every character consumed. So there will be as many Derivs values as inputs in the string.
– Dominique Devriese
Nov 26 '18 at 22:20
Those Derivs values have one field for every non-terminal (plus the chr primitive), so in summary, there is one Result ArithDerivs x for every non-terminal (plus the chr primitive) at every input position. All other non-terminals only ever return one of these pre-existing Derivs values, because they just project a field of one of those pre-generated Derivs values.
– Dominique Devriese
Nov 26 '18 at 22:24
Any field of those Derivs values will only be evaluated once, by virtue of Haskell's lazyness, so this means you get the desired memoization.
– Dominique Devriese
Nov 26 '18 at 22:25
1
EvaluatingpAdditive d
will then first evaluateadvMultitive d
. It's important that it'sadvMultitive d
that's being evaluated (a field of the record d), notpMultitive d
(the actual parse of a multiplicative expression a this position).
– Dominique Devriese
Nov 30 '18 at 8:12
|
show 5 more comments
As stated elsewhere, packrat is not an alternative to combinators, but is an implementation option in those parsers. Pyparsing is a combinator that offers an optional packrat optimization by calling ParserElement.enablePackrat()
. Its implementation is almost a drop-in replacement for pyparsing's _parse
method (renamed to _parseNoCache
), with a _parseCache
method.
Pyparsing uses a fixed-length FIFO queue for its cache, since packrat cache entries get stale once the combinator fully matches the cached expression and moves on through the input stream. A custom cache size can be passed as an integer argument to enablePackrat()
, or if None is passed, the cache is unbounded. A packrat cache with the default value of 128 was only 2% less efficient than an unbounded cache against the supplied Verilog parser, with significant savings in memory.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53419694%2fimplementation-differences-between-parser-combinators-and-packrat-algorithm%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
The point here is really that this Packrat parser combinator library is not a full implementation of the Packrat algorithm, but more like a set of definitions that can be reused between different packrat parsers.
The real trick of the packrat algorithm (namely the memoization of parse results) happens elsewhere.
Look at the following code (taken from Ford's thesis):
data Derivs = Derivs {
dvAdditive :: Result Int,
dvMultitive :: Result Int,
dvPrimary :: Result Int,
dvDecimal :: Result Int,
dvChar :: Result Char}
pExpression :: Derivs -> Result ArithDerivs Int
Parser pExpression = (do char ’(’
l <- Parser dvExpression
char ’+’
r <- Parser dvExpression
char ’)’
return (l + r))
</> (do Parser dvDecimal)
Here, it's important to notice that the recursive call of the expression parser to itself is broken (in a kind of open-recursion fashion) by simply projecting the appropriate component of the Derivs structure.
This recursive knot is then tied in the "recursive tie-up function" (again taken from Ford's thesis):
parse :: String -> Derivs
parse s = d where
d = Derivs add mult prim dec chr
add = pAdditive d
mult = pMultitive d
prim = pPrimary d
dec = pDecimal d
chr = case s of
(c:s’) -> Parsed c (parse s’)
-> NoParse
These snippets are really where the packrat trick happens.
It's important to understand that this trick cannot be implemented in a standard way in a traditional parser combinator library (at least in a pure programming language like Haskell), because it needs to know the recursive structure of the grammar.
There are experimental approaches to parser combinator libraries that use a particular representation of the recursive structure of the grammar, and there it is possible to provide a standard implementation of Packrat.
For example, my own grammar-combinators library (not maintained atm, sorry) offers an implementation of Packrat.
Thanks for your answer, but I'm not sure to understand how this co-recursion make the memoization work. I have done some tests withfix
(on a fibonacci sequence) for example and it have to impact on performances. If you can add some link/insights about it, thanks.
– GlinesMome
Nov 25 '18 at 21:53
Sorry, but it's not clear to me what you're asking. The memoization is the core trick of the packrat algorithm. Essentially, the only values of type Derivs that will ever be generated for a given input, is the initial result ofparse "input"
, as well as the ones generated in the chr case of parse above. For those latter ones, only one value of type Derivs is generated for every character consumed. So there will be as many Derivs values as inputs in the string.
– Dominique Devriese
Nov 26 '18 at 22:20
Those Derivs values have one field for every non-terminal (plus the chr primitive), so in summary, there is one Result ArithDerivs x for every non-terminal (plus the chr primitive) at every input position. All other non-terminals only ever return one of these pre-existing Derivs values, because they just project a field of one of those pre-generated Derivs values.
– Dominique Devriese
Nov 26 '18 at 22:24
Any field of those Derivs values will only be evaluated once, by virtue of Haskell's lazyness, so this means you get the desired memoization.
– Dominique Devriese
Nov 26 '18 at 22:25
1
EvaluatingpAdditive d
will then first evaluateadvMultitive d
. It's important that it'sadvMultitive d
that's being evaluated (a field of the record d), notpMultitive d
(the actual parse of a multiplicative expression a this position).
– Dominique Devriese
Nov 30 '18 at 8:12
|
show 5 more comments
The point here is really that this Packrat parser combinator library is not a full implementation of the Packrat algorithm, but more like a set of definitions that can be reused between different packrat parsers.
The real trick of the packrat algorithm (namely the memoization of parse results) happens elsewhere.
Look at the following code (taken from Ford's thesis):
data Derivs = Derivs {
dvAdditive :: Result Int,
dvMultitive :: Result Int,
dvPrimary :: Result Int,
dvDecimal :: Result Int,
dvChar :: Result Char}
pExpression :: Derivs -> Result ArithDerivs Int
Parser pExpression = (do char ’(’
l <- Parser dvExpression
char ’+’
r <- Parser dvExpression
char ’)’
return (l + r))
</> (do Parser dvDecimal)
Here, it's important to notice that the recursive call of the expression parser to itself is broken (in a kind of open-recursion fashion) by simply projecting the appropriate component of the Derivs structure.
This recursive knot is then tied in the "recursive tie-up function" (again taken from Ford's thesis):
parse :: String -> Derivs
parse s = d where
d = Derivs add mult prim dec chr
add = pAdditive d
mult = pMultitive d
prim = pPrimary d
dec = pDecimal d
chr = case s of
(c:s’) -> Parsed c (parse s’)
-> NoParse
These snippets are really where the packrat trick happens.
It's important to understand that this trick cannot be implemented in a standard way in a traditional parser combinator library (at least in a pure programming language like Haskell), because it needs to know the recursive structure of the grammar.
There are experimental approaches to parser combinator libraries that use a particular representation of the recursive structure of the grammar, and there it is possible to provide a standard implementation of Packrat.
For example, my own grammar-combinators library (not maintained atm, sorry) offers an implementation of Packrat.
Thanks for your answer, but I'm not sure to understand how this co-recursion make the memoization work. I have done some tests withfix
(on a fibonacci sequence) for example and it have to impact on performances. If you can add some link/insights about it, thanks.
– GlinesMome
Nov 25 '18 at 21:53
Sorry, but it's not clear to me what you're asking. The memoization is the core trick of the packrat algorithm. Essentially, the only values of type Derivs that will ever be generated for a given input, is the initial result ofparse "input"
, as well as the ones generated in the chr case of parse above. For those latter ones, only one value of type Derivs is generated for every character consumed. So there will be as many Derivs values as inputs in the string.
– Dominique Devriese
Nov 26 '18 at 22:20
Those Derivs values have one field for every non-terminal (plus the chr primitive), so in summary, there is one Result ArithDerivs x for every non-terminal (plus the chr primitive) at every input position. All other non-terminals only ever return one of these pre-existing Derivs values, because they just project a field of one of those pre-generated Derivs values.
– Dominique Devriese
Nov 26 '18 at 22:24
Any field of those Derivs values will only be evaluated once, by virtue of Haskell's lazyness, so this means you get the desired memoization.
– Dominique Devriese
Nov 26 '18 at 22:25
1
EvaluatingpAdditive d
will then first evaluateadvMultitive d
. It's important that it'sadvMultitive d
that's being evaluated (a field of the record d), notpMultitive d
(the actual parse of a multiplicative expression a this position).
– Dominique Devriese
Nov 30 '18 at 8:12
|
show 5 more comments
The point here is really that this Packrat parser combinator library is not a full implementation of the Packrat algorithm, but more like a set of definitions that can be reused between different packrat parsers.
The real trick of the packrat algorithm (namely the memoization of parse results) happens elsewhere.
Look at the following code (taken from Ford's thesis):
data Derivs = Derivs {
dvAdditive :: Result Int,
dvMultitive :: Result Int,
dvPrimary :: Result Int,
dvDecimal :: Result Int,
dvChar :: Result Char}
pExpression :: Derivs -> Result ArithDerivs Int
Parser pExpression = (do char ’(’
l <- Parser dvExpression
char ’+’
r <- Parser dvExpression
char ’)’
return (l + r))
</> (do Parser dvDecimal)
Here, it's important to notice that the recursive call of the expression parser to itself is broken (in a kind of open-recursion fashion) by simply projecting the appropriate component of the Derivs structure.
This recursive knot is then tied in the "recursive tie-up function" (again taken from Ford's thesis):
parse :: String -> Derivs
parse s = d where
d = Derivs add mult prim dec chr
add = pAdditive d
mult = pMultitive d
prim = pPrimary d
dec = pDecimal d
chr = case s of
(c:s’) -> Parsed c (parse s’)
-> NoParse
These snippets are really where the packrat trick happens.
It's important to understand that this trick cannot be implemented in a standard way in a traditional parser combinator library (at least in a pure programming language like Haskell), because it needs to know the recursive structure of the grammar.
There are experimental approaches to parser combinator libraries that use a particular representation of the recursive structure of the grammar, and there it is possible to provide a standard implementation of Packrat.
For example, my own grammar-combinators library (not maintained atm, sorry) offers an implementation of Packrat.
The point here is really that this Packrat parser combinator library is not a full implementation of the Packrat algorithm, but more like a set of definitions that can be reused between different packrat parsers.
The real trick of the packrat algorithm (namely the memoization of parse results) happens elsewhere.
Look at the following code (taken from Ford's thesis):
data Derivs = Derivs {
dvAdditive :: Result Int,
dvMultitive :: Result Int,
dvPrimary :: Result Int,
dvDecimal :: Result Int,
dvChar :: Result Char}
pExpression :: Derivs -> Result ArithDerivs Int
Parser pExpression = (do char ’(’
l <- Parser dvExpression
char ’+’
r <- Parser dvExpression
char ’)’
return (l + r))
</> (do Parser dvDecimal)
Here, it's important to notice that the recursive call of the expression parser to itself is broken (in a kind of open-recursion fashion) by simply projecting the appropriate component of the Derivs structure.
This recursive knot is then tied in the "recursive tie-up function" (again taken from Ford's thesis):
parse :: String -> Derivs
parse s = d where
d = Derivs add mult prim dec chr
add = pAdditive d
mult = pMultitive d
prim = pPrimary d
dec = pDecimal d
chr = case s of
(c:s’) -> Parsed c (parse s’)
-> NoParse
These snippets are really where the packrat trick happens.
It's important to understand that this trick cannot be implemented in a standard way in a traditional parser combinator library (at least in a pure programming language like Haskell), because it needs to know the recursive structure of the grammar.
There are experimental approaches to parser combinator libraries that use a particular representation of the recursive structure of the grammar, and there it is possible to provide a standard implementation of Packrat.
For example, my own grammar-combinators library (not maintained atm, sorry) offers an implementation of Packrat.
edited Jan 25 at 13:52
answered Nov 22 '18 at 7:22
Dominique DevrieseDominique Devriese
2,7101920
2,7101920
Thanks for your answer, but I'm not sure to understand how this co-recursion make the memoization work. I have done some tests withfix
(on a fibonacci sequence) for example and it have to impact on performances. If you can add some link/insights about it, thanks.
– GlinesMome
Nov 25 '18 at 21:53
Sorry, but it's not clear to me what you're asking. The memoization is the core trick of the packrat algorithm. Essentially, the only values of type Derivs that will ever be generated for a given input, is the initial result ofparse "input"
, as well as the ones generated in the chr case of parse above. For those latter ones, only one value of type Derivs is generated for every character consumed. So there will be as many Derivs values as inputs in the string.
– Dominique Devriese
Nov 26 '18 at 22:20
Those Derivs values have one field for every non-terminal (plus the chr primitive), so in summary, there is one Result ArithDerivs x for every non-terminal (plus the chr primitive) at every input position. All other non-terminals only ever return one of these pre-existing Derivs values, because they just project a field of one of those pre-generated Derivs values.
– Dominique Devriese
Nov 26 '18 at 22:24
Any field of those Derivs values will only be evaluated once, by virtue of Haskell's lazyness, so this means you get the desired memoization.
– Dominique Devriese
Nov 26 '18 at 22:25
1
EvaluatingpAdditive d
will then first evaluateadvMultitive d
. It's important that it'sadvMultitive d
that's being evaluated (a field of the record d), notpMultitive d
(the actual parse of a multiplicative expression a this position).
– Dominique Devriese
Nov 30 '18 at 8:12
|
show 5 more comments
Thanks for your answer, but I'm not sure to understand how this co-recursion make the memoization work. I have done some tests withfix
(on a fibonacci sequence) for example and it have to impact on performances. If you can add some link/insights about it, thanks.
– GlinesMome
Nov 25 '18 at 21:53
Sorry, but it's not clear to me what you're asking. The memoization is the core trick of the packrat algorithm. Essentially, the only values of type Derivs that will ever be generated for a given input, is the initial result ofparse "input"
, as well as the ones generated in the chr case of parse above. For those latter ones, only one value of type Derivs is generated for every character consumed. So there will be as many Derivs values as inputs in the string.
– Dominique Devriese
Nov 26 '18 at 22:20
Those Derivs values have one field for every non-terminal (plus the chr primitive), so in summary, there is one Result ArithDerivs x for every non-terminal (plus the chr primitive) at every input position. All other non-terminals only ever return one of these pre-existing Derivs values, because they just project a field of one of those pre-generated Derivs values.
– Dominique Devriese
Nov 26 '18 at 22:24
Any field of those Derivs values will only be evaluated once, by virtue of Haskell's lazyness, so this means you get the desired memoization.
– Dominique Devriese
Nov 26 '18 at 22:25
1
EvaluatingpAdditive d
will then first evaluateadvMultitive d
. It's important that it'sadvMultitive d
that's being evaluated (a field of the record d), notpMultitive d
(the actual parse of a multiplicative expression a this position).
– Dominique Devriese
Nov 30 '18 at 8:12
Thanks for your answer, but I'm not sure to understand how this co-recursion make the memoization work. I have done some tests with
fix
(on a fibonacci sequence) for example and it have to impact on performances. If you can add some link/insights about it, thanks.– GlinesMome
Nov 25 '18 at 21:53
Thanks for your answer, but I'm not sure to understand how this co-recursion make the memoization work. I have done some tests with
fix
(on a fibonacci sequence) for example and it have to impact on performances. If you can add some link/insights about it, thanks.– GlinesMome
Nov 25 '18 at 21:53
Sorry, but it's not clear to me what you're asking. The memoization is the core trick of the packrat algorithm. Essentially, the only values of type Derivs that will ever be generated for a given input, is the initial result of
parse "input"
, as well as the ones generated in the chr case of parse above. For those latter ones, only one value of type Derivs is generated for every character consumed. So there will be as many Derivs values as inputs in the string.– Dominique Devriese
Nov 26 '18 at 22:20
Sorry, but it's not clear to me what you're asking. The memoization is the core trick of the packrat algorithm. Essentially, the only values of type Derivs that will ever be generated for a given input, is the initial result of
parse "input"
, as well as the ones generated in the chr case of parse above. For those latter ones, only one value of type Derivs is generated for every character consumed. So there will be as many Derivs values as inputs in the string.– Dominique Devriese
Nov 26 '18 at 22:20
Those Derivs values have one field for every non-terminal (plus the chr primitive), so in summary, there is one Result ArithDerivs x for every non-terminal (plus the chr primitive) at every input position. All other non-terminals only ever return one of these pre-existing Derivs values, because they just project a field of one of those pre-generated Derivs values.
– Dominique Devriese
Nov 26 '18 at 22:24
Those Derivs values have one field for every non-terminal (plus the chr primitive), so in summary, there is one Result ArithDerivs x for every non-terminal (plus the chr primitive) at every input position. All other non-terminals only ever return one of these pre-existing Derivs values, because they just project a field of one of those pre-generated Derivs values.
– Dominique Devriese
Nov 26 '18 at 22:24
Any field of those Derivs values will only be evaluated once, by virtue of Haskell's lazyness, so this means you get the desired memoization.
– Dominique Devriese
Nov 26 '18 at 22:25
Any field of those Derivs values will only be evaluated once, by virtue of Haskell's lazyness, so this means you get the desired memoization.
– Dominique Devriese
Nov 26 '18 at 22:25
1
1
Evaluating
pAdditive d
will then first evaluate advMultitive d
. It's important that it's advMultitive d
that's being evaluated (a field of the record d), not pMultitive d
(the actual parse of a multiplicative expression a this position).– Dominique Devriese
Nov 30 '18 at 8:12
Evaluating
pAdditive d
will then first evaluate advMultitive d
. It's important that it's advMultitive d
that's being evaluated (a field of the record d), not pMultitive d
(the actual parse of a multiplicative expression a this position).– Dominique Devriese
Nov 30 '18 at 8:12
|
show 5 more comments
As stated elsewhere, packrat is not an alternative to combinators, but is an implementation option in those parsers. Pyparsing is a combinator that offers an optional packrat optimization by calling ParserElement.enablePackrat()
. Its implementation is almost a drop-in replacement for pyparsing's _parse
method (renamed to _parseNoCache
), with a _parseCache
method.
Pyparsing uses a fixed-length FIFO queue for its cache, since packrat cache entries get stale once the combinator fully matches the cached expression and moves on through the input stream. A custom cache size can be passed as an integer argument to enablePackrat()
, or if None is passed, the cache is unbounded. A packrat cache with the default value of 128 was only 2% less efficient than an unbounded cache against the supplied Verilog parser, with significant savings in memory.
add a comment |
As stated elsewhere, packrat is not an alternative to combinators, but is an implementation option in those parsers. Pyparsing is a combinator that offers an optional packrat optimization by calling ParserElement.enablePackrat()
. Its implementation is almost a drop-in replacement for pyparsing's _parse
method (renamed to _parseNoCache
), with a _parseCache
method.
Pyparsing uses a fixed-length FIFO queue for its cache, since packrat cache entries get stale once the combinator fully matches the cached expression and moves on through the input stream. A custom cache size can be passed as an integer argument to enablePackrat()
, or if None is passed, the cache is unbounded. A packrat cache with the default value of 128 was only 2% less efficient than an unbounded cache against the supplied Verilog parser, with significant savings in memory.
add a comment |
As stated elsewhere, packrat is not an alternative to combinators, but is an implementation option in those parsers. Pyparsing is a combinator that offers an optional packrat optimization by calling ParserElement.enablePackrat()
. Its implementation is almost a drop-in replacement for pyparsing's _parse
method (renamed to _parseNoCache
), with a _parseCache
method.
Pyparsing uses a fixed-length FIFO queue for its cache, since packrat cache entries get stale once the combinator fully matches the cached expression and moves on through the input stream. A custom cache size can be passed as an integer argument to enablePackrat()
, or if None is passed, the cache is unbounded. A packrat cache with the default value of 128 was only 2% less efficient than an unbounded cache against the supplied Verilog parser, with significant savings in memory.
As stated elsewhere, packrat is not an alternative to combinators, but is an implementation option in those parsers. Pyparsing is a combinator that offers an optional packrat optimization by calling ParserElement.enablePackrat()
. Its implementation is almost a drop-in replacement for pyparsing's _parse
method (renamed to _parseNoCache
), with a _parseCache
method.
Pyparsing uses a fixed-length FIFO queue for its cache, since packrat cache entries get stale once the combinator fully matches the cached expression and moves on through the input stream. A custom cache size can be passed as an integer argument to enablePackrat()
, or if None is passed, the cache is unbounded. A packrat cache with the default value of 128 was only 2% less efficient than an unbounded cache against the supplied Verilog parser, with significant savings in memory.
edited Dec 29 '18 at 17:07
answered Dec 29 '18 at 16:54
PaulMcGPaulMcG
46.6k968111
46.6k968111
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53419694%2fimplementation-differences-between-parser-combinators-and-packrat-algorithm%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
The implementation you linked to mixes tabs and spaces, and uses a different tab width than Stack Overflow does, so your copy/paste looks mis-indented. Obviously fixing this will not answer your question, but it will make it easier to read.
– amalloy
Nov 21 '18 at 23:09
1
"Parser generators vs. parser combinators vs. hand-written parsers" and "packrat vs. predictive parsing vs. recursive descent with backtracking vs. bottom-up algorithms" are two completely separate classifications. One's about the tools you use to write the parser, the other's about the algorithm the parser uses. Your linked implementation is a parser combinator implementing packrat parsing. That's not a contradiction.
– sepp2k
Nov 22 '18 at 2:06
1
@sepp2k you're right that his question should have said "what's the difference between the packrat library implementation and recursive descent or predictive parser combinator libraries?". However, the implementation he links to does not actually implement packrat parsing: you can use it, but you have to apply the packrat trick yourself in your own code. See my answer below.
– Dominique Devriese
Jan 25 at 13:59