利用者:Muraken/翻訳/Parser combinator

In mathematics and functional programming, Higher Order functions (HOF) are defined as the functions that can take functions as their input and can also produce functions as their output. The use of a HOF as an infix operator in a function-definition is known as a ‘combinator’. When combinators are used as basic building blocks to construct a parsing technique, then they are called parser combinators and the parsing method is called combinatory-parsing (as higher-order functions ‘combine’ different parsers together). Parser combinators use a top-down parsing strategy which facilitates modular piecewise construction and testing.

数学および関数型プログラミングにおいて、高階関数とは、入力と出力で関数をとれる関数として定義される。関数定義中で高階関数を中置演算子として使用することは「結合子 (a combinator)」として知られている。結合子が構文解析器の基本構成要素として使用される場合、それらは「構文解析器結合子 (parser combinators)」と呼ばれ、その構文解析手法は (高階関数が異なる解析器同士を「結合」するので) 「結合構文解析」と呼ばれる。構文解析器結合子は下降解析戦略を使用する。(下降解析はモジュール区分的な構築と試験を容易にする。)

Parser combinators are straightforward to construct, ‘readable’, modular, well-structured and easily maintainable. They have been used extensively in the prototyping of compilers and processors for domain-specific languages such as natural language interfaces to databases, where complex and varied semantic actions are closely integrated with syntactic processing. In 1989, Richard Frost and John Launchbury demonstrated^[1] use of parser combinators to construct Natural Language interpreters. Graham Hutton also used higher-order functions for basic parsing in 1992.^[2] S.D. Swierstra also exhibited the practical aspects of parser combinators in 2001.^[3] In 2008, Frost, Hafiz and Callaghan described^[4] a set of parser-combinators in Haskell that solve the long standing problem of accommodating left-recursion, and work as a complete top-down parsing tool in polynomial time and space.

構文解析器結合子は、可読で、モジュール化されており、よく構造化され、そして容易に保守できるように、素直に作られる。それらは、コンパイラやドメイン固有言語の処理系のプロトタイプに広く使用されている。たとえばデータベースへの自然言語インターフェイスなどである。ここで、複雑かつ多様な意味論的な作用は構文論的な処理に密接に統合される。1989年、Richard Frost と John Launchbury は自然言語インタプリタを構築するために構文解析器結合子を使用した ^[1]。Graham Hutton も同様に、1992年に基本的な構文解析器のために高階関数を使用した ^[2]。S.D. Swierstra は2001年に構文解析器結合子の実用的な局面を示した ^[3]。2008年、Frost、Hafiz、Callaghan は Haskell における構文解析器結合子の集合について述べた。それは、積年の問題だった accommodating 左再帰を解決し、多項式時間と多項式領域で完全な下降構文解析ツールとして働くものである。

基本アイデア

The core idea of parser combinators (which was popularized by Philip Wadler in 1985^[5]) is that the results (success or failure) of a recognizer (or a parser) can be returned as a list. Multiple entries of this list represent multiple successes; repeated entries represent ambiguous results and an empty list represents a failure.

構文解析器結合子 (これは Philip Wadler によって1985年に普及された^[5]) の核となるアイデアは、認識器 (または解析器) の結果をリストで返せる事である。このリストの複数の要素は複数の成功を表す。言い換えると、繰り返される要素は曖昧な結果を表す。そして、ひとつの空リストはひとつの失敗を表す。

In functional programming, parser combinators can be used to build basic parsers and to construct complex parsers for rules (that define nonterminals) from other parsers. A production-rule of a context-free grammar (CFG) may have one or more ‘alternatives’ and each alternative may consist of a sequence of non-terminal(s) and/or terminal(s), or the alternative may consist of a single non-terminal or terminal or ‘empty’. Parser combinators allow parsers to be defined in an embedded style, in code which is similar in structure to the rules of the grammar. As such, implementations can be thought of as executable specifications with all of the associated advantages. In order to achieve this, one has to define a set of combinators or infix operators to ‘glue’ different terminals and non-terminals to form a complete rule.

関数型プログラミングでは、構文解析器結合子は基本的な構文解析器を構築するために使用でき、さらに、他の複数の構文解析器から (非終端記号で定義される) 規則用の複雑な構文解析器を構築することができる。文脈自由文法 (CFG) の生成規則は、ひとつ以上の「代替」を持つことができ、各々の代替は非終端記号か終端記号の列で構成されるか、または単一の非終端記号か終端記号か空で構成される。構文解析器結合子は文法規則構文解析器を埋め込まれたスタイル (文法規則の構造と似たコード) で定義できる。As such, implementations can be thought of as executable specifications with all of the associated advantages. In order to achieve this, one has to define a set of combinators or infix operators to ‘glue’ different terminals and non-terminals to form a complete rule.

The combinators

To keep the discussion relatively straight forward, we discuss parser combinators in terms of recognizer only. Assume that the input is a sequence of tokens, of length #input the members of which are accessed through an index j. Recognizers are functions which take an index j as argument and which return a set of indices. Each index in the result set corresponds to a position at which the parser successfully finished recognizing a sequence of tokens that began at position j. An empty result set indicates that the recognizer failed to recognize any sequence beginning at j. A non-empty result set indicates the recognizer ends at different positions successfully. (Note that, as we are defining results as a set, we cannot express ambiguity as it would require repeated entries in the set. Use of ‘list’ would solve the problem.) Following the definitions of two basic recognizers for terminals, we define two major combinators for alternative and sequencing:

The empty recognizer is a function which always succeeds returning a singleton set containing the current position:

empty\quad j=\{j\}

A recognizer term ’x’ for a terminal x is a function which takes an index j as input, and if j is less than #input and if the token at position j in the input corresponds to the terminal x, it returns a singleton set containing j + 1, otherwise it returns the empty set.

term(x,j)={\begin{cases}\left\{\right\},&j\geq \#input\\\left\{j+1\right\},&j^{th}{\mbox{ element of }}input=x\\\left\{\right\},&{\mbox{otherwise}}\end{cases}}

We call the ‘alternative’ combinator <+>, which is used as an infix operator between two recognizers p and q. The <+> applies both of the recognizers on the same input position j and sums up the results returned by both of the recognizers, which is eventually returned as the final result.

(p\quad <\!+\!>\quad q)\quad j=(p\quad j)\quad \cup \quad (q\quad j)

The sequencing of recognizers is done with the *> combinator. Like <+>, it is also used as an infix operator between two recognizers – p and q. But it applies the first recognizer p to the input position j, and if there is any successful result of this application, then the second recognizer q is applied to every element of the result set returned by the first recognizer. The *> ultimately returns the union of these applications of q.

(p\quad *\!>\quad q)\quad j=\bigcup (map\quad q\quad (p\quad j))

Examples

Consider a highly ambiguous CFG s ::= ‘x’ s s | ɛ. Using the combinators defined earlier, we can modularly define executable notations of this grammar in a modern functional language (e.g. Haskell) as s = term ‘x’ *> s *> s <+> empty. When the recognizer s is applied on an input sequence xxxx at position 1, according to the above definitions it would return a result set {5,4,3,2,1}. Note that in real implementation if result set is defined as data type that supports repetition (i.e. list), then we can have the resulting list with all possible ambiguous results like [5, 4, 3, 2, 1,…., 5, 4, 3, 2,……] .

Shortcomings and solutions

The simple implementations of parser combinators have some shortcomings, which are common in top-down parsing. Naïve combinatory parsing requires exponential time and space when parsing an ambiguous context free grammar. In 1996, Frost and Szydlowski^[6] demonstrated how memoization can be used with parser combinators to reduce the time complexity to polynomial. Later Frost used monads^[7] to construct the combinators for systematic and correct threading of memo-table throughout the computation.

Like any top-down recursive descent parsing, the conventional parser combinators (like the combinators described above) won’t terminate while processing a left-recursive grammar (i.e. s ::= s *> s *> term ‘x’|empty). A recognition algorithm^[8] that accommodates ambiguous grammars with direct left-recursive rules is described by Frost and Hafiz in 2006. The algorithm curtails the otherwise ever-growing left-recursive parse by imposing depth restrictions. That algorithm was extended^[9] to a complete parsing algorithm to accommodate indirect as well as direct left-recursion in polynomial time, and to generate compact polynomial-size representations of the potentially-exponential number of parse trees for highly-ambiguous grammars by Frost, Hafiz and Callaghan in 2007. This extended algorithm accommodates indirect left-recursion by comparing its ‘computed-context’ with ‘current-context’. The same authors also described^[4] their implementation of a set of parser combinators written in the Haskell programming language based on the same algorithm. The X-SAIGA site has more about the algorithms and implementation details.

References

^ ^a ^b Frost, Richard. and Launchbury, John. "Constructing natural language interpreters in a lazy functional language." The Computer Journal — Special edition on Lazy Functional Programming Volume 32, Issue 2. Pages: 108 – 121, 1989
^ ^a ^b Hutton, Graham. "Higher-order functions for parsing." Journal of Functional Programming, Volume 2 Issue 3, Pages: 323– 343, 1992.
^ ^a ^b Swierstra, S. Doaitse. "Combinator parsers: From toys to tools. " In G. Hutton, editor, Electronic Notes in Theoretical Computer Science, volume 41. Elsevier Science Publishers, 2001.
^ ^a ^b Frost, R., Hafiz, R. and Callaghan, P." Parser Combinators for Ambiguous Left-Recursive Grammars." 10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN , Volume 4902, year 2008, Pages: 167-181, January 2008, San Francisco.
^ ^a ^b Wadler, Philip. "How to replace failure by a list of successes" P. Jouannaud (ed.) Functional Programming Languages and Computer Architectures Lecture Notes in Computer Science 201, Springer-Verlag, Heidelberg, 113, Year: 1985, Pages: 113 - 128.
^ Frost, Richard. and Szydlowski, Barbara. "Memoizing Purely Functional Top-Down Backtracking Language Processors. " Sci. Comput. Program. 1996 - 27(3): 263-288.
^ Frost, Richard. "Monadic Memoization towards Correctness-Preserving Reduction of Search. " Canadian Conference on AI 2003. p 66-80.
^ Frost, R. and Hafiz, R." A New Top-Down Parsing Algorithm to Accommodate Ambiguity and Left Recursion in Polynomial Time." ACM SIGPLAN Notices, Volume 41 Issue 5, Pages: 46 - 54. Year: 2006
^ Frost, R., Hafiz, R. and Callaghan, P." Modular and Efficient Top-Down Parsing for Ambiguous Left-Recursive Grammars ." 10th International Workshop on Parsing Technologies (IWPT), ACL-SIGPARSE , Pages: 109 - 120, June 2007, Prague.

External links

X-SAIGA - eXecutable SpecificAtIons of GrAmmars

[FrostandLaunchbury1989-1] Frost, Richard. and Launchbury, John. "Constructing natural language interpreters in a lazy functional language." The Computer Journal — Special edition on Lazy Functional Programming Volume 32, Issue 2. Pages: 108 – 121, 1989

[Hutton1992-2] Hutton, Graham. "Higher-order functions for parsing." Journal of Functional Programming, Volume 2 Issue 3, Pages: 323– 343, 1992.

[Swierstra2001-3] Swierstra, S. Doaitse. "Combinator parsers: From toys to tools. " In G. Hutton, editor, Electronic Notes in Theoretical Computer Science, volume 41. Elsevier Science Publishers, 2001.

[FrostHafizCallaghan_2008-4] Frost, R., Hafiz, R. and Callaghan, P." Parser Combinators for Ambiguous Left-Recursive Grammars." 10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN , Volume 4902, year 2008, Pages: 167-181, January 2008, San Francisco.

[Wadle1985-5] Wadler, Philip. "How to replace failure by a list of successes" P. Jouannaud (ed.) Functional Programming Languages and Computer Architectures Lecture Notes in Computer Science 201, Springer-Verlag, Heidelberg, 113, Year: 1985, Pages: 113 - 128.

[Frost1996-6] Frost, Richard. and Szydlowski, Barbara. "Memoizing Purely Functional Top-Down Backtracking Language Processors. " Sci. Comput. Program. 1996 - 27(3): 263-288.

[Frost2003-7] Frost, Richard. "Monadic Memoization towards Correctness-Preserving Reduction of Search. " Canadian Conference on AI 2003. p 66-80.

[FrostHafiz2006-8] Frost, R. and Hafiz, R." A New Top-Down Parsing Algorithm to Accommodate Ambiguity and Left Recursion in Polynomial Time." ACM SIGPLAN Notices, Volume 41 Issue 5, Pages: 46 - 54. Year: 2006

[FrostHafizCallaghan_2007-9] Frost, R., Hafiz, R. and Callaghan, P." Modular and Efficient Top-Down Parsing for Ambiguous Left-Recursive Grammars ." 10th International Workshop on Parsing Technologies (IWPT), ACL-SIGPARSE , Pages: 109 - 120, June 2007, Prague.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]