木接合文法

木接合文法（英: Tree-Adjoining Grammar、TAG）とは、アラビンド・ジョシ（英語版）らによる、形式文法の一種である。文脈自由文法にいくぶん似ているが、シンボルの書き換えではなく、木の書き換えをベースとすることが特徴で、文脈自由文法は、シンボルの書き換えのための生成規則群から成るが、木のノード群を書き換える規則群から成る（木および木構造参照）。

概要

TAGにおける規則は、foot node と呼ばれる特殊な葉ノードを持つ木であり、foot node には単語が付属している。TAGにおける木は2種類に分類される。initial（初期）木（' $\alpha$ ' とも）と auxiliary（補助）木（' $\beta$ '）である。初期木は基本的な結合価的関係を表し、補助木は再帰を許容する^[1]。補助木には根ノードがあり、foot node は初期木と同じシンボルがラベルとして付けられている。導出は、初期木に対して行われ、「置換(substitution)」か「付加(adjunction)」を行っていく。置換とは、先端ノードを、根ノードが同じシンボルのラベルになっている別の木と置換する操作である。付加とは、補助木を別の木の途中に挿入する操作である^[2]。補助木の根ノードと foot node のラベルは、挿入箇所のノードのラベルと一致していなければならない。

複雑性と応用

言語とオートマトンの理論

木接合文法は、文脈自由文法よりも強力だが文脈依存文法よりも弱い。

具体例として、任意の文字列を二回繰り返すような言語を記述することができる。より一般には、 $\{a^{n}b^{n}c^{n}d^{n}|1\leq n\}$ のような文脈自由文法に含まれない言語を記述することができる。このような処理はembedded pushdown automatonで表現できる。一方で、文字列を3回繰り返す言語や、5つ以上の文字をそれぞれ同じ長さで順に並べた列からなる言語は木接合文法では生成できない。

木接合文法は弱文脈依存文法の1つであり、よってチョムスキー階層には当てはまらないが、その関連性はよく研究されている^[3]。

自然言語

前節で述べたような性質から、計算言語学や自然言語処理でよく使われる。また、構文解析が一般に効率的に行えるなら、自然言語のモデルとして十分に意味があるという主張もある^[4]。

歴史

TAGの研究は、ゼリグ・ハリスの文字列文法である adjunction grammars (AG) ^[5]を Joshi らが研究したことから端を発している。AG は、自然言語学的な観点からは、内心構造（endocentric construction）を自然かつ効率的に扱えるが、外心構造（exocentric construction）はうまく扱えない。句構造文法はその逆である。1969年、Joshi は2種類の規則群を混合することで、少数の非常に単純な書き換え規則で、付加規則のための文字列の語彙を生成する、この相補性を同時に扱えるTAGを生み出した。

脚注

^ Jurafsky, Daniel; James H. Martin (2000年). Speech and Language Processing. Upper Saddle River, NJ: Prentice Hall. pp. 354
^ Joshi, Aravind; Rambow, Owen (2003年). "A Formalism for Dependency Grammar Based on Tree Adjoining Grammar" (PDF). Proceedings of the Conference on Meaning-Text Theory.
^ Joshi, Aravind (1969年). Properties of Formal Grammars with Mixed Types of Rules and Their Linguistic Relevance. Proceedings Third International Symposium on Computational Linguistics, Stockholm, Sweden.
^ Joshi, Aravind (1985年). “How much context-sensitivity is necessary for characterizing structural descriptions”. In D. Dowty, L. Karttunen, and A. Zwicky, (eds.). Natural Language Processing: Theoretical, Computational, and Psychological Perspectives. New York, NY: Cambridge University Press. pp. 206–250
^ Joshi, Aravind; S. R. Kosaraju, H. Yamada (1969年). String Adjunct Grammars. Proceedings Tenth Annual Symposium on Automata Theory, Waterloo, Canada.

外部リンク

The XTAG project 自然言語処理へのTAGの応用
A tutorial on TAG
SemConst Documentation TAGフレームワークでの統語論と意味論のインタフェース問題の概説

[jurafsky-martin2000-1] Jurafsky, Daniel; James H. Martin (2000年). Speech and Language Processing. Upper Saddle River, NJ: Prentice Hall. pp. 354

[joshi-rambow2003-2] Joshi, Aravind; Rambow, Owen (2003年). "A Formalism for Dependency Grammar Based on Tree Adjoining Grammar" (PDF). Proceedings of the Conference on Meaning-Text Theory.

[Joshi1969-3] Joshi, Aravind (1969年). Properties of Formal Grammars with Mixed Types of Rules and Their Linguistic Relevance. Proceedings Third International Symposium on Computational Linguistics, Stockholm, Sweden.

[joshi1985-4] Joshi, Aravind (1985年). “How much context-sensitivity is necessary for characterizing structural descriptions”. In D. Dowty, L. Karttunen, and A. Zwicky, (eds.). Natural Language Processing: Theoretical, Computational, and Psychological Perspectives. New York, NY: Cambridge University Press. pp. 206–250

[JoshiKosarajuYamada1969-5] Joshi, Aravind; S. R. Kosaraju, H. Yamada (1969年). String Adjunct Grammars. Proceedings Tenth Annual Symposium on Automata Theory, Waterloo, Canada.

[1]

[2]

[3]

[4]

[5]