Chinese character description languages
This article was partially translated in June 2016 from the English Wikipedia article "Chinese character description languages" (revision of 07:06, 26 May 2016 UTC).
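Chinese character description languages are languages proposed for describing Chinese characters (CJKV ideographs) precisely and completely: their components, their strokes (basic and compound), their stroke order, and the position of each stroke within a square grid. They are designed to supply information that a bitmap description inherently loses. This additional information can be used to distinguish variant forms that Unicode and ISO/IEC 10646 unify under a single code point, and to provide an alternative encoding for rare characters that have no standardized encoding in Unicode or ISO/IEC 10646. Most of these languages target the regular (kaishu) script and Ming (Song) typefaces. By recording each character's internal composition and cross-references to similar characters, they also aim to make characters easier to look up.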
CDL
CDL (Chinese Character Description Language) is an XML-based font technology developed jointly by Tom Bishop and Richard Cook for the Wenlin Institute. It is designed to describe every CJK character, but it is suitable for describing any glyph.
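This XML-based declarative language defines the stroke order of each component (roughly, each radical), together with combinations of previously defined components that are used to build more complex characters. Many of these components are characters in their own right, while also serving as building blocks.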
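Characters are defined on a square background 128 pixels on each side, as follows:
- Strokes: more than 50 types of stroke are defined, and each can be drawn in SVG format.
- Components: a basic component is built by calling several strokes. Within a component, each stroke is specified by the lower-left and upper-right corners of its box, and can be transformed (for example, scaled up or down). More than 1,000 basic components exist.
- Characters: a character is built by calling several components, each specified by the lower-left and upper-right corners of its box. When a component is used as part of a more complex character, it can be transformed (for example, stretched or squeezed horizontally or vertically) to fit the rectangular region it occupies within that character.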
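The placement rule above amounts to an independent linear scaling on each axis: a part drawn on its own 128-unit grid is mapped into the box given by two corners in the parent. The sketch below assumes a hypothetical CDL-like XML fragment whose element and attribute names (`cdl`, `comp`, `char`, `points`) are illustrative only and are not taken from the CDL specification; it uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

GRID = 128  # side length of the square design grid

# Hypothetical CDL-like description of 明 (日 beside 月). The element and
# attribute names are illustrative, not copied from the CDL specification.
CDL_LIKE = """<cdl char="明">
  <comp char="日" points="0,0 48,128"/>
  <comp char="月" points="48,0 128,128"/>
</cdl>"""


def parse_box(points):
    """Parse 'x0,y0 x1,y1' into two (x, y) corner tuples."""
    first, second = points.split()
    x0, y0 = (int(v) for v in first.split(","))
    x1, y1 = (int(v) for v in second.split(","))
    return (x0, y0), (x1, y1)


def place(point, box):
    """Map a point on a part's own 128x128 grid into its box in the parent."""
    (x0, y0), (x1, y1) = box
    x, y = point
    # Separate scale factors per axis allow horizontal or vertical squeezing.
    return (x0 + x * (x1 - x0) / GRID, y0 + y * (y1 - y0) / GRID)


root = ET.fromstring(CDL_LIKE)
for comp in root.findall("comp"):
    box = parse_box(comp.get("points"))
    # Where the centre of each component's own grid lands inside 明.
    print(comp.get("char"), place((64, 64), box))
# 日 (24.0, 64.0)
# 月 (88.0, 64.0)
```

Because each axis is scaled separately, the same component definition can be squeezed into a narrow left slot or a wide bottom slot without being redrawn.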
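In this way, about 50 strokes build more than 1,000 components, which in turn are embedded in the descriptions of tens of thousands of characters. A change to the shape of one of the 50 basic strokes is implicitly applied to every character that contains that stroke; likewise, a change to a component is implicitly applied to every character built from it.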
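The propagation effect follows from storing each definition once and expanding characters by reference. In the minimal sketch below, the component table, the stroke labels (h, s, p, n for héng, shù, piě, nà) and all coordinates are invented for illustration; a character is expanded recursively into strokes with absolute boxes, and editing one stroke inside 木 changes every copy of 木 that appears in 森.

```python
GRID = 128  # side length of the square design grid

# Hypothetical definitions. Each entry lists its parts as (name, box), where a
# box is ((x0, y0), (x1, y1)) on the parent's 128x128 grid. Names that are not
# keys of DEFS are primitive strokes. All coordinates are invented.
DEFS = {
    "木": [("h", ((8, 40), (120, 48))), ("s", ((60, 8), (68, 124))),
          ("p", ((12, 52), (60, 112))), ("n", ((68, 52), (116, 112)))],
    "林": [("木", ((0, 0), (64, 128))), ("木", ((64, 0), (128, 128)))],
    "森": [("木", ((24, 0), (104, 64))), ("林", ((0, 64), (128, 128)))],
}


def compose(outer, inner):
    """Map box `inner`, given on a 128x128 grid, into the absolute box `outer`."""
    (ox0, oy0), (ox1, oy1) = outer
    sx, sy = (ox1 - ox0) / GRID, (oy1 - oy0) / GRID
    (ix0, iy0), (ix1, iy1) = inner
    return ((ox0 + ix0 * sx, oy0 + iy0 * sy), (ox0 + ix1 * sx, oy0 + iy1 * sy))


def flatten(name, box=((0, 0), (GRID, GRID))):
    """Expand a character recursively into (stroke, absolute box) pairs."""
    if name not in DEFS:  # primitive stroke: nothing left to expand
        return [(name, box)]
    strokes = []
    for part, part_box in DEFS[name]:
        strokes.extend(flatten(part, compose(box, part_box)))
    return strokes


before = flatten("森")
# Edit one stroke of one component definition: widen the horizontal of 木.
DEFS["木"][0] = ("h", ((4, 40), (124, 48)))
after = flatten("森")
changed = sum(1 for old, new in zip(before, after) if old != new)
print(len(after), changed)  # 12 3: all three copies of 木 picked up the edit
```

No character definition was touched; the single edit to 木 reached 森 both directly and through 林.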
T. Bishop and R. Cook explain this as follows:
- "The stroke count of one character is generally related to the stroke counts of other characters. Most characters are built from components, and as long as the stroke counts of those components are defined, there is rarely any difficulty in adding them together to obtain the combined stroke count. Therefore, if a standard defines the strokes of a few thousand characters, it implicitly defines the strokes of many thousands of additional characters."[1]
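The additivity described in the quotation can be sketched as a memoized recursion over a component table. The table below is a small hand-made sample, not CDL data; its decompositions and base counts are standard (明 = 日 + 月, 枯 = 木 + 古, 古 = 十 + 口). A plain sum ignores the occasional case where combining components alters a stroke, which is why the quotation says "rarely" rather than "never".

```python
from functools import lru_cache

# Hand-made sample data: primitives with known stroke counts, plus characters
# described only as lists of their components.
BASE_STROKES = {"日": 4, "月": 4, "木": 4, "口": 3, "十": 2}
COMPONENTS = {
    "明": ["日", "月"],
    "林": ["木", "木"],
    "森": ["木", "林"],
    "古": ["十", "口"],
    "枯": ["木", "古"],
}


@lru_cache(maxsize=None)
def stroke_count(char):
    """Stroke count of a character: a base value or the sum over its parts."""
    if char in BASE_STROKES:
        return BASE_STROKES[char]
    return sum(stroke_count(part) for part in COMPONENTS[char])


for c in COMPONENTS:
    print(c, stroke_count(c))
# 明 8
# 林 8
# 森 12
# 古 5
# 枯 9
```

Five base counts determine the counts of every character in the table, mirroring the point that defining a few thousand characters implicitly defines many more.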
As of spring 2003, over 50,000 Chinese characters had been described via CDL. As of 26 February 2013, 86,416 Chinese characters had been described via CDL.[2]
HanGlyph
HanGlyph is a character description language intended for supplying rare characters that are missing from documents (the Chinese counterpart of the gaiji problem).[3] Documents can contain markup for missing characters, which automatically triggers the generation of small fonts that provide them. The language itself is a simple postfix notation describing strokes and ways to combine them. The prototype software uses MetaPost to render the characters and embed them in LaTeX documents. Wai Wong presented the language in 1997,[4] and papers on its implementation in MetaPost and LaTeX appeared at TeX user group conferences in 2003.[5][6]
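A postfix notation is typically evaluated with a stack: operands are pushed, and each combining operator pops as many operands as it takes and pushes the combined result. The minimal sketch below shows that general technique only. The operator names `lr`, `tb` and `ov` and the stroke labels are hypothetical and are not HanGlyph's actual vocabulary, which is defined in Wong's paper.

```python
# Hypothetical combining operators and their arities: "lr" (left-right),
# "tb" (top-bottom), "ov" (overlay). These are NOT HanGlyph's real names.
OPERATORS = {"lr": 2, "tb": 2, "ov": 2}


def parse_postfix(text):
    """Build a nested-tuple glyph tree from a whitespace-separated postfix string."""
    stack = []
    for token in text.split():
        if token in OPERATORS:
            arity = OPERATORS[token]
            if len(stack) < arity:
                raise ValueError(f"operator {token!r} needs {arity} operands")
            operands = stack[-arity:]
            del stack[-arity:]
            stack.append((token, *operands))
        else:
            stack.append(token)  # a stroke name or a named component
    if len(stack) != 1:
        raise ValueError("expression did not reduce to a single glyph")
    return stack[0]


print(parse_postfix("h s ov"))         # ('ov', 'h', 's'): a cross such as 十
print(parse_postfix("木 木 木 lr tb"))  # ('tb', '木', ('lr', '木', '木')): 森
```

Postfix needs no parentheses or precedence rules, which keeps both the markup embedded in documents and the parser small.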
Ideographic Description Sequences (IDS)
Chapter 12 of the Unicode specification[7] defines a syntax for "Ideographic Description Sequences" (IDSes) intended for use in describing characters not included in the standard in terms of combinations of components that do have code points. Twelve special characters in the range U+2FF0 to U+2FFB act as prefix operators to combine other characters or sequences to form larger characters.
Character | Code point | Unicode character name |
---|---|---|
⿰ | U+2FF0 | Ideographic description character left to right |
⿱ | U+2FF1 | Ideographic description character above to below |
⿲ | U+2FF2 | Ideographic description character left to middle and right |
⿳ | U+2FF3 | Ideographic description character above to middle and below |
⿴ | U+2FF4 | Ideographic description character full surround |
⿵ | U+2FF5 | Ideographic description character surround from above |
⿶ | U+2FF6 | Ideographic description character surround from below |
⿷ | U+2FF7 | Ideographic description character surround from left |
⿸ | U+2FF8 | Ideographic description character surround from upper left |
⿹ | U+2FF9 | Ideographic description character surround from upper right |
⿺ | U+2FFA | Ideographic description character surround from lower left |
⿻ | U+2FFB | Ideographic description character overlaid |
For example, the character “” can be described as “⿰書史”.
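Because each description character is a prefix operator with a fixed number of operands (⿲ and ⿳ take three, the other ten take two), a sequence can be parsed by straightforward recursive descent. The sketch below covers only the twelve characters U+2FF0 to U+2FFB listed in the table; the tuple-tree output format is an assumption made for illustration.

```python
# Arity of each Ideographic Description Character (U+2FF0..U+2FFB):
# ⿲ and ⿳ take three operands, the other ten take two.
ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
ARITY["⿲"] = ARITY["⿳"] = 3


def parse_ids(seq):
    """Parse an IDS into a nested tuple tree; raise ValueError if malformed."""
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(seq):
            raise ValueError("sequence ended before all operands were read")
        ch = seq[pos]  # Python strings index by code point, so CJK ext. chars are safe
        pos += 1
        if ch in ARITY:
            return (ch, *(parse() for _ in range(ARITY[ch])))
        return ch  # an ordinary character used as a component

    tree = parse()
    if pos != len(seq):
        raise ValueError("trailing characters after a complete sequence")
    return tree


print(parse_ids("⿰書史"))     # ('⿰', '書', '史')
print(parse_ids("⿱木⿰木木"))  # ('⿱', '木', ('⿰', '木', '木')), i.e. 森
```

The fixed arities are what make the notation unambiguous without brackets: once the operator is read, the parser knows exactly how many operands follow.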
These sequences differ from some other character description languages in that they do not include detailed information about the locations and shapes of strokes. They do not, by themselves, provide enough information for an actual rendering of a character being described.
However, these sequences are useful in describing to the reader a character that is not directly printable, either because it is absent in a given font, or is absent from the Unicode standard altogether.
These sequences may incidentally be useful for dictionary lookup purposes, as a sort of rough input method for queries.
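As a rough illustration of component-based lookup, the sketch below expands each character's IDS recursively and returns the characters that contain every requested component. The six-entry dictionary is a small hand-made sample using standard decompositions; a real tool would load a full IDS database instead, and this naive recursion assumes the decompositions contain no cycles.

```python
# Small hand-made IDS dictionary (standard decompositions, sample only).
IDS = {
    "明": "⿰日月",
    "林": "⿰木木",
    "森": "⿱木⿰木木",
    "好": "⿰女子",
    "枯": "⿰木古",
    "古": "⿱十口",
}
IDC = {chr(cp) for cp in range(0x2FF0, 0x2FFC)}  # the 12 description characters


def components(char):
    """All components of `char`, expanded recursively through IDS."""
    found = set()
    for ch in IDS.get(char, ""):
        if ch not in IDC:
            found.add(ch)
            found |= components(ch)
    return found


def lookup(*parts):
    """Characters whose decomposition contains every requested part."""
    return sorted(c for c in IDS if set(parts) <= components(c))


print(lookup("口"))        # ['古', '枯']: 枯 is found through 古
print(lookup("木", "十"))  # ['枯']
```

The recursive expansion is what lets a query for 口 find 枯 even though 口 does not appear directly in 枯's own sequence.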
Unicode's specification for these sequences is based on the characters and syntax of the earlier GBK standard.
The free software package IDSgrep by Matthew Skala[8][9] extends Unicode's IDS syntax with additional features for dictionary lookup. It can convert KanjiVG's database into its own Extended IDS (EIDS) format, and can search EIDS files generated by the related Tsukurimashou font family.
KanjiVG
KanjiVG is a free (CC BY-SA 3.0) Japanese character description language based on SVG and edited through a wiki system; it is intended eventually to cover Chinese characters as well.
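Since KanjiVG data is ordinary SVG, standard XML tools can read it. The sketch below uses a simplified, KanjiVG-style sample for 口 in which the path data is invented and the `kvg` namespace is declared directly on the root; real KanjiVG files carry more attributes and declare that namespace differently, so treat the exact layout as an assumption. It relies on stroke paths appearing in writing order, which is how KanjiVG lists them.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
KVG_NS = "http://kanjivg.tagaini.net"

# Simplified KanjiVG-style markup for 口 (3 strokes); path data is invented.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:kvg="http://kanjivg.tagaini.net" viewBox="0 0 109 109">
  <g id="kvg:053e3" kvg:element="口">
    <path id="kvg:053e3-s1" d="M20,30 L20,80"/>
    <path id="kvg:053e3-s2" d="M20,30 L85,30 L85,80"/>
    <path id="kvg:053e3-s3" d="M20,78 L85,78"/>
  </g>
</svg>"""


def strokes(svg_text):
    """Stroke path ids in document order, which here is also stroke order."""
    root = ET.fromstring(svg_text)
    return [p.get("id") for p in root.iter(f"{{{SVG_NS}}}path")]


def element_name(svg_text):
    """The character named by the kvg:element attribute of the top group."""
    root = ET.fromstring(svg_text)
    group = root.find(f"{{{SVG_NS}}}g")
    return group.get(f"{{{KVG_NS}}}element")


print(element_name(SAMPLE), len(strokes(SAMPLE)))  # 口 3
```

Keeping strokes as separate, ordered paths is what lets the same file serve both for rendering and for stroke-order and stroke-count queries.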
SCML
In 2007, the Structural Character Modeling Language (SCML) was proposed as a different kind of XML-based Chinese character description language: unlike CDL and HanGlyph, it does not position strokes and components on a numerical grid. The only known database of characters whose strokes and components are encoded in SCML serves as a proof of concept; there is no known effort to encode, for example, all of Unicode's CJK characters in SCML.
See also
- Unicode
- List of Shuowen Jiezi radicals - a system of 540 components used by Xu Shen (d. ~147 AD) in his Shuowen Jiezi
- List of Kangxi radicals - a system of 214 components used by the Kangxi dictionary (1716), made under the leadership of the Kangxi Emperor
- List of Unicode radicals - an ongoing, computer-based effort led by Unicode to create a complete and accurate list of CJK components.
- Cangjie input method
- CJK characters
- Stroke
- Stroke order
- Radical
References
- ^ Bishop, Tom; Cook, Richard (31 October 2003), pp. 8–9, point no. 12
- ^ [1]
- ^ “HanGlyph”. Retrieved 17 February 2012.
- ^ Wong, Wai (April 1997). “HanGlyph – a Chinese Character Description Language”. Proceedings of the Seventeenth International Conference on Computer Processing of Oriental Languages, Hong Kong.
- ^ Yiu, Candy L. K.; Wai Wong (July 2003). “Chinese Character Synthesis using METAPOST”. Proceedings of the 24th Annual Meeting and Conference of the TeX Users Group, Hawaii, U.S.A.
- ^ Wong, Wai; Candy L. K. Yiu; Kelvin C. F. Ng (June 2003). “Typesetting Rare Chinese Characters in LaTeX”. Proceedings of the 14th European TeX Conference, Brest, France.
- ^ [2]
- ^ [3]
- ^ Skala, Matthew (2015). “A Structural Query System for Han Characters”. International Journal of Asian Language Processing 23 (2): 127–159.
External links
- CDL language from Wenlin Institute
- Bishop, Tom; Cook, Richard, CDL specification
- Bishop, Tom; Cook, Richard (31 October 2003), Specification for CDL
- Cook, Richard (26 October 2003), Chinese Character Description Languages
- Bishop, Tom (2007), A character description language for CJK, MultiLingual, #91, vol. 18, no. 7, pp. 62–68
- Digital Humanities Start-up Grant from the U.S. National Endowment for the Humanities
- SCML
- Peebles, Daniel G.; Balkcom, Devin (advisor) (29 May 2007), SCML: A Structural Representation for Chinese Characters, Technical Report TR2007-592, Dartmouth College, 30 pp.
- HanGlyph