(NB: This document has been converted quickly from plain text to HTML. As a result, some of the formatting has been left as it was in the original document. A more elegant version may be developed later.)
The KANJIDIC file contains comprehensive information about Japanese kanji. It is a text file currently 6,355 lines long, with one line for each kanji in the two levels of the characters specified in the JIS X 0208-1990 set. (For basic information about this set, see Appendix A.)
The file contains a mixture of ASCII characters and kana/kanji encoded using the EUC (Extended Unix Code) coding.
Attention is drawn to the KANJIDIC LICENCE STATEMENT AND COPYRIGHT NOTICE included below in this document.
A similar file, KANJD212, is available for the 5,801 supplementary kanji in the JIS X 0212-1990 set.
Since June 2003, the KANJIDIC file has been generated from a database that was itself developed from KANJIDIC in order to support the KANJIDIC2 XML-format version. The legacy KANJIDIC format file will continue to be distributed.
The first part of each line is of a fixed format, indicating which character the line is for, while the rest is more free-format.
The first two bytes are the kanji itself. There is then a space, the 4-byte ASCII representation of the hexadecimal coding of the two-byte JIS encoding, and another space.
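As a rough illustration of the fixed-format part described above, the following Python sketch splits one raw line into the kanji, its JIS code and the free-format remainder. The function name parse_header and the sample line are hypothetical and are not part of KANJIDIC. The sketch assumes the line is held as EUC-encoded bytes, and it relies on the fact that each EUC byte of a JIS X 0208 character is the corresponding JIS byte with 0x80 added.

```python
def parse_header(raw: bytes):
    """Split one KANJIDIC line (EUC bytes) into (kanji, jis_code, rest)."""
    kanji_bytes = raw[0:2]                  # the kanji itself, two EUC bytes
    if raw[2:3] != b" " or raw[7:8] != b" ":
        raise ValueError("unexpected layout in fixed-format part")
    jis_code = int(raw[3:7].decode("ascii"), 16)    # e.g. b"3021" -> 0x3021
    # Cross-check: the EUC bytes are the JIS bytes with the high bit set.
    expected = bytes([(jis_code >> 8) | 0x80, (jis_code & 0xFF) | 0x80])
    if kanji_bytes != expected:
        raise ValueError("kanji bytes do not match the JIS code field")
    kanji = kanji_bytes.decode("euc_jp")
    rest = raw[8:].decode("euc_jp").strip()
    return kanji, jis_code, rest


# Hypothetical sample: only the first 8 bytes follow the documented layout.
sample = b"\xb0\xa1 3021 (free-format fields would follow here)"
kanji, jis_code, rest = parse_header(sample)
print(kanji, format(jis_code, "04x"), rest)
```

The cross-check between the kanji bytes and the hexadecimal field is optional; it is included here only because the two are redundant and a mismatch would suggest a damaged or wrongly decoded line.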
The rest of the line is composed of a combination of three kinds of fields (which may be in any order and interspersed):
(Other Tn classes may be created at a later date.)
There are currently a variety of predefined fields (programs using KANJIDIC should not make any assumptions about the presence or absence of any of these fields, as KANJIDIC is certain to be extended in the future):
As far as possible all entries will have their yomikata and readings attached, even if they are a recognized variant of another kanji. This is to facilitate electronic searches using these fields as keys, and should not be taken as a recommendation to use such obscure kanji.
KANJIDIC is used now to build the "kinfo.dat" file which is used by JDIC and JREADER, and by Stephen Chung's JWP. "kinfo.dat" contains the identical information, but in a compressed form and in a structure suitable for fast indexed access.
KANJIDIC is also used in the XJDIC and MacJDic dictionary programs, and a growing number of other programs such as KDRILL and KDIC.
KANJIDIC was originally compiled, and is maintained by:
KANJIDIC is now rather large, and contains information that is of little use to people who are not studying or researching Japanese orthography. It is still appropriate to maintain it as a useful, freely available compendium of such information.
For people who only wish to use a subset of the information in KANJIDIC, there is a program "kdfilt.c", also available as kdfilt.exe for MS-DOS, which will strip out unwanted fields. Dan Crevier has also released a program (kanjidicSplit) which does the same for MacJDic users. (For users of the JDIC program, the KANJDFIX.EXE utility also strips out unwanted fields prior to building the KINFO.DAT file.)
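The following Python sketch shows the general idea of such a filter. It is not the logic of kdfilt.c, kanjidicSplit or KANJDFIX.EXE, which is not described here. It assumes that the free-format part of a line is a series of whitespace-separated fields, that coded fields start with an identifying upper-case ASCII letter, and that brace-delimited items such as "{(kokuji)}" must be kept whole. The function name strip_fields and the field values in the sample are made up for illustration.

```python
import re

# A field is either a brace-delimited item (which may contain spaces)
# or a run of non-space characters. This tokenisation is an assumption.
FIELD_RE = re.compile(r"\{[^}]*\}|\S+")


def strip_fields(rest: str, unwanted: set) -> str:
    """Return `rest` without the fields whose leading letter is in `unwanted`."""
    kept = []
    for field in FIELD_RE.findall(rest):
        is_coded = field[0].isascii() and field[0].isupper()
        if is_coded and field[0] in unwanted:
            continue                        # drop, e.g., every "Z..." field
        kept.append(field)
    return " ".join(kept)


# Made-up field values, for illustration only.
filtered = strip_fields("B1 C2 G3 Z9-9-9 {some meaning}", {"C", "Z"})
print(filtered)   # B1 G3 {some meaning}
```

Matching on the first letter alone is a simplification: a field family with a longer prefix, such as the "XJxxxx" cross-references, would need a prefix test rather than a single-letter test.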
(some comments by Jim Breen)
KANJIDIC began as two files: jis1detl.lst and jis2detl.lst, which were later merged into a single file.
The first file was compiled initially from the file "kinfo.dat" supplied by Stephen Chung, who in turn compiled his file from a file prepared by Mike Erickson. I originally added about 1900 "meanings" by James Heisig, keyed in by Kevin Moore from the book "Remembering The Kanji". I later added the meanings from Rik Smoody's files, compiled when he was working for Sony in Japan. These appear to have been based on Nelson.
The second file was compiled from a complete JIS2 list with Bushu and stroke counts kindly supplied to me by Jon Crossley, to which I added Nelson numbers, yomikata and meanings extracted from Rik Smoody's file.
Theresa Martin assisted with this file early on, particularly in tracking down and correcting many mistranscribed yomikata (the old zu/dzu, oo/ou, ji/dji, etc. problems).
Jeffrey Friedl did a major overhaul in September-October 1992, in which he added the original frequency rankings, Halpern codes and SKIP patterns; updated the grading ("G" fields) to reflect the modern Jouyou lists; corrected radical numbers; and corrected stroke counts and readings to bring them into line with modern usage.
Magnus Halldorsson corrected some erroneous Halpern numbers, and provided them for a lot of the radicals. He provided the list of Heisig indices, which he originally compiled himself, then verified and expanded using lists from Richard Walters and Antti Karttunen. He also passed on to me the list of Gakken indices compiled by Antti Karttunen.
Lee Collins provided the Unicode mappings (see Appendix B).
Iain Sinclair has provided the yomikata, meanings and S&H indices of many of the obscure JIS2 kanji.
Christian Wittern, a Sinologist working at Kyoto University, sent me a monster file prepared by Dr Urs App from Hanazono College. From this I have extracted the Four Corner and Morohashi information. Christian also provided the original Pinyin details, which were later replaced. I am very grateful for these significant contributions.
In March 1994 the Morohashi indices were proof-read and corrected by Christian.
Alfredo Pinochet supplied all the Henshall numbers.
Ingar Holst has provided considerable assistance in regularizing the Bnnn and Cnnn radical classifications to remove some errors that were in the original JIS2 file, and to make it all conform to Nelson's classification.
In mid-1993 I withdrew the SKIP codes from the distributed file as it appeared that their presence violated Jack Halpern's copyright on these codes. Jeffrey Friedl contacted Jack about this, and Jack obtained permission from his publisher for the codes to be included subject to the copyright and usage restrictions stated in this document. In March 1994 the Halpern indices and SKIP codes were checked against an extract from Jack's files, and the "Z" mis-classification codes added, again from his files. Jack has also made a lot of useful comments and suggestions about the content and format of the file. I am most grateful to Jack for his permission and assistance, and also to Jeffrey for making the contact.
In May 1995, a number of updates took place. Jeffrey Friedl established contact with James Heisig, and obtained a further set of his indices. I contacted Mark Spahn (via the "honyaku" mailing list) and he kindly provided most of the missing S&H descriptors, and Jack Halpern released to me the SKIP codes of the kanji not in the New Japanese-English Character Dictionary. For all this material I am most grateful.
In August 1995, I added the O'Neill index numbers. These were compiled by Jenny Nazak, David Rosenfeld and myself. Thanks to Jenny & David for their assistance.
In January and February 1996 the Morohashi numbers were checked thoroughly against two important sources: a file of Unicode-Morohashi data (Uni2Dict) which was prepared by Koichi Yasuoka from the allocation in the JIS X 0221 standard, and the review draft of the proposed revision of the JIS X 0208 standard, which was prepared by the INSTAC Committee, and made available in a text file, thus enabling comparisons. All the mismatches between the three files were examined against the Morohashi text, and extensive corrections made to all three files. I am grateful to Koichi Yasuoka and Masayuki Toyoshima for their considerable assistance in this task.
In March 1996 the Korean readings were added. They were provided by Dr Charles Muller of Toyo Gakuen University (acmuller@gol.com), to whom I am most grateful. Chuck's compilation of Korean readings is extremely thorough and scholarly, and I am pleased to be able to incorporate them.
In April 1996 the readings of all the kanji were compared with those in the JIS X 0208 draft, and a number of corrections and additions made.
In May 1996 I carried out a "unification" of the readings of the KANJIDIC and KANJD212 files, wherein all the readings of the "itaiji" were brought into line. The identification of these itaiji was drawn from a file posted to the fj.kanji group by Taichi Kawabata (kawabata@is.s.u-tokyo.ac.jp), which was compiled at the ETL from the itaiji identification in the JIS X 0208 and JIS X 0212 standards. I corrected a few errors, and added some extra sets which were indicated in the JIS X 0208-1996 draft.
In July 1996 the Pinyin details were completely replaced by a new set. The original Pinyin were from an earlier compilation by Christian Wittern, and contained many errors. Two more reliable sources had become available: the Uni2Pinyin file compiled by Koichi Yasuoka, which is based in part on the TONEPY.tit by Yongguang Zhang; and the PYCHAR set of readings of Big5 hanzi compiled by Christian Wittern. The Pinyin currently in the KANJIDIC file is a combination of the two, following the order in the Uni2Pinyin file.
In August 1996 I corrected a few more missing and erroneous Nelson numbers, using a massive Nelson list prepared by Wolfgang Cronrath. He also flagged the kokuji, so I added these to the readings fields as "{(kokuji)}".
Also in August 1996 I deleted the handful of former "XJxxxx" cross-references, and replaced them with a much more comprehensive set, so that they now represent all the recognized "itaiji". The file I used for this was the corrected itaiji file mentioned above.
In April 1997 I corrected a large number of bushu codes. Many of these had been identified as errors by Jean-Luc Leger (reiga@iria.mines.u-nancy.fr) who analyzed and examined all the Nelson bushu. I also identified and added a large number of missing Cnnn codes.
Also in April 1997 I added the S&H "Kanji & Kana" indices. These had been keyed by Olivier Galibert (Olivier.Galibert@mines.u-nancy.fr). (There must be an outbreak of kanji interest in Nancy.)
In February 1998, the long-awaited inclusion of the "New Nelson" numbers took place. I had been waiting for the editor of the New Nelson, John Haig, to supply a list (as he had agreed some years before), but in the meantime, Jean-Luc Leger keyed a list, so they are now available.
Also between December 1997 and February 1998 a large number of Level 2 kanji had their stroke counts corrected to bring them into line with the counting principles used in the Level 1 kanji. This usually aligned the counts with those used in the New Nelson and in S&H. Appendix E of this document was amended to reflect this. The leg-work in tracking this material down was done by Wolfgang Cronrath.
During December 1998 and January 1999 I updated the stroke counts of many of the Level 2 kanji, using an analysis of them carried out by Wolfgang Cronrath. I also added the De Roo codes, which had been keyed by Jasmin Blanchette, who also typed the explanatory material. I contacted Fr De Roo in Tokyo, who readily agreed to the inclusion of the codes.
The extension of the S&H Kana & Kanji numbers to the 2nd edition was done by Enrique Sanchez Rosa.
The Hangul versions of the Korean readings (which only appear in the XML version) were provided by Francis Bond and Kyonghee Paik.
I did the Tuttle card numbers myself.
James Rose provided the numbers from Crowley's "The Kanji Way to Japanese Language Power" and Sakade's "A Guide To Reading and Writing Japanese".
The "Kodansha's compact Kanji guide" codes were provided by Richard Fremmerlid.
The "Kanji in Context" codes were provided by Randy Foreman.
The Spanish kanji meanings (which appear in the XML format, and may also appear in special versions of KANJIDIC) were compiled by Francisco Gutierrez and provided by Gabriel Sanroman.
KANJIDIC LICENCE STATEMENT AND COPYRIGHT NOTICE
In March 2000, James William Breen assigned ownership of the copyright of the dictionary files assembled, coordinated and edited by him to The Electronic Dictionary Research and Development Group at Monash University.
Information about the formal usage arrangement for KANJIDIC can be found on the Group's WWW page.
In summary, KANJIDIC can be freely used provided satisfactory acknowledgement is made, and a number of other conditions are met.
The following people have granted permission for material for which they hold copyright to be included in the files, and distributed under the above conditions, while retaining their copyright over that material:
Jack HALPERN: The SKIP codes in the KANJIDIC file.
With regard to the SKIP codes, Mr Halpern draws your attention to the statement he has prepared on the matter, which is included at Appendix F.
Christian WITTERN and Koichi YASUOKA: The Pinyin information in the KANJIDIC file.
Urs APP: the Four Corner codes and the Morohashi information in the KANJIDIC file.
Mark SPAHN and Wolfgang HADAMITZKY: the kanji descriptors from their dictionary.
Charles MULLER: the romanized Korean readings.
Joseph DE ROO: the De Roo codes.
For full information about JIS codes, please see Ken Lunde's "japan.inf" file, or his book "Understanding Japanese Information Processing", O'Reilly 1993. The following is a brief extract from the "japan.inf" file.
"The Japanese character set as described in the document JIS X 0208-1990 specifies 6,879 standard characters; 6,355 kanji in 2 levels (Level 1: 2,965 kanji arranged by pronunciation; Level 2: 3,390 kanji arranged by radical), 86 katakana, 83 hiragana, 10 numerals, 52 Roman characters, 147 symbols, 66 Russian characters, 48 Greek characters, and 32 line elements (for making charts).
This standard was first established in 1978, modified for the first time in 1983 (character position swapping, glyph changes, and four kanji appended to JIS Level 2), and modified again in 1990 (two kanji were appended to JIS Level 2). This character set is widely implemented on a variety of platforms. Encoding methods for JIS X 0208-1990 include Shift-JIS, EUC, and JIS."
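To make the three encoding methods mentioned above concrete, the short Python sketch below encodes one kanji with each of them, using the standard Python codecs euc_jp, shift_jis and iso2022_jp. Treating "JIS" as the 7-bit, escape-sequence style handled by iso2022_jp is an assumption made for this illustration. The kanji used is the one at JIS code 0x3021, the first kanji of Level 1.

```python
ch = "\u4e9c"   # the kanji at JIS code 0x3021

euc = ch.encode("euc_jp")         # JIS bytes with the high bit set
sjis = ch.encode("shift_jis")     # row/column remapped arithmetically
jis = ch.encode("iso2022_jp")     # 7-bit JIS bytes inside escape sequences

print(euc.hex(), sjis.hex(), jis)
```

All three byte strings identify the same JIS X 0208 position; only the packaging of the two JIS bytes differs.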
The following information about Unicode was provided in 1992 by Lee Collins at Taligent.
(The Unicode sequences are) "the final, official mapping to JIS of the CJK-JRG's (Chinese, Japanese, Korean- Joint Research Group) "Unified Repertoire and Ordering Version 2.0" which is the unified Han character set of ISO 10646 and Unicode. All of the Unicode companies (Apple, IBM, Microsoft, NeXT, Taligent, etc) are now using this mapping. There has been some confusion because of difference in nomenclature. Unicode people call it UniHan, the Chinese sometimes call it HCS (Han Character Set) and ISO calls it "Ideographic CJK Character Unified Repertoire and Ordering". ISO can't use the term "Han" character because Japan was very sensitive to this (even though it is a direct translation of "Kanzi") and it can't be called a character set because only ISO WG2 is empowered with the authority to encode characters. Problems of naming aside, they are all the same thing.
The CJK-JRG was formed under the aegis of ISO in 1990 to investigate and propose a unified Han character set for inclusion in ISO 10646. It brought together various experts on Han characters from China, Hong Kong, Japan, Korea, Taiwan and the United States selected by the national bodies participating in ISO WG2.
Including the initial work in the US on Unicode and in China on GB 13000, which were merged and became the basis for the URO, the task spanned about 4 years. The work was completed in April of this year. It contains 21,000 Han characters from all of the major standards used in East Asia, including JIS X 0208-1990 and JIS X 0212-1990. The Unicode consortium provides a cross-reference file for all of the source sets. To get a copy contact
Steve Greenfield
unicode-inc@HQ.M4.Metaphor.COM
For further details about the URO/UniHan, you might want to pick up a copy of "The Unicode Standard Version 1.0 Vol II". It's published by Addison Wesley, ISBN 0-201-60845-6. It's been available in the USA for over a month now. For a slightly different presentation of the characters, a copy of 10646 or of the "Ideographic CJK Character Unified Repertoire and Ordering Version 2.0" might be available through the Australian national body to ISO WG2."
S K I P - SYSTEM OF KANJI INDEXING BY PATTERNS
[This document contains the text and examples from the covers of the "New Japanese-English Character Dictionary" edited by Jack Halpern and published by Kenkyusha and NTC. It is reproduced with Mr Halpern's kind permission.
The text on which this is based used four patterns which are not able to be reproduced in this document. They are referred to below as #1 through #4, and relate to the following shapes in the NJECD: . ¢£¢£¡±¡±¡Ã ¢£¢£¢£¢£ ¢£¢£¢£¢£ ¢£¢£¢£¢£ . ¢£¢£ ¡Ã ¢£¢£¢£¢£ ¢£¢£¢£¢£ ¢£¢£¢£¢£ . ¢£¢£ ¡Ã ¢£¢£¢£¢£ ¢£ ¢£ ¢£¢£¢£¢£ . ¢£¢£ ¡Ã ¡Ã ¡Ã ¢£ ¢£ ¢£¢£¢£¢£ . ¢£¢£ ¡Ã ¡Ã ¡Ã ¢£¢£¢£¢£ ¢£¢£¢£¢£ . ¢£¢£¡²¡²¡× ¡Ã¡²¡²¡× ¢£¢£¢£¢£ ¢£¢£¢£¢£ . #1 #2 #3 #4 . LEFT- TOP- ENCLOSURE SOLID . RIGHT BOTTOM] . HOW TO LOCATE AN ENTRY A. Determine the SKIP number of your character. STEP 1 IDENTIFY PATTERN Determine to which of the four PATTERNS your character belongs to get the first part of the SKIP number (the PATTERN NUMBER). If your character belongs to pattern #1, #2 or #3 (Áꢪ#1), carry out the steps in the left column; if it belongs to pattern #4 (²¼¢ª#4), carry out the steps in the right column. (REF: R4. How to Identify the Pattern) . #1 #2 #3 #4 STEP 2 DIVIDE CHARACTER OMIT Divide the character into two parts at (Since solid characters the first division point. [Áê=ÌÚ+ÌÜ] cannot be divided, go to REF: R5. How to Divide the Character STEP 3.) REF: R6. How to Subclassify the Solid Pattern STEP 3 COUNT STROKES OF SHADED PART DETERMINE TOTAL STROKE-COUNT Count the strokes of the SHADED PART Determine the total stroke-count of to get the second part of the SKIP your character to get the second part number. [Áê #1 1-4-] of the SKIP number. [²¼ #4 4-3-] REF: Appendix 2. How to Count Strokes REF: Appendix 2. How to Count Strokes STEP 4 COUNT STROKES OF BLANK PART IDENTIFY SOLID SUBPATTERN Count the strokes of the BLANK PART Determine to which of the four to get the third part of the SKIP SOLID SUBPATTERNS your character number. [Áê #1 1-4-5] belongs to get the third part of the REF: Appendix 2. How to Count Strokes SKIP number. Select from: `¡±' 1, `¡²' 2, `|' 3, or `¢£' 4. [²¼ #4 4-3-1] REF: R6. 
How to Subclassify the Solid Pattern After determining the SKIP number of your character, locate your character entry in one of two ways: 1. Determine the entry number in the Pattern Index beginning on p. 1952 then locate your character entry in the main part of the dictionary. See R3.1.2 Index Method for details. 2. Locate your character entry directly (without referring to the Pattern Index) from its SKIP number. See R3.1.3 Direct Method for details. NOTE: All references preceded by a section mark (R) refer to SYSTEM OF KANJI INDEXING BY PATTERNS beginning on p. 106a HOW TO IDENTIFY THE PATTERN DETERMINE TO WHICH OF THE FOUR PATTERNS YOUR CHARACTER BELONGS #1 Characters that can be divided into left and right parts RIGHT: Áê 4-5 Ȭ 1-1 ½ç 1-11 °· 3-3 WRONG: ÊÒ 1-3 ÍÑ 1-4 ²Ä 3-2 ¿ 3-3 #2 Characters that can be divided into top and bottom parts RIGHT: Æó 1-1 »û 3-3 ¸Å 2-3 ½Õ 5-4 WRONG: Ëü 1-2 ¹Í 4-2 ´Ö 8-4 ºÁ 4-3 #3 Characters that can be divided by an enclosure element RIGHT: ¿Ê 3-8 ¹ 3-2 Ìä 8-3 ¹ñ 3-5 WRONG: Æþ 1-1 ¸â 4-3 ̾ 3-3 °Ù 5-4 #4 Characters that cannot be classified under patterns #1, #2, or #3 RIGHT: ±« 8-1 ʼ 5-2 Ãæ 4-3 Í¿ 3-4 WRONG: Åá 2-1 Æü 4-1 ¿å 4-3 IF A CHARACTER CAN BE CLASSIFIED UNDER MORE THAN ONE PATTERN, SELECT THE ONE THAT FOLLOWS THE NATURAL CONSTRUCTION OF THE CHARACTER RIGHT: »ù 2-5-2 È¢ 2-6-9 WRONG: »ù 1-2-5 È¢ 1-7-8 HOW TO DIVIDE THE CHARACTER DIVIDE THE CHARACTER INTO TWO PARTS AT THE FIRST DIVISION POINT #1 Going from left to right, divide at the first space RIGHT: ÌÀ 4-4 ¾® 1-2 °· 3-3 WRONG: ¾® 2-1 ³¹ 9-3 #2 Going from top to bottom, divide at the first space, horizontal line, or frame element, whichever comes first RIGHT: »° 1-2 ¶¼ 2-8 ÀÖ 3-4 ¸Å 2-3 WRONG: »° 2-1 ¶¼ 6-4 ÀÖ 2-5 ²¼ 1-2 #3 Going from the outside toward the inside, divide after the first enclosure element RIGHT: ÅÙ 3-6 ¿Ê 3-8 ÊÄ 8-3 ÌÜ 3-2 WRONG: ÅÙ 7-2 Ëá 11-5 DO NOT VIOLATE THE PRINCIPLE OF ELEMENT INTEGRITY . 1. Never break through strokes . 
RIGHT: ¶§ 3-2-2 WRONG: ¶§ 1-1-4 . 2. Never break through indivisible units . RIGHT: ¾ð 1-3-8 WRONG: ¾ð 1-1-10 . 3. Never make unnatural divisions . RIGHT: µ¤ 3-4-2 WRONG: µ¤ 2-2-4 HOW TO SUBCLASSIFY THE SOLID PATTERN A. DETERMINE TO WHICH OF THE FOUR SOLID SUBPATTERNS YOUR CHARACTER BELONGS `T' 1. Characters that contain a top line RIGHT: ±« 8-1 ²¼ 3-1 ¼ª 6-1 ²Ì 8-1 WRONG: Åá 2-1 Àé 3-2 ¿â 8-1 ʼ 5-1 2. Characters that contain a bottom line RIGHT: ¾å 3-2 ʼ 5-2 ¿â 8-2 WRONG: »³ 3-2 Êñ 5-2 ¼Ô 8-2 3. Characters that contain a through line RIGHT: Ãæ 4-3 Åì 8-3 ÌÓ 4-3 WRONG: ¿å 4-3 À£ 3-3 ¸á 4-3 Äï 7-3 4. Characters that do not contain a top line, bottom line, or through line RIGHT: Í¿ 3-4 Âç 3-4 ¼÷ 7-4 WRONG: »å 6-4 µ× 3-4 ͧ 4-4 Îô 6-4 B. IF A CHARACTER CAN BE CLASSIFIED UNDER MORE THAN ONE SUBPATTERN, THE SUBPATTERN WITH THE SMALLEST NUMBER TAKES PRECEDENCE RIGHT: ²¦ 4-1 ¸Ê 3-1 ÆÓ 7-1 ²Ì 8-1 ½Ð 5-2 À¸ 5-2 ¹Ã 5-1 WRONG: ²¦ 4-2 ¸Ê 3-2 ÆÓ 7-2 ²Ì 8-3 ½Ð 5-3 À¸ 5-3 ¹Ã 5-3
APPENDIX D. AN OVERVIEW OF THE FOUR CORNER CODING SYSTEM
The Four Corner System has been used for many years in China and Japan for classifying kanji. In China it is losing popularity in favour of Pinyin ordering. Some Japanese dictionaries, such as the Morohashi Daikanwajiten, have a Four Corner Index.
The following overview of the system has been condensed from the article "The Four Corner System: an introduction with exercises" by Dr Urs App, which appeared in the Electronic Bodhidharma No 2, February 1992, published by the International Research Institute for Zen Buddhism, Hanazono College. (More examples will be added from that article in due course.)
1. Stroke shapes are divided into ten classes: . 0 LID е . 1 HORIZONTAL LINE °ì . 2 VERTICAL LINE ¡Ã . 3 DOT Ц . 4 CROSS ½½ . 5 SKEWER ¥ . 6 BOX ¸ý . 7 ANGLE ÒÌ . 8 HACHI Ȭ . 9 CHIISAI ¾® 2. The Four Digits are derived from the Four Corners in a Z-shaped order. . A B 7 1 7 7 . for example: ¸¶ ·î . C D 2 9 2 2 Some examples: »Å 2421 ¹Ô 2122 Îò 7121 µû 2733 »ì 0762 Ʊ 7722 ¶¶ 4292 3. A shape is only used once. If it fills several corners, it is counted as zero in subsequent corners. Some examples: ¸ý 6000 ¼ó 8060 ʬ 8022 Âç 2003 Ï 2690 ÉÊ 6066 µþ 0096 4. When the upper or lower half of a character consists of only one (single or composite) shape, it is, regardless of its position, counted as a left corner. The right corner is counted as zero. Some examples: Ω 0010 ͳ 5060 Àã 1017 Êý 0022 Äí 0024 »å 2090 ¼ê 2050 5. When there is no additional element to the four sides of the characters .¸ý, Ìç, ò¨ (and sometimes ¹Ô), whatever is inside these characters is taken for the lower two corners. Some examples: Ìä 7760 ¼ü 6080 Ô¢ 6015 ÌÜ 6010 ³« 7744 ÌÌ 1060 îò 2110 6. The analysis is based on the block-style handwritten kaisho (Ü´½ñ) shape of characters. (This needs attention, as ¸Í is 3027, not 1027. The top stroke is treated as a Ц.) 7. Some points to note when analyzing shapes: o Shape 0: When the horizontal line below a DOT shape (number 3) is connected to another stroke at its right-hand end (as in Õß ¸Í, etc.) it is not counted as a LID (number 0) but as a DOT. Examples: °Â 3040 ¿À 3520 µ§ 3222 o Shape 6: Characters such as »® and Õù where one of the strokes of the square extends beyond it, are not considered to be square (number 6) shapes, but corners (number 7). Examples: ³î 7710 ½ê 3222 »® 7710 ´Û 8377 µ¹ 3010 o Shape 7: Only the cornered end of corner shapes (number 7) is counted as 7. Examples: ¶è 7171 ¶Ô 7222 ¶ç 2762 È¿ 7124 o Shape 8: Strokes that cross other strokes are not counted as shape number 8 (Ȭ). 
Examples: Èþ 8043 ´Ø 7743 Âç 4003 ¼º 8043 ¹Õ 2143 Àí 9043 o Shape 9: Shapes resembling shape 9, but featuring two strokes in the middle (as in the top part of ¶È or ÁÑ) or two strokes on one side (as in ¿å or the bottom part of Êé) are not considered as 9 shapes. Examples: Êé 4433 ¶È 3290 ÁÑ 3214 8. Some points to note when choosing corners. - when a corner is occupied by more than one independent or parallel strokes, the one that extend furthest to the left or right is taken as the corner, regardless of how high or low it is. examples: Èó 1111 Ðë 2124 ¼À 0013 Äë 0022 ¼Ò 3421 ÌÔ 4721 - if there is another shape above (or, at the bottom of the character, below) the leftmost or rightmost stroke of a character, that shape is given preference and is taken as the corner. examples: »¡ 3090 ¹¬ 4040 ᶠ6020 ½÷ 4040 ã¹ 3521 ¶ 4480 - when two composite stroke shapes are interwoven and each could be regarded as a corner, the shape that is higher is taken as the upper corner, and the lower stroke as lower corner. - when a stroke that slopes downwards to the left or right is supported by another stroke, the latter is taken as the corner. examples: ±° 2740 ΢ 0073 ¾Ë 1962 é° 4464 ·Ô 4410 Èï 3424 - a left slanting stroke on the upper left is taken for the left corner only; for the right corner one takes a stroke more to the right. examples: ¿È 2740 ̶ 2350 ³û 6752 Ū 2762 ½Ü 2762 Åç 2772 9. Shape variations: (Dr App includes several pages of examples) 10. The fifth corner: In order to differentiate between the several characters with the same code, an optional "fifth corner" is sometimes used. This is, loosely, a shape above the fourth corner which has not been used in any other shape.
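The Z-shaped ordering of rule 2 can be sketched in Python as follows. The hard part of the system, deciding which shape class occupies each corner, is not attempted. The function simply joins four digits that have already been chosen, and the name four_corner_code is hypothetical. The shape names are those listed under rule 1, and the two sample layouts are the ones shown under rule 2.

```python
# Shape classes from rule 1, indexed by digit.
SHAPES = ["LID", "HORIZONTAL LINE", "VERTICAL LINE", "DOT", "CROSS",
          "SKEWER", "BOX", "ANGLE", "HACHI", "CHIISAI"]


def four_corner_code(upper_left, upper_right, lower_left, lower_right):
    """Join the corner digits in Z order: A (top left), B, C, D."""
    digits = [upper_left, upper_right, lower_left, lower_right]
    for d in digits:
        if d not in range(10):
            raise ValueError(f"not a shape class: {d!r}")
    return "".join(str(d) for d in digits)


# The two layouts shown under rule 2: 7 1 / 2 9 and 7 7 / 2 2.
first = four_corner_code(7, 1, 2, 9)
second = four_corner_code(7, 7, 2, 2)
print(first, [SHAPES[int(c)] for c in first])
print(second, [SHAPES[int(c)] for c in second])
```

Rules 3 to 8 adjust which digit is assigned to a corner (for example, a shape already counted is recorded as zero in later corners), so they would have to be applied before the digits are passed to a function like this one.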
APPENDIX E. RADICAL AND STROKE COUNTING RULES
These rules apply:
The radicals listed below are ones where there are differing approaches to the counting of radicals in the various references. The stroke counting in this file does not strictly follow any reference, but tends to be more aligned with Halpern.
APPENDIX F. CONDITIONS FOR USING SKIP DATA by Jack Halpern (jack@kanji.org)
Ever since my New Japanese-English Character Dictionary (NJECD) came out (Kenkyusha 1990, NTC 1993), I have been getting inquiries asking for permission to use SKIP (System of Kanji Indexing by Patterns) data in software products and electronic dictionaries. Below I explain the policy of the Kanji Dictionary Publishing Society (KDPS) on copyright issues when distributing SKIP data or using it in a software product or electronic dictionary.
WHAT IS SKIP?
Briefly, SKIP is an indexing system that enables the user to locate kanji quickly and accurately. The system is extremely convenient because it can be learned in a very short time, is easy to use, and requires very little prior knowledge of kanji.
The central idea of SKIP is the classification of characters into four major categories on the basis of easy-to-identify geometrical <patterns>:
1. Left-right 2. Up-down 3. Enclosure 4. Solid
Characters belonging to the first three categories are arranged in ascending order of hyphenated numerals that represent the number of strokes in the <shaded part,> and the number of strokes in the <blank part.> See http://www.kanji.org and NJECD front matter for details.
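As a minimal sketch of how the hyphenated numerals might be handled in a program, the Python below parses a SKIP number, sorts a few numbers in ascending numeric order, and derives the total stroke count. It follows the description in this document: for patterns 1 to 3 the second and third parts are the stroke counts of the shaded and blank parts, while for pattern 4 the second part is the total stroke count and the third is the solid subpattern. The function names parse_skip and total_strokes are hypothetical. The numbers 1-4-5 and 4-3-1 are the worked examples from the SKIP appendix; the other two are made up.

```python
def parse_skip(code: str):
    """Parse a SKIP number such as "1-4-5" into a tuple of three ints."""
    parts = code.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"malformed SKIP number: {code!r}")
    pattern, second, third = (int(p) for p in parts)
    if pattern not in (1, 2, 3, 4):
        raise ValueError(f"unknown pattern number: {pattern}")
    if pattern == 4 and third not in (1, 2, 3, 4):
        raise ValueError(f"unknown solid subpattern: {third}")
    return pattern, second, third


def total_strokes(code: str) -> int:
    """Total stroke count implied by a SKIP number."""
    pattern, second, third = parse_skip(code)
    # Patterns 1-3: shaded part + blank part. Pattern 4: second part is the total.
    return second if pattern == 4 else second + third


codes = ["4-3-1", "2-3-3", "1-11-2", "1-4-5"]
ordered = sorted(codes, key=parse_skip)
print(ordered)   # ['1-4-5', '1-11-2', '2-3-3', '4-3-1']
print(total_strokes("1-4-5"), total_strokes("4-3-1"))   # 9 3
```

Sorting on the parsed tuple rather than on the raw string matters: as text, "1-11-2" would sort before "1-4-5", which is not the ascending numeric order the system calls for.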
To distribute SKIP data within a group or use it in a commercial or non-commercial product, please confirm that you agree to the following conditions:
SKIP data is protected by copyright, copyleft and patent laws. The copyright holder is Jack Halpern, chief editor of KDPS (the Kanji Dictionary Publishing Society). The SKIP data must be protected from illegal copying and distribution, using such measures as encryption. The data must be encrypted if it is to be used in any kind of product, including commercial products, software and freeware. The data, or extracts from it, must not be distributed to a third party, must not be sold as part of any commercial software package, and must not be incorporated in any published dictionary or other printed document without the specific permission of the copyright holder.
The source of SKIP data shall be acknowledged in the information screens of the product, and the following disclaimer should appear in the documentation and/or help screens:
"SKIP (System of Kanji Indexing by Patterns) numbers are derived from the New Japanese-English Character Dictionary (Kenkyusha 1990, NTC 1993) and The Kodansha Kanji Learner's Dictionary (Kodansha International, 1999). SKIP is protected by copyright, copyleft and patent laws. The commercial or non-commercial utilization of SKIP in any form is strictly forbidden without the written permission of Jack Halpern, the copyright holder. Such permission is normally granted. Please contact jack@kanji.org and/or see http://www.kanji.org."
SKIP is a product of seven years of computer-assisted research and experimentation on how kanji elements are intuitively perceived in terms of their parts. Development work was financed by private funds and research grants. To enable us to continue to develop useful data and products, we ask for your cooperation by paying KDPS (the Kanji Dictionary Publishing Society) a royalty of 0.5% (negotiable) if you are using the data for a commercial product. Depending on the circumstances, it is also possible to use SKIP data free of charge or at a lower royalty.
Finally, please send a copy of your product to Jack Halpern.
AN OVERVIEW OF THE DE ROO SYSTEM
[This document contains the text found in the second edition of "2001 Kanji" edited by Joseph R. De Roo and published by Bonjinsha.]
The system used in "2001 Kanji" is intended for the beginner who encounters a kanji and wants to look it up, knowing neither its radical, pronunciation, nor its exact number of strokes. The method consists of looking at the top of the kanji, and then at its bottom, disregarding its other parts.
"2001 Kanji" provides drawings for all graphic elements. This information cannot be reproduced here. However, an attempt was made to describe each element as much as possible given the constraints of a computer text file, and examples of characters possessing the element are always given.
Two-step visual method for locating a kanji: 1. Observe its EXTREME TOP or LEFT TOP. There are only four possibilities: DOT (Ц), VERTICAL LINE (¡Ã), DIAGONAL LINE (¥Î), HORIZONTAL LINE (°ì). Each of these four strokes can occur either in isolation or in connection with one or more strokes. Each of the four groups of graphic elements correspond to the four basic strokes in their immediate environment. Each element has a number wich will become the first half of the kanji number. DOT (Ц): 3 DOT (Ц) ÇÈ Îä ±Ê ¿´ ɬ ³Ú ¿Þ 4 ROOF (е) µþ 5 DOTTED CLIFF (Öø) Ä£ ¼À 6 ALTAR ¡¡ ¡¡ Îé Èï Ç· 7 KANA U (Õß) °Â 8 LID ¡¡ ¡¡ Çò ÎÉ ¿ã ½® ¼« ¿È µ´ Åç ¸þ ½° 9 HORNS ¡¡ ¡¡ °Ù Äï Á° ³Ø ¸· µó Áã VERTICAL LINE (¡Ã): 10 SMALL ON BOX ¡¡ ¡¡ ¶È ·ô ÊÆ È¾ ¾° ÊÀ ¸÷ Åö ¾Ó 11 SMALL (¾®) ¾® ¿å ɹ À ϧ 12 VERTICAL LINE (¡Ã) »Ý ÅÀ »ß ¸© ¿Í µ¢ ¸â Ò¸ ¼ý ÊÒ Ãû ¡¡ Èó ÅÍ Àî ½£ »³ 13 HAND TO THE LEFT ¡¡ ¡¡ »ý 14 CROSS (½½) ¸ ¼Ô ¼° Âç É× ÁÕ À£ µá ±¦ Ë® ¼· ¡¡ ÅÚ ÆÇ Íè ºÊ ÆÖ 15 CROSS ON BOX (¸Å) ¿¿ Æî ¼Ö ·Ã Åì Ä« « »É ¸Å »ö 16 KANA KA (¥«) ´Ý ½ñ °ã Æâ Éý Èé Ãæ ¿½ ±û Äá 17 WOMAN (½÷) °ù 18 TREE (ÌÚ) ËÜ 19 LETTER H (×°) Çü ³× °æ ´Å ÂÓ À¤ ʦ ¶ ¶Ê Áâ DIAGONAL LINE (¥Î): 20 KANA NO (¥Î) ¼õ º© ¹Ô ˳ ë Ȭ Éã 21 MAN TO THE LEFT (¥¤) »ø 22 THOUSAND (Àé) ×Û ·Ï ¼á Íø ¾£ ²æ ¼ê Àé ÌÓ ¿â ¾è ¡¡ ½Å 23 MAN TO THE TOP ¡¡ ¡¡ ̵ ´¿ ¸á Ìð ÃÝ ²· Ëè ¸ð 24 COW (µí) ¾Ç µí ¼º ¼ë ¹ð À¸ Àè À© 25 KANA KU (¥¯) ³° Á³ Ôé µ× ³Ñ Ò± 26 HILL TOP ¡¡ ¡¡ »á Äþ ·Þ α Íñ ÀÍ ¹¡ ½â °õ ÃÊ 27 LEFT ARROW (¡ã) Âæ Öß »å ÍÄ ¶¿ 28 ROOF (¢Ê) ¶â ¿© ÁÒ ²ñ ²ð 29 X (¡ß) ÈÈ ´¢ ´õ »¦ HORIZONTAL LINE (°ì): 30 HORIZONTAL LINE (°ì) ¸À Éû Ʀ ¸Í Æõ Îï ¼¨ ¸µ ±¾ 31 FOURTH (Ãú) Îó »ê ±« Ãà ÉÔ Ëü Å· ¹¹ ²Ä ²¼ ¸ß ¡¡ ¸Þ Ê¿ ¹© ²¦ 32 BALD (Ѻ) ²ç »à À¾ Í× ÆÓ ·Á ¼ª °¡ ¼¥ 33 CLIFF (ÒÌ) ÀРä Îå ¸¶ È¿ 34 TOP-LEFT CORNER ¡¡ ¿Ã ÇÏ Ä¹ °å 35 TOP-RIGHT CORNER ¡¡ Æþ ȯ ͽ Ëô Åá λ ×® ²µ Èô µÝ ¿Ò ¡¡ ·¯ Á 36 UPSIDE-DOWN CAN (ÑÄ) Ʊ ÑÜ »Í »® ÅÄ ¹ü ð Êì ÆÌ ±ú 37 MOUTH (¸ý) Õù ¼Ü À× Â Ì± 38 SUN (Æü) ¨ º± Ìç 39 EYE TOP ¡¡ ·î ÌÜ ³î ³ ¸« 2. Observe its EXTREME BOTTOM or RIGHT BOTTOM. 
There are nine possibilities: DOT (Ц), LEFT HOOK (Ð), VERTICAL LINE (¡Ã), RIGHT HOOK, DIAGONAL LINE (¥Î), BACK DIAGONAL LINE (¡³), BOTTOM OF HEAD É¥, BOTTOM OF WATAKUSHI ÒÓ, HORIZONTAL LINE (°ì). They are listed in association with one or more strokes. The number of the bottom element will become the second half of the kanji number. DOT (Ц): 40 FOUR DOTS ¡¡ ̵ ×Û ±÷ Åß ½Â ´¨ ¿Ô 41 SMALL (¾®) µþ ¾® ¸¶ ¼¨ ; ÀÖ »å ¸© 42 WATER (¿å) µá ±Ê ɹ ¿å LEFT HOOK (Ð): 43 KANA RI (¥ê) Íø 44 SEAL (ÒÇ) ʦ Ĥ Äï ÒÇ »Ô Éô 45 SWORD BOTTOM (Åá) ±ß ³Ñ ǵ ÎÏ Ëü Åá ¿Ï 46 MOON (·î) ÌÀ Í 47 DOTLESS INCH ¡¡ ºÆ ºý Õú в Í· Í¿ Êì Ëè ð »Ò ¾µ ¡¡ ¼ê ¿È ºÍ ²ç 48 INCH (À£) ½® 49 MOUTH LEFT HOOK ¡¡ ¡¡ ¼þ ²Ä »Ê ¶É 50 BIRD BOTTOM ¡¡ ¡¡ Ä» 51 ANIMAL (ÌÞ) ìµ Êª 52 BOW BOTTOM ¡¡ ¡¡ µà µÝ Åç Ò± 53 LEFT HOOK (Ð) ±© Ìç Ãú λ ÍÑ ºý ÑÄ ±© VERTICAL LINE (¡Ã): 54 VERTICAL LINE (¡Ã) ÉÔ ËÎ Æã Èó ÊÒ ¶Ô Àî ʹ 55 CROSS (½½) ¶« Á¤ ÅÍ ´³ ÍÓ ËÜ ¿½ Áá ¼Ö ÀÍ Ãæ ¡¡ ³× ¼ª ×° °æ RIGHT HOOK: 56 RIGHT HOOK ¡¡ ¸Ê Ìé Ýã ²µ ´¤ ÑÜ »á ×µ ε Ò¸ 57 LEGS (ѹ) Õ÷ µ´ Ѻ ȯ Ãû ´Ý ¹Ó 58 HEART (¿´) Ç° 59 TASSELED SPEAR BOTTOM ¡¡ Øù ɬ DIAGONAL LINE (¥Î): 60 KANA NO (¥Î) ×Ä ¾¯ º£ ͼ Õù À¼ Õú µÕ BACK DIAGONAL LINE (¡³): 61 SMALL PODIUM ¡¡ Âþ ³ ¸² ¶¦ Ï» 62 BACK KANA NO (¡³) °ç °Ê ¼Ü µ× Æþ ²Ð ±» ¿Í Öß 63 BIG (Âç) Ìð ÂÀ Å· ¾Ð É× ¼Â ·ð Íè ·è ±û 64 TREE (ÌÚ) Ûù « Åì ¼ë Íè ¾è Ãã ̤ ²Ó ·ó ½Ò ¡¡ ÊÆ 65 SMALL SPOON ¡¡ °á ´Ä º± ι Ĺ ä ÇÉ ½° 66 GOVERN (Щ) ¿Þ Éã ¾æ ʸ Ú¾ Íù ¹¹ 67 AGAIN (Ëô) Ìë Ôé µÚ Ôé ×® 68 WINDY AGAIN (ÝÕ) Ìò 69 WOMAN (½÷) °Â HEAD BOTTOM: 70 HEAD BOTTOM ¡¡ É¥ Äê  Áö Ç· WATAKUSHI BOTTOM: 71 WATAKUSHI BOTTOM ¡¡ Ãî Öö ÒÓ HORIZONTAL LINE (°ì): 72 HORIZONTAL LINE (°ì) ¹© ÅÚ ²¦ ¾å À¸ Τ ¿â ¶Ì ¶â ¶Ë ¸ß 73 STANDING BOTTOM ¡¡ °¡ »ß ³î ¸Þ µÖ Ω Ʀ 74 DISH BOTTOM ¡¡ Ê »® 75 BOTTOM CORNER ¡¡ ÆÌ ±ú µÔ Åö ľ Ñá Ò¹ ð² À¾ ÆÓ Ë´ ¡¡ À¤ 76 MOUNTAIN (»³) Àç ½Ð ÍÉ ´Ì ÅÄ Í³ ²Á Í© ¶Ê ÌÌ 77 MOUTH (¸ý) Àê ¸Å Àå Ϥ ´± 78 SUN (Æü) Çò É´ ´Å 79 EYE (ÌÜ) ¼ó ½â ¼« The number of the kanji you are looking for consists of the top number coming first and the bottom number coming second, the 
two numbers being placed side by side. E.g., ´Á 363 (3 63), »ú 747 (7 47). There are two rules always to keep in mind: a. Ignore the complete enclosure Óø and the "road" radical (as in Æ»). Look at the top and bottom (in some cases only the bottom) of what is inside the complete enclosure, and of what is to the upper right of "road". E.g., ¼ü 1262, ¸Ä 2177, Æ» 979, Ë¥ 2755. b. When a part is enclosed by the "gate" radical, take the bottom or right bottom of that part. E.g., Æ® 3848, Íó 1864.
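Because the top elements listed above are numbered 3 to 39 and the bottom elements 40 to 79, the bottom element always occupies the last two digits of a De Roo number. The Python sketch below splits and rejoins a number on that basis. The function names are hypothetical, and the sample numbers are the ones quoted in the examples above.

```python
def split_de_roo(code: int):
    """Split a De Roo number into (top element, bottom element)."""
    top, bottom = divmod(code, 100)     # bottom element = last two digits
    if not 3 <= top <= 39:
        raise ValueError(f"top element out of range: {top}")
    if not 40 <= bottom <= 79:
        raise ValueError(f"bottom element out of range: {bottom}")
    return top, bottom


def join_de_roo(top: int, bottom: int) -> int:
    """Place the top and bottom numbers side by side."""
    return top * 100 + bottom


for code in (363, 747, 1262, 2177, 979, 2755, 3848, 1864):
    print(code, split_de_roo(code))
```

For instance, 363 splits into top element 3 and bottom element 63, and 1262 into 12 and 62. The range checks simply reject numbers that cannot have been formed from the two element lists.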