Evan Martin (evan) wrote in evan_tech,

kanji database

kanji.db.bz2: 279 KB SQLite3 dump of Jim Breen's KANJIDIC2 (a Japanese kanji dictionary). Besides each kanji's readings and meanings, it records the school grade in which the kanji is taught and its frequency rank, derived from counting occurrences in newspapers. Perfect for studying!

(Man, parsing XML is like pulling teeth.)
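The conversion script isn't part of the post. For anyone rebuilding a similar table, here is a minimal sketch, assuming the KANJIDIC2 element layout (`<character>`, `<literal>`, `misc/grade`, `misc/freq`, and `<reading r_type="ja_on|ja_kun">` / `<meaning>` inside `reading_meaning/rmgroup`). The `load` helper, the `INTEGER PRIMARY KEY` id, and joining multiple values with `"; "` are assumptions chosen to mimic the columns in the query output further down. The sample entry is hand-written and abbreviated, with its values copied from that (truncated) output rather than from a full KANJIDIC2 record.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hand-written, abbreviated entry in KANJIDIC2's element layout. Readings and
# meanings are copied from the (truncated) query output in this post.
SAMPLE = """<kanjidic2>
<character>
  <literal>日</literal>
  <misc><grade>1</grade><freq>1</freq></misc>
  <reading_meaning>
    <rmgroup>
      <reading r_type="ja_on">ニチ</reading>
      <reading r_type="ja_on">ジツ</reading>
      <reading r_type="ja_kun">ひ</reading>
      <reading r_type="ja_kun">-び</reading>
      <reading r_type="ja_kun">-か</reading>
      <meaning>day</meaning>
      <meaning>sun</meaning>
      <meaning>Japan</meaning>
    </rmgroup>
  </reading_meaning>
</character>
</kanjidic2>"""


def load(root, conn):
    """Insert one row per <character> element (hypothetical helper)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kanji ("
        "id INTEGER PRIMARY KEY, literal TEXT, grade INTEGER, freq INTEGER, "
        "on_reading TEXT, kun_reading TEXT, meaning TEXT)")
    for ch in root.iter("character"):
        readings = ch.findall("reading_meaning/rmgroup/reading")
        on = [r.text for r in readings if r.get("r_type") == "ja_on"]
        kun = [r.text for r in readings if r.get("r_type") == "ja_kun"]
        # <meaning> elements without an m_lang attribute are the English glosses.
        meanings = [m.text for m in ch.findall("reading_meaning/rmgroup/meaning")
                    if m.get("m_lang") is None]
        grade = ch.findtext("misc/grade")  # None when the kanji has no grade
        freq = ch.findtext("misc/freq")    # None when it has no frequency rank
        conn.execute(
            "INSERT INTO kanji (literal, grade, freq, on_reading, kun_reading, meaning) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (ch.findtext("literal"),
             int(grade) if grade else None,
             int(freq) if freq else None,
             "; ".join(on) or None,
             "; ".join(kun) or None,
             "; ".join(meanings) or None))
    conn.commit()


conn = sqlite3.connect(":memory:")
load(ET.fromstring(SAMPLE), conn)
print(conn.execute(
    "SELECT literal, grade, freq, on_reading, kun_reading, meaning FROM kanji").fetchall())
# [('日', 1, 1, 'ニチ; ジツ', 'ひ; -び; -か', 'day; sun; Japan')]
```

For the real file, `ET.parse(path).getroot()` (with `path` pointing at your local copy) should stand in for `ET.fromstring(SAMPLE)`; `ET.iterparse` would keep memory use lower on a file that size.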

Unfortunately,
sqlite> select count(*) from kanji where grade < 10;
2232

So I'll basically never be more literate than a middle schooler.
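For context, KANJIDIC2's documentation defines the grade codes as: 1 through 6 for kyouiku kanji (the elementary-school grade they are taught in), 8 for the remaining jouyou kanji learned in junior high, 9 for jinmeiyou (name) kanji, and 10 for jinmeiyou kanji that are variants of jouyou ones. So `grade < 10` also counts the grade-9 jinmeiyou set. A per-grade breakdown is one `GROUP BY` away; the sketch below assumes the `kanji` table and `grade` column shown in this post, and the `grade_breakdown` helper and label strings are made up for illustration.

```python
import sqlite3

# Labels paraphrasing KANJIDIC2's documented grade codes (wording is ours).
GRADE_LABELS = {
    **{g: f"kyouiku, elementary grade {g}" for g in range(1, 7)},
    8: "remaining jouyou (secondary school)",
    9: "jinmeiyou (names)",
    10: "jinmeiyou variant of a jouyou kanji",
}


def grade_breakdown(conn):
    """Return (grade, label, count) tuples for every graded kanji (hypothetical helper)."""
    rows = conn.execute(
        "SELECT grade, COUNT(*) FROM kanji WHERE grade IS NOT NULL "
        "GROUP BY grade ORDER BY grade").fetchall()
    return [(g, GRADE_LABELS.get(g, "unknown"), n) for g, n in rows]


# Self-contained demo using the ten grade-1 literals from the sample output below.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE kanji (literal TEXT, grade INTEGER)")
demo.executemany("INSERT INTO kanji VALUES (?, ?)", [(ch, 1) for ch in "日一人年大十二本中出"])
print(grade_breakdown(demo))  # [(1, 'kyouiku, elementary grade 1', 10)]
```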

Anyway, here's a peek:
sqlite> select * from kanji where grade is not null order by grade asc, freq asc limit 10;
id          literal     grade       freq        on_reading      kun_reading      meaning
----------  ----------  ----------  ----------  --------------  ---------------  ---------------
2160        日         1           1           ニチ; ジツ  ひ; -び; -か  day; sun; Japan
76          一         1           2           イチ; イツ  ひと-; ひと  one
1455        人         1           5           ジン; ニン  ひと; -り; -  person
2177        年         1           6           ネン          とし           year
1763        大         1           7           ダイ; タイ  おお-; おお  large; big
1251        十         1           8           ジュウ; ジ  とお; と      ten
2151        二         1           9           ニ; ジ        ふた; ふた.  two
2598        本         1           10          ホン          もと           book; present;
1856        中         1           11          チュウ       なか; うち;  in; inside; mid
1270        出         1           13          シュツ; ス  で.る; -で;   exit; leave

Looks like SQLite's text output doesn't understand doublewidth characters...
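One workaround is to do the column layout yourself and count display cells instead of characters. A minimal sketch, assuming a terminal that renders Unicode East Asian Width "W" (wide) and "F" (fullwidth) characters as two cells and everything else, including "A" (ambiguous), as one; the helper names are hypothetical. The demo rows are copied from the output above, so they inherit its truncation.

```python
import unicodedata


def display_width(s):
    """Terminal cells needed for s: wide/fullwidth characters count as two."""
    return sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1 for c in s)


def pad(s, width):
    """Right-pad s with spaces until it occupies `width` display cells."""
    return s + " " * max(0, width - display_width(s))


def format_rows(header, rows):
    """Lay out header + rows in aligned columns, like sqlite3's column mode."""
    cells = [["" if v is None else str(v) for v in row] for row in [header, *rows]]
    widths = [max(display_width(r[i]) for r in cells) for i in range(len(header))]
    lines = ["  ".join(pad(c, w) for c, w in zip(r, widths)).rstrip() for r in cells]
    lines.insert(1, "  ".join("-" * w for w in widths))  # separator under the header
    return "\n".join(lines)


header = ("literal", "grade", "freq", "on_reading", "kun_reading", "meaning")
rows = [("日", 1, 1, "ニチ; ジツ", "ひ; -び; -か", "day; sun; Japan"),
        ("年", 1, 6, "ネン", "とし", "year")]
print(format_rows(header, rows))
```

The rows could just as well come straight from `sqlite3.connect(...).execute(...)`; padding by display width rather than by `len()` is what keeps the kana and kanji columns lined up.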
Tags: japanese, project