Monday, August 18, 2008

Learning Kanji - It's Called Literacy, Dumbass!

Do You Have To Know All The Joyo Kanji?
I keep seeing the question "Do I really need to learn all the joyo kanji? There are like two thousand of them and that seems a bit too much..."

You know, that list was created by a bunch of bureaucrats who have nothing better to do than invent stupid lists and laws all day. The order of the kanji in the list is especially insane - in many cases complex, compound characters come before the components they're made up of, for instance.

Also, the word 常用 (jouyou) means "daily use", right? Go out any exit of any train station in Tokyo and look around, and tell me if you don't see the kanji 丼 (don). That kanji is not on the list. I guess the bureaucrats don't eat domburi, but that's their loss. On the other hand, they put 匁 (monme) on the list - and that kanji is so stupid and useless I can't help but remember it, but I've never, ever, seen it used.

So, no, you don't have to learn all the joyo kanji! There are maybe 20, maybe even 50 or more on the list that you actually don't need to know. But here's the catch: you have to learn a lot more than that!

How Many Kanji Do You Actually Need To Know?
Unlike the last question, where I held on to the answer until the very last paragraph, I'm gonna answer this one right away: maybe about 3,000. Now, granted, I pulled that number out of my arse, but I think it's a decent estimate. That's in order to be considered literate in Japanese... If you don't need to be able to read all male given names (yeah I understand it's cool to give your kid some uncommon character, but come on...).

You see, even though school teaches the roughly 2,000 joyo kanji until high school graduation, most Japanese people can read more than that, even by the time they finish high school. That's what happens if you spend 19 years surrounded by kanji. Non-joyo kanji are not uncommon - I'm speaking from experience here - and in fact most Japanese people neither know nor care about that stupid list - kanji are just characters you use to write stuff.

But 95% Is Good Enough For Me - Or Is It?
When I started learning Japanese, and in fact sometimes since then as well, I've seen statistics saying that the 1,000 most frequent characters cover 90% of the kanji used, and the 2,000 most frequent cover 98%. That might very well be true - I believe those numbers are more or less correct.

So then a seemingly valid, and common, argument would go something like "I don't need to be able to read specialized texts - or even the newspapers - manga/technical specs/email/whatever is enough for me, so being able to read 80/90/95/98% of the kanji is all I need".

I used to think a little bit that way too, to be honest. But there's a fundamental fault in that reasoning. Yes, no one needs to understand, or can ever hope to understand, 100% or even 99% of everything - I mean, a lot of stuff in this world is meant for specialists in a particular field - but that's not the same as not being able to read the characters it's written in. That is called illiteracy! And kids: say Yes to mild stimulants, and No to illiteracy - it's the bad.

Let me make up an example. This isn't gonna be the best example ever but bear with me as I'm just making this up. Let's take a word like 国立造幣局. Now, three of those kanji (国・立・局) are very easy - I'm sure they were among the first one or two hundred I learned. 造 is also pretty easy, it's at least below JLPT level 2, and very common. But 幣 is not very common, and has a somewhat specialized meaning (but it's on the joyo list and not knowing it constitutes illiteracy). So in that five-kanji word 60% of the kanji are trivial, and 80% are easy.

But then there's one that - while definitely not complicated - is at least JLPT level 1-worthy in difficulty. Yet that's the one kanji that conveys most of the meaning, not to mention you can't pronounce the word without knowing its reading, so those who settled for 90% of the kanji will be 0% literate in this example.
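To make the gap between "percent of kanji known" and "words you can actually read" concrete, here is a minimal Python sketch. The function names and the imaginary learner's kanji set are just assumptions for illustration; the word is the one from the example above.

```python
def char_coverage(text, known):
    """Fraction of the characters in `text` that are in the `known` set."""
    return sum(ch in known for ch in text) / len(text)


def readable_words(words, known):
    """Words whose every character is known, i.e. words that can be read."""
    return [w for w in words if all(ch in known for ch in w)]


# Hypothetical learner who knows the four easy kanji but not 幣.
known = set("国立造局")
word = "国立造幣局"

print(char_coverage(word, known))     # 0.8
print(readable_words([word], known))  # []
```

So 80% of the characters are covered, yet the list of readable words comes out empty.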

Look at it this way: say that pronouns, prepositions, articles, and conjunctions made up 50% of the words used in an average English sentence. Then, would someone who decides to learn only the pronouns, prepositions, articles, and conjunctions of English be able to get the meaning of an ordinary English sentence? Of course not! Even though that person would understand at least 50% of the words used. Now, kanji are characters used to write words and not words in themselves, but anyway... if you want to get down to the monkey's balls with the Japanese language, you have to learn kanji thoroughly. And you might as well do it right away.

-

Now I hope we have established a shared understanding that almost perfect kanji literacy is indispensable for the Japanese language learner. Next, I will be writing about how that literacy is best achieved.


Tuesday, August 12, 2008

JLPT1 Progress - Vocabulary Aside: Good

As I've mentioned before, I intend to pass the Japanese Language Proficiency Test (JLPT), Level 1 - the highest level - this December. And I'm going to pass it with a good margin - defined as a score of above 80% (70% is needed to pass).

I've now done a mock test, using the Unicom book that contains two mock tests, to get a grasp of where I'm at and what I need to focus on. I can highly recommend that book, by the way. I used it for level 2 as well. Besides the tests, it helps you analyze your weaknesses and gives tips on what you need to study.

Anyway, here's a breakdown of my scores:
  • Kanji: 82%
  • Vocabulary: 64%
  • Listening: 72%
  • Reading: 68%
  • Grammar: 78%
Interestingly, that means the average score for each of the three sections (kanji/vocabulary, listening, and reading/grammar) is 72% - quite a coincidence.
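As a rough check of how section percentages turn into a point total, here is a small Python sketch. The section maximums (100 + 100 + 200 = 400 points) are my assumption about how the test is weighted; the 72% per section and the 70%/80% thresholds are the figures from this post.

```python
# Assumed section maximums: 400 points in total.
SECTION_MAX = {"kanji/vocabulary": 100, "listening": 100, "reading/grammar": 200}

# Section averages from the mock test, in percent.
mock_pct = {"kanji/vocabulary": 72, "listening": 72, "reading/grammar": 72}

points = {name: mock_pct[name] * SECTION_MAX[name] / 100 for name in SECTION_MAX}
total_max = sum(SECTION_MAX.values())
total = sum(points.values())
overall_pct = 100 * total / total_max

pass_points = 70 * total_max / 100    # points needed to pass
target_points = 80 * total_max / 100  # points needed for the 80% goal

print(total, overall_pct)             # 288.0 72.0
print(pass_points, target_points)     # 280.0 320.0
```

Under those assumptions the mock result clears the pass line by 8 points and is 32 points short of the 80% goal.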

So what to make of this? First: it's a pass, with a 288 p/72% score. That also means I'm on track for my goal to pass with more than 80%. When I do the mock test at home I'm more strict than at the real thing in that I don't choose randomly when I don't have a clue, and I try to finish it as fast as possible - I don't stop to think and I don't use spare time for reviewing.

I do that because I want data on how much time I actually need, so that I can plan how much time to spend on the different parts during the real test - potential points vs time. For the reading section I had more than 15 min to spare, so I think this affects the end result by a few percentage points. Also, when I did the same thing for 2-kyu two years ago at this time, my score was barely above 60%, but on the real thing I scored 81%, so I think my mock test scores also come out lower because I don't concentrate as much as I do on the real test.

Second, the surprises: grammar score is high, reading is a bit low, and listening is lower than expected. I haven't studied grammar really, but my studies consist mostly of reading, so I would have expected reading to be higher and grammar much lower. I felt very uncertain when answering many of the grammar problems even though I passed them. The only reason I can think of is that my book reading and radio listening have made me grasp grammar intuitively, much like a native speaker would.

The low reading score might be caused by me doing that section after coming home from work. I felt very tired by the end... And as I mentioned above I didn't do any reviewing using spare time. After all, I read normal (actually, some of them are probably more academic than most people prefer to read) books written for native readers pretty much every day, and I don't feel I'm missing out on the content of those books, so I don't think my reading skill is bad. And time is definitely not a problem - my Japanese reading speed is good.

Low score on listening, despite listening to the radio for a few hours every day, I think was mostly caused by me not being up to date on the vocabulary used. Describing how people look and asking strangers for directions might be very common textbook examples, but it's not something you do very often in real life... I am going to go through the Unicom listening comprehension book for 1-kyu as well, which contains the equivalent of about 4 tests' worth of exercises, and that should be enough to easily get me above 80%.

Third, as expected: kanji is my strongest point and vocabulary is my weakest. Kanji are natural for me now, although recently I've been working on improving my kanji skills even more (I'll write about my study methods some other day). But acquiring vocabulary is tough! I don't really like repeating words or sentences or anything like that - I'm lazy - so I just hope to pick things up after seeing them enough times in books and news articles, and from hearing them. The vocabulary used in JLPT is somewhat specific and specialized, albeit limited, and I have not been reading material specifically targeted at the test. Here as well, I am going to rely on Unicom, namely their reading comprehension book. But I'll probably hold off on that until right before the test and keep reading normal literature that I enjoy for now.

Lastly for this post, I'd like to mention one more ingenious scheme I've come up with to extract more data from doing mock tests: marking the certainty of the answers. I mark them in essentially 4 degrees, although I only make physical marks for 2: feel quite certain (no mark), feel a bit hesitant (one dot), feel like I'm mostly guessing (two dots), and don't have a clue (no answer). Afterwards, I compile the percentage of correct answers for each certainty level (the last level is obviously 0%). A stimulating paper exercise if there ever was one! But this time it also told me one thing: whether I feel certain or hesitant doesn't affect the score. For the two-dot level, though, the probability of a correct answer is halved. In other words I can go ahead and use my intuition even if I feel a bit hesitant, which saves time, and focus my reviewing (using time left after answering all questions) on the few questions that I felt very uncertain about.
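If you'd rather not do the tallying on paper, the compilation step could be sketched in Python roughly like this. The level names and the sample answers are made up for illustration (they are not my actual mock test data); only the idea of grouping answers by certainty level comes from the scheme above.

```python
from collections import defaultdict


def accuracy_by_certainty(answers):
    """Take (certainty_level, is_correct) pairs, return {level: percent correct}."""
    tally = defaultdict(lambda: [0, 0])  # level -> [correct, total]
    for level, is_correct in answers:
        tally[level][0] += int(is_correct)
        tally[level][1] += 1
    return {level: 100 * correct / total for level, (correct, total) in tally.items()}


# Made-up sample answers, one (level, correct?) pair per question.
sample = (
    [("certain", True)] * 3 + [("certain", False)] * 1
    + [("one dot", True)] * 3 + [("one dot", False)] * 1
    + [("two dots", True)] * 3 + [("two dots", False)] * 5
    + [("no answer", False)] * 2
)

print(accuracy_by_certainty(sample))
# {'certain': 75.0, 'one dot': 75.0, 'two dots': 37.5, 'no answer': 0.0}
```

With a table like that it's easy to see at which level your intuition stops being trustworthy.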

Anyhow, I'm interested in hearing about others' progress on the JLPT, and if you're blogging about it, please post a link in a comment. Please also post comments on your own findings regarding the test. I'm quite excited about the test itself, besides becoming fluent in Japanese!

In the near future I also intend to write something about what I've learned about learning - because I feel I'm really getting into that now, and I'm already looking forward to the next language learning adventure - and also about my own study methods targeting JLPT1, and something about learning Japanese vs passing the JLPT.

Don't forget to apply!


Saturday, July 5, 2008

What Language Is This? Dot Com!

Since the language analyzer is becoming one of the most used web services that I run, the other day I was thinking that it would be cool to get it its own domain (and a .com domain costs just 50 SEK (around 850 yen in normal times) anyway). So I was thinking about what domain name to get - one that isn't already taken - and well, one of the most common search phrases people use to find the language analyzer is "what language is this webpage/blog/text/whatever" and luckily whatlanguageisthis.com was available, so there it is! I think it's quite easy to remember and very easy to tell people. 4 stars out of 5, perhaps? Pretty good.


Setting up the new site was pretty easy; it's essentially just a php script that chdirs into the language analyzer directory and continues from there as before.

I also did another nice update: the data file that the app uses to identify the language is now downloaded after the page and all the application javascript files have loaded. That means the page should load much faster, and the user can start reading the instructions or entering text while the data is being downloaded in the background. If the user clicks "Go" before the data file is downloaded, it will stop and wait, while displaying a typical web 2.0-ish loading indicator.
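The actual site does this in JavaScript, but the load-in-the-background-and-wait-on-"Go" pattern can be sketched in Python as a stand-in. Everything named here (`LanguageData`, `slow_loader`, the tiny profile dictionary) is hypothetical and only meant to show the idea, not the app's real code.

```python
import threading
import time


class LanguageData:
    """Starts loading data in the background; get() waits only if needed."""

    def __init__(self, loader):
        self._ready = threading.Event()
        self._data = None
        threading.Thread(target=self._load, args=(loader,), daemon=True).start()

    def _load(self, loader):
        self._data = loader()
        self._ready.set()  # signal that the data has arrived

    def get(self, timeout=None):
        # This is the moment where the loading indicator would be shown.
        if not self._ready.wait(timeout):
            raise TimeoutError("language data is still loading")
        return self._data


def slow_loader():
    time.sleep(0.1)  # stand-in for downloading the data file
    return {"en": ["the", "and"], "sv": ["och", "att"]}


data = LanguageData(slow_loader)  # returns at once, "page" is usable
profiles = data.get(timeout=5)    # "Go" clicked: blocks until the data is in
print(sorted(profiles))           # ['en', 'sv']
```

The point of the design is that the wait only happens if the user is faster than the download.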

I'm planning to add support for more languages soon, and improve identification of similar-looking languages even further. Anyway, here's the url for the new site again:
http://whatlanguageisthis.com/


Tuesday, July 1, 2008

Revised JLPT Announced - New Test Same As Old!

I was reading the JLPT home page (or whatever you'd want to call it) yesterday to see when applications for this year's test will be available (July 15) and if there's some place close where they are sold (there is: the Yurindo bookstore in Atre Ebisu where I always buy books).

Anyway, I also found this shocking announcement of changes to the JLPT! Shocking not in itself nor in its scope, but because they finally got around to doing it. For my fellow students who have not yet reached level 2, here are the changes in a nutshell:
  • Starting July 2009, exams for level 1 and 2 will be held twice a year.

  • Starting 2010, the test itself will change. There will be 5 levels, N1-N5.

The new levels will be laid out like this:
  • N1: Like the current level 1, but with a somewhat higher scope.

  • N2: Like the current level 2.

  • N3: Between the current level 2 and level 3.

  • N4: Like the current level 3.

  • N5: Like the current level 4.

In other words, there'll be a new level between levels 2 and 3, and level 1 will be adjusted to be a little bit harder. There will be no changes in the composition or methodology of the test. So still the same parts, same scoring, still only multiple-choice questions, no writing, no speaking, etc. The "N" stands for "Nihongo" and "New"... bit lame if you ask me. ;)



So what to make of this? Having the test twice a year is definitely a good thing. I would have taken it now in July if it was available. As we all know, passing a test doesn't just mean studying the target of the test, one must study the test itself too (unless it's really below one's level). I wonder why they only included levels 1 and 2 for the July exams though. Going from level 4 (N5) to level 3 (N4) in half a year should be possible...

The new N3 level: a good motivator perhaps for people struggling between old level 3 and old level 2? That was probably the largest gap between the levels, since it meant going from essentially only trivial kanji to actually being able to read some real material. But since everything below level 2 is hobby level without practical significance, I can't help thinking that part of the reason is to make more money from applications... as mentioned in the report, there are now over 3 million students of Japanese worldwide, and with each application costing 5,500 yen, that's serious money.

Changing level 1: it would have been nice to get a little more concrete information regarding that change. They essentially say "it's gonna be the same... but a teeny weeny bit harder", which isn't very informative. I would have liked to see one more new level above level 1. As I'm approaching level 1, I still feel there's further to go to reach Japanese fluency. A new top level would not only certify that, but also serve as motivation. Well, at least there's the Business Japanese test and kanji kentei...

Anyhow, I'm still on track to pass the good old level 1 in December. I'll probably take it the following years too to make sure I'm still progressing. Might as well take the new N1 level in 2010... Keep studying, everyone!


Friday, June 27, 2008

家に帰らない男たち - Guys Who Don't Go Home

Owing perhaps to what seems like a strong strain of introspectiveness, there are a lot of books in Japanese about what it means to be Japanese. They don't get translated and seldom get any attention outside Japan though. Since I'm interested in both Japanese society and the language this suits me well. Anyway, I thought I'd do my part and write something about one of these books.


It's called 家に帰らない男たち (Guys Who Don't Go Home, roughly) by 松井 計 (Kei Matsui). The book is about men who don't return home after work, many of them having a family that they only see on weekends.

The book has six chapters, each chapter focusing on one particular man and his situation. Following is an outline of the chapters:
  1. A Guy Who Doesn't Go Home? A Guy Who Can't Go Home?
    44 y.o. advertising agency worker
    Started not going home just after getting married and changing jobs, because he had to work late and the commute was too long. Got divorced but still maintains the mostly unused house in the suburbs, two-and-a-half hours from his workplace. Sleeps in capsule hotels and likes to go out drinking on weekdays after work. Sees his kids on the weekends but always brings them to his parents' home instead of the house they grew up in. The reason why he retains the house is something of a mystery.

  2. A Dreamless Person Chasing Dreams
    22 y.o. guy who does day jobs for dispatch companies
    Came to Tokyo to get "big", but can't really define what that means. Won't return home until he's "made it" in Tokyo. Says it's important to be independent and take care of himself but still lets his parents pay the mobile phone bill. Sleeps at net/manga cafes. Seems generally quite stupid to me but the author stresses that he is at least polite.

  3. Going Home Is Scary
    43 y.o. salaryman
    Came from the countryside and made it as a sales guy in Tokyo. Has a home in the suburbs and a family. Gets on the train home every day, but when nearing his station, feels scared and gets on the train going back into the city again. Says he doesn't want to ruin the perfect balance of his home, which he thinks is what would happen if he was there on weekdays, but enjoys spending perfect weekends with his wife and kids. Sleeps at capsule hotels or saunas or, to save money, at the office.

  4. Weekend Marriage
    38 y.o. high-earning IT industry salaryman
    Spends only the weekends in the house with wife and daughter. Used to rent an apartment between the office and the house, but left it after realizing it was more fun to spend the night at saunas where he could chat with others. The weekend marriage is by mutual consent with the wife, whom the author also met and interviewed. Both enjoy this lifestyle, but are prepared to change it once the kid grows up and maybe starts thinking it's odd.

  5. Has Everything, No Problems
    50 y.o. salaryman-turned-self-employed
    Formerly a salaryman who was stationed all around the country by his company, and even in the Middle East for a few years, but grew tired of that and started his own company with a friend. Lives quite close to the office, but still started to think it's unnecessary to go home in the evening. Enjoys the communal aspect of staying at saunas. Kid has moved out. Returns home occasionally. Wife doesn't seem bothered.

  6. A Double Life
    46 y.o. designer
    Grew up in the sticks where everyone was expected to become a factory worker/engineer, but went to Tokyo to go into design instead. Has wife and kids, but shares an apartment with his 21 y.o. hostess girlfriend during the weeks. Wife thinks he is working hard, or at least that's what he thinks. Loves his family and realizes this can't go on forever. The girlfriend is also interviewed and she seems to enjoy the situation. The girlfriend is otherwise the female equivalent of the guy from chapter 2.
Matsui frequently makes a point of having interviewed many people as material for this book. I think the men that this book centers around are all quite stereotypical and easily imaginable - but all with some disturbed psychological twist in their heads. I'm not sure if that's because he incorporates material from other interviewees into these men, thus making them somewhat generic, or because he hasn't actually interviewed many people at all, but just invented most of it. In any case, it's an interesting read, not an academic paper.

From my own experience, I have heard Japanese coworkers say things like "the office/train is where I can relax", claiming their houses (with wives, kids, and parents) are stressful. It's not uncommon for Japanese office workers to spend all night at the office - it seems to give them credibility and respect among their peers too (despite being completely unproductive the following day). This book sheds some light on why. The communal aspect of saunas, with people napping in reclining chairs in a common area, is one thing.

The language is quite simple: not much specialized vocabulary outside of society-related concepts such as 脱サラ (quitting life as a salaryman) and プータロー (loser). Grammar is somewhere between JLPT level 2 and level 1. The author uses quite a lot of non-general use kanji, though, as well as kanji for words usually written in hiragana, and there is almost no furigana. Not because the vocabulary requires it, but because he just likes to, I suppose. That's good for learning a little extra that probably won't show up on a JLPT exam.

Anyway, this is the first of Matsui's books that I've read but it is unlikely to be the last. If you don't know who he is, he's famous for having been homeless, but he then wrote a book about being homeless and now he's a successful author, writing mostly about typical Japanese social phenomena.


Saturday, May 10, 2008

Updated the Language Identifier with ranking of most popular languages right now

Over time I've been making some smaller changes to the language analyzer (my language identification web app), like manually tuning it to better distinguish between hard-to-distinguish languages, like the Scandinavian languages, Serbian-Bosnian-Croatian-Slovenian, Afrikaans and Dutch, and Czech and Slovak.

But I've been wondering what languages people use it for, so yesterday evening, while drinking shochu (in spite of which I could only find one bug today! but I did write a processing and database-intensive function, n00b style, which I replaced with a single SQL query today...), I added logging of the results. Only when the language identification certainty is reasonably high is it logged, and only the result; the actual text inputted is not sent. This, of course, happens in the background. A language is only logged once per client, and results from clicking the "example" button (Tower of Babel extracts - I like that story) are not logged.
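The logging rules boil down to three checks, which could be sketched in Python like this. The function name, the in-memory `log`/`seen` structures and the 0.8 threshold are all assumptions for illustration; the real thing lives on the server and uses a database.

```python
def log_result(log, seen, client_id, language, certainty,
               from_example=False, threshold=0.8):
    """Record `language` if it passes the three rules; return True if logged."""
    if from_example or certainty < threshold:
        return False  # example-button results and unsure guesses are skipped
    if (client_id, language) in seen:
        return False  # each language is logged only once per client
    seen.add((client_id, language))
    log.append(language)  # only the result is kept, never the input text
    return True


log, seen = [], set()
log_result(log, seen, "client-a", "Korean", 0.95)                     # logged
log_result(log, seen, "client-a", "Korean", 0.99)                     # duplicate
log_result(log, seen, "client-a", "Thai", 0.40)                       # too unsure
log_result(log, seen, "client-b", "Hebrew", 0.99, from_example=True)  # example
print(log)  # ['Korean']
```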

This morning I added the top ranking to the page. It's generated on the server side in order for the search engines to see it. The top 5 languages for the past seven days are printed. At this time, i.e. about 15 hours after the result logging started, these are Spanish, Korean, Portuguese, and Thai.
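For the curious, a top-5-in-seven-days ranking fits in a single SQL query. Here is a sketch using Python's built-in sqlite3 module with an in-memory database; the table layout, column names and sample rows are made up for illustration and are not the site's actual schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (language TEXT, logged_at TEXT)")

# Made-up log entries: (language, how long ago it was logged).
rows = [
    ("Spanish", "-1 hours"), ("Spanish", "-2 days"), ("Spanish", "-6 days"),
    ("Korean", "-3 hours"), ("Korean", "-1 days"),
    ("Portuguese", "-5 hours"),
    ("Thai", "-4 days"),
    ("Dutch", "-10 days"),  # older than seven days, should be ignored
]
conn.executemany("INSERT INTO results VALUES (?, datetime('now', ?))", rows)

top = conn.execute(
    """
    SELECT language, COUNT(*) AS hits
    FROM results
    WHERE logged_at >= datetime('now', '-7 days')
    GROUP BY language
    ORDER BY hits DESC, language
    LIMIT 5
    """
).fetchall()

print(top)  # [('Spanish', 3), ('Korean', 2), ('Portuguese', 1), ('Thai', 1)]
```

Letting the database do the counting and sorting is what makes the n00b-style loop unnecessary.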

You can see the currently most inputted languages live: http://henrikfalck.com/languageanalyzer/


Saturday, April 26, 2008

We'll Always Have C

The other day there was an interview in Dr. Dobb's Journal with the managing director of TIOBE Software, who publishes the TIOBE Programming Community Index, a ranking of programming language popularity. It was also discussed on Slashdot.

The methodology used by TIOBE to calculate a language's popularity is basically the good old Google hits ad-hoc voodoo index, using "[language] programming" as the query. This measures the "web presence" of a programming language.
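To show what such a hit-count index boils down to, here is a toy Python sketch that turns hit counts into percentage shares. This is not TIOBE's actual formula, and the hit counts are placeholder numbers I made up, not real search results.

```python
def popularity_share(hits):
    """Turn {language: hit count} into (language, percent) pairs, highest first."""
    total = sum(hits.values())
    shares = [(lang, 100 * count / total) for lang, count in hits.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Placeholder hit counts for the query "[language] programming".
hits = {"JavaScript": 300, "C": 500, "Ruby": 200}

print(popularity_share(hits))
# [('C', 50.0), ('JavaScript', 30.0), ('Ruby', 20.0)]
```

Whatever the exact weighting, the input is still just a count of pages mentioning the phrase.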

First of all, it's obvious to you and me that this measures something, that something being the number of web pages including the term "[language] programming", obviously. There's nothing wrong with this method, as long as you are aware of what you're measuring. But is it fair to call this the popularity of a language?


Look at this blog, for example. I mostly mention JavaScript and PHP here, just like everyone else. Throw in some Ruby and Python too to max out the buzz factor. There is no mention of relics such as C in this blog. But you know what language I use ten times more than any other? C. I'd love to have a job hacking away in JavaScript, Ruby, and Python all day, but I'd have to settle for half the salary. So here it goes: C programming. Index that. Embedded, heavily multi-threaded, efficient, minimum memory, hardcore badass C programming, that's what I do, and I love doing it.

Most coders can't do C. That's why you see all these Visual This and Dot That and scripting languages on the ranking, because these kids blog about every little insignificant hobby project they manage to cut and paste together, just like I do. But let there be no mistake about it: real programmers can code in C. They do syntactically correct typedefs of function pointers in their sleep. (just kidding that's impossible.)


At work I also hack in Python, Perl, and Makefile. At home it's mostly JavaScript, PHP, Ruby, Python... Lately Python has replaced Ruby as my language of choice for home hacking because of its decent Unicode support. (Although I've had to hack the Python standard library in some places where it didn't properly support Unicode. I read that the next major version of Python (3.0) will use Unicode strings by default, which is great, and only ten years late.) I also sold my soul the other day and installed Visual C# 2008 Express Edition for some hobby hacking. It turned out not to be much fun, but I haven't given up yet.

At my previous job I used C++ for doing essentially the same thing as I do in C now. I'm completely convinced that C is the right tool for the job. I'm also convinced C does object orientation better than C++, but that is a topic for another post. And I used to be a Java fan, but now I consider Java one of the best examples of software suckiness ever. It's a volatile industry, technologies come and go, but no amount of blogging will convince me that the C programming language is anything but #1.

I'm saying it because it's true: We'll always have C. Because we've got jobs to do.
