Saturday, July 5, 2008

What Language Is This? Dot Com!

Since the language analyzer is becoming one of the most used web services that I run, the other day I was thinking that it would be cool to get it its own domain (and a .com domain costs just 50 SEK (around 850 yen in normal times) anyway). So I was thinking about what domain name to get - one that isn't already taken - and well, one of the most common search phrases people use to find the language analyzer is "what language is this webpage/blog/text/whatever" and luckily whatlanguageisthis.com was available, so there it is! I think it's quite easy to remember and very easy to tell people. 4 stars out of 5, perhaps? Pretty good.


Setting up the new site was pretty easy; it's essentially just a PHP script that chdirs into the language analyzer directory and continues from there as before.

I also did another nice update: the data file that the app uses to identify the language is now downloaded after the page and all the application JavaScript files have loaded. That means the page should load much faster, and the user can start reading the instructions or entering text while the data is being downloaded in the background. If the user clicks "Go" before the data file is downloaded, the app will stop and wait, while displaying a typical web 2.0-ish loading indicator.
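The real thing is JavaScript running in the browser, but the pattern itself (start the download in the background, and only block when someone actually needs the data) is easy to sketch. Here is a minimal Python version, assuming a hypothetical `loader` callable that fetches the data file; the class and method names are made up for illustration and are not the site's actual code.

```python
import threading


class DeferredData:
    """Load a data file in the background; block only when it is needed."""

    def __init__(self, loader):
        self._loader = loader            # hypothetical callable that fetches the data
        self._ready = threading.Event()  # set once the data has arrived
        self._data = None

    def start(self):
        # Kick off the download after the "page" is up, without blocking it.
        threading.Thread(target=self._load, daemon=True).start()

    def _load(self):
        self._data = self._loader()
        self._ready.set()

    def get(self, timeout=None):
        # Clicking "Go" early ends up here: wait until the data has arrived.
        if not self._ready.wait(timeout):
            raise TimeoutError("language data not loaded yet")
        return self._data
```

The caller would run `start()` right after the page is shown and call `get()` from the "Go" handler; showing a loading indicator while `get()` waits is left out of the sketch.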

I'm planning to add support for more languages soon, and improve identification of similar-looking languages even further. Anyway, here's the url for the new site again:
http://whatlanguageisthis.com/


Tuesday, July 1, 2008

Revised JLPT Announced - New Test Same As Old!

I was reading the JLPT home page (or whatever you'd want to call it) yesterday to see when applications for this year's test will be available (July 15) and if there's some place close where they are sold (there is: the Yurindo bookstore in Atre Ebisu where I always buy books).

Anyway, I also found this shocking announcement of changes to the JLPT! Shocking not in itself nor in its scope, but because they finally got around to doing it. For my fellow students who have not yet reached level 2, here are the changes in a nutshell:
  • Starting July 2009, exams for level 1 and 2 will be held twice a year.

  • Starting 2010, the test itself will change. There will be 5 levels, N1-N5.

The new levels will be laid out like this:
  • N1: Like the current level 1, but with a somewhat higher scope.

  • N2: Like the current level 2.

  • N3: Between the current level 2 and level 3.

  • N4: Like the current level 3.

  • N5: Like the current level 4.

In other words, there'll be a new level between levels 2 and 3, and level 1 will be adjusted to be a little bit harder. There will be no changes in the composition or methodology of the test. So still the same parts, same scoring, still only multiple-choice questions, no writing, no speaking, etc. The "N" stands for "Nihongo" and "New"... bit lame if you ask me. ;)



So what to make of this? Having the test twice a year is definitely a good thing. I would have taken it now in July if it had been available. As we all know, passing a test doesn't just mean studying the target of the test; one must study the test itself too (unless it's really below one's level). I wonder why they only included levels 1 and 2 for the July exams though. Going from level 4 (N5) to level 3 (N4) in half a year should be possible...

The new N3 level: a good motivator perhaps for people struggling between old level 3 and old level 2? That was probably the largest gap in the levels, since it meant going from essentially only trivial kanji to actually being able to read some real material. But since everything below level 2 is hobby level without practical significance, I can't help thinking that part of the reason is to make more money from applications... as mentioned in the report, there are now over 3 million students of Japanese worldwide, and with each application costing 5,500 yen, that's serious money.

Changing level 1: it would have been nice to get a little more concrete information regarding that change. They essentially say "it's gonna be the same... but a teeny weeny bit harder", which isn't very informative. I would have liked to see one more new level above level 1. As I'm approaching level 1, I still feel there's further to go before I reach Japanese fluency. A new top level would not only certify that, but also serve as a motivation. Well, at least there's the Business Japanese test and kanji kentei...

Anyhow, I'm still on track to pass the good old level 1 in December. I'll probably take it the following years too to make sure I'm still progressing. Might as well take the new N1 level in 2010... Keep studying, everyone!


Friday, June 27, 2008

家に帰らない男たち - Guys Who Don't Go Home

Owing perhaps to what seems like a strong strain of introspectiveness, there are a lot of books in Japanese about what it means to be Japanese. They don't get translated and seldom get any attention outside Japan though. Since I'm interested in both Japanese society and the language, this suits me well. Anyway, I thought I'd do my part and write something about one of these books.


It's called 家に帰らない男たち (Guys Who Don't Go Home, roughly) by 松井 計 (Kei Matsui). The book is about men who don't return home after work, many of them having a family that they only see on weekends.

The book has six chapters, each chapter focusing on one particular man and his situation. Following is an outline of the chapters:
  1. A Guy Who Doesn't Go Home? A Guy Who Can't Go Home?
    44 y.o. advertising agency worker
    Started not going home just after getting married and changing jobs, because he had to work late and the commute was too long. Got divorced but still maintains the mostly unused house in the suburbs, two-and-a-half hours from his workplace. Sleeps in capsule hotels and likes to go out drinking on weekdays after work. Sees his kids on the weekends but always brings them to his parents' home instead of the house they grew up in. The reason why he retains the house is something of a mystery.

  2. A Dreamless Person Chasing Dreams
    22 y.o. guy who does day jobs for dispatch companies
    Came to Tokyo to get "big", but can't really define what that means. Won't return home until he's "made it" in Tokyo. Says it's important to be independent and take care of himself but still lets his parents pay the mobile phone bill. Sleeps at net/manga cafes. Seems generally quite stupid to me but the author stresses that he is at least polite.

  3. Going Home Is Scary
    43 y.o. salaryman
    Came from the countryside and made it as a sales guy in Tokyo. Has a home in the suburbs and a family. Gets on the train home every day, but when nearing his station, feels scared and gets on the train back into the city again. Says he doesn't want to ruin the perfect balance of his home, which he thinks is what would happen if he were there on weekdays, but enjoys spending perfect weekends with his wife and kids. Sleeps at capsule hotels or saunas or, to save money, at the office.

  4. Weekend Marriage
    38 y.o. high-earning IT industry salaryman
    Spends only the weekends in the house with wife and daughter. Used to rent an apartment between the office and the house, but left it after realizing it was more fun to spend the night at saunas where he could chat with others. The weekend marriage is by mutual consent with the wife, whom the author also met and interviewed. Both enjoy this lifestyle, but are prepared to change it once the kid grows up and maybe starts thinking it's odd.

  5. Has Everything, No Problems
    50 y.o. salaryman-turned-self-employed
    Formerly a salaryman who was stationed all around the country by his company, and even in the Middle East for a few years, but grew tired of that and started his own company with a friend. Lives quite close to the office, but still started to think it's unnecessary to go home in the evening. Enjoys the communal aspect of staying at saunas. Kid has moved out. Returns home occasionally. Wife doesn't seem bothered.

  6. A Double Life
    46 y.o. designer
    Grew up in the sticks where everyone was expected to become a factory worker/engineer, but went to Tokyo to go into design instead. Has wife and kids, but shares an apartment with his 21 y.o. hostess girlfriend during the week. Wife thinks he is working hard, or at least that's what he thinks. Loves his family and realizes this can't go on forever. The girlfriend is also interviewed and she seems to enjoy the situation. The girlfriend is otherwise the female equivalent of the guy from chapter 2.

Matsui frequently makes a point of having interviewed many people as material for this book. I think the men that this book centers around are all quite stereotypical and easily imaginable - but all with some disturbed psychological twist in their heads. I'm not sure if that's because he incorporates material from other interviewees into these men, thus making them somewhat generic, or because he hasn't actually interviewed many people at all, but just invented most of it. In any case, it's an interesting read, not an academic paper.

From my own experience, I have heard Japanese coworkers say things like "the office/train is where I can relax", claiming their houses (with wives, kids, and parents) are stressful. It's not uncommon for Japanese office workers to spend all night at the office - it seems to give them credibility and respect among their peers too (despite being completely unproductive the following day). This book sheds some light on why. The communal aspect of saunas, with people napping in reclining chairs in a common area, is one reason.

The language is quite simple: not much specialized vocabulary outside of society-related concepts such as 脱サラ (quit working as a salaryman) and プータロー (loser). Grammar is somewhere between JLPT level 2 and level 1. The author uses quite a lot of non-general use kanji, though, as well as kanji for words usually written in hiragana, and there is almost no furigana. Not because the vocabulary requires it, but because he just likes to, I suppose. That's good for learning a little extra that probably won't show up on a JLPT exam.

Anyway, this is the first of Matsui's books that I've read but it is unlikely to be the last. If you don't know who he is, he's famous for having been homeless, but he then wrote a book about being homeless and now he's a successful author, writing mostly about typical Japanese social phenomena.


Saturday, May 10, 2008

Updated the Language Identifier with ranking of most popular languages right now

Over time I've been making some smaller changes to the language analyzer (my language identification web app), like manually tuning it to better distinguish between hard-to-distinguish languages, like the Scandinavian languages, Serbian-Bosnian-Croatian-Slovenian, Afrikaans and Dutch, and Czech and Slovak.

But I've been wondering what languages people use it for, so yesterday evening, while drinking shochu (in spite of which I could only find one bug today! but I did write a processing and database-intensive function, n00b style, which I replaced with a single SQL query today...), I added logging of the results. Only when the language identification certainty is reasonably high is it logged, and only the result; the actual text inputted is not sent. This, of course, happens in the background. A language is only logged once per client, and results from clicking the "example" button (Tower of Babel extracts - I like that story) are not logged.
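The logging rules above boil down to three checks. As a rough Python sketch (not the actual server code; the 0.8 certainty threshold, the client id, and the function names are all assumptions made up for illustration):

```python
def make_result_logger(min_certainty=0.8):
    """Return (record, log). The 0.8 threshold is a made-up example value."""
    seen = set()   # (client_id, language) pairs that were already logged
    log = []       # only the identified language is kept, never the input text

    def record(client_id, language, certainty, from_example=False):
        # Skip results from the "example" button and low-certainty results.
        if from_example or certainty < min_certainty:
            return False
        # A language is only logged once per client.
        key = (client_id, language)
        if key in seen:
            return False
        seen.add(key)
        log.append(language)
        return True

    return record, log
```

Calling `record("c1", "Swedish", 0.95)` twice would log Swedish only the first time, and anything flagged `from_example=True` is dropped no matter how certain the result is.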

This morning I added the top ranking to the page. It's generated on the server side in order for the search engines to see it. The top 5 languages for the past seven days are printed. At this time, i.e. about 15 hours after the result logging started, these are Spanish, Korean, Portuguese, and Thai.
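A top list like this is the kind of thing a single aggregate SQL query handles well. Below is a minimal sketch using Python's built-in sqlite3 module; the `results` table, its columns, and the function name are assumptions for illustration, not the real schema or the real server-side code.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema: one row per logged identification result.
SCHEMA = "CREATE TABLE IF NOT EXISTS results (language TEXT, logged_at TEXT)"


def top_languages(conn, now, days=7, limit=5):
    """Return [(language, hits), ...] for the most logged languages."""
    # Timestamps are assumed to be stored as ISO 8601 strings, which sort
    # chronologically when compared as text.
    since = (now - timedelta(days=days)).isoformat()
    return conn.execute(
        "SELECT language, COUNT(*) AS hits FROM results "
        "WHERE logged_at >= ? "
        "GROUP BY language "
        "ORDER BY hits DESC, language ASC "
        "LIMIT ?",
        (since, limit),
    ).fetchall()
```

With `conn = sqlite3.connect(":memory:")` and `conn.execute(SCHEMA)`, the function can be tried out against a few hand-inserted rows; ties are broken alphabetically only to keep the output stable.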

You can see the currently most inputted languages live: http://henrikfalck.com/languageanalyzer/


Saturday, April 26, 2008

We'll Always Have C

The other day there was an interview in Dr. Dobb's Journal with the managing director of TIOBE Software, which publishes the TIOBE Programming Community Index, a ranking of programming language popularity. It was also discussed on Slashdot.

The methodology used by TIOBE to calculate a language's popularity is basically the good old Google hits ad-hoc voodoo index, using "[language] programming" as the query. This measures the "web presence" of a programming language.

First of all, it's obvious to you and me that this measures something, that something being the number of web pages including the term "[language] programming", obviously. There's nothing wrong with this method, as long as one is aware of what one is measuring. But is it fair to call this the popularity of a language?


Look at this blog, for example. I mostly mention JavaScript and PHP here, just like everyone else. Throw in some Ruby and Python too to max out the buzz factor. There is no mention of relics such as C in this blog. But you know what language I use ten times more than any other? C. I'd love to have a job hacking away in JavaScript, Ruby, and Python all day, but I'd have to settle for half the salary. So here it goes: C programming. Index that. Embedded, heavily multi-threaded, efficient, minimum memory, hardcore badass C programming, that's what I do, and I love doing it.

Most coders can't do C. That's why you see all these Visual This and Dot That and scripting languages on the ranking, because these kids blog about every little insignificant hobby project they manage to cut and paste together, just like I do. But let there be no mistake about it: real programmers can code in C. They do syntactically correct typedefs of function pointers in their sleep. (Just kidding, that's impossible.)


At work I also hack in Python, Perl, and Makefile. At home it's mostly JavaScript, PHP, Ruby, Python... Lately Python has replaced Ruby as my language of choice for home hacking because of its decent Unicode support. (Although I've had to hack the Python standard library in some places where it didn't properly support Unicode. I read that Python 3.0 will use Unicode strings by default, which is great, and only ten years late.) I also sold my soul the other day and installed Visual C# 2008 Express Edition for some hobby hacking. It turned out not to be very fun, but I haven't given up yet.

At my previous job I used C++ for doing essentially the same thing as I do in C now. I'm completely convinced that C is the right tool for the job. I'm also convinced C does object orientation better than C++, but that is a topic for another post. And I used to be a Java fan, but now I consider Java one of the best examples of software suckiness ever. It's a volatile industry, technologies come and go, but no amount of blogging will convince me that the C programming language is anything but #1.

I'm saying it because it's true: We'll always have C. Because we've got jobs to do.


Wednesday, March 12, 2008

Socks: An Engineering Approach

Socks have an intrinsic tendency to pair themselves with dissimilar socks. I believe overcoming this defect is one of the great engineering challenges for this century. While walking around in mismatched socks might be perfectly tolerable for some, I believe that even engineers have to uphold some level of civilized behavior, and I have never been fond of pairing socks manually.

Furthermore, my sock stock is growing obsolete. I am sure there are items at least 5 years old in there. I soon realized solving this problem required a radical, new approach. Some might say - a paradigm shift.

So I threw away all my old socks and bought new ones: 15 pairs of the same model of sock. The implication is huge: my socks will now invariably pair themselves up neatly, by the simple process of randomly choosing two socks from a pile. (And it's a nice model of sock - black and nifty enough to be worn with a suit.)




But I also need to establish a maintenance process. This model will go out of sale. Strange as it may seem, sock vendors do not provide End Of Service agreements even for small-enterprise customers such as myself, so you never know when this will be. It's safest to assume that once the items have been acquired, replacements are no longer available on the market.

So what I'm thinking is that when I'm - due to wear-and-tear, lossage, etc. - down to, say, 12 pairs, I'll buy 15 new pairs of some other model. This of course will mean that the automatic pairing property breaks, but it will still be extremely simple to manually sort the socks into two piles, for both of which the property holds.

Once one stock falls to, say, 9 pairs, then that model will be brought out of circulation - they will probably be approaching their natural end-of-life date by then anyway and already written off. So then I'll purchase a new set of 15 pairs. Again, a new model can be chosen to offset changes in taste over the life cycle period.




This means that there will never be more than two models in circulation simultaneously, thus the amount of manual sorting required will be kept to a minimum. And, once in the loop, there will always be at least 24 and at most 27 pairs in circulation, using the numbers I've conjectured above. These numbers will be tweaked based on gathered real-world usage data.
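Pending real-world usage data, the replacement policy can be tried out in a small simulation. The Python sketch below assumes (my assumption, purely for illustration) that socks are lost one whole pair at a time and that each loss hits a model with probability proportional to its current stock; the function name and parameters are made up too. Under this wear model the two-model property holds, but the total number of pairs may drift outside the conjectured range, which is exactly the kind of thing the tweaking is for.

```python
import random


def simulate_socks(steps=1000, seed=0, batch=15, first_restock=12, retire_at=9):
    """Return a list of (models_in_circulation, total_pairs) after each loss."""
    rng = random.Random(seed)
    stocks = {0: batch}   # model id -> pairs in circulation
    next_model = 1
    history = []
    for _ in range(steps):
        # Lose one pair; larger stocks are proportionally more likely to be hit.
        models = list(stocks)
        worn = rng.choices(models, weights=[stocks[m] for m in models])[0]
        stocks[worn] -= 1
        if len(stocks) == 1:
            # Bootstrap phase: a single model, restock once it is down to 12 pairs.
            if stocks[worn] <= first_restock:
                stocks[next_model] = batch
                next_model += 1
        elif stocks[worn] <= retire_at:
            # Steady state: retire the depleted model and buy a new batch.
            del stocks[worn]
            stocks[next_model] = batch
            next_model += 1
        history.append((len(stocks), sum(stocks.values())))
    return history
```

Something like `min(p for _, p in simulate_socks())` and `max(p for _, p in simulate_socks())` gives the observed range of pairs for one run; changing `first_restock` and `retire_at` shows how sensitive that range is to the thresholds.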

Of course, the exact number of socks in circulation at a given time will have to be estimated from observed data, which will be imperfect because of socks' other intrinsic property of showing up at nondeterministic locations. If, due to inconsistency, an older generation model of sock - one already taken out of service - is suddenly found, it will have to be immediately disposed of.

This should solve all my sock problems.


Sunday, February 24, 2008

Japanese studies - JLPT - passing the Japanese Language Proficiency Test, Level 1

After having spent last year mostly away from language studies, doing web technology stuff and other programming projects, this year I find myself spending much of my spare time on improving my Japanese. My goal is to pass the Japanese Language Proficiency Test (JLPT) level 1 - the highest level - this year. And not only pass it, but pass it with a good margin, or I'm not satisfied.

Two years ago, in 2006, I decided early during the year to take JLPT level 2. I didn't think I'd pass and neither did my Japanese teacher, but study I did and pass it I did with a score of 81% (60% is necessary to pass). This year I am aiming for over 80% again, preferably closer to 90% (for level 1, 70% is necessary to pass).

But this time I'm using different methods than I did in 2006 to pass JLPT level 2. Back then, I spent time studying kanji, memorizing grammatical patterns, and doing reading exercises from a course book featuring the same kind of texts and questions that appear on the actual test, and also a similar course book for listening. I used the UNICOM books targeting JLPT2, and found the reading and listening books very good, albeit short. I also bought the grammar and vocabulary books, but they were not good. For grammar and vocabulary, I found two books called 日本語総まとめ問題集 grammar (文法編) and vocabulary (語彙編) that were very good. Pictures and fun all over.

For reference, my strong point then was writing/vocabulary, and the weak point was listening. People say if you live in Japan, listening is easy because you hear Japanese all day, but it wasn't for me. After the test I bought a TV, mostly to improve my listening.


This year I've also got the UNICOM books, and the Kanzen Master grammar and kanji/vocabulary books. As before, I think the UNICOM reading book is great, but still short. I haven't tried the listening book yet. As I wrote, I'm using different methods this year, except for reading comprehension; but that doesn't take you very far since the book is so short. The theme for learning Japanese this year is having fun doing it.

I'm not studying kanji this year. One reason is that kanji is no longer a problem (relatively, of course). The other is that I think I will pick up enough kanji from increased reading. Also, if you start dwelling on all the peculiarities of kanji, you risk spending too much time on that. At least I do, since I find the peculiarities interesting.

Grammar: I'm no longer memorizing patterns and functions; I'm copying all the example sentences from the Kanzen Master book to flash cards and drilling them. Writing the flash cards is tedious, but drilling them is not (particularly). I'm writing on average about 4 example sentences for each of around 200 grammatical patterns. I plan to finish next month... I go through some of these flash cards on average a few times every day.

My thinking is that instead of, like I did on the JLPT 2 test, analyzing the grammatical structure of the sentence and remembering how the four alternative answers fit into that structure, this year my brain will do all the pattern-matching work for me. Like "this reminds me of that sentence, so that answer it is". On top of that, it's great for learning vocabulary and expressions as well!

But that's all old school - the core of this poodle consists of something entirely different! The first one is reading books. Real books, in Japanese. When you get to JLPT1 level that is very much possible. I was planning to start reading books this summer, hoping to have picked up enough grammar and vocabulary by then. But then my workmate told me he's been reading the Harry Potter series in Japanese and recommended them for simple reading. So I borrowed the first book from him and started reading it - and now I'm hooked. Not hooked on Harry Potter, but on reading books in Japanese.

Harry Potter is really good, since it includes furigana for pretty much all kanji. One could argue this is not good for learning kanji, but I think it is. I don't want to learn incorrect readings - I might think I know the reading when in fact I have just made it up myself, and anyway as I mentioned before I'm not focusing on kanji - I think that will come by itself. Harry Potter is also good because it's a Western book. That makes it easier to read even when you don't have 100% comprehension - at least you don't have to struggle with cultural understanding. The story isn't very complicated either.


So that's one thing: reading books in Japanese. Grammar, vocabulary, expressions, and reading speed all at once, and it's fun. The other revolutionary idea came from the same coworker. He had an old, analog radio on his desk at work for a while. I work in a high tech software company targeting the next, successor of the next, successor of the successor of the next, and successor of the successor of the successor of the next series Japanese mobile phones. Having an analog radio on your desk is weird. Initially I just thought it eccentric. But then it hit me: how much time I've spent looking for good Japanese podcasts, online radio, and just about any piece of spoken Japanese on the web. A cheap-ass analog radio is actually all that you need! Free (if you avoid paying the NHK fee), simple access to spoken Japanese blurted out like there's no tomorrow, any time of the day, on any subject you can think of.

So I got myself a small portable radio for 2,000 yen at the local electronics store in the alley. It's great! I can get on average around 2 hours of listening every work day. At work! It makes both learning Japanese and working fun. I think the radio is what will make the difference between a good score and a great score on the JLPT in December. For anyone in Japan who's above JLPT2 level I'd really recommend it. This year the listening section will be a breeze.

If only one could get some licensing agreement set up to broadcast all Japanese radio on the web for all the people struggling to pass the Japanese Language Proficiency Test who are not in Japan, that would be great. But probably unfeasible.
