The Good, The Bad, And Wikipedia

Wikipedia’s role on the web has been worrying me lately. Not that Wikipedia itself is particularly bad, but its influence on the web as a whole is problematic.

In the olden days, finding information on the web was hard, and good link directories were worth their bytes in gold, and good content was worth even more. Nowadays we have good search engines, thanks to Google, and lots of good content (and lots of bad content too, but that’s not really a problem).


Wikipedia is one source of good content on the web. But it’s not the only one – it’s just the most obvious one in many cases. My problem with Wikipedia is mainly how people link to it. Here’s an example:

 × I like to feed the pigeons. I sometimes feed the sparrows too.

I’d claim this is by far the most common way of linking to Wikipedia. What’s wrong with it?

  1. If your reader doesn’t know what a pigeon or a sparrow is, it’s 99.9% certain that’s because the reader isn’t very good at English. In that case, if anything, every word ought to be linked to a dictionary, but that wouldn’t be useful to more than 0.1% of your readers. Readers who aren’t proficient in the language of the text they’re reading should know how to look up words themselves, don’t you think?
  2. If your reader suddenly becomes very interested in pigeons or sparrows from reading your text, I’m sure one of the first places they would look for information on said birds is Wikipedia. There is no need to point people to Wikipedia, because everyone knows how to find Wikipedia articles anyway. In Firefox, just type “wikipedia pigeon” in the address bar, and you’ll get redirected to the page. In other browsers, go to google.com, type it in, and press “I’m Feeling Lucky”. It’s really easy.
  3. The reader might think you’re linking to some really interesting tidbit about pigeons or sparrows, or maybe a funny YouTube video. But to find out, the reader has to mouse over the links and check the status bar to see where they point, only to discover that they just point to the Wikipedia entries, which the reader could easily have found without your help. This makes reading cumbersome.
  4. It degrades the quality of interlinking on the web. All you’re doing is helping Wikipedia get a higher PageRank, and their PageRank is already as high as it can get. You’re not helping the guy who has spent serious time documenting pigeons and sparrows and runs a really interesting web site on the subject, which your readers might actually have enjoyed if you had only taken the time to find and link to it, like in the old days.
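To make the PageRank argument in point 4 a bit more concrete, here is a minimal Python sketch of the textbook PageRank formula (damping factor 0.85, the value suggested in the original PageRank paper). It is only an illustration under simplifying assumptions: Google’s actual ranking uses far more than this, and the page names are hypothetical. The point is simply that every link hands a slice of the linking page’s score to its target, so a link to Wikipedia is a slice the specialist site doesn’t get.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank computed by power iteration (simplified sketch).

    `links` maps each page name to the list of pages it links to.
    A page with no outgoing links ("dangling") spreads its score evenly
    over all pages, so the scores always sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the same small "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            # Each outgoing link passes an equal slice of this page's score;
            # a dangling page passes its score to every page instead.
            recipients = targets if targets else pages
            share = damping * rank[page] / len(recipients)
            for target in recipients:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: one blog post that links to a single page.
lazy = {"my-blog": ["wikipedia"], "wikipedia": [], "pigeon-site": []}
thoughtful = {"my-blog": ["pigeon-site"], "wikipedia": [], "pigeon-site": []}

for name, web in [("link to Wikipedia", lazy), ("link to pigeon site", thoughtful)]:
    scores = pagerank(web)
    print(f"{name}: pigeon-site scores {scores['pigeon-site']:.3f}")
```

With these toy graphs it prints roughly 0.260 for the first case and 0.481 for the second: the single link decides which of two otherwise identical pages gets the boost.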

In the example above, it’d be easy to argue that “pigeons” and “sparrows” don’t need to be linked at all. But this is how thoughtlessly people use hyperlinks. So let’s change the example to something similar, but where hyperlinking a word might be more appropriate:

 × But the pronunciation doesn’t change since the word is a dvandva.

I bet you don’t know what a dvandva is. It doesn’t matter here anyway, since I’m just using it as an example, but I’m sure you’ve already gone and read the Wikipedia article. :-) The Wikipedia entry comes up first on Google for a search on “dvandva”. Why? Probably because of hyperlinks like the one in the example above. But the article sucks. Here’s an example of good use of hyperlinking:

  But the pronunciation doesn’t change since the word is a dvandva.

The target of this link is a paper written at a university about dvandvas in Japanese. Of course, if the context isn’t Japanese, it might not be the best link target, but I’m sure there are other good articles about dvandvas. I thought that paper was da proverbial bomb. Really good reading. That’s why I link to it: to encourage my readers to read it, and to promote it in the search rankings. (Of course, this is hypothetical, since I’m actually writing about something else right now, but if I were writing about dvandvas…)

It takes time to find good link targets, but please take the time! For your own sake, for your readers’ sake, and for the sake of the authors of those link targets. And for the future of the Internet.


Note that I am not completely opposed to linking to Wikipedia. If the Wikipedia article on a subject really is the best information on it anywhere on the web, and the subject needs a hyperlink for most people to understand it, then the corresponding Wikipedia article is indeed what you should link to.

Also, when discussing Wikipedia itself, it is of course highly appropriate to link to Wikipedia. But even then, I see it go wrong, for instance like this:

 × Wikipedia recently started adding the “nofollow” attribute to outgoing links.

That “nofollow” link to the Wikipedia article on the “nofollow” attribute violates the point outlined above in the same way “pigeons” did. Here’s a better way of linking it:

  Wikipedia recently started adding the “nofollow” attribute to outgoing links.

which links to Wikipedia’s meta wiki describing the policy. That’s a good way of linking to Wikipedia. Here’s an even better way of doing it:

  Wikipedia recently started adding the “nofollow” attribute to outgoing links.

That links to the most interesting text on the subject that I could find in a couple of minutes. I’ll gladly share that good piece of writing on this subject with you – that’s why I link to it.


Lastly, I’d like to mention that I think the aforementioned Wikipedia policy of adding the “nofollow” attribute to outgoing links is stupid and bad. If you’ve read my argument above, you’ll see why. People are linking to Wikipedia en masse for no good reason, bloating its PageRank and diminishing the chances of other, better sources of information being found. If being cited in a Wikipedia entry at least boosted the PageRank of the source, the chance of someone finding it would improve a little. Not to mention that it would be fair. The paper on dvandvas above and Ed Felten’s blog entry deserve that.
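For those who haven’t run into it, “nofollow” is a value of the HTML rel attribute on a link (rel="nofollow") that asks search engines not to pass ranking credit through that link. A crawler building its link graph might simply leave such links out, roughly as in this minimal Python sketch using the standard library’s html.parser. The class name, page fragment and URLs are hypothetical, and real search engines’ handling of nofollow may well be more involved.

```python
from html.parser import HTMLParser


class FollowedLinkCollector(HTMLParser):
    """Collects href values of <a> tags, keeping nofollow links separate."""

    def __init__(self):
        super().__init__()
        self.followed = []  # links that would pass ranking credit
        self.nofollow = []  # links marked rel="nofollow"

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. "external nofollow";
        # an attribute written without a value shows up as None.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


# Hypothetical page fragment: one cited source marked nofollow, one normal link.
page = """
<p>Cited sources:
  <a href="https://example.org/dvandva-paper" rel="external nofollow">paper</a>
  <a href="https://example.org/about">about</a>
</p>
"""

collector = FollowedLinkCollector()
collector.feed(page)
collector.close()
print(collector.followed)  # ['https://example.org/about']
print(collector.nofollow)  # ['https://example.org/dvandva-paper']
```

In a link graph built this way, the cited paper simply never receives an edge from the page that cites it, which is exactly the lost credit complained about above.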

[No interwebs were hurt in the writing of this blog post; all links to Wikipedia have the "nofollow" attribute set.]


Maybe size doesn’t matter, but dimension does

When I was at university, every year before the start of the academic year a soapbox car race was held on the slope leading up to the main campus. It was organized by the computer science students, so one of the rules was a nerd joke that I found quite funny, which went something like “there are limits on the dimensions of the car – they are not allowed to exceed three”.

Now, the other day I came upon the Wikipedia entry on Knock Nevis, the largest ship ever built (with “large” defined as “long”). That page has a thought-provoking graphic comparing the length of this ship with the heights of some of the tallest buildings in the world. Here’s my spiffed-up version of it:


So if the Knock Nevis were standing on its stern and you stood on top of its bow, you’d be at essentially the same height as the observation deck of the Shanghai World Financial Center, inside the part that looks to me like the head of a bottle opener.


But of course, ships aren’t built to stand on their sterns. That’s what got me thinking… If someone had asked me which is greater, the length of the longest ship ever built or the height of the tallest building ever built, and I had to answer off the cuff, I would probably have said the ship. Why? Because building horizontally seems so much easier to me than building vertically. When building vertically, you have to fight gravity all the time, haul things up and down, and the whole thing has to be able to stand on its own.

When you give it a moment of thought, though, it’s obvious that a ship has to be able to maneuver and must not break apart in heavy seas, so ships of the Knock Nevis’s length are probably just not economically feasible. Also, there’s of course a great difference between building something that not only moves but is also self-propelled, and something that just stands still.

Nevertheless, my conclusion from this drivel is that not only is it a bad idea to compare apples and oranges, such as meters and kilograms, but it’s also a bad idea to compare meters in one dimension with meters in another. Lining apples up in a row is a lot easier than stacking them on top of each other.


Revisions to the Joyo Kanji List

I’ve ranted about the joyo kanji list before. There’s an ongoing discussion about a proposal for revisions to the list, which has been going on since 2005; the revised list is tentatively scheduled to go live in 2010. I found a recent, very interesting paper about it, published by NHK (or at least something affiliated with NHK), which I highly recommend to anyone interested in the subject. Here are some of my observations about the proposal.

Characters removed from the joyo list

Only five kanji are proposed for removal: 銑 錘 勺 匁 脹. Notice that 匁 (monme), which I specifically ranted about before, is among them. Good! 脹 (as in, for instance, fukuramu, though I guess we can write that with 膨 anyway) and 錘 (tsumu, although I associate it more with omori, which is usually written 重り anyway) are a little surprising, though, I would say.

Characters added to the joyo list

The following characters are highly likely to be added to the list: 藤 誰 俺 岡 頃 奈 阪 韓 弥 那 鹿 斬 虎 狙 脇 熊 尻 旦 闇 籠 呂 亀 頰 膝 鶴 匂 沙 須 椅 股 眉 挨 拶 鎌 凄 謎 稽 曾 喉 拭 貌 塞 蹴 鍵 膳 袖 潰 駒 剝 鍋 湧 葛 梨 貼 拉 枕 顎 苛 蓋 裾 腫 爪 嵐 鬱 妖 藍 捉 宛 崖 叱 瓦 拳 乞 呪 汰 勃 昧 唾 艶 痕 諦 餅 瞳 唄 隙 淫 錦 箸 戚 蒙 妬 蔑 嗅 蜜 戴 瘦 怨 醒 詣 窟 巾 蜂 骸 弄 嫉 罵 璧 阜 埼 伎 曖 餌 爽 詮 芯 綻 肘 麓 憧 頓 牙 咽 嘲 臆 挫 溺 侶 丼 瘍 僅 諜 柵 腎 梗 瑠 羨 酎 畿 畏 瞭 踪 栃 蔽 茨 慄 傲 虹 捻 臼 喩 萎 腺 桁 玩 冶 羞 惧 舷 貪 采 堆 煎 斑 冥 遜 旺 麵 璃 串 塡 箋 脊 緻 辣 摯 汎 憚 哨 氾 諧 媛 彙 恣 聘 沃 憬 捗 訃.

If you’re a gourmand like me, you’ll be pleased to find that beloved concepts such as 丼 (don, which I specifically asked for), 串 (kushi, skewer), and 酎 (chuu, as in 焼酎 shochu) are among them.

Early in the list we also find some characters used in place names, such as 岡 (oka, as in 福岡 Fukuoka), 奈 (na, as in 奈良 Nara), 韓 (kan, as in 韓国 Korea), 阪, 那, 鹿, etc. As you know, place-name characters have in principle been excluded from the joyo list until now, being included instead in the jinmei-yo kanji list, but according to the paper these have been deemed so frequent and common that they will now be on the joyo list.

By the way, a kanji has to meet one of the following criteria in order to be considered for inclusion:

  1. It appears frequently and also has a strong ability to form words. Examples: 闇, 溺.
  2. In mixed kanji-kana writing, it increases reading efficiency.
    → Even if it doesn’t appear frequently, writing it in kanji makes the word easier to understand. Examples: 遜 in 謙遜 (kenson, humility), 堆 in 堆積 (taiseki, pile).
    → Widely used pronouns. Examples: 誰 (dare, who?), 俺 (ore, I/me).
  3. As an exception to the non-inclusion of proper nouns.
    → It’s used in the name of a prefecture or similar. Examples: 畿 (kin of the 近畿 Kinki region), 韓 (kan of 韓国 Korea).
  4. It’s often used in social life and seen as necessary.
    → Although its frequency of use in newspapers and magazines is low, it’s a necessary character. Example: 旦 in 元旦 (gantan, New Year’s Day).

On the list we also find such well-known favorites as 誰 (dare, who?), 尻 (shiri, buttocks), 叱 (shika.ru, scold), 桁 (keta, beam or digits), and 嵐 (arashi, storm). I don’t know about you, but I learned these pretty early on in my Japanese studies, so I would call them fairly basic. 挨拶 (aisatsu, greeting) is also making its joyo debut. Other, more contemporary characters include 癌 (gan, cancer) and 拉 (ra, as in both 拉致 rachi, abduction, as by North Korea, and, with more pleasant connotations, 拉麺 ramen).

Characters considered for inclusion but dropped

Now this list is more surprising, I think. The following characters were being considered for inclusion in the joyo list, but alas they won’t be included: 叩 噓 噂 濡 笠 嬉 朋 覗 撫 庄 溜 鷹 揃 頷 摑 翔 喋 嚙 洩 禄 栗 馴 駕 鴨 淵 駿 賭 蘭 胡 蘇 狼 蝶 搔 惚 蒼 腿 菩 吊 雀 樽 壺 祀 卿 歪 棲 釜 毅 磯 桶 柿 揆 躇 躊 鷲 憐 狽 萌 媚 寵 秤 撥 遡 謳 套 刹 蔓 醬 疼 賤 顚 捏 糊 饉 倦 屛 毀 恍 斡 膠 誼 疇 謗 乖 截 誹 綬.

As you can see, the list includes the very frequently seen 噓 (uso, lie), 噂 (uwasa, rumor), and 喋 (shabe.ru, talk), among others. The paper gives the following reasons for not including a kanji in the list, but I can’t really figure out which one applies to these…

  1. Although it appears frequently, it has lost its ability to form words. Examples: 濡, 覗.
  2. Although it appears frequently, it is mostly used in proper nouns. Examples: 鷹, 鴨.
  3. Its ability to form words is weak, and it can instead be handled by writing it in kana or adding furigana. Examples: 醬, 顚.
  4. Its ability to form words is weak, and it is restricted to particular fields such as transcriptions or historical words. Examples: 菩, 揆.

I can see why 栗 (kuri, chestnut), 雀 (suzume, sparrow), 柿 (kaki, persimmon) and the like were dropped: even though they’re quite common characters, they refer to very specific things and aren’t useful for writing much else (except 麻雀, mahjong). But I would have thought 釜 (kama, kettle) and 淵 (fuchi, abyss) were common enough concepts, and used in enough compounds, to be included.

Also, classics such as 萌 (mo.e), 遡 (sakanobo.ru, go back), and the recently popular and aesthetically intriguing 乖 of 乖離 (kairi, separation) are apparently not good enough to make it into the list.

-

Anyway, these proposals are tentative, and with the kanji of the year being 変 (chenji, change), who knows what the final list will look like?