Language Hack Learn languages Fri, 12 Aug 2011 20:17:59 +0000 en English Vocabulary Lists Fri, 12 Aug 2011 20:17:59 +0000 admin In the previous post I mentioned a main word list for students wanting to comprehend English, the General Service List (GSL).  A sister list to the GSL is the Academic Word List (AWL), which contains a further 570 word families that are important to university students and aren’t contained in the GSL.  At that Wiktionary link, the words are broken up into 10 sub-lists in order of importance, with links to a dictionary entry for each word giving its definition and pronunciation.  The AWL replaced the earlier University Word List (UWL), which contained some 800 words.

Wiktionary also has the set of 850 words in Ogden’s Basic English which was designed to spread English throughout the post-WWII world.  The list is also available here.

Oxford Advanced Learner’s Dictionary also has the AWL online as well as an extended Oxford 3000 list with pronunciation and definition.

This website has some 3000 (actually 2126) common English words grouped by frequency so they can be learned in general order.

Another list of 1000 common words comes from the Brown Corpus from the 1960s and some of the words are dated.

In the past a number of such word frequency lists were published on paper, such as The American Heritage Word Frequency Book.  Those books may still be useful for those who don’t like technology or don’t carry a smartphone, but the online versions are also quite handy to keep bookmarked.

Common Vietnamese Words Lists Fri, 12 Aug 2011 19:38:58 +0000 admin Once you’ve learned how to pronounce Vietnamese and have a handle on the relatively simple-to-learn grammar, learning Vietnamese is mostly a matter of picking up vocabulary.  What you’ll learn in books is the tip of the iceberg, and may even include a lot of words that aren’t so frequently used.

For English, there is the General Service List (GSL) containing the 2000+ (2284) most important (by some definition of “general service”) words of English.  There are shorter and longer lists such as this Basic English word list of 850 words.

No such GSL exists for Vietnamese but there are a few word lists available.

The Corpora of Vietnamese Texts (CVT) is a body of work containing a million or so words, and the Vietnamese Word Frequency List ranks each word by how often it is used.  The list has some 14000 mostly Vietnamese words in order of frequency, with the number of occurrences and the percentage listed beside each word.  However, the bottom of the list is heavily polluted with non-Vietnamese words that were used just once or twice throughout all the text.
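If you want to trim that noisy tail yourself, here is a minimal Python sketch. It assumes, purely for illustration, that each line of the list holds a word and its count separated by a tab; the real file may be laid out differently. The sample data, the function names, and the `min_count` threshold are all made up for this example.

```python
def parse_frequency_lines(lines):
    """Parse 'word<TAB>count' lines into (word, count) pairs.

    The tab-separated layout is an assumption, not the actual CVT format.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        word, count = line.rsplit("\t", 1)
        pairs.append((word, int(count)))
    return pairs


def filter_rare(pairs, min_count=3):
    """Keep words seen at least min_count times, most frequent first."""
    kept = [(word, count) for word, count in pairs if count >= min_count]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


def with_percentages(pairs):
    """Attach each word's share of the remaining total as a percentage."""
    total = sum(count for _, count in pairs)
    if total == 0:
        return []
    return [(word, count, 100.0 * count / total) for word, count in pairs]


# Invented sample data: three Vietnamese words and two stray English ones.
sample = ["và\t120", "là\t80", "của\t95", "hello\t1", "okay\t2"]
cleaned = filter_rare(parse_frequency_lines(sample))
rows = with_percentages(cleaned)
```

Raising or lowering `min_count` is a judgment call: a higher threshold removes more noise but also drops genuinely rare Vietnamese words.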

Another set of words is found here.  The file contains four lists, from 11000 words to 74000 words.  The lists contain Vietnamized loan words but written using only the Vietnamese alphabet.  Words are ordered alphabetically and no word frequencies are listed.

Neither list provides an interface for using the words.  They are just lists, and you’ll have to find ways to use them, whether by making flash cards or by importing them into flash card software.  You’ll also want to translate the words before dumping them into software like Anki or Mnemosyne/Mnemogogo.
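As a rough sketch of the import step: Anki can import plain text files with tab-separated fields, so one approach is to write each word and its translation on one line with a tab between them. The Python below does that with the standard `csv` module. The three word pairs and the function name are placeholders for this example, and you should check your flash card program's import dialog for the exact options it expects.

```python
import csv
import io


def to_flashcard_tsv(cards):
    """Return tab-separated text with one 'front<TAB>back' card per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for front, back in cards:
        writer.writerow([front, back])
    return buffer.getvalue()


# Placeholder cards: Vietnamese word on the front, English on the back.
cards = [("nước", "water"), ("cơm", "cooked rice"), ("nhà", "house")]
tsv_text = to_flashcard_tsv(cards)

# To save it for import, write tsv_text to a UTF-8 file, for example:
# with open("vietnamese_cards.txt", "w", encoding="utf-8") as handle:
#     handle.write(tsv_text)
```

Saving the file as UTF-8 matters here, since Vietnamese diacritics will be mangled by older single-byte encodings.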

Chu Nom: Vietnamese Kanji Thu, 04 Aug 2011 20:29:26 +0000 admin Before the present-day Vietnamese writing system, Quoc Ngu, based on the Latin alphabet with diacritic marks for vowels and tones, there was a writing system based on Chinese characters called Chu Nom.  Chu Nom, like Japanese kanji, used Chinese characters with localized pronunciation and original meaning.  But this was more or less just a way for people to write Chinese.  Whereas the Japanese added a new set of pronunciations to kanji characters which had similar meanings to Japanese words, in Vietnam they invented new characters, which look like Chinese characters, to represent native Vietnamese words.  Japanese people also invented characters to represent phonemes, similar to an alphabet, called hiragana and katakana.  Today, someone who understands Chu Nom or kanji can to some extent understand, but not read, Chinese.  The pronunciation preserved in the Japanese and Vietnamese languages is useful for researchers studying Old/Classical Chinese, whereas Mandarin has drifted farther from the original language.


Han Nom example
Sino-Vietnamese reading: Vạn cổ anh linh. Vietnamese meaning: Muôn thủa linh thiêng.

But Chu Nom today is nearly a dead writing system.  Why does this matter? In the words of the Vietnamese Nôm Preservation Foundation:

…from the 10th century and into the 20th, much of Vietnamese literature, philosophy, history, law, medicine, religion, and government policy was written in Nôm script. During the 14 years of the Tây-Sơn emperors (1788-1802), all administrative documents were written in Chữ Nôm. In other words, approximately 1,000 years of Vietnamese cultural history is recorded in this unique system.

This heritage is now nearly lost. With the 17th century advent of quốc ngữ — the modern roman-style script—Nôm literacy gradually died out. The French colonial government decreed against its use. Today, less than 100 scholars world-wide can read Nôm. Much of Việt Nam’s vast, written history is, in effect, inaccessible to the 80 million speakers of the language.

If you are interested in learning the Nom script, besides the Nom Preservation Foundation there is the Nom Na Office.  There is also Nom software for writing Nom, which includes the required fonts.

Austro-Asiatic Numbers Wed, 03 Aug 2011 20:53:44 +0000 admin Vietnamese and Khmer are the two best-known languages in the Mon-Khmer group, which makes up most of the Austro-Asiatic language family, and they are the only national languages in the family.  Languages in this family are mostly found in Southeast Asia but also in India and Bangladesh.  The map of the haplogroup O2b-M9 may explain why, and may indicate that the people who spoke these languages originated in South Asia.

This chart from Paul Sidwell shows how closely related any Austro-Asiatic language is to another.  It shows that Vietnamese is pretty far from all other Mon-Khmer languages except those in the Viet-Muong group such as Muong, Ruc, Sach, Thavung.  It’s quite different from Khmer, even though there is a relation.

When we look at the numbers in these languages, we can see that there is still some connection.  The table below lists the numbers one through ten in Austro-Asiatic languages.  Actually, the numbers for Vietnamese are not exactly correct (see #5).

Can you see the similarities between Vietnamese (not Sino-Vietnamese) and Khmer numerals one through five?  Can you guess what the original Proto-Mon-Khmer numerals were?


Language 1 2 3 4 5 6 7 8 9 10
Bugan bo 55 bio 31 mtse 31 pau 33 mi 33 pio 33 pou 31 sã 33 s;i 33 mã 31
Sino-Viet. nhâ’t nhi tam tú’ ngu~ luc thâ’t bát cu|‘u thâp
Proto-Viet-Muong+ *moc *hal *pa *pon *?dâm *khâw *pây *t’am *cin *mïel
Vietnamese môt hai ba bô’n nâm sáu ba|y tám chín mu’ò’i
Saigon mok. hay ba bóng nâm s,áw bây tám cín mï`y
Muong môc5 hal2 ba2 bôn3 tam2 khaw3 baj4 sam3 cin3 mu’o'l1
May (Ruc, Chut) moic hal pa pón dâm ráw pa,^,.)y thám cín mièy
Thavung (Aheu) muut haal paa póon dam phalu? pih sáam cíin sip
Tay Pong (Hung) motj hal? pa pôn dhâm prâu pal sam chin mal’
Arem mutj hel’ pe puôn dhâm prau po’ tha|m chín mu’o'i
Khasi wey ?aar laay saaw san hnriw hnñew phra khndaaw khat
Amwi mi o la siá san thrau ynthla humphyo hunshia shipho
Palaung u: e:r u-a:i p’o:n p’an to:r pu:r ta: t’i:m kö:r
Rumai hle a2 2 p’Un2 p’an2 to2 pu2 ta2 tiim2 ky2
Danau a àn wi po:n tho tun pei-ut tsìm tsen mà-kyen
Riang håk ka:r kwai k`pwon ka:u twàl pul pretà tim s’kàll
Lamet mus ar lohe pun pan tal pul ta tim kel
Khamet muei la-a la-oi pôn pan tol pool toh teem kel
Plang (Kontoi) keti?2 la?al1 la?oy2 lepun1 lephon1 leh2 hereh1 seti?1 setem1
Tai-loi ka-ti là-òl là-oi pun pàn
Kem Degne la loye póne hone halè leti setine koul
Wa ra: loi bun puån lich a:lich sìtä’ sha:tim kau
Lawa teh la-a la-oei pa-erng puan laeh a-laeh staik staing kua
La (Vo) t’ie ra loi pon p’wan lie a-lie tai tim kow
Phalok (Parauk) ti a: o|i bon pün li ali di dim ko|
Son te: à oi wun pu-on lu-à à-lu-à dai dim kau
En tai loi pun pàn liâ à-li-erh pin-dai dim ko
A Mok mo a: we: pun s`en tàll n`pwi n’tà n’tum n’kyu
P’uman yi erh san p’un sa t’ao p’ua t’a t’i ch’ao
Pou Ma leun song sam si ha hok tiet piet kao sip
Hu ?amò ka?à ka?òy ?aphòn pathán
Khmu’ mò:j pà:r ? en há: hók cét pét káw síp
Mal meie-lae piar paeh pôn piatee piapaeh jed piapon kao maehlach
Puôc môt biêl pe pôn
Mrabri damoi baer paeh pôn terng tán kool teeh gas gul
Yumbri (neremoy) (nakobe)
So muai ba: bai pon so’ng tapet tapu:n tako:n take machhit
Bru muoi bar pái poun sau’ng tapoât tapul takual takêh muoi chít
Van Kieu muôi bar pâi pôn so’n tapât tapu:l takuôl takê: macu’:t
Suei moi bar pa:y pon so’n tapat tapol tagol tagè mui jit
Na Nhyang muei bar pei puo:n chung thpak thpol thkol thke muchit
Kuy mu:j bi:a paj po:n su:ng thepha:t thepho:l thekhual thekheh ncut
Tareng moi bar puan son pat po:l ko:l khiè michet
Pacoh môi bar pe poan xông tapát tapôl tacol takih muchít
Katu mij ?be:r pe puon su:ng sepat tepal teka:l tekieh meghet
Kantu moi bar be: puan son tapat tapol tako:l takhie michet
Ong móoy báar pæ’æ púan tpat
Ngeq mo’yq baar pe puo’n so’o'ng tapu’at tapôôl takool takieh mo’chit
Old Khmer+ muuej Biier pii puuen pram pram muuej prem Biier prem pii prem puuen tap
Khmer múuej piir bèej bùuen pram pram múuej pram py’l pram bèej pram bùuen dap
Stieng muôi baar puôn pram prau poh phaam sên jo’mo’t
Chrau muôi var pe puôn prâm prau pôh pham su’n mât
Biat mwoj bar pee puen pram praw poh pham cin jit
Ko’ho dul bar pe poan prâm pro poh pham su’n jô’t
Sre dùl bàr pe pwan pram praw poh phàm sin jet
E Mnong ju bar pây puân prâm prâw poh pham sîn mât
C Mnong ngwây bar pe pwân prâm prâw po’h pham sîn jô’t
Loven muai bur^ pae puan sang tr^ao poh thaam chiin chet
Lave mui bar^ pae puan sing tr^ao poh thaam chen chit
Sapuan muuj baar pee puan seeng traw pah thaam cin jit
Cheng muuj baar pee pan seeng caw pah thaam cin cit
Suq (Sou) muuj baar pee poon seeng traw puh thaam cin cit
Nyahöñ muei ban puon so’ng trôu pah tham chin chit
Oi mui bar^ pae puan sing tr^ao pah thaam chin chit
Brao mui baar puon chhéng trau pos tham cheu chet
Krung 2 muuj baar pee puan cheeng traw pah thaam ceen cit
Lavi mooj piar pee poon syyng traw pyh thaam ciin cet
Bahnar miñ bar2 pêng3 puan bo’dam to’drou to’po’h to’hngam1 to’sin po’jit
Alak moei bar pei po:n dâm tahrâu poh ham chin jit
Tampuan maoñ paeng pwan petam trao timpaoh tinghaam ñchin tsit
Cua mui bar pe pon po’qdam ko’drôu kapoh tho’m sin ku’l
Rengao môi’ bâr2 pê’3 pôn2 bo’dam to’drô to’po’ih to’hngam1 to’chin bo’jo’t
Jeh muih bal pei puan po’dâm to’drau to’pèh to’ham to’chin jãt
Halang moi bat pe puan dam tarau tape pham chin ajiat
Sedang moi péa pái pún petám tedróu tepah tehéam tochen moi chat
Hrê mo:yq bayq piq pun padam tadràw tapèh nahim hachìn hajàt
Didra (Todrah) muèyq bia pi pudn padabm dadrue tapê`yh nihiam tachìdn jèt
Modra muê`y bar pi pudn padâp tandru tapèyh tahim tachìt jâ`t
Pear mo:y pa: phe:k pho:n phram kedo:ng kenu:l keti: kensa: kenga:y
Bolyu ma:i 31 mbi 55 pa:i 55 pu:n 53 me 31 pju 53 pei 55 sa:m 53 s;e53 ma:n 13
Samre mooy paar phee phoon pram kadang kanuul kentey kensaor raay
Chong moj bar pe? poon pram kadoong kaanuul kaatii kaasaal rai
Old Mon+ moy ?bar pi? pon msun terow dempoh dencam dencit cos
Mon mòa ?ba poe? pon peson kerao hepoh hecam hecit choh
Nyah Kur mùay baar pii? pan chuun traw mpoh ñcaam ñciit cas
Che’ Wong nôi bêi pet pôn limeh nem
Kensiu nay duwa? tiga? ñam
Kenta’ Bong nay bye
Mos nai komam fobieh awah uibem mampoh
Jehai ney dwa? tiga? impat (lima)
Menriq nay dewa? tiga?
Bateg Deh ney tiga?
Bateg Nong nay duwa? tiga?
Mintil sa? pusing tiga?
Semai nano na:r ni ampa:t lima: anam tudju
Temiar nál ampat lima anam tujuh lapan sambilan né-puloh
Lanoh niy na:y tiga:?
Sabum niy ciwel tiga?
Semnam ni:h ?ilwol tiga?
Jah Hut nwey nar tiga:?
Mah Meri (Besisi) muy hma hmpe? ‘mpât lîmâk nam tujoh d’lapan sambilan sapuloh
Semaq Beri muy hma hmpe? hmpon mesong pru? tempoh gênting gêntik mogênor
Semelai muy na:r hmpe? hmpon mesong pru? tempoh kitwit kantim kumai
Temoq moi duaq mpeq mpon mêsong têpêru têmpoh lapan sêmbilan mên gênau
Nicobar Is.
Nicobarese heng ne:t lu:i fe:n tanwi tafu:l sat hewhere macuhtere sam
Nancowry heang a lue fuan tanei lue tafuel ishat nfoan heanghata shaum
Shompen heng au lu:ge fu:at taing lagau aing towe: lung.i te.ya
Car héng né.t lú.y fé.n taníy tafú.l sát hévher~e macúhter~e sí.n
Mundari mid baria apia upunia monrea turuia ea írilia area gelea
Bhumij moyon baria apia upunia monea turia satta aitta nota dosta
Ho miad báriá apeá upuniá moiá turuiá aeá iriliá areá geleá
Korwa mi ba:ri-ta:ng pe:i-ta:ng cha:r pa:ñch chha sa:t a:th nau das
Birhor mia barea pea punia panch chhai sat a:t la: dâ:s
Asuri mi:at’ baria: pe:a: upnia: moyã: turia: aiya: irli:ya: area: gelea:
Santali mit bar pe pon more turui eae iral are gel
Turi miad’ baria pea punia miad’ ti miad’ ti miad’ miad’ ti baria miad’ ti pea miad’ ti punia baran ti
Kurku mi:a: ba:ri: a:pai upu:n mono turu:i: e: ila:r a:re: gel
Kharia moi baria upe ipon moloi tiburu gul t`am tomsing gol
Juang munto bato egota gandami pa:ñch chhao sa:ta a:tha nao daso
Gorum (Parenga) boj bag yag ungi monloy turgi gul-gi gal-gi gal-gab al-gab
Sora (Savara) eboy bagu yagi unji monloy tudru gulji tamji tinji gelji
Gutob (Gadaba) mui-ro: ba:r-ju: ig-ro: uun-ro: manle:i tir guligi ba:gu punza ba:gu punza bo:yi galigi
Remo (Bondo) mui ‘mba:r ingi:n o:ñ molloi t?i:ri gu: tUma:p som-tin go’
Gta’ m-mwing m-bar n-ji õ malwe tur gu tma sõting gwa
Speech Accent Archive Mon, 01 Aug 2011 20:54:15 +0000 admin Curious how speakers in other countries including non-native speakers pronounce English?  Check out the Speech Accent Archive.

Sure, it’s natural to make fun of someone’s accent.  But what makes someone’s English sound non-native?  From a linguist’s point of view, this site breaks down various speakers’ accents and makes generalizations about their phonology, that is, how they mispronounce things.

For example, compare a native Khmer speaker’s English to a Vietnamese speaker’s.  Or a Japanese speaker’s to a Korean speaker’s.

Vietnamese Pronouns: Difficulty Saying You Part 1 - First and Second-Person Singular Pronouns Fri, 24 Sep 2010 20:37:21 +0000 admin It’s easy to say “I love you” in Vietnamese. It’s either “anh yêu em” if you’re the man or “em yêu anh” if you’re the woman.

But correctly saying “you” in Vietnamese can take some serious effort to master.

As far as I’m aware, this is the most comprehensive explanation of pronouns in Vietnamese on the Internet.

Generally speaking, Vietnamese uses kinship terms instead of what we think of as pronouns in English. There is no 1-to-1 translation of the words “I” or “you” or “he”. Instead, they use words that would literally translate to “servant” or “friend” or “older brother”.

tôi: I, me (first-person singular)
bạn: you (literally “friend”)

This is the most basic translation of “I” and “you”, which is impersonal and assumes that neither person is older than the other. “Tôi” literally means servant. It is uncommon in everyday speech, although older people will sometimes use “tôi” for themselves while using a word other than “bạn” for you. It is more commonly used in writing, and you’ll see it used a lot in subtitles on foreign films.

Children will not refer to themselves as “tôi”. Instead, classmates and children of the same age use:

tui, mình: I, me
bạn: you

On TV you’ll often see tớ for me and cậu for you. It is also common for students to use given names in place of pronouns.

“Peter (you), pass Paul (me) a pen.”
“Where was Peter (you) today?”
“Peter (I) was home sick.”

Children may also refer to each other informally as “ông” and “bà”, literally grandfather and grandmother. But for schoolmates who are not in the same grade, simple one-to-one translation of I/you falls apart.

em: I, me (for younger child), you (said to younger child)
chị: I, me (for older girl), you (said to older girl)
anh: I, me (for older boy), you (said to older boy)

So in the case of one student talking to an older girl, “em” will mean either me or you depending on who’s talking! “Em” literally means younger sibling, “chị” means older sister, and “anh” means older brother. In a family, siblings will use these words to refer to each other in the place of pronouns.

E.g. “Em chào chị.”: (younger) (greets) (older), or “hello/goodbye”.

[Actually, in the south of Vietnam, particularly the Mekong Delta, brothers and sisters are referred to not by name but by the order in which they were born, skipping “1”. So younger siblings would refer to the first-born child, a son, as “anh hai” or “older brother two” and a second-born daughter as “chị ba”, “older sister three”.]
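The skip-one rule is simple arithmetic: take the birth order, add one, and say that number after “anh” or “chị”. Here is a small Python sketch of it. The function name and the lookup table are my own for illustration; “hai” and “ba” come from the examples above, “tư” is the form used for four in this kind of ranking, and the rest are ordinary Vietnamese numerals.

```python
# Number words used after the kinship term; keyed by (birth order + 1).
SOUTHERN_RANK_WORDS = {
    2: "hai",
    3: "ba",
    4: "tư",
    5: "năm",
    6: "sáu",
    7: "bảy",
    8: "tám",
    9: "chín",
    10: "mười",
}


def southern_sibling_term(birth_order, is_male):
    """Term a younger sibling would use for an older sibling in the south.

    birth_order is 1 for the first-born, 2 for the second-born, and so on.
    """
    rank = birth_order + 1  # the count skips "1", so the first-born is "two"
    if rank not in SOUTHERN_RANK_WORDS:
        raise ValueError(f"no rank word for birth order {birth_order}")
    kin = "anh" if is_male else "chị"
    return f"{kin} {SOUTHERN_RANK_WORDS[rank]}"


first_born_son = southern_sibling_term(1, is_male=True)
second_born_daughter = southern_sibling_term(2, is_male=False)
```

With the inputs shown, the two results match the examples in the note: “anh hai” and “chị ba”.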

But these three words aren’t strictly reserved for blood relations. Anyone of the same generation can use “em/chị/anh” to refer to each other if it is clear that one is older than the other. Often this is the first question asked when meeting someone for the first time. However, even after it’s been confirmed that one is older than the other it is not always appropriate to use “em”. Friends can use given names to refer to each other, like classmates above, or “mày/tao” especially for close friends. However, “mày” and “tao” are also reserved for showing contempt as will be explained later.

It’s not always clear who is older, and it can be highly presumptuous to refer to someone as “em”. In the modern age, with telephones and Internet, Vietnamese speakers even lack visual cues to see who’s older. So adults of the same generation will generally refer to themselves as “em” and the other as “anh/chị” (Mr./Ms.). Not doing so risks offending the other party.

[In a romantic relationship, the woman is always “em” and the man always “anh” even if the woman is older than the man. Literally, a husband and wife are brother and sister in Vietnamese.]

[For cousins, it depends on the relative age of their parents. So you could be 10 years older than your cousin who is your father’s older sister’s son, but you would still be “em” and he would be “anh”.]

con: child, son, daughter
cháu: nephew, niece, grandchild
chú: father’s younger brother
cô: father’s younger sister
bác: father’s older brother or sister
ông: grandfather
bà: grandmother

In a family, relatives will use the above words (and more: cậu, dì, thím, mợ, o, etc.) to refer to each other, and by extension people of different generations will use them even for non-family members. Thus, when speaking to somebody who is about the age of your father’s siblings or older:

con: I, me
chú: you (a man younger than your father)
cô: you (a woman younger than your father)
bác: you (man or woman older than your father, but not your grandparents)
ông: you (could be your father’s father)
bà: you (could be your father’s mother)

[In the south, a chú, cô, or bác would be called by their order, e.g. “chú tư” for your father’s younger brother who was born 3rd.]

[Also, “ông” and “bà” would be followed with “nội” or “ngoại” depending on whether they were paternal or maternal grandparents, respectively.]

In the south, “con” is used in these situations because it’s also used for nieces and nephews by their aunts, uncles, and grandparents. In the north, “cháu” is used instead.
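To make the cross-generation choices concrete, here is a minimal Python sketch of the list above as a lookup. The age-group labels, the region strings, and the function names are invented for this example, and real usage depends on family relationship and context in ways a function like this cannot capture.

```python
def second_person(age_group, is_male):
    """Pick a word for 'you' when speaking to someone of an older generation.

    age_group is a label made up for this sketch:
    "younger_than_father", "older_than_father", or "grandparent".
    """
    if age_group == "younger_than_father":
        return "chú" if is_male else "cô"
    if age_group == "older_than_father":
        return "bác"  # the same word for a man or a woman
    if age_group == "grandparent":
        return "ông" if is_male else "bà"
    raise ValueError(f"unknown age group: {age_group}")


def first_person(region):
    """Word the younger speaker uses for 'I': 'con' (south) or 'cháu' (north)."""
    if region == "south":
        return "con"
    if region == "north":
        return "cháu"
    raise ValueError(f"unknown region: {region}")


# A southern speaker addressing a woman a little younger than their father.
speaker = first_person("south")
listener = second_person("younger_than_father", is_male=False)
```

Here `speaker` is “con” and `listener` is “cô”, so the pair plays the role that “I” and “you” would in English.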

Special situations:

tao: I, me
mày (mầy): you

[Sometimes ta/mi]

Close friends will use “mày/tao” but so will adversaries, as it shows no respect. It’s possible for any older person to use “mày/tao” but very offensive for a younger person to say this to an elder. It can even be disrespectful to talk with your friends like this around elders. You would also use this construct with a pet.

thầy: male teacher
cô: female teacher

The student would be referred to as “em”.

Royalty involves another set of words. Someone you’d call “sir” would translate to “thưa ngài”.

That about sums it up for first and second-person singular personal pronouns in Vietnamese. Next I’ll talk about third-person pronouns and first-person plural pronouns.

[Online: since you often know neither how old nor what gender the audience is, people often use “em” for themselves and “bác” for the other, as “bác” can refer to a man or a woman.]

Vietnamese Classifiers: Loại Từ, a list Tue, 21 Sep 2010 21:10:44 +0000 admin In Vietnamese, like many Asian languages, nouns require classifier words to be counted.  Where in English we say “two boxes”, in Vietnamese they say “hai cái hộp”, translated “two thing-classifier box”.  For “one cup of coffee” the Vietnamese is “một ly cà phê” or “one glass/classifier coffee”.  Classifiers are nearly always required, but many times a noun can also function as a classifier.  So “ly” means “glass” and “dĩa” means “dish” but “1 dĩa” is acceptable for “1 dish”.  “Cái” is the most common and universal and can be used to classify other classifier-nouns, e.g. “cái ly”.  “Cái” is used for inanimate objects, “con” for animals, and “cây” for plants.

However, it gets more complicated.  For people, there are a number of classifiers, the most common being “người” for adults, “đứa” for kids, “con” for girls, and “thằng” for boys.  But “con” and “thằng” can also be used pejoratively.  There are a number of other classifiers that can be used pejoratively or to bestow honor.
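The counting pattern is number, then classifier, then noun. Here is a tiny Python sketch of it, using the two examples from this post plus one animal example (“chó”, dog) that I added for illustration. The dictionaries and the function name are made up for this sketch, and a noun's classifier really depends on context rather than a fixed table.

```python
NUMERALS = {1: "một", 2: "hai", 3: "ba"}

# One classifier per noun, for illustration only.
DEFAULT_CLASSIFIER = {
    "hộp": "cái",    # box: inanimate object
    "cà phê": "ly",  # coffee: counted by the glass
    "chó": "con",    # dog: animal (example added for this sketch)
}


def count_phrase(number, noun):
    """Build 'number classifier noun', e.g. 2 + 'hộp' gives 'hai cái hộp'."""
    return f"{NUMERALS[number]} {DEFAULT_CLASSIFIER[noun]} {noun}"


two_boxes = count_phrase(2, "hộp")
one_coffee = count_phrase(1, "cà phê")
three_dogs = count_phrase(3, "chó")
```

The first two results reproduce the phrases quoted above, “hai cái hộp” and “một ly cà phê”.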

For food, there are a number of classifiers depending on the type of thing it’s served in: “ly” or “cốc” for cup, “dĩa” or “đĩa” for dish/plate, “tô” for bowl, “xiên” for stick, “chén” for a bowl-like dish, and so on.

For a longer list of Vietnamese classifiers see Trần Ngọc Dụng’s paper on Loại Từ:

And for an interesting take on the possibly controversial origins of many of today’s Vietnamese classifiers see V. U. Nguyen’s “ADMIXTURE ASPECTS OF VIETNAMESE CLASSIFIERS”.

Both documents are archived here:
Loại Từ 

Forming English words with un- and other negative prefixes Tue, 23 Mar 2010 13:12:55 +0000 admin Recently I came across the question of translating the word “unlock” into another language.  In Vietnamese the word would be translated “mở khóa” or “open lock”.  This got me thinking why they didn’t just use the word “lock” with another word that means to undo, the same function as “un-” in English.

When I thought about it further I realized that although “un-” can attach to both adjectives and verbs, it works with most adjectives but only a handful of verbs.  For example, we can unlock, undo (at least in the computer age), unfasten, unbutton, unzip, undress, or unleash but we cannot undrive, unhit, unpay, unclose, or uneat.  And we can only un-break my heart in certain songs.

So what’s the difference between verbs that can take un- and the verbs that cannot?  Well, all the verbs in the first list represent actions that change the state of something that can only be in two states.  For example, a lock can be locked or unlocked, a seatbelt can be fastened or unfastened, and I can be dressed or undressed but I can dress and undress and then dress and undress ad infinitum.  In fact, all those verbs can also take the prefix re-, for example, re-lock, re-fasten.

For adjectives, the prefix un- basically means “not”.  So unpopular means not popular, unintelligent means not intelligent, unusual means not usual, undressed means not dressed, and unlocked means not locked.  But why don’t we say unpossible, unbalanced, unregular, unaccurate, or unnumerable?  Instead we say impossible, imbalanced, irregular, inaccurate, and innumerable because Latin used im-, in-, and ir- instead of un- depending on the first letter of the word.
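The first-letter rule can be written down in a few lines. The Python sketch below picks im- before b, m, or p, ir- before r, and in- otherwise; it also includes il- before l (as in “illegal”), which I didn't mention above but which follows the same pattern. The function name is made up, and the rule only describes spelling: it says nothing about which words take a Latin prefix and which take un-.

```python
def latin_negative(word):
    """Attach the Latin-style negative prefix that matches the first letter."""
    first = word[0].lower()
    if first in "bmp":
        prefix = "im"
    elif first == "l":
        prefix = "il"
    elif first == "r":
        prefix = "ir"
    else:
        prefix = "in"
    return prefix + word


examples = ["possible", "balanced", "regular", "accurate", "numerable"]
negated = [latin_negative(word) for word in examples]
```

Run on the five adjectives from the paragraph above, this gives back the same forms: impossible, imbalanced, irregular, inaccurate, and innumerable.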

And we have some other prefixes that have the same meaning of “not”: a-, de-, and dis- (which comes from a Latin word similar to “dual”).  Sometimes dis- is used for verbs where otherwise un- may have worked.  For example, disconnect and disappear.

But I still have another question.  Just as the Vietnamese open locks instead of unlocking them, why do we open doors but don’t un-open them?  Or why do we open windows instead of un-closing them?

“Đại Danh Từ Tiếng Việt” (Vietnamese pronouns) - A Paper Fri, 19 Mar 2010 10:01:18 +0000 admin Today I ran across an academic paper of questionable standards on the possible relations or origins of Vietnamese pronouns from a number of languages including Chinese languages, other Southeast Asian languages, and even French, English, and Japanese.

The paper is titled “Đại Danh Từ Tiếng Việt” (Vietnamese pronouns) and the authors are Nguyễn Đức Hiệp and Trần Thị Vĩnh Tường.

Firstly, Vietnamese has many words for the first person singular pronoun, not including the words depending on family or social status: Tôi, ta, tớ, tui, tao, mỗ, mình, miềnh, qua.  Of these, only tôi, ta, tớ, tui, tao, and mình are commonly used today, the standard being tôi.  Besides the common explanation that tôi comes from a Chinese (Middle Chinese) word meaning servant the authors note:

Many dictionaries, especially some of the earliest, such as Alexandre de Rhodes’s Annamite-Portuguese-Latin dictionary [3], hold that ‘tôi đòi’, ‘đầy tớ’, and ‘tôi tớ’ derive from Tôi and Tớ. This is entirely consistent with the present-day pronunciation in Hakka and Cantonese of the word [Toi] { 儓 }. The Mandarin transcription of [toi] has two forms, [tai-2] and [dai-4], meaning ‘tôi đòi’ or ‘đầy tớ’ (servant), very similar to Mandarin [tai dai]. In fact ‘Tớ’ sounds very close to the word [tsut] or [su] in Hakka, and [zeot] or [syu] in Cantonese, written 豎 or 卒, both meaning ‘tôi đòi’. However, a closer source is the word [Tub] in Hmong, meaning ‘Tớ’; the “b” is a marker of high tone, like the sắc tone mark in Vietnamese. In Tày-Nùng, “Khỏi” is equivalent to ‘Tôi’ in both senses: I and servant.

In summary there is a strong resemblance to words in Cantonese, Hakka (another southern Chinese language/dialect), as well as Hmong which has a word ‘tub’ which is pronounced like ‘tu’ in a rising tone.

[Note: Japanese “boku” used by younger males to older people also comes from a Chinese word meaning “manservant”.]

Mình, pronounced miềnh in some places in central Vietnam and in the Mường language, is compared to mi and mei in Hakka and Cantonese respectively as well as the word me in English!

Mình is also close to [mi] 微 in Hakka, [mei] in Cantonese, and even English [me].

I think it’s a little incredible to assume any relation with the English word.  But it is noted that the Mon-Khmer word [Ming] has the same origins as Mình.

Overall, the paper suggests a number of possible similarities, many of which can be ruled out easily.  For example, Vietnamese anh is similar to Japanese ani, both meaning older brother, but it is also similar to Japanese ane, which means older sister.  Likewise em (younger sibling) and imouto (younger sister) may have some similarities, but if the head im/em had a meaning of younger sibling then how does one account for otouto (younger brother)?

Anyway, without reading the full paper you can view the table of languages and suggested cognates for each of a number of pronouns near the bottom of the paper.  You can read the paper online here:

Learning kanji by reading manga Wed, 17 Mar 2010 17:29:52 +0000 admin One of the more difficult aspects of learning Japanese as a foreign language is learning the writing system, especially the thousands of kanji characters necessary to be considered literate.  One must strive to practice reading the characters one has already learned and attempt to learn new ones.  In Japan manga is popular with people of all ages, not just children.  It can also be a fun way to practice Japanese.

But constantly looking up unknown kanji by radical and by number of strokes can be quite time consuming.  And so furigana, also known as ruby text, can be of great assistance to any student of the Japanese language. Furigana is hiragana written in a small size next to a kanji character to show you how to pronounce the word or sometimes to give an alternate translation.  The Firefox browser has a great plug-in called Furigana Injector which can insert furigana next to kanji characters in any webpage.
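On a web page, furigana is usually marked up with the HTML `ruby` and `rt` elements: the kanji goes inside `ruby` and the reading inside `rt`. The Python sketch below builds that markup for one word. The helper name and the sample word are my own for illustration, and I don't know whether Furigana Injector produces its output this way.

```python
import html


def ruby(base, reading):
    """Wrap base text and its reading in HTML ruby markup."""
    return f"<ruby>{html.escape(base)}<rt>{html.escape(reading)}</rt></ruby>"


# The word "kanji" itself, with its hiragana reading.
markup = ruby("漢字", "かんじ")
```

A browser that supports ruby annotations will draw the reading in small type above the kanji when it renders `markup`.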

Many manga books for young people, shounen/shonen for boys or shoujo/shojo for girls, will have furigana.  For example any title from Shonen Jump, Dragon Ball, everybody’s favorite Doraemon, Inspector Conan (Meitantei Konan), and Ranma 1/2.  While it may be easy to find translated copies of these in your country you want to read the original untranslated Japanese.  If you can’t find them locally you may have to resort to importing them from

And while there are many websites where you can read scanned and translated manga online it’s harder to find RAW scans that haven’t been translated.  Unfortunately and are both down.  If you know of any alternate resources please leave a comment!