We selected 23* diverse target languages that span a wide spectrum of linguistic features, writing systems, language families, and geographical regions. Our goal was not only to reach a broad audience but also to highlight the richness of global linguistic diversity. To choose these languages, we consulted the 26th edition of Ethnologue, focusing on the number of speakers to maximize the potential reach of our translations. With LitMT, we aim to provide access to diverse narratives and cultures, offering our readers a gateway to the linguistic and cultural riches of the literary world.

*We are currently in the process of adding Armenian.

nobackground.png

Figure: Map of the target languages featured on the LitMT website, ranging from high-resource languages (e.g., English or French) to relatively low-resource languages (e.g., Hausa or Armenian). Currently, the website features 23 of the 24 languages presented here; we are in the process of adding Armenian to the translations. *This map is a proxy and is continuously updated. Identifying which countries to include is complex, as many of these languages are spoken or understood in multiple countries, often by minority groups. Please note that not all relevant countries may be marked at this time.

map_langs_msa_updated.png

Table: List of languages available on the LitMT website with the number of native speakers (L1) and the total number of speakers (all). MSA denotes Modern Standard Arabic. Statistics sourced from Ethnologue.

| ID | Language | Total speakers (all) | Native speakers (L1) | Family | Branch | Morphology | Script |
|---|---|---|---|---|---|---|---|
| 1 | English | 1.452B | 372.9M | Indo-European | Germanic | analytic | Latin |
| 2 | Mandarin Chinese | 1.118B | 929.0M | Sino-Tibetan | Sinitic | analytic | Hanzi |
| 3 | Hindi | 602.2M | 343.9M | Indo-European | Indo-Aryan | fusional | Devanagari |
| 4 | Spanish | 548.3M | 474.7M | Indo-European | Romance | fusional | Latin |
| 5 | French | 274.1M | 79.9M | Indo-European | Romance | fusional | Latin |
| 6 | MSA | 274.0M | n/a | Afro-Asiatic | Semitic | fusional | Naskh |
| 7 | Bengali | 272.7M | 233.7M | Indo-European | Indo-Aryan | fusional | Bengali |
| 8 | Russian | 258.2M | 154.0M | Indo-European | Balto-Slavic | fusional | Cyrillic |
| 9 | Portuguese | 257.7M | 232.4M | Indo-European | Romance | fusional | Latin |
| 10 | Urdu | 231.3M | 70.2M | Indo-European | Indo-Aryan | fusional | Nastaliq |
| 11 | Indonesian | 199.0M | 43.6M | Austronesian | Malayo-Polynesian | agglutinative | Latin |
| 12 | German | 134.6M | 75.6M | Indo-European | Germanic | fusional | Latin |
| 13 | Japanese | 125.4M | 125.3M | Japonic | - | agglutinative | Kanji, Hiragana, Katakana |
| 14 | Turkish | 88.1M | 82.2M | Turkic | Oghuz | agglutinative | Latin |
| 15 | Tamil | 86.4M | 78.4M | Dravidian | Southern | agglutinative | Tamil |
| 16 | Vietnamese | 85.3M | 84.6M | Austroasiatic | Vietic | analytic | Latin |
| 17 | Korean | 81.7M | 80.4M | Koreanic | - | agglutinative | Hangul |
| 18 | Hausa | 77.1M | 50.8M | Afro-Asiatic | Chadic | fusional | Latin |
| 19 | Swahili | 71.4M | 16.1M | Niger-Congo | Bantu | agglutinative | Latin |
| 20 | Thai | 60.7M | 20.7M | Kra-Dai | Zhuang-Tai | analytic | Thai |
| 21 | Polish | 40.6M | 40.0M | Indo-European | Balto-Slavic | fusional | Latin |
| 22 | Ukrainian | 32.8M | 27.0M | Indo-European | Balto-Slavic | fusional | Cyrillic |
| 23 | Mongolian | 5.2M | ~5.2M | Mongolic | Mongolian | agglutinative | Cyrillic |
| 24 | Armenian | 5.3M | ~5.3M | Indo-European | Armenian | fusional | Armenian |
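
For readers who want to check the distribution of families, morphological types, and scripts themselves, the table can be tallied with a few lines of Python. This is only a minimal sketch: the `LANGUAGES` list and the `tally` helper are hypothetical names introduced here for illustration, the rows are copied by hand from the table above, and the Latin versus non-Latin split is our assumption about how a binary script grouping might be defined.

```python
from collections import Counter

# Rows copied from the table above: (language, family, morphology, script).
LANGUAGES = [
    ("English", "Indo-European", "analytic", "Latin"),
    ("Mandarin Chinese", "Sino-Tibetan", "analytic", "Hanzi"),
    ("Hindi", "Indo-European", "fusional", "Devanagari"),
    ("Spanish", "Indo-European", "fusional", "Latin"),
    ("French", "Indo-European", "fusional", "Latin"),
    ("MSA", "Afro-Asiatic", "fusional", "Naskh"),
    ("Bengali", "Indo-European", "fusional", "Bengali"),
    ("Russian", "Indo-European", "fusional", "Cyrillic"),
    ("Portuguese", "Indo-European", "fusional", "Latin"),
    ("Urdu", "Indo-European", "fusional", "Nastaliq"),
    ("Indonesian", "Austronesian", "agglutinative", "Latin"),
    ("German", "Indo-European", "fusional", "Latin"),
    ("Japanese", "Japonic", "agglutinative", "Kanji, Hiragana, Katakana"),
    ("Turkish", "Turkic", "agglutinative", "Latin"),
    ("Tamil", "Dravidian", "agglutinative", "Tamil"),
    ("Vietnamese", "Austroasiatic", "analytic", "Latin"),
    ("Korean", "Koreanic", "agglutinative", "Hangul"),
    ("Hausa", "Afro-Asiatic", "fusional", "Latin"),
    ("Swahili", "Niger-Congo", "agglutinative", "Latin"),
    ("Thai", "Kra-Dai", "analytic", "Thai"),
    ("Polish", "Indo-European", "fusional", "Latin"),
    ("Ukrainian", "Indo-European", "fusional", "Cyrillic"),
    ("Mongolian", "Mongolic", "agglutinative", "Cyrillic"),
    ("Armenian", "Indo-European", "fusional", "Armenian"),
]


def tally(rows, column):
    """Count how many languages share each value at the given tuple index."""
    return Counter(row[column] for row in rows)


family_counts = tally(LANGUAGES, 1)
morphology_counts = tally(LANGUAGES, 2)

# Assumption: a binary script grouping means Latin vs. everything else.
latin_counts = Counter(
    "Latin" if row[3] == "Latin" else "non-Latin" for row in LANGUAGES
)

for name, counts in [
    ("family", family_counts),
    ("morphology", morphology_counts),
    ("script (binary)", latin_counts),
]:
    print(name, dict(counts.most_common()))
```

The tallies include Armenian, which is listed in the table but not yet available on the website, so dropping its row gives the counts for the 23 languages currently featured.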

1. Total number of speakers (including non-native speakers)

num_speak_lang.png

2. Language families

lang_family.png

3. Scripts

script_binary.png

script.png

4. Morphology

morphology.png