Changes between Version 4 and Version 5 of TigrinyaCorpus
- Timestamp: Jan 17, 2017, 12:49:04 PM
TigrinyaCorpus
  Since the corpus is small, the domain variety is also limited. The content of politics, religious and blog sites has a significant presence in the corpus sources.

+ The most frequent words:
+ ||= Word (Ge'ez script) =||= Word (Sera transliteration) =||= Count =||
+ ||ኣብ ||ab || 56,290||
+ ||እዩ ||Iyu || 31,898||
+ ||ናይ ||nay || 27,420||
+ ||እቲ ||Iti || 20,705||
+ ||ካብ ||kab || 20,582||
+ ||ከም ||kem || 18,167||
+ ||ድማ ||dma || 18,138||
+ ||እዚ ||Izi || 15,630||
+ ||ምስ ||ms || 14,090||
+ ||ናብ ||nab || 11,792||
+ ||ነቲ ||neti || 10,900||
+ ||ከኣ ||kea || 9,234||
+ ||ኣምላኽ ||amlaK || 9,089||
+ ||ሓደ ||Hade || 8,791||
+ ||ቅዱስ ||qdus || 8,330||
+
  == Corpus query interface ==
- The corpus has been indexed by corpus manager and query system Sketch Engine [5]. The corpus can be searched at http://corpora.fi.muni.cz/habit/ .
+ The corpus has been indexed by corpus manager and query system Sketch Engine [5]. The corpus can be searched at http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=tiwac16.

  == References ==