Changes between Version 5 and Version 6 of SomaliCorpus
Timestamp: Jan 17, 2017, 12:45:07 PM (8 years ago)
SomaliCorpus
v5 v6
46 46 Apart from other African languages represented in HaBiT project corpora, the Somalian corpus consists of texts from a broad number of Web domains. The content of news/politics sites has a significant presence in the corpus sources.
47 47
+ 48 The most frequent words:
+ 49 ||=Word (Latin script) =||= Count =||
+ 50 ||oo || 2,130,200||
+ 51 ||ka || 1,808,365||
+ 52 ||ay || 1,470,184||
+ 53 ||ku || 1,445,719||
+ 54 ||iyo || 1,248,166||
+ 55 ||ee || 1,210,830||
+ 56 ||ah || 1,062,164||
+ 57 ||u || 1,041,418||
+ 58 ||in || 1,037,431||
+ 59 ||ayaa || 985,020||
+ 60 ||uu || 950,971||
+ 61 ||soo || 794,868||
+ 62 ||la || 720,451||
+ 63 ||lagu || 397,822||
+ 64 ||ugu || 365,182||
+ 65
48 66 == Corpus query interface ==
- 49 The corpus has been indexed by corpus manager and query system Sketch Engine [5]. The corpus can be searched at http://corpora.fi.muni.cz/habit/ .
+ 67 The corpus has been indexed by corpus manager and query system Sketch Engine [5]. The corpus can be searched at http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=sowac16.
50 68
51 69 == References ==