Changes between Version 5 and Version 6 of SomaliCorpus


Timestamp: Jan 17, 2017, 12:45:07 PM
Author: xsuchom2
Comment: The most frequent words

Unlike the corpora of other African languages in the HaBiT project, the Somali corpus consists of texts from a broad range of Web domains. News and politics sites make up a significant share of the corpus sources.

The most frequent words:
||= Word (Latin script) =||= Count =||
||oo    || 2,130,200||
||ka    || 1,808,365||
||ay    || 1,470,184||
||ku    || 1,445,719||
||iyo   || 1,248,166||
||ee    || 1,210,830||
||ah    || 1,062,164||
||u     || 1,041,418||
||in    || 1,037,431||
||ayaa  ||   985,020||
||uu    ||   950,971||
||soo   ||   794,868||
||la    ||   720,451||
||lagu  ||   397,822||
||ugu   ||   365,182||

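A frequency list like the one above can be approximated by counting tokens over the corpus text. The sketch below is illustrative only and is not the HaBiT processing pipeline: it assumes plain-text input, lowercasing, and a simple regular-expression tokenizer that keeps apostrophe-internal words together (Somali orthography uses the apostrophe for the glottal stop). The helper names `top_words` and `format_trac_table` are hypothetical, and counts from this sketch would not be expected to match the table exactly.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed tokenizer: runs of word characters, optionally joined by apostrophes
# so that a word such as "la'aan" stays one token instead of "la" + "aan".
# The tokenizer actually used for the corpus may differ.
TOKEN_RE = re.compile(r"\w+(?:'\w+)*")


def top_words(lines: Iterable[str], n: int = 15) -> list[tuple[str, int]]:
    """Return the n most frequent lowercased tokens with their counts."""
    counts = Counter()
    for line in lines:
        counts.update(tok.lower() for tok in TOKEN_RE.findall(line))
    return counts.most_common(n)


def format_trac_table(rows: list[tuple[str, int]]) -> str:
    """Render (word, count) pairs as a Trac wiki table in the style above."""
    out = ["||= Word (Latin script) =||= Count =||"]
    for word, count in rows:
        # Left-align the word, right-align the count with thousands separators.
        out.append(f"||{word:<6}|| {count:>9,}||")
    return "\n".join(out)


if __name__ == "__main__":
    # Toy input built from tokens in the table, not real corpus data.
    sample = ["ka oo ay", "oo ku iyo oo", "ka ay oo"]
    print(format_trac_table(top_words(sample, n=3)))
```

On a real corpus the input would be streamed line by line from the text files, which keeps memory use bounded by the vocabulary size rather than the corpus size.
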
== Corpus query interface ==
The corpus has been indexed by the corpus manager and query system Sketch Engine [5]. It can be searched at http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=sowac16.

== References ==