Changes between Version 7 and Version 8 of AmharicCorpus


Timestamp: Jan 17, 2017, 12:36:09 PM
Author: xsuchom2
Comment: The most frequent words

  • AmharicCorpus

    v7 v8  
    3535We observe that news/politics and religious sites have a significant presence among the corpus sources. Since only 149 domains are represented in the corpus by more than 10 documents, the resulting collection would benefit from a greater variety of sources.
    3636
    37 The most frequent parts of speech in both corpora are nouns and verbs. The most frequent part of speech tags:
    38 ||=Part of speech tag =||=Token count =||
    39 ||N     || 7,386,470||
    40 ||NP    || 2,660,200||
    41 ||VP    || 1,601,728||
    42 ||V     || 1,331,531||
    43 ||SENT  ||   946,905||
    44 ||VREL  ||   920,223||
    45 ||PUNC  ||   741,439||
    46 ||PREP  ||   729,404||
    47 ||NUMCR ||   687,686||
    48 ||ADJ   ||   647,608||
    49 ||PRON  ||   391,243||
    50 ||VN    ||   389,152||
    51 ||AUX   ||   373,346||
    52 ||NC    ||   322,592||
    53 ||CONJ  ||   292,046||
    54 ||PRONP ||   204,243||
    55 ||ADV   ||   173,772||
    56 ||NPC   ||   140,109||
    57 ||ADJP  ||   126,138||
     37The most frequent words:
     38||=Word (Ge'ez) =||= Word (Sera) =||= Count =||
     39||ነው    ||new   || 155,520||
     40||ላይ    ||lay   ||  91,592||
     41||እና    ||Ina   ||  49,733||
     42||ውስጥ   ||wsT   ||  42,429||
     43||ግን    ||gn    ||  39,537||
     44||ወደ    ||wede  ||  39,162||
     45||ጋር    ||gar   ||  36,057||
     46||ነበር   ||neber ||  34,055||
     47||ነገር   ||neger ||  30,670||
     48||ጊዜ    ||gizE  ||  27,413||
     49||ደግሞ   ||degmo ||  26,890||
     50||ይህ    ||yh    ||  25,622||
     51||አንድ   ||and   ||  25,546||
     52||ብቻ    ||bca   ||  23,468||
     53||ቤት    ||bEt   ||  22,466||
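
The word-frequency rows above use Trac wiki table syntax (`||cell||cell||`). For readers who want to reuse the figures, the following is a minimal sketch of how such rows could be read into Python tuples. The names `ROWS`, `parse_row` and `parse_table` are hypothetical and introduced only for this illustration; the sample contains just the first three rows of the table.

```python
# Sample rows copied from the table above (Trac wiki table syntax).
ROWS = """\
||=Word (Ge'ez) =||= Word (Sera) =||= Count =||
||ነው    ||new   || 155,520||
||ላይ    ||lay   ||  91,592||
||እና    ||Ina   ||  49,733||
"""


def parse_row(line):
    """Split one `||a||b||c||` data row into (geez, sera, count)."""
    # Drop the outer pipes, then split on the cell separator.
    cells = [cell.strip() for cell in line.strip().strip("|").split("||")]
    geez, sera, count = cells
    # Counts use a comma as the thousands separator.
    return geez, sera, int(count.replace(",", ""))


def parse_table(text):
    """Parse all data rows, skipping header rows (cells starting with `=`)."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("||") and not stripped.startswith("||="):
            rows.append(parse_row(stripped))
    return rows


if __name__ == "__main__":
    for geez, sera, count in parse_table(ROWS):
        print(geez, sera, count)
```

This sketch assumes every data row has exactly three cells, as in the table above; rows with a different shape would raise a `ValueError` and would need separate handling.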
    5854
    5955== Corpus query interface ==
    60 The corpus has been indexed by the corpus manager and query system Sketch Engine [5]. The corpus can be searched at http://corpora.fi.muni.cz/habit/.
     56The corpus has been indexed by the corpus manager and query system Sketch Engine [5]. The corpus can be searched at http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16.
    6157
    6258== References ==