Changes between Version 8 and Version 9 of AmharicCorpus


Timestamp: Jan 17, 2017, 12:39:47 PM
Author: xsuchom2
Comment: --

  • AmharicCorpus

    v8  v9
    22  22  ||=Token count       =|| 20,287,250||
    23  23  ||=Ge'ez lexicon size=||    955,628||
    24      ||=Sera lexicon size =||    948,553||
        24  ||=Sera transliteration lexicon size =||    948,553||
    25  25
    26  26  Document count – the most frequent web domains and domain size distribution:
     
    36  36
    37  37  The most frequent words:
    38      ||=Word (Ge'ez) =||= Word (Sera) =||= Count =||
        38  ||=Word (Ge'ez script) =||= Word (Sera transliteration) =||= Count =||
    39  39  ||ነው    ||new   || 155,520||
    40  40  ||ላይ    ||lay   ||  91,592||