Changes between Version 1 and Version 2 of SetOfEthiopianWebCorpora


Timestamp:
May 31, 2017 6:51:46 PM
Author:
xmedved1
Comment:

--

  • SetOfEthiopianWebCorpora

    v1 v2  
    3 3 * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=wic&reload=1 Amharic WIC corpus], 200 thousand tokens
    4 4
    5   Amharic WIC corpus (News from Walta Information Center), manually tagged.
     5  Amharic WIC corpus (News from Walta Information Center), manually tagged. [[BR]]
     6  [https://nlp.fi.muni.cz/projects/habit/download/wic.vert.gz download corpus]
    6 7
    7 8 * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16&reload=1 amWaC16 corpus], 20 million tokens
    8 9
    9   Amharic Web corpus. Crawled by !SpiderLing  in August 2013, October 2015 and January 2016. Cleaned, de-duplicated. Tagged by !TreeTagger trained on Amharic WiC. [[BR]] [AmharicCorpus Corpus deliverable/technical report]
     10  Amharic Web corpus. Crawled by !SpiderLing  in August 2013, October 2015 and January 2016. Cleaned, de-duplicated. Tagged by !TreeTagger trained on Amharic WiC. [[BR]] [AmharicCorpus Corpus deliverable/technical report][[BR]]
     11  [https://nlp.fi.muni.cz/projects/habit/download/am131516.vert.gz download corpus]
    10 12
    11 13 * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=orwac16 orWaC16 corpus], 5.1 million tokens.
    12 14
    13   Oromo Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [OromoCorpus Corpus deliverable/technical report]
     15  Oromo Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [OromoCorpus Corpus deliverable/technical report][[BR]]
     16  [https://nlp.fi.muni.cz/projects/habit/download/or16.tag.vert.gz download corpus]
    14 17
    15 18 * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=sowac16 soWaC16 corpus], 80 million tokens.
    16 19
    17   Somali Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [SomaliCorpus Corpus deliverable/technical report]
     20  Somali Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [SomaliCorpus Corpus deliverable/technical report][[BR]]
     21  [https://nlp.fi.muni.cz/projects/habit/download/so16.tag.vert.gz download corpus]
    18 22
    19 23 * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=tiwac16 tiWaC16 corpus], 2.5 million tokens.
    20 24
    21   Tigrinya Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [TigrinyaCorpus Corpus deliverable/technical report]
     25  Tigrinya Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [TigrinyaCorpus Corpus deliverable/technical report][[BR]]
     26  [https://nlp.fi.muni.cz/projects/habit/download/ti16.tag.vert.gz download corpus]
    22 27
    23 28 === Software ===