Changes between Version 30 and Version 31 of InterimResults


Timestamp:
Jan 17, 2017, 12:57:49 PM
Author:
xsuchom2
Comment:

Web corpora names

  • InterimResults

    v30 v31
    10  10   Amharic WIC corpus (News from Walta Information Center), manually tagged.
    11  11
    12       * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16&reload=1 Amharic WaC corpus], 20 million tokens
        12   * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16&reload=1 amWaC16 corpus], 20 million tokens
    13  13
    14  14   Amharic Web corpus. Crawled by !SpiderLing in August 2013, October 2015 and January 2016. Encoded in UTF-8, cleaned, deduplicated. Automatically tagged by !TreeTagger trained on Amharic WIC.
     
    17  17
    18  18   Oromo spoken corpus containing 1205 utterances. Built by Text Laboratory, University of Oslo.
    19       * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=orwac16 Oromo WaC corpus], 5.1 million tokens.
        19   * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=orwac16 orWaC16 corpus], 5.1 million tokens.
    20  20
    21  21   Oromo Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated.
    22  22
    23       * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=sowac16 Somali WaC corpus], 80 million tokens.
        23   * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=sowac16 soWaC16 corpus], 80 million tokens.
    24  24
    25  25   Somali Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated.
    26  26
    27       * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=tiwac16 Tigrinya WaC corpus], 2.5 million tokens.
        27   * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=tiwac16 tiWaC16 corpus], 2.5 million tokens.
    28  28
    29  29   Tigrinya Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated.