Changes between Version 30 and Version 31 of InterimResults
- Timestamp: Jan 17, 2017, 12:57:49 PM
InterimResults
--- v30
+++ v31
  Amharic WIC corpus (News from Walta Information Center), manually tagged.

- * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16&reload=1 Amharic WaCcorpus], 20 million tokens
+ * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16&reload=1 amWaC16 corpus], 20 million tokens

  Amharic Web corpus. Crawled by !SpiderLing in August 2013, October 2015 and January 2016. Encoded in UTF-8, cleaned, deduplicated. Automatically tagged by !TreeTagger trained on Amharic WiC
  …

  Oromo spoken corpus containing 1205 utterances. Built by Text Laboratory, University of Oslo.
- * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=orwac16 Oromo WaCcorpus], 5.1 million tokens.
+ * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=orwac16 orWaC16 corpus], 5.1 million tokens.

  Oromo Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated.

- * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=sowac16 Somali WaCcorpus], 80 million tokens.
+ * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=sowac16 soWaC16 corpus], 80 million tokens.

  Somali Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated.

- * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=tiwac16 Tigrinya WaCcorpus], 2.5 million tokens.
+ * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=tiwac16 tiWaC16 corpus], 2.5 million tokens.

  Tigrinya Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated.