Changes between Version 34 and Version 35 of InterimResults
Timestamp: Jan 17, 2017, 1:07:38 PM
--- InterimResults (version 34)
+++ InterimResults (version 35)
@@ -8,24 +8,25 @@
 * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=wic&reload=1 Amharic WIC corpus], 200 thousand tokens
 
-Amharic WIC corpus (News from Walta Information Center), manually tagged. [AmharicCorpus Corpus deliverable/technical report]
+Amharic WIC corpus (News from Walta Information Center), manually tagged.
 
 * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16&reload=1 amWaC16 corpus], 20 million tokens
 
-Amharic Web corpus. Crawled by !SpiderLing in August 2013, October 2015 and January 2016. Cleaned, de-duplicated. Tagged by !TreeTagger trained on Amharic WiC.
+Amharic Web corpus. Crawled by !SpiderLing in August 2013, October 2015 and January 2016. Cleaned, de-duplicated. Tagged by !TreeTagger trained on Amharic WiC. [AmharicCorpus Corpus deliverable/technical report]
 
 * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=or_spoken Oromo spoken corpus], 7,500 tokens.
 
 Oromo spoken corpus containing 1205 utterances. Built by Text Laboratory, University of Oslo.
 
+* [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=orwac16 orWaC16 corpus], 5.1 million tokens.
 
-Oromo Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated.
+Oromo Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [OromoCorpus Corpus deliverable/technical report]
 
 * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=sowac16 soWaC16 corpus], 80 million tokens.
 
-Somali Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated.
+Somali Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [SomaliCorpus Corpus deliverable/technical report]
 
 * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=tiwac16 tiWaC16 corpus], 2.5 million tokens.
 
-Tigrinya Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated.
+Tigrinya Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [TigrinyaCorpus Corpus deliverable/technical report]
 
 * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=czech_norwegian_opus__norwegian Czech-Norwegian parallel corpus], 4 million aligned segments.