Changes between Version 29 and Version 30 of InterimResults
Timestamp: Jan 17, 2017, 12:56:33 PM
InterimResults
    v29 v30
     10  10  Amharic WIC corpus (News from Walta Information Center), manually tagged.
     11  11
-    12      * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16&reload=1 Amharic WaC corpus], 17million tokens
+        12  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16&reload=1 Amharic WaC corpus], 20 million tokens
     13  13
-    14      Amharic web corpus. Crawled by !SpiderLing in August 2013 and October 2015. Encoded in UTF-8, cleaned, deduplicated. Automatically tagged by !TreeTagger trained on Amharic WiC
+        14  Amharic Web corpus. Crawled by !SpiderLing in August 2013, October 2015 and January 2016. Encoded in UTF-8, cleaned, deduplicated. Automatically tagged by !TreeTagger trained on Amharic WiC
     15  15
     16  16  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=or_spoken Oromo spoken corpus], 7,500 tokens.
      …   …
     19  19  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=orwac16 Oromo WaC corpus], 5.1 million tokens.
     20  20
-    21      Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated.
+        21  Oromo Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated.
     22  23  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=sowac16 Somali WaC corpus], 80 million tokens.
     23  24
-    24      Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated.
+        25  Somali Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated.
+        26
     25  27  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=tiwac16 Tigrinya WaC corpus], 2.5 million tokens.
     26  28
-    27      Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated.
+        29  Tigrinya Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated.
     28  30
     29  31  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=czech_norwegian_opus__norwegian Czech-Norwegian parallel corpus], 4 million aligned segments.