Changes between Version 37 and Version 38 of InterimResults
- Timestamp: Jan 17, 2017, 9:46:59 PM
InterimResults
--- v37
+++ v38
@@ -12,5 +12,5 @@
  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16&reload=1 amWaC16 corpus], 20 million tokens

-Amharic Web corpus. Crawled by !SpiderLing in August 2013, October 2015 and January 2016. Cleaned, de-duplicated. Tagged by !TreeTagger trained on Amharic WiC. [ AmharicCorpus Corpus deliverable/technical report]
+Amharic Web corpus. Crawled by !SpiderLing in August 2013, October 2015 and January 2016. Cleaned, de-duplicated. Tagged by !TreeTagger trained on Amharic WiC. [[BR]] [AmharicCorpus Corpus deliverable/technical report]

  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=or_spoken Oromo spoken corpus], 7,500 tokens.
@@ -20,17 +20,17 @@
  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=orwac16 orWaC16 corpus], 5.1 million tokens.

-Oromo Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [ OromoCorpus Corpus deliverable/technical report]
+Oromo Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [OromoCorpus Corpus deliverable/technical report]

  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=sowac16 soWaC16 corpus], 80 million tokens.

-Somali Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [ SomaliCorpus Corpus deliverable/technical report]
+Somali Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [SomaliCorpus Corpus deliverable/technical report]

  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=tiwac16 tiWaC16 corpus], 2.5 million tokens.

-Tigrinya Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [ TigrinyaCorpus Corpus deliverable/technical report]
+Tigrinya Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [TigrinyaCorpus Corpus deliverable/technical report]

  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=czech_norwegian_opus__norwegian Czech-Norwegian parallel corpus], 4 million aligned segments.

-Czech-Norwegian parallel corpus from subtitles, OpenSubtitles2016 subcorpus of OPUS2, filtered for Czech and Norwegian.
+Czech-Norwegian parallel corpus from subtitles, OpenSubtitles2016 subcorpus of OPUS2, filtered for Czech and Norwegian. [[BR]] [ParallelCzechNorwegian Corpus deliverable/technical report]

 == Publications ==