Changes between Version 2 and Version 3 of SetOfEthiopianWebCorpora
- Timestamp: Sep 22, 2021, 2:38:45 PM
SetOfEthiopianWebCorpora
v2  v3
 5   5  Amharic WIC corpus (News from Walta Information Center), manually tagged. [[BR]]
 6   6  [https://nlp.fi.muni.cz/projects/habit/download/wic.vert.gz download corpus]
     7  ([https://nlp.fi.muni.cz/en/LicenceWebCorpus NLP Centre Web Corpus license])
 7   8
 8   9  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16&reload=1 amWaC16 corpus], 20 million tokens
 …   …
10  11  Amharic Web corpus. Crawled by !SpiderLing in August 2013, October 2015 and January 2016. Cleaned, de-duplicated. Tagged by !TreeTagger trained on Amharic WiC. [[BR]] [AmharicCorpus Corpus deliverable/technical report][[BR]]
11  12  [https://nlp.fi.muni.cz/projects/habit/download/am131516.vert.gz download corpus]
    13  ([https://nlp.fi.muni.cz/en/LicenceWebCorpus NLP Centre Web Corpus license])
12  14
13  15  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=orwac16 orWaC16 corpus], 5.1 million tokens.
 …   …
15  17  Oromo Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [OromoCorpus Corpus deliverable/technical report][[BR]]
16  18  [https://nlp.fi.muni.cz/projects/habit/download/or16.tag.vert.gz download corpus]
    19  ([https://nlp.fi.muni.cz/en/LicenceWebCorpus NLP Centre Web Corpus license])
17  20
18  21  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=sowac16 soWaC16 corpus], 80 million tokens.
 …   …
20  23  Somali Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [SomaliCorpus Corpus deliverable/technical report][[BR]]
21  24  [https://nlp.fi.muni.cz/projects/habit/download/so16.tag.vert.gz download corpus]
    25  ([https://nlp.fi.muni.cz/en/LicenceWebCorpus NLP Centre Web Corpus license])
22  26
23  27  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=tiwac16 tiWaC16 corpus], 2.5 million tokens.
 …   …
25  29  Tigrinya Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [TigrinyaCorpus Corpus deliverable/technical report][[BR]]
26  30  [https://nlp.fi.muni.cz/projects/habit/download/ti16.tag.vert.gz download corpus]
    31  ([https://nlp.fi.muni.cz/en/LicenceWebCorpus NLP Centre Web Corpus license])
27  32
28  33  === Software ===