Changes between Version 2 and Version 3 of SetOfEthiopianWebCorpora


Timestamp: Sep 22, 2021, 2:38:45 PM
Author: hales
Comment: --

  • SetOfEthiopianWebCorpora

    v2 v3
     5  5  Amharic WIC corpus (News from Walta Information Center), manually tagged. [[BR]]
     6  6  [https://nlp.fi.muni.cz/projects/habit/download/wic.vert.gz download corpus]
        7  ([https://nlp.fi.muni.cz/en/LicenceWebCorpus NLP Centre Web Corpus license])
     7  8
     8  9 * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16&reload=1 amWaC16 corpus], 20 million tokens
     9 10
    10 11  Amharic Web corpus. Crawled by !SpiderLing  in August 2013, October 2015 and January 2016. Cleaned, de-duplicated. Tagged by !TreeTagger trained on Amharic WiC. [[BR]] [AmharicCorpus Corpus deliverable/technical report][[BR]]
    11 12  [https://nlp.fi.muni.cz/projects/habit/download/am131516.vert.gz download corpus]
       13  ([https://nlp.fi.muni.cz/en/LicenceWebCorpus NLP Centre Web Corpus license])
    12 14
    13 15 * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=orwac16 orWaC16 corpus], 5.1 million tokens.
    14 16
    15 17  Oromo Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [OromoCorpus Corpus deliverable/technical report][[BR]]
    16 18  [https://nlp.fi.muni.cz/projects/habit/download/or16.tag.vert.gz download corpus]
       19  ([https://nlp.fi.muni.cz/en/LicenceWebCorpus NLP Centre Web Corpus license])
    17 20
    18 21 * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=sowac16 soWaC16 corpus], 80 million tokens.
    19 22
    20 23  Somali Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [SomaliCorpus Corpus deliverable/technical report][[BR]]
    21 24  [https://nlp.fi.muni.cz/projects/habit/download/so16.tag.vert.gz download corpus]
       25  ([https://nlp.fi.muni.cz/en/LicenceWebCorpus NLP Centre Web Corpus license])
    22 26
    23 27 * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=tiwac16 tiWaC16 corpus], 2.5 million tokens.
    24 28
    25 29  Tigrinya Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [TigrinyaCorpus Corpus deliverable/technical report][[BR]]
    26 30  [https://nlp.fi.muni.cz/projects/habit/download/ti16.tag.vert.gz download corpus]
       31  ([https://nlp.fi.muni.cz/en/LicenceWebCorpus NLP Centre Web Corpus license])
    27 32
    28 33 === Software ===