Changes between Version 37 and Version 38 of InterimResults


Timestamp:
Jan 17, 2017, 9:46:59 PM (7 years ago)
Author:
hales
Comment:

--

  • InterimResults

--- v37
+++ v38
  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16&reload=1 amWaC16 corpus], 20 million tokens

-   Amharic Web corpus. Crawled by !SpiderLing in August 2013, October 2015 and January 2016. Cleaned, de-duplicated. Tagged by !TreeTagger trained on Amharic WiC. [AmharicCorpus Corpus deliverable/technical report]
+   Amharic Web corpus. Crawled by !SpiderLing in August 2013, October 2015 and January 2016. Cleaned, de-duplicated. Tagged by !TreeTagger trained on Amharic WiC. [[BR]] [AmharicCorpus Corpus deliverable/technical report]

  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=or_spoken Oromo spoken corpus], 7,500 tokens.
…
  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=orwac16 orWaC16 corpus], 5.1 million tokens.

-   Oromo Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [OromoCorpus Corpus deliverable/technical report]
+   Oromo Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [OromoCorpus Corpus deliverable/technical report]

  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=sowac16 soWaC16 corpus], 80 million tokens.

-   Somali Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [SomaliCorpus Corpus deliverable/technical report]
+   Somali Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [SomaliCorpus Corpus deliverable/technical report]

  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=tiwac16 tiWaC16 corpus], 2.5 million tokens.

-   Tigrinya Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [TigrinyaCorpus Corpus deliverable/technical report]
+   Tigrinya Web corpus crawled by !SpiderLing in January 2016. Cleaned, de-duplicated. [[BR]] [TigrinyaCorpus Corpus deliverable/technical report]

  * [http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=czech_norwegian_opus__norwegian Czech-Norwegian parallel corpus], 4 million aligned segments.

-   Czech-Norwegian parallel corpus from subtitles, OpenSubtitles2016 subcorpus of OPUS2, filtered for Czech and Norwegian.
+   Czech-Norwegian parallel corpus from subtitles, OpenSubtitles2016 subcorpus of OPUS2, filtered for Czech and Norwegian. [[BR]] [ParallelCzechNorwegian Corpus deliverable/technical report]

 == Publications ==