Changes between Version 2 and Version 3 of TigrinyaCorpus
- Timestamp: Jan 16, 2017, 10:03:07 PM
TigrinyaCorpus
v2  v3
42  42  Since the corpus is small, its domain variety is also limited. Political, religious, and blog sites account for a significant share of the corpus sources.
43  43
    44  == Corpus query interface ==
    45  The corpus has been indexed by the Sketch Engine corpus manager and query system [5]. It can be searched at http://corpora.fi.muni.cz/habit/.
    46
44  47  == References ==
45  48  - [1] -- Kilgarriff, Adam, Siva Reddy, Jan Pomikálek, and P. V. S. Avinesh. "A Corpus Factory for Many Languages." In LREC. 2010.
…   …
47  50  - [3] -- Suchomel, Vít, and Jan Pomikálek. "Efficient web crawling for large text corpora." In Proceedings of the seventh Web as Corpus Workshop (WAC7), pp. 39-43. 2012.
48  51  - [4] -- Pomikálek, Jan. "Removing boilerplate and duplicate content from web corpora." Doctoral dissertation, Masaryk University, Faculty of Informatics (2011).
    52  - [5] -- Kilgarriff, Adam, Vít Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, and Vít Suchomel. "The Sketch Engine: ten years on." Lexicography 1, no. 1 (2014): 7-36.