Changes between Version 3 and Version 4 of SomaliCorpus
Timestamp: Jan 16, 2017, 10:03:02 PM
SomaliCorpus
v3  v4
46  46  Apart from other African languages represented in the HaBiT project corpora, the Somali corpus consists of texts from a wide range of Web domains. News and politics sites make up a significant share of the corpus sources.
47  47
    48  == Corpus query interface ==
    49  The corpus has been indexed by the Sketch Engine corpus manager and query system [5]. It can be searched at http://corpora.fi.muni.cz/habit/.
    50
48  51  == References ==
49  52  - [1] -- Kilgarriff, Adam, Siva Reddy, Jan Pomikálek, and P. V. S. Avinesh. "A Corpus Factory for Many Languages." In LREC. 2010.
…   …
51  54  - [3] -- Suchomel, Vít, and Jan Pomikálek. "Efficient web crawling for large text corpora." In Proceedings of the seventh Web as Corpus Workshop (WAC7), pp. 39-43. 2012.
52  55  - [4] -- Pomikálek, Jan. "Removing boilerplate and duplicate content from web corpora." Doctoral thesis, Masaryk University, Faculty of Informatics (2011).
    56  - [5] -- Kilgarriff, Adam, Vít Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, and Vít Suchomel. "The Sketch Engine: ten years on." Lexicography 1, no. 1 (2014): 7-36.