- Timestamp: Jan 16, 2017, 10:03:02 PM
- Author: xsuchom2
- Comment: Corpus query interface
| v3 | v4 | |
| 46 | 46 | In contrast to the other African languages represented in the HaBiT project corpora, the Somali corpus consists of texts from a wide range of web domains. News and politics sites account for a significant share of the corpus sources. |
| 47 | 47 | |
| | 48 | == Corpus query interface == |
| | 49 | The corpus has been indexed by the Sketch Engine corpus manager and query system [5]. It can be searched at http://corpora.fi.muni.cz/habit/. |
| | 50 | |
| 48 | 51 | == References == |
| 49 | 52 | - [1] -- Kilgarriff, Adam, Siva Reddy, Jan Pomikálek, and P. V. S. Avinesh. "A Corpus Factory for Many Languages." In LREC. 2010. |
| … |
… |
|
| 51 | 54 | - [3] -- Suchomel, Vít, and Jan Pomikálek. "Efficient web crawling for large text corpora." In Proceedings of the seventh Web as Corpus Workshop (WAC7), pp. 39-43. 2012. |
| 52 | 55 | - [4] -- Pomikálek, Jan. "Removing boilerplate and duplicate content from web corpora." Doctoral dissertation, Masaryk University, Faculty of Informatics (2011). |
| | 56 | - [5] -- Kilgarriff, Adam, Vít Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, and Vít Suchomel. "The Sketch Engine: ten years on." Lexicography 1, no. 1 (2014): 7-36. |