Changes between Version 4 and Version 5 of SomaliCorpus
- Timestamp: Jan 17, 2017, 11:13:39 AM
SomaliCorpus
== Building the Somali Web corpus ==
We used the following steps to create a large Somali Web corpus. First, following the Corpus Factory method [1], bigrams of Somali words from the Crúbadán database [2] were used to query the Bing search engine for documents in Somali. The URLs of the 18,108 documents returned by the search engine were then used as starting points for the web crawler !SpiderLing [3].

The following language models were created: