- Timestamp: Jun 2, 2017, 12:29:14 PM (9 years ago)
- Author: xsuchom2
- Comment: Additional info
| v4 | v5 | |
| 37 | 37 | * Corpus size: 20 million tokens. |
| 38 | 38 | * Crawled by !SpiderLing in August 2013, October 2015 and January 2016. Boilerplate-cleaned, de-duplicated. |
| 39 | | * Tagged by !TreeTagger trained on Amharic WiC. |
| | 39 | * Tagged using the [https://nlp.fi.muni.cz/projekty/habit/amtag/index.cgi HaBiT Amharic Tagger module] ([http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/ !TreeTagger] trained on [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=am_wic Amharic WiC corpus]) applying the [https://www.researchgate.net/profile/Girma_Demeke/publication/237553785_Manual_Annotation_of_Amharic_News_Items_with_Part-of-Speech_Tags_and_its_Challenges/links/57045f2308ae74a08e246382.pdf Amharic POS tagset]. |
| 40 | 40 | * [AmharicCorpus Corpus deliverable/technical report] |
| 41 | 41 | |
| 42 | 42 | ==== Examples of HaBiT System features for the Amharic Web Corpus ==== |
| 43 | | * [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=amwac16 Corpus information] |
| | 43 | * [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=amwac16 Corpus information] – Document count, sentence count, word count, lexicon sizes, tagset description. |
| 44 | 44 | * [http://corpora.fi.muni.cz/habit/run.cgi/first?corpname=amwac16&reload=&iquery=%E1%88%98%E1%8A%95%E1%8C%8D%E1%88%A5%E1%89%B5&queryselector=iqueryrow&phrase=&word=&char=&cql=&default_attr=word&fc_lemword_window_type=both&fc_lemword_wsize=5&fc_lemword=&fc_lemword_type=all&fsca_doc.t2ld=&fsca_doc.urldomain= Examples of the use of "መንግሥት" ("government")] – Words or phrases in a natural Amharic context. The base function for language study and the source of good dictionary examples. |
| 45 | 45 | * [http://corpora.fi.muni.cz/habit/run.cgi/wsketch?corpname=amwac16&reload=&lemma=%E1%88%98%E1%8A%95%E1%8C%8D%E1%88%A5%E1%89%B5&minfreq=6&minscore=0.0&maxitems=20&sort_ws_columns=s&show_lemma_coverage=0&clustercolls=0&minsim=0.15&structured=0&structured=1&min_unary_score=5.0&min_mwlink_freq=100&nr_ws_cols=5&bim_corpname=&bim_lemma= Grammatical and collocational behaviour of "መንግሥት" ("government")] – An essential feature for creating dictionaries in Amharic. |
| … | … | |
| 51 | 51 | * Corpus size: 5.1 million tokens. |
| 52 | 52 | * Crawled by !SpiderLing in January 2016. Boilerplate-cleaned, de-duplicated. |
| 53 | | * Tagged with the Universal POS tagset. |
| | 53 | * Tagged using the [https://nlp.fi.muni.cz/projekty/habit/omtag/index.cgi HaBiT Oromo Tagger module] applying the [http://universaldependencies.org/u/pos/ Universal POS tagset]. |
| 54 | 54 | * [OromoCorpus Corpus deliverable/technical report] |
| 55 | 55 | |
| 56 | 56 | ==== Examples of HaBiT System features for the Oromo Web Corpus ==== |
| 57 | | * [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=orwac16 Corpus information] |
| | 57 | * [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=orwac16 Corpus information] – Document count, sentence count, word count, lexicon sizes, tagset description. |
| 58 | 58 | * [http://corpora.fi.muni.cz/habit/run.cgi/first?corpname=orwac16&reload=&iquery=mootummaa&queryselector=iqueryrow&phrase=&word=&char=&cql=&default_attr=word&fc_lemword_window_type=both&fc_lemword_wsize=5&fc_lemword=&fc_lemword_type=all&fsca_doc.t2ld=&fsca_doc.urldomain= Examples of the use of "mootummaa" ("government") in context] – Words or phrases in a natural Oromo context. The base function for language study and the source of good dictionary examples. |
| 59 | 59 | * [http://corpora.fi.muni.cz/habit/run.cgi/wsketch?corpname=orwac16&reload=&lemma=mootummaa&minfreq=auto&minscore=0.0&maxitems=20&sort_ws_columns=s&show_lemma_coverage=0&clustercolls=0&minsim=0.15&structured=0&structured=1&min_unary_score=5.0&min_mwlink_freq=100&nr_ws_cols=5&bim_corpname=&bim_lemma= Grammatical and collocational behaviour of "mootummaa" ("government")] – An essential feature for creating dictionaries in Oromo. |
| … | … | |
| 65 | 65 | * Corpus size: 80 million tokens. |
| 66 | 66 | * Crawled by !SpiderLing in January 2016. Boilerplate-cleaned, de-duplicated. |
| 67 | | * Tagged with the Universal POS tagset. |
| | 67 | * Tagged using the [https://nlp.fi.muni.cz/projekty/habit/sotag/index.cgi HaBiT Somali Tagger module] applying the [http://universaldependencies.org/u/pos/ Universal POS tagset]. |
| 68 | 68 | * [SomaliCorpus Corpus deliverable/technical report] |
| 69 | 69 | |
| 70 | 70 | ==== Examples of HaBiT System features for the Somali Web Corpus ==== |
| 71 | | * [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=sowac16 Corpus information] |
| | 71 | * [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=sowac16 Corpus information] – Document count, sentence count, word count, lexicon sizes, tagset description. |
| 72 | 72 | * [http://corpora.fi.muni.cz/habit/run.cgi/first?corpname=sowac16&reload=&iquery=dowladda&queryselector=iqueryrow&phrase=&word=&char=&cql=&default_attr=word&fc_lemword_window_type=both&fc_lemword_wsize=5&fc_lemword=&fc_lemword_type=all&fsca_doc.tld=&fsca_doc.t2ld=&fsca_doc.urldomain= Examples of the use of "dowladda" ("government") in context] – Words or phrases in a natural Somali context. The base function for language study and the source of good dictionary examples. |
| 73 | 73 | * [http://corpora.fi.muni.cz/habit/run.cgi/wsketch?corpname=sowac16&reload=&lemma=dowladda&minfreq=auto&minscore=0.0&maxitems=20&sort_ws_columns=s&show_lemma_coverage=0&clustercolls=0&minsim=0.15&structured=0&structured=1&min_unary_score=5.0&min_mwlink_freq=100&nr_ws_cols=5&bim_corpname=&bim_lemma= Grammatical and collocational behaviour of "dowladda" ("government")] – An essential feature for creating dictionaries in Somali. |
| … | … | |
| 79 | 79 | * Corpus size: 2.5 million tokens. |
| 80 | 80 | * Crawled by !SpiderLing in January 2016. Boilerplate-cleaned, de-duplicated. |
| 81 | | * Tagged with the Universal POS tagset. |
| | 81 | * Tagged using the [https://nlp.fi.muni.cz/projekty/habit/titag/index.cgi HaBiT Tigrinya Tagger module] applying the [http://universaldependencies.org/u/pos/ Universal POS tagset]. |
| 82 | 82 | * [TigrinyaCorpus Corpus deliverable/technical report] |
| 83 | 83 | |
| 84 | 84 | ==== Examples of HaBiT System features for the Tigrinya Web Corpus ==== |
| 85 | | * [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=tiwac16 Corpus information] |
| | 85 | * [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=tiwac16 Corpus information] – Document count, sentence count, word count, lexicon sizes, tagset description. |
| 86 | 86 | * [http://corpora.fi.muni.cz/habit/run.cgi/first?corpname=tiwac16&reload=&iquery=%E1%88%98%E1%8A%95%E1%8C%8D%E1%88%B5%E1%89%B2&queryselector=iqueryrow&phrase=&word=&wpos=&char=&cql=&default_attr=word&fc_lemword_window_type=both&fc_lemword_wsize=5&fc_lemword=&fc_lemword_type=all&fc_pos_window_type=both&fc_pos_wsize=5&fc_pos_type=all&fsca_doc.t2ld=&fsca_doc.urldomain= Examples of the use of "መንግስቲ" ("government") in context] – Words or phrases in a natural Tigrinya context. The base function for language study and the source of good dictionary examples. |
| 87 | 87 | * [http://corpora.fi.muni.cz/habit/run.cgi/wsketch?corpname=tiwac16&reload=&lemma=%E1%88%98%E1%8A%95%E1%8C%8D%E1%88%B5%E1%89%B2&minfreq=auto&minscore=0.0&maxitems=20&sort_ws_columns=s&show_lemma_coverage=0&clustercolls=0&minsim=0.15&structured=0&structured=1&min_unary_score=5.0&min_mwlink_freq=100&nr_ws_cols=5&bim_corpname=&bim_lemma= Grammatical and collocational behaviour of "መንግስቲ" ("government")] – An essential feature for creating dictionaries in Tigrinya. |
| … | … | |
| 96 | 96 | |
| 97 | 97 | ==== Examples of HaBiT System features for the Norwegian Bokmål Web Corpus ==== |
| 98 | | * [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=notenten15_4_bokmal Corpus information] |
| | 98 | * [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=notenten15_4_bokmal Corpus information] – Document count, sentence count, word count, lexicon sizes, tagset description. |
| 99 | 99 | * [http://corpora.fi.muni.cz/habit/run.cgi/first?corpname=notenten15_4_bokmal&reload=&iquery=regjering&queryselector=iqueryrow&lemma=&lpos=&phrase=&word=&wpos=&char=&cql=&default_attr=word&fc_lemword_window_type=both&fc_lemword_wsize=5&fc_lemword=&fc_lemword_type=all&fc_pos_window_type=both&fc_pos_wsize=5&fc_pos_type=all&fsca_doc.t2ld=&fsca_doc.urldomain= Examples of the use of "regjering" ("government") in context] – Words or phrases in a natural Norwegian Bokmål context. The base function for language study and the source of good dictionary examples. |
| 100 | 100 | * [http://corpora.fi.muni.cz/habit/run.cgi/wsketch?corpname=notenten15_4_bokmal&reload=&lemma=regjering&lpos=&minfreq=auto&minscore=0.0&maxitems=25&sort_ws_columns=s&show_lemma_coverage=0&clustercolls=0&minsim=0.15&structured=0&structured=1&min_unary_score=5.0&min_mwlink_freq=100&nr_ws_cols=4&bim_corpname=&bim_lemma= Grammatical and collocational behaviour of "regjering" ("government")] – An essential feature for creating dictionaries in Norwegian Bokmål. |
| … | … | |
| 109 | 109 | |
| 110 | 110 | ==== Examples of HaBiT System features for the Norwegian Nynorsk Web Corpus ==== |
| 111 | | * [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=notenten15_4_nynorsk Corpus information] |
| | 111 | * [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=notenten15_4_nynorsk Corpus information] – Document count, sentence count, word count, lexicon sizes, tagset description. |
| 112 | 112 | * [http://corpora.fi.muni.cz/habit/run.cgi/first?corpname=notenten15_4_nynorsk&reload=&iquery=regjering&queryselector=iqueryrow&phrase=&word=&char=&cql=&default_attr=word&fc_lemword_window_type=both&fc_lemword_wsize=5&fc_lemword=&fc_lemword_type=all&fsca_doc.t2ld=&fsca_doc.urldomain= Examples of the use of "regjering" ("government") in context] – Words or phrases in a natural Norwegian Nynorsk context. The base function for language study and the source of good dictionary examples. |
| 113 | 113 | |
| … | … | |
| 119 | 119 | |
| 120 | 120 | ==== Examples of HaBiT System features for the Czech Web Corpus ==== |
| 121 | | * [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=cztenten16_0 Corpus information] |
| | 121 | * [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=cztenten16_0 Corpus information] – Document count, sentence count, word count, lexicon sizes, tagset description. |
| 122 | 122 | * [http://corpora.fi.muni.cz/habit/run.cgi/reduce?q=aword%2C%5Blc%3D%22vl%C3%A1da%22%7Clemma_lc%3D%22vl%C3%A1da%22%5D&q=Fdoc&corpname=cztenten16_0&viewmode=sen&attrs=word&ctxattrs=word&structs=p%2Cg&refs=doc&pagesize=40&gdexconf=&iquery=vl%C3%A1da&attr_tooltip=nott&rlines=250 Examples of the use of "vláda" ("government") in context] – Words or phrases in a natural Czech context. The base function for language study and the source of good dictionary examples. |
| 123 | 123 | * [http://corpora.fi.muni.cz/habit/run.cgi/wsketch?corpname=cztenten16_0&reload=&lemma=vl%C3%A1da&minfreq=auto&minscore=0.0&maxitems=25&sort_ws_columns=s&show_lemma_coverage=0&clustercolls=0&minsim=0.15&structured=0&structured=1&min_unary_score=5.0&min_mwlink_freq=100&nr_ws_cols=5&bim_corpname=&bim_lemma= Grammatical and collocational behaviour of "vláda" ("government")] – An essential feature for creating dictionaries in Czech. |
| … | … | |
| 132 | 132 | |
| 133 | 133 | ==== Examples of HaBiT System features for the Czech-!Norwegian/Norwegian-Czech Parallel Corpus ==== |
| 134 | | * [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=czech_norwegian_opus__czech Czech-Norwegian Parallel Corpus information], [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=czech_norwegian_opus__norwegian Norwegian-Czech Parallel Corpus information] |
| | 134 | * [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=czech_norwegian_opus__czech Czech-Norwegian Parallel Corpus information], [http://corpora.fi.muni.cz/habit/run.cgi/corp_info?corpname=czech_norwegian_opus__norwegian Norwegian-Czech Parallel Corpus information] – Document count, word count, lexicon sizes, tagset description. |
| 135 | 135 | * [http://corpora.fi.muni.cz/habit/run.cgi/view?q=aword%2C%5Blc%3D%22vl%C3%A1da%22%7Clemma_lc%3D%22vl%C3%A1da%22%5D+within+czech_norwegian_opus__norwegian%3A%5Blc%3D%22regjering%22%5D;corpname=czech_norwegian_opus__czech;viewmode=align;attrs=word&ctxattrs=word&structs=p%2Cg&refs=align&pagesize=40&align=czech_norwegian_opus__norwegian&gdexconf=&iquery=vl%C3%A1da&maincorp=czech_norwegian_opus__czech&attr_tooltip=nott;fromp=1 Examples of the use of Czech "vláda" ("government") with aligned segments of Norwegian "regjering" ("government") in context] – Words or phrases in a natural Czech and Norwegian context. The base function for language study and translation services. |