Changes between Version 2 and Version 3 of OromoTigrinyaSomaliWordSketches


Timestamp:
May 30, 2017, 11:27:20 PM (7 years ago)
Author:
xkovar3

  • OromoTigrinyaSomaliWordSketches

    v2 v3  
    1 == Sketch grammar == #docs-internal-guid-ce966917-55eb-cf87-c498-e89a55adb4f5
    2 The Afaan Oromo, Tigrinya and Somali sketch grammars were developed along the template used in the English and Spanish grammars within the Sketch Engine (so far the two most developed grammars), in terms of names of the grammatical relations (and their mapping to the template names which will enable matching the given language word sketches to word sketches in other languages) as well as in terms of the coding within the scope of the sketch grammar formalism which should simplify reading the grammar and make future modifications easy, which is important as there is not yet any feedback from native speakers. An important issue is, there is no lemmatization for these languages so the word sketches contain only particular word forms. The definitions of the relations are based on the Universal POS tagset ![1].
     1= D4.4d: A new definition of Word Sketches for Afaan Oromo, Tigrinya, and Somali =
    32
    4 In total, there is 19 relations in the current version of the grammar, covering the most important grammatical phenomena, such as modifiers of all parts of speech, subjects, objects, predicates and coordinations.
     3This report describes the new sketch grammars for Afaan Oromo, Tigrinya, and Somali created within the scope of the project.
     4
     5== State of the art ==
     6
     7Before we began work on these sketch grammars, no sketch grammar was available for any of the languages. It was possible to use one of the universal, language-independent grammars, but to the best of our knowledge this has never been done.
     8
     9== New sketch grammar ==
     10
     11The Afaan Oromo, Tigrinya and Somali sketch grammars follow the template used in the English and Spanish grammars within the Sketch Engine (so far the two most developed grammars). This applies both to the names of the grammatical relations (and their mapping to the template names, which will enable matching word sketches in these languages to word sketches in other languages) and to the coding within the sketch grammar formalism, which should simplify reading the grammar and make future modifications easy. The definitions of the relations are based on the [http://universaldependencies.org/u/pos/ Universal Dependencies part-of-speech tagset]; for this tagset, new annotated data were created and new part-of-speech taggers were trained for each of the languages within the scope of the project.
     12
     13An important issue regarding word sketches for these languages is that no lemmatization tool is available for any of them, so the word sketches can contain only particular word forms. For the user, this means that several items within a particular grammatical relation column can correspond to one lemma. Also, there may be fewer statistically significant collocations, as the space of word forms is much larger than the space of lemmas, so data sparseness can become a problem for less frequent words. On the other hand, it also means that potentially different word sketches are available for the particular word forms, not only for lemmas. In addition, the corpora created for Afaan Oromo, Tigrinya and Somali within the scope of the project are significantly smaller than those for Norwegian, Czech and Amharic, so the sketch grammar rules also produce fewer hits.
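
The effect of missing lemmatization on collocation counts can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy English word pairs, the `lemma_of` map, the `count_pairs` helper and the `min_freq` threshold are hypothetical and are not part of the project's grammars or of the Sketch Engine. It only shows how the same evidence spreads over more, rarer pairs when counted by word form than when counted by lemma.

```python
from collections import Counter

# Hypothetical toy data: (headword form, collocate form) pairs, as a sketch
# grammar rule might extract them. English is used purely for illustration.
pairs = [
    ("car", "drives"), ("car", "drove"), ("car", "driving"),
    ("cars", "drive"), ("cars", "drove"),
    ("car", "red"), ("cars", "red"),
]

# Hypothetical lemma map. No such resource is available for Afaan Oromo,
# Tigrinya or Somali, which is exactly the situation described above.
lemma_of = {
    "car": "car", "cars": "car",
    "drive": "drive", "drives": "drive", "drove": "drive", "driving": "drive",
    "red": "red",
}


def count_pairs(pairs, normalize=lambda word: word):
    """Count co-occurring pairs after applying `normalize` to both words."""
    return Counter((normalize(head), normalize(coll)) for head, coll in pairs)


by_form = count_pairs(pairs)                                   # word forms only
by_lemma = count_pairs(pairs, normalize=lambda w: lemma_of[w])  # merged by lemma

# Assumed frequency threshold below which a pair is ignored as too rare.
min_freq = 2
frequent_forms = {pair for pair, n in by_form.items() if n >= min_freq}
frequent_lemmas = {pair for pair, n in by_lemma.items() if n >= min_freq}

print(len(by_form), "distinct pairs by word form,", len(frequent_forms), "above threshold")
print(len(by_lemma), "distinct pairs by lemma,", len(frequent_lemmas), "above threshold")
```

In this toy data no word-form pair reaches the threshold, while both lemma pairs do, which mirrors the sparseness described above for less frequent words.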
     14
     15In total, there are 19 relations in the current version of the grammar, covering the most important grammatical phenomena, such as modifiers of all parts of speech, subjects, objects, predicates and coordinations. Examples of word sketches for all of the languages are given in the following figures.
    516
    617[[Image(fig1.png)]]
     
    819Figure 1: Word sketch for Oromo noun konkolaataan (car)
    920
    10 As can be seen from Figures 1-3, modifiers are the words that somehow modify or specify meaning of given word. This relation is called ''modifier of “word”''. There are also a few complement relations called'' nouns/verbs/adverbs modified by “word”'', where the word is the one who modifies the meaning (of noun, adjective, adverb or verb). These can be seen in Figures 1 and 3.
     21As can be seen from Figures 1-3, modifiers are the words that modify or specify the meaning of a given word. This relation is called ''modifier of “word”''. There are also a few complement relations called ''nouns/verbs/adverbs modified by “word”'', where the headword is the one that modifies the meaning (of a noun, adjective, adverb or verb). These can be seen in Figures 1 and 3.
    1122
    1223[[Image(fig2.png)]]
     
    2536
    2637The last relation is ''“word” and/or'', which shows frequent coordinations among words. This relation is shown in Figures 1-3.
    27 
    28 ![1] !http://universaldependencies.org/u/pos/