D4.4c: A new definition of Word Sketches for Amharic

This report describes the new sketch grammar for Amharic created within the scope of the project.

State of the art

Before our work began, no sketch grammar was available for Amharic. One of the language-independent universal grammars could have been used, but to the best of our knowledge this was never done.

New sketch grammar

The Amharic sketch grammar follows the template used in the English and Spanish grammars within the Sketch Engine, so far the two most developed grammars. It follows the template both in the names of the grammatical relations (and their mapping to the template names, which will enable matching the Amharic word sketches to word sketches in other languages) and in the coding conventions within the sketch grammar formalism, which should make the grammar easier to read and to modify in the future. The definitions of the relations are based on the tagset proposed by Demeke and Getachew, for which a tagger is available.

An important issue regarding Amharic word sketches is that no lemmatization tool is available for Amharic, so the word sketches can contain word forms only. For the user, this means that several items within one grammatical relation column can correspond to a single lemma. There may also be fewer statistically significant collocations: the space of word forms is much larger than the space of lemmas, so data sparseness can become a problem for less frequent words. On the other hand, it also means that separate, potentially different word sketches are available for the individual word forms, not only for lemmas.
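The effect of working with word forms rather than lemmas can be illustrated with a minimal sketch. The data below are invented English toy examples, not Amharic corpus data, and the names `observations`, `TOY_LEMMAS` and `above_threshold` are hypothetical; a plain frequency threshold stands in for whatever significance measure is actually used.

```python
from collections import Counter

# Hypothetical observations: (headword, collocate word form).
observations = [
    ("read", "book"), ("read", "books"), ("read", "book"),
    ("read", "books"), ("read", "paper"),
]

# Hypothetical form-to-lemma table standing in for a lemmatizer,
# which is exactly what is missing for Amharic.
TOY_LEMMAS = {"book": "book", "books": "book", "paper": "paper"}


def count_by_form(obs):
    """Count collocates as they appear, one entry per word form."""
    return Counter(form for _, form in obs)


def count_by_lemma(obs, lemmas):
    """Count collocates after mapping each form to its lemma."""
    return Counter(lemmas.get(form, form) for _, form in obs)


def above_threshold(counts, min_freq):
    """Return the items that occur at least min_freq times, sorted."""
    return sorted(item for item, n in counts.items() if n >= min_freq)


form_counts = count_by_form(observations)
lemma_counts = count_by_lemma(observations, TOY_LEMMAS)

print(above_threshold(form_counts, 3))   # no single form reaches the threshold
print(above_threshold(lemma_counts, 3))  # the merged lemma does
```

In this toy setting the evidence for one lemma is split across two forms, so neither form passes the threshold on its own, while each form still gets its own entry in the column.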

In total, there are 19 relations in the current version of the grammar, covering the most important grammatical phenomena, such as modifiers of all parts of speech, subjects, objects, predicates and coordinations.

Figure 1: Word sketch for noun አፍሪካ (Africa)

As Figures 1-3 show, modifiers are the words that modify or specify the meaning of a given word. This relation is called modifier of “word”. There are also a few complementary relations, called nouns/adjectives/verbs/adverbs modified by “word”, in which the headword is the one that modifies the meaning (of a noun, adjective, verb or adverb). These can be seen in Figures 1 and 3.
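The pairing of a modifier relation with its inverse can be sketched as follows. This is only an illustration under stated assumptions: the tags `ADJ`, `N` and `V`, the single adjacency pattern, the English toy tokens and the function names are all hypothetical and are not taken from the actual Amharic grammar or written in the sketch grammar formalism.

```python
from collections import defaultdict


def extract_modifier_pairs(tagged):
    """Return (modifier, head) for each ADJ immediately followed by N.

    The ADJ + N pattern is an assumed toy rule, not a rule from the grammar.
    """
    pairs = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "ADJ" and t2 == "N":
            pairs.append((w1, w2))
    return pairs


def build_dual_relations(pairs):
    """Record each pair under both the head and the modifier."""
    sketches = defaultdict(lambda: defaultdict(list))
    for modifier, head in pairs:
        # Seen from the head: which words modify it.
        sketches[head]['modifier of "word"'].append(modifier)
        # Seen from the modifier: which nouns it modifies.
        sketches[modifier]['nouns modified by "word"'].append(head)
    return sketches


# Invented toy sentence with hypothetical tags.
tagged = [("old", "ADJ"), ("city", "N"), ("has", "V"),
          ("big", "ADJ"), ("market", "N")]

pairs = extract_modifier_pairs(tagged)
sketches = build_dual_relations(pairs)

print(dict(sketches["city"]))
print(dict(sketches["old"]))
```

Each matched pair thus contributes one row to the sketch of the modified word and one row to the sketch of the modifying word, which is why both kinds of relation appear in the figures.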

Figure 2: Word sketch for verb አይደለም (is not)

The next grammatical phenomena to be described are subjects and objects, which can be either nominal or pronominal. These relations (objects of “word”, subjects of “word”, pronominal objects of “word”, pronominal subjects of “word”) can be seen in Figure 2. A somewhat special case is the relation subject of “be word”. Partially related to subject of “be word” is the relation “word” is …, as are the relations “word” is a ... and … is a “word”.

Another important grammatical phenomenon is the predicate. The word sketches show two relations: verbs with “word” as object (see Figure 3) and verbs with “word” as subject (see Figure 1).

Figure 3: Word sketch for adverb አሁን (right now)

The last relation is “word” and/or, which shows frequent coordinations among words. This relation is shown in Figure 1.

