Changes between Initial Version and Version 1 of CorpusAnnotationTool


Timestamp:
Jun 1, 2017, 11:41:47 AM (8 years ago)
Author:
pary

  • CorpusAnnotationTool

    v1 v1  
     1= Corpus Annotation Tool =
     2== Dynamic PoS annotation framework ==
     3A system for part-of-speech (PoS) annotation of sentences was created. The aim is to provide a tool for the easy and fast creation of small annotated corpora for languages without linguistic resources.
     4
     5The tool is optimized for the following priorities:
     6
     7 1. ''Simple tool for instant usage''. The client part of the tool is a web application that works in any browser. It requires only a small amount of resources (memory, internet connection, screen size), so it can be used even on mobile devices (tablets, phones).
     8 1. ''Using a small standard tag set''. Where possible, we use the [http://universaldependencies.org Universal Dependencies] (UD) tag set (version 2). It is well documented and used for many different languages. The main description of each tag is included directly in the tool, with links to the UD site.
     9 1. ''Fast navigation.'' Users can navigate with the mouse or the keyboard; only one or two mouse clicks or keystrokes are needed to annotate one token. All sentences are pre-annotated by a built-in adaptive tagger, so users only have to correct the incorrect annotations.
     10 1. ''Clean texts.'' Users can (and are instructed to) refuse to annotate a sentence if the sentence is not clear.
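
The pre-annotate, correct, and reject workflow described in the list above can be sketched as follows. This is only an illustration under stated assumptions: the page does not say how the built-in adaptive tagger works, so a simple most-frequent-tag baseline stands in for it, and the names `AdaptiveTagger`, `pre_annotate`, `learn`, and `annotate` are hypothetical. The tag inventory is the 17 universal PoS tags of UD version 2.

```python
from collections import Counter, defaultdict

# The 17 universal PoS tags defined by Universal Dependencies version 2.
UPOS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


class AdaptiveTagger:
    """Hypothetical stand-in for the built-in tagger (assumption, not the
    tool's actual algorithm): proposes the tag seen most often for a token."""

    def __init__(self, default_tag="NOUN"):
        self.default_tag = default_tag
        self.counts = defaultdict(Counter)  # lowercased token -> tag counts

    def pre_annotate(self, tokens):
        """Propose one tag per token; unseen tokens get the default tag."""
        tags = []
        for token in tokens:
            seen = self.counts.get(token.lower())
            tags.append(seen.most_common(1)[0][0] if seen else self.default_tag)
        return tags

    def learn(self, tokens, tags):
        """Update the counts from a sentence confirmed by an annotator."""
        for tag in tags:
            if tag not in UPOS:
                raise ValueError(f"not a UD v2 tag: {tag}")
        for token, tag in zip(tokens, tags):
            self.counts[token.lower()][tag] += 1


def annotate(tagger, tokens, corrections):
    """Apply the annotator's corrections to the pre-annotation.

    corrections maps token index -> corrected tag; None means the annotator
    rejected the sentence as unclear, so nothing is stored or learned.
    """
    if corrections is None:
        return None
    tags = tagger.pre_annotate(tokens)
    for index, tag in corrections.items():
        tags[index] = tag
    tagger.learn(tokens, tags)
    return tags


tagger = AdaptiveTagger()
first = annotate(tagger, ["The", "dog", "barks"], {0: "DET", 2: "VERB"})
second = tagger.pre_annotate(["the", "dog", "barks"])
rejected = annotate(tagger, ["asdf", "qwer"], None)
```

In this sketch the annotator has to fix two tokens in the first sentence, while the same tokens need no correction the second time because the tagger has adapted. A rejected sentence leaves the tagger unchanged.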
     11
     12
     13
     14The tool was used to annotate texts in 6 languages by 16 annotators in total. The Czech and Norwegian corpora were annotated mainly for evaluation purposes, since several PoS-annotated corpora (including ones using the UD tag set) already exist for both languages. The annotations of 4 Ethiopian languages were used to build the respective PoS taggers, which are part of the [http://corpora.fi.muni.cz/habit/ HaBiT system].
     15
     16Online statistics from the annotation are available [https://nlp.fi.muni.cz/projects/habit/stats/ here].