Timestamp: May 31, 2017, 3:32:29 PM; Author: xkovar3

= D4.3: Visualization tool for Sketch Grammar queries =

This tool is aimed at experienced users of word sketches who want to understand precisely how the collocates were extracted from the corpus.

= Building Word Sketches =

The word sketch approach consists of combining statistics with manually defined syntactic rules, called a sketch grammar, that limit what counts as a co-occurrence of two particular words. The core of a sketch grammar is a set of queries in the corpus query language CQL, each marking the positions of the two words that can form a collocation. Only word pairs matching one of the queries are then counted as co-occurrences in the statistical computations. Each CQL query is associated with a label that describes the grammatical relationship between the two words. One grammatical label (or relation) can be assigned to multiple queries.

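To make the role of the marked positions concrete, the following Python sketch models a sketch grammar as a list of toy rules. It is a simplified illustration under our own assumptions, not the Sketch Engine implementation: real CQL queries are far more expressive, whereas here each hypothetical {{{Rule}}} is just a fixed-length sequence of tag regular expressions, stored together with its relation label and the indices of the tokens marked {{{1:}}} and {{{2:}}}. The tag set and the example sentence are also invented.

```python
import re
from collections import Counter
from typing import NamedTuple


class Rule(NamedTuple):
    """A toy stand-in for one CQL query of a sketch grammar (hypothetical)."""
    relation: str         # grammatical relation label, e.g. "object"
    patterns: list[str]   # one tag regex per token, matched with re.fullmatch
    first: int            # index of the token marked 1: (the headword)
    second: int           # index of the token marked 2: (the collocate)


def collect(tokens, rules):
    """Count (relation, headword, collocate) triples in a tagged sentence.

    Word pairs not matched by any rule are never counted: this is the
    filtering role the sketch grammar plays before the statistics.
    """
    counts = Counter()
    for rule in rules:
        width = len(rule.patterns)
        for start in range(len(tokens) - width + 1):
            window = tokens[start:start + width]
            if all(re.fullmatch(p, tag) for p, (_, tag) in zip(rule.patterns, window)):
                head = window[rule.first][0]
                collocate = window[rule.second][0]
                counts[(rule.relation, head, collocate)] += 1
    return counts


# Two queries share the label "object", as a sketch grammar allows.
grammar = [
    Rule("object", ["V.*", "N.*"], 0, 1),          # verb noun
    Rule("object", ["V.*", "J.*", "N.*"], 0, 2),   # verb adjective noun
    Rule("modifier", ["J.*", "N.*"], 1, 0),        # adjective noun
]
sentence = [("drink", "VB"), ("strong", "JJ"), ("coffee", "NN"),
            ("and", "CC"), ("drink", "VB"), ("water", "NN")]

for triple, n in sorted(collect(sentence, grammar).items()):
    print(triple, n)
```

Running it yields two ''object'' triples (from two different queries with the same label) and one ''modifier'' triple; a word such as ''and'' never appears because no rule matches it.
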
Macros in the {{{m4}}} format can be used in the sketch grammar, so that the developer does not have to repeat potentially complex queries with regular expressions and can instead give each such query a short name, e.g. ''noun'' for {{{[tag="N.*"]}}}.

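The idea of macros can be illustrated with a deliberately simplified Python imitation of the m4 {{{define}}} mechanism. This is not m4 itself: real m4 also supports macro arguments, nested quoting and many built-in macros, while this sketch handles only one-line definitions without arguments. The hypothetical helper {{{expand_macros}}}, the {{{=modifier}}} relation line and the {{{1:}}}/{{{2:}}} labels in the sample input are assumptions made for the example.

```python
import re

# Matches a one-line m4-style definition such as  define(`noun', `[tag="N.*"]')
DEFINE = re.compile(r"define\(`(\w+)',\s*`([^']*)'\)")


def expand_macros(grammar_text):
    """Toy macro expansion: collect one-line define(`name', `body') lines,
    drop them from the output, then replace every whole-word occurrence of
    each macro name with its body."""
    macros = {}
    kept = []
    for line in grammar_text.splitlines():
        match = DEFINE.fullmatch(line.strip())
        if match:
            macros[match.group(1)] = match.group(2)
        else:
            kept.append(line)
    text = "\n".join(kept)
    for name, body in macros.items():
        # A function replacement keeps any backslashes in the body literal.
        text = re.sub(rf"\b{re.escape(name)}\b", lambda _m, body=body: body, text)
    return text


source = """define(`noun', `[tag="N.*"]')
define(`adj', `[tag="J.*"]')
=modifier
2:adj 1:noun"""

print(expand_macros(source))
```

The printed grammar keeps the {{{=modifier}}} line and turns the query into {{{2:[tag="J.*"] 1:[tag="N.*"]}}}, which is what the developer would otherwise have had to type out in full.
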
A few so-called processing directives can also be used to modify how the queries are evaluated. For example, the {{{*DUAL}}} directive makes it possible to define two complementary relations (e.g. "modifier of X" vs. "words modified by X") with a single CQL query; the query is also evaluated only once, which makes creating the two relations more efficient. Another possibility is to include a third word in the relationship, more precisely in the grammatical relation name. This is done with the {{{*TRINARY}}} directive: a third word can be labelled {{{3:}}} within the CQL queries, and this word replaces the string {{{%s}}} in the relation name, potentially creating a large number of different grammatical relations.

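The effect of the two directives on the resulting relations can be sketched in Python. The functions {{{dual}}} and {{{trinary}}} are hypothetical helpers that take already-computed matches, so they illustrate only the bookkeeping described above, not how the Sketch Engine evaluates queries. The relation names ''modifier'', ''modifies'' and {{{pp_%s}}} are invented for the example.

```python
from collections import Counter


def dual(matches, name, inverse_name):
    """*DUAL idea: one list of (1:, 2:) matches feeds two complementary
    relations. The matches are computed once; the second relation simply
    swaps the roles of the two words."""
    counts = Counter()
    for first, second in matches:
        counts[(name, first, second)] += 1
        counts[(inverse_name, second, first)] += 1
    return counts


def trinary(matches, template):
    """*TRINARY idea: the word labelled 3: replaces "%s" in the relation
    name, so each distinct third word yields its own relation."""
    counts = Counter()
    for first, second, third in matches:
        counts[(template.replace("%s", third), first, second)] += 1
    return counts


# (noun, adjective) pairs as one hypothetical modifier query might return them.
print(dual([("coffee", "strong"), ("tea", "strong")], "modifier", "modifies"))

# (verb, noun, preposition) triples; the template "pp_%s" is invented.
print(trinary([("sit", "chair", "on"), ("sit", "table", "at"),
               ("sit", "sofa", "on")], "pp_%s"))
```

Note that {{{dual}}} reads its input only once yet fills both relations, which mirrors the efficiency argument for {{{*DUAL}}}, while {{{trinary}}} produces as many relation names as there are distinct third words.
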
When developing a sketch grammar, the developer needs the relation names, macros and directives to be clearly marked so that they do not have to read every piece of the code. Clear marking also makes the CQL definitions easier to read and understand for an expert user who wants to reconstruct how a grammatical relation is built.

= Visualization tool =

For these reasons, we have improved the syntax highlighting that was previously available within the Sketch Engine tool. The sketch grammar of a particular corpus can be displayed by clicking on one of the grammatical relation names, which takes the user to a colour-highlighted HTML view of the sketch grammar.

As shown in Figure 1, macros and directives are blue, and grammatical relation names are green, as are the labels of the tokens/words that enter the relation. Comments are dark red; everything else (macro invocations and ordinary CQL queries) is black. The sketch grammar is also split into HTML paragraphs so that its structure is clear.

[[Image(wsdef.png)]]

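The colour scheme described above can be approximated by a small line-based highlighter. The Python sketch below only illustrates the approach and is not the code behind the Sketch Engine display. It assumes that comments start with {{{#}}}, directives with {{{*}}}, macro definitions with {{{define(}}} and relation names with {{{=}}}, that token labels look like {{{1:}}}, {{{2:}}} or {{{3:}}}, and that blank lines separate the blocks rendered as HTML paragraphs; the sample relation line {{{=modifier/modifies}}} is likewise an assumption.

```python
import html
import re

# Colour per line kind, following the scheme described above; the CSS colour
# names and the line-start conventions used by classify() are assumptions.
COLOURS = {"comment": "darkred", "directive": "blue", "macro": "blue",
           "relation": "green", "query": "black"}
TOKEN_LABEL = re.compile(r"\b([123]):")


def classify(line):
    """Guess the kind of a sketch grammar line from its first character(s)."""
    stripped = line.lstrip()
    if stripped.startswith("#"):
        return "comment"
    if stripped.startswith("*"):
        return "directive"
    if stripped.startswith("define("):
        return "macro"
    if stripped.startswith("="):
        return "relation"
    return "query"


def render_line(line):
    """Wrap one HTML-escaped line in a coloured span; token labels turn green."""
    kind = classify(line)
    text = html.escape(line)
    if kind == "query":
        text = TOKEN_LABEL.sub(r'<span style="color:green">\1:</span>', text)
    return f'<span style="color:{COLOURS[kind]}">{text}</span>'


def render(grammar_text):
    """Blank lines separate blocks; each block becomes one HTML paragraph."""
    blocks = [b for b in grammar_text.split("\n\n") if b.strip()]
    return "\n".join(
        "<p>" + "<br>\n".join(render_line(line) for line in block.splitlines()) + "</p>"
        for block in blocks
    )


sample = """# nouns and their adjective modifiers
*DUAL
=modifier/modifies
2:adj 1:noun

=object
1:verb 2:noun"""
print(render(sample))
```

A line-based classifier is sufficient for this scheme because each kind of line is recognisable from its first characters; only the token labels inside queries need a second, regex-based pass, so no full CQL parser is required.
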
In addition, it is possible to find out how many hits each query produced, as well as any errors that occurred while the queries were evaluated: all of this is logged, and the logs are accessible to users within the Corpus Architect interface (where they are permitted to view them). Together with the visualization tool described above, this functionality provides a convenient environment for sketch grammar developers, as well as for expert users who want to know how the system works internally.

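As an illustration of the kind of per-query logging described above, the following Python sketch records a hit count or an error for each query. The queries here are plain regular expressions over a space-separated string of tags, and the helper {{{evaluate_all}}} and the logger name {{{sketch_grammar}}} are hypothetical; neither reflects how the Sketch Engine or Corpus Architect actually evaluate queries or store their logs.

```python
import logging
import re

# Hypothetical logger name; a real system would write to its own log files.
log = logging.getLogger("sketch_grammar")


def evaluate_all(queries, tag_string):
    """Evaluate each (relation, regex) query against a tag string, logging
    the number of hits or the evaluation error, and return the hit counts."""
    hits = {}
    for relation, pattern in queries:
        try:
            found = sum(1 for _ in re.finditer(pattern, tag_string))
        except re.error as exc:
            # A broken query is logged and skipped instead of stopping the run.
            log.error("query for %s failed: %s", relation, exc)
            continue
        log.info("query for %s: %d hits", relation, found)
        hits[relation] = found
    return hits


logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
print(evaluate_all([("object", r"VB (?:JJ )?NN"), ("broken", r"VB [NN")],
                   "VB JJ NN CC VB NN"))
```

Skipping a failing query rather than aborting keeps the remaining hit counts available, so a developer can fix the reported query and compare the logged numbers between runs.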