Changes between Version 5 and Version 6 of SketchGrammarEvaluation


Timestamp:
Jan 25, 2016, 12:48:01 PM (9 years ago)
Author:
xkocinc
Comment:

--

  • SketchGrammarEvaluation

    v5 v6  
    3030[[BR]]When preparing the collocation candidates, we collected a number of text corpora (both web-based and edited-content-based) and existing collocation dictionaries. The headwords were restricted to nouns, adjectives and verbs, chosen randomly from three frequency bands:
    3131
    32 [[BR]]
    3332 * high: top 100–2999 words by frequency
    34 
    3533
    3634 * mid: top 3000–9999 words by frequency
    3735
    38 
    3936 * low: top 10,000–30,000 words by frequency
    40 
    41 
    4237
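The band selection above can be sketched as follows. This is only an illustrative sketch, assuming the input is a word list already sorted by descending corpus frequency and already filtered to nouns, adjectives and verbs; the names `BANDS`, `band_for_rank` and `sample_headwords` are hypothetical and not part of the original tooling.

```python
import random

# Band boundaries as 1-based frequency ranks, taken from the list above.
BANDS = {
    "high": (100, 2999),
    "mid": (3000, 9999),
    "low": (10000, 30000),
}


def band_for_rank(rank):
    """Return the band name for a frequency rank, or None if it lies outside all bands."""
    for name, (lowest, highest) in BANDS.items():
        if lowest <= rank <= highest:
            return name
    return None


def sample_headwords(ranked_words, per_band, seed=0):
    """Randomly pick up to `per_band` headwords from each band.

    `ranked_words` is assumed to be sorted by descending frequency,
    so index 0 holds the word with rank 1.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    by_band = {name: [] for name in BANDS}
    for rank, word in enumerate(ranked_words, start=1):
        band = band_for_rank(rank)
        if band is not None:
            by_band[band].append(word)
    return {
        name: rng.sample(words, min(per_band, len(words)))
        for name, words in by_band.items()
    }
```

Words ranked 1–99 fall outside every band in this sketch, because the high band is stated as starting at rank 100.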
    4338[[BR]]Figure 2: Distribution of good collocations in fiftieths, ordered by score.
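A distribution of this kind could be computed roughly as shown below. This is a sketch under assumptions: candidates are given as hypothetical `(score, is_good)` pairs, and they are ordered from the highest score to the lowest, which the caption does not state explicitly.

```python
def good_per_fiftieth(candidates, parts=50):
    """Count good collocations in each of `parts` equal slices of the score-ordered list.

    `candidates` is a list of (score, is_good) pairs. Slice 0 holds the
    highest-scoring candidates (assumed ordering).
    """
    ordered = sorted(candidates, key=lambda pair: pair[0], reverse=True)
    total = len(ordered)
    counts = [0] * parts
    for index, (_, is_good) in enumerate(ordered):
        # Map the position in the ordered list to a slice index in 0..parts-1.
        counts[index * parts // total] += bool(is_good)
    return counts
```

If the good collocations cluster in the first few slices, the score ranks them well; a flat distribution would suggest the score carries little information about annotator judgements.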
     
    6560[[BR]]We have therefore recently reviewed the methodology and are now preparing a new gold-standard collocation set, following a revised annotation methodology that aims to be more inclusive. The annotators now classify candidates into five categories:
    6661
    67 [[BR]]
    6862 * strong collocation
    69 
    7063
    7164 * weak collocation
    7265
    73 
    7466 * correct word combination but not a significant collocation
    75 
    7667
    7768 * error
    7869
    79 
    8070 * I don’t understand
    81 
    82 
    8371
    8472[[BR]]To help reduce the number of unknown collocations, the word sketches have been enhanced with the so-called longest-commonest match (LCM) string -- the most common headword-collocation combination (Kilgarriff et al. 2015).
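The idea of "the most common headword-collocation combination" can be illustrated with a simple frequency count. This is a deliberately simplified sketch and not the procedure from Kilgarriff et al. (2015): it assumes each corpus hit has already been reduced to a token span running from the headword to the collocate, and the function name `most_common_match` is hypothetical.

```python
from collections import Counter


def most_common_match(spans):
    """Return (string, count) for the most frequent span, or None for empty input.

    `spans` is an iterable of token sequences, one per corpus hit, each
    covering the headword and the collocate (assumed pre-extracted).
    """
    counts = Counter(tuple(span) for span in spans)
    if not counts:
        return None
    # Highest count wins; among equally frequent spans prefer the longer one.
    tokens, count = max(counts.items(), key=lambda item: (item[1], len(item[0])))
    return " ".join(tokens), count


hits = (
    [("make", "a", "decision")] * 3
    + [("make", "the", "decision")] * 2
    + [("make", "decisions")]
)
print(most_common_match(hits))
```

Showing an annotator a concrete string such as this alongside the bare headword-collocate pair is what is meant to make unfamiliar candidates easier to judge.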
     
    9785KILGARRIFF, Adam, et al. Longest–commonest Match. 2015.
    9886
     87== Download ==
    9988[raw-attachment:D4.1MethodologyofSketchGrammarevaluation.pdf link to the attached PDF]