Changes between Version 5 and Version 6 of SketchGrammarEvaluation
Timestamp: Jan 25, 2016, 12:48:01 PM
SketchGrammarEvaluation
[[BR]]When preparing the collocation candidates, we collected a number of text corpora (both web-based and edited-content-based) and existing collocation dictionaries. The headwords were restricted to nouns, adjectives and verbs, chosen randomly from three frequency bands:

 * high: top 100–2999 words by frequency
 * mid: top 3000–9999 words by frequency
 * low: top 10,000–30,000 words by frequency

[[BR]]Figure 2: Distribution of good collocations in fiftieths, ordered by score.

…

[[BR]]We have therefore recently reviewed the methodology and are now preparing a new gold-standard collocation set, following a revised annotation methodology that aims to be more inclusive. The annotators now classify each candidate into one of five categories:

 * strong collocation
 * weak collocation
 * correct word combination but not a significant collocation
 * error
 * I don’t understand

[[BR]]To help reduce the number of unknown collocations, the word sketches have been enhanced with the so-called longest-commonest match (LCM) string, the most common headword-collocation combination (Kilgarriff et al. 2015).

…

KILGARRIFF, Adam, et al. Longest–commonest Match. 2015.

== Download ==
[raw-attachment:D4.1MethodologyofSketchGrammarevaluation.pdf link to the attached PDF]