- Timestamp: Jan 17, 2017, 12:36:09 PM (9 years ago)
- Author: xsuchom2
- Comment: The most frequent words
| v7 | v8 | |
| 35 | 35 | We observe that content from news/political and religious sites has a significant presence in the corpus sources. Since only 149 domains are represented by more than 10 documents in the corpus, the resulting collection would benefit from a greater variety of sources. |
| 36 | 36 | |
| 37 | | The most frequent parts of speech in both corpora are nouns and verbs. The most frequent part of speech tags: |
| 38 | | ||=Part of speech tag =||=Token count =|| |
| 39 | | ||N || 7,386,470|| |
| 40 | | ||NP || 2,660,200|| |
| 41 | | ||VP || 1,601,728|| |
| 42 | | ||V || 1,331,531|| |
| 43 | | ||SENT || 946,905|| |
| 44 | | ||VREL || 920,223|| |
| 45 | | ||PUNC || 741,439|| |
| 46 | | ||PREP || 729,404|| |
| 47 | | ||NUMCR || 687,686|| |
| 48 | | ||ADJ || 647,608|| |
| 49 | | ||PRON || 391,243|| |
| 50 | | ||VN || 389,152|| |
| 51 | | ||AUX || 373,346|| |
| 52 | | ||NC || 322,592|| |
| 53 | | ||CONJ || 292,046|| |
| 54 | | ||PRONP || 204,243|| |
| 55 | | ||ADV || 173,772|| |
| 56 | | ||NPC || 140,109|| |
| 57 | | ||ADJP || 126,138|| |
| | 37 | The most frequent words: |
| | 38 | ||=Word (Ge'ez) =||= Word (Sera) =||= Count =|| |
| | 39 | ||ነው ||new || 155,520|| |
| | 40 | ||ላይ ||lay || 91,592|| |
| | 41 | ||እና ||Ina || 49,733|| |
| | 42 | ||ውስጥ ||wsT || 42,429|| |
| | 43 | ||ግን ||gn || 39,537|| |
| | 44 | ||ወደ ||wede || 39,162|| |
| | 45 | ||ጋር ||gar || 36,057|| |
| | 46 | ||ነበር ||neber || 34,055|| |
| | 47 | ||ነገር ||neger || 30,670|| |
| | 48 | ||ጊዜ ||gizE || 27,413|| |
| | 49 | ||ደግሞ ||degmo || 26,890|| |
| | 50 | ||ይህ ||yh || 25,622|| |
| | 51 | ||አንድ ||and || 25,546|| |
| | 52 | ||ብቻ ||bca || 23,468|| |
| | 53 | ||ቤት ||bEt || 22,466|| |
| 58 | 54 | |
| 59 | 55 | == Corpus query interface == |
| 60 | | The corpus has been indexed by the corpus manager and query system Sketch Engine [5]. The corpus can be searched at http://corpora.fi.muni.cz/habit/. |
| | 56 | The corpus has been indexed by the corpus manager and query system Sketch Engine [5]. The corpus can be searched at http://corpora.fi.muni.cz/habit/run.cgi/first_form?corpname=amwac16. |
| 61 | 57 | |
| 62 | 58 | == References == |