= D2.4: Parallel Czech-Norwegian corpus, size 10 million tokens =

== Source ==

This corpus was derived from the !OpenSubtitles corpus, made available within the [[http://opus.lingfil.uu.se/OpenSubtitles2016.php|OPUS2 parallel corpus]].

== Statistics ==

||=Part=||=Tokens=||=Words=||=Segments=||
||Czech|| 32,345,496|| 24,101,302|| 4,235,111||
||Norwegian|| 32,549,746|| 25,503,941|| 4,235,111||

== Examples ==

[[http://corpora.fi.muni.cz/habit/run.cgi/first?corpname=czech_norwegian_opus__czech&reload=&iquery=Praha&queryselector=iqueryrow&lemma=&phrase=&word=&char=&cql=&default_attr=word&sel_aligned=czech_norwegian_opus__norwegian&pcq_pos_neg_czech_norwegian_opus__norwegian=pos&iquery_czech_norwegian_opus__norwegian=Praha&queryselector_czech_norwegian_opus__norwegian=iqueryrow&phrase_czech_norwegian_opus__norwegian=&word_czech_norwegian_opus__norwegian=&char_czech_norwegian_opus__norwegian=&cql_czech_norwegian_opus__norwegian=&filter_nonempty_czech_norwegian_opus__norwegian=on&fc_lemword_window_type=both&fc_lemword_wsize=5&fc_lemword=&fc_lemword_type=all Parallel search for "Praha"]]

[[http://corpora.fi.muni.cz/habit/run.cgi/first?corpname=czech_norwegian_opus__czech&reload=&iquery=berla&queryselector=iqueryrow&lemma=&phrase=&word=&char=&cql=&default_attr=word&sel_aligned=czech_norwegian_opus__norwegian&pcq_pos_neg_czech_norwegian_opus__norwegian=pos&iquery_czech_norwegian_opus__norwegian=&queryselector_czech_norwegian_opus__norwegian=iqueryrow&phrase_czech_norwegian_opus__norwegian=&word_czech_norwegian_opus__norwegian=&char_czech_norwegian_opus__norwegian=&cql_czech_norwegian_opus__norwegian=&filter_nonempty_czech_norwegian_opus__norwegian=on&fc_lemword_window_type=both&fc_lemword_wsize=5&fc_lemword=&fc_lemword_type=all Czech word "berla"]]

[[http://corpora.fi.muni.cz/habit/index.html Access the corpus]]

=== Norwegian words with more than 100,000 occurrences ===

||er || 821,781||
||det || 589,721||
||du || 554,116||
||Jeg || 547,501||
||ikke || 506,186||
||jeg || 418,217||
||en || 360,871||
||i || 341,400||
||har || 315,050||
||Det || 310,092||
||på || 307,877||
||å || 296,603||
||og || 293,047||
||til || 271,992||
||deg || 259,043||
||meg || 245,155||
||med || 242,594||
||for || 213,835||
||Du || 211,802||
||at || 204,376||
||som || 203,379||
||vi || 171,073||
||var || 165,487||
||kan || 162,222||
||av || 160,980||
||om || 149,962||
||den || 148,767||
||vil || 147,605||
||så || 147,174||
||Vi || 145,267||
||et || 138,850||
||han || 126,251||
||skal || 119,570||
||Hva || 110,797||
||de || 110,202||
||Han || 107,929||
||må || 101,278||

=== Czech words with more than 100,000 occurrences ===

||to || 656,606||
||se || 560,332||
||je || 422,521||
||že || 345,153||
||na || 327,317||
||jsem || 309,133||
||a || 297,950||
||si || 231,641||
||v || 201,975||
||co || 172,431||
||To || 160,908||
||s || 152,526||
||A || 149,175||
||mi || 142,779||
||mě || 132,047||
||tak || 121,439||
||jsi || 118,647||
||do || 113,030||
||o || 112,856||
||Je || 106,979||

=== Example parallel segments ===

||=Norwegian=||=Czech=||
||Om jeg hadde $ 300, kunne jeg kommet meg til Tyskland.||Ne, ale kdybych měl 300$, dostal bych se do Německa.||
||Aldri i livet! ||Až naprší a uschne.||
||Jeg vil bli her... og fiske, slik Manuel gjorde.||Chci zůstat tady... a jezdit na ryby, jako Manuel.||
||Transilvania.||Transylvánie.||
||"Polka-Dot banditten og gjengen beskyldt for å utføre røveriet"||"Podezření padá na banditu Polka-Dot ."||
||Fortsette som før?||Jako dřív?||
||Nå har vi rikelig med sol for smilefjeset.||Teď svítí sluníčko pro pana Šťastného.||
||Det minner meg om de ødelagte forsvarsverker på mitt eget slått i Transilvania.||Připomíná mi to zchátralé cimbuří mého vlastního hradu v Transylvánii.||
||Ikke minn meg på det.||Nepřipomínej mi to.||
||Følge etter?||- Sledovat?||
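
The two parallel search links under Examples share one query pattern: a Czech query in `iquery`, the aligned Norwegian corpus in `sel_aligned`, and an optional Norwegian query in `iquery_czech_norwegian_opus__norwegian`. A minimal Python sketch of that pattern follows. The parameter names and values are copied from those links; the function name `parallel_search_url` is hypothetical, and the sketch omits the parameters that are empty in the links, on the untested assumption that the server does not require them.

```python
from urllib.parse import urlencode

# Base URL and corpus names as they appear in the example links.
BASE = "http://corpora.fi.muni.cz/habit/run.cgi/first"
CZECH = "czech_norwegian_opus__czech"
NORWEGIAN = "czech_norwegian_opus__norwegian"


def parallel_search_url(czech_query, norwegian_query=""):
    """Build a parallel search URL (hypothetical helper, not part of the corpus tools)."""
    params = [
        ("corpname", CZECH),
        ("iquery", czech_query),
        ("queryselector", "iqueryrow"),
        ("default_attr", "word"),
        ("sel_aligned", NORWEGIAN),
        ("pcq_pos_neg_" + NORWEGIAN, "pos"),
        ("iquery_" + NORWEGIAN, norwegian_query),
        ("queryselector_" + NORWEGIAN, "iqueryrow"),
        ("filter_nonempty_" + NORWEGIAN, "on"),
    ]
    # urlencode percent-encodes non-ASCII query words such as "že".
    return BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(parallel_search_url("Praha", "Praha"))  # query on both sides
    print(parallel_search_url("berla"))           # Czech query only
```

Leaving `norwegian_query` empty mirrors the "berla" link, which constrains only the Czech side and shows whatever Norwegian segments are aligned with the hits.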