Version 11 (modified by xbaisa, 9 years ago)

D2.4: Parallel Czech-Norwegian corpus, size 10 million tokens

The corpus was built from the OpenSubtitles corpus, which is distributed as part of the OPUS2 parallel corpus collection [2]; the subtitles originally come from http://www.opensubtitles.org/.

Part	Tokens	Words	Segments
Czech	32,345,496	24,101,302	4,235,111
Norwegian	32,549,746	25,503,941	4,235,111
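
The figures in the table can be turned into a few derived statistics, which gives a rough sense of how short subtitle segments are. The sketch below is only an illustration: the function name `summarize` and the dictionary layout are hypothetical, and it assumes both sides share the segment count given for Czech, as expected for a segment-aligned parallel corpus.

```python
# Counts copied from the table above.
STATS = {
    "Czech": {"tokens": 32_345_496, "words": 24_101_302},
    "Norwegian": {"tokens": 32_549_746, "words": 25_503_941},
}

# Assumption: the segment count is identical for both languages,
# because the corpus is aligned segment by segment.
SEGMENTS = 4_235_111


def summarize(stats=STATS, segments=SEGMENTS):
    """Return per-language averages derived from the raw counts."""
    rows = {}
    for lang, counts in stats.items():
        rows[lang] = {
            # Average number of tokens in one aligned segment.
            "tokens_per_segment": counts["tokens"] / segments,
            # Average number of words in one aligned segment.
            "words_per_segment": counts["words"] / segments,
            # Fraction of tokens that are counted as words.
            "word_share": counts["words"] / counts["tokens"],
        }
    return rows


if __name__ == "__main__":
    for lang, row in summarize().items():
        print(
            f"{lang}: "
            f"{row['tokens_per_segment']:.2f} tokens/segment, "
            f"{row['words_per_segment']:.2f} words/segment, "
            f"{row['word_share']:.1%} of tokens are words"
        )
```

Both languages come out at fewer than eight tokens per segment on average, which is consistent with the short subtitle lines in the sample below.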

Norwegian	Czech
Om jeg hadde $ 300, kunne jeg kommet meg til Tyskland.	Ne, ale kdybych měl 300$, dostal bych se do Německa.
Aldri i livet!	Až naprší a uschne.
Jeg vil bli her... og fiske, slik Manuel gjorde.	Chci zůstat tady... a jezdit na ryby, jako Manuel.
Transilvania.	Transylvánie.
"Polka-Dot banditten og gjengen beskyldt for å utføre røveriet"	"Podezření padá na banditu Polka-Dot ."
Fortsette som før?	Jako dřív?
Nå har vi rikelig med sol for smilefjeset.	Teď svítí sluníčko pro pana Šťastného.
Det minner meg om de ødelagte forsvarsverker på mitt eget slått i Transilvania.	Připomíná mi to zchátralé cimbuří mého vlastního hradu v Transylvánii.
Ikke minn meg på det.	Nepřipomínej mi to.
Følge etter?	- Sledovat?

The corpus has been indexed by the Sketch Engine corpus manager and query system [1]. It can be searched at http://corpora.fi.muni.cz/habit/.

[1] -- Kilgarriff, Adam, Vít Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, and Vít Suchomel. "The Sketch Engine: ten years on." Lexicography 1, no. 1 (2014): 7-36.
[2] -- Tiedemann, Jörg. "News from OPUS - A Collection of Multilingual Parallel Corpora with Tools and Interfaces." In N. Nicolov, K. Bontcheva, G. Angelova, and R. Mitkov (eds.), Recent Advances in Natural Language Processing (vol. V), pages 237-248. John Benjamins, Amsterdam/Philadelphia, 2009.
