Version 3 (modified by hales, 8 years ago)


HaBiT - Harvesting big text data for under-resourced languages

Start date: 1.10.2014
End date: 30.4.2017


  • MU: Masarykova univerzita, Brno
  • NTNU: Norges teknisk-naturvitenskapelige universitet, Trondheim

Project Goals

  1. build a multi-billion word Norwegian corpus
    • using the tools co-developed by MU and utilized in a joint EU-funded project with NTNU
  2. support linguistic resource building in Ethiopia, funded by Norad under the NORHED project
  3. build shallow processing applications for Czech, Norwegian, and at least one Ethiopian language

Internal Wiki
