School of Humanities

Language Technology

This course is part of the programme:
Bachelor's study programme Slovene Studies (1st Level)

Objectives and competences

The course gives an overview of language technology and related topics in information theory, of text corpora and corresponding tools for the Slovenian language, and a basic understanding of the structure of web pages and of the relevant markup languages, such as HTML and XML.

Students become competent in evaluating electronic language resources and in preparing language-related reports for the web environment.

They learn a new approach to solving language problems, one offered by the contemporary, web-based era.

Prerequisites

The course does not require any special skills or knowledge beyond those covered by the previous education of a future linguist. All that is needed is basic computer literacy, some experience in using web resources and, last but not least, a reasonable command of English.

Content (Syllabus outline)

  • Overview of the field of language technology
  • Basic web skills
  • Overview of markup languages such as HTML and XML
  • Text corpora and related tools, especially for the Slovenian language
  • Term paper in the form of a web page with a statistical analysis of a chosen Slovenian or English fiction text, including its lemmatization and the preparation of a dictionary of open-class words

Intended learning outcomes

Students learn how to use a modern tool for text analysis and understand its potential for testing linguistic hypotheses. They understand the inner structure of simple and machine-generated web pages, and they get an overview of Slovenian language corpora and their use. Students learn how to make a statistical description of a given text, including the preparation of a frequency dictionary of open-class words.
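As a rough illustration of the kind of statistical description mentioned above, the following minimal Python sketch counts tokens and types and builds a frequency dictionary of open-class words. It is not the tool used in the course. The function names and the tiny `CLOSED_CLASS` word list are hypothetical, and no lemmatization is performed; a real term paper would use a proper lemmatizer and a fuller list of closed-class words.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of English closed-class words.
# A real analysis would use a much fuller list or part-of-speech tags.
CLOSED_CLASS = {
    "the", "a", "an", "and", "or", "but", "of", "in", "on", "to",
    "is", "was", "it", "he", "she", "they", "that", "with",
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters.

    The pattern matches Unicode letters, so characters such as
    č, š and ž are kept inside words.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def describe(text):
    """Return basic statistics: token count, type count and their ratio."""
    tokens = tokenize(text)
    types = set(tokens)
    ratio = len(types) / len(tokens) if tokens else 0.0
    return {"tokens": len(tokens), "types": len(types), "type_token_ratio": ratio}


def frequency_dictionary(text, closed_class=CLOSED_CLASS):
    """Count word forms that are not in the closed-class list.

    Note: this counts word forms, not lemmas (no lemmatization here).
    """
    return Counter(t for t in tokenize(text) if t not in closed_class)


if __name__ == "__main__":
    sample = "The cat sat on the mat and the cat slept."
    print(describe(sample))
    for word, count in frequency_dictionary(sample).most_common():
        print(word, count)
```

Because the sketch counts word forms rather than lemmas, inflected forms of the same word are counted separately, which matters especially for a highly inflected language such as Slovenian.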

Readings

  • D. Jurafsky, J. H. Martin, 2009. Speech and Language Processing, 2nd edition, Prentice Hall, 1024 pp.
  • C. D. Manning and H. Schütze, 1999. Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, 620 pp.
  • A. Witt and D. Metzing (Eds.), 2010. Linguistic Modeling of Information and Markup Languages, series Text, Speech and Language Technology, Vol. 40, Springer, 266 pp.
  • G. Leech, P. Rayson, A. Wilson, 2001. Word Frequencies in Written and Spoken English: based on the British National Corpus. Longman, London, 320 pp.
  • Papers from the conferences of the Association for Computational Linguistics (ACL)
  • ACL wiki
  • V. Gorjanc, 2005. Uvod v korpusno jezikoslovje. Izolit, Domžale, 163 pp.
  • P. Jakopin, 2002. Entropija v slovenskih leposlovnih besedilih. Založba ZRC, Ljubljana, 208 pp.

Assessment

Term paper in the form of a web page and its presentation (60%); oral exam (40%).

Lecturer's references

Jernej Vičič studied computer and information science at the Faculty of Electrical Engineering and completed his studies at the newly created Faculty of Computer and Information Science.

In 1999 he received his BA degree (his thesis being entitled Napredne grafične metode [Advanced Graphic Methods]).

In 2002 he received his MA degree at the same faculty (his thesis being entitled Avtomatsko prevajanje iz slovenskega v angleški jezik na osnovi statističnega strojnega prevajanja [Automatic Translation from Slovenian to English on the Basis of Statistical Machine Translation]). Under the supervision of Professor Sašo Divjak and Tomaž Erjavec, PhD, his research focused on investigating methods and algorithms of statistical machine translation of natural languages. After obtaining his MA degree, he continued his research in the same field. His research focuses on training computers to translate natural languages, particularly related languages. In 2012, he defended his PhD thesis entitled Hitra postavitev prevajalnih sistemov na osnovi pravil za sorodne naravne jezike [Fast Implementation of Rules-Based Machine Translation Systems for Similar Natural Languages].

Selected publications:

1. VIČIČ, Jernej, GRGUROVIČ, Marko. Method to overcome the ambiguities in shallow parse and transfer machine translation. Computing and informatics, ISSN 1335-9150, 2018, vol. 37, no. 6, pp. 1443-1463, graphs. http://www.cai.sk/ojs/index.php/cai/article/view/2018_6_1443, doi: 10.4149/cai_2018_6_1443. [COBISS.SI-ID 1541127876]

2. BOROS, Endre, GURVICH, Vladimir, MILANIČ, Martin, OUDALOV, Vladimir, VIČIČ, Jernej. A three-person deterministic graphical game without Nash equilibria. Discrete applied mathematics, ISSN 0166-218X. [Print ed.], 2018, vol. 243, pp. 21-38. https://www.sciencedirect.com/science/article/pii/S0166218X18300404?via%3Dihub, doi: 10.1016/j.dam.2018.01.008. [COBISS.SI-ID 1540221636]

3. VIČIČ, Jernej, KUBOŇ, Vladislav, HOMOLA, Petr. Česílko Goes Open-source. Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 2017, no. 107, pp. 57-66. [COBISS.SI-ID 1539534788]

4. VIČIČ, Jernej. Jezikovni viri za prevajalne sisteme. Annales : anali za istrske in mediteranske študije, Series historia et sociologia, ISSN 1408-5348. [Print ed.], 2016, vol. 26, no. 4, pp. 751-767, illus., doi: 10.19233/ASHS.2016.57. [COBISS.SI-ID 1539062468]

5. VIČIČ, Jernej, HOMOLA, Petr, KUBOŇ, Vladislav. Automated implementation process of machine translation system for related languages. Computing and informatics, ISSN 1335-9150, 2016, vol. 35, no. 2, pp. 441-469. [COBISS.SI-ID 1538538948]

6. VIČIČ, Jernej, ŠUKLJAN, Tine. Motivating cultural heritage artifacts presentation using persuasive technology. Informatica : an international journal of computing and informatics, ISSN 0350-5596, 2016, vol. 40, no. 4, pp. 457-461. [COBISS.SI-ID 1539062212]

7. KLJUN, Matjaž, VIČIČ, Jernej, ČOPIČ PUCIHAR, Klen, KAVŠEK, Branko. “I agree” : the effects of embedding terms of service key points in online user registration form. In: ABASCAL, Julio (ed.). Human-Computer Interaction – INTERACT 2015. Part II : proceedings : 15th IFIP TC 13 International Conference, Bamberg, Germany, September 14-18, 2015, (Lecture notes in computer science, ISSN 0302-9743, 9297). Berlin; New York: Springer. cop. 2015, pp. 420-427, illus. http://link.springer.com/book/10.1007/978-3-319-22668-2, doi: 10.1007/978-3-319-22668-2_32. [COBISS.SI-ID 1537827012]

8. VIČIČ, Jernej, KUBOŇ, Vladislav. A comparison of MT methods for closely related languages : a case study on Czech – Slovak and Croatian – Slovenian language pairs. In: KRÁL, Pavel (ed.), MATOUŠEK, Václav (ed.). Text, speech, and dialogue : proceedings : 18th International Conference, TSD 2015, Pilsen, Czech Republic, September 14-17, 2015, (Lecture notes in computer science, ISSN 0302-9743, 9302). Berlin; New York: Springer. cop. 2015, pp. 216-224, illus. http://link.springer.com/chapter/10.1007/978-3-319-24033-6_25, doi: 10.1007/978-3-319-24033-6_25. [COBISS.SI-ID 1537823684]

9. VIČIČ, Jernej, BRODNIK, Andrej. Multiple-cloud platform monitoring. Elektrotehniški vestnik online, ISSN 2232-3236. [Online ed.], 2014, vol. 81, no. 3, pp. 94-100, graphs. http://ev.fe.uni-lj.si/3-2014/Vicic.pdf. [COBISS.SI-ID 37581101]

10. VIČIČ, Jernej, BRODNIK, Andrej. Parse tree based machine translation for less-used languages. Metodološki zvezki, ISSN 1854-0023. [Print ed.], 2008, vol. 5, no. 1, pp. 65-81, illus. http://mrvar.fdv.uni-lj.si/pub/mz/mz5.1/vicic.pdf. [COBISS.SI-ID 2818007]

University course code: 1SI304

Year of study: 3

Semester: summer

Course principal:

Lecturer:

Assistant:

ECTS: 4

Workload:

  • Lectures: 30 hours
  • Exercises: 30 hours
  • Individual work: 60 hours

Course type: compulsory

Languages: Slovene

Learning and teaching methods:
  • lectures
  • conversation
  • problem solving
  • seminar
  • usage of web tools and resources
  • creation of web pages
  • presentation of term paper