Current trends in linguistic data collection

Objectives and competences

The main objective is to present experimental and corpus linguistics, together with the tools they use.

By the end of the course, students are able to competently evaluate electronic language resources and to design and carry out online data collection.

They learn how to approach a linguistic problem using the modern tools made available by the contemporary, web-based era.


There are no special prerequisites for enrolment. The course is related to all other linguistics courses in this program, but it does not assume any background knowledge from them.


The course starts with an overview of the field and with some basic skills, such as working with mark-up languages like HTML and XML. It then presents freely available corpora and other online tools, showing how they can be used for the quantitative analysis of text and for the experimental gathering of language data. Methods of data analysis are also covered. The term paper is a website with a statistical analysis of a Slovenian text, or a statistical analysis of linguistic data gathered through a questionnaire or an online experiment.

Intended learning outcomes

Students learn how to use a modern tool for text analysis and learn about its potential for testing linguistic hypotheses. They understand the inner structure of simple and machine-generated web pages, and they get an overview of Slovenian language corpora and their use. They learn how to use online questionnaires for linguistic purposes. Finally, students learn how to produce a statistical description of a given text or how to present the data they have gathered.


Home assignments, term paper and final exam

Lecturer's references

Full Professor Dr. Franc Marušič is a lecturer in linguistics at the School of Humanities of the University of Nova Gorica. His main research area is syntax, and most of his research has focused on Slovenian syntax.
