Current trends in linguistic data collection

Objectives and competences

The main objective is to present experimental and corpus linguistics, together with the tools they use.

By the end of the course, students are able to evaluate electronic language resources competently and to design and carry out online data collection.

They learn how to approach a linguistic problem with the modern tools that today's web-based environment makes available.


There are no special prerequisites for enrolment. The course is related to all other linguistics courses in this program, but it does not assume background knowledge from them.


The course starts with an overview of the field and with some basic skills, such as working with the mark-up languages HTML and XML. It then presents freely available corpora and other online tools, and shows how they can be used for quantitative analysis of text and for the experimental approach to gathering language data. Data analysis is also covered. The term paper is a website presenting either a statistical analysis of a Slovenian text or a statistical analysis of linguistic data gathered through a questionnaire or an online experiment.

Intended learning outcomes

Students learn how to use a modern tool for text analysis and learn about its potential for testing linguistic hypotheses. They understand the inner structure of simple and machine-generated web pages, and they get an overview of Slovenian language corpora and their use. They learn how to use online questionnaires for linguistic purposes. Students also learn how to produce a statistical description of a given text and how to present the data they have gathered.
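As an illustration only, a statistical description of a text could start from simple counts such as the number of tokens, the number of distinct word forms (types), and the type-token ratio. The sketch below is written in Python, which is an assumption: the course materials do not prescribe a language for this step, and the reading list points to R. The function name `describe_text` and the sample sentence are hypothetical and serve only to show the idea.

```python
import re
from collections import Counter


def describe_text(text, top_n=3):
    """Return basic descriptive statistics for a text.

    Hypothetical helper for illustration; not part of the course materials.
    """
    # Lower-case the text and keep runs of letters only. The pattern
    # [^\W\d_]+ matches Unicode letters, so č, š and ž are handled.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    frequencies = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(frequencies)
    return {
        "tokens": n_tokens,
        "types": n_types,
        # Type-token ratio: distinct word forms divided by running words.
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
        "most_frequent": frequencies.most_common(top_n),
    }


# Made-up sample sentence pair, used only to demonstrate the function.
sample = "Jezik je sistem. Jezik je tudi orodje."
stats = describe_text(sample, top_n=2)
print(stats)
```

This simple tokenisation ignores lemmatisation, so inflected forms of the same word count as different types. For a highly inflected language such as Slovenian, a lemmatised corpus would give a different picture.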


  • D. Jurafsky, J. H. Martin, 2009. Speech and Language Processing, 2nd ed., Prentice Hall, 1024 pp.
  • C. D. Manning and H. Schütze, 1999. Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, 620 pp.
  • A. Witt and D. Metzing (Eds.), 2010. Linguistic Modeling of Information and Markup Languages, Text, Speech and Language Technology series, Vol. 40, Springer, 266 pp.
  • ACL wiki
  • V. Gorjanc, 2005. Uvod v korpusno jezikoslovje. Izolit, Domžale, 163 pp.
  • R Manual
  • Ibex docs


Home assignments, term paper and final exam

Lecturer's references

Full Professor Dr. Franc Marušič teaches linguistics at the School of Humanities of the University of Nova Gorica. His main research area is syntax, and most of his research has dealt with Slovenian syntax.
Selected articles:
MARUŠIČ, Franc, ŽAUCER, Rok. O določnem ta v pogovorni slovenščini (z navezavo na določno obliko pridevnika). Slavistična revija. [Print ed.], Jan.-Jun. 2007, vol. 55, no. 1/2, pp. 223-247. [COBISS.SI-ID 700923]
MARUŠIČ, Franc. Some thoughts on phase extension to a single interface. Theor. linguist., 2007, vol. 33, no. 1, pp. 83-91. [COBISS.SI-ID 637947]
MARUŠIČ, Franc, ŽAUCER, Rok. On the intensional feel-like construction in Slovenian : a case of a phonologically null verb. Nat. lang. linguist. theory, vol. 24, no. 4, pp. 1093-1159. [COBISS.SI-ID 589563]
LARSON, Richard K., MARUŠIČ, Franc. On indefinite pronoun structures with APs : reply to Kishimoto. Linguist. inq., 2004, vol. 35, no. 2, pp. 268-287. [COBISS.SI-ID 472315]
Boban Arsenijević, Franc Marušič, and Jana Willer Gold. Experimenting with Highest Conjunct Agreement under Left Branch Extraction. In Teodora Radeva-Bork and Peter Kosta (eds.), Current Developments in Slavic Linguistics. Twenty Years After. Berlin: Peter Lang. (2020)
Franc Marušič and Andrew Nevins. Distributed agreement in participial sandwiched configurations. In Peter W. Smith, Johannes Mursell & Katharina Hartmann (eds.), Agree to Agree: Agreement in the Minimalist Programme, 179-198. Berlin: Language Science Press. (2020). DOI:10.5281/zenodo.3541753
Franc Marušič, Petra Mišmaš, and Rok Žaucer. Zakaj velika okrogla rdeča čestitka in ne rdeča velika okrogla čestitka? Poskus razlage nezaznamovane stave pridevnikov. Zbornik ob 80-letnici Ade Vidovič-Muhe, (2020)