School of Engineering and Management

Data Mining

This course is part of the programme:
Master in Engineering and Management (Second Level)

Objectives and competences

Knowledge discovery in databases is the process of discovering patterns and models, described by rules or other human-understandable representation formalisms. The most important step in this process is data mining, which uses methods, techniques and tools for the automated construction of patterns and models from data.

The course objectives are to (a) introduce the basics of data mining, (b) outline the process of knowledge discovery in databases and the CRISP-DM methodology, (c) present the methodology for result evaluation, (d) present selected data mining methods and techniques, and (e) empower the students with the skills for practical use of selected data mining tools.

The students will master the basics of data preprocessing, data mining and knowledge discovery and will be capable of using selected data mining tools and results evaluation methods in practice.

Prerequisites

Basic knowledge of mathematics, computer science and informatics is required.

Content (Syllabus outline)

1. Introduction

2. Data mining following the CRISP-DM methodology

3. Data mining techniques:

- Heuristics for model and pattern construction

- Quality of learned models and discovered patterns

- Methodology for results evaluation

4. Practical use of selected data mining tools

Intended learning outcomes

Knowledge and understanding:

Mastery of selected data mining methods and techniques, the ability to preprocess data, practical use of selected data mining techniques, and the ability to apply and interpret result evaluation methods.

Readings

Selected chapters from the following books:

  • D. Mladenić, N. Lavrač, M. Bohanec, S. Moyle (eds.) Data Mining and Decision Support: Integration and Collaboration. Kluwer 2003. ISBN 1-4020-7388-7
  • I.H. Witten, E. Frank, M.A. Hall: Data Mining: Practical Machine Learning Tools and Techniques (Third Edition), Morgan Kaufmann, 2011. ISBN 978-0-12-374856-0
  • M. Berthold (ed.), Bisociative Knowledge Discovery, Springer, 2012. ISBN 978-3-642-31829-0
  • J. Fuernkranz, D. Gamberger, N. Lavrač: Foundations of Rule Learning. Springer, 2012. ISBN 978-3-540-75196-0

Assessment

Competence evaluation:

  • By written exam we evaluate basic knowledge of data mining and knowledge discovery, and knowledge of the knowledge discovery process following the CRISP-DM methodology.
  • By seminar or project work and its oral defence we evaluate practical competence in using data mining tools and methods for results evaluation.

Weighting: 50/50

Lecturer's references

Prof. dr. Nada Lavrač, full professor in the field of Computer Science

Principal education and research areas: Knowledge technologies, machine learning, data mining and text mining, relational data mining and inductive logic programming, combining data mining and decision support, computational creativity, knowledge management, marketing, and virtual enterprises, applications of machine learning and data mining techniques in biomedicine, healthcare, and life sciences

Professional career: employed at Institute “Jožef Stefan” (IJS) since 1978; founder and head of the Department of Knowledge Technologies; since 2002 research councillor at IJS; since 2007 full professor at the University of Nova Gorica and the International Postgraduate School Jožef Stefan; 1996-1998 vice-president of ECCAI (European Coordination Committee for AI); since 1999 board member of AIME (Artificial Intelligence in Medicine); since 2000 head of the Data Mining section of the Slovenian Statistical Society; member of the Slovenian AI Society SLAIS.

Publications and achievements: author of numerous scientific papers, author of four scientific monographs, editor of numerous books and proceedings, author of two outstanding scientific achievements (2011 and 2012), coordinator of two EU projects, Slovenian principal investigator of over ten EU projects worth over 3 million EUR. Awards: 2013 Zois recognition award for important scientific contributions to intelligent data analysis, 2007 ECCAI Fellow Award for pioneering research and advances in the field of Artificial Intelligence in Europe, 1998 Ambassador of Science of the Republic of Slovenia for outstanding research and contribution to international recognition of Slovenian science, 1986 National award for research excellence (Boris Kidrič Fund Award) for research in knowledge synthesis and qualitative modelling (system KARDIO for ECG diagnosis of cardiac arrhythmias, later published as the monograph Kardio: A Study in Deep and Qualitative Knowledge for Expert Systems, MIT Press, 1989, coauthor).

Selected bibliography

Nada Lavrač:

  • Gamberger D., Lavrač N.: Expert-Guided Subgroup Discovery: Methodology and Application. Journal of Artificial Intelligence Research 17 (2002), 501-527.
  • Lavrač N., Džeroski S.: Inductive Logic Programming: Techniques and Applications. Ellis Horwood, 1994.
  • Lavrač N., Kavšek B., Flach P. A., Todorovski L.: Subgroup discovery with CN2-SD. Journal of Machine Learning Research 5 (2004), 153-188.
  • Železny F., Lavrač N.: Propositionalization-based relational subgroup discovery with RSD. Machine Learning 62(1-2) (2006), 33-63.
  • Fuernkranz J., Gamberger D., Lavrač N.: Foundations of Rule Learning. Springer, 2012.

University course code: 2GI018

Year of study: 1

Course principal:

Lecturer:

ECTS: 6

Workload:

  • Lectures: 30 hours
  • Individual work: 120 hours

Course type: elective

Languages: Slovene

Learning and teaching methods:

  • lectures
  • seminar
  • exercises
  • individual work

Students need to have access to computers and data mining tools. Use of the data mining tools Weka and/or Orange is planned.