Data mining

Objectives and competences

Knowledge discovery in databases is the process of discovering patterns and models described by rules or other human-understandable representation formalisms. The most important step in this process is data mining, which uses methods, techniques and tools for the automated construction of patterns and models from data.
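To illustrate what automatically constructing a human-understandable pattern from data can look like, here is a minimal sketch of 1R (OneR), a very simple rule learner described in the Witten, Frank and Hall textbook listed under Readings. It is not taken from the course materials. The toy dataset (attributes `outlook` and `windy`, class `play`) and the function name `one_r` are hypothetical. The sketch handles only nominal attributes without missing values, whereas real data mining tools also deal with numeric and missing data.

```python
from collections import Counter, defaultdict

def one_r(rows, target):
    """1R sketch: for each attribute, let every attribute value predict its
    majority class, then keep the attribute whose one-level rule set makes
    the fewest errors on the training data. Nominal attributes only."""
    best = None
    for attr in (a for a in rows[0] if a != target):
        # Class frequencies for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # Each value predicts its most frequent class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Training errors: instances not belonging to the predicted class.
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best

# Hypothetical toy data, for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
]

attr, rules, errors = one_r(rows, "play")
for value, label in rules.items():
    print(f"IF {attr} = {value} THEN play = {label}")
print(f"training errors: {errors}/{len(rows)}")
```

On this toy data the sketch selects `outlook`, producing three IF-THEN rules that misclassify one of the seven training instances. The `windy` attribute makes two errors. The output is a small rule set that a person can read directly, which is the kind of representation the paragraph above refers to.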

The course objectives are to (a) introduce the basics of data mining, (b) outline the process of knowledge discovery in databases and the CRISP-DM methodology, (c) present the methodology for result evaluation, (d) present selected data mining methods and techniques, and (e) equip students with the skills needed to use selected data mining tools in practice.

Students will master the basics of data preprocessing, data mining and knowledge discovery, and will be able to use selected data mining tools and result evaluation methods in practice.

Prerequisites

Basic knowledge of mathematics, computer science and informatics is required.

Content

  1. Introduction
  2. Data mining following the CRISP-DM methodology
  3. Data mining techniques:
    - Heuristics for model and pattern construction
    - Quality of learned models and discovered patterns
    - Methodology for result evaluation
  4. Practical use of selected data mining tools
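A widely used methodology for evaluating learned models is k-fold cross-validation. The data are split into k folds, and each fold in turn is held out for testing while the model is trained on the other k-1 folds. The k accuracy estimates are then averaged. The syllabus does not say which evaluation methods the course covers, so treat this as an assumed example of the "methodology for result evaluation" item rather than a description of the course content. The function names and the majority-class learner are hypothetical placeholders; any learner that takes training data and returns a prediction function could be plugged in.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def majority_learner(X_train, y_train):
    """Placeholder learner: always predicts the most frequent training class."""
    majority = Counter(y_train).most_common(1)[0][0]
    return lambda x: majority

def cross_validate(X, y, learner, k=10, seed=0):
    """Average test accuracy over k train/test splits (requires k <= len(y))."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # Train only on the k-1 folds that exclude the held-out fold.
        model = learner([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

# Hypothetical data: 8 instances of class "a", 2 of class "b".
X = list(range(10))
y = ["a"] * 8 + ["b"] * 2
print(cross_validate(X, y, majority_learner, k=10))
```

Here k equals the number of instances, so this is leave-one-out cross-validation, and it prints 0.8. That is the majority-class baseline a useful model should beat. Testing only on held-out folds matters because accuracy measured on the training data itself tends to overestimate performance on new data.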

Intended learning outcomes

Knowledge and understanding:

Mastery of selected data mining methods and techniques; ability to preprocess data; practical use of selected data mining techniques; and ability to apply and interpret result evaluation methods.

Readings

Selected chapters from the following books:
• D. Mladenić, N. Lavrač, M. Bohanec, S. Moyle (eds.): Data Mining and Decision Support: Integration and Collaboration. Kluwer, 2003. ISBN 1-4020-7388-7
• I.H. Witten, E. Frank, M.A. Hall: Data Mining: Practical Machine Learning Tools and Techniques, 3rd edition. Morgan Kaufmann, 2011. ISBN 978-0-12-374856-0
• M. Berthold (ed.): Bisociative Knowledge Discovery. Springer, 2012. ISBN 978-3-642-31829-0
• J. Fuernkranz, D. Gamberger, N. Lavrač: Foundations of Rule Learning. Springer, 2012. ISBN 978-3-540-75196-0

Assessment

Competence evaluation:
• The written exam evaluates basic knowledge of data mining and knowledge discovery, and knowledge of the knowledge discovery process following the CRISP-DM methodology.
• Seminar or project work, together with its oral defence, evaluates practical competence in using data mining tools and result evaluation methods.

Weighting: written exam 50%, seminar or project work 50%.

Lecturer's references

Prof. dr. Nada Lavrač, full professor in the field of Computer Science
Principal education and research areas: Knowledge technologies, machine learning, data mining and text mining, relational data mining and inductive logic programming, combining data mining and decision support, computational creativity, knowledge management, marketing, and virtual enterprises, applications of machine learning and data mining techniques in biomedicine, healthcare, and life sciences
Professional career: employed at the Jožef Stefan Institute (IJS) since 1978; founder and head of its Department of Knowledge Technologies; research councillor at IJS since 2002; full professor at the University of Nova Gorica and the Jožef Stefan International Postgraduate School since 2007; vice-president of ECCAI (European Coordinating Committee for Artificial Intelligence), 1996-1998; board member of AIME (Artificial Intelligence in Medicine) since 1999; head of the Data Mining Section of the Slovenian Statistical Society since 2000; member of the Slovenian AI Society (SLAIS).

Selected bibliography

• Gamberger, D., Lavrač, N.: Expert-Guided Subgroup Discovery: Methodology and Application. Journal of Artificial Intelligence Research 17 (2002), 501-527.
• Lavrač, N., Džeroski, S.: Inductive Logic Programming: Techniques and Applications. Ellis Horwood, 1994.
• Lavrač, N., Kavšek, B., Flach, P. A., Todorovski, L.: Subgroup Discovery with CN2-SD. Journal of Machine Learning Research 5 (2004), 153-188.
• Železný, F., Lavrač, N.: Propositionalization-Based Relational Subgroup Discovery with RSD. Machine Learning 62(1-2) (2006), 33-63.
• Fuernkranz, J., Gamberger, D., Lavrač, N.: Foundations of Rule Learning. Springer, 2012.