Course details

General
Faculty: Health Sciences
Department: Medicine
Education level: Postgraduate / Master of Science
Course code: E4
Semester: 2
Course title: Data Mining
Independent teaching activities    Hours per week    ECTS
Lectures                           2
Practice                           3
Total                              5                 4
Course type: General setting course, skills development
Prerequisite courses

  • Basic knowledge of Probability Theory and Statistics

  • Basic knowledge of Data Structures

Teaching and assessment language: English

Learning outcomes

Objective

The course introduces students to the knowledge discovery process, with a focus on data mining techniques. The following concepts are discussed and elaborated: data mining principles, prediction and characterization (classification), grouping and aggregation (clustering), associative analysis (association rule mining), and anomaly detection (outlier analysis).

Knowledge and Capacities

Upon completion of the course, graduate students are expected to:

  • Know the basic principles of data mining theory and the main application domains
  • Understand the fundamental data mining methods and algorithms
  • Apply well-known algorithms to pilot problems
  • Select the most efficient algorithm, based on problem requirements
  • Design the methodology for data mining problems of medium complexity

Course contents

  1. Introduction to Data Mining:
    Definitions, examples, application areas
  2. Data Exploration
  3. Data Preparation and Preprocessing
  4. Data mining techniques (Part I):
    Classification
    Overview, definitions, algorithms
  5. Data mining techniques (Part II):
    Clustering
    Overview, definitions, algorithms
  6. Data mining techniques (Part III):
    Association Rule Extraction
    Overview, definitions, algorithms
  7. Advanced Data Mining Topics:
    Ensemble methods, outlier analysis

Teaching and learning methods – evaluation

Teaching methods: Face to face, distance learning
Use of information and communication technologies (ICT):

  • Use of ICT in teaching: Moodle virtual learning environment (VLE)
    (asynchronous learning, wikis, online discussion fora, educational portfolio, assignment submission, assessment process)

  • Use of ICT in communication with students
    (email, instant messaging via Moodle)

Module structure

Activity                              Work hours per semester
Lectures                              30
Exercises (quiz)                      10
Exercises (online discussion fora)    10
Essay background work                 60
Essay writing                         20
Overall work for the course           130

Assessment Methods

  • One project on data mining problems, submitted by each student at the end of the course and comprising the models built, supporting material (code, datasets, etc.), and a report of 4000-5000 words (30%)

  • Two online quizzes with multiple-choice questions (30% each)

  • Assessment of the comments each student submits in the online discussion fora (10%)
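
As an illustration of how the weights above combine, here is a minimal Python sketch. It assumes the final grade is a plain weighted average and that all components are marked on the same scale; neither the grading scale nor any rounding rule is stated in this description. The names `WEIGHTS` and `final_grade` are hypothetical.

```python
# Weights taken from the assessment list above (they sum to 100%).
WEIGHTS = {
    "project": 0.30,
    "quiz_1": 0.30,
    "quiz_2": 0.30,
    "forum": 0.10,
}


def final_grade(scores):
    """Return the weighted average of the component scores.

    `scores` maps every component name in WEIGHTS to a score. All scores
    are assumed to share one scale (an assumption, not stated in the
    course description).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Example with made-up scores on a hypothetical 0-10 scale.
example = {"project": 8, "quiz_1": 7, "quiz_2": 9, "forum": 10}
print(round(final_grade(example), 2))
```

With these made-up scores the components contribute 2.4, 2.1, 2.7 and 1.0 respectively.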

Recommended Bibliography

  1. Introduction to Data Mining, P. Tan, M. Steinbach & V. Kumar, Addison-Wesley, 2005.
  2. Data Mining: Concepts and Techniques, 2nd edition, J. Han & M. Kamber, Morgan Kaufmann, 2006.

Supporting material

  • Notes, Video Lectures, Exercises, Lab demos