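“Data Warehouse and Data Mining” Syllabus

Course Code

045100931

Course Title

Data Warehouse and Data Mining (数据仓库与数据挖掘)

English Title

Data Warehouse and Data Mining

Course Category

Elective Course

Course Nature

Elective

Class Hours

Total: 48   Laboratory: 16   Internship: 0   Other: 0

Credits

2.5

Semester

6

Institute

School of Computer Science and Engineering

Program Oriented

Computer Science and Technology (Full-English Innovation Class, Full-English Joint Class)

Teaching Language

English

Prerequisites

Foundations of Computer Science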

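Support for Graduation Requirements

  1.  Engineering Knowledge: An ability to apply proficiency in English and a solid grounding in the basic principles, methods, and techniques of computer science and technology to solve complex engineering problems, and, through advanced methods of computer system analysis, modeling, and computation, to be prepared to apply this knowledge in research, development, and engineering practice in computer science and technology.

  2.  Problem Analysis: An ability to creatively use the basic principles of computer science to solve problems encountered in the computing field.

  3.  Design / Development of Solutions: An ability to design solutions to complex computer engineering problems, to design computer hardware and software systems that meet specific requirements, and to show innovation in the design process while taking into account social, health, safety, legal, cultural, and environmental factors.

  4.  Research: An ability to acquire knowledge of computer systems and investigate complex computer engineering problems, with basic competence in computer system research and development, problem analysis and modeling, system-level understanding and practice, and command of both bottom-up and top-down approaches to problem analysis.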

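Course Objectives

After completing this course, students will have the following competencies:

(1) An ability to master the basic principles and fundamentals of data warehousing, and to collect, clean, store, analyze, use, and maintain the massive data produced by business activities. [1, 2, 3]

(2) An ability to master the basic principles and fundamentals of data mining, and to use data mining techniques to discover previously unknown, valuable information hidden in the massive data held in a data warehouse. [1, 2, 3, 4]

Course Description

This course is a specialized elective in the field of computer science and technology. The main contents include: data mining fundamentals, data preprocessing, data warehousing and online analytical processing, clustering analysis, classification, and association analysis.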


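Teaching Content and Class Hours Distribution

1. Introduction, 4 class hours

Teaching requirements: Understand the background of data warehousing and data mining, the main related technologies, and their practical applications.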

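2. Data Exploration, 4 class hours

(1) Data objects and attributes, 1 class hour

(2) Basic statistical descriptions of data, 1 class hour

(3) Data visualization, 1 class hour

(4) Measuring data similarity, 1 class hour

Teaching requirements: Understand the definitions of data objects and attributes; master basic statistical descriptions of data, data visualization methods, and data similarity measures.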

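3. Data Preprocessing, 4 class hours

(1) Data cleaning, 1 class hour

(2) Data integration, 1 class hour

(3) Data reduction, 1 class hour

(4) Data transformation, 1 class hour

Teaching requirements: Understand the importance of data preprocessing and the basic methods of data cleaning, data integration, data reduction, and data transformation (normalization).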

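4. Data Warehousing and Online Analytical Processing, 4 class hours

(1) Data warehouse: basic concepts, 1 class hour

(2) Data cube, 1 class hour

(3) Online analytical processing (OLAP), 1 class hour

(4) Data warehouse operations, 1 class hour

Teaching requirements: Understand the basic concepts of data warehouses, data cubes and online analytical processing (OLAP) methods, and the basic operations on a data warehouse.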

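5. Clustering Analysis, 6 class hours

(1) Cluster analysis: basic concepts, 1 class hour

(2) Partitioning methods, 1 class hour

(3) Hierarchical methods, 1 class hour

(4) Density-based methods, 1 class hour

(5) Grid-based methods, 1 class hour

(6) Evaluation of clustering, 1 class hour

Teaching requirements: Master the basic concepts of cluster analysis, partitioning methods (k-means), and hierarchical methods (top-down and bottom-up); understand density-based and grid-based clustering methods; master how clustering methods are evaluated.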

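6. Classification, 5 class hours

(1) Classification: basic concepts, 1 class hour

(2) Decision tree, 1 class hour

(3) Bayes classification methods, 1 class hour

(4) Feed-forward neural network, 1 class hour

(5) Ensemble methods, 1 class hour

Teaching requirements: Master the basic concepts of data classification, decision tree methods, Bayes classification methods, feed-forward neural networks, and ensemble learning methods.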

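7. Association Analysis, 3 class hours

(1) Association analysis: basic concepts, 1 class hour

(2) Frequent pattern (FP) tree, 1 class hour

(3) Pattern evaluation methods, 1 class hour

Teaching requirements: Master the basic concepts of association analysis, the frequent pattern tree method, and pattern evaluation methods.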

8. Summary, 2 class hours

Teaching requirements: Gain a comprehensive and systematic understanding of the course as a whole.


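Experimental Teaching (including computer lab, experiment, and practice hours)

16 class hours

Teaching Method

The course is taught through a combination of classroom teaching, experimental teaching, homework, and comprehensive discussion.

Examination Method

Class performance: 10%

Assignments: 40%

Course report: 50%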

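Teaching Materials and Reference Books

Teaching Materials:

[1] Jiawei Han et al., Data Mining: Concepts and Techniques (数据挖掘:概念与技术), China Machine Press, 2012


Reference Books:

[1] Mehmed Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms (数据挖掘:概念、模型、方法和算法), Tsinghua University Press, 2013

[2] Hanning Yuan, Shuliang Wang, et al. (eds.), Data Warehouse and Data Mining (数据仓库与数据挖掘), Posts & Telecom Press, 2015

[3] Gengui Zhou (ed.), Data Warehouse and Data Mining (数据仓库与数据挖掘), Zhejiang University Press, 2005

[4] Chunbao Li et al. (eds.), Data Warehouse and Data Mining Practice (数据仓库与数据挖掘实践), Publishing House of Electronics Industry, 2014

[5] Yan Zheng (ed.), Data Warehouse and Data Mining: Principles and Applications (数据仓库与数据挖掘原理及应用), Tsinghua University Press, 2015

Prepared by Whom and When

Yue-Jiao Gong (龚月姣), April 20, 2019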


 “Data Warehouse and Data Mining” Syllabus

Course Code

045100931

Course Title

Data Warehouse and Data Mining

Course Category

Elective Course

Course Nature

Elective Course

Class Hours

Total: 48   Laboratory: 16

Credits

2.5

Semester

6

Institute

School of Computer Science and Engineering

Program Oriented

Computer Science and Technology

Teaching Language

Full English Teaching

Prerequisites

Foundations of Computer Science

Student Outcomes (Special Training Ability)

  1.  Engineering Knowledge: An ability to apply proficiency in English and a solid grounding in the basic principles, methods, and techniques of computer science and technology to solve complex engineering problems, and, through advanced methods of computer system analysis, modeling, and computation, to be prepared to apply this knowledge in research, development, and engineering practice in computer science and technology.

  2.  Problem Analysis: An ability to creatively use the basic principles of computer science to solve problems encountered in the computing field.

  3.  Design / Development of Solutions: An ability to design solutions to complex computer engineering problems, to design computer hardware and software systems that meet specific requirements, and to show innovation in the design process while taking into account social, health, safety, legal, cultural, and environmental factors.

  4.  Research: An ability to acquire knowledge of computer systems and investigate complex computer engineering problems, with basic competence in computer system research and development, problem analysis and modeling, system-level understanding and practice, and command of both bottom-up and top-down approaches to problem analysis.

Course Objectives

After completing this course, students will have the following competencies:

(1) An ability to master the basic principles and fundamentals of data warehousing, and to collect, clean, store, analyze, use, and maintain the massive data produced by business activities. [1, 2, 3]

(2) An ability to master the basic principles and fundamentals of data mining, and to use data mining techniques to discover previously unknown, valuable information hidden in massive data. [1, 2, 3, 4]

Course Description

This course is a specialized elective in the field of computer science and technology. The main contents include: data mining fundamentals, data preprocessing, data warehousing and online analytical processing, clustering analysis, classification, and association analysis.

Teaching Content and Class Hours Distribution

1. Introduction, 4 class hours


2. Data Exploration, 4 class hours

(1) Data object and attribute types, 1 class hour

(2) Basic statistical descriptions of data, 1 class hour

(3) Data visualization, 1 class hour

(4) Measuring data similarity and dissimilarity, 1 class hour


3. Data Preprocessing, 4 class hours

(1) Data cleaning, 1 class hour

(2) Data integration, 1 class hour

(3) Data reduction, 1 class hour

(4) Data transformation, 1 class hour


4. Data Warehousing and Online Analytical Processing, 4 class hours

(1) Data warehouse: basic concepts, 1 class hour

(2) Data cube, 1 class hour

(3) Online analytical processing (OLAP), 1 class hour

(4) Data warehouse implementation, 1 class hour


5. Clustering Analysis, 6 class hours

(1) Cluster analysis: basic concepts, 1 class hour

(2) Partitioning methods, 1 class hour

(3) Hierarchical methods, 1 class hour

(4) Density-based methods, 1 class hour

(5) Grid-based methods, 1 class hour

(6) Evaluation of clustering, 1 class hour


6. Classification, 5 class hours

(1) Classification: basic concepts, 1 class hour

(2) Decision tree, 1 class hour

(3) Bayes classification methods, 1 class hour

(4) Feed-forward neural network, 1 class hour

(5) Ensemble methods, 1 class hour


7. Association Analysis, 3 class hours

(1) Association analysis: basic concepts, 1 class hour

(2) Frequent pattern (FP) tree, 1 class hour

(3) Pattern evaluation methods, 1 class hour


8. Summary, 2 class hours


Experimental Teaching

16 class hours

Teaching Method

The course is taught through a combination of classroom teaching, experimental teaching, homework, and comprehensive discussion.

Examination Method

Class performance: 10%

Assignments: 40%

Final examination (closed): 50%

Teaching Materials and Reference Books

Teaching Materials:

[1] Jiawei Han et al., Data Mining: Concepts and Techniques, Third Edition, China Machine Press, 2012


Reference Books:

[1] Mehmed Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms, Tsinghua University Press, 2013

[2] Hanning Yuan, Shuliang Wang, et al., Data Warehouse and Data Mining, Posts & Telecom Press, 2015

[3] Gengui Zhou, Data Warehouse and Data Mining, Zhejiang University Press, 2005

[4] Chunbao Li et al., Data Warehouse and Data Mining: Practice and Applications, Publishing House of Electronics Industry, 2014

[5] Yan Zheng, Data Warehouse and Data Mining: Principles and Applications, Tsinghua University Press, 2015


Prepared by Whom and When

Yue-Jiao Gong, April 20, 2019


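“Data Warehouse and Data Mining” Experimental Teaching Syllabus


Course Code

045100931

Course Title

Data Warehouse and Data Mining (数据仓库与数据挖掘)

English Title

Data Warehouse and Data Mining

Course Category

Elective Course

Course Nature

Elective

Class Hours

Total: 48   Laboratory: 16   Internship: 0   Other: 0

Credits

2.5

Semester

6

Institute

School of Computer Science and Engineering

Program Oriented

Computer Science and Technology (Full-English Innovation Class, Full-English Joint Class)

Teaching Language

English

Prerequisites

Foundations of Computer Science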

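Student Outcomes (Special Training Ability)

  1.  Engineering Knowledge: An ability to apply proficiency in English and a solid grounding in the basic principles, methods, and techniques of computer science and technology to solve complex engineering problems, and, through advanced methods of computer system analysis, modeling, and computation, to be prepared to apply this knowledge in research, development, and engineering practice in computer science and technology.

  2.  Problem Analysis: An ability to creatively use the basic principles of computer science to solve problems encountered in the computing field.

  3.  Design / Development of Solutions: An ability to design solutions to complex computer engineering problems, to design computer hardware and software systems that meet specific requirements, and to show innovation in the design process while taking into account social, health, safety, legal, cultural, and environmental factors.

  4.  Research: An ability to acquire knowledge of computer systems and investigate complex computer engineering problems, with basic competence in computer system research and development, problem analysis and modeling, system-level understanding and practice, and command of both bottom-up and top-down approaches to problem analysis.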

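Teaching Objectives

After completing this course, students will have the following competencies:

(1) An ability to master the basic principles and fundamentals of data warehousing, and to collect, clean, store, analyze, use, and maintain the massive data produced by business activities. [1, 2, 3]

(2) An ability to master the basic principles and fundamentals of data mining, and to use data mining techniques to discover previously unknown, valuable information hidden in the massive data held in a data warehouse. [1, 2, 3, 4]

Course Description

This course is a specialized elective in the field of computer science and technology. The main contents include: data mining fundamentals, data preprocessing, data warehousing and online analytical processing, clustering analysis, classification, and association analysis.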

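Main Instruments, Equipment and Software

Computers, Python or MATLAB programming environments

Experiment Report

The report should contain the algorithm steps, experimental settings, and output results.

Assessment

Review of the experiment report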

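Teaching Materials, Laboratory Manuals, and Reference Books

[1] Jiawei Han et al., Data Mining: Concepts and Techniques (数据挖掘:概念与技术), China Machine Press, 2012

[2] Liangjun Zhang et al., Python Practice of Data Analysis and Mining (Python数据分析与挖掘实战), China Machine Press, 2015

[3] Liangjun Zhang et al., MATLAB Data Analysis and Data Mining (MATLAB数据分析与挖掘实战), China Machine Press, 2015

Prepared by Whom and When

Yue-Jiao Gong (龚月姣), May 3, 2019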

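“Data Warehouse and Data Mining” Experimental Teaching Content and Class Hours Distribution

| No. | Experiment Item | Class Hours | Content Summary | Category | Requirements | Students per Group | Main Instruments, Equipment and Software |
|---|---|---|---|---|---|---|---|
| 1 | Data Visualization | 4 | Choose a dataset and analyze it using three or more data visualization methods. | Verification | Compulsory | 2 | Computers, Python or MATLAB programming environments |
| 2 | Data Clustering | 4 | On a designated dataset, cluster the data with two different algorithms, output the results as figures and tables, and compare the performance of the algorithms. | Verification | Compulsory | 2 | Computers, Python or MATLAB programming environments |
| 3 | Data Classification | 4 | On a designated dataset, classify the data with two different algorithms, output the results as figures and tables, and compare the performance of the algorithms. | Verification | Compulsory | 2 | Computers, Python or MATLAB programming environments |
| 4 | Comprehensive Term Project | 4 | Devise an application scenario that requires some class of data mining and analysis techniques, implement the application, and write a detailed technical document or paper. | Exploratory | Compulsory | 4 | Computers, Python or MATLAB programming environments |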


 “Data Warehouse and Data Mining” Experimental Teaching Syllabus

Course Code

045100931

Course Title

Data Warehouse and Data Mining

Course Category

Elective Course

Course Nature

Elective Course

Class Hours

Total: 48   Laboratory: 16

Credits

2.5

Semester

6

Institute

School of Computer Science and Engineering

Program Oriented

Computer Science and Technology

Teaching Language

Full English Teaching

Prerequisites

Foundations of Computer Science

Student Outcomes (Special Training Ability)

  1.  Engineering Knowledge: An ability to apply proficiency in English and a solid grounding in the basic principles, methods, and techniques of computer science and technology to solve complex engineering problems, and, through advanced methods of computer system analysis, modeling, and computation, to be prepared to apply this knowledge in research, development, and engineering practice in computer science and technology.

  2.  Problem Analysis: An ability to creatively use the basic principles of computer science to solve problems encountered in the computing field.

  3.  Design / Development of Solutions: An ability to design solutions to complex computer engineering problems, to design computer hardware and software systems that meet specific requirements, and to show innovation in the design process while taking into account social, health, safety, legal, cultural, and environmental factors.

  4.  Research: An ability to acquire knowledge of computer systems and investigate complex computer engineering problems, with basic competence in computer system research and development, problem analysis and modeling, system-level understanding and practice, and command of both bottom-up and top-down approaches to problem analysis.

Teaching Objectives

After completing this course, students will have the following competencies:

(1) An ability to master the basic principles and fundamentals of data warehousing, and to collect, clean, store, analyze, use, and maintain the massive data produced by business activities. [1, 2, 3]

(2) An ability to master the basic principles and fundamentals of data mining, and to use data mining techniques to discover previously unknown, valuable information hidden in massive data. [1, 2, 3, 4]

Course Description

This course is a specialized elective in the field of computer science and technology. The main contents include: data mining fundamentals, data preprocessing, data warehousing and online analytical processing, clustering analysis, classification, and association analysis.

Instruments, Equipment and Software

Computers, Python or MATLAB programming environments

Experiment Report

The report should contain the algorithm steps, experimental settings, and output results.

Assessment

Review of the experiment report

Teaching Materials and Reference Books

[1] Jiawei Han et al., Data Mining: Concepts and Techniques, Third Edition, China Machine Press, 2012

[2] Liangjun Zhang et al., Python Practice of Data Analysis and Mining, China Machine Press, 2015

[3] Liangjun Zhang et al., MATLAB Data Analysis and Data Mining, China Machine Press, 2015

Prepared by Whom and When

Yue-Jiao Gong, May 3, 2019

“Data Warehouse and Data Mining” Experimental Teaching Arrangements

| No. | Experiment Item | Class Hours | Content Summary | Category | Requirements | Students per Group | Instruments, Equipment and Software |
|---|---|---|---|---|---|---|---|
| 1 | Data Visualization | 4 | Choose a dataset and analyze it using three or more data visualization techniques. | Verification | Compulsory | 2 | Computers, Python or MATLAB programming environments |
| 2 | Data Clustering | 4 | On a designated dataset, apply two different algorithms to perform data clustering, output the results, and compare the two algorithms. | Verification | Compulsory | 2 | Computers, Python or MATLAB programming environments |
| 3 | Data Classification | 4 | On a designated dataset, apply two different algorithms to perform data classification, output the results, and compare the two algorithms. | Verification | Compulsory | 2 | Computers, Python or MATLAB programming environments |
| 4 | Term Project | 4 | Find an application scenario that needs data mining techniques, implement the application, and write a technical report or research paper. | Exploratory | Compulsory | 4 | Computers, Python or MATLAB programming environments |