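“Data Warehouse and Data Mining” Experimental Teaching Syllabus
Course Code | 045100931 |
Course Title (Chinese) | 数据仓库与数据挖掘 |
Course Title (English) | Data Warehouse and Data Mining |
Course Category | Elective Course |
Course Nature | Elective |
Class Hours | Total Hours: 48 Lab Hours: 16 |
Credits | 2.5 |
Semester | 7th |
Offering Unit | School of Computer Science and Engineering |
Applicable Programs | Computer Science and Technology, Network Security, Information Security |
Teaching Language | Bilingual teaching in Chinese and English |
Prerequisites | Advanced Language Programming, The Design and Analysis of Computer Algorithms |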
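Graduation Requirements (Professional Competencies) | This course contributes to the following graduation requirements: (1) Engineering knowledge: the ability to apply mathematics, natural science, engineering fundamentals, and specialized knowledge to solve complex engineering problems. (2) Problem analysis: the ability to apply the basic principles of mathematics, natural science, and engineering science to identify, formulate, and analyze complex engineering problems through literature research in order to reach valid conclusions. (3) Research: the ability to investigate complex engineering problems based on scientific principles and using scientific methods, including designing experiments, analyzing and interpreting data, and synthesizing information to reach reasonable and valid conclusions. |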
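Course Objectives | This experimental course develops students' hands-on ability on the basis of the theory (algorithms) taught in class. Students should be able to independently implement, in a high-level programming language, algorithms for data preprocessing, association rule mining, classification, clustering analysis, link analysis, and data summarization, and to verify the mining results on real data. [1, 2, 3] |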
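Course Description | This elective course develops students' data analysis skills. Its main goal is for students to master the basic concepts and algorithms of data warehousing and data mining and, given the big data generated in real work and applications, to use data mining techniques to discover the knowledge or patterns hidden in the data, thereby providing decision support for production, daily life, business activities, and social activities. Through this course, students are expected to recognize the important role of data warehouses and data mining in today's era of big data and new artificial intelligence, understand the basic principles and implementation methods of data warehouses, become proficient in data preprocessing techniques and common data mining algorithms (including association analysis, classification and prediction, cluster analysis, link analysis, and data summarization) and their programming, and be able to apply what they have learned to real data analysis problems. |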
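Main Instruments, Equipment, and Software | Personal computer with Python or another high-level programming language installed. |
Experiment Report | The experiment report should include the following main contents and be submitted in electronic form: (1) Description of the experimental task (2) Purpose of the experiment (3) Description of the experimental data (including the data source and the data characteristics, such as application area, data set size, feature data types, and number of features) (4) Experimental procedure (5) Experimental results and analysis (including the conclusions drawn, existing problems, and possible directions for improvement) (6) Program source code |
Assessment | The experiment grade is determined by the following three parts: (1) Correctness of the program design (40%) (2) Reasonableness of the experimental results (30%) (3) Conformance of the experiment report to the required format (30%) |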
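Textbook, Lab Manual, and Reference Books | Textbook: Jiawei Han (韩家炜), M. Kamber, Jian Pei (裴健). Data Mining: Concepts and Techniques (数据挖掘概念与技术), 3rd Edition. Beijing: China Machine Press, 2012. Lab manual: prepared in-house. Note: Because the data mining field develops rapidly, with large numbers of new techniques, algorithms, and tools emerging every year (the top conferences in the field alone, such as ICML, NIPS, and KDD, publish over a thousand papers each year), the specific algorithms to be implemented may differ from year to year in order to keep the experiments novel and up to date; the lab manual is therefore issued anew each year. Reference books and internet resources: [1] Zhi-Hua Zhou (周志华). Machine Learning (机器学习). Tsinghua University Press, 2016. [2] Steven Bird, Ewan Klein, Edward Loper. Natural Language Processing with Python. O'Reilly, 2012. [3] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006. [4] I. Goodfellow, Y. Bengio, A. Courville. Deep Learning. MIT Press, 2015. [5] W. H. Inmon; translated by Zhihai Wang (王志海) et al. Building the Data Warehouse (数据仓库), Chinese translation of the 4th Edition. China Machine Press, 2011. [6] Charu C. Aggarwal. Neural Networks and Deep Learning. Springer, 2018. [7] Open-source data mining toolkit: https://scikit-learn.org/stable/. [8] Open-source deep learning platform: https://keras.io/. |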
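Prepared by and Date | Jiabing Wang (王家兵), April 5, 2019 |
“Data Warehouse and Data Mining” Experimental Teaching Content and Class Hour Allocation
No. | Experiment Item | Class Hours | Content Summary | Category | Requirements | Students per Group | Main Instruments, Equipment, and Software |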
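1 | Implementation of a Classification Algorithm | 6 | Implement a classification algorithm (the specific algorithm is announced before the experiment) and verify it on real data. | Comprehensive | Compulsory | 1 | Personal computer with Python or another high-level programming language installed |
2 | Implementation of a Clustering Algorithm | 6 | Implement a clustering algorithm (the specific algorithm is announced before the experiment) and verify it on real data. | Comprehensive | Compulsory | 1 | Personal computer with Python or another high-level programming language installed |
3 | Implementation of a Link Analysis or Data Summarization Algorithm | 4 | Implement a link analysis or data summarization algorithm (the specific algorithm is announced before the experiment) and verify it on real data. | Comprehensive | Compulsory | 1 | Personal computer with Python or another high-level programming language installed |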
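As an illustration of what Experiment 1 in the table above asks for, the following is a minimal sketch of implementing a classifier from scratch and checking it on held-out data. The syllabus does not name the assigned algorithm, so the choice of k-nearest neighbors is an assumption, as are the function names `knn_predict` and `accuracy` and the tiny hand-made data set, which stands in for the real data the experiment requires.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to `query`.

    `train` is a list of (feature_tuple, label) pairs.
    """
    # Sort training points by Euclidean distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of held-out points whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


# Hypothetical toy data (assumption): two well-separated 2-D classes.
TRAIN = [
    ((1.0, 1.1), "a"), ((1.2, 0.9), "a"), ((0.8, 1.0), "a"),
    ((3.0, 3.2), "b"), ((3.1, 2.9), "b"), ((2.9, 3.0), "b"),
]
TEST = [((1.1, 1.0), "a"), ((3.0, 3.0), "b")]

if __name__ == "__main__":
    print("held-out accuracy:", accuracy(TRAIN, TEST, k=3))
```

In an actual report, the toy lists would be replaced by a documented real data set, and the held-out accuracy would feed the "results and analysis" part of the report.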
“Data Warehouse and Data Mining” Syllabus
Course Code | 045100931 |
Course Title | Data Warehouse and Data Mining |
Course Category | Elective Course |
Course Nature | Elective Course |
Class Hours | Class Hours: 48 Lab Hours: 16 |
Credits | 2.5 |
Semester | 7th |
Institute | School of Computer Science and Engineering |
Applicable Programs | Computer Science and Technology, Network Engineering, Information Security |
Teaching Language | Bilingual teaching in Chinese and English |
Prerequisites | Advanced Language Programming, The Design and Analysis of Computer Algorithms |
Student Outcomes (Special Training Ability) | (1) Engineering Knowledge: An ability to apply knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems. (2) Problem Analysis: An ability to identify, formulate, and analyze complex engineering problems, reaching substantiated conclusions using basic principles of mathematics, science, and engineering. (3) Research: An ability to conduct investigations of complex engineering problems based on scientific theories and using scientific methods, including design of experiments, analysis and interpretation of data, and synthesis of information to provide valid conclusions. |
Course Objectives | The experimental course develops students' ability to implement data mining algorithms, such as data preprocessing, association rule mining, classification, clustering analysis, link analysis, and data summarization, using a high-level programming language. [1, 2, 3] |
Course Description | This course introduces the concepts and techniques of data warehousing and data mining. Data mining, also popularly referred to as knowledge discovery in databases (KDD), is the automated or convenient extraction of patterns representing knowledge implicitly stored or captured in large databases, data warehouses, the web, and other big data sources. Contents include data preprocessing (data cleaning, data integration, data transformation, data reduction, and data discretization), data warehouse and OLAP technology, data warehouse implementation, and the implementation of algorithms for mining frequent patterns and association rules, classification and prediction, cluster analysis, link analysis, and data summarization. |
Instruments and Equipment | Personal computer with the Python programming language. |
Experiment Report | The experiment report should include the following contents and be submitted in electronic form: (1) Description of the experimental task (2) Purpose of the experiment (3) Description of the experimental data (including data sources and data characteristics, such as application area, data set size, feature data types, and number of features) (4) Experimental setup (5) Experimental results and analysis (including the conclusions drawn, existing problems, and possible directions for improvement) (6) Source code |
Assessment | The score consists of the following three parts: (1) Correctness of the program (40%) (2) Reasonableness of the experimental results (30%) (3) Conformance of the experiment report to the required format (30%) |
Teaching Materials and Reference Books | Textbook: Jiawei Han, M. Kamber, Jian Pei. Data Mining: Concepts and Techniques (3rd Edition). China Machine Press, 2012. Reference books and internet resources: [1] Zhi-Hua Zhou. Machine Learning (in Chinese). Tsinghua University Press, 2016. [2] Steven Bird, Ewan Klein, Edward Loper. Natural Language Processing with Python. O'Reilly, 2012. [3] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006. [4] I. Goodfellow, Y. Bengio, A. Courville. Deep Learning. MIT Press, 2015. [5] W. H. Inmon. Building the Data Warehouse, 4th Edition. ISBN: 978-0-7645-9944-6. Wiley, 2005. [6] Charu C. Aggarwal. Neural Networks and Deep Learning. Springer, 2018. [7] Open Source Package for Data Mining: https://scikit-learn.org/stable/. [8] Open Source Package for Deep Learning: https://keras.io/. |
Prepared by and Date | Jiabing Wang, April 5, 2019 |
“Data Warehouse and Data Mining” Experimental Teaching Arrangements
No. | Experiment Item | Class Hours | Content Summary | Category | Requirements | Students per Group | Instruments, Equipment, and Software |
1 | Implementation of Classification Algorithms | 6 | Implement a specified classification algorithm (the specific algorithm is announced before the experiment), and verify the implementation using actual data. | Comprehensive | Compulsory | 1 | Personal computer with the Python programming language |
2 | Implementation of Clustering Algorithms | 6 | Implement a specified clustering algorithm (the specific algorithm is announced before the experiment), and verify the implementation using actual data. | Comprehensive | Compulsory | 1 | Personal computer with the Python programming language |
3 | Implementation of Link Analysis or Data Summarization Algorithms | 4 | Implement a specified link analysis or data summarization algorithm (the specific algorithm is announced before the experiment), and verify the implementation using actual data. | Comprehensive | Compulsory | 1 | Personal computer with the Python programming language |
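To give a sense of the scale of Experiment 3 in the table above, here is a minimal sketch of one possible link analysis algorithm, PageRank computed by power iteration. The syllabus does not say which algorithm is assigned, so PageRank is an assumption, as are the function name `pagerank`, the damping factor of 0.85, and the four-node toy graph used in place of real link data.

```python
def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Approximate PageRank scores by power iteration.

    `graph` maps each node to the list of nodes it links to; every link
    target must also appear as a key.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each node passes its damped rank in equal shares to its out-links.
        for v in nodes:
            for w in graph[v]:
                new_rank[w] += damping * rank[v] / len(graph[v])
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:  # stop early once the scores have stabilized
            break
    return rank


# Hypothetical toy graph (assumption): C is linked to by A, B, and D.
GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

if __name__ == "__main__":
    for node, score in sorted(pagerank(GRAPH).items(), key=lambda kv: -kv[1]):
        print(node, round(score, 4))
```

The scores form a probability distribution over the nodes, so checking that they sum to 1 is a quick sanity test before running the same code on a real link graph.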