“Big Data Technology” Syllabus
Course Code | 045102751 |
Course Title | Big Data Technology |
Course Category | Specialty-related Course |
Course Nature | Elective Course |
Class Hours | Total: 40; Laboratory practice: 12; Experiments: 0; Field practice: 0 |
Credits | 2.5 |
Semester | Sixth term |
Institute | School of Computer Science and Engineering |
Program Oriented | Computer Science and Technology, Network Engineering, Information Security |
Teaching Language | Chinese |
Prerequisites | “Computer Network”, “Operating System”, “Program Design”, “Database System” |
Student Outcomes (Special Training Ability) | This course contributes to students’ abilities in the following aspects: 1. Ideological and political education: integrate the teaching of professional computing knowledge with moral education, and inspire the students' patriotic spirit of "making the country prosperous by doing". 2. Engineering knowledge: students acquire solid fundamental knowledge, basic professional principles, methodologies and techniques, and learn to solve problems in big data management and processing by applying mathematics and their professional knowledge of computer science. The course strengthens students’ ability to develop big data applications. 3. Problem analysis: students learn to identify, formulate and analyze complex problems in big data engineering through literature surveys and by applying mathematics, engineering techniques and their professional knowledge of computer science. 4. Problem solving: students learn to design comprehensive solutions to problems in big data engineering, including big data system design, selection of key techniques, and the design of implementation workflows and plans. Students develop innovative awareness by considering multiple factors (e.g., society, environment and security) in their designs. 5. Research ability: students learn to investigate problems in big data engineering using scientific methods, including designing experiments, analyzing data and drawing conclusions. 6. Using modern tools: students learn to select, use and develop appropriate tools and techniques to predict and simulate problems in big data engineering. |
Teaching Objectives | After finishing the course: (1) Students should master the basic knowledge of distributed computing techniques, big data processing models, storage platforms and programming techniques, and be trained to discover and solve problems. [I, II] (2) Students should master the basic methods and techniques for storing, processing and analyzing big data. [II, III, IV] (3) Students should master widely used big data programming techniques and be trained in designing and programming simple big data systems. [III, V] |
Course Description | This course is intended for upper-level undergraduates who have a good grasp of the basics of computer networks, operating systems, program design and databases, and who are able to develop software applications. It introduces the basic principles and development techniques of traditional distributed computing, big data storage management and platform architecture, big data computing models and the principles of analysis and processing algorithms, and the construction of big data systems together with application development techniques. Students are expected to read a substantial amount of the relevant literature to build an understanding of the technology, and to complete a series of experiments in order to master big data programming practice and the methods and tools for analysis and processing. The aim is for students not only to understand big data management platforms and analysis techniques, but also to apply big data technology to real data processing, analysis and application problems. The knowledge modules of the course are: fundamentals of distributed computing, distributed computing programming, big data storage platforms, big data computing models, big data analysis and processing, big data programming, and big data application development. |
Teaching Content and Class Hours Distribution | I. Introduction to the Course 2 hours Main content: the basic tasks and main goals of the course, and the application of big data technology in computer science.
II. Foundations of Distributed Computing 2 hours (1) Concepts in distributed computing 1 hour (2) Distributed computing paradigms 1 hour Main content: the definition of distributed computing, its advantages and shortcomings, classical distributed computing projects, and basic concepts and theories in distributed computing (e.g., parallel computing, network computing, P2P computing, grid computing, cloud computing, fog computing and big data). Focus: foundations of the different distributed computing models. Difficult points: understanding the differences and relationships between the models.
III. Programming in Distributed Computing 6 hours (1) Inter-process communication (IPC) 0.5 hour (2) Socket programming 2 hours (3) RMI programming 2 hours (4) P2P programming 1.5 hours Main content: basic concepts and principles of IPC, Socket API fundamentals and Socket programming, concepts of RMI and P2P, and programming basics for RMI and P2P. Focus: programming frameworks for distributed computing. Difficult points: the principles of IPC, Socket programming, and the application of RMI and P2P.
IV. Big Data and Storage Techniques 4 hours
Focus: basic knowledge and principles of distributed storage systems, and the technical principles and platforms of big data storage. Difficult points: the various principles of distributed storage systems and big data storage.
V. Big Data Computing Models 6 hours (1) Traditional parallel computing models (PRAM, BSP, LogP, etc.) 1 hour (2) MapReduce model 2 hours (3) Distributed memory model 2 hours (4) Big data stream processing 1 hour Focus: the MapReduce model and its application to big data analysis, and the distributed memory computing model (Spark) and its application to big data analysis. Difficult points: big data analysis using the MapReduce model and the distributed memory computing model (Spark).
VI. Big Data Processing Techniques 7 hours
Focus: system architectures and basic use of the Hadoop, Impala and Ali big data platforms. Difficult points: understanding the technical principles of the Hadoop and Impala platforms.
VII. Big Data Programming 7 hours
Focus: the data storage model of HDFS, and the data processing models of MapReduce and Spark. Difficult points: designing parallel computing programs based on MapReduce and Spark.
VIII. Techniques in Big Data Application Development 6 hours
Focus: applying big data techniques to the development of big data applications, and methods for big data analysis. Difficult points: system design and development techniques for big data applications. |
Experimental Teaching | Yes |
Teaching Method | A combination of lectures, assignments, laboratory tasks, online activities and the lecturer's research projects. |
Examination Method | The final score comprises three parts with the following weights: Assignments and attendance: 20%; Laboratory tasks (with reports): 20%; Final exam: 60% |
Teaching Materials and Reference Books | Suggested Textbooks: 林伟伟,刘波编著《分布式计算、云计算与大数据》,机械工业出版社,2017年,第二版次。 Main References: [1] 杨正洪著,《大数据技术入门》,清华大学出版社,2016 [2] 林子雨编著,《大数据技术原理与应用(第2版)》,人民邮电出版社出版,2017. [3] 张良均等著,《Hadoop大数据分析与挖掘实战》,机械工业出版社,2015 [4] M.L. Liu著,《分布式计算原理和应用》,清华大学出版社,2004 [5]孙宇熙著,《云计算与大数据》,人民邮电出版社,2017 [6]刘鹏著,《大数据》,电子工业出版社,2017 |
Prepared by Whom and When | Lin Weiwei, 6 July 2017. |
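“Big Data Technology” Experiment Syllabus
Course Code | 045102751 |
Course Title | 大数据技术 |
English Title | Big Data Technology |
Course Category | Specialty-related Course |
Course Nature | Elective Course |
Class Hours | Total: 40; Laboratory practice: 12; Experiments: 0; Field practice: 0 |
Credits | 2.5 |
Semester | Sixth term |
Institute | School of Computer Science and Engineering |
Program Oriented | Computer Science and Technology, Network Engineering, Information Security |
Teaching Language | Chinese |
Prerequisites | Computer Network, Operating System, Program Design, Database |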
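Student Outcomes (Special Training Ability) | This course contributes to the following graduation requirements: 1. Engineering knowledge: master solid fundamental knowledge and the basic professional principles, methods and techniques; be able to apply mathematics, natural science, and the foundational and specialized knowledge of the discipline to solve problems in big data management and analytical computing, laying a foundation for big data applications and related engineering practice. 2. Problem analysis: be able to apply the basic principles of mathematics, natural science and engineering science to identify, formulate and, through literature research, analyze complex problems in big data application engineering, so as to reach valid conclusions. 3. Design/development of solutions: be able to design solutions to complex problems in big data application engineering, including big data system design that meets specific requirements, selection of key technologies, and design of engineering implementation workflows or plans, while showing innovative awareness in the design process and taking into account social, health, safety, legal, cultural and environmental factors. 4. Research: be able to investigate complex problems in big data application engineering based on scientific principles and using scientific methods, including designing experiments, analyzing and interpreting data, and synthesizing information to reach reasonable and valid conclusions. 5. Use of modern tools: be able to develop, select and use appropriate techniques, resources, modern engineering tools and information technology tools for complex problems in big data application engineering, including the prediction and simulation of complex problems, and understand their limitations. |
Teaching Objectives | After completing the course, students will have the following abilities: (1) Master the basic knowledge of distributed computing techniques, big data analysis and computing models, storage platforms, analysis and processing techniques, and programming and development techniques, and develop the basic ability to discover and solve problems. [1, 2] (2) Master the basic principles and techniques of big data storage management, processing and analytical computing, and have the basic ability to analyze and manage big data. [1, 3, 4] (3) Master commonly used big data programming and application development techniques, have a preliminary ability to design big data application systems, and develop practical ability in applying big data technology. [3, 5] |
Course Description | This course is mainly intended for upper-level undergraduates who have some basic knowledge of computer networks, operating systems, program design and databases, and some software development ability. It introduces the basic principles and development techniques of traditional distributed computing, big data storage management and platform architecture, big data computing models and the principles of analysis and processing algorithms, and the construction of big data systems and application development techniques. Students are required to read a substantial amount of the relevant literature to gain an understanding of the technology, and to complete a series of experiments in order to master big data programming practice and the methods and tools for analysis and processing. Through this course, students are expected to understand and master big data management platforms and analysis and processing techniques, and on that basis learn to apply big data processing technology to real data processing, analysis and application problems. The knowledge modules of the course cover seven areas: fundamentals of distributed computing, distributed computing programming, big data storage platforms, big data computing models, big data analysis and processing, big data programming, and big data application development. |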
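Main Instruments, Equipment and Software | Equipment: PC server Software: Java development environment, Hadoop ecosystem software, etc. |
Experiment Report | The report must present the method, steps, process and conclusions of the experiment. |
Assessment | Experiment report: 50%; Experimental operation: 50% |
Teaching Materials, Laboratory Manuals and Reference Books | Suggested Textbooks: 林伟伟,刘波编著《分布式计算、云计算与大数据》,机械工业出版社,2017年,第二版次。 Main References: [1] 杨正洪著,《大数据技术入门》,清华大学出版社,2016 [2] 林子雨编著,《大数据技术原理与应用(第2版)》,人民邮电出版社出版,2017. [3] 张良均等著,《Hadoop大数据分析与挖掘实战》,机械工业出版社,2015 [4] M.L. Liu著,《分布式计算原理和应用》,清华大学出版社,2004 [5]孙宇熙著,《云计算与大数据》,人民邮电出版社,2017 [6]刘鹏著,《大数据》,电子工业出版社,2017 |
Prepared by Whom and When | Lin Weiwei, 6 July 2017 |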
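“Big Data Technology” Experimental Teaching Content and Class Hours Distribution
No. | Experiment Item | Class Hours | Content Summary | Category | Requirement | Students per Group | Main Instruments, Equipment and Software |
1 | Distributed Computing Program Design | 4 | Write a client/server communication program based on the Socket API or Java RMI in which the client program calls the server program to implement a simple information query function (e.g., querying information about files on the server). | Design | Compulsory | 1 | PC, Java development environment |
2 | Basic Big Data Operations | 4 | Master basic file operations on the HDFS distributed file system, become familiar with how to run MapReduce programs, master basic operations on the HBase database and basic use of the Hive data warehouse, and be able to design a simple big data storage program (e.g., a program that stores data in and reads data from HDFS or HBase). | Demonstration | Compulsory | 1-2 | PC server, Hadoop ecosystem software |
3 | Log Big Data Analysis and Computing | 4 | Use MapReduce or Hive to analyze large-scale log data (e.g., mobile users' Internet access logs) and implement basic log query and statistics functions (e.g., analyzing users' browsing preferences by computing the top URLs in the access logs). | Comprehensive | Compulsory | 1-2 | PC server, Hadoop ecosystem software |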
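As one possible starting point for Experiment 1, the sketch below shows a minimal Socket API client and server in Java that answer a file-information query. The class name `FileInfoDemo`, the one-line text protocol (the client sends a file name, the server replies with one line) and the reply format are assumptions made for illustration; the syllabus does not prescribe them.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a one-line request/reply protocol over TCP.
class FileInfoDemo {

    // Builds the reply for one request. Kept separate from the network code
    // so that it can be checked without opening a socket.
    static String describe(File baseDir, String name) {
        File f = new File(baseDir, name);
        if (!f.isFile()) {
            return "NOT_FOUND " + name;
        }
        return "OK " + name + " " + f.length() + " bytes";
    }

    // Server side: accepts exactly one client, answers one query, then returns.
    static void serveOnce(ServerSocket server, File baseDir) throws IOException {
        try (Socket s = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String name = in.readLine();
            out.println(describe(baseDir, name == null ? "" : name));
        }
    }

    // Client side: sends a file name and returns the server's one-line reply.
    static String query(String host, int port, String name) throws IOException {
        try (Socket s = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(name);
            return in.readLine();
        }
    }

    public static void main(String[] args) throws Exception {
        File baseDir = new File(".");
        // Port 0 asks the operating system for any free port.
        try (ServerSocket server = new ServerSocket(0)) {
            Thread serverThread = new Thread(() -> {
                try {
                    serveOnce(server, baseDir);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.start();
            System.out.println(query("localhost", server.getLocalPort(), "no-such-file.txt"));
            serverThread.join();
        }
    }
}
```

Running `main` starts the server and the client in one process for convenience. A fuller solution would run them as separate programs, serve multiple clients, and validate the requested name (for example, rejecting `..` path components).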
“Big Data Technology” Experiment Syllabus
Course Code | 045102751 |
Course Title | Big Data Technology |
Course Category | Specialty-related Course |
Course Nature | Elective Course |
Class Hours | Total: 40; Laboratory practice: 12; Experiments: 0; Field practice: 0 |
Credits | 2.5 |
Semester | Sixth term |
Institute | School of Computer Science and Engineering |
Program Oriented | Computer Science and Technology, Network Engineering, Information Security |
Teaching Language | Chinese |
Prerequisites | “Computer Network”, “Operating System”, “Program Design”, “Database System” |
Student Outcomes (Special Training Ability) | This course contributes to students’ abilities in the following aspects: 1. Engineering knowledge: students acquire solid fundamental knowledge, basic professional principles, methodologies and techniques, and learn to solve problems in big data management and processing by applying mathematics and their professional knowledge of computer science. The course strengthens students’ ability to develop big data applications. 2. Problem analysis: students learn to identify, formulate and analyze complex problems in big data engineering through literature surveys and by applying mathematics, engineering techniques and their professional knowledge of computer science. 3. Problem solving: students learn to design comprehensive solutions to problems in big data engineering, including big data system design, selection of key techniques, and the design of implementation workflows and plans. Students develop innovative awareness by considering multiple factors (e.g., society, environment and security) in their designs. 4. Research ability: students learn to investigate problems in big data engineering using scientific methods, including designing experiments, analyzing data and drawing conclusions. 5. Using modern tools: students learn to select, use and develop appropriate tools and techniques to predict and simulate problems in big data engineering. |
Teaching Objectives | After finishing the course: (1) Students should master the basic knowledge of distributed computing techniques, big data processing models, storage platforms and programming techniques, and be trained to discover and solve problems. [I, II] (2) Students should master the basic methods and techniques for storing, processing and analyzing big data. [II, III, IV] (3) Students should master widely used big data programming techniques and be trained in designing and programming simple big data systems. [III, V] |
Course Description | This course is intended for upper-level undergraduates who have a good grasp of the basics of computer networks, operating systems, program design and databases, and who are able to develop software applications. It introduces the basic principles and development techniques of traditional distributed computing, big data storage management and platform architecture, big data computing models and the principles of analysis and processing algorithms, and the construction of big data systems together with application development techniques. Students are expected to read a substantial amount of the relevant literature to build an understanding of the technology, and to complete a series of experiments in order to master big data programming practice and the methods and tools for analysis and processing. The aim is for students not only to understand big data management platforms and analysis techniques, but also to apply big data technology to real data processing, analysis and application problems. The knowledge modules of the course are: fundamentals of distributed computing, distributed computing programming, big data storage platforms, big data computing models, big data analysis and processing, big data programming, and big data application development. |
Instruments and Equipment | Equipment: PC server Software: Java Development Kit, Hadoop development environment |
Experiment Report | The report must present the method, procedure, process and conclusions of the experiment. |
Assessment | Experiment report: 50%; Experimental operation: 50% |
Teaching Materials and Reference Books | Suggested Textbooks: 林伟伟,刘波编著《分布式计算、云计算与大数据》,机械工业出版社,2017年,第二版次。 Main References: [1] 杨正洪著,《大数据技术入门》,清华大学出版社,2016 [2] 林子雨编著,《大数据技术原理与应用(第2版)》,人民邮电出版社出版,2017. [3] 张良均等著,《Hadoop大数据分析与挖掘实战》,机械工业出版社,2015 [4] 林伟伟,彭绍亮,《云计算与大数据技术理论及应用》,清华大学出版社,2019 [5]孙宇熙著,《云计算与大数据》,人民邮电出版社,2017 [6]刘鹏著,《大数据》,电子工业出版社,2017 |
Prepared by Whom and When | Lin Weiwei, 6 July 2017. |
“Big Data Technology” Experimental Teaching Arrangements
No. | Experiment Item | Class Hours | Content Summary | Category | Requirement | Students per Group | Instruments, Equipment and Software |
1 | Distributed Computing Program Design | 4 | Write a client/server communication program with the Socket API or Java RMI in which the client calls the server to implement a simple information query function (e.g., querying information about files on the server). | Design | Compulsory | 1 | PC, Java development environment |
2 | Basic Big Data Operations | 4 | Master basic file operations on the HDFS distributed file system, become familiar with how to run MapReduce programs, master basic operations on the HBase database and basic use of the Hive data warehouse, and be able to design a simple big data storage program (e.g., a program that stores data in and reads data from HDFS or HBase). | Demonstration | Compulsory | 1-2 | PC server, Hadoop development environment |
3 | Analysis and Computing of Massive Log Data | 4 | Query and analyze log data (e.g., mobile users' Internet access logs) with MapReduce or Hive, implementing basic query and statistics functions (e.g., discovering users' browsing preferences by computing the top URLs in the log data). | Comprehensive | Compulsory | 1-2 | PC server, Hadoop development environment |
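To illustrate the top-URL statistic mentioned for Experiment 3, here is a minimal plain-Java sketch of the map and reduce steps, with no Hadoop dependency so that it can be run on any PC before moving to a cluster. The class name `TopUrlSketch` and the log layout (whitespace-separated fields with the URL in the third position) are assumptions for illustration only; real access logs and a real MapReduce or Hive job will differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: counts URL occurrences in log lines and ranks them.
class TopUrlSketch {

    // "Map" step: pull the URL out of one log line.
    // Assumed layout: userId timestamp url (whitespace-separated).
    // Returns null for lines that do not have at least three fields.
    static String extractUrl(String logLine) {
        String[] fields = logLine.trim().split("\\s+");
        return fields.length >= 3 ? fields[2] : null;
    }

    // "Reduce" step: sum the number of occurrences of each URL.
    static Map<String, Integer> countUrls(List<String> logLines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : logLines) {
            String url = extractUrl(line);
            if (url != null) {
                counts.merge(url, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns up to n URLs, most frequent first; ties are broken alphabetically.
    static List<String> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the assumed layout.
        List<String> log = Arrays.asList(
                "u1 2017-07-06T10:00 example.com/news",
                "u2 2017-07-06T10:01 example.com/mail",
                "u1 2017-07-06T10:02 example.com/news",
                "malformed-line");
        System.out.println(topN(countUrls(log), 1));
    }
}
```

In a Hadoop job the same two steps would typically be split between a mapper that emits (URL, 1) pairs and a reducer that sums them, with the ranking done in a second pass or in a Hive query using `GROUP BY` and `ORDER BY`.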