Introduction to Data Mining - Introductory Materials from the Institute of Computing Technology, Chinese Academy of Sciences.ppt

Uploaded by: 天天快乐  Document ID: 1181855  Upload date: 2018-06-17  Format: PPT  Pages: 59  Size: 7.39 MB
Data Warehousing & Data Mining
Keith C.C. Chan
Department of Computing
The Hong Kong Polytechnic University
2018/6/17

Instructor
Keith C.C. Chan, Department of Computing
Office: PQ803
Phone: 2766 7262
Fax: 2170 0106
Email: cskcchan@comp.polyu.edu.hk
Consultation Hours: Tuesdays, 5:30-6:30pm. Other times by appointment.

Text and References
Chan, K.C.C., Course Notes on Data Mining & Data Warehousing, Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, 2008.
Inmon, W.H., Building the Data Warehouse, 2nd Edition, J. Wiley & Sons, New York, NY, 1996.
Whitehorn, M., Business Intelligence: the IBM Solution: Datawarehousing and OLAP, Springer, London, 1999.
Han, J., and Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, 2001.
Rud, O.P., Data Mining Cookbook: Modeling Data for Marketing, Risk, and Customer Relationship Management, J. Wiley, New York, NY, 2001.
Groth, R., Data Mining: Building Competitive Advantage, Prentice Hall, Upper Saddle River, NJ, 1998.
Kovalerchuk, B., Data Mining in Finance: Advances in Relational and Hybrid Methods, Kluwer Academic, Boston, 2000.
Berry, M.J.A., Mastering Data Mining: the Art and Science of Customer Relationship Management, Wiley, New York, NY, 2000.
Berry, M.J.A., Data Mining Techniques for Marketing, Sales and Customer Support, Wiley, New York, NY, 1997.
Mattison, R., Data Warehousing and Data Mining for Telecommunications, Artech House, Boston, 1997.

Course Outline (1)
Data Mining
    From data warehousing to data mining.
    Data pre-processing and data mining life-cycle.
    Association and sequence analysis; classification and clustering.
    Fuzzy Logic, Neural Networks, and Genetic Algorithms.
    Mining Complex Data.
        OLAP mining; spatial data mining; text mining; time-series data mining; web mining; visual data mining.

Course Outline (2)
Data warehousing
    Introduction; basic concepts of data warehousing; data warehouse vs. operational DB; data warehouse and the industry.
    Architecture and design; two-tier and three-tier architecture; star schema and snowflake
    schema; data capturing, replication, transformation and cleansing.
    Data characteristics; metadata; static and dynamic data; derived data.
    Data Marts; OLAP; data mining; data warehouse administration.

Aims and Objectives
The hype about data warehousing and data mining.
Better understand tools by IBM, Microsoft, Oracle, SAS, SPSS.
Job mobility and prospects.
Projects and research thesis.

Xerox
Company: Xerox Corporation
Job Title: Content Management, Data Mining Leader
Country: USA
State/Province: New York
City: Rochester
Job Type: Regular
Full Time: Yes
Date Posted: Sep 01 2007
Job Description and Responsibilities:
Are you interested in innovating new generation software systems, products and services for Xerox?

Xerox has a strong need for mining technologies both in cost reduction through intelligent post sales services and revenue generation services. The cost reduction services include parts inventory optimization, pre-emptive customer service delivery, mining of new solutions, best usage practices, etc. Integration with CRM will allow more relevant and desirable personalization/customization in VI solutions.
Knowledge of J2EE, .net, W3C standards such as XML, XML Schema, XSLT.
Hands-on experience in using Java, C+, Web Services and Object Oriented development.
Interest in declarative programming and ontologies/semantic web is a big plus.
Ability to identify key applications, integrated solutions, analyze existing document workflow, and suggest solutions to enhance profitability and productivity.
Strong analytical, written, and verbal communication skills.

Yahoo!
Company: Yahoo!
Sep 7, 2007
Position: Data Mining Engineer (req# *9055)
Location: Sunnyvale, CA
Web:
Each day Yahoo! collects approximately ten terabytes of data - more than the entire Library of Congress. We analyze this data and act on it to constantly better our user experience, while building the world's best consumer-centric data platform.
Strategic Data Solutions (SDS) is looking for outstanding individuals for a variety of positions. SDS's mission is to create value to consumers and marketers by delivering a consumer-centric data platform and insights services that maximize user engagement and enable innovative marketing solutions.
Data Mining Engineer (req # *9055)
We are looking for a highly motivated and experienced
Data Mining engineer to help develop algorithms and software systems to unveil the value of Yahoo's tremendous amount of data ( trillion bytes).

Yahoo!
The candidate should have:
    Excellent Data Mining and Machine Learning background;
    Ph.D. in Computer Science/Data Mining/Artificial Intelligence/Statistics or related field;
    Two references from well established Data Mining professionals reflecting on the candidate's abilities;
    Desire to work with data (vs. to research machine learning algorithms);
    Research track or proven record of doing industrial Data Mining;
    Ability to develop software systems;
    Creativity (this is a must).
The following skills are pluses for the position:
    Experience with Data Mining Tools (SAS, WEKA, SPSS, etc.) and Data Base;
    Strong understanding of business;
    Ability to present Statistics, math, algorithms.
Contact: Please send resumes with req# 9055 to Tina Duccini, tducciniyahoo-
Posted by: Tina Duccini

Data Warehousing and Industry
One of the hottest topics in IS.
Over 90% of larger companies are starting or already have a DW.
Over $200 billion, 1999-2004 (Meta Group).
US Federal government spending on DW projects: from $579 million in 1999 to $911 million in 2004.
US State and local government spending to grow from $550 million in 1999 to about $1.1 billion by 2004.
Worldwide DW market to grow at 43%, reaching $148 billion by 2003.
US market alone will account for $72.7 billion by 2003 while growing at 41% annually.
Demand for DW solutions in Japan and Europe to grow by 50% and 38% respectively.

Data Warehousing and Industry (2)
A 1996 study of 62 data warehousing projects showed:
    An average return on investment of 321%, with an average payback period of 2.73 years.
WalMart has the largest warehouse:
    900-CPU, 2,700 disk, 23 TB Teradata system
    7 TB in warehouse
    40-50 GB per day

What is a Data Warehouse?
Defined in many different ways, non-rigorously.
    A DB for decision support.
    Maintained separately from an organization's operational database.
    "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)
Data warehousing:
    The process of constructing and using data warehouses.

Why Data Warehousing?
Advance of information technology.
Data collected in huge amounts.
Need to make good use of data?
Architecture and tools to:
    Bring together scattered information from multiple sources to provide a consistent data source for decision support.
    Support information processing by providing a solid platform of consolidated, historical data for analysis.

What is Data Mining? (1)
Knowledge Discovery in Databases (KDD).
Discover useful patterns from large data warehouses.
Nontrivial extraction of implicit, previously unknown, and potentially useful information from data.
    95% of the salespersons, male or female, who are located in Toronto, are over 6 feet in height, and are unable to speak French made over 1 million in sales every year for the last 5 years.

What is Data Mining? (2)
[Diagram labels: Data Sources, Data Warehouse, Data Mining, Knowledge Base. Caption: Data Warehousing and Data Mining]

Why Data Mining?
Data explosion problem: automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories.
We are drowning in data, but starving for knowledge!

Data Rich but Information Poor
Databases are too big
Terrorbytes

Data Mining vs. Statistical Inference
[Figures: Female Age Distribution, Male Age Distribution]
Can you tell the differences?

Data Mining vs. Statistical Inference (2)

Data Mining vs. Statistical Inference (3)

Data Mining vs. Linear Regression

Mining for Knowledge
Knowledge in the form of rules
    If ... & ... & ... Then ...
Types of knowledge
    Association
        Presence of one set of items/attributes implies presence of another set.
    Classification
        Given examples of objects belonging to different groups, develop a profile of each group in terms of attributes of the objects.
    Clustering
        Unsupervised grouping of similar records based on attributes.
    Sequential analysis (temporal and spatial)
        Historical records collected at fixed periods of time.

Mining Association Rules
The presence of one set of items in a transaction implies the presence of another set of items.
    30% of people who buy diapers also buy beer.
The presence of an attribute value in a record implies the presence of another.
    60% of patients with these symptoms also have that symptom.

An Example Association Rule
Mobile Telecom Data
    Provided by a Malaysian telecom company.
    Over 200 relational tables and transactional data of over 30,000 records.
Examples of discovered association rules
    60% who call from Kuala Lumpur call to Penang.
    77% whose average call duration is greater than 5 minutes make an average of over 80 phone calls per month.

Mining Classification Rules
[Diagram labels: Customer Records (Demographic Data, Loan Data); Good Customer; Bad Customer; Good? Bad?]
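
The percentages quoted in the association rules above (for example, "30% of people who buy diapers also buy beer") can be read as rule confidence: among the transactions that contain the first item set, the share that also contain the second. The sketch below illustrates that reading, together with support, on a made-up toy basket list. The transactions, the function names `support` and `confidence`, and the constant `TOY_TRANSACTIONS` are assumptions for illustration only; they are not course data or the method used in the telecom study.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of all transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Among transactions containing `antecedent`, the fraction that also
    contain `consequent` (the rule antecedent -> consequent)."""
    lhs = frozenset(antecedent)
    both = lhs | frozenset(consequent)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    if n_lhs == 0:
        return 0.0  # the rule never fires, so report zero confidence
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_lhs


# Hypothetical toy data, invented for this sketch.
TOY_TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"diapers", "beer", "milk"}),
    frozenset({"diapers", "bread"}),
    frozenset({"diapers", "beer"}),
    frozenset({"milk", "bread"}),
    frozenset({"diapers", "milk"}),
]

if __name__ == "__main__":
    print(confidence(TOY_TRANSACTIONS, {"diapers"}, {"beer"}))
    print(support(TOY_TRANSACTIONS, {"diapers", "beer"}))
```

In this toy data, four baskets contain diapers and two of those also contain beer, so the rule diapers -> beer has confidence 2/4, while the pair appears in 2 of 5 baskets overall.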

An Example of Classification
Airline data
    200,000 questionnaires.
    Flight information such as flight date and distance.
Example of rules discovered
    Classify according to level of satisfaction:
    IF Race = Chinese & Movie = Not interested
    THEN Overall satisfaction = Not satisfactory
    IF Race = Japanese & Lunch = Japanese & Lunch = not satisfactory
    THEN Overall satisfaction = Not satisfactory
    IF Race = Turkish
    THEN Overall satisfaction = Very satisfactory

An Example of Classification (2)
Credit card data
    Each transaction contains transaction date, amount, and a set of items purchased, etc.
    Each customer record contains gender, age, education background, etc.
Example of rules discovered:
    IF e-mail address = no & use of card = 9 months continuously & no. of transaction [...]

[...] Page B - Page C -
Which page will the viewer arrive at after accessing certain URLs?
Results:
    IF Page = Destination Information & Next Page = Flight Schedules
    THEN Next Page = XxxAir Travel Packages
    IF Day of week = Wed. & Time = Non-office hour
    THEN duration = long
Actionable Items
    Golden time for advertisements is on Wed. during non-office hours.

Other Applications of Data Mining
Market analysis and management
    Target marketing, customer relation management, market basket analysis, cross selling, market segmentation.
Risk analysis and management
    Forecasting, customer retention, improved underwriting, quality control, competitive analysis.
Fraud detection and management

Data Mining Techniques
Confluence of Multiple Disciplines
    Database systems, data warehouse and OLAP.
    High performance computing.
More traditionally:
    Statistics.
    Machine learning and Pattern Recognition.
More recently:
    Fuzzy logic.
    Artificial neural networks.
    Genetic Algorithms and Evolutionary computations.
    Visualization.

Statistical Techniques
SPSS
    Traditional statistics.
    Decision trees.
    Neural Networks.
    Data visualization.
    Database access and management.
    Multidimensional tables.
    Interactive graphics.
    Report generation and web distribution.
SAS
    Enterprise Miner.
    Statistical tools for clustering.
    Decision trees.
    Linear and logistic regression.
    Neural networks.
    Data preparation tools.
    Visualization tools.
    Multi-D tables.
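
The IF-THEN rules shown in the web-log results earlier (for example, "IF Day of week = Wed. & Time = Non-office hour THEN duration = long") can be held as plain data and applied to a record by checking every condition. The sketch below is one possible encoding, not the course's tooling: the `Rule` class, the `apply_rules` function, the lower-case field names, and the `page_after_next` target (standing in for the second "Next Page" in the slide's rule) are all hypothetical, and "first matching rule wins per target" is an assumed policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    """IF all (attribute == value) conditions hold THEN target = value."""
    conditions: Tuple[Tuple[str, str], ...]
    target: str
    value: str

    def matches(self, record: Dict[str, str]) -> bool:
        # A missing attribute never satisfies a condition.
        return all(record.get(attr) == val for attr, val in self.conditions)


# Rules transcribed from the slide; field names are assumptions.
RULES: List[Rule] = [
    Rule(
        conditions=(("day_of_week", "Wed"), ("time", "non-office hour")),
        target="duration",
        value="long",
    ),
    Rule(
        conditions=(
            ("page", "Destination Information"),
            ("next_page", "Flight Schedules"),
        ),
        target="page_after_next",
        value="XxxAir Travel Packages",
    ),
]


def apply_rules(record: Dict[str, str], rules: List[Rule]) -> Dict[str, str]:
    """Return the conclusions of all rules that fire on `record`.
    If several rules set the same target, the first one listed wins."""
    conclusions: Dict[str, str] = {}
    for rule in rules:
        if rule.matches(record) and rule.target not in conclusions:
            conclusions[rule.target] = rule.value
    return conclusions


if __name__ == "__main__":
    visit = {"day_of_week": "Wed", "time": "non-office hour"}
    print(apply_rules(visit, RULES))
```

Keeping rules as data rather than hard-coded `if` statements makes it easy to list, count, or reorder them, which matters once a mining tool emits hundreds of rules.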
