Data Mining
Yanci Zhang

What is Data Mining?
Extraction of implicit, previously unknown, and potentially useful information from data
Exploration and analysis of large quantities of data, by automatic or semi-automatic means, to discover meaningful patterns

Process of Knowledge Discovery

Example: NBA 1/2
Play-by-play information
  Who is on the court
  Who shoots
Coaches want to know
  Who works best?
  Which combination of strategies works best?

Example: NBA 2/2
Advanced Scout is a data mining tool to answer these questions
  Data collection
  Data preprocessing: cleaning, transformations, enrichment
  Data mining
  Interpretation and knowledge discovery

What is (not) Data Mining?
What is not data mining
  Looking up a phone number in a phone directory
  Querying a web search engine for information about “Amazon”
What is data mining
  Certain names are more prevalent in certain US locations (O'Brien, O'Rourke, O'Reilly in the Boston area)
  Grouping together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, A)

Why Data Mining?
Data rich but information poor
We are drowning in data, but starving for knowledge

Tasks
Prediction methods
  Use some variables to predict unknown or future values of other variables
Description methods
  Find human-interpretable patterns that describe the data

Applications
Data analysis and decision support
  Market analysis and management
    Beer and diapers
  Risk analysis and management
    Credit card risk analysis and control
  Fraud detection and detection of unusual patterns

Applications
Text mining and Web mining
Stream data mining
DNA and bio-data analysis
  Similarity search and comparison among DNA sequences
  Association analysis: identification of co-occurring gene sequences
  Path analysis: linking genes to different disease development stages
  Visualization tools and genetic data analysis

Challenges
Scalability
Dimensionality
Complex and heterogeneous data
Data quality
Data ownership and distribution
Privacy preservation
Streaming data

Assignments
Group
  Group16: PC and MAC 10
  Group17: PC and MAC 11
  Group18: What is augmented reality?
  Group38: What are Graphics Processing Units (GPUs)?
Individual: Write an English article: Applications of Data mining (300 words)
Deadline: 2011-11-10
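
Supplementary sketch: the "beer and diapers" item under market analysis refers to finding items that tend to be bought together. The Python snippet below is a minimal illustration of that idea, not the method used in the lecture. The baskets are invented for the example, and the helper names (pair_counts, support, confidence) are hypothetical. It counts how often each pair of items co-occurs and computes the two usual association-rule measures: support (the fraction of baskets containing an itemset) and confidence (how often the consequent appears among baskets that contain the antecedent).

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"diapers", "milk"},
    {"beer", "diapers", "milk"},
]


def pair_counts(baskets):
    """Count in how many baskets each unordered pair of items appears."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent in basket | antecedent in basket)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]

print("most frequent pair:", top_pair, "seen in", top_count, "baskets")
print("support(beer, diapers):", support({"beer", "diapers"}, baskets))
print("confidence(beer -> diapers):", confidence({"beer"}, {"diapers"}, baskets))
print("confidence(diapers -> beer):", confidence({"diapers"}, {"beer"}, baskets))
```

Enumerating every pair is fine for a handful of baskets. On large transaction logs the number of candidate itemsets grows quickly, which is one face of the scalability challenge listed above.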