Big Data Analysis in English (15).pdf

Uploaded by: 职教中心; Document ID: 13701371; Upload date: 2022-10-11; Format: PDF; Pages: 11; Size: 263.37 KB

Data integration and data transformation

Outline
  1 Data integration
  2 Data transformation

1 Data integration
Data integration: integrate data from multiple data sources into a consistent store. It involves:
  - Pattern (schema) matching
  - Data redundancy processing
  - Data value conflict resolution

1 Data integration - Pattern matching
Integrate metadata from different data sources.
Entity recognition problem: match real-world entities across different data sources, for example A.cust-id = B.customer_no.

1 Data integration - Data redundancy
  - The same attribute may have different field names in different databases.
  - One attribute may be derivable from another. For example, the average monthly income attribute in a customer data table can be calculated from the monthly income attribute.
  - Some redundancy can be detected by correlation analysis.

1 Data integration - Data value conflict
For the same real-world entity, attribute values from different data sources may differ because of differences in representation, scale, or coding. For example:
  - The weight attribute uses metric units (kg, g) in one system but imperial units (pound) in another.
  - The same price attribute is recorded in different currency units ($, pound, RMB) in different locations.

2 Data transformation - 1) Smoothing
Remove noise, discretize continuous data, and coarsen its granularity. Techniques:
  - Binning
  - Clustering
  - Regression

2 Data transformation - 2) Aggregation
Aggregate the data with functions such as avg(), count(), sum(), min(), max(). For example, daily sales data can be aggregated to get the monthly or annual total.

2 Data transformation - 3) Data generalization
Replace low-level data objects with more abstract (higher-level) concepts. For example, a street attribute can be generalized to higher-level concepts such as city or country. Similarly, numeric attributes such as age can be mapped to higher-level concepts such as young, middle-aged, and old.

2 Data transformation - 4) Data normalization
Scale the data proportionally so that it falls into a specific range. This removes the bias in mining results caused by numeric attributes having different magnitudes. For example, map the salary attribute values to the range [-1.0, 1.0]. Methods:
  (1) Min-max normalization
  (2) Zero-mean normalization (z-score normalization)
  (3) Decimal scaling normalization

2 Data transformation - 5) Attribute construction
Use the existing attribute set to construct new attributes and add them to that set, to help mine deeper pattern knowledge and improve the accuracy of mining results. For example, from the width and height attributes a new attribute can be constructed: area.
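The data redundancy slide says that some redundancy can be detected by correlation analysis. A minimal sketch of that idea, assuming the Pearson correlation coefficient is the measure used (the slides do not name one): a coefficient close to +1 or -1 suggests that one attribute may be derivable from the other. The attribute names and sample values below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return cov / (spread_x * spread_y)


# Hypothetical customer attributes (illustrative values only).
monthly_income = [3000.0, 4500.0, 5200.0, 6100.0, 8000.0]
annual_income = [12 * m for m in monthly_income]  # derived attribute
age = [25, 52, 31, 47, 38]                        # unrelated attribute

r_income = pearson(monthly_income, annual_income)
r_age = pearson(monthly_income, age)

print(f"monthly vs annual income: r = {r_income:.3f}")
print(f"monthly income vs age:    r = {r_age:.3f}")
```

A high absolute correlation only flags a candidate for redundancy; whether to drop one of the attributes remains a judgment call.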
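The smoothing slide lists binning as one technique. A minimal sketch of one common variant, smoothing by bin means over equal-frequency bins; the slides do not say which variant is intended, so the function name, the bin size, and the sample prices are assumptions.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size items,
    and replace every value by the mean of its bin.

    The result is returned in sorted order, not in the input order.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


# Hypothetical price data (illustrative values only).
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, bin_size=3)
print(smoothed)
```

Each bin collapses to a single value, which is what reduces noise and coarsens the granularity of the attribute.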
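The aggregation example (daily sales rolled up into monthly totals) can be sketched with the standard library alone. The record layout, an ISO date string paired with an amount, and the sample figures are assumptions made for illustration.

```python
from collections import defaultdict


def monthly_totals(records):
    """Sum daily amounts per month; records are (YYYY-MM-DD, amount) pairs."""
    totals = defaultdict(float)
    for day, amount in records:
        month = day[:7]  # "YYYY-MM" prefix of the ISO date
        totals[month] += amount
    return dict(totals)


# Hypothetical daily sales (illustrative values only).
daily_sales = [
    ("2022-01-03", 120.0),
    ("2022-01-17", 80.0),
    ("2022-02-01", 200.0),
    ("2022-02-14", 50.0),
    ("2022-02-28", 25.0),
]

totals = monthly_totals(daily_sales)
print(totals)
```

Grouping on `day[:4]` instead would give annual totals, and swapping the running sum for a count, minimum, or maximum gives the other aggregate functions the slide names.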
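The three normalization methods named in the slides can be sketched as follows. The formulas are the usual textbook definitions, which the slides do not spell out, so treat them as assumptions: min-max maps v to new_min + (v - min) * (new_max - new_min) / (max - min); z-score maps v to (v - mean) / std, here with the population standard deviation; decimal scaling divides by 10^j for the smallest j that makes every absolute value less than 1. The salary values are hypothetical.

```python
import statistics


def min_max(values, new_min=-1.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("min-max normalization needs at least two distinct values")
    scale = (new_max - new_min) / (high - low)
    return [new_min + (v - low) * scale for v in values]


def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("z-score is undefined for a constant attribute")
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer giving max(|v|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


# Hypothetical salary values (illustrative only).
salaries = [2000.0, 4000.0, 6000.0, 8000.0, 10000.0]

print(min_max(salaries))          # rescaled into [-1.0, 1.0]
print(z_score(salaries))          # mean 0, unit population variance
print(decimal_scaling(salaries))  # all absolute values below 1
```

Min-max is sensitive to outliers because a single extreme value fixes the range; z-score is often preferred when the true minimum and maximum are unknown.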
