What is data reduction?

Data reduction technology is used to obtain a condensed data set from a huge original data set while preserving the integrity of the original data. Analysis on the condensed data set is then considerably more efficient, and its results are essentially the same as those obtained from the original data set.

Data reduction standard

The time spent on data reduction should not exceed or offset the time saved by analyzing the reduced data.
The reduced data is much smaller than the original data, but produces the same or almost the same analysis results.

Data reduction technology

Data reduction - Dimension reduction

Dimension reduction - Attribute subset selection

Decision tree induction
Use decision tree induction to classify the initial data and obtain an initial decision tree. All attributes that do not appear in the tree are considered irrelevant. Deleting these attributes from the initial attribute set yields a better subset of attributes.

Reduction based on statistical analysis

Data reduction - Data compression

Lossless compression: the compressed data can be restored without losing any information. For example: string compression, which has a broad theoretical foundation and sophisticated algorithms.
Lossy compression: only an approximate representation of the original data can be reconstructed. For example: audio/video compression. Sometimes a fragment can be reconstructed without decompressing the whole data.

Principal component analysis (PCA)
Assume the data to be compressed consists of N tuples (data vectors) taken from k dimensions. PCA searches for c orthogonal k-dimensional vectors that best represent the data, where c ≤ k. The original data can then be projected onto a smaller space, which achieves data compression.

Data reduction - Data cube aggregation

Data reduction - Discretization

Three types of attribute values:
Nominal: e.g. a value from an unordered set
Ordinal: e.g. a value from an ordered set
Continuous: e.g. a real number

Discretization technology
Reduce the number of values of a continuous attribute by dividing the range of the attribute into several intervals.

Data reduction - Concept hierarchy generation

Figure labels: Youth, Middle aged, Prime of life
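
Dividing the range of a continuous attribute into intervals can be sketched as follows. This is a minimal illustration of one possible scheme, equal-width binning, which the slides do not name specifically. The function name `equal_width_bins` and the sample values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Split the range of `values` into n_bins equal-width intervals.

    Returns (edges, labels): the n_bins + 1 interval boundaries and,
    for each value, the index of the interval it falls into.
    Equal-width binning is an assumed choice for this sketch.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    if width == 0:
        # All values are identical, so everything lands in interval 0.
        return edges, [0 for _ in values]
    labels = []
    for v in values:
        # The maximum value would compute index n_bins; clamp it into
        # the last interval so that interval is closed on the right.
        idx = min(int((v - lo) / width), n_bins - 1)
        labels.append(idx)
    return edges, labels
```

After this step a continuous attribute with many distinct values is represented by only `n_bins` interval indices, which is the reduction that discretization aims for.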
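
The PCA projection described under data compression can be sketched in a few lines for the case c = 1. This is only an illustrative sketch. It finds the leading principal component with power iteration, a technique swapped in here for brevity that the slides do not mention, and all function names are hypothetical. A real workload would normally use a linear algebra library instead.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (means, v): per-dimension means and the unit vector of the
    leading principal component, found by power iteration (an assumed
    method for this sketch)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    # Sample covariance matrix (k x k).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(k)]
           for a in range(k)]
    v = [1.0] * k  # start vector; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("degenerate data or unlucky start vector")
        v = [x / norm for x in w]
    return means, v

def project(rows, means, v):
    """Compress each k-dimensional row to a single coordinate along v."""
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]

def reconstruct(coords, means, v):
    """Approximate the original rows from their 1-D coordinates."""
    return [[means[j] + z * v[j] for j in range(len(v))] for z in coords]
```

Each tuple is stored as one number instead of k, and `reconstruct` recovers only an approximation unless the data happens to lie exactly on a line.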
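
The defining property of lossless compression, that the compressed data can be restored without losing any information, can be checked with a short round trip. This sketch uses Python's standard `zlib` module as one example of string compression. The helper name `roundtrip` is hypothetical.

```python
import zlib

def roundtrip(text):
    """Compress and decompress `text`.

    Returns (raw_size, packed_size, restored_text) so the caller can
    compare sizes and confirm that nothing was lost.
    """
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, 9)      # 9 = highest compression level
    restored = zlib.decompress(packed)  # exact inverse of compress
    return len(raw), len(packed), restored.decode("utf-8")
```

For highly repetitive input the packed size is far smaller than the raw size, and the restored text is identical to the input in every case.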