Noise detection for NDVI time series based on Dixon's test and application in data reconstruction

FAN Deqin, ZHU Wenquan, PAN Yaozhong, JIANG Nan

State Key Laboratory of Earth Surface Processes and Resource Ecology, College of Resources Science and Technology, Beijing Normal University

About the authors: FAN Deqin (born 1982), female, is a doctoral candidate whose research focuses on remote sensing data processing; she has published 1 paper. ZHU Wenquan (born 1975), male, Ph.D., is an associate professor and doctoral supervisor whose research focuses on vegetation and ecological remote sensing; he has published more than 50 papers.

Funding: National Basic Research Program of China (973 Program) (No. 2011CB952001)

Abstract: Normalized Difference Vegetation Index (NDVI) time series data are widely used to detect vegetation changes, identify vegetation phenology, and classify land cover. However, original NDVI data contain a great amount of noise that results from observing
conditions. Therefore, noise should be detected and removed in practical applications. Methods for removing noise and reconstructing high-quality NDVI time series data sets generally fall into three types: threshold detection, filtering, and curve fitting. Each method presets a certain number of empirical parameters according to the land cover type or the specific study area, so the definition of noise lacks objective criteria. None of the three types includes a dedicated noise detection step when reconstructing NDVI data; noise is judged only on the basis of experience. In this paper, a noise detection
method based on Dixon's test, a statistical test suitable for small samples, is presented. The method first analyzes the statistical characteristics of the NDVI data of a given pixel for the same period in different years, and then uses quality assessment data to decide whether an NDVI value is abnormal. The noise detection method was combined with two existing data reconstruction methods (i.e., the changing weight filter and Savitzky-Golay filter methods) to reconstruct the NDVI data for 520 test pixels of 55 vegetation types and for a region at Dongting Lake in China from 2001 to 2010. Dixon's test reduces the dependence of the data reconstruction methods on a priori knowledge, and applying it as a noise detection preprocessing step effectively improves the quality of the reconstructed data.

Keywords: NDVI; noise; Dixon's test; data reconstruction

Received: 2012-10-11

1 INTRODUCTION

The Normalized Difference
Vegetation Index (NDVI) derived from satellite remote sensing data reflects the unique spectral reflectance characteristics of vegetation at visible and near-infrared wavelengths. The NDVI is not only the best indicator of vegetation growth status and surface vegetation coverage, but also an important indicator of seasonal changes and the effects of human activities (Zhao, 2003; LinChen, et al., 2000; Zhou, et al., 2001; StckliXin, et al., 2002; Zhang, et al., 2004; Gu, et al., 2006). NDVI time series data derived from middle-resolution or low-resolution sensors contain a great amount of noise because solar altitude angle, satellite observation angle, cloud coverage, water vapor content, and aerosol content change significantly with time. Various trend analyses and information extraction tasks are hindered because the noise obscures the seasonal trend and vegetation phenological characteristics
in the NDVI curve (YuLovellWen et al., 2010), Fourier-based fitting methods (Yan, et al., 2005), harmonic analysis of time series (HANTS) (Roerink, et al., 2000; Hou, et al., 2010), and the changing weight filter method (Zhu, et al., 2012). Curve fitting methods can be divided into double logistic function fitting methods (Beck, et al., 2006) and asymmetric Gaussian function fitting methods (JonssonHou, et al., 2010). Various methods have been used in different regions around the world and in different areas of study, but no consensus has been reached regarding the pros and cons of each algorithm (Song, et al., 2011). No algorithm is significantly better than the others at noise detection and data reconstruction, because each of the above methods has its own advantages and has proved effective for its specific study area, vegetation cover type characteristics,
and application purposes.

However, the above noise reduction and data reconstruction methods for NDVI time series data suffer from several drawbacks:

(1) The preprocessing results of these methods depend on parameter settings chosen for the specific study area and purpose; that is, the effect of noise reduction depends mostly on the experience of the researchers.

(2) The vegetation cover and phenological characteristics of the study area must be known in advance. On the assumption that changes in NDVI data correspond to vegetation growth, a year of data is used in various studies, the time
series is segmented, or the cut-off frequency is selected according to the vegetation growth cycle.

(3) Abnormal NDVI values of vegetation caused by natural disasters or human mismanagement (e.g., forest fires) may be removed as noise on the assumption that sudden increases or drops in NDVI time series data are noise inconsistent with plant growth (Chen, et al., 2004).

(4) The parameters lack objective criteria and must be set based on experience and repeated experiments. For example, for the BISE method, the acceptable NDVI change rate threshold and the size of the sliding window must be adjusted repeatedly according to the researchers' experience for different climate zones and vegetation characteristics. For the mean-value iterative filtering method, the thresholds need to be adjusted constantly to avoid missing the largest vegetation index value, and the method is sensitive to the width of the sliding window. For the SG filter method, two parameters (filter window width and polynomial fitting order) need to be set manually. If the window is too narrow, redundant data are easily produced; conversely, some details are easily missed. The Fourier transform method is sensitive to the choice of frequency components. If the frequency component is set too large, noise will be introduced; conversely, useful information will be lost (WangXie, et al., 2010). The HANTS algorithm requires setting the number of frequencies, error threshold, maximum number
of deleted points, and range of valid data.

(5) The various data reconstruction methods do not include a dedicated noise detection step. NDVI data are preprocessed based on experience before reconstruction, and original data that deviate far from the reconstructed data are removed directly; thus, the positioning accuracy and the reasonableness of noise detection need to be improved. For example, the SG and CW filtering methods assume, based on vegetation growth characteristics, that the difference between two adjacent 16-day NDVI values at the same point is less than 0.4. If the difference is greater than 0.4, the value is identified as noise and replaced by the linear interpolation of its two adjacent points; the NDVI data are then filtered using the SG and CW filtering methods (Chen, et al., 2004; Zhu, et al., 2012). For the BISE and FFT filtering methods, NDVI data are filtered directly with the filtering function, without preprocessing, after the parameters are set based on experience.

In summary, to enhance the universality of NDVI time series preprocessing methods; reduce the dependence on research experience, study area, vegetation categories, and parameter settings; and avoid misjudging land cover types, this paper presents a noise detection method based on Dixon's test, a mathematical-statistics test suitable for small-sample noise detection. To test its validity for NDVI time series data reconstruction, Dixon's test was applied to test sample points and a test region covering different Chinese vegetation types, using MODIS data.

2 METHODS

2.1 Data

MODIS Land products (MOD13Q1, version 005, 16-day, 250 m) for 2001 to 2010 were used to test the newly developed method (2011-12-12, http://lpdaac.usgs.gov). The data products are composited using the 16-day maximum value composite method; thus, the data of each year contain 23 values, and the 10-year NDVI time series contains 230 values. The MODIS Quality Assessment (QA) layer was used as an auxiliary criterion in the NDVI time series noise detection process. The QA data not only provide pixel quality evaluation results, but also describe image quality from different aspects by using different combinations of binary code bits. A QA value smaller than 8 indicates that the quality of the NDVI data is relatively good.

To evaluate the effect of noise detection using Dixon's test and its impact on reconstruction results for single-point NDVI time series data, 520 test pixels for 55 vegetation types (8 to 11 test pixels for each vegetation type) were manually selected based on the 1:1,000,000 vegetation map of China (Hou, 2001). The test pixels should be selected evenly according to the different vegetation types
of their distribution regions (Chen, et al., 2004; Zhu, et al., 2012).

To further test the performance of Dixon's test at the regional scale, we applied it to a test region (geographic center at 119.69°E, 29.02°N) in southeastern China comprising 401 × 401 pixels (Fig. 1). This regional area is located
in the Dongting Lake region, where farmland, mountains, and plains cover 90% of the area, and rivers and residential land cover about 10%. The NDVI data quality of the region is seriously reduced by cloud contamination associated with rainfall from March to July.

Fig. 1 TM image of test area
in July 2006

2.2 Noise detection and data reconstruction for NDVI time series data

2.2.1 Noise detection based on Dixon's test

Noise here refers to individual sample values that deviate significantly from the remaining observed values of the sample. Common methods for judging noise in normally distributed samples include Grubbs' test, the t-test, Dixon's test, the Nair test, and the skewness-kurtosis test (Tao, 1994). Dixon's test is recommended by the International Organization for Standardization and the American Society for Testing and Materials; it is suitable for detecting noise in small samples (from 5 to 30 values). Dixon's test detects noise directly from range ratios, without calculating the arithmetic mean and standard deviation of the sample. It also remains effective when multiple two-sided noise values are present, provided that they are not extremely abnormal (Chen, 1992). Dixon's test for noise detection
is conducted as follows.

(1) The n detected values are sorted in ascending order: $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$.

(2) According to the formulas in Table 1 (Standardization Administration of the People's Republic of China, 2008) (only the cases of n ≤ 10 are listed), the high-side and low-side test statistics are calculated based on the value of n.

(3) The significance level α is chosen, and the threshold D is looked up in Table 1 according to n and α.

(4) High-side noise is tested: if the high-side statistic exceeds D, $x_{(n)}$ is considered noise.

(5) Low-side noise is tested: if the low-side statistic exceeds D, $x_{(1)}$ is judged as noise.

(6) If no noise is detected in steps (4) and (5), no outlier exists in the detected values.

(7) The detected noise is removed or corrected. Noise detection then continues on the remaining sample until no more noise can be detected.

Table 1 Threshold of Dixon's test

Based on statistical analyses of multi-year meteorological observations for the same period, the NDVI data at the same location (i.e., the same pixel) for the same period over many consecutive years are assumed to obey a normal distribution and to be statistically significant. The statistical method avoids excessive dependence on prior knowledge (such as vegetation cover types
and research experience) at the preprocessing stage. Meanwhile, reasonable noise detection results can be obtained by using the QA data. The main technical steps of the Dixon's test procedure are as follows (Fig. 2).

(1) Obtain the NDVI data of the same pixel and the same band of each year for 10 years and determine the noise detection sample.

(2) Detect noise in the test sample composed of the 10 NDVI values by using Dixon's test (the judging criteria are shown in Table 1).

(3) If no noise is detected, move on to the next pixel; otherwise, check the detected value against the QA data, since possible outliers may not be noise points: they may reflect real changes in land cover caused by sudden natural disasters (such as forest fires).

(4) If the QA value is less than or equal to 8, the NDVI value is subject to little interference, and the point is marked as a change point of land cover type. Such an abnormal NDVI value may be caused by unexpected factors such as natural disasters, adverse weather, man-made disasters, and vegetation changes, and it needs further verification against local meteorological, phenological, and disaster records.

(5) The detected outlier is removed or corrected using the average value of data
in the other years, excluding the abnormal year. The above steps are repeated for the next pixel until all pixels in all periods have been checked.

Fig. 2 Main technical processes

2.2.2 Reconstruction methods for NDVI time series data

The CW and SG filtering methods are applicable to various vegetation types and give good results as data reconstruction methods. In this study, noise in the NDVI time series data is detected with Dixon's test for different vegetation types, and the data are then reconstructed using the CW and SG filtering methods. The reconstruction results with and without Dixon's
test are compared.

The CW filtering method was proposed by Zhu, et al. (2012). Its core algorithm has two steps. First, the maximum and minimum values of the NDVI time series in each vegetation phenological cycle are determined with a mathematical morphology algorithm to constrain the basic shape of the NDVI time series curve. Then, the NDVI data are smoothed with a three-point changing weight filter: with each iteration, the weight of the target point is gradually increased and the weights of its two adjacent points are gradually reduced, so that the noise is gradually reduced. In this way, noise is removed from the NDVI time series while the data are reconstructed based on the characteristics of vegetation phenology.

The SG filtering method was first proposed by S