Analyzing 911 Response Data Using Regression

This tutorial demonstrates how regression analysis has been implemented in ArcGIS and explores some of the special considerations you'll want to think about whenever you use regression with spatial data. Regression analysis allows you to model, examine, and explore spatial relationships, to better understand the factors behind observed spatial patterns, and to predict outcomes based on that understanding. Ordinary Least Squares (OLS) regression is a global regression method. Geographically Weighted Regression (GWR) is a local, spatial regression method that allows the relationships you are modeling to vary across the study area. Both tools are located in the Spatial Statistics Tools > Modeling Spatial Relationships toolset.

Before executing the tools and examining the results, let's review some terminology:

- Dependent variable (Y): what you are trying to model or predict (residential burglary incidents, for example).
- Explanatory variables (X): variables you believe influence or help explain the dependent variable (such as income, the number of vandalism incidents, or households).
- Coefficients (β): values, computed by the regression tool, reflecting the relationship and strength of each explanatory variable to the dependent variable.
- Residuals (ε): the portion of the dependent variable that isn't explained by the model; the model under- and over-predictions.

The sign (+/-) associated with each coefficient (one for each explanatory variable) tells you whether the relationship is positive or negative. If you were modeling residential burglary and obtained a negative coefficient for the Income variable, for example, it would mean that as median incomes in a neighborhood go up, the number of residential burglaries goes down.
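For reference (the tutorial does not write the equation out), an OLS model with the terms defined above has the general form:

    Y = β0 + β1*X1 + β2*X2 + ... + βn*Xn + ε

where β0 is the intercept, each βi measures how much Y changes when Xi increases by one unit with the other variables held constant, and ε is the residual the model cannot explain. A negative β for Income in the burglary example is exactly the situation described above: higher income, fewer predicted burglaries.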

Output from regression analysis can be a little overwhelming at first. It includes diagnostics and model performance indicators. All of these numbers should seem much less daunting once you complete the tutorial below.

Important notes:
1. The steps in this tutorial assume the data is stored at C:\SpatialStats. If a different location is used, substitute "C:\SpatialStats" with the alternate location when entering data and environment paths.
2. This tutorial was developed using ArcGIS 10.0. If you are using a different version of the software, the screenshots and how you access results may be a bit different.

Tutorial (estimated time: 1.5 hours)

Introduction: In order to demonstrate how the regression tools work, you will be doing an analysis of 911 emergency call data for a portion of the Portland, Oregon metropolitan area. Suppose we have a community that is spending a large portion of its public resources responding to 911 emergency calls. Projections are telling them that their community's population is going to double in size over the next 10 years. If they can better understand some of the factors contributing to high call volumes now, perhaps they can implement strategies to help reduce 911 calls in the future.

Step 1: Getting Started

Open C:\SpatialStats\RegressionExercise\RegresssionAnalysis911Calls.mxd (the path may be different on your machine). In this map document you will notice several data frames containing layers of data for the Portland, Oregon metropolitan study area. Ensure that the Hot Spot Analysis data frame is active. In the map, each point represents a single call into a 911 emergency call center. This is real data representing over 2,000 calls.

Step 2: Examine Hot Spot Analysis results

Expand the data frame and click the + sign to the right of the Hot Spot Analysis group layer. Ensure that the Response Stations layer is checked on. Results from running the Hot Spot Analysis tool show us where the community is getting lots of 911 calls. We can use these results to assess whether or not the stations (fire/police/emergency medical) are optimally located. Areas with high call volumes are shown in red (hot spots); areas getting very few calls are shown in blue (cold spots). The green crosses are the existing locations of the police and fire units tasked with responding to these 911 calls. Notice that the two stations to the right of the map appear to be located right over, or very near, call hot spots. The station in the lower left, however, is actually located over a cold spot; we may want to investigate whether this station is in the best place possible. The community can use hot spot analysis to decide whether adding new stations or relocating existing stations might improve 911 call response.
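The hot spot layer in this data frame was created for you ahead of time. For reference, a comparable result could be produced with the minimal arcpy sketch below; this is a sketch under assumptions: it runs on the aggregated ObsData911Calls tracts (introduced in Step 3) and their Calls field rather than on the raw call points, and the output path is made up.

    import arcpy

    # Getis-Ord Gi* hot spot analysis of 911 call counts per census tract.
    # High positive z-scores correspond to the red hot spots, low negative
    # z-scores to the blue cold spots.
    arcpy.HotSpots_stats(
        "ObsData911Calls",                 # layer in the map, or a full path
        "Calls",                           # analysis field: 911 call count
        r"C:\SpatialStats\RegressionExercise\Outputs\Calls_HotSpots.shp",
        "FIXED_DISTANCE_BAND",             # conceptualization of spatial relationships
        "EUCLIDEAN_DISTANCE",              # distance method
        "NONE")                            # no standardization of spatial weights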

Step 3: Exploring OLS Regression

The next question our community is probably asking is, "Why are call volumes so high in those hot spot areas?" and "What are the factors that contribute to high volumes of 911 calls?" To help answer these questions, we'll use the regression tools in ArcGIS.

Activate the Regression Analysis data frame by right-clicking it and choosing Activate. Expand the Spatial Statistics Tools toolbox. Right-click in an open space in ArcToolbox and set your environment settings. Disable background processing (Geoprocessing > Geoprocessing Options). With ArcGIS 10, geoprocessing tools can run in the background, and all results are available through the Results window. By disabling background processing, we will see tool results in a progress window; this is often best when you are using the regression tools.

In the data frame, check off the Data911Calls layer. Instead of looking at individual 911 calls as points, we have aggregated the calls to census tracts and now have a count variable (Calls) representing the number of calls in each tract. Right-click the ObsData911Calls layer and choose Open Attribute Table. The reason we are using census tract level data is that it gives us access to a rich set of variables that might help explain 911 call volumes. Notice that the table has fields such as educational status (LowEd), unemployment levels (Unemploy), and so on. When you are done exploring the fields, close the table.

Can you think of anything (any variable) that might help explain the call volume pattern we see in the hot spot map? What about population? Would we expect more calls in places with more people? Let's test the hypothesis that call volume is simply a function of population. If it is, our community can use Census population projections to estimate future 911 emergency call volumes.

Run the OLS tool with the following parameters (note: once the tool starts running, make sure the "Close this dialog when completed successfully" box is NOT checked):

- Input Feature Class: ObsData911Calls
- Unique ID Field: UniqID
- Output Feature Class: C:\SpatialStats\RegressionExercise\Outputs\OLS911Calls.shp
- Dependent Variable: Calls
- Explanatory Variables: Pop
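If you prefer to script this run from the Python window instead of the tool dialog, a minimal arcpy sketch might look like the following; the tool name and parameter order follow the ArcGIS 10 Spatial Statistics toolbox, and the layer name, field names, and output path simply mirror the dialog settings above, so adjust them if your data lives elsewhere.

    import arcpy

    # Allow the output shapefile to be overwritten if the tool is re-run.
    arcpy.env.overwriteOutput = True

    # OLS: model 911 call counts (Calls) with population (Pop) as the only
    # explanatory variable. Diagnostics are written to the geoprocessing messages.
    arcpy.OrdinaryLeastSquares_stats(
        "ObsData911Calls",                 # input feature class (layer in the map)
        "UniqID",                          # unique ID field
        r"C:\SpatialStats\RegressionExercise\Outputs\OLS911Calls.shp",
        "Calls",                           # dependent variable
        "Pop")                             # explanatory variable(s)
    print(arcpy.GetMessages())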

Move the progress window to the side so you can examine the OLS911Calls layer in the table of contents. The OLS default output is a map showing how well the model performed using only the population variable to explain 911 call volumes. The red areas are under-predictions (where the actual number of calls is higher than the model predicted); the blue areas are over-predictions (actual call volumes are lower than predicted). When a model is performing well, the over/under-predictions reflect random noise: the model is a little high here and a little low there, and you don't see any structure at all in the over/under-predictions. Do the over- and under-predictions in the output feature class appear to be random noise, or do you see clustering? When the over- (blue) and under- (red) predictions cluster together spatially, you know that your model is missing one or more key explanatory variables.

The OLS tool also produces a lot of numeric output. Expand and enlarge the progress window so you can read this output more clearly. Notice that the Adjusted R-Squared value is 0.393460, or about 39%. This indicates that, using population alone, the model explains 39% of the call volume story. So, looking back at our original hypothesis, is call volume simply a function of population? Might our community be able to predict future 911 call volumes from population projections alone? Probably not; if the relationship between population and 911 call volumes had been stronger, say 80%, our community might not need regression at all. But with only 39% of the story, it seems other factors, and other variables, are needed to effectively model 911 calls.
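For reference (the formula is standard regression output rather than something the tutorial derives), the Adjusted R-Squared is the ordinary R-Squared penalized for the number of explanatory variables:

    Adjusted R-Squared = 1 - (1 - R-Squared) * (n - 1) / (n - k - 1)

where n is the number of features (census tracts here) and k is the number of explanatory variables. The penalty is what makes it fair to compare this one-variable model with the four-variable model in Step 5.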

The next question that follows is: what are these other variables? This, actually, is the hardest part of the regression model-building process: finding all of the key variables that explain what we are trying to model. Close the progress window.

Step 4: Finding key variables

The scatterplot matrix graph can help us here by allowing us to examine the relationships between call volumes and a variety of other variables. We might guess, for example, that the number of apartment complexes, unemployment rates, income, or education are also important predictors of 911 call volumes. Experiment with the scatterplot matrix graph to explore the relationships between call volumes and other candidate explanatory variables. If you enter the Calls variable either first or last, it will appear as either the bottom row or the first column in the matrix. Once you finish creating the scatterplot matrix, select features in the focus graph and notice how those features are highlighted in each scatterplot and on the map.

Step 5: A properly specified model

Now let's try a model with four explanatory variables: Pop, Jobs, LowEduc, and Dst2UrbCen.

The explanatory variables in this model were found by using the scatterplot matrix and trying a number of candidate models; finding a properly specified OLS model is often an iterative process. Run OLS with the following parameters:

- Input Feature Class: Analysis\ObsData911Calls
- Unique ID Field: UniqID
- Output Feature Class: C:\SpatialStats\RegressionExercise\Outputs\Data911CallsOLS.shp
- Dependent Variable: Calls
- Explanatory Variables: Pop;Jobs;LowEduc;Dst2UrbCen
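This run can be scripted exactly like the sketch in Step 3; under the same assumptions, only the output path and the explanatory-variable string change:

    arcpy.OrdinaryLeastSquares_stats(
        "ObsData911Calls", "UniqID",
        r"C:\SpatialStats\RegressionExercise\Outputs\Data911CallsOLS.shp",
        "Calls", "Pop;Jobs;LowEduc;Dst2UrbCen")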

Notice that the Adjusted R-Squared value is much higher for this new model, 0.831080, indicating that this model explains about 83% of the 911 call volume story. This is a big improvement over the model that only used population. Close the progress window.

Notice, too, that the residuals (the model over/under-predictions) appear to be less clustered than they were using only the population variable. We can check whether or not the residuals exhibit a random spatial pattern using the Spatial Autocorrelation tool. Run the Spatial Autocorrelation tool (in the Analyzing Patterns toolset) with the following parameters:

- Input Feature Class: Data911CallsOLS
- Input Field: StdResid
- Generate Report: checked ON
- Conceptualization of Spatial Relationships: Inverse Distance
- Distance Method: Euclidean Distance
- Standardization: Row (with polygons you will almost always want to row standardize)
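The same check can be scripted; a minimal arcpy sketch, assuming the ArcGIS 10 tool name and keyword values and the OLS output created above, is:

    import arcpy

    # Global Moran's I on the standardized OLS residuals. A z-score that is
    # not statistically significant suggests the residuals are spatially random.
    arcpy.SpatialAutocorrelation_stats(
        "Data911CallsOLS",                 # OLS output layer (or its full path)
        "StdResid",                        # standardized residuals field
        "GENERATE_REPORT",                 # write the HTML report
        "INVERSE_DISTANCE",                # conceptualization of spatial relationships
        "EUCLIDEAN_DISTANCE",              # distance method
        "ROW")                             # row standardization
    print(arcpy.GetMessages())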

Close the progress window, then open the Results window and expand the entry for Spatial Autocorrelation (if you don't see the Results window, select Geoprocessing from the menu, then Results). Double-click the HTML Report File. Results from running the Spatial Autocorrelation tool on the regression residuals indicate that they are randomly distributed; the z-score is not statistically significant, so we fail to reject the null hypothesis of complete spatial randomness. This is good news! Anytime there is structure (clustering or dispersion) in the under/over-predictions, it means that your model is still missing key explanatory variables and you cannot trust your results. When you run the Spatial Autocorrelation tool on the model residuals and find a random spatial pattern (as we did here), you are on your way to a properly specified model.

Step 6: The 6 things you gotta check!

There are six things you need to check before you can be sure you have a properly specified model (a model you can trust).

1. First, check that each coefficient has the "expected" sign. A positive coefficient means the relationship is positive; a negative coefficient means the relationship is negative. Notice that the coefficient for the Pop variable is positive. This means that as the number of people goes up, the number of 911 calls also goes up, which is what we expect. If the coefficient for the Population variable were negative, we would not trust our model. Checking the other coefficients, their signs also seem reasonable. Self check: the sign for Jobs (the number of job positions in a tract) is positive; this means that as the number of jobs goes up, the number of 911 calls also goes (?).

2. Next, check for redundancy among your explanatory variables. If the VIF (variance inflation factor) value for any of your variables is larger than about 7.5 (smaller is definitely better), it means you have one or more variables telling the same story. This leads to an over-count type of bias. You should remove the variables associated with large VIF values one by one until none of your variables have large VIF values. Self check: Which variable has the highest VIF value?
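For reference (the tutorial does not spell this out), the VIF for an explanatory variable comes from regressing that variable on all of the other explanatory variables:

    VIF = 1 / (1 - R-Squared of that auxiliary regression)

so a VIF above 7.5 means the other variables already reproduce roughly 87% or more of that variable's variation, which is why it adds little information while inflating coefficient variances.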

3. Next, check to see that all of the explanatory variables have statistically significant coefficients. Two columns, Probability and Robust Probability (Robust_Pr), measure coefficient statistical significance. An asterisk next to the probability tells you the coefficient is significant. If a variable is not significant, it is not helping the model, and unless theory tells us that a particular variable is critical, we should remove it. When the Koenker (BP) statistic is statistically significant, you can only trust the Robust Probability column to determine whether a coefficient is significant or not. Small probabilities are "better" (more significant) than large probabilities. Self check: Which variables have the "best" statistical significance? Did you consult the Probability or Robust_Pr column? Why? (Note: an asterisk indicates statistical significance. A scripted way to read these columns is sketched below.)
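If you want to read these columns outside the progress window, the OLS tool can also write them to a coefficient output table. The sketch below is a hypothetical way to do that with arcpy: the table path is made up, the optional Coefficient Output Table parameter is assumed to be available on the 10.0 tool, and field names are discovered rather than hard-coded.

    import arcpy

    arcpy.env.overwriteOutput = True
    coef_tab = r"C:\SpatialStats\RegressionExercise\Outputs\OLS911Coef.dbf"

    # Re-run the four-variable model, asking OLS to also write the per-variable
    # diagnostics (coefficient, probability, robust probability, VIF) to a table.
    arcpy.OrdinaryLeastSquares_stats(
        "ObsData911Calls", "UniqID",
        r"C:\SpatialStats\RegressionExercise\Outputs\Data911CallsOLS.shp",
        "Calls", "Pop;Jobs;LowEduc;Dst2UrbCen",
        coef_tab)

    # Print every field of every row in the coefficient table.
    names = [f.name for f in arcpy.ListFields(coef_tab)]
    for row in arcpy.SearchCursor(coef_tab):
        print([row.getValue(n) for n in names])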

4. Make sure the Jarque-Bera test is NOT statistically significant. Three checks down; you're halfway there! The residuals (over/under-predictions) from a properly specified model will reflect random noise. Random noise has a random spatial pattern (no clustering of over/under-predictions). It also has a normal histogram if you plot the residuals. The Jarque-Bera test measures whether or not the residuals from a regression model are normally distributed (think bell curve). This is the one test you do NOT want to be statistically significant! When it IS statistically significant, your model is biased, which often means you are missing one or more key explanatory variables. Self check: How do you know that the Jarque-Bera statistic is NOT statistically significant in this case?

5. Next, you want to check model performance: the Adjusted R-Squared value tells you how much of the 911 call volume story the model explains (0.831080, or about 83%, for this model).
