Data and its visualization

Professor Mikko Kolehmainen
University of Eastern Finland
Department of Environmental Science
Research Unit of Environmental Informatics

15.5.2019, Mikko Kolehmainen

What kind of data is available: an example

Meteorological data is one of the oldest sources of this kind.
Urban air quality measurements have been carried out routinely in most cities.
Together, these two sources create an excellent opportunity to monitor how natural processes and human activities mix in complex ways.
The datasets now reach many years back in time, so it is possible to use computational models for these time series.

What kind of data is needed for modeling environmental processes?

Time range:
  At least one year, because of seasonality.
  Three years is a practical minimum, because years differ from one another.
  With 5-10 years of data you should be able to construct and validate useful models.
Representativeness:
  Is the phenomenon stationary (i.e. the mean and variance do not change over time)?

The process is described through the data

One of the most important things to bear in mind is that what is actually modelled is (or should be) the process we are inspecting. The available data describe that process (phenomenon) only in some limited way, so knowledge about the process is important.

What kind of data have we used at UEF in research and teaching?

Our own real data:
  Air quality data
  Epidemiological data
  Waste collection data
  Artificial nose data
  Fox behaviour data
"Reference data":
  Boston data
  Iris data
  Gene expression data

Air quality data: 35064 x 10 hourly measurements
Columns: Date, Julianday, Time, NO2, O3, PM10, CO, Temp, Hum, WS

Epidemiological data: insulin resistance syndrome, 1650 x 44

Waste collection data

Artificial nose data from a fermentation process (9924 x 11)

Fox behaviour data (use of bones for each 5-minute window, 4 x 576 x 16)

Boston socioeconomic data

crim     Per capita crime rate by town
zn       Proportion of residential land zoned for lots over 25,000 sq.ft.
indus    Proportion of non-retail business acres per town
chas     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
nox      Nitric oxides concentration (parts per 10 million)
rm       Average number of rooms per dwelling
age      Proportion of owner-occupied units built prior to 1940
dis      Weighted distances to five Boston employment centres
rad      Index of accessibility to radial highways
tax      Full-value property-tax rate per $10,000
ptratio  Pupil-teacher ratio by town
lstat    Proportion of population of lower status
rate     Median value of owner-occupied homes in $1000s

506 x 13 = 506 data lines (rows) and 13 variables (columns)

Iris data
Columns: SepalL, SepalW, PetalL, PetalW

Gene expression data (diauxic shift, 2467 x 9 or 117)

How can we visualize the data?

Simple plotting in 1-D or 2-D:
  Box plots
  Histograms
  Probability density functions (pdf) of distributions
  Scatter plots
Multivariate visualization:
  Sammon's mapping
  Self-organizing maps (SOM)

Boxplot

Example of histogram

Histograms with lines

Probability density function

Scatterplot

Sammon's mapping example: mapping 4 dimensions to 2 dimensions
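
To make the idea of mapping 4 dimensions to 2 more concrete, the following is a minimal sketch of the quantity that Sammon's mapping tries to minimize, usually called Sammon's stress: the mismatch between pairwise distances in the original space and in the 2-D map, weighted by the original distances. It does not perform the optimization itself; it only scores a given 2-D layout. The function names and the four sample points (written in the style of the SepalL, SepalW, PetalL, PetalW columns) are assumptions made for illustration and are not taken from the slides.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points (i < j)."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def sammon_stress(high_dim, low_dim):
    """Sammon's stress of a low-dimensional layout.

    E = (1 / sum(d*_ij)) * sum((d*_ij - d_ij)^2 / d*_ij)
    where d*_ij are distances in the original space and d_ij are
    distances in the map. E = 0 means all distances are preserved.
    """
    if len(high_dim) != len(low_dim):
        raise ValueError("both layouts must contain the same number of points")
    d_high = pairwise_distances(high_dim)
    d_low = pairwise_distances(low_dim)
    if any(d == 0 for d in d_high):
        # The stress divides by d*_ij, so duplicate points must be removed first.
        raise ValueError("duplicate points in the original space")
    scale = sum(d_high)
    return sum((dh - dl) ** 2 / dh for dh, dl in zip(d_high, d_low)) / scale


# Hypothetical 4-D samples (illustrative values, not data from the slides).
samples = [
    (5.1, 3.5, 1.4, 0.2),
    (4.9, 3.0, 1.4, 0.2),
    (7.0, 3.2, 4.7, 1.4),
    (6.3, 3.3, 6.0, 2.5),
]

# Two naive 2-D "maps": keep only the first two or the last two columns.
first_two = [p[:2] for p in samples]
last_two = [p[2:] for p in samples]

print("stress, first two columns:", sammon_stress(samples, first_two))
print("stress, last two columns: ", sammon_stress(samples, last_two))
```

A full Sammon's mapping would start from some initial 2-D layout and iteratively move the points to reduce this stress; comparing the two naive layouts above simply shows how the score distinguishes a better map from a worse one.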