Transportation: Refreshing Warehouse Data

Overview

Objectives
After completing this lesson, you should be able to do the following:
  - Describe methods for capturing changed data
  - Explain techniques for applying the changes
  - Discuss techniques for purging and archiving data
  - Outline final tasks, such as publishing the data, controlling access, and automating processes
  - List tools for transporting data into the warehouse

Developing a Refresh Strategy for Capturing Changed Data
  - Consider load window
  - Identify data volumes
  - Identify cycle
  - Know the technical infrastructure
  - Plan a staging area
  - Determine how to detect changes
  [Figure labels: T1, T2, T3; Operational databases]

User Requirements and Assistance
  - Users define the refresh cycle
  - IT balances requirements against technical issues
  - Document all tasks and processes
  - Employ user skills
  [Figure labels: T1, T2, T3; Operational databases]

Load Window
  - Time available for entire ETT process
  - Plan
  - Test
  - Prove
  - Monitor
  [Figure: 24-hour timeline (0, 3 am, 6, 9, 12 pm, 3, 6, 9, 12) showing User Access Period and Load Window]

Load Window
  - Plan and build processes according to a strategy.
  - Consider volumes of data.
  - Identify technical infrastructure.
  - Ensure currency of data.
  - Consider user access requirements first.
  - High availability requirements may mean a small load window.
  [Figure: 24-hour timeline (0, 3 am, 6, 9, 12 pm, 3, 6, 9, 12) showing User Access Period]

Scheduling the Load Window
  [Timeline: 0 to 3 am; steps numbered 1 to 4]
  - Receive data (File 1, File 2) by FTP
  - Control process
  - Open and read files to verify and analyze
  - Requirements
  - Load cycle
  Control file contents:
  - File names
  - File types
  - Number of files
  - Number of loads
  - First-time load or refresh
  - Date of file
  - Date range
  - Records in file (counts)
  - Totals (amounts)

Scheduling the Load Window
  [Timeline: 3 am to 9 am; steps numbered 5 to 9]
  - Load into warehouse (File 1, File 2)
  - Verify, analyze, reapply
  - Create summaries
  - Index data
  - Update metadata
  - Parallel load

Scheduling the Load Window
  [Timeline: 6 am to 9 am; steps numbered 10 to 13]
  - Create views for specialized tools
  - Back up warehouse
  - Users access summary data
  - Publish
  - User access

Capturing Changed Data for Refresh
  - Capture new fact data
  - Capture changed dimension data
  - Determine method for capture of each
  Methods:
  - Wholesale data replacement
  - Comparison of database instances
  - Time stamping
  - Database triggers
  - Database log
  - Hybrid techniques

Wholesale Data Replacement
  - Expensive
  - Limited historical data, if any
  - Data mart implementations
  - Time period replacement

Comparison of Database Instances
  - Simple to perform, but expensive in time and processing
  - Delta file:
      - Changes to operational data since last refresh
      - Used by various techniques
  [Figure: Yesterday's operational database and today's operational database; Database comparison; Delta file holds changed data]

Time and Date Stamping
  - Fast scanning for records changed since last extraction
  - Date Updated field
  - No detection of deleted data
  [Figure: Operational data; Delta file holds changed data]

Database Triggers
  - Changed data intercepted at the server level
  - Extra I/O required
  - Maintenance overhead
  [Figure: Operational server (DBMS); Triggers on server; Operational data; Delta file holds changed data]

Using a Database Log
  - Contains before and after images
  - Requires system checkpoint
  - Common technique
  [Figure: Operational server (DBMS); Log; Log analysis and data extraction]

Verdict
  - Consider each method on merit.
  - Consider a hybrid approach if one approach is not suitable.
  - Consider current technical, existing operational, and current application issues.

Applying the Changes to Data
You have a choice of techniques:
  - Overwrite a record
  - Add a record
  - Add a field
  - Maintain history
  - Add version numbers

Overwriting a Record
  - Easy to implement
  - Loses all history
  - Not recommended
  [Example: the record "Customer Id, John Doe, Single" becomes "Customer Id, John Doe, Married"]

Adding a New Record
  - History is preserved; dimensions grow.
  - Time constraints are not required.
  - Generalized key is created.
  - Metadata tracks usage of keys.
  [Example: the record "1, Customer Id, John Doe, Single"]

Adding a Current Field
  - Maintains some history
  - Loses intermediate values
  - Is enhanced by adding an Effective Date field
  [Example: the record "Customer Id, John Doe, Single" becomes "Customer Id, John Doe, Single, Married, 01-JAN-96"]

Limitations of Methods for Applying Changes
  - Complete history impossible
  - Dimensions may grow large
  - Maintenance overhead

Maintaining History
  - One-to-many relationship
  - Always retain current record
  - Consistently able to refer to record history
  [Figure: Product, Time, Sales, HIST_CUST, CUSTOMER]

History Preserved
  - History enables realistic analysis.
  - History retains context of data.
  - History provides for realistic historical analysis.
  Model must be able to:
  - Reflect business changes
  - Maintain context between fact and dimension data
  - Retain sufficient data to relate old to new

Version Numbering
  - Avoid double counting
  - Facts hold version number

  Customer
    CustId   Version   Customer Name
    1234     1         Comer
    1234     2         Comer

  Sales
    CustId   Version   Sales Facts
    1234     1         11,000
    1234     2         12,000

  [Figure: Customer, Sales, Product, Time]

Purging and Archiving Data
  - As data ages, its value depreciates.
  - Remove old data from the warehouse:
      - Archive for later use
      - Purge without copy

Techniques for Purging Data
  - TRUNCATE: Retains no rollback
  - DELETE: Retains redo and rollback
  - ALTER TABLE: Removes a partition
  - PL/SQL: Uses database triggers

Techniques for Archiving Data
  - Export to dump file from tables
  - Import to tables from dump file
  - ALTER TABLE EXCHANGE partitions
  [Figure: EXP, .dmp, IMP]

Verdict
  - Defined by business requirements
  - Must be managed

Final Tasks
  - Update metadata
      - ETT
      - User
  - Publish data
      - Availability
      - Changes
      - Subject area basis
  - Use database roles to prevent and allow access
  [Figure: Sources, Extract, Stage, Transform, Rules, Load, Publish, Query]

Publishing Data
  - Control access using database roles
  - 24-hour operation may be requested
  - Compromise between load and access
  - Consider:
      - Staggering updates
      - Using temporary tables
      - Using separate tables

ETT Tool Selection Criteria
  - Overlap with existing tools
  - Availability of meta model
  - Supported data sources
  - Ease of modification and maintenance
  - Required fine tuning of code
  - Ease of change control
  - Power of transformation logic
  - Level of modularization
  - Power of error, exception, resubmission features
  - Intuitive documentation
  - Performance of code

ETT Tool Selection Criteria
  - Activity scheduling and sophistication
  - Metadata generation
  - Learning curve
  - Flexibility
  - Supported operating systems
  - Cost

Transportation Tools
  - Informatica OpenBridge
  - Oracle SQL*Loader
  - Gateways
  - PL/SQL
  - Precompilers
  - Platinum Technology InfoPump
  - Platinum Info Transport

Replication Server Utilities
  - Oracle Symmetric and Heterogeneous Replication

Gateways and Middleware
  - Brio Technology DataPrism
  - Informatica Corporation OpenBridge
  - Information Builders EDA/SQL
  - Oracle Gateways
  - Platinum Technology InfoHub
  - Prism Prism Manager
  - Software AG Entire Transaction Propagator

Summary
This lesson discussed the following topics:
  - Capturing changed data
  - Applying the changes
  - Purging and archiving data
  - Publishing the data, controlling access, and automating processes
  - Identifying tools for transporting data into the warehouse

Practice 13-1 Overview
This practice covers the following topics:
  - Identifying a series of statements as true or false
  - Answering a series of questions
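
Worked sketch: Comparison of Database Instances
The comparison method from the lesson (yesterday's operational database against today's, producing a delta file) might look roughly like the following. This is a minimal illustration only: snapshots are modeled as in-memory dictionaries keyed by a primary key, and the function name `compute_delta`, the field names, and the sample rows are assumptions, not part of the course material.

```python
def compute_delta(yesterday, today):
    """Compare two snapshots (dicts keyed by primary key) and list the changes."""
    delta = []
    # Rows present today: either new or possibly changed.
    for key, row in today.items():
        if key not in yesterday:
            delta.append(("insert", key, row))
        elif yesterday[key] != row:
            delta.append(("update", key, row))
    # Rows that existed yesterday but are gone today were deleted.
    for key, row in yesterday.items():
        if key not in today:
            delta.append(("delete", key, row))
    return delta


# Hypothetical sample data, for illustration only.
yesterday = {
    1: {"name": "John Doe", "status": "Single"},
    2: {"name": "Jane Roe", "status": "Married"},
}
today = {
    1: {"name": "John Doe", "status": "Married"},
    3: {"name": "Sam Poe", "status": "Single"},
}

delta = compute_delta(yesterday, today)
for change in delta:
    print(change)
```

Every row of both snapshots is visited, which reflects the "simple to perform, but expensive in time and processing" trade-off noted in the lesson.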
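
Worked sketch: Time and Date Stamping
The time and date stamping method scans for records whose Date Updated field is later than the last extraction. The sketch below illustrates that idea and also why deleted data goes undetected. The function `extract_changed`, the `date_updated` field name, and the sample rows are assumed for illustration.

```python
from datetime import datetime


def extract_changed(rows, last_extraction):
    """Return the rows whose date_updated is later than the last extraction."""
    return [row for row in rows if row["date_updated"] > last_extraction]


last_extraction = datetime(1996, 1, 1)

# Hypothetical operational table after some activity. A third row (id 3)
# was deleted since the last extraction, so it no longer appears here.
operational_rows = [
    {"id": 1, "status": "Married", "date_updated": datetime(1996, 1, 2)},
    {"id": 2, "status": "Single", "date_updated": datetime(1995, 12, 20)},
]

changed = extract_changed(operational_rows, last_extraction)
changed_ids = [row["id"] for row in changed]
print(changed_ids)
```

The scan only sees rows that still exist, so the removal of row 3 leaves no trace in the result. Catching deletions would need one of the other methods in the lesson, such as triggers or log analysis.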
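
Worked sketch: Overwriting a Record versus Adding a New Record
The lesson's John Doe example (Single, then Married) can be used to contrast two of the techniques for applying changes. The code below is a rough sketch under stated assumptions: the dimension is a plain list of dictionaries, the customer identifier "C100" is a made-up value, and the generalized key is modeled as a simple incrementing integer. None of these details come from the course material.

```python
def overwrite_record(dimension, customer_id, attributes):
    """Overwrite in place: easy to implement, but the old values are lost."""
    for record in dimension:
        if record["customer_id"] == customer_id:
            record.update(attributes)
            return record
    raise KeyError(customer_id)


def add_new_record(dimension, customer_id, attributes):
    """Add a new row under a fresh generalized key, keeping the old row."""
    next_key = max((r["key"] for r in dimension), default=0) + 1
    record = {"key": next_key, "customer_id": customer_id, **attributes}
    dimension.append(record)
    return record


# Overwriting: one row before, one row after, history gone.
overwritten = [{"key": 1, "customer_id": "C100", "name": "John Doe", "status": "Single"}]
overwrite_record(overwritten, "C100", {"status": "Married"})

# Adding a record: the dimension grows, but both states remain available.
versioned = [{"key": 1, "customer_id": "C100", "name": "John Doe", "status": "Single"}]
add_new_record(versioned, "C100", {"name": "John Doe", "status": "Married"})

print(overwritten)
print(versioned)
```

Facts loaded after the change would reference generalized key 2, while older facts keep pointing at key 1, which is how history stays attached to the right version of the customer.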
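
Worked sketch: Version Numbering
The Version Numbering slide says facts hold the version number to avoid double counting. Using the slide's own figures (customer 1234, versions 1 and 2, sales of 11,000 and 12,000), the sketch below shows what goes wrong when the join ignores the version. The function `total_sales` and the dictionary layout are illustrative assumptions.

```python
customers = [
    {"cust_id": 1234, "version": 1, "name": "Comer"},
    {"cust_id": 1234, "version": 2, "name": "Comer"},
]
sales = [
    {"cust_id": 1234, "version": 1, "amount": 11000},
    {"cust_id": 1234, "version": 2, "amount": 12000},
]


def total_sales(customers, sales, use_version):
    """Sum sales over a nested-loop join of facts to the customer dimension."""
    total = 0
    for sale in sales:
        for cust in customers:
            if sale["cust_id"] != cust["cust_id"]:
                continue
            # With version numbering, each fact matches exactly one dimension row.
            if use_version and sale["version"] != cust["version"]:
                continue
            total += sale["amount"]
    return total


naive_total = total_sales(customers, sales, use_version=False)
versioned_total = total_sales(customers, sales, use_version=True)
print(naive_total, versioned_total)
```

Joining on CustId alone matches every fact to both customer versions and doubles the total. Joining on CustId and Version together gives the true sum of 23,000.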