Technical Strategies for Big Data Problems.ppt

Uploaded by: 杨桃文库

Document ID: 4700324

Upload date: 2019-01-07

Format: PPT

Pages: 17

Size: 146.50 KB


Resource description:

Big Data Management and Data Quality: Countermeasures in the U.S. Financial Industry
汪时奇 (Wang Shiqi, Ph.D.)
Processing speed, capacity limits, data quality

Overview
- Data = information (not merely a collection of numbers)
- Data science (approx.) = information science
- Why study big data?
  - Because prices of related products (hard disks, memory, CPUs, etc.) are falling exponentially
  - Because of the information explosion
  - Because big data creates many new problems
- Big data research is multidisciplinary (IT, DM, BI, BA, ...)
- Industry countermeasures to big data problems (see below)

1. Database strategies
- 1.1 Database (DB) performance
- 1.2 DB space

1.1 DB performance
- Auditing 2 tables: a small active )

1.2 DB space
- Space arrangement for even distribution (e.g. 1 huge table uses a few data files)
- Cleaning procedure with defragmentation
- Partition design with a cleaning plan

2. Applications (software) (Java example)
Using a high-level language (e.g. Java or C#)
- 2.1 Memory
- 2.2 Disk/network space
- 2.3 Performance
- 2.4 Maintainability

2.1 Memory
- Minimize the creation and coexistence of big objects
- GC (Garbage Collection) or null big objects once out of scope
- Choose an appropriate GC type
- gc()
- Try to split one big object into small objects
- Use a mutable class for frequently changed big objects (e.g. StringBuilder instead of String)

2.2 Disk/network space
- Smart clean and archive processes
  - e.g. archive zipped old or unused files to low-speed network space, and delete very old files from that space
- Smart logging settings
  - e.g. log4j size rolling
  - e.g. avoid duplicated or trivial logging info
- Monitor space usage

2.3 Performance
- Avoid redundant processing (in big loops)
- Maximize reuse
- Multi-threading
- DB accessing
- Logging: avoid slow options (e.g. line #)

2.4 Maintainability
- SOA principles: loose coupling, reusability, granularity, modularity, composability, componentization, interoperability, ...
- JEE patterns (DAO, DTO, Biz Delegation, ...)
- Design patterns (23) and MVC: creational, structural, behavioral (e.g. Visitor)
- OOP principles: abstraction, encapsulation, polymorphism, Open/Closed, ...

3. Data quality control
- 3.1 Business
- 3.2 Process
  - A. Failover & DR (Disaster Recovery)
  - B. QA (Quality Assurance) (see for details)
  - C. UAT (User Acceptance Test)
- 3.3 Technology

3.1 Business
- A. Reduce manual work; increase automation
- B. Complete the approval system for manual work (e.g. 1 level = 2 levels or 3 levels approval)
- C. Extend viewpoints to confirm data quality
- D. Reduce redundant systems (e.g. due to mergers, due to vendors)
- E. Schedule Cleansing (see details)
- F. Enhance Reconciliation (see details)
- G. Build Trust level (see details)
- H. Try to cover all rare cases

3.1.E Cleansing
- When
  - At system merge
  - At major change
- How
  - Develop detection applications
  - Deliver mismatch reports to IT & business
  - Find solutions on both the IT and business sides

3.1.F Reconciliation
- Where
  - 1+ subsystems have data for the same contents
  - 1+ subsystems have independent data change functionality
- What
  - Run & improve the recon. app. routinely
  - Categorize reports by urgency
  - Analyze reports
  - Debug, or adjust the biz rule, or apply Cleansing

3.1.G Trust level
- When
  - At 1+ fixed data inputs
  - Inputs are independent
  - Must decide final details from the inputs
- How (based on)
  - Provider level (for a detailed data group)
  - Data history
- Samples: Bloomberg, Reuters, Telekurs, DTCC, ...; Moody's, S&P, Fitch

3.2.A Failover & DR
- Failover
  - DB: 2+ at different locations; real-time replication
  - App
    - Active-Active: cluster with load balancing
    - Active-Passive: Auto (via SAN); Manual + Auto
- DR
  - DB: e.g. daily, hourly, or real-time replication
  - App: manual switch

3.3 Technology
- DB design
  - Constraint check (for sensitive table values)
  - Normalization (to reduce duplication)
  - Validation processes (to find conflicting data)
- Application design
  - Data integrity check
    - e.g. cryptographic signature
    - e.g. CRC check
  - Data display (e.g. Excel missing leading 0, date=num)
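
The advice in 2.1 Memory to prefer a mutable class such as StringBuilder over String for frequently changed values can be illustrated with a minimal sketch. The class and method names below (MutableBufferSketch, joinWithString, joinWithBuilder) are hypothetical and do not come from the slides; the sketch only assumes the standard java.lang.StringBuilder API.

```java
// Hypothetical illustration of "use a mutable class for frequently changed big objects".
final class MutableBufferSketch {

    // Immutable approach: every += creates a new String and copies the
    // characters accumulated so far, so many short-lived objects coexist.
    static String joinWithString(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i + ",";
        }
        return s;
    }

    // Mutable approach: one StringBuilder grows in place and produces
    // a single String at the end.
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Both approaches build the same text; only the allocation pattern differs.
        System.out.println(joinWithBuilder(5));
        System.out.println(joinWithString(5).equals(joinWithBuilder(5)));
    }
}
```

Both methods return identical text, so the choice affects only memory churn and run time. How much it matters depends on the loop size and the JVM, so measuring in the real workload is advisable.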
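
The reconciliation step in 3.1.F (comparing subsystems that hold data for the same contents and producing mismatch reports) might look roughly like the sketch below. Everything here is an assumption for illustration: the slides do not describe the report format, the class name ReconSketch, or the labels MISSING_IN_A, MISSING_IN_B and MISMATCH, and real subsystems would be databases rather than in-memory maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical reconciliation of two subsystems that store the same keyed records.
final class ReconSketch {

    // Compares both snapshots key by key and returns one report line per problem,
    // ordered by key. A null value is treated the same as a missing record.
    static List<String> reconcile(Map<String, String> a, Map<String, String> b) {
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());

        List<String> report = new ArrayList<>();
        for (String key : keys) {
            String va = a.get(key);
            String vb = b.get(key);
            if (va == null && vb == null) {
                continue; // nothing to compare on either side
            } else if (va == null) {
                report.add("MISSING_IN_A " + key);
            } else if (vb == null) {
                report.add("MISSING_IN_B " + key);
            } else if (!va.equals(vb)) {
                report.add("MISMATCH " + key + ": " + va + " vs " + vb);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        Map<String, String> a = new HashMap<>();
        Map<String, String> b = new HashMap<>();
        a.put("ACC-1", "100.00");
        b.put("ACC-1", "100.00");
        a.put("ACC-2", "250.00");
        b.put("ACC-2", "205.00");
        a.put("ACC-3", "10.00");
        reconcile(a, b).forEach(System.out::println);
    }
}
```

A routine run would feed such a report into the later steps listed on the slide: categorizing by urgency, analysis, and then a bug fix, a business-rule adjustment, or cleansing.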

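The CRC check listed under application design in 3.3 Technology can be sketched with the standard java.util.zip.CRC32 class. The class and method names (CrcCheckSketch, checksum, isIntact) are hypothetical, and the slides do not say which CRC variant or transport is used, so CRC-32 over an in-memory byte array is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical integrity check: the sender stores a CRC-32 with the payload,
// and the receiver recomputes it to detect accidental corruption.
final class CrcCheckSketch {

    // Computes the CRC-32 of the payload as an unsigned 32-bit value in a long.
    static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    // Returns true when the payload still matches the checksum recorded earlier.
    static boolean isIntact(byte[] payload, long expectedCrc) {
        return checksum(payload) == expectedCrc;
    }

    public static void main(String[] args) {
        byte[] original = "ISIN=US0378331005;QTY=0100".getBytes(StandardCharsets.UTF_8);
        long recorded = checksum(original);

        byte[] corrupted = original.clone();
        corrupted[corrupted.length - 1] ^= 0x01; // flip one bit to simulate corruption

        System.out.println(isIntact(original, recorded));
        System.out.println(isIntact(corrupted, recorded));
    }
}
```

A CRC only guards against accidental changes. Where deliberate tampering is a concern, the cryptographic signature mentioned on the same slide is the more suitable check.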
Link: https://www.docduoduo.com/p-4700324.html