
大数据问题中的技术策略.ppt (Technical Strategies for Big Data Problems)

  • Uploader: 杨桃文库
  • Document ID: 4700324
  • Upload date: 2019-01-07
  • Format: PPT
  • Pages: 17
  • Size: 146.50KB
    Description:

    Big Data Management and Data Quality: Countermeasures in the U.S. Financial Industry
    汪时奇 (Ph.D.)
      • Processing speed
      • Capacity limits
      • Data quality

    Overview
      • 数据 = Data = information (not a collection of numbers)
      • Data science ≈ information science
      • Why study big data?
          • Because prices of related products (hard disks, memory, CPUs, etc.) are falling exponentially
          • Because of the information explosion
          • Because big data causes many new problems
      • Big data research is multidisciplinary (IT, DM, BI, BA, ...)
      • Industry countermeasures to big data problems (see below)

    1. Database strategies
      • 1.1 Database (DB) performance
      • 1.2 DB space

    1.1 DB performance
      • Auditing 2 tables: a small active )

    1.2 DB space
      • Space arrangement for even distribution (e.g. 1 huge table uses a few data files)
      • Cleaning procedure with defragmentation
      • Partition design with a cleaning plan

    2. Applications (software) (Java example)
    Using a high-level language (e.g. Java or C#)
      • 2.1 Memory
      • 2.2 Disk/network space
      • 2.3 Performance
      • 2.4 Maintainability

    2.1 Memory
      • Minimize the creation and coexistence of big objects
      • GC (Garbage Collection) or null big objects once out of scope
      • Choose an appropriate GC type
      • gc()
      • Try to split one big object into small objects
      • Use mutable classes for frequently changed big objects (e.g. StringBuilder instead of String)

    2.2 Disk/network space
      • Smart clean and archive processes
          • e.g. archive zipped old or unused files to low-speed network space and delete very old files from that space
      • Smart logging settings
          • e.g. log4j size rolling
          • e.g. avoid duplicated or trivial logging info
      • Monitor space usage
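    The clean-and-archive process in 2.2 could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name `ArchiveCleaner`, its method names, and the age thresholds are hypothetical and do not come from the slides, and a real deployment would point `archiveDir` at the low-speed network space.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper; names and thresholds are illustrative assumptions. */
final class ArchiveCleaner {

    /**
     * Gzips every regular file in liveDir that was last modified before
     * (now - archiveAfter) into archiveDir, then deletes the original.
     * Returns the number of files archived.
     */
    static int archiveOldFiles(Path liveDir, Path archiveDir,
                               Duration archiveAfter, Instant now) throws IOException {
        Files.createDirectories(archiveDir);
        int archived = 0;
        for (Path file : listOlderThan(liveDir, now.minus(archiveAfter))) {
            Path target = archiveDir.resolve(file.getFileName() + ".gz");
            FileTime modified = Files.getLastModifiedTime(file);
            try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
                Files.copy(file, out);
            }
            // Keep the original age so the purge step can judge it later.
            Files.setLastModifiedTime(target, modified);
            Files.delete(file);
            archived++;
        }
        return archived;
    }

    /** Deletes archived files older than (now - deleteAfter); returns the count. */
    static int purgeVeryOldFiles(Path archiveDir, Duration deleteAfter,
                                 Instant now) throws IOException {
        int deleted = 0;
        for (Path file : listOlderThan(archiveDir, now.minus(deleteAfter))) {
            Files.delete(file);
            deleted++;
        }
        return deleted;
    }

    /** Lists regular files directly inside dir whose last-modified time is before cutoff. */
    private static List<Path> listOlderThan(Path dir, Instant cutoff) throws IOException {
        List<Path> result = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                if (Files.isRegularFile(p)
                        && Files.getLastModifiedTime(p).toInstant().isBefore(cutoff)) {
                    result.add(p);
                }
            }
        }
        return result;
    }
}
```

    A scheduled job might call `archiveOldFiles(logs, archive, Duration.ofDays(7), Instant.now())` followed by `purgeVeryOldFiles(archive, Duration.ofDays(365), Instant.now())`. Passing `now` explicitly keeps both steps easy to test.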

    2.3 Performance
      • Avoid redundant processing (in big loops)
      • Maximize reuse
      • Multi-threading
      • DB accessing
      • Logging: avoid slow options (e.g. line #)

    2.4 Maintainability
      • SOA principles
          • Loose coupling, reusability, granularity, modularity, composability, componentization, interoperability, ...
      • JEE patterns (DAO, DTO, Biz Delegation, ...)
      • Design patterns (23) and MVC
          • Creational
          • Structural
          • Behavioral (e.g. Visitor)
      • OOP principles
          • Abstraction, encapsulation, polymorphism, Open/Closed, ...

    3. Data quality control
      • 3.1 Business
      • 3.2 Process
          • A. Failover & DR (Disaster Recovery)
          • B. QA (Quality Assurance) (see for details)
          • C. UAT (User Acceptance Test)
      • 3.3 Technology

    3.1 Business
      • A. Reduce manual work; increase automation
      • B. Complete approval system for manual work
          • E.g. 1 level = 2 levels or 3 levels approval
      • C. Extend view points to confirm data quality
      • D. Reduce redundant systems (e.g. due to mergers, due to vendors)
      • E. Schedule Cleansing (see details)
      • F. Enhance Reconciliation (see details)
      • G. Build Trust level (see details)
      • H. Try to cover all rare cases

    3.1.E Cleansing
      • When
          • At system merge
          • At major change
      • How
          • Develop detection applications
          • Deliver mismatch reports to IT & business
          • Find solutions on both IT & business

    3.1.F Reconciliation
      • Where
          • 1+ subsystems have data for the same contents.
          • 1+ subsystems have independent data change functionality.
      • What
          • Run & improve the recon. app. routinely.
          • Categorize reports by urgency.
          • Analyze reports.
          • Debug or adjust biz rule or apply Cleansing.

    3.1.G Trust level
      • When
          • At 1+ fixed data inputs
          • Inputs are independent

          • Must decide final details from inputs
      • How (based on)
          • Provider level (for a detailed data group)
          • Data history
      • Samples: Bloomberg, Reuters, Telekurs, DTCC, ...; Moody's, S&P, Fitch.

    3.2.A Failover & DR
      • Failover
          • DB: 2+ at diff. locations; real-time replication
          • App
              • Active-Active: Cluster with Load Balancing
              • Active-Passive
                  • Auto (via SAN)
                  • Manual + Auto
      • DR
          • DB: e.g. daily or hourly or real-time replication
          • App: Manual switch

    3.3 Technology
      • DB design
          • Constraint Check (for sensitive table values)
          • Normalization (to reduce duplications)
          • Validation processes (to find conflict data)
      • Application design
          • Data integrity check
              • E.g. cryptography signature
              • E.g. CRC check
          • Data display (e.g. Excel missing leading 0, date=num)
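
    The CRC check listed under application design can be sketched with the JDK's `java.util.zip.CRC32`. The class name `CrcCheck` and its method names are hypothetical and do not come from the slides. Note that a CRC only detects accidental corruption; detecting deliberate tampering needs the cryptographic signature the slide lists separately.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical helper showing a CRC-32 integrity check on a payload. */
final class CrcCheck {

    /** Computes the CRC-32 checksum of the payload as an unsigned 32-bit value. */
    static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    /** Returns true when the payload still matches the checksum stored or sent with it. */
    static boolean isIntact(byte[] payload, long expectedChecksum) {
        return checksum(payload) == expectedChecksum;
    }

    /** Convenience overload for text records, assuming UTF-8 encoding. */
    static long checksum(String record) {
        return checksum(record.getBytes(StandardCharsets.UTF_8));
    }
}
```

    A sender would store `CrcCheck.checksum(record)` alongside the record, and the receiver would call `CrcCheck.isIntact(bytes, storedChecksum)` before loading the data.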

    Link: https://www.docduoduo.com/p-4700324.html