
Clouds: An Opportunity for Scientific Applications?

Ewa Deelman
USC Information Sciences Institute
deelman@isi.edu, www.isi.edu/deelman, pegasus.isi.edu

Acknowledgements
- Yang-Suk Kee (former PostDoc, USC)
- Gurmeet Singh (former Ph.D. student, USC)
- Gideon Juve (Ph.D. student, USC)
- Tina Hoffa (Undergrad, Indiana University)
- Miron Livny (University of Wisconsin, Madison)
- Montage scientists: Bruce Berriman, John Good, and others
- Pegasus team: Gaurang Mehta, Karan Vahi, others

Outline
- Background
  - Science Applications
  - Workflow Systems
- The opportunity of the Cloud
  - Virtualization
  - On-demand availability
- Simulation study of an astronomy application on the Cloud
- Conclusions

Scientific Applications
- Complex
  - Involve many computational steps
  - Require many (possibly diverse) resources
  - Often require a custom execution environment
- Composed of individual application components
  - Components written by different individuals
  - Components require and generate large amounts of data
  - Components written in different languages

Issues Critical to Scientists
- Reproducibility of scientific analyses and processes is at the core of the scientific method
- Scientists consider the "capture and generation of provenance information as a critical part of the generated data"
- "Sharing is an essential element of education, and acceleration of knowledge dissemination."
Sources: NSF Workshop on the Challenges of Scientific Workflows, 2006, www.isi.edu/nsf-workflows06; Y. Gil, E. Deelman et al., Examining the Challenges of Scientific Workflows. IEEE Computer, 12/2007

Computational challenges faced by applications
- Be able to compose complex applications from smaller components
- Execute the computations reliably and efficiently
- Take advantage of any number/types of resources
- Cost is an issue
- Cluster, Shared CyberInfrastructure (EGEE, Open Science Grid, TeraGrid), Cloud

Possible solution
- Structure an application as a workflow
  - Describe data and components in logical terms
  - Can be mapped onto a number of execution environments
  - Can be optimized, and if faults occur the workflow management system can recover
- Use a workflow management system (Pegasus-WMS) to manage the application on a number of resources

Pegasus-Workflow Management System
- Leverages abstraction for workflow description to obtain ease of use, scalability, and portability
- Provides a compiler to map from high-level descriptions to executable workflows
  - Correct mapping
  - Performance enhanced mapping
- Provides a runtime engine to carry out the instructions (Condor DAGMan)
  - Scalable manner
  - Reliable manner
- Can execute on a number of resources: local machine, campus cluster, Grid, Cloud

Mapping Correctly
- Select where to run the computations
  - Apply a scheduling algorithm for computation tasks
  - Transform task nodes into nodes with executable descriptions
    - Execution location
    - Environment variables initialized
    - Appropriate command-line parameters set
- Select which data to access
  - Add stage-in nodes to move data to computations
  - Add stage-out nodes to transfer data out of remote sites to storage
  - Add data transfer nodes between computation nodes that execute on different resources

Additional Mapping Elements
- Add data cleanup nodes to remove data from remote sites when no longer needed
  - Reduces workflow data footprint
- Cluster compute nodes in small computational granularity applications
- Add nodes that register the newly-created data products
- Provide provenance capture steps
  - Information about source of data, executables invoked, environment variables, parameters, machines used, performance
- Scale matters; today we can handle:
  - 1 million tasks in the workflow instance (SCEC)
  - 10 TB input data (LIGO)

Science-grade Mosaic of the Sky
[Figure: image courtesy of IPAC, Caltech; labels: point on the sky, area]

Generating mosaics of the sky (Bruce Berriman, Caltech)
* The full moon is 0.5 deg. sq. when viewed from Earth; the full sky is 400,000 deg. sq.

Types of Workflow Applications
- Providing a service to a community (Montage project)
  - Data and derived data products available to a broad range of users
  - A limited number of small computational requests can be handled locally
  - Large numbers of requests, or large requests, need to rely on shared cyberinfrastructure resources
  - On-the-fly workflow generation, portable workflow definition
- Supporting community-based analysis (SCEC project)
  - Codes are collaboratively developed
  - Codes are "strung" together to model complex systems
  - Ability to correctly connect components, scalability
- Processing large amounts of shared data on shared resources (LIGO project)
  - Data captured by various instruments and cataloged in community data registries
  - Amounts of data necessitate reaching out beyond local clusters
  - Automation, scalability and reliability
- Automating the work of one scientist (Epigenomic project, USC)
  - Data collected in a lab needs to be analyzed in several steps
  - Automation, efficiency, and flexibility (scripts age and are difficult to change)
  - Need to have a record of how data was produced
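The mapping steps listed under "Mapping Correctly" and "Additional Mapping Elements" can be illustrated with a small sketch. This is not the Pegasus-WMS API or its planner: the `Task` class, the `map_workflow` function, the site names, and the file names are all hypothetical, and the cleanup rule is simplified to removing a file at the site of its last consumer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical abstract task: a logical name plus logical file names."""
    name: str
    inputs: tuple
    outputs: tuple


def map_workflow(tasks, site_of, final_outputs):
    """Turn an abstract workflow into an ordered list of executable nodes.

    tasks is assumed to be listed in dependency order; site_of maps each
    task name to the resource chosen for it by some scheduling step.
    """
    producer = {f: t.name for t in tasks for f in t.outputs}
    # Index of the last task that reads each file (later tasks overwrite earlier).
    last_use = {f: i for i, t in enumerate(tasks) for f in t.inputs}
    plan = []
    for i, task in enumerate(tasks):
        site = site_of[task.name]
        for f in task.inputs:
            if f not in producer:
                # No task produces this file, so it comes from outside storage.
                plan.append(("stage-in", f, site))
            elif site_of[producer[f]] != site:
                # Producer ran on a different resource: add a data transfer node.
                plan.append(("transfer", f, site))
        plan.append(("compute", task.name, site))
        for f in task.inputs:
            if last_use[f] == i:
                # Simplification: clean up only at the last consumer's site.
                plan.append(("cleanup", f, site))
        for f in task.outputs:
            if f in final_outputs:
                plan.append(("stage-out", f, site))
    return plan


# Hypothetical two-step workflow split across two resources.
tasks = [
    Task("project", ("raw.fits",), ("projected.fits",)),
    Task("add", ("projected.fits",), ("mosaic.fits",)),
]
site_of = {"project": "cloud", "add": "cluster"}
plan = map_workflow(tasks, site_of, {"mosaic.fits"})
for node in plan:
    print(node)
```

The sketch keeps the abstract description (logical task and file names) separate from the site assignment, which is the property that lets the same workflow be mapped onto a local machine, a cluster, a Grid, or a Cloud.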

Clouds
- Originated in the business domain
  - Outsourcing services to the Cloud
  - Pay for what you use
- Provided by data centers that are built on compute and storage virtualization technologies
- Scientific applications often have different requirements
  - MPI
  - Shared file system
  - Support for many dependent jobs
[Figure: container-based data center]

Available Cloud Platforms
- Commercial Providers
  - Amazon EC2, Google, others
- Science Clouds
  - Nimbus (U. Chicago), Stratus (U. Florida)
  - Experimental
- Roll out your own using open source cloud management software
  - Virtual Workspaces (Argonne), Eucalyptus (UCSB), OpenNebula (C.U. Madrid)
- Many more to come

Cloud Benefits for Grid Applications
- Similar to the Grid
  - Provides access to shared cyberinfrastructure
  - Can recreate familiar grid and cluster architectures (with additional tools)
  - Can use existing grid software and tools
- Resource Provisioning
  - Resources can be leased for an entire application instead of individual jobs
  - Enables more efficient execution of workflows
- Customized Execution Environments
  - User specifies all software components including OS
  - Administration performed by user instead of resource provider (good: user control; bad: extra work)

Amazon EC2 Virtualization
- Virtual Nodes
  - You can request a certain class of machine
  - Previous research suggests 10% performance hit
  - Multiple virtual hosts on a single physical host
  - You have to communicate over a wide-area network
- Virtual Clusters (additional software needed)
  - Create cluster out of virtual resources
  - Use any resource manager (PBS, SGE, Condor)
  - Dynamic configuration is the key issue

Personal Cluster
[Figure labels: GT4/PBS, Batch Resources, Compute Clouds, Private Queue, System Queue, No Job manager, Resource & execution environment, Private Cluster on Demand]
- Work by Yang-Suk Kee at USC
- Can set up NFS, MPI, ssh

EC2 Software Environment
- Specified using disk images
  - OS snapshot that can be started on virtualized hosts
  - Provides portable execution environment for applications
  - Helps with reproducibility for scientific applications
- Images for a workflow application can contain:
  - Application Codes
  - Workflow Tools: Pegasus, DAGMan
  - Grid Tools: Globus Gatekeeper, GridFTP
  - Resource Manager: Condor, PBS, SGE, etc.

EC2 Storage Options
- Local Storage
  - Each EC2 node has 100-300 GB of local storage
  - Used for image too
- Amazon S3
  - Simple put/get/delete operations
  - Currently no interface to grid/workflow software
- Amazon EBS
  - Network accessible block-based storage volumes (cf. SAN)
  - Cannot be mounted on multiple workers
- NFS
  - Dedicated node exports local storage, other nodes mount
- Parallel File Systems (Lustre, PVFS, HDFS)
  - Combine local storage into a single, parallel file system
  - Dynamic configuration may be difficult

Montage/IPAC Situation
- Provides a service to the community
  - Delivers data to the community
  - Delivers a service to the community (mosaics)
- Has its own computing infrastructure
  - Invests $75K for computing (over 3 years)
  - Appropriates $50K in human resources every year
- Expects to need additional resources to deliver services
- Wants fast responses to user requests

Cloudy Questions
Applications are asking:
- What are Clouds?
- How do I run on them?
- How do I make good use of the cloud so that I use my funds wisely?
  - And how do I explain Cloud computing to the purchasing people?
- How many resources do I allocate for my computation or my service?
- How do I manage data transfer in my cloud applications?
- How do I manage data storage: where do I store the input and output data?

Montage Infrastructure
[Figure: Montage infrastructure]

Computational Model
- Based on Amazon's fee structure
  - $0.15 per GB-month for storage resources
  - $0.10 per GB for transferring data into its storage system
  - $0.16 per GB for transferring data out of its storage system
  - $0.10 per CPU-hour for the use of its compute resources
- Normalized to cost per second
- Does not include the cost of building and deploying an image
- Simulations done using a modified GridSim

How many resources to provision?
- Montage 1 Degree Workflow, 203 tasks
- 60 cents for the 1-processor computation versus almost $4 with 128 processors
- 5.5 hours versus 18 minutes
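The fee structure under "Computational Model" can be written down as a small cost function. The sketch below is not the modified GridSim simulator used in the study. The function names are hypothetical, a 30-day month is assumed for the per-second storage rate, and every provisioned processor is assumed to be billed for the full wall-clock time of the run.

```python
# Rates taken from the Computational Model slide.
STORAGE_PER_GB_MONTH = 0.15
TRANSFER_IN_PER_GB = 0.10
TRANSFER_OUT_PER_GB = 0.16
CPU_PER_HOUR = 0.10

SECONDS_PER_HOUR = 3600
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def cpu_cost(processors, wall_seconds):
    """CPU charge, assuming all processors are billed for the whole run."""
    return processors * wall_seconds * CPU_PER_HOUR / SECONDS_PER_HOUR


def storage_cost(gigabytes, seconds):
    """Storage charge normalized to cost per second."""
    return gigabytes * seconds * STORAGE_PER_GB_MONTH / SECONDS_PER_MONTH


def transfer_cost(gb_in, gb_out):
    """Charge for moving data into and out of the cloud storage system."""
    return gb_in * TRANSFER_IN_PER_GB + gb_out * TRANSFER_OUT_PER_GB


# CPU charge alone for the two 1-degree Montage configurations on the slide.
one_processor = cpu_cost(1, 5.5 * SECONDS_PER_HOUR)
many_processors = cpu_cost(128, 18 * 60)
print(f"1 processor, 5.5 hours: ${one_processor:.2f}")
print(f"128 processors, 18 minutes: ${many_processors:.2f}")
```

Under these assumptions the CPU charge alone comes to roughly $0.55 and $3.84, which is in line with the "60 cents" and "almost $4" figures once storage and transfer charges are added.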

4 Degree Montage
- 3,027 application tasks
- 1 processor: $9, 85 hours
- 128 processors: $14, 1 hour

Data Management Modes
- Remote I/O
- Regular
- Cleanup
[Figure labels: 0, 1, 2, Ra, Rb, Rb, Wb, Wc, Rc]
- Good for non-shared file systems
- 1.25 GB versus 4.5 GB

How to manage data?
[Figure panels: 1 Degree Montage, 4 Degree Montage]

How do data costs affect total cost?
- Data stored outside the cloud
- Computations run at full parallelism
- Paying only for what you use
  - Assume you have enough requests to make use of all provisioned resources
[Figure axis: Cost in $]

Where to keep the data?
- Storing all of the 2MASS data
  - 12 TB of data: $1,800 per month on the Cloud
- Calculating a 1 degree mosaic and delivering it to the user
  - $2.22 (with data outside the cloud)
  - Same mosaic but data inside the cloud: $2.12
- To overcome the storage costs, users would need to request at least $1,800/($2.22-$2.12) = 18,000 mosaics per month
- Does not include the initial cost of transferring the data to the cloud, which would be an additional $1,200
- Is $1,800 per month reasonable?
  - $65K over 3 years (does not include data access costs from outside the cloud)
  - Cost of 12 TB to be hosted at Caltech: $15K over 3 years for hardware

The cost of doing science
- Computing a mosaic of the entire sky (3,900 4-degree-square mosaics)
  - 3,900 x $8.88 = $34,632
- How long does it make sense to store a mosaic?
  - Storage vs. computation costs

Summary
- We started asking the question of how a scientific workflow can best make use of clouds
- Assumed a simple cost model based on the Amazon fee structure
- Conducted simulations
  - Need to find balance between cost and performance
  - Computational cost outweighs storage costs
- Storing data on the Cloud is expensive
- Did not explore issues of data security and privacy, reliability, availability, ease of use, etc.

Will scientific applications move into clouds?
- There is interest in the technology from applications
  - They often don't understand what the implications are
- Need tools to manage the cloud
  - Build and deploy images
  - Request the right number of resources
  - Manage costs for individual computations
  - Manage project costs
- Projects need to perform cost/benefit analysis

Issues Critical to Scientists
- Reproducibility: yes, maybe, through virtual images, if we package the entire environment, the application, and the VMs behave
- Provenance: still need tools to capture what happened
- Sharing: can be easier to share entire images and data
  - Data could be part of the image

Relevant Links
- Amazon Cloud: http:/
- Pegasus-WMS: pegasus.isi.edu
- DAGMan: www.cs.wisc.edu/condor/dagman
- Gil, Y., E. Deelman, et al. Examining the Challenges of Scientific Workflows. IEEE Computer, 2007.
- Workflows for e-Science, Taylor, I.J.; Deelman, E.; Gannon, D.B.; Shields, M. (Eds.), Dec. 2006
- LIGO: www.ligo.caltech.edu/
- SCEC: www.scec.org
- Montage: montage.ipac.caltech.edu/
- Condor: www.cs.wisc.edu/condor/
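The arithmetic behind "Where to keep the data?" and "The cost of doing science" can be checked with a few lines. This is a minimal sketch that reuses the rates from the Computational Model slide. The function names are hypothetical, and decimal terabytes (1 TB = 1,000 GB) are assumed because that reproduces the $1,800 and $1,200 figures.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes


def monthly_storage_cost(terabytes, rate_per_gb_month=0.15):
    """Monthly charge for keeping a data set in cloud storage."""
    return terabytes * GB_PER_TB * rate_per_gb_month


def upload_cost(terabytes, rate_per_gb=0.10):
    """One-time charge for transferring a data set into the cloud."""
    return terabytes * GB_PER_TB * rate_per_gb


def break_even_requests(storage_cost, cost_outside, cost_inside):
    """Requests per month needed before per-request savings cover storage."""
    return storage_cost / (cost_outside - cost_inside)


storage = monthly_storage_cost(12)                    # 12 TB of 2MASS data
upload = upload_cost(12)                              # initial transfer in
requests = break_even_requests(storage, 2.22, 2.12)   # 1-degree mosaics
three_years = storage * 36                            # 36 months of storage
full_sky = 3900 * 8.88                                # 3,900 4-degree mosaics

print(f"storage per month: ${storage:,.0f}")
print(f"initial upload: ${upload:,.0f}")
print(f"break-even mosaics per month: {requests:,.0f}")
print(f"storage over 3 years: ${three_years:,.0f}")
print(f"full-sky mosaic: ${full_sky:,.0f}")
```

The three-year storage figure works out to about $64,800, which the slide rounds to $65K. The break-even count is sensitive to the 10-cent difference per mosaic: a smaller saving per request raises the number of requests needed proportionally.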
