
Social Network Platforms and Technologies in the Cloud Computing Era, Google China, 张智威 (Ed Chang)


Slide 1: Social Network Platforms and Technologies in the Cloud Computing Era
张智威 (Ed Chang)
Deputy Director, Research Institute, Google China
Professor, Department of Electrical Engineering, University of California
12/29/2018, Ed Chang

Slide 2: China Opportunity (China & US in 2006-07)
                         China               U.S.
Population
Internet                 180 million (25%)   208 million (3%)
Broadband users          60 million (90%)    60 million (29%)
Mobile phones            500 million         180 million
Engineering graduates    600 k               72 k

Slide 3: Google China
- Size (700): 200 engineers, 400 other employees, almost 100 interns
- Locations: Beijing (2005), Taipei (2006), Shanghai (2007)

Slide 4: Organizing the World's Information, Socially
- Social Platform
- Cloud Computing
- Concluding Remarks

Slide 5: Web 1.0 (diagram of linked .htm, .jpg, .doc and .msg files)

Slide 6: Web with People (2.0) (diagram of .htm, .jpg, .doc, .xls and .msg files)

Slide 7: + Social Platforms (diagram of .htm, .jpg, .doc, .xls and .msg files plus App (Gadget) components)

Slide 12: Open social platform

Slide 17: Open social platform; social platform

Slide 20: Open social platform; social platform

Slide 22: Social Graph

Slide 24: What Users Want?
- People care about other people:
  - care about people they know
  - connect to people they do not know
- Discover interesting information based on other people:
  - about who other people are
  - about what other people are doing

Slide 25: Information Overflow Challenge
- Too many people, too many choices of forums and apps
- "I soon need to hire a full-time to manage my online social networks"
- Desiring a social network recommendation system

Slide 26: Recommendation System
- Friend recommendation
- Community/forum recommendation
- Application suggestion
- Ads matching

Slide 27: Organizing the World's Information, Socially: Social Platform; Cloud Computing; Concluding Remarks

Slide 28: Industry trend: the arrival of the cloud computing era (picture source: http://www.sis.pitt.edu)
(1) Data lives in the cloud: it is never lost and needs no backups
(2) Software lives in the cloud: no downloads, automatic upgrades
(3) Cloud computing is everywhere: any device you log in from becomes yours
(4) Cloud computing is limitless: unlimited space, unlimited speed

Slide 29: Internet search: an example of cloud computing
1. The user enters query keywords.
2. Data is preprocessed in a distributed way so it can serve search (cloud computing):
   - Google infrastructure (thousands of commodity servers around the world)
   - MapReduce for mass data processing
   - Google File System
3. Search results are returned.

Slide 30: Collaborative Filtering
- Given a matrix that "encodes" data

Slide 31: Given a matrix that "encodes" data
- Many applications (collaborative filtering): User-Community, User-User, Ads-User, Ads-Community, etc.
- (Matrix axes: Users, Communities)

Slide 32: Collaborative Filtering (CF)
Breese, Heckerman and Kadie 1998
- Memory-based
  - Given user u, find "similar" users (k nearest neighbors): users who bought similar items, saw similar movies, have similar profiles, etc.
  - Different similarity measures yield different techniques
  - Make predictions based on the preferences of these "similar" users
- Model-based
  - Build a model of the relationships between subject matters
  - Make predictions based on the constructed model

Slide 33: Memory-Based Methods (Goldberg et al. 1992; Resnick et al. 1994; Konstan et al. 1997)
- Pros: simplicity; avoids a model-building stage
- Cons:
  - Memory- and time-consuming: uses the entire database every time to make a prediction
  - Cannot make a prediction if the user has no items in common with other users

Slide 34: Model-Based Methods (Breese et al. 1998; Hofmann 1999; Blei et al. 2004)
- Pros:
  - Scalability: the model is much smaller than the actual dataset
  - Faster prediction: query the model instead of the entire dataset
- Cons: model building takes time

Slide 35: Algorithm Selection Criteria
- Near-real-time recommendation
- Scalable training
- Incremental training is desirable
- Can deal with data scarcity
Cloud computing!

Slide 36: Model-Based Prior Work
- Latent Semantic Analysis (LSA)
- Probabilistic LSA (PLSA)
- Latent Dirichlet Allocation (LDA)

Slide 37: Latent Semantic Analysis (LSA) (Deerwester et al. 1990)
- Maps high-dimensional count vectors to a lower-dimensional representation called the latent semantic space
- By SVD decomposition: $A = U \Sigma V^{T}$, where
  - A = word-document co-occurrence matrix
  - $U_{ij}$ = how likely word i belongs to topic j
  - $\Sigma_{jj}$ = how significant topic j is
  - $V^{T}_{ij}$ = how likely topic i belongs to document j

Slide 38: Latent Semantic Analysis (cont.)
- LSA keeps the k largest singular values
  - Low-rank approximation to the original matrix
  - Saves space, de-noises, and reduces sparsity
- Makes recommendations using
  - Word-word similarity: $U_k \Sigma_k^{2} U_k^{T}$
  - Doc-doc similarity: $V_k \Sigma_k^{2} V_k^{T}$
  - Word-doc relationship: $A_k = U_k \Sigma_k V_k^{T}$

Slide 39: Probabilistic Latent Semantic Analysis (PLSA) (Hofmann 1999; Hofmann 2004)
- A document is viewed as a bag of words
- A latent semantic layer is constructed between documents and words:
  $P(w, d) = P(d)\,P(w \mid d) = P(d) \sum_{z} P(w \mid z)\,P(z \mid d)$
- Probabilities carry explicit meaning: $P(w \mid w')$, $P(d \mid d')$, $P(d, w)$
- The model is learned with the EM algorithm

Slide 40: PLSA Extensions
- PHITS (Cohn & Chang 2000): models document-citation co-occurrence
- A linear combination of PLSA and PHITS (Cohn & Hofmann 2001): models contents (words) and inter-connectivity of documents
- LDA (Blei et al. 2003): provides a complete generative model with a Dirichlet prior
- AT (Griffiths & Steyvers 2004): includes authorship information; a document is categorized by authors and topics
- ART (McCallum 2004): includes email recipients as additional information; an email is categorized by author, recipients and topics

Slide 41: Combinational Collaborative Filtering (CCF)
- Fuses multiple sources of information
- Alleviates the information-sparsity problem
- Hybrid training scheme: Gibbs sampling provides the initialization for the EM algorithm
- Parallelization: achieves linear speedup with the number of machines

Slide 42: Notations
- Given a collection of co-occurrence data:
  - Community: $C = \{c_1, c_2, \ldots, c_N\}$
  - User: $U = \{u_1, u_2, \ldots, u_M\}$
  - Description: $D = \{d_1, d_2, \ldots, d_V\}$
  - Latent aspect: $Z = \{z_1, z_2, \ldots, z_K\}$
- Models:
  - Baseline models: the Community-User (C-U) model and the Community-Description (C-D) model
  - CCF (Combinational Collaborative Filtering): combines both baseline models

Slide 43: Baseline Models
- Community-User (C-U) model:
  - A community is viewed as a bag of users
  - c and u are rendered conditionally independent by introducing z
  - Generative process, for each user u:
    1. A community c is chosen uniformly
    2. A topic z is selected from P(z|c)
    3. A user u is generated from P(u|z)
- Community-Description (C-D) model:
  - A community is viewed as a bag of words
  - c and d are rendered conditionally independent by introducing z
  - Generative process, for each word d:
    1. A community c is chosen uniformly
    2. A topic z is selected from P(z|c)
    3. A word d is generated from P(d|z)

Slide 44: Baseline Models (cont.)
- C-U model
  - Pros: personalized community suggestion
  - Cons: (1) the C-U matrix is sparse and may suffer from the information-sparsity problem; (2) cannot take advantage of content similarity between communities
- C-D model
  - Pros: clusters communities based on community content (description words)
  - Cons: (1) no personalized recommendation; (2) does not consider users shared between communities

Slide 45: CCF Model
- CCF combines both baseline models: a community is viewed as a bag of users AND a bag of words
- By adding C-U, CCF can perform personalized recommendation, which C-D alone cannot
- By adding C-D, CCF can perform better personalized recommendation than C-U alone, which may suffer from sparsity
- Things CCF can do that C-U and C-D cannot:
  - P(d|u): relate users to words
  - Useful for user-targeted ads

Slide 46: Algorithm Requirements
- Near-real-time recommendation
- Scalable training
- Incremental training is desirable

Slide 47: Parallelizing CCF (details omitted)

Slide 48: Industry trend: the arrival of the cloud computing era (picture source: http://www.sis.pitt.edu)

Slide 49: Experiments on the Orkut Dataset
- Data description:
  - Collected on July 26, 2007
  - Two types of data were extracted: community-user and community-description
  - 312,385 users; 109,987 communities; 191,034 unique English words
- Community recommendation
- Community similarity/clustering
- User similarity
- Speedup

Slide 50: Community Recommendation
- Evaluation method:
  - No ground truth and no user clicks are available
  - Leave-one-out: randomly delete one community for each user, then check whether the deleted community can be recovered
- Evaluation metric: precision and recall

Slide 51: Results
Observations:
- CCF outperforms C-U
- For the top 20 recommendations, CCF's precision and recall are twice as high as those of C-U
- The more communities a user has joined, the better CCF/C-U can predict

Slide 52: Runtime Speedup
- The Orkut dataset enjoys a linear speedup when the number of machines is up to 100
- Training time drops from one day to less than 14 minutes
- But what makes the speedup slow down after 100 machines?

Slide 53: Runtime Speedup (cont.)
- Training time consists of two parts: computation time (Comp) and communication time (Comm)

Slide 54: CCF Summary
- Combinational Collaborative Filtering fuses bag-of-words and bag-of-users information
- Hybrid training provides better initializations for EM than random seeding
- Parallelization handles large-scale datasets

Slide 55: China's Contributions on/to Cloud Computing
- Parallel CCF
- Parallel SVMs (kernel machines)
- Parallel SVD
- Parallel spectral clustering
- Parallel expectation maximization
- Parallel association mining
- Parallel LDA

Slide 56: Speeding up SVMs (NIPS 2007)
- Approximate matrix factorization
- Parallelization
- Open source: 350+ downloads since December 2007
- A task that takes 7 days on 1 machine takes 1 hour on 500 machines

Slide 57: Incomplete Cholesky Factorization (ICF)
- (n × n) ≈ (n × p)(p × n), with p ≪ n
- Conserves storage

Slide 58: Matrix Product
- (p × n)(n × p) = (p × p)

Slide 59: Organizing the World's Information, Socially: Social Platform; Cloud Computing; Concluding Remarks

Slide 60: Web With People (diagram of .htm, .jpg, .doc, .xls and .msg files)

Slide 61: What Next for Web Search?
- Personalization: return query results that take personal preferences into account
  - Example: disambiguate an ambiguous query such as "fuji"
- Oops: several have tried, and the problem is hard
  - Enough training data is difficult to collect (for collaborative filtering)
  - Supporting personalization is computationally intensive (e.g., personalizing PageRank)
  - User profiles may be incomplete or erroneous

Slide 62: Personalized Search, Intelligent Search
- A search for "富士" (Fuji) can return Mount Fuji, Fuji apples, or Fuji cameras

Slide 67: Organizing the World's Information, Socially
- The Web is a collection of documents and people
- Recommendation is a personalized, push model of search
- Collaborative filtering requires dense information to be effective
- Cloud computing is essential

Slide 68: References
[1] Alexa Internet. http:/
[2] D. M. Blei and M. I. Jordan. Variational methods for the Dirichlet process. In Proc. of the 21st International Conference on Machine Learning, pages 373-380, 2004.
[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
[4] D. Cohn and H. Chang. Learning to probabilistically identify authoritative documents. In Proc. of the Seventeenth International Conference on Machine Learning, pages 167-174, 2000.
[5] D. Cohn and T. Hofmann. The missing link: a probabilistic model of document content and hypertext connectivity. In Advances in Neural Information Processing Systems 13, pages 430-436, 2001.
[6] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391-407, 1990.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1-38, 1977.
[8] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.
[9] T. Hofmann. Probabilistic latent semantic indexing. In Proc. of Uncertainty in Artificial Intelligence, pages 289-296, 1999.
[10] T. Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems, 22(1):89-115, 2004.
[11] A. McCallum, A. Corrada-Emmanuel, and X. Wang. The author-recipient-topic model for topic and role discovery in social networks: Experiments with Enron and academic email. Technical report, Computer Science, University of Massachusetts Amherst, 2004.
[12] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent Dirichlet allocation. In Advances in Neural Information Processing Systems 20, 2007.
[13] M. Ramoni, P. Sebastiani, and P. Cohen. Bayesian clustering by dynamics. Machine Learning, 47(1):91-121, 2002.

Slide 69: References (cont.)
[14] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted Boltzmann machines for collaborative filtering. In Proc. of the 24th International Conference on Machine Learning, pages 791-798, 2007.
[15] E. Spertus, M. Sahami, and O. Buyukkokten. Evaluating similarity measures: a large-scale study in the Orkut social network. In Proc. of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 678-684, 2005.
[16] M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic author-topic models for information discovery. In Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 306-315, 2004.
[17] A. Strehl and J. Ghosh. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research (JMLR), 3:583-617, 2002.
[18] T. Zhang and V. S. Iyengar. Recommender systems using linear classifiers. Journal of Machine Learning Research, 2:313-334, 2002.
[19] S. Zhong and J. Ghosh. Generative model-based clustering of documents: a comparative study. Knowledge and Information Systems (KAIS), 8:374-384, 2005.
[20] L. Adamic and E. Adar. How to search a social network. 2004.
[21] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, pages 5228-5235, 2004.
[22] H. Kautz, B. Selman, and M. Shah. Referral Web: Combining social networks and collaborative filtering. Communications of the ACM, 3:63-65, 1997.
[23] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD Rec., 22:207-216, 1993.
[24] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, 1998.
[25] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.
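Slides 32 and 33 describe memory-based collaborative filtering: find the k users most similar to user u and predict from their preferences. The sketch below is a minimal user-based k-nearest-neighbor predictor, assuming cosine similarity over explicit ratings; the function names `cosine` and `predict` and the toy data are hypothetical and not taken from the talk.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse rating vectors (item -> rating).

    Missing ratings count as zero, so only co-rated items contribute to the dot product.
    """
    common = a.keys() & b.keys()
    if not common:
        return 0.0  # nothing in common: the two users cannot be compared
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict(ratings: dict[str, dict[str, float]], user: str, item: str, k: int = 2):
    """Similarity-weighted average of `item` ratings from the k most similar users."""
    neighbors = sorted(
        ((cosine(ratings[user], r), r[item]) for other, r in ratings.items()
         if other != user and item in r),
        reverse=True,
    )
    top = [(s, v) for s, v in neighbors[:k] if s > 0]
    total = sum(s for s, _ in top)
    if total == 0:
        return None  # no similar user rated the item: memory-based CF has no answer
    return sum(s * v for s, v in top) / total


# Hypothetical toy data: user -> {community: rating}
ratings = {
    "alice": {"c1": 5, "c2": 3, "c3": 4},
    "bob": {"c1": 4, "c2": 3, "c3": 5, "c4": 4},
    "carol": {"c1": 1, "c4": 2},
    "dave": {"c5": 5},
}
print(predict(ratings, "alice", "c4"))  # about 3.46
print(predict(ratings, "dave", "c1"))   # None
```

Every call scans the whole rating table, which is the memory and time cost slide 33 lists; `dave` shares no items with anyone, so the sketch returns `None`, matching the slide's "no items in common" weakness.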

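Slides 37 and 38 describe LSA as a truncated SVD of the word-document matrix. A minimal NumPy sketch, assuming A is a dense word-by-document count matrix small enough to decompose on one machine (the function name `lsa` and the toy counts are hypothetical); it returns the rank-k approximation together with the word-word and doc-doc similarity matrices listed on slide 38.

```python
import numpy as np


def lsa(A: np.ndarray, k: int):
    """Rank-k LSA of a word-by-document matrix A = U Sigma V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # singular values sorted descending
    U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]  # keep the k largest
    A_k = U_k @ S_k @ Vt_k                 # word-doc relationship U_k Sigma_k V_k^T
    word_word = U_k @ S_k @ S_k @ U_k.T    # U_k Sigma_k^2 U_k^T (equals A A^T at full rank)
    doc_doc = Vt_k.T @ S_k @ S_k @ Vt_k    # V_k Sigma_k^2 V_k^T (equals A^T A at full rank)
    return A_k, word_word, doc_doc


# Hypothetical counts: 4 words (rows) x 4 documents (columns)
A = np.array([[2, 0, 1, 0],
              [1, 0, 1, 0],
              [0, 3, 0, 1],
              [0, 1, 0, 2]], dtype=float)
A_2, word_word, doc_doc = lsa(A, k=2)
print(np.round(A_2, 2))
```

Keeping all singular values reproduces A, A Aᵀ and AᵀA exactly; a smaller k trades fidelity for the space savings and de-noising the slide mentions, and the Frobenius error of the rank-k approximation equals the root sum of squares of the discarded singular values (Eckart-Young).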

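Slide 39 says the PLSA model $P(w, d) = P(d) \sum_{z} P(w \mid z) P(z \mid d)$ is learned with EM. Below is a dense, single-machine NumPy sketch of that EM loop, assuming a document-by-word count matrix `N` and random initialization. It is not the talk's CCF trainer (slide 41 says CCF seeds EM with Gibbs sampling and parallelizes training), and the function name `plsa` and the toy counts are hypothetical. The (documents × words × topics) posterior array only fits toy-sized data.

```python
import numpy as np


def plsa(N: np.ndarray, K: int, iters: int = 50, seed: int = 0):
    """EM for P(w,d) = P(d) sum_z P(w|z) P(z|d); N is a (documents x words) count matrix."""
    rng = np.random.default_rng(seed)
    D, W = N.shape
    p_w_z = rng.random((K, W))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)   # P(w|z), each row sums to 1
    p_z_d = rng.random((D, K))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)   # P(z|d), each row sums to 1
    p_d = N.sum(axis=1) / N.sum()               # P(d): closed-form ML estimate
    eps = 1e-12                                 # guards against dividing by, or taking log of, zero
    log_liks = []
    for _ in range(iters):
        # E-step: posterior P(z|d,w) proportional to P(w|z) P(z|d), array shape (D, W, K)
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        p_w_d = joint.sum(axis=2)               # P(w|d) under the current parameters
        log_liks.append(float((N * np.log(np.maximum(p_d[:, None] * p_w_d, eps))).sum()))
        post = joint / np.maximum(p_w_d[:, :, None], eps)
        # M-step: renormalize the expected counts n(d,w) P(z|d,w)
        expected = N[:, :, None] * post
        p_w_z = expected.sum(axis=0).T
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), eps)
        p_z_d = expected.sum(axis=1)
        p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), eps)
    return p_d, p_z_d, p_w_z, log_liks


# Hypothetical 4 documents x 4 words, two latent topics
N = np.array([[4, 3, 0, 0],
              [5, 2, 0, 1],
              [0, 0, 6, 3],
              [0, 1, 4, 5]], dtype=float)
p_d, p_z_d, p_w_z, log_liks = plsa(N, K=2, iters=30)
print(np.round(p_z_d, 2))
```

Each EM iteration never lowers the data log-likelihood, so `log_liks` is non-decreasing; from the fitted tables one can form quantities such as P(w|d) for recommendation, as slide 39 suggests.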

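Slide 43 spells out the C-U model's generative process: choose a community c uniformly, draw a topic z from P(z|c), then draw a user u from P(u|z). The sketch below samples from that process and computes the marginal $P(u \mid c) = \sum_{z} P(u \mid z)\,P(z \mid c)$, which follows from c and u being conditionally independent given z. The probability tables, community and user names, and function names are all hypothetical; the C-D model is the same with words d in place of users u.

```python
import random


def sample_cu(p_z_given_c, p_u_given_z, n, seed=0):
    """Draw n (community, topic, user) triples from the C-U generative process."""
    rng = random.Random(seed)
    communities = list(p_z_given_c)
    triples = []
    for _ in range(n):
        c = rng.choice(communities)                       # 1. c chosen uniformly
        topics, t_weights = zip(*p_z_given_c[c].items())
        z = rng.choices(topics, weights=t_weights)[0]     # 2. z drawn from P(z|c)
        users, u_weights = zip(*p_u_given_z[z].items())
        u = rng.choices(users, weights=u_weights)[0]      # 3. u drawn from P(u|z)
        triples.append((c, z, u))
    return triples


def user_given_community(p_z_given_c, p_u_given_z, c, u):
    """P(u|c) = sum_z P(u|z) P(z|c), using conditional independence given z."""
    return sum(pz * p_u_given_z[z].get(u, 0.0) for z, pz in p_z_given_c[c].items())


# Hypothetical parameters: 2 communities, 2 topics, 3 users
p_z_given_c = {"hiking": {"outdoor": 0.9, "tech": 0.1},
               "python": {"outdoor": 0.2, "tech": 0.8}}
p_u_given_z = {"outdoor": {"ann": 0.6, "ben": 0.3, "cho": 0.1},
               "tech": {"ann": 0.1, "ben": 0.2, "cho": 0.7}}
print(sample_cu(p_z_given_c, p_u_given_z, n=5))
print(user_given_community(p_z_given_c, p_u_given_z, "hiking", "ann"))  # 0.9*0.6 + 0.1*0.1, about 0.55
```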

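Slide 50 evaluates community recommendation by leave-one-out: delete one random community per user and check whether it is recovered, scored by precision and recall. A minimal sketch, assuming precision = hits / recommendations made and recall = hits / held-out communities (one per evaluated user), and skipping users who belong to a single community; the talk does not spell out these details. `popular` is a hypothetical popularity baseline standing in for C-U or CCF, and the membership data is made up.

```python
import random
from collections import Counter


def leave_one_out(memberships, recommend, top_n=20, seed=0):
    """Hide one random community per user; count how often the top-N list recovers it."""
    rng = random.Random(seed)
    hits = recommended = evaluated = 0
    for user, comms in memberships.items():
        if len(comms) < 2:
            continue  # assumption: skip users who would be left with no communities
        held_out = rng.choice(sorted(comms))
        visible = dict(memberships)          # shallow copy; only this user's set changes
        visible[user] = comms - {held_out}
        ranked = recommend(visible, user, top_n)
        hits += int(held_out in ranked)
        recommended += len(ranked)
        evaluated += 1
    precision = hits / recommended if recommended else 0.0
    recall = hits / evaluated if evaluated else 0.0
    return precision, recall


def popular(visible, user, n):
    """Hypothetical baseline: most-joined communities the user is not already in."""
    counts = Counter(c for cs in visible.values() for c in cs)
    return [c for c, _ in counts.most_common() if c not in visible[user]][:n]


memberships = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"a", "c"}, "u4": {"b"}}
print(leave_one_out(memberships, popular, top_n=20))  # (0.6, 1.0)
```

Here every community keeps at least one other member, so the held-out one is always recovered (recall 1.0), while 3 hits out of 5 recommendations give precision 0.6; with a real top-20 cutoff both numbers depend on the recommender, which is what slide 51 compares.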

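Slides 52, 53 and 56 quote speedups as wall-clock ratios. The arithmetic below computes speedup S = T1 / Tp and parallel efficiency E = S / p from the SVM figures on slide 56 (7 days on 1 machine, 1 hour on 500 machines); the helper name is hypothetical.

```python
def speedup_and_efficiency(t_serial_hours, t_parallel_hours, machines):
    """Speedup S = T1 / Tp and parallel efficiency E = S / p."""
    s = t_serial_hours / t_parallel_hours
    return s, s / machines


s, e = speedup_and_efficiency(7 * 24, 1, 500)
print(f"speedup {s:.0f}x, efficiency {e:.1%}")  # speedup 168x, efficiency 33.6%
```

An efficiency well below 1 is consistent with slide 53's split of training time into computation (Comp) and communication (Comm): adding machines shrinks the per-machine computation but not necessarily the communication.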

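Slides 57 and 58 summarize the approximate matrix factorization behind the parallel SVM work: an n × n kernel matrix is replaced by an n × p factor with p ≪ n, and the product (p × n)(n × p) stays p × p. The sketch below is a standard single-machine pivoted incomplete Cholesky factorization in NumPy, assuming a positive semidefinite kernel matrix K; the RBF kernel, the function name and the parameter values are illustrative assumptions, and the talk's parallel version is not shown.

```python
import numpy as np


def incomplete_cholesky(K: np.ndarray, p: int, tol: float = 1e-10) -> np.ndarray:
    """Pivoted incomplete Cholesky: return H (n x p', p' <= p) with K approximately H @ H.T."""
    n = K.shape[0]
    H = np.zeros((n, p))
    d = np.diag(K).astype(float)          # diagonal of the residual K - H H^T (a writable copy)
    for j in range(p):
        i = int(np.argmax(d))             # greedy pivot: largest remaining residual
        if d[i] <= tol:
            return H[:, :j]               # residual negligible: stop early
        H[:, j] = (K[:, i] - H[:, :j] @ H[i, :j]) / np.sqrt(d[i])
        d -= H[:, j] ** 2                 # update the residual diagonal
    return H


# Hypothetical RBF kernel on 50 random 1-D points (n = 50)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 3.0, size=50)
K = np.exp(-(x[:, None] - x[None, :]) ** 2)
H = incomplete_cholesky(K, p=10)          # n x p factor: at most 500 numbers instead of 2,500
small = H.T @ H                           # (p x n)(n x p) = p x p
print(H.shape, small.shape, np.abs(K - H @ H.T).max())
```

Greedy pivoting picks the column with the largest unexplained diagonal first, so the trace of the residual shrinks with every added column; p controls the trade-off between storage (n·p instead of n²) and approximation error.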
