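A Campus-Oriented Platform for Activities and Social Network Aggregation

Keywords: automatic classification, SNS aggregation, NoSQL, Realtime

the Rainbowfish Project
Open-Source under GPL v2 license

School: Shanghai Shixi High School (上海市市西中学)
Supporting organizations: Shanghai Shixi High School, Yingcai Club (英才俱乐部)
Author: Shi Wenxuan (施闻轩)
Advisor: Wang Jihua (王纪华)

Contents
1. Abstract
2. Introduction
  2.1 Project origin
  2.2 Design goals
  2.3 Third-party technologies and architectures
  2.4 Project contributions
3. Current state of SNS
  3.1 Shortcomings of commercial social networks
  3.2 Comparison of mature SNSs and products
4. Project design
  4.1 Basic database structure design (including the visibility feature)
  4.2 Fully asynchronous front-end architecture
  4.3 Front-end resource hosting
  4.4 Architecture for other output layouts
  4.5 I18N architecture
  4.6 Information stream fusion
  4.7 Automatic classification
  4.8 Real-time messaging architecture
5. Progress and improvements
  5.1 Project progress
  5.2 Future improvements
6. References
  6.1 Official documentation
  6.2 Other material
7. Appendix
  7.1 Performance comparison with UCHome
  7.2 Source code and license

1. Abstract

This project builds a new kind of campus social networking platform on PHP + MongoDB. It has the following features:

(1) Oriented to a single campus, with campus activities in the foreground
The school's own internal resources are put to the fullest use. Students can see at a glance which activities are under way or coming up, and the school gains a convenient channel for promoting and organizing activities.

(2) Fusion of existing social network streams (such as Sina Weibo and Renren)
In a few simple steps a user can connect existing social circles to the platform. Users can then keep up with the circles they already have without dividing their attention among several social networks.

(3) Automatic classification of information
The platform automatically categorizes what users publish, so that users can look up information of the kinds they are interested in. This raises the utilization of valuable information on the platform.

(4) Information can be published to a chosen circle (for example public, visible to a club, visible to a class, or visible to a few individuals)
This greatly reduces information noise and strengthens the protection of user privacy.

(5) Efficient information stream operations
Hardware deployment cost is low: an ordinary server can carry a large number of simultaneous users.

(6) Very good user experience
The platform implements a Full-Ajax architecture (most websites today implement only partial Ajax), so using it feels very smooth. In addition, exceptions are easy to handle when programming against this fully asynchronous architecture, and developing a page is almost no harder than developing a traditional web page, which makes the architecture both advanced and practical.

(7) Open source
The project is fully open source (GPL v2 license). Anyone can study the technical implementations they are interested in, modify them, and use them in their own projects. For developers with similar needs it is a valuable reference.

2. Introduction

2.1 Project origin

(1) Organization of campus activities today:
- Students spend a lot of time on sites such as Renren, but never look at the school's official website (or the activity information on it)
- There are many campus activities, but they are usually promoted through only two channels, the broadcast system and posters
- Activity promotion through an SNS is hard to follow over time, and the information is easily washed away

As a result, campus activities often suffer from poor promotion and a complicated organizing process. This is very common in many high schools and especially in universities.

(2) Use of information in existing social networks:
- Social networks are full of low-value information (moods, rants, and similar status updates make up more than 50%)
- Information cannot be retrieved in a targeted way, so many high-quality resources exist but cannot be used effectively
- People increasingly belong to several social network circles, but most can attend to only one of them
- Existing social networks protect personal privacy very poorly, and all information spreads almost publicly

This project therefore attempts to build a new platform that focuses on these two sets of problems.

2.2 Design goals

- Help schools organize and run activities more effectively
- Improve the efficiency with which information is used (and reused)
- Strengthen user privacy protection and reduce information noise
- Provide a broader, more internal platform for exchanging teaching resources
- Let users keep up with several SNSs on one platform

2.3 Third-party technologies and architectures

(1) The project uses or is built on the following mature programming languages and architectures:
- PHP (language)
- JavaScript (language)
- HTML (language)
- Node.js (server)
- MySQL (database)
- MongoDB (database)

(2) The project relies on or is based on the following mature technologies and specifications:
- Ajax (communication)
- HTML5, CSS3 (front end)
- OAuth (interface)
- SQL (back end)
- NoSQL (back end)

(3) Some features of the project are implemented with existing open-source modules: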
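- SCWS (Chinese word segmentation library)
- phpThumb

    require dirname(__FILE__) . '/template/error.php';
    echo json_encode(array(
        'html' => array('html/feed.php'),
        'css' => array('css/_shadow.css', 'css/feed.css'),
        'js' => array('js/rf.feed.js'),
        'nameSpace' => 'rf.feed',
        'cache' => false
    ));

(3) Sub Pages
Each sub page may optionally have several js, css, or html files, and must have exactly one namespace (given by nameSpace in the requirement page). When initialize() is called, its first argument is the page object [not the page context object]. The page object contains the css and html content. initialize() normally enables the CSS object in the page object and then appends the html content to the agreed DOM node in the Main Frame Page.

(4) Page execution flow
- When a qualifying hyperlink (one that starts with ?) is clicked, the click is intercepted by the framework's Hook
- history.pushState() is used to update the address bar
- The framework loads the requirement page named by the link, then cancels the link's default behavior
- Once all of the requirement page's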
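The flow above can be sketched in JavaScript as follows. This is only a minimal illustration under stated assumptions, not the framework's actual code: the function names (`isFrameworkLink`, `parseFrameworkLink`, `handleLinkClick`, `startSubPage`) and the `env` object are hypothetical. Only the leading `?` rule, the call to `history.pushState()`, the `nameSpace` field, and `initialize()` receiving the page object come from the text; the `?page?query` link shape is inferred from the redirect code in section 4.4.

```javascript
// Hypothetical helper names; see the lead-in for what is taken from the text.

// A link is handled by the framework only if its href starts with "?".
function isFrameworkLink(href) {
  return typeof href === "string" && href.startsWith("?");
}

// Split "?user?uid=1" into { page: "user", query: "uid=1" }.
function parseFrameworkLink(href) {
  const rest = href.slice(1);
  const sep = rest.indexOf("?");
  if (sep === -1) return { page: rest, query: "" };
  return { page: rest.slice(0, sep), query: rest.slice(sep + 1) };
}

// Click hook. Returns true when the framework took over the navigation,
// in which case the caller should cancel the browser's default action.
function handleLinkClick(href, env) {
  if (!isFrameworkLink(href)) return false;
  const target = parseFrameworkLink(href);
  env.pushState(href);         // in a browser: history.pushState(null, "", href)
  env.loadRequirement(target); // fetch the requirement page, then its resources
  return true;
}

// After the resources listed by a requirement page are loaded, resolve the
// namespace it names (e.g. "rf.feed") and call initialize() with the page object.
function startSubPage(requirement, pageObject, root) {
  const ns = requirement.nameSpace
    .split(".")
    .reduce((obj, key) => obj[key], root);
  ns.initialize(pageObject);
}
```

In a browser, a click listener would call `handleLinkClick(link.getAttribute("href"), env)` and invoke `event.preventDefault()` when it returns `true`. Passing the side effects in through `env` keeps the sketch runnable outside a browser.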
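js

js: rf.cNameSpace.getResource(ID);

4.4 Architecture for other output layouts

To support mobile phones and to display correctly in the IE family of browsers, the project uses a multi-layout output architecture. /web/index.php contains the following code, which redirects according to the UserAgent:

    // Redirect different browsers to different pages
    function redirect($type = '')
    {
        // Extract the domain part
        preg_match('/[^.\/]+\.?[^.\/]+$/', $_SERVER['SERVER_NAME'], $matches);
        $type = empty($type) ? '' : ($type . '.');
        $rURI = $_SERVER['REQUEST_URI'];
        $dest = '';
        // Extract the sub page; the parentheses make $pos hold the position
        // rather than the boolean result of the comparison
        if (($pos = strpos($rURI, '?')) !== false) {
            // Extract the query
            $p2 = strpos($rURI, '?', $pos + 1);
            if ($p2 !== false) {
                $dest = substr($rURI, 2, $p2 - 2) . '.php' . substr($rURI, $p2);
            } else {
                $dest = substr($rURI, 2) . '.php';
            }
        }
        // Redirect
        header('HTTP/1.1 303 See Other');
        header('Location: http://' . $type . $matches[0] . '/' . $dest);
        exit();
    }

    $userAgent = strtolower($_SERVER['HTTP_USER_AGENT']);
    if (preg_match('/(juc|opera*mini|mmp|wap|phone)/i', $userAgent)) {
        redirect('3g');
    }
    if (preg_match('/(iphone|ipod|aspen|nexus|android|blackberry|webos|symbian|msie)/i', $userAgent)) {
        redirect('m');
    }

This architecture has the advantage of high compatibility. For example, the URL for viewing the profile of the user with UID=1 in the normal layout is: http:/

    else {
        $lang = substr($_SERVER['HTTP_ACCEPT_LANGUAGE'], 0, 5);
        // stripos() returns 0 for a match at the start of the string,
        // so the result must be compared with !== false
        if (stripos($lang, 'zh-cn') !== false) {
            define('ENV_LANGUAGE', 'zh-cn');
        } elseif (stripos($lang, 'zh') !== false) {
            define('ENV_LANGUAGE', 'zh-tw');
        } elseif (stripos($lang, 'en') !== false) {
            define('ENV_LANGUAGE', 'en');
        } else {
            define('ENV_LANGUAGE', 'zh-cn');
        }
    }

Next, the program loads the corresponding language file (for example /I18N/zh-cn.php). Code can then call l() to display content in the selected language. l() outputs language content conveniently and efficiently, and falls back to the constant name when no translation is defined. It is implemented as follows (/include/func.core.php):

    // I18N
    function l($constant)
    {
        if (defined('I18N_' . $constant)) {
            $msg = constant('I18N_' . $constant);
            // Format the message when extra arguments are passed
            if (func_num_args() > 1) {
                $param = func_get_args();
                $param[0] = $msg;
                $msg = call_user_func_array('sprintf', $param);
            }
        } else {
            $msg = $constant;
        }
        return $msg;
    }

4.6 Information stream fusion

Once a user has built up a deep social circle, it is fairly hard to get them to build another one. To attract students and increase user stickiness, the platform introduces the concept of "information stream fusion". After the user grants authorization, the platform can receive the streams of the following social circles at the same time:
- The platform's own stream
- Sina Weibo
- Renren
- (Facebook

Domain name
In progress: relationship network visualization
2011/11: project open-sourced, interface redesigned

5.2 Future improvements

(1) Strengthen permission checks
(2) Further improve keyword-based information classification
(3) Improve information search
(4) Refine front-end user experience details
(5) Stress-test the platform
(6) Add a reading layout for mobile phones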
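The information stream fusion described in section 4.6 comes down to combining items from several authorized sources into one timeline. The following JavaScript is only a sketch under stated assumptions: the item shape (`time`, `text`), the source labels, and the function name `mergeStreams` are hypothetical and are not taken from the project's code.

```javascript
// Hypothetical sketch: merge several feeds into one timeline, newest first.
// Item fields (time, text) and source labels are assumptions for illustration.
function mergeStreams(streamsBySource, limit = 50) {
  const timeline = [];
  for (const [source, items] of Object.entries(streamsBySource)) {
    for (const item of items) {
      // Tag each item with its origin so the UI can show where it came from.
      timeline.push({ ...item, source });
    }
  }
  // Newest first. Array.prototype.sort is stable, so items with equal
  // timestamps keep the order in which their sources were listed.
  timeline.sort((a, b) => b.time - a.time);
  return timeline.slice(0, limit);
}

const merged = mergeStreams({
  platform: [
    { time: 30, text: "Club fair on Friday" },
    { time: 10, text: "Welcome" },
  ],
  weibo: [{ time: 20, text: "Reposted from Sina Weibo" }],
});
console.log(merged.map((item) => `${item.source}: ${item.text}`));
```

Concatenating and sorting keeps the sketch short. If every source already returns its items in time order, a k-way merge would avoid re-sorting the whole list.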
7. Appendix
7.1 Performance comparison with UCHome

The author compared the message-publishing performance of this project with that of UCHome. UCHome is a commercial social networking product from Comsenz (康盛创想) and is claimed to have the largest market share in China. The Comsenz team developed the well-known Discuz! forum software and is now part of Tencent.

(1) What is compared
The time taken to publish one message.

(2) Method
Within the script timeout (30 s), each run publishes 10000 messages. (For UCHome, 10000 publications take longer than the 30 s timeout, so each run publishes 5000 messages instead.) The database is not cleared between runs. Each run excludes only network factors and keeps all normal steps such as data processing and statistics. Dividing the elapsed time of each run by the number of messages published yields two results:
- The time taken to publish one message
- The trend of the publishing time as the amount of data in the database grows

(3) Test environment
CPU: AMD Athlon II X4 640
Memory: 4 GiB (about 2.5 GiB available)
OS: Windows 7 x64 Ultimate Service Pack 1
Server software (all latest stable versions): Apache, PHP, MySQL, MongoDB
Client software: Chrome Dev 15

(4) Test plan
- Modify the following UCHome code for batch testing (the modified parts are boxed): /source/cp_doing.php

Note: the first argument of USER_FEED_publishAct() is the UID of the publishing user (10038 is an account created specifically for testing), the second argument is the message type, and the third argument is the message content.

This project and UCHome are tested under the same conditions and with the same data processing requirements (nothing except the network is left out), so the resulting data are highly comparable.

(5) Test results

Rainbowfish (Rainbowfish is the development codename of this project):

Messages  Total time (s)  Average (s)    Messages  Total time (s)  Average (s)
10000     4.60117         0.             10000     4.71247         0.
10000     4.76196         0.             10000     4.68191         0.
10000     4.84419         0.             10000     4.84327         0.
10000     4.64277         0.             10000     4.94757         0.
10000     4.66301         0.             10000     4.75731         0.
10000     4.68892         0.             10000     4.77129         0.
10000     4.66920         0.             10000     4.77279         0.
10000     4.65137         0.             10000     4.81140         0.
10000     4.64829         0.             10000     4.82133         0.
10000     4.70876         0.             10000     5.10476         0.
10000     4.64284         0.             10000     4.91124         0.
10000     4.71951         0.             10000     4.93127         0.
10000     4.72105         0.             10000     4.91437         0.
10000     4.69447         0.             10000     4.85588         0.
10000     4.81828         0.             10000     4.87649         0.

UCHome:

Messages  Total time (s)  Average (s)    Messages  Total time (s)  Average (s)
5000      17.19175        0.             5000      17.08805        0.
5000      17.50795        0.             5000      17.05507        0.
5000      17.16375        0.             5000      17.17792        0.
5000      17.23577        0.             5000      17.19521        0.
5000      17.23892        0.             5000      17.10053        0.
5000      17.15758        0.             5000      17.13560        0.
5000      17.32604        0.             5000      17.45339        0.
5000      17.28276        0.             5000      17.13872        0.
5000      17.21697        0.             5000      17.06915        0.
5000      17.22403        0.             5000      17.06108        0.
5000      17.27535        0.             5000      17.14723        0.
5000      17.26896        0.             5000      17.05464        0.
5000      17.20626        0.             5000      17.12236        0.
5000      17.01037        0.             5000      17.02148        0.
5000      17.04836        0.             5000      17.00944        0.
5000      17.12405        0.             5000      17.19343        0.
5000      17.02451        0.             5000      17.07113        0.
5000      17.04895        0.             5000      17.07530        0.
5000      17.23523        0.             5000      17.04302        0.
5000      17.33500        0.             5000      17.70589        0.
5000      17.17614        0.             5000      17.13724        0.
5000      17.07520        0.             5000      17.06881        0.
5000      18.02079        0.             5000      17.10981        0.
5000      17.14642        0.             5000      17.18011        0.
5000      17.08830        0.             5000      17.15144        0.
5000      17.89255        0.             5000      17.03605        0.
5000      17.32945        0.             5000      17.13040        0.
5000      17.07474        0.             5000      17.18802        0.
5000      17.06996        0.             5000      17.07259        0.
5000      17.36765        0.

Figure: time taken to publish one message, UCHome vs. Rainbowfish (ms)

(6) Conclusions
- The architectures used by this project and by UCHome are both quite stable: the elapsed time does not change noticeably even with a very large amount of data. (Note that UCHome shows a few unstable measurements with higher elapsed time, possibly because the database flushing its buffers to disk introduced delays.)
- In publishing messages, this project is about 7 times as fast as UCHome: roughly 0.48 ms per message versus roughly 3.4 ms, computed from the total times above.

7.2 Source code and license

The platform's source code is released under the GNU GPL v2.
- Project home page: http:/
- SVN Checkout: http:/
- License:

GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you".

Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does.

1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.

You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.

2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:

a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.

b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.

c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program.

In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.

3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:

a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,

c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, p