Scaling the World’s Largest Photo Blogging Community
Farhan “Frank” Mashraqi, Senior MySQL DBA, Fotolog, Inc.
Credits: Warren L. Habib, CTO; Olu King, Senior Systems Administrator

Introduction
- Farhan Mashraqi, Senior MySQL DBA, Fotolog, Inc.
- Known on PlanetMySQL as Frank Mash
- Author of the upcoming “Pro Ruby on Rails” (Apress)
- Contact / Blog: http:/ http:/

What is Fotolog?
- Social networking
- Guestbook comments
- Friend/Favorite lists
- Members create “Social Capital”
- “One photo a day”
- Currently the 25th most visited website on the Internet (Alexa)

History
http:/
[Screenshot: Fotolog home page]

Fotolog
[Screenshot: a Fotolog member page]

Fotolog Growth
- 228 million member photos
- 2.47 billion guestbook comments
- 20% of members visit the site daily
- An average user spends 24 minutes a day on the site
- 10 guestbook comments per photo
- On average, a photo is seen by 1,000 or more people
- 7 million members and counting, with “explosive growth
  in Europe”
- Italy and Spain are among the fastest-growing countries
- Recently broke the record of 500K photos uploaded in a single day
- 90 million page views

Fotolog vs. Flickr
[Chart: Fotolog vs. Flickr]

Technology
- Sun Solaris 10
- MySQL
- Apache
- Java / Hibernate
- PHP
- Memcached
- 3Par
- IBRIX
- StrongMail

MySQL at Fotolog
- 32 servers
- Specification of servers
- Four “clusters”: User, GB, PH, FF
- Non-persistent connections (PHP)
- Connection pooling (Java)
- Mostly MyISAM initially; later mostly converted to InnoDB
- Application-side table partitioning
- Memcache

Image Storage / Delivery
- MySQL is used to store image metadata only
- 3Par (utility storage)
- Thin provisioning (dedicate-on-allocation vs. dedicate-on-write)
- How fast is it growing each day?
- Frequently accessed vs. infrequently accessed media
- Third-party CDN: Akamai/Panther

Important Scalability Considerations
- Do you really need five nines (99.999%) availability?
- Budget
- Time to deploy
- Testing
- Can we afford:
  - A single point of failure (SPF)?
  - Not having read redundancy? (User, PH, GB, FF)
  - Not having write redundancy? (User, PH, GB, FF)

Partitioning
[Diagram: SHARD 1, SHARD 2, SHARD 3; Table_v1, Table_v2, Table_v3, Table_v4]

Partitioning thoughts
[Diagram: Ideal distribution]

GB current
[Diagram: application servers reading from and writing to db4, db18, db22, db23, db24, db25, db26, db27, db28, db30 and db32; a single point of failure is marked]

GB Scalability
[Diagram: application servers reading from and writing to db4, db18, db22, db23, db24, db25, db26, db27, db28, db30 and db32, with the data split into the ranges 00-08, 09-17, 18-26, 27-35, 36-44, 45-53, 54-62, 63-71, 72-80, 81-89 and 90-99; labels: Slave, Master/DRBD]

Current Scheme for fl_db1 Replication (PH)
[Diagram: application servers reading from and writing to DB1, DB2, DB3, DB8 and DB12 (one labeled Slave). Application servers issuing PH queries read from DB7, DB9, DB10, DB11, DB13, DB14, DB15, DB16 and DB29 over replication links. The servers are labeled with the character groups RTX, FSW, 05DHN, AEK, 16JOQUZ, 28IP, _, 39B, 4C, 7GLVY and M, and an FF replication link is also shown]

Proposed Scheme for PH (Write & Read)
[Diagram: application servers sending PH reads and writes to servers 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 and 29]
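The GB Scalability slide and the proposed PH scheme both split data into eleven two-digit ranges (00-08 through 90-99), with one server per range. Below is a minimal routing sketch that rests on two assumptions, neither of which the slides confirm:

- The two-digit value is the user ID modulo 100.
- The i-th range belongs to the i-th server, in the order the slides list the servers.

`route`, `bucket`, `RANGE_STARTS`, `GB_SERVERS` and `PH_SERVERS` are hypothetical names.

```python
from bisect import bisect_right
from typing import List

# Lower bounds of the eleven two-digit ranges: 00-08, 09-17, ..., 81-89, 90-99
RANGE_STARTS = [0, 9, 18, 27, 36, 45, 54, 63, 72, 81, 90]

# Servers in the order the slides list them.
# ASSUMPTION: the i-th server owns the i-th range.
GB_SERVERS = ["db4", "db18", "db22", "db23", "db24", "db25",
              "db26", "db27", "db28", "db30", "db32"]
PH_SERVERS = ["db7", "db8", "db9", "db10", "db11", "db12",
              "db13", "db14", "db15", "db16", "db29"]


def bucket(user_id: int) -> int:
    """ASSUMPTION: the two-digit value is the user ID modulo 100."""
    return user_id % 100


def route(user_id: int, servers: List[str]) -> str:
    """Return the one server that takes both reads and writes for this user."""
    if len(servers) != len(RANGE_STARTS):
        raise ValueError("need exactly one server per range")
    # bisect_right finds the last range whose lower bound is <= the bucket
    return servers[bisect_right(RANGE_STARTS, bucket(user_id)) - 1]


if __name__ == "__main__":
    for uid in (1000, 1017, 123490):
        print(uid, route(uid, GB_SERVERS), route(uid, PH_SERVERS))
```

In this sketch, the range table lives in application code, in the spirit of the “Application side table partitioning” item above. Each user's writes land on exactly one server, so no single master receives every write. The trade-off is that rebalancing means copying a whole range to another server before the list entry is changed.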
In the proposed PH scheme, reads and writes are split by the same ranges as GB (00-08, 09-17, 18-26, 27-35, 36-44, 45-53, 54-62, 63-71, 72-80, 81-89, 90-99), with a link labeled “To User cluster”.

AUTO-INC Table Lock Contention
[Diagram: ten concurrent SELECT threads against MySQL; axis: thread concurrency]
GOOD TIMES: SELECTs do very well with increased concurrency. QPS: 500+

AUTO-INC Table Lock Contention
[Diagram: five SELECT threads and two INSERT threads against MySQL; axis: thread concurrency]
WARNING: As more SELECTs come in, AUTO-INC lock contention starts causing problems.

AUTO-INC Table Lock Contention
[Diagram: a mix of SELECT and INSERT threads against MySQL, with INSERTs outnumbering SELECTs; axis: thread concurrency]
PROBLEM

InnoDB Tablespace Structure (Simplified)
[Diagram: PK / clustered index and secondary index]
- The secondary index stores the PK (clustered index key)
- 6-byte record header: links consecutive records together and is used in row-level locking
- The clustered index contains fields for all user-defined columns, plus:
  - 6-byte transaction ID
  - 7-byte roll pointer
  - 6-byte row ID, if no PRIMARY KEY or UNIQUE NOT NULL index is defined
- Record directory: an array of pointers to each field of the record; each pointer is 1 byte if the total length of the fields in the record is less than 128 bytes, 2 bytes otherwise
- Data part of the record

InnoDB Index Structure (Simplified)
[Diagram: data page; the PK index / clustered index holds the PK plus the row data, and the secondary index holds PK values]

Old Schema

    CREATE TABLE guestbook_v3 (
      identifier bigint(20) unsigned NOT NULL auto_increment,
      user_name varchar(16) NOT NULL default '',
      photo_identifier bigint(20) unsigned NOT NULL default 0,
      posted datetime NOT NULL default '0000-00-00 00:00:00',
      PRIMARY KEY (identifier),
      KEY guestbook_photo_id_posted_idx (photo_identifier, posted)
    ) ENGINE=MyISAM;

[Diagram: reads; data pages ordered by identifier (PK), looked up by secondary key]

New Schema

    CREATE TABLE guestbook_v4 (
      identifier int(9) unsigned NOT NULL auto_increment,
      user_name varchar(16) NOT NULL default '',
      photo_identifier int(9) unsigned NOT NULL default 0,
      posted timestamp NOT NULL default '0000-00-00 00:00:00',
      PRIMARY KEY (photo_identifier, posted, identifier),
      KEY identifier (identifier)
    ) ENGINE=InnoDB;

1 row in set (7.64 sec)

The separate KEY identifier (identifier) is still required because InnoDB needs an AUTO_INCREMENT column to be the first column of some index.

[Diagram: data pages ordered by the composite key starting with photo_identifier (FK), looked up by primary key]

Pending preads (Optimizing Disk Usage)
[Graph: pending preads; annotation: “Very low read requests per second”]

Pending reads / writes / Proposed
[Graph] Throughput is not as important as the number of requests.

Pending reads / writes / Proposed
[Graph]

Pending reads
[Graph]

MySQL Performance Challenges
- Finding the source of the problem
- Mostly disk-bound in mature systems
- Is the query cache hurting you?
- Adding RAM helps dodge the bullet
- Disk striping
- Restructuring tables for optimal performance
- LD_PRELOAD_64=/usr/lib/sparcv9/libumem.so (preloads the Solaris libumem memory allocator into 64-bit processes)

Considerations for future growth
- SQLite?
- File system?
- PostgreSQL?
- Make the application better and optimize tables?

Things to remember
- Know the problem
- Know your application
- Know your storage engine
- Know your requirements
- Know your budget

Questions?
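Looking back at the guestbook_v3 and guestbook_v4 schemas: InnoDB stores rows in clustered-index (primary key) order. Making `(photo_identifier, posted, identifier)` the primary key therefore places all comments for one photo next to each other. Ordering by `identifier` scatters them in arrival order.

The toy model below counts the distinct data pages that a "comments for one photo" read must touch under each ordering. It assumes a fixed capacity of 100 rows per page and comments arriving interleaved across photos. These are illustrative numbers, not Fotolog or InnoDB figures, and all function names are hypothetical.

```python
import random
from typing import List, Tuple

# (identifier, photo_identifier, posted); posted is modeled as an increasing integer
Row = Tuple[int, int, int]

# ASSUMPTION: illustrative page capacity, not a real InnoDB page size
ROWS_PER_PAGE = 100


def make_comments(n_photos: int, per_photo: int, seed: int = 0) -> List[Row]:
    """Generate comments that arrive in random order across photos.

    identifier plays the role of the auto-increment value, so it reflects
    arrival order; posted is set to the same value.
    """
    rng = random.Random(seed)
    arrivals = [photo for photo in range(n_photos) for _ in range(per_photo)]
    rng.shuffle(arrivals)
    return [(ident, photo, ident) for ident, photo in enumerate(arrivals, start=1)]


def old_layout(rows: List[Row]) -> List[Row]:
    """guestbook_v3: data ordered by identifier."""
    return sorted(rows, key=lambda r: r[0])


def new_layout(rows: List[Row]) -> List[Row]:
    """guestbook_v4: clustered on (photo_identifier, posted, identifier)."""
    return sorted(rows, key=lambda r: (r[1], r[2], r[0]))


def pages_touched(layout: List[Row], photo: int) -> int:
    """Count the distinct pages that hold the given photo's comments."""
    return len({pos // ROWS_PER_PAGE
                for pos, row in enumerate(layout) if row[1] == photo})


if __name__ == "__main__":
    rows = make_comments(n_photos=1000, per_photo=10)
    print("guestbook_v3 pages read:", pages_touched(old_layout(rows), photo=42))
    print("guestbook_v4 pages read:", pages_touched(new_layout(rows), photo=42))
```

With 1,000 photos of 10 comments each, the composite-key ordering packs one photo's comments into a single page. The identifier ordering typically spreads them over close to ten pages. Fewer page reads per request is in line with the deck's focus on disk usage and request counts rather than raw throughput.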