1,Chapter 6: Analysis of Basic Protein Properties,2016-11-17
2,Main contents of this chapter,1. Analysis of protein physicochemical properties; 2. Protease digestion maps; 3. Hydropathy analysis; 4. Antigenicity analysis; 5. Surface analysis; 6. Flexibility analysis; 7. Secondary structure prediction
3,Analysis interface,The Assay Document is the file created and used by PROTEAN to examine and elucidate protein sequences. It is often referred to simply as an assay. The assay is composed of five principal parts: the status area, the palette tools, the Method Curtain, the assay surface and the Legend Curtain.,Method curtain,Analysis surface,Legend curtain
4,Two selection
5,Add a known feature
6,Join a feature
7,Using the Microscope,A Microscope is provided for
situations where you want to see the sequence responsible for a method's result. This lets you stay zoomed out for the overall picture but still see critical residues in a region. Click the Microscope palette tool to activate its display window. You can display the sequence as either a chemical formula or a space-filling model: click the appropriate button in the upper left corner of the Microscope window. The arrow box in the lower left corner controls the size of the amino acid sequence characters and the vertical size of the Microscope window. The arrow box in the lower right corner controls the horizontal size of the Microscope window. Drag either box to resize the mini-window. As the cursor's position changes over the assay surface, PROTEAN updates the Microscope window to display the residues underneath. You can examine the entire sequence by choosing either Linear Space Fill or Chemical Formula from the Model Structure submenu (under the Analysis menu).
8,Tabular Data,The first column shows an amino acid, the second its position, and the remaining columns show the value assigned to each residue at the given position. Double-click any value or column to show the parameter window for the method in question. Changing the order of plots on the assay rearranges the column summaries in the Tabular Data window.
9,Model structure - Helical Wheel
10,Model structure - Helical Net
11,Model structure - Beta Net
12,Model
structure - Space Fill Model
13,Model structure - Chemical Formula
14,Titration Curve
15,Composition
16,Method Outline (.pao),Apply to other proteins. Saves the different analysis methods and their preset display patterns (line color, line weight, fill pattern, fill color).
17,Protease map,Different protease sites; MS; Degradation; PMF (peptide mass fingerprinting)
18,Analysis methods,Protease map; Pattern: Prosite database, Ariadne file; Charge density; Secondary Structure: Coiled Coil, Garnier-Robson, Deleage & Roux, Chou-Fasman; Hydropathy: Goldman-Engelman-Steitz, Kyte-Doolittle, Hopp-Woods; Antigenicity: Sette MHC Motifs, AMPHI, Rothbar
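d-Taylor, Jameson-Wolf; Amphiphilicity: Eisenberg; Surface Probability: Emini; Flexibility: Karplus-Schulz
19,Charge density,The charge-averaging method predicts positively or negatively charged regions by summing the charges of the residues within a specified range. The result is shown as one plot of average charge and two region plots, one for positively charged regions and one for negatively charged regions. Because charged residues tend to lie on the protein surface, the method helps predict surface features. DNASTAR uses the pK table of White, Handler and Smith (1964) for this prediction. The same pK table is used to compute titration curves of protein and peptide sequences and to obtain the isoelectric point (pI). Parameters: the number of residues used to compute the average charge, i.e. the window size. If the window is too small, highly charged residues (such as lysine and cysteine) produce a lot of noise; if it is too large, few charged regions are found and the region plot is flat. pH: the pH at which each residue's charge is evaluated from its pK value. The positive threshold is the minimum value reported as a positive region; the negative threshold is the corresponding limit for negative regions. Limitations: the method can be very sensitive to parameter changes, so try several parameter settings and compare the resulting plots to reveal subtle differences.
20,Charge density
21,Charge density
22,Secondary structure prediction (1): Coiled-coil,This method predicts coiled-coil structure following Parry (1982). The structure consists of two right-handed helices wound around each other at a packing angle of 20 degrees to form a left-handed supercoil. The helices are stabilized by hydrophobic interactions between side chains. They are characterized by nonpolar residues arranged in a regular heptad repeat at 3.5 residues per turn, and it is this regularity that makes coiled-coil prediction possible. Myosins and keratins are typical examples of this quaternary structural element.
23,Coiled-coil,PROTEAN first assigns each amino acid a relative frequency of occurrence in coiled coils. The frequencies come from a statistically analyzed database of known coiled-coil sequences. The residues are then scored in a sliding window of 28 residues, the minimum length (four to five heptads) at which a coiled coil is stable. The values in each window are multiplied together and the 28th root of the product is taken. The initial maximum score for each residue is 196. The result is displayed as a peak plot or a region plot.
24,Coiled-coil,Parameters: there is a single threshold that determines where regions appear in the region plot. A value of 1.3 is regarded as the minimum for known coiled coils. Lower it to 1.1 to find less strict coiled coils such as those in globular proteins; raise it to 1.5 for the strict coiled coils of fibrous proteins. Limitations: the method is statistical, based on translated GenBank data and known coiled-coil structures. If your protein differs greatly from the proteins used to test the method, the prediction will not be accurate. In addition, although three or four helices packed side by side can also be found, the method is not recommended for other structures that merely contain similar nonpolar heptad repeats. It can be combined with the Goldman-Engelman-Steitz method to predict transmembrane helices, and the GES method can be tuned to give an amphipathy analysis.
25,Secondary structure prediction (2): Garnier-Robson,The algorithm predicts secondary structure from the amino acid sequence. It is a statistical method based on the crystal structures of known proteins, optimized in particular on dihedral angles (C-N-Cα-C-N) and hydrogen-bond networks for α-helical (H), β-pleated sheet (E, extended chains), β-turn (T) and coil (C) regions. The Garnier-Robson method examines the propensity of a particular residue to occur in a given known structure. Before the most probable conformation is assigned to a residue, the 16 surrounding residues (8 on each side) are examined. If they favor a particular structure, the residue is assigned to that structure type; otherwise it is re-evaluated as another structure type.
26,Garnier-Robson,Parameters: The only parameter involved in Garnier-Robson predictions is the use of decision constants. If you assign decision constants, you should have prior knowledge of the circular dichroism data. No Decision Constants makes no assumptions about the global α-helix and β-sheet content. Calculated Decision Constants are computer derived from global α-helix and β-sheet probabilities. The constants added are based on three protein classes: proteins with less than 20%, between 20% and 50%, and over 50% α-helix or β-sheet.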
Specified Decision Constants are user defined, from circular dichroism (CD) data. Limitations: This method is a statistical approach, based on observed residue patterns in 25 proteins. If your protein differs substantially from the proteins used to establish the model secondary structure frequencies, this method may give inaccurate structural predictions.
27,Secondary structure prediction (3): Deleage & Roux,The prediction has three steps: protein class prediction, secondary structure prediction, and a second prediction with optimized parameters. This double prediction method predicts the protein class separately and uses it to bias the secondary structure prediction. Step 1 assigns the protein to a structural class from its amino acid composition. The classification is the same as that of Nakashima (1986), except that α/β proteins are split into two subclasses according to their similarity to the α and β classes. Step 2 predicts secondary structure from the occurrence probabilities of the four conformational types (helix, sheet, turn and coil) in the secondary structure database of Kabsch and Sander (1983). Step 3 optimizes each predicted region, using the predicted protein class, according to the following criteria: (1) maximal accuracy in secondary structure prediction, (2) maximal agreement between predicted and observed content in secondary structure and (3) no loss of accuracy for the majority of proteins in a given class.
28,Deleage & Roux,Parameters: Using any class prediction other than Computer Calculated forces the method to group the protein into a given class. Use the Computer Calculated parameter unless you have prior knowledge that justifies forcing a class prediction. Limitations: If the algorithm successfully predicts the class of a protein, structural predictions are somewhat more accurate than with methods that make no class assumption (original Garnier-Robson or original Chou-Fasman). If a protein does not fall into a discrete class, prediction accuracy can be compromised.
29,Secondary structure prediction (4): Chou-Fasman,Chou and
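Fasman's (1990) algorithm predicts protein secondary structure from the amino acid sequence; it is a statistical method based on known protein crystal structures. Initially, the occurrences of 2473 residues from 15 proteins in α-helix, β-sheet and coil were tabulated. From the relative frequencies of each residue type, the authors derived for each residue its conformational parameters, the probability of occurring in a given secondary structure type, and the fraction of residues of each type found in that structure. A conformational parameter is essentially the preference of a residue for α-helix, β-sheet or coil (later β-turn). Chou and Fasman later extended the analysis to 29 protein structures (4741 residues), and Chou subsequently extended it to 64 structures and four protein classes: Alpha, Beta, Alpha+Beta and Alpha/Beta. The Alpha class contains mostly α-helix with little β-sheet; hemoglobins and cytochromes are typical. The Beta class contains mostly β-sheet with little or no α-helix; immunoglobulins and serine proteases are typical. The Alpha+Beta class contains both α-helix and β-sheet, located in separate domains. The Alpha/Beta class contains α-helix and β-sheet mixed within the same domain; dehydrogenases and kinases are typical. The default parameters come from the 64 protein structures. The principle of the algorithm is simple: using the conformational parameters, locate nucleation sites, decide whether each is helix or sheet, and extend it in both directions along the sequence. Extension stops when another structure type becomes more favorable, which ends the region. The procedure is repeated for the other structure types until the whole sequence has been evaluated. By default, turn regions are those in which neither α-helix nor β-sheet forms.
30,Chou-Fasman,Parameters: All parameters are radio button toggles. You can use only one of these conformational parameter sets at a time (although you can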
have multiple plots using other conformational parameters). 29 Proteins uses the conformational parameters from the 29 protein structures. 64 Proteins uses those from the 64 protein structures. Alpha Class, Beta Class, Alpha+Beta Class and Alpha/Beta Class use the conformational parameters from Chou's class assignments (1990). The two constants Alpha Region Threshold and Beta Region Threshold limit whether a region is defined as α-helix or β-sheet. If Pα ≥ 103 and Pα > Pβ, a region is predicted as helical. If Pβ ≥ 105 and Pβ > Pα, a region is predicted as β-sheet. These default values are from Chou-Fasman (1978), where Pα = 1.03 and Pβ = 1.05 respectively. These thresholds apply only to the 29 Proteins and 64 Proteins methods.
31,Chou-Fasman,Limitations: This algorithm is statistical; that is, it looks at the probability that a given residue exists in a given secondary structure element. Inaccuracies can arise if your protein sequence is substantially dissimilar from the test case proteins and test case protein families. Chou predicts 80% accuracy in assigning secondary structure using the 64 protein method.
32,Secondary structure prediction (5): PSIPRED,The bioinformatics site http://bioinf.cs.ucl.ac.uk/psipred/ provides PSIPRED, a neural-network secondary structure prediction method built on the PSI-BLAST search tool. Results are returned to an academic email address (addresses with commercial domains cannot be used).
33,Secondary structure prediction (5): PSIPRED,Results are returned to an academic email address; addresses with commercial domains must apply for a password.
34,PSIPRED,References
35,PSIPRED,http://bioinf.cs.ucl.ac.uk/psipred/
36,Hydro
pathy: Goldman-Engelman-Steitz,This method uses the model of Goldman, Engelman and Steitz (1986) to predict nonpolar α-helices that may span a cell membrane. The search for nonpolar helices is based on a polarity index for the amino acid residues along the sequence. The polarity index reflects how strongly the residue in question interacts with the nonpolar region of the lipid bilayer. For free-energy reasons, the method does not consider coil or β-sheet conformations, because such structures are unlikely to insert into the lipid bilayer from the cytoplasm. Each amino acid is assigned a transmembrane free-energy value based on its hydrophilic and hydrophobic components and on the surface area of each residue in an α-helix. This makes the method more accurate for predicting transmembrane helices than methods that rely on residue hydropathy indices alone.
37,Goldman-Engelman-Steitz,Parameters: The only parameter is how many Residues to Average when constructing each point in the plot. The default is 20, to reflect a common transmembrane α-helix. Factors in the choice of window length are the hydrophobic length of the bilayer and the orientation of a potential helix within it. 21 residues are required to span a lipid bilayer 30 Å long, as the interval between residues in an α-helix is 1.5 Å. Limitations: This algorithm is much more accurate at predicting membrane-bound helices than other hydropathy methods, especially if the helix contains polar residues. It makes no attempt to predict other structural elements and would not be accurate for non-helical membrane-bound structures. Non-helical membrane-bound structures are not very common, due to the high free energy costs of folding in a polar environment and functioning in a non-pol
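ar environment.
38,Hydropathy: Kyte-Doolittle,The Kyte and Doolittle (1982) method predicts hydropathy from the amino acid sequence. Hydropathy is defined as the degree of affinity for water; positive values are hydrophilic and negative values hydrophobic. Each value is the average of the residue hydropathy values within a preset window, and the average is plotted at the center of the window. The residue hydropathy values are derived from water-vapor transfer free energies and from the interior-exterior distribution of residue side chains.
39,Kyte-Doolittle,Parameters: The only variable is how large a window to average hydropathy values over. The authors recommend 7 to 11 residues, as anything less than 7 displays too much noise and anything greater than 11 misses small hydropathic regions. Limitations: When Kyte and Doolittle created their hydropathy table, they decided not to use water-to-ethanol transfer free energies, as ethanol is not necessarily a neutral, non-interacting solvent. For each residue, they looked at water-vapor transfer free energies and the interior-exterior distribution of side chains. When these values conflicted, the final hydropathy value was determined by averaging, by other empirical data or by the authors' somewhat arbitrary choices. For common residues, more data was available for determining hydropathy. For less common residues (Tyr, His, Pro, Trp), this becomes more problematic.
40,Hydropathy: Hopp-Woods,The Hopp-Woods (1981) method looks for antigenic determinants by finding the regions of greatest local hydrophilicity in a protein. Such a position is taken to be at or near an antigenic determinant. The authors give two reasons for using hydrophilicity: 1) antigenic determinants usually lie in regions highly exposed to solvent; 2) charged, hydrophilic side chains are common in antigenic determinants. Each residue is assigned a hydrophilicity value, and the values are averaged over a hexapeptide window. Except for modified values for Asp, Glu and Pro, the residue values are taken from Levitt's table. Tests on proteins with known antigenic determinants showed that the highest hydrophilicity peak accurately predicted an antigenic site, whereas lower peaks may or may not correspond to antigenic sites.
41,Hopp-Woods,Parameters: The default Residues to Average is set at seven, wh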
ile the original method uses six. There is very little difference in antigenic site prediction with this change, except that our change requires a slightly larger antigenic site. The authors do not recommend setting this value below five, as too many regions are falsely predicted, or above eight, as averaging over that many residues creates a curve too smooth to locate potential antigenic sites. Limitations: Not all antigenic sites correlate with hydrophilicity. The highest peak has the greatest chance of being an antigenic site if the site is correlated with hydrophilicity. Lower peaks tend to become noise. If your protein is substantially divergent from the test proteins used to determine the hydrophilicity values, predictions may be in error.
42,Antigenicity: Sette MHC Motifs,The Sette MHC II motif method, proposed by Sette, A. (1989), predicts peptide antigenic sites that can interact with mouse MHC II proteins of the d haplotype. It comprises two methods, one for predicting IAd epitopes and one for IEd. The IAd method is based on the chicken ovalbumin sequence pattern 327-33
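2 (V, H, A, A, H, A) and on how peptide binding to IAd and binding affinity change when residues are substituted. From a set of 62 hexapeptides, a sequence motif for proteins binding to IAd was established. Based on known experimental data, Dayhoff substitution patterns and position within the hexapeptide, each residue receives a score of 1, 2 or 3. A score of 3 means the residue is very likely to occur in the motif; a score of 1 means it is less likely. Once every residue in the hexapeptide has a score, the component scores are multiplied together; a region scoring at least 400 is considered IAd-positive. The IEd motif method is based on a general pattern for the IEd antigen-binding site, derived from a database of known antigenic determinants made up of 62 hexapeptides. The IEd recognition pattern has basic residues (R, H, K) at positions 1, 2 and 3, basic residues at positions 4 and 6 as well, and an uncharged residue (A, C, F, G, I, L, M, P, S, T, V, Y) at position 5. Residues that fit these rules are given positive values in the region plot. The IAd and IEd MHC II motif methods were more than 75% accurate when validated on proteins with known IAd and IEd epitopes.
43,Sette MHC Motifs,Parameters: The only parameter is whether or not to allow one deleterious substitution for the IAd motif. If you allow one substitution, you set the t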
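hreshold for a positive IAd region at 400. If you don't allow a deleterious substitution, the threshold is set to 729 (3^6). There are no parameters which affect the IEd method. Limitations: This method is intended to find x and y motifs for the d haplotype of mouse MHC II proteins only. It is unlikely to be effective for alternate haplotypes or non-mouse MHC II proteins. It can be an effective tool for designing peptide epitopes when used in conjunction with AMPHI and the Rothbard-Taylor method.
44,Antigenicity: AMPHI,Based on the model of Margalit and Berzofsky (1987), AMPHI predicts immunodominant helper T-lymphocyte epitopes in a protein's primary sequence. Most known helper T-cell antigenic sites consist of amphipathic helices, that is, helices with one polar face and one nonpolar face. Hydrophobicity is distributed periodically, with the angular period typical of an α-helix or a 3-10 helix. The periodicity of an α-helix is 100° (3.6 residues per turn); that of a 3-10 helix is 120°. First, the primary amino acid sequence is converted into a sequence of hydrophobicity values using the Fauchère and Pliska table; second, the hydrophobicity sequence is divided into overlapping blocks of 7 or 11 residues; third, each block is tested for agreement with the periodicity of an amphipathic helix; finally, PROTEAN identifies stable or unstable amphipathic helices. The results can be displayed as a hydrophobicity plot, an AMPHI region plot, an α-helix plot and/or a 3-10 helix plot (amphipathic index), and an AMPHI intensity plot. The AMPH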
I intensity plot is used to determine the relative strength of the predicted amphipathic regions. Using known T-cell antigenic sites, the authors found that combining the Fauchère-Pliska hydrophobicity scale with a least-squares sine fit of the intensity plot gave the most accurate results.
45,AMPHI,Parameters: The only parameter is a sensitivity threshold. Use the default Low Amphipathic Content for most protein sequences. Low Amphipathic Content uses a minimum amphipathic index of 4 to create the region graph from three (or more) consecutive regions of length 7. For proteins exhibiting a high amphipathic content, use Highly Amphipathic. Highly Amphipathic uses an amphipathic index of 8 to create the region plot from three (or more) consecutive regions of length 11. Limitations: The underlying assumption of the AMPHI method is that T-cell antigenic sites are composed of amphipathic helices. While the majority of known T-cell epitopes contain amphipathic helices, this method will miss those which do not. It is also limited to T-cell epitopes. B-cell antigenic sites exhibit a large
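r tertiary character which this method does not attempt to resolve.
46,Antigenicity: Rothbard-Taylor,The Rothbard & Taylor method (1988) looks for potential T-lymphocyte antigenic determinants that contain a common sequence pattern. After proteolytic processing, potential T-cell epitopes can bind class I or class II molecules of the major histocompatibility complex (MHC I and MHC II). From a database of known antigenic determinants, the authors found that potential T-cell epitopes presented by MHC I or MHC II molecules share a common sequence pattern (37 of 46 have it). The pattern has 4 or 5 residues. The 4-residue form is: residue 1, glycine or charged; residue 2, hydrophobic; residue 3, hydrophobic; residue 4, polar or glycine. The 5-residue form is: residue 1, glycine or polar; residue 2, hydrophobic; residue 3, hydrophobic; residue 4, hydrophobic or proline; residue 5, polar or glycine. Charged residues are D, E, H, K, R; hydrophobic residues are A, V, L, I, F, M, W, T, Y; polar residues are D, E, H, R, K, N, Q, S, T. Some residues belong to more than one class. The results of the T-cell motif search are displayed as a region plot.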
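The 4- and 5-residue Rothbard-Taylor patterns can be scanned for with a few lines of code. The following is a minimal sketch in Python, assuming only the residue classes listed on the slide; the names `find_motifs`, `matches`, `PATTERN_4` and `PATTERN_5` are hypothetical and are not part of PROTEAN, whose actual implementation may differ.

```python
# Residue classes as listed in the slide text (T is in two classes).
CHARGED = set("DEHKR")
HYDROPHOBIC = set("AVLIFMWTY")
POLAR = set("DEHRKNQST")

# Each pattern is a list of allowed-residue sets, one per position.
PATTERN_4 = [CHARGED | {"G"}, HYDROPHOBIC, HYDROPHOBIC, POLAR | {"G"}]
PATTERN_5 = [POLAR | {"G"}, HYDROPHOBIC, HYDROPHOBIC,
             HYDROPHOBIC | {"P"}, POLAR | {"G"}]


def matches(window, pattern):
    """Return True if every residue in the window is allowed at its position."""
    return len(window) == len(pattern) and all(
        residue in allowed for residue, allowed in zip(window, pattern)
    )


def find_motifs(sequence):
    """Scan a one-letter protein sequence for both patterns.

    Returns a list of (start, length, segment) tuples with 1-based start
    positions, sorted by start and then by length.
    """
    seq = sequence.upper()
    hits = []
    for pattern in (PATTERN_4, PATTERN_5):
        size = len(pattern)
        for i in range(len(seq) - size + 1):
            window = seq[i:i + size]
            if matches(window, pattern):
                hits.append((i + 1, size, window))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real protein.
    for start, size, segment in find_motifs("MKAVDGAVLS"):
        print(start, size, segment)
```

Each hit corresponds to one segment that a region plot would mark. Overlapping hits are reported separately here; merging them into continuous regions is left out of the sketch.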