Splice Site Tools
A Comparative Analysis Report

Beth Hellen

Splice Site Tools Analysis Report November 2009

Contents

Introduction
Methods
Results
Conclusions
References
Appendix 1 Variants found in literature

Introduction

Splicing is a process which modifies pre-mRNA after transcription. It allows introns to be removed and exons to be joined together to form mature mRNA, ready for translation into protein. The splice site junction, found where an intron meets an exon, contains multiple sequence motifs. These motifs provide signals to allow correct splicing to occur. The best characterised of these are the acceptor and donor splice site signals. These signals consist of invariant dinucleotides at positions +1, +2, -1 and -2 of the intron and less well conserved nucleotides both within the immediately adjoining exonic sequence and deeper into the intron from the +3 and -3 positions (Seif et al., 1979). The specific splicing of a gene can easily be affected by mutations in the sequence surrounding the splice site junction. This can lead to alternate splicing and thus adversely affect the translated protein (Novoyatleva et
al., 2006; Tazi et al., 2009).

In-silico splice site prediction tools can be used to predict the effect of a genetic variant on splicing. A large number of prediction tools are currently available, either as standalone programs or as part of the Alamut (http://www.interactive-[...]) or Human Splicing Finder (Desmet et al., 2009) interfaces. Some small analyses of these algorithms have been carried out, but no large-scale analyses (Hartmann et al., 2008; Holla et al., 2009; Houdayer et al., 2008). Although the UV guidelines (Bell et al., 2007) provided by the CMGS (http://www.cmgs.org/) suggest several splice site prediction algorithms, the performance of these algorithms has not been formally assessed and they may give divergent results.

This analysis aims to provide an assessment of the performance of these algorithms in the prediction of splicing-related variant pathogenicity. It will also assess the scope of the splice site prediction tools to ensure that they can be used in the most appropriate way. The analysis will allow scientists to use splice site prediction tools in the prediction of pathogenesis with more confidence.

In this analysis, six of the most common donor and acceptor prediction algorithms have been assessed for their ability to predict the pathogenicity of splice site variants. The algorithms chosen were those suggested by the UV guidelines, plus MaxEntScan, which is used as part of the Alamut and HSF splicing interfaces. The six algorithms were: GeneSplicer (Pertea et al., 2001), Human Splicing Finder (HSF) (Desmet et al., 2009), MaxEntScan (Yeo [...]

Methods

MaxEntScan was included because it is used in both the HSF and Alamut splicing interfaces. A set of 265 pathogenic variants and 15 non-pathogenic variants from a total of 180 genes (see Figure 1 and Appendix 1) was retrieved from the literature. These variants were used to assess the splice site prediction algorithms using their default settings and recommended lengths of sequence. Sensitivity (equation 1), specificity (equation 2) and accuracy (equation 3) were calculated, as were the standard errors for each of the
statistics. For the purposes of this analysis a true positive (TP) was defined as a pathogenic variant correctly classified as pathogenic and a true negative (TN) as a non-pathogenic variant correctly classified as non-pathogenic. A false negative (FN) is accordingly a pathogenic variant classified as non-pathogenic, and a false positive (FP) a non-pathogenic variant classified as pathogenic. A change in splice site signal of 10% was considered to predict a pathogenic effect.

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{1} \]

\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{2} \]

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3} \]

A second set of sensitivity, specificity and accuracy calculations was made for those variants which did not fall into the invariant dinucleotide positions at -1, -2, +1 and +2. This dataset consisted of 110 pathogenic variants and 15 non-pathogenic variants. The variants occurred in 83 different genes. This analysis allows the algorithms to be assessed on their performance with the more difficult splice site variants.

The UV guidelines for splice site analysis recommend the use of three prediction algorithms to give a consensus prediction. Combinations of three high-performing algorithms were compared to determine whether the accuracy was improved. The criterion for categorising a variant as pathogenic or non-pathogenic was that at least two of the algorithms must agree on the prediction. The accuracy scores were calculated and compared to those given by the single algorithms.

To test the range of predictions made by the algorithms at each intronic position near the splice site junction, an in-silico analysis was performed. Thirteen acceptor and donor splice site junctions from BRCA1 and BRCA2 were analysed. Only junctions where the wild type splice site
signal was found by all four of the highest performing algorithms were used. The wild type base at each position from +1 to +10 or -1 to -10 was mutated in-silico to each of the remaining three nucleotides and the proportional change in splice site signal given by each algorithm was recorded. The mean change in splice site prediction (equation 4) at each position was plotted for each algorithm. In equation 4, SS_M is the mutated splice site signal, SS_W is the wild type splice site signal and N is the number of examples analysed.

\[ \text{Mean change} = \frac{1}{N} \sum_{i=1}^{N} \frac{SS_{M,i}}{SS_{W,i}} \tag{4} \]

Results

Pathogenic and non-pathogenic splice site related variants retrieved from the literature were found at a range of positions relative to the splice site junction (Figure 1). The majority of splice site related pathogenic mutations
used in this analysis were found within intronic positions between 1 and 10 nucleotides from the splice site junction. However, 40 of the variants were found in positions within the exon, and pathogenic mutations were also found at 100bp from the splice site junction. Only 15 non-pathogenic variants were found and they mainly occurred at positions further from the splice site junction. The small number of non-pathogenic variants arises from the problem of non-reporting of negative results. This is likely to increase the error associated with the specificity scores.

[Figure 1: chart of Frequency against Intronic position]

Figure 1 Chart showing the position of variants retrieved from the literature. Variants in exonic positions are shown at 0, variants 50bp from the splice site junction are binned and represented as a single frequency at 50bp from the splice site. Black lines represent the
frequency of pathogenic variants and red lines represent the frequency of non-pathogenic variants.

The sensitivity, specificity and accuracy scores showed that the four highest performing algorithms were NNSplice, MaxEntScan, GeneSplicer and SSFL (Figure 2). These algorithms achieved between 80 and 92% accuracy and sensitivity. The specificity scores (between 73 and 93%) were less reliable due to the smaller number of variants tested. These four algorithms are those implemented through the Alamut interface. It is possible that the ease of interpretation of the results, when using the Alamut interface, has influenced this result. With the HSF interface it was more difficult to determine the predicted difference in splice site signal.

[Figure 2: chart of Acc, Sens and Spec (scale 0 to 1) for MaxEntScan (A), NNSplice (A), SSFL (A), GeneSplicer (A), MaxEntScan (H), HSF and NetGene2]

Figure 2 Accuracy, Sensitivity and Specificity values for each of the splice site prediction algorithms tested. Sensitivity measures the ability to predict pathogenic variants (TP) and specificity measures the ability to predict non-pathogenic variants (TN).

The removal of variants occurring at
+1, +2, -1 and -2 positions reduced the performance of the algorithms, as was expected (Figure 3). However, two algorithms (MaxEntScan and NNSplice) still achieved an accuracy score of 80%. Therefore it can be seen that these algorithms perform reasonably well, even with variants where it is more difficult to predict the splicing effect.

[Figure 3: chart of Acc, Sens and Spec (scale 0 to 1) for MaxEntScan (A), NNSplice (A), SSFL (A), GeneSplicer (A), MaxEntScan (H), HSF and NetGene2]

Figure 3 Accuracy, Sensitivity and Specificity values for each of the splice site prediction algorithms tested. Only variants which did not occur at one of the +1, +2, -1 or -2 positions were analysed.

The accuracy given by the consensus prediction of splice site signals was found to be between 86% and 92% for all combinations (Figure 4). The highest accuracy obtained through a consensus method was comparable to that given by MaxEntScan when implemented through Alamut. None of the consensus methods achieved an accuracy that was significantly higher than the individual algorithms.

[Figure 4: chart of Accuracy (scale 0.85 to 0.92) for Groups 1 to 4]

[Table: Groups 1 to 4, each marking three of the four algorithms SSFL, MES, NNSplice and GeneSplicer with an X; column positions of the marks not recoverable]

Figure 4 The chart shows the accuracy obtained by combining results from three algorithms and using the consensus to predict pathogenicity of variants. The accompanying table describes the combinations of programs used in each consensus group.

Genetic variants which occur in the
invariant dinucleotides at -1, -2, +1 and +2 were predicted to always disrupt splice site signalling (Figure 5). This would be assumed by most users and so no further information is gained by using the splice site prediction tools at these positions. The algorithms were shown to be the most useful for the prediction of both pathogenic and non-pathogenic splice site variants when applied to positions between +3 and +7 and -3 to at least -10 (Figure 5). At positions further from the splice site junction, no disruption in splice site signal was seen. The scope of these tools can therefore be defined as the prediction of the disruption of splice sites within these regions. The effect of variants on splice sites further than this cannot be predicted by any of the algorithms. The tools are, however, able to predict new splice sites at other positions. This could occur if the variant caused the sequence surrounding the new splice site to become a closer match to the statistical models used by the tools.

[Figure 5: plot of Proportional signal strength (scale 0 to 1) against Intronic position (-10 to 10) for SSFL, MaxEntScan, NNSplice and GeneSplicer]

Figure 5 Graphs showing the proportional signal strength change on known splice sites when a mutation was introduced at positions in the intron between -1 and -10 or between +1 and +10. A score of 1 indicates that no disruption in the splice site signal was observed, a score of 0 indicates that the signal was completely destroyed. Lines between points have been added to ease interpretation although the data is discrete.

Conclusions

The four algorithms used in Alamut were shown to have a high degree of accuracy and users can be confident in the safe interpretation
of these results as part of the assessment of a variant. It should still be noted that the algorithms alone are not sufficient evidence for a clinical decision. These algorithms, with the exception of SSFL, can be used as standalone web tools as well as via the Alamut interface. However, the results obtained through alternative implementations may differ, as shown by the MaxEntScan results obtained through Alamut and HSF.

The range of splice site signal strength predictions given by the algorithms is determined by the position of the variant. At +1, +2, -1 or -2 the algorithms always predict a large change in splice site signal, as would be predicted by experts. Variations in the wild type sequence further than +7 or -10 from the splice site junction do not cause any reduction in the wild type splice site signal predicted by the algorithms. Variants found between these two regions show a range of splice site reduction predicted by the algorithms and it is in this range that the algorithms are likely to be the most useful. This mirrors the reduction in occurrence of pathogenic variants found in the literature at these positions. The algorithms are still useful for prediction of splice site signals related to variants further into the intron; however, only new splice sites can be detected there, not the reduction in wild type splice sites.

Although the use of three different algorithms is suggested in the UV guidelines, the accuracy was not improved by using a consensus method, so there does not seem to be a need for this step. However, as the Alamut interface performs all four analyses simultaneously, it is easy to compare predictions without a formal consensus method.

The Alamut interface also contains methods to predict splicing enhancer or silencer motifs
(ESE, ESS etc.) and branch point motifs. These methods have not been assessed and, as the mechanisms by which these motifs regulate splicing are less clearly understood, they should only be used with caution.

References

Bell, J., Bodmer, D., Sistermans, E., Ramsden, S. (2007) Practice guidelines for the interpretation and reporting of unclassified variants in clinical molecular genetics. Available: http://cmgs.org/BPGs/pdfs current bpgs/UV GUIDELINES ratified.pdf

Brunak, S., Engelbrecht, J., Knudsen, S. (1991) Prediction of human mRNA donor and acceptor sites from the DNA sequence. J. Mol. Biol., 220:49-65.

Desmet, F.O., Hamroun, D., Lalande, M., Collod-Béroud, G., Claustres, M., Béroud, C. (2009) Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res., 37(9):e67.

Hartmann, L., Theiss, S., Niederacher, D., Schaal, H. (2008) Diagnosis of pathogenic splicing mutations: does bioinformatics cover all bases? Front Biosci., 13:3252-72.

Holla, Ø.L., Nakken, S., Mattingsdal, M., Ranheim, T., Berge, K.E., Defesche, J.C., Leren, T.P. (2009) Effects of intronic mutations in the LDLR gene on pre-mRNA splicing: Comparison of wet-lab and bioinformatics analyses. Mol. Genet. Metab., 96(4):245-252.

Houdayer, C., Dehainault, C., Mattler, C., Michaux, D., Caux-Moncoutier, V., Pagès-Berhouet, S., d'Enghien, C.D., Laugé, A., Castera, L., Gauthier-Villars, M., Stoppa-Lyonnet, D. (2008) Evaluation of in silico splice tools for decision-making in molecular diagnosis. Hum. Mutat., 29(7):975-82.

Novoyatleva, T., Tang, Y., Rafalska, I., Stamm, S. (2006) Pre-mRNA missplicing as a cause of human disease. Prog. Mol. Subcell. Biol., 44:27-46.

Pertea, M., Lin, X., Salzberg, S.L.
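
The statistics described in the Methods (sensitivity, specificity, accuracy, their standard errors, and the two-of-three consensus rule) can be illustrated with a minimal Python sketch. Everything named here (`calls_pathogenic`, `evaluate`, `consensus`) is hypothetical and not taken from the report. Two assumptions are made: the 10% criterion is read as a relative change of at least 10% from the wild type score, and the standard error is the usual binomial one, since the report does not say how its standard errors were computed.

```python
from math import sqrt


def calls_pathogenic(wild_type_score, variant_score, threshold=0.10):
    """Assumed reading of the 10% rule: relative change >= threshold."""
    if wild_type_score == 0:
        return variant_score != 0
    change = abs(variant_score - wild_type_score) / abs(wild_type_score)
    return change >= threshold


def confusion_counts(predicted, actual):
    """Count TP, TN, FP, FN from parallel lists of booleans (True = pathogenic)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    return tp, tn, fp, fn


def proportion_with_se(successes, total):
    """Return (proportion, binomial standard error); the SE form is an assumption."""
    if total == 0:
        raise ValueError("no variants in this class")
    p = successes / total
    return p, sqrt(p * (1 - p) / total)


def evaluate(predicted, actual):
    """Sensitivity, specificity and accuracy, each paired with its standard error."""
    tp, tn, fp, fn = confusion_counts(predicted, actual)
    return {
        "sensitivity": proportion_with_se(tp, tp + fn),
        "specificity": proportion_with_se(tn, tn + fp),
        "accuracy": proportion_with_se(tp + tn, tp + tn + fp + fn),
    }


def consensus(calls_by_algorithm):
    """Per-variant majority vote: pathogenic when at least two algorithms agree."""
    return [sum(votes) >= 2 for votes in zip(*calls_by_algorithm)]
```

With only 15 non-pathogenic variants, the binomial standard error on specificity is necessarily wide, which is in line with the report's remark that the specificity scores are the least reliable. Passing the output of `consensus` to `evaluate` gives the consensus accuracy to set against each single algorithm.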
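
The in-silico mutagenesis described in the Methods, where each base near the junction is replaced by the three alternatives and the mean proportional signal is taken, can be sketched in the same way. This is an illustration under stated assumptions, not the report's implementation: the ratio SS_M / SS_W is assumed because Figure 5 describes 1 as no disruption and 0 as a destroyed signal, each (junction, substitution) pair is counted as one example towards N, and `toy_score` is a made-up stand-in for a real prediction tool.

```python
from statistics import mean

NUCLEOTIDES = "ACGT"


def single_base_variants(sequence, index):
    """Yield the three sequences that differ from `sequence` only at `index`."""
    wild_type_base = sequence[index]
    for base in NUCLEOTIDES:
        if base != wild_type_base:
            yield sequence[:index] + base + sequence[index + 1:]


def mean_proportional_signal(sequences, index, score):
    """Mean of SS_M / SS_W over all substitutions at `index` in all junctions."""
    ratios = []
    for sequence in sequences:
        ss_w = score(sequence)
        if ss_w <= 0:
            # The report only used junctions whose wild type signal was detected.
            raise ValueError("wild type splice site signal not detected")
        for mutant in single_base_variants(sequence, index):
            ratios.append(score(mutant) / ss_w)
    return mean(ratios)


def toy_score(sequence):
    """Hypothetical scorer: matches to a made-up six-base motif. Not a real tool."""
    motif = "GTAAGT"
    return sum(1 for s, m in zip(sequence, motif) if s == m)
```

Calling `mean_proportional_signal` once per position from +1 to +10 (or -1 to -10) for each scorer would give the kind of per-position profile plotted in Figure 5. With the toy scorer, positions outside the six-base motif return exactly 1, which mimics the report's finding that variants far from the junction do not reduce the predicted wild type signal.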
