
ClusteringTemporalGeneExpressionData.ppt

Uploaded by: yjrm16270 | Document ID: 7994073 | Uploaded: 2019-06-02 | Format: PPT | Pages: 20 | Size: 500.50 KB

Temporal Probabilistic Concepts from Heterogeneous Data Sequences (Title & Authors)
Sally McClean, Bryan Scotney, Fiona Palmer
School of Information & Software Engineering, University of Ulster

Gene Expression (Background)
Scientists have now sequenced the entire human genome, approximately 30,000 genes. Each of these genes, when active, results in the production of a protein, and proteins have a variety of functions. To understand the function of the genes and the related proteins, scientists are interested in determining where and when the genes are active.

[Figure: The steps involved in producing a protein from a gene. Labels: Gene (DNA), RNA, Protein.]

Gene Expression Results (Background)
The DNA microarray is a microscope slide which enables scientists to determine the activity, or expression, of genes. Scientists place on each of the microarray spots an extract of the cells along with an extract from a reference sample. The more RNA produced, the more active the gene (green for the sample and red for the reference). Fluorescence of the spot is then measured to give the expression of the gene compared to the reference.

The Gene Expression Data Set (Background)
The gene expression data set analysed describes the expression of 112 genes in the rat cervical spinal cord over 9 time points through the development of the rat from embryo to adult. Only specific genes were analysed, namely those considered important in the development of the central nervous system in the rat.

[Figure: The temporal nature of the gene expression data. Time points: E11, E13, E15, E18, E21, P0, P7, P14, A. Embryo (E): days since conception. PostNatal (P): days since birth. A: Adult.]

Clustering Mutual Information (Clustering)
Clustering is usually based on a distance metric, in this case mutual information. Before clustering, the continuous gene expressions were discretised by partitioning the expression into 3 equal-sized bins.

The Clusters
[Figure: Gene Expression Sequences for Cluster 3.]
In this paper we mainly use data from cluster 2.

The Process
[Figure: The steps used to learn the temporal semantics of sequences. Diagram labels: Set of Sequences; Clusters & Mappings; Homogenised sequences; Characterisation of the cluster.]
1. Cluster
2. Learn Mappings
3. Map sequences
4. Learn Local Temporal Concepts
5. Learn Temporal Probabilistic Concepts

An Example (The Problem)
The sequences are heterogeneous in the sense that they represent different attributes. The codes (0, 1, or 2) should be regarded as symbolic. We re-label to emphasise this.
[Table: Gene Expression Data.]
[Table: Relabelled Gene Expression Data.]

Mappings
The schema mappings are between each sequence (local ontologies) and the hidden variable (L, M, N). We represent the underlying concept (global ontology) via a temporal probabilistic concept model.
[Table: Mapped sequences.]

Schema Mappings
[Figure: Correspondence Graph for cluster containing sequences 1 and 2.]

The Mapping Algorithm (Mapping)
Choose one of the sequences whose number of symbols is maximal (S*, say); these symbols act as a proxy for values of the global ontology. For each remaining sequence Si, of length L, determine the mapping of the rth value of Si onto one of the values of the global ontology so as to maximise the number of co-occurrences. Repeat for each r and i. In the ith sequence, the value r is then mapped to a set of values (a partial value) if the maximising value is not unique.

Concept Definitions (Concept Learning)
The concepts we are concerned with may be thought of as symbolic objects which are described in terms of discrete-valued features, e.g. features: expression level, function, with respective domains {low, medium, high} and {growth, control}; concept: C1 = {expression level = high; function = growth}.
Probabilistic concepts have been used to extend the definition of a concept to uncertain situations where we must associate a probability with the values of each feature vector, e.g. C3 = {expression level = high: 0.8, expression level = medium: 0.2, function = growth: 1.0}.

Local & Temporal Probabilistic Concepts (Concept Learning)
A local probabilistic concept (LPC) is defined on a time interval, e.g. in time interval S = [t1, t2] we have a local probabilistic concept C4 = {Time = S, expression level = high: 0.8, expression level = medium: 0.2}, i.e. during time interval S there is a high expression level with probability 0.8 and a medium expression level with probability 0.2.
A temporal probabilistic concept (TPC) is defined in terms of a time attribute with domain T = {t1, ..., tk} and discrete-valued features Xj, where Xj has domain Dj = {vj1, [remainder of the definition not recovered]

Learning Local Probabilistic Concepts (Concept Learning)
The algorithm for learning LPCs takes account of the fact that the schema mappings may map a local value in the local ontology onto a set of global values (partial values). We use the EM algorithm to learn LPCs, with values that are expressed as local probabilistic concepts.
Then, for example, using only the data at the eighth time point (column 9 of Table 5) we obtain:
[Equation not recovered.]
Iteration yields the solution:
[Equation not recovered.]

Learning Temporal Probabilistic Concepts (Concept Learning)
Once we have learned the local probabilistic concepts, the next task is to learn the TPC. This is carried out using temporal clustering, via log-likelihood ratios and chi-squared tests.
The values for the first two time points (columns) are identical, so the distance d12 is zero and we combine LPC1 and LPC2 to form LPC12. We now must decide whether LPC12 should be combined with LPC3 or whether LPC3 is part of a new LPC. The distance between LPC12 and LPC3 is then 1.193. Since this value is inside the chi-squared threshold, we decide to combine LPC12 and LPC3, and so on.

Cluster 2: Rat Gene Sequence Data
[Table not recovered.]

Cluster 2: Mapped Rat Data
[Table not recovered.]

Cluster 2: The LPCs and TPC
[Figure: The LPCs. There are 4 LPCs, represented by the 4 colours.]
These clusters are then characterised by the local probabilistic concepts:

Time points              Local probabilistic concept
E11, E13                 (0.961, 0, 0.039)
E15                      (0.28, 0, 0.72)
E18, E21, P0, P14, A     (0, 0, 1)
P7                       (0, 0.154, 0.846)

Conclusion
We have described a methodology for describing and learning temporal concepts from heterogeneous sequences that have the same underlying temporal pattern. The data are heterogeneous with respect to classification schemes. However, because the sequences relate to the same underlying concept, the mappings between values may be learned. On the basis of these mappings we use statistical learning methods to describe the local probabilistic concepts. A temporal probabilistic concept that describes the underlying pattern is then learned. This concept may be matched with known genetic processes and pathways.

Further Work
For the moment we have not considered performance issues, since the problem we have identified is both novel and complex. Our focus, therefore, has been on defining terminology and providing a preliminary methodology. In addition to addressing such performance issues, future work will also investigate the related problem of associating clusters with explanatory data. For example, our gene expression sequences could be related to the growth process.
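Sketch: discretisation and mutual information (Clustering slide). The following Python is a minimal illustration of the step described there, not the authors' code. It assumes that "3 equal sized bins" means equal-width bins over each gene's own range (equal-frequency bins would be another reading). The function names `discretise` and `mutual_information` are hypothetical, and the expression values are made up.

```python
import math
from collections import Counter

def discretise(values, n_bins=3):
    """Map each value to a bin index 0..n_bins-1 using equal-width bins (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length symbol sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

# Made-up expression profile over 9 time points
gene_a = discretise([0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 2.0, 2.1, 2.2])
# A second profile with the same temporal pattern but differently labelled codes
gene_b = [2, 2, 2, 0, 0, 0, 1, 1, 1]
score = mutual_information(gene_a, gene_b)
```

Because mutual information depends only on how symbols co-occur, `gene_a` and `gene_b` score as highly as `gene_a` against itself, which is why it suits sequences whose codes are symbolic rather than ordinal.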
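Sketch: the mapping algorithm (The Mapping Algorithm slide). The Python below is one possible reading of that description, offered as an illustration rather than the authors' implementation. It assumes all sequences in a cluster have the same length and are aligned by time point. The function name `learn_mappings` and the example symbols are hypothetical.

```python
from collections import Counter

def learn_mappings(sequences):
    """Return (index of the reference sequence S*, one mapping per sequence).

    Each mapping sends a local value to a set of global values; a set with
    more than one element is a partial value.
    """
    # S*: a sequence with the largest number of distinct symbols
    ref_idx = max(range(len(sequences)), key=lambda i: len(set(sequences[i])))
    ref = sequences[ref_idx]
    mappings = []
    for seq in sequences:
        co = Counter(zip(seq, ref))  # co-occurrence counts of (local, global)
        mapping = {}
        for local in set(seq):
            counts = {g: c for (l, g), c in co.items() if l == local}
            best = max(counts.values())
            # keep every global value that attains the maximum; ties give a partial value
            mapping[local] = {g for g, c in counts.items() if c == best}
        mappings.append(mapping)
    return ref_idx, mappings

# Made-up re-labelled sequences over 6 time points
seqs = [
    ["A", "A", "B", "B", "C", "C"],  # three symbols, so this becomes S*
    ["X", "X", "Y", "Y", "Y", "Y"],  # Y co-occurs equally often with B and C
]
ref_idx, mappings = learn_mappings(seqs)
```

In this example `X` maps to the single global value `A`, while `Y` maps to the partial value `{B, C}` because neither co-occurrence count dominates.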
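Sketch: EM with partial values (Learning Local Probabilistic Concepts slide). The worked equations from that slide were not recovered, so the Python below shows a standard EM update for estimating a discrete distribution when some observations are sets of possible values. It is an assumption that this matches the slide's formulation. The function name `em_partial_values` and the observations are hypothetical.

```python
def em_partial_values(observations, values, n_iter=200, tol=1e-10):
    """Estimate P(value) at one time point from observations that are sets of values.

    E-step: share each observation among the values in its set, in proportion
    to the current probabilities. M-step: renormalise the shared counts.
    """
    p = {v: 1.0 / len(values) for v in values}  # uniform starting point
    n = len(observations)
    for _ in range(n_iter):
        new = {v: 0.0 for v in values}
        for obs in observations:
            total = sum(p[v] for v in obs)
            for v in obs:
                new[v] += p[v] / total / n
        converged = max(abs(new[v] - p[v]) for v in values) < tol
        p = new
        if converged:
            break
    return p

# Made-up mapped values at one time point; {"M", "N"} is a partial value
observations = [{"L"}, {"M"}, {"M", "N"}, {"N"}, {"N"}]
lpc = em_partial_values(observations, ["L", "M", "N"])
```

For this data the iteration settles at probabilities 1/5, 4/15 and 8/15 for L, M and N: the ambiguous observation is split one third to M and two thirds to N, in line with the unambiguous evidence.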
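Sketch: temporal clustering (Learning Temporal Probabilistic Concepts slide). The Python below illustrates one way to merge adjacent time points using a log-likelihood ratio statistic and a chi-squared threshold. Several details are assumptions: the slides do not give the exact distance formula, so the usual G statistic for a 2 x k contingency table is used; the threshold is the 5% critical value for 2 degrees of freedom (three global values); and merging proceeds greedily from left to right. The names `llr_distance` and `temporal_clusters` and the counts are hypothetical.

```python
import math

CHI2_CRIT_DF2_5PCT = 5.991  # chi-squared critical value, 2 d.o.f., 5% level (assumed level)

def llr_distance(a, b):
    """Log-likelihood ratio statistic G for two count vectors over the same values."""
    total = sum(a) + sum(b)
    g = 0.0
    for row in (a, b):
        row_total = sum(row)
        for j, obs in enumerate(row):
            if obs > 0:  # zero counts contribute nothing to G
                expected = row_total * (a[j] + b[j]) / total
                g += 2.0 * obs * math.log(obs / expected)
    return g

def temporal_clusters(counts, threshold=CHI2_CRIT_DF2_5PCT):
    """Greedily merge each time point into the current cluster if within the threshold."""
    clusters = [([0], list(counts[0]))]
    for t in range(1, len(counts)):
        idxs, pooled = clusters[-1]
        if llr_distance(pooled, counts[t]) <= threshold:
            clusters[-1] = (idxs + [t], [p + c for p, c in zip(pooled, counts[t])])
        else:
            clusters.append(([t], list(counts[t])))
    return clusters

# Made-up counts of global values (L, M, N) at 5 consecutive time points
counts = [[20, 0, 0], [20, 0, 0], [6, 0, 14], [0, 0, 20], [0, 0, 20]]
groups = [idxs for idxs, _ in temporal_clusters(counts)]
```

Identical adjacent time points have distance zero and are always merged, matching the d12 = 0 case on the slide. Dividing each pooled count vector by its sum gives the probabilities of the resulting local probabilistic concept.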
