Automatic Annotation of Gene Lists from Literature Analysis
Xin He
Beespace Annual Workshop
05/21/2009

Annotating Gene Lists

Enrichment of Gene Ontology Terms
Enrichment test based on these numbers
In the given gene list
In the background

Limitations of GO Analysis
GO annotations of all genes involve substantial manual effort
Rapid growth of literature: new functions are constantly added to existing genes
Coverage is not even across areas, e.g. ecology and behavior; medicine; anatomy and physiology; etc.

Literature-based Analysis
Enrichment of terms: if a term is associated with many genes in the input list, this term is likely important for this list.
Need to account for the term occurrences expected by chance: a term may occur in a gene's documents without being important.
Gene-term matrix: the count of terms in the documents of a gene.

Overview of Gene List Annotator

Document Retrieval for Genes
Input: a list
of gene identifiers
Yeast: SGD ids
Fruit fly: FlyBase ids
Mouse: MGI ids
Mapping genes to synonyms: use Entrez Gene database (manually created synonyms)
Document collection: choose or create one from Beespace
Retrieve documents in the collection that match at least one synonym

Statistical Method (I)
Intuition: for a gene i, if the term count x_i is significantly higher than expected by chance (determined by λ_0 and d_i), then the term may be related to gene i. If many genes are related to the term, then the term is enriched in the given gene list.

Statistical Method (II)
Dataset distribution: Poisson(λ; d)
Reference distribution: Poisson(λ_0; d)
Model: whether a gene is related to the term is unknown, so assume the term count x_i follows a mixture of two Poisson distributions.
Likelihood ratio test: on the observed term counts, mixture distribution vs. null distribution (reference distribution only)

Interactive Analysis (I)
Output control
Significant Concepts
Relevant Statistics
Information of Input Genes
Choose concepts

Interactive Analysis (II)
User-selected concepts
Genes containing the selected concepts
Term counts in genes, and link to documents

Applications
Test case 1. bee genes
differentially expressed in brain in different species during behavior maturation
Broadly consistent with the results from GO enrichment analysis
Identify interesting genes
Test case 2. bee genes up-regulated in brain by the methoprene treatment (inducing behavior maturation)
GO enrichment analysis: no significant terms
A theme about myosin is overrepresented: may suggest neuron growth and movement, or remodeling, during behavior maturation
See Beespace v4 Demo for details: 1pm, Friday

Summary
Not limited to a controlled vocabulary (GO)
Even for concepts covered by GO, a broader notion of term relevance (gene-term co-occurrence in literature)
Possible to retrieve the supporting documents for further exploration
Not meant to replace GO-based analysis, but to complement it

Acknowledgement
Bruce Schatz
Software support: Xu Ling, Jing Jiang, Brant Chee, David Arcoelo
Biological evaluation: Moushumi Sen Sarma, Amy Toth
Gene Robinson
Chengxiang Zhai
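
Sketch: Poisson mixture likelihood ratio test
The test described under Statistical Method (II) can be illustrated with a minimal Python sketch. It is not the Gene List Annotator implementation. It assumes that the garbled rate symbol is λ, that d_i is a positive size measure of the documents retrieved for gene i, that the background rate λ_0 is known, and that the mixture is fitted by EM (the slides do not say how it is fitted). The function names and the example counts are made up for illustration, and turning the statistic into a p-value is not shown.

```python
import math


def log_poisson(x, mu):
    """Log-probability of count x under a Poisson distribution with mean mu > 0."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def log_mixture(x, d, pi, lam, lam0):
    """Return (log-likelihood of x under the mixture, log of the 'related' part)."""
    related = math.log(pi) + log_poisson(x, lam * d)
    unrelated = math.log1p(-pi) + log_poisson(x, lam0 * d)
    top = max(related, unrelated)
    total = top + math.log(math.exp(related - top) + math.exp(unrelated - top))
    return total, related


def enrichment_lrt(counts, sizes, lam0, n_iter=200):
    """Fit pi and lam by EM, then compare the mixture against the null.

    counts: term count x_i per gene; sizes: d_i per gene (assumed > 0);
    lam0: background rate (assumed known and > 0).
    """
    pi, lam = 0.5, 2.0 * lam0  # arbitrary starting values
    for _ in range(n_iter):
        # E-step: probability that each gene belongs to the 'related' component.
        resp = []
        for x, d in zip(counts, sizes):
            total, related = log_mixture(x, d, pi, lam, lam0)
            resp.append(math.exp(related - total))
        # M-step: update the mixing weight and the 'related' rate.
        pi = min(max(sum(resp) / len(resp), 1e-6), 1.0 - 1e-6)
        weight = sum(r * d for r, d in zip(resp, sizes))
        if weight > 0:
            # Keep lam >= lam0 so 'related' always means a higher rate.
            lam = max(sum(r * x for r, x in zip(resp, counts)) / weight, lam0)
    ll_mix = sum(log_mixture(x, d, pi, lam, lam0)[0] for x, d in zip(counts, sizes))
    ll_null = sum(log_poisson(x, lam0 * d) for x, d in zip(counts, sizes))
    # Clamp at zero in case EM stopped slightly below the null likelihood.
    return max(0.0, 2.0 * (ll_mix - ll_null)), pi, lam


# Made-up example data: eight genes, each with d_i = 10 and lam0 = 0.1.
sizes = [10.0] * 8
stat_hi, pi_hi, lam_hi = enrichment_lrt([8, 9, 1, 0, 10, 7, 1, 12], sizes, 0.1)
stat_lo, pi_lo, lam_lo = enrichment_lrt([1, 0, 2, 1, 0, 1, 1, 2], sizes, 0.1)
print(stat_hi, stat_lo)
```

In the first made-up list several genes have counts far above the expected background count of 1, so the statistic should come out much larger than for the second list, whose counts sit near the background.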