Nucleic Acids Research 2010 Database Issue

ACLAME: A CLAssification of Mobile genetic Elements, update 2010

ISbrowser: an extension of ISfinder for visualizing insertion sequences in prokaryotic genomes
Insertion sequences (ISs) are among the smallest and simplest autonomous transposable elements. ISfinder (http://www-is.biotoul.fr/) is a dedicated IS database which assigns names to individual ISs to maintain a coherent nomenclature, serves as an IS repository including 3000 individual ISs from both bacteria and archaea, and provides a basis for IS classification. Each IS is indexed in ISfinder with various associated pieces of information (the complete nucleotide sequence, the sequence of the ends and target sites, potential open reading frames, strain of origin, distribution in other strains and available bibliography) and classified into a group or family to provide some insight into its phylogeny. ISfinder also includes extensive background information on ISs and transposons in general. Online tools are gradually being added. At present, it is difficult to visualize the global distribution of ISs in a given bacterial genome. Such information would facilitate understanding of the impact of these small transposable elements on shaping their host genome. Here we describe ISbrowser (http://www-genome.biotoul.fr/ISbrowser.php), an extension to the ISfinder platform and a tool which permits visualization of the position, orientation and distribution of complete and partial ISs in individual
prokaryotic genomes.

RegPrecise: a database of curated genomic inferences of transcriptional regulatory interactions in prokaryotes
The RegPrecise database (http://regprecise.lbl.gov) was developed for the capture, visualization and analysis of predicted transcription factor regulons in prokaryotes that were reconstructed and manually curated by utilizing the comparative genomic approach. A significant number of high-quality inferences of transcriptional regulatory interactions have already been accumulated for diverse taxonomic groups of bacteria. The reconstructed regulons include transcription factors, their cognate DNA motifs and regulated genes/operons linked to the candidate transcription factor binding sites. RegPrecise allows browsing the regulon collections for: (i) conservation of DNA binding sites and regulated genes for a particular regulon across diverse taxonomic lineages; (ii) sets of regulons for a family of transcription factors; (iii) repertoire of regulons in a particular taxonomic group of species; (iv) regulons associated with a metabolic pathway or a biological process in various genomes. The initial release of the database includes 11 500 candidate binding sites for 400 orthologous groups of transcription factors from over 350 prokaryotic genomes. The majority of these data are represented by genome-wide regulon reconstructions in the Shewanella and Streptococcus genera and a large-scale prediction of regulons for the LacI family of transcription factors.
Another section in the database represents the results of accurate regulon propagation to the closely related genomes.

PROSITE, a protein domain database for functional characterization and annotation
PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE is largely used for the annotation of domain features of UniProtKB/Swiss-Prot entries. Among the 983 (DNA-binding) domains, repeats and zinc fingers present in Swiss-Prot (release 57.8 of 22 September 2009), 696 (70%) are annotated with PROSITE descriptors using information from ProRule. In order to allow better functional characterization of domains, PROSITE developments focus on subfamily specific profiles and a new profile building method giving more weight to functionally important residues. Here, we describe AMSA, an annotated multiple sequence alignment format used to build a new generation of generalized profiles, the migration of ScanProsite to Vital-IT, a cluster of 633 CPUs, and the adoption of the Distributed Annotation System (DAS) to facilitate PROSITE data integration and interchange with other sources. The latest version of PROSITE (release 20.54, of 22 September 2009) contains 1308 patterns, 863 profiles and 869 ProRules. PROSITE is accessible at: http://www.expasy.org/prosite/.

PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium
Protein Analysis THrough Evolutionary Relationships (PANTHER) is a comprehensive software system for inferring the functions of genes based on their evolutionary relationships. Phylogenetic trees of gene families form the basis for PANTHER and these trees are annotated with ontology terms describing the evolution of gene function from ancestral to modern day genes. One of the main applications of PANTHER is in accurate prediction of the functions of uncharacterized genes, based on their evolutionary relationships to genes with functions known from experiment. The PANTHER website, freely available at http://www.pantherdb.org, also includes software
tools for analyzing genomic data relative to known and inferred gene functions. Since 2007, there have been several new developments to PANTHER: (i) improved phylogenetic trees, explicitly representing speciation and gene duplication events, (ii) identification of gene orthologs, including least diverged orthologs (best one-to-one pairs), (iii) coverage of more genomes (48 genomes, up to 87% of genes in each genome; see http://www.pantherdb.org/panther/summaryStats.jsp), (iv) improved support for alternative database identifiers for genes, proteins and microarray probes and (v) adoption of the SBGN standard for display of biological pathways. In addition, PANTHER trees are being annotated with gene function as part of the Gene Ontology Reference Genome project, resulting in an increasing number of curated functional annotations.

MEROPS: the peptidase database
Peptidases, their substrates
and inhibitors are of great relevance to biology, medicine and biotechnology. The MEROPS database (http://merops.sanger.ac.uk) aims to fulfil the need for an integrated source of information about these. The database has a hierarchical classification in which homologous sets of peptidases and protein inhibitors are grouped into protein species, which are grouped into families, which are in turn grouped into clans. The classification framework is used for attaching information at each level. An important focus of the database has become distinguishing one peptidase from another by identifying the specificity of the peptidase in terms of where it will cleave substrates and with which inhibitors it will interact. We have collected over 39 000 known cleavage sites in proteins, peptides and synthetic substrates. These allow us to display peptidase specificity and alignments of protein substrates to give an indication of how well a cleavage site is conserved, and thus its probable physiological relevance. While the number of new peptidase families and clans has grown only slowly, the number of complete genomes has greatly increased. This has allowed us to add an analysis tool to the relevant species pages to show significant gains and losses of peptidase genes relative to related species.

The comprehensive microbial resource
The Comprehensive Microbial Resource or CMR (http://cmr.jcvi.org) provides a web-based central resource for the display, search and analysis of the sequence
and annotation for complete and publicly available bacterial and archaeal genomes. In addition to displaying the original annotation from GenBank, the CMR makes available secondary automated structural and functional annotation across all genomes to provide consistent data types necessary for effective mining of genomic data. Precomputed homology searches are stored to allow meaningful genome comparisons. The CMR supplies users with over 50 different tools to utilize the sequence and annotation data across one or more of the 571 currently available genomes. At the gene level users can view the gene annotation and underlying evidence. Genome level information includes whole genome graphical displays, biochemical pathway maps and genome summary data. Comparative tools display analysis between genomes with homology and genome alignment tools, and searches across the accessions, annotation,
and evidence assigned to all genes/genomes are available. The data and tools on the CMR aid genomic research and analysis, and the CMR is included in over 200 scientific publications. The code underlying the CMR website and the CMR database are freely available for download with no license restrictions.

MBGD update 2010: toward a comprehensive resource for exploring microbial genome diversity
The microbial genome database (MBGD) for comparative analysis is a platform for microbial comparative genomics based on automated ortholog group identification. A prominent feature of MBGD is that it allows users to create ortholog groups using a specified subgroup of organisms. The database is constantly updated and now contains almost 1000 genomes. To utilize the MBGD database as a comprehensive resource for investigating microbial genome diversity, we have developed the following advanced functionalities: (i) enhanced assignment of functional annotation, including external database links to each orthologous group, (ii) interface for choosing a set of genomes to compare based on phenotypic properties, (iii) the addition of more eukaryotic microbial genomes (fungi and protists) and some higher eukaryotes as references and (iv) enhancement of the MyMBGD mode, which allows users to add their own genomes to MBGD and now accepts raw genomic sequences without any annotation (in such a case, it runs a gene-finding procedure before identifying the orthologs). Some analysis functions, such as
the function to find orthologs with similar phylogenetic patterns, have also been improved. MBGD is accessible at http://mbgd.genome.ad.jp/.

phiSITE: database of gene regulation in bacteriophages
We have developed phiSITE, a database of gene regulation in bacteriophages. To date it contains detailed information about more than 700 experimentally confirmed or predicted regulatory elements (promoters, operators, terminators and attachment sites) from 32 bacteriophages belonging to the Siphoviridae, Myoviridae and Podoviridae families. The database is manually curated; the data are collected mainly from scientific papers, cross-referenced with other database resources (EMBL, UniProt, NCBI taxonomy database, NCBI Genome, ICTVdb, PubMed Central) and stored in an SQL-based database system. The system provides full text search for regulatory elements, graphical visualization of phage genomes and several
export options. In addition, visualizations of gene regulatory networks for five phages (Bacillus phage GA-1, Enterobacteria phage lambda, Enterobacteria phage Mu, Enterobacteria phage P2 and Mycoplasma phage P1) have been defined and made available. phiSITE is accessible at http://www.phisite.org/.

The integrated microbial genomes system: an expanding comparative analysis resource
The integrated microbial genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG contains both draft and complete microbial genomes integrated with other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. Since its first release in 2005, IMG's data content and analytical capabilities have been constantly expanded through regular releases. Several companion IMG systems have been set up in order to serve domain specific needs, such as expert review of genome annotations. IMG is available at http://img.jgi.doe.gov.

MicrobesOnline: an integrated portal for comparative and functional genomics
Since 2003, MicrobesOnline (http://www.microbesonline.org) has been providing a community resource for comparative and functional genome analysis. The portal includes over 1000 complete genomes of bacteria, archaea and fungi and thousands of
expression microarrays from diverse organisms ranging from model organisms such as Escherichia coli and Saccharomyces cerevisiae to environmental microbes such as Desulfovibrio vulgaris and Shewanella oneidensis. To assist in annotating genes and in reconstructing their evolutionary history, MicrobesOnline includes a comparative genome browser based on phylogenetic trees for every gene family as well as a species tree. To identify co-regulated genes, MicrobesOnline can search for genes based on their expression profile, and provides tools for identifying regulatory motifs and seeing if they are conserved. MicrobesOnline also includes fast phylogenetic profile searches, comparative views of metabolic pathways, operon predictions, a workbench for sequence analysis and integration with RegTransBase and other microbial genome resources. The next update of MicrobesOnline will contain significant new functionality, including comparative analysis of metagenomic sequence data. Programmatic access to the database, along with source code and documentation, is available at http://microbesonline.org/programmers.html.

The MiST2 database: a comprehensive genomics resource on microbial signal transduction
The MiST2 database (http:/) identifies and catalogs the repertoire of signal transduction proteins in microbial genomes. Signal transduction systems regulate the majority of cellular activities including the metabolism, development, host-recognition, biofilm production, virulence, and antibiotic resistance of human pathogens. Thus, knowledge of the proteins and interactions that comprise these communication networks is an essential component to furthering biomedical discovery. These proteins are identified by searching protein sequences for specific domain profiles that implicate a protein in
signal transduction. Compared to the previous version of the database, MiST2 contains a host of new features and improvements including the following: draft genomes; extracytoplasmic function (ECF) sigma factor protein identification; enhanced classification of signaling proteins; novel, high-quality domain models for identifying histidine kinases and response regulators; neighboring two-component genes; gene cart; better search capabilities; enhanced taxonomy browser; advanced genome browser; and a modern, biologist-friendly web interface. MiST2 currently contains 966 complete and 157 draft bacterial and archaeal genomes, which collectively contain more than 245 000 signal transduction proteins. The majority (66%) of these are one-component systems, followed by two-component proteins (26%), chemotaxis (6%), and finally ECF factors (2%).
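
The MiST2 percentages above can be turned into rough per-class protein counts. The short Python sketch below does that arithmetic. It assumes a total of exactly 245 000 proteins, whereas the abstract says "more than 245 000", so every count is only an approximate lower bound. The variable and function names are illustrative and do not come from MiST2.

```python
# Rough per-class counts implied by the MiST2 figures quoted in the abstract.
# Assumption: the total is taken as exactly 245 000, although the abstract
# says "more than 245 000", so each result is an approximate lower bound.
TOTAL_PROTEINS = 245_000

# Percent shares as stated in the abstract (integers, to avoid float rounding).
CLASS_PERCENT = {
    "one-component systems": 66,
    "two-component proteins": 26,
    "chemotaxis": 6,
    "ECF factors": 2,
}

# Genome counts as stated in the abstract.
COMPLETE_GENOMES = 966
DRAFT_GENOMES = 157


def approximate_counts(total, percent_by_class):
    """Return the integer protein count implied by each percent share."""
    return {name: total * pct // 100 for name, pct in percent_by_class.items()}


counts = approximate_counts(TOTAL_PROTEINS, CLASS_PERCENT)
total_genomes = COMPLETE_GENOMES + DRAFT_GENOMES

for name, n in counts.items():
    print(f"{name}: roughly {n} or more")
print(f"genomes covered: {total_genomes}")
```

Under that assumption the four shares sum to 100%, so the class counts add back up to the assumed total.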