PROTEOMIC DATABASES
 
 
Proteome (Proteome Databases): covers Caenorhabditis elegans (WormPD), Saccharomyces cerevisiae (YPD), and S. pombe, with extensive information on protein expression, function, homologs, etc.
 
PDB (Protein Data Bank): 3-D structure of proteins, nucleic acids and some other biological molecules
 
PIR (Protein Information Resource): supports research on molecular evolution, functional genomics, and computational biology by providing an integrated system of protein sequence databases, derived related databases, and access facilities.
 
MIPS: Munich Information Center for Protein Sequences - Protein extraction, description, and analysis tools at MIPS.
 
Protein analysis:
 
BLOCKS
http://www.blocks.fhcrc.org/
Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins.
The blocks for the Blocks Database are made automatically by looking for the most highly conserved regions in groups of proteins documented in the PROSITE database.
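As a rough illustration of what an ungapped conserved block is, the sketch below scans a toy multiple alignment for the gap-free window of a fixed width that has the most fully identical columns. The function name, the scoring rule, and the example sequences are assumptions made for illustration; this is not the procedure the Blocks Database itself uses.

```python
def best_ungapped_block(alignment, width):
    """Return (start, score) of the most conserved gap-free window.

    alignment: list of equal-length aligned sequences ('-' marks a gap).
    score: number of columns in the window where all residues are identical.
    Returns None if no gap-free window of the requested width exists.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    best = None
    for start in range(length - width + 1):
        # Collect the alignment columns covered by this window.
        columns = [[seq[i] for seq in alignment]
                   for i in range(start, start + width)]
        # A block is ungapped, so skip any window that touches a gap.
        if any("-" in col for col in columns):
            continue
        score = sum(1 for col in columns if len(set(col)) == 1)
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Hypothetical toy alignment, not taken from any real protein family.
toy_alignment = [
    "MK-LVHGDEAQ",
    "MRALVHGDESQ",
    "MK-IVHGDEAK",
]
block = best_ungapped_block(toy_alignment, 5)
```

With this toy input the best window starts at column 4 and covers the segment VHGDE in all three sequences.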
EPD - Eukaryotic Promoter Database, Current release 63
http://www.epd.isb-sib.ch/
The Eukaryotic Promoter Database is an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally.
ENZYME - Enzyme nomenclature database
http://www.expasy.ch/enzyme/
ENZYME is a repository of information relating to the nomenclature of enzymes. It is primarily based on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB) and it describes each type of characterized enzyme for which an EC (Enzyme Commission) number has been provided.
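An EC number has four dot-separated levels (class, subclass, sub-subclass, serial number). The following is a minimal sketch of splitting one into its levels; the function names are hypothetical, and the class table lists only the six top-level classes in use when this list was compiled.

```python
# Top-level EC classes (class 7, translocases, was added later and is omitted).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec(ec):
    """Split an EC number such as 'EC 1.1.1.1' into a tuple of four levels.

    A level written as '-' (not yet assigned) is returned as None.
    """
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("an EC number must have four levels: " + ec)
    return tuple(None if p == "-" else int(p) for p in parts)


def ec_class_name(ec):
    """Return the name of the top-level class, or 'unknown'."""
    return EC_CLASSES.get(parse_ec(ec)[0], "unknown")


full = parse_ec("EC 1.1.1.1")   # a fully specified EC number
partial = parse_ec("3.4.-.-")   # only class and subclass given
```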
GeneCards
http://bioinformatics.weizmann.ac.il/cards/
GeneCards is a database of human genes, their products and their involvement in diseases. It offers concise information about the functions of all human genes that have an approved symbol, as well as selected others.
KEGG: Kyoto Encyclopedia of Genes and Genomes
http://www.genome.ad.jp/kegg/
Kyoto Encyclopedia of Genes and Genomes (KEGG) is an effort to computerize current knowledge of molecular and cellular biology in terms of the information pathways that consist of interacting molecules or genes and to provide links from the gene catalogs produced by genome sequencing projects.
KEGG consists of the following five types of data:
Pathway maps - represented by graphical diagrams
Ortholog group tables - represented by HTML tables
Molecular catalogs - represented by HTML tables or hierarchical texts
Genome maps - represented by Java graphics
Gene catalogs - represented by hierarchical texts
Library of Protein Family Cores
http://www-camis.stanford.edu/projects/helix/LPFC/
We have taken structural alignments of protein families and computed average core structures for each family. The core structures can be divided into residues with low spatial variation and those with high spatial variation. Amino acids with low spatial variance occupy essentially the same relative position in all family members. This library is useful for building models, threading, and exploratory analysis. It is also a useful mechanism for summarizing variability in NMR structures.
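The idea of an average core with low-variance and high-variance positions can be sketched as follows. This assumes the family members are already superimposed and aligned position by position; the function names, the variance definition (mean squared distance to the average position), and the threshold are illustrative assumptions, not the library's actual method.

```python
def core_statistics(structures):
    """Compute the average position and spatial variance per aligned residue.

    structures: list of family members, each a list of (x, y, z) tuples,
    one per aligned position, in a common frame of reference.
    Returns (means, variances); variance is the mean squared distance of
    the members' coordinates from the average position.
    """
    n_members = len(structures)
    n_positions = len(structures[0])
    means, variances = [], []
    for pos in range(n_positions):
        coords = [member[pos] for member in structures]
        mean = tuple(sum(c[k] for c in coords) / n_members for k in range(3))
        var = sum(
            sum((c[k] - mean[k]) ** 2 for k in range(3)) for c in coords
        ) / n_members
        means.append(mean)
        variances.append(var)
    return means, variances


def split_by_variance(variances, threshold):
    """Return (low, high) lists of position indices for a given threshold."""
    low = [i for i, v in enumerate(variances) if v <= threshold]
    high = [i for i, v in enumerate(variances) if v > threshold]
    return low, high


# Hypothetical two-member family with two aligned positions.
family = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
means, variances = core_statistics(family)
low, high = split_by_variance(variances, 0.5)
```

Here position 0 coincides in both members and lands in the low-variance set, while position 1 differs and lands in the high-variance set.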
Pfam
http://www.sanger.ac.uk/Software/Pfam/
Pfam is a collection of protein families and domains. Pfam contains multiple protein alignments and profile-HMMs of these families. Pfam is a semi-automatic protein family database, which aims to be comprehensive as well as accurate.
PRINTS - PROTEIN FINGERPRINT DATABASE
http://bioinf.man.ac.uk/dbbrowser/PRINTS/
PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family; its diagnostic power is refined by iterative scanning of a SWISS-PROT/TrEMBL composite. Usually the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space. Fingerprints can encode protein folds and functionalities more flexibly and powerfully than can single motifs, full diagnostic potency deriving from the mutual context provided by motif neighbours.
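To make the notion of a fingerprint concrete, the sketch below treats a fingerprint as an ordered list of motifs that must occur without overlapping along a sequence. Regular expressions are used only as a stand-in for real motif scoring, and the motifs, sequences, and function name are hypothetical.

```python
import re


def match_fingerprint(sequence, motifs):
    """Greedy left-to-right search for each motif in order.

    Returns one (start, end) span per motif, or None where a motif is not
    found after the previous hit. Hits never overlap because each search
    starts where the previous match ended.
    """
    hits = []
    pos = 0
    for pattern in motifs:
        match = re.compile(pattern).search(sequence, pos)
        if match is None:
            hits.append(None)
        else:
            hits.append((match.start(), match.end()))
            pos = match.end()
    return hits


# Hypothetical three-motif fingerprint and toy sequences.
toy_fingerprint = ["G.GK[ST]", "DE.D", "W..Y"]
full_hits = match_fingerprint("AAGAGKSTTTDEADLLLWPPYAA", toy_fingerprint)
partial_hits = match_fingerprint("AAGAGKSTTTWPPY", toy_fingerprint)
```

A sequence matching all motifs is a stronger family member than one matching only some of them, which is the sense in which the motifs provide mutual context.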
Protein Data Bank - PDB
http://pdb-browsers.ebi.ac.uk//
http://www.rcsb.org/pdb/index.html
PDB is the single international repository for the processing and distribution of 3-D macromolecular structure data, primarily determined experimentally by X-ray crystallography and NMR.
 
InterPro - Integrated Resource of Protein Domains and Functional Sites
http://www.ebi.ac.uk/interpro/databases.html
InterPro release 1.2 (June 2000) was built from Pfam 5.2, PRINTS 26.1, PROSITE 16, ProDom 2000.1 and the current SWISS-PROT + TrEMBL data. This release of InterPro contains 3052 entries, representing 574 domains, 2418 families, 46 repeats and 14 post-translational modification sites. InterPro is a useful resource for whole genome analysis and has already been used for the proteome analysis of a number of completely sequenced organisms. A preliminary proteome analysis was also produced for the human genome.
S. cerevisiae functional analysis: Eisenberg, UCLA
SWISS-PROT - Annotated protein sequence database
TrEMBL - Computer-annotated supplement to SWISS-PROT
http://expasy.hcuge.ch/sprot/sprot-top.html
SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.
TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT.
TRANSFAC - The Transcription Factor Database
http://transfac.gbf.de/TRANSFAC/
http://www.cbi.pku.edu.cn/TRANSFAC/
http://www.hgmp.mrc.ac.uk/Bioinformatics/Databases/transfac-help.html
TRANSFAC is a database on eukaryotic cis-acting regulatory DNA elements and trans-acting factors. It covers the whole range from yeast to human. The TRANSFAC database is a database of TRANScription regulatory FACtors and is maintained at the GBF Braunschweig. It combines data about the transcription factors and their DNA binding sites with additional important information (e.g. the sources of the factors, systematic classification of transcription factors). All experimental data have been extracted from the literature. These data are accessible through two main tables, the FACTORS and the SITES table. While the first table holds data about the binding proteins, the second holds the data about the DNA sequences that are recognized by these proteins. Besides these experimental data, TRANSFAC also comprises information derived from them. As many transcription factors can be classified by their DNA binding domains and/or their dimerization domains, we introduced the CLASS table to TRANSFAC. We also prepared a GENES table, which contains data about the corresponding genes and their promoters/enhancers (Knueppel et al.) and which will be part of the ASCII flatfile version in future.
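The relationship between the FACTORS and SITES tables described above can be pictured with two small in-memory tables. All identifiers, field names, and records below are hypothetical and do not reflect the real TRANSFAC schema or its flatfile format.

```python
# Hypothetical FACTORS table: one record per binding protein.
factors = {
    "F1": {"name": "toy factor A", "class": "toy class X"},
    "F2": {"name": "toy factor B", "class": "toy class Y"},
}

# Hypothetical SITES table: one record per recognized DNA sequence,
# with references to the factors that bind it.
sites = [
    {"site_id": "S1", "sequence": "TGACTCA", "factor_ids": ["F1"]},
    {"site_id": "S2", "sequence": "GGGCGG", "factor_ids": ["F2"]},
    {"site_id": "S3", "sequence": "TGAGTCA", "factor_ids": ["F1", "F2"]},
]


def sites_for_factor(factor_id):
    """Return the IDs of all sites bound by the given factor."""
    return [s["site_id"] for s in sites if factor_id in s["factor_ids"]]


def factors_for_site(site_id):
    """Return the names of all factors that bind the given site."""
    for s in sites:
        if s["site_id"] == site_id:
            return [factors[f]["name"] for f in s["factor_ids"]]
    return []


bound_by_f1 = sites_for_factor("F1")
binders_of_s3 = factors_for_site("S3")
```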
Transpath - Signal Transduction Browser
http://193.175.244.148/
The database on gene-regulatory pathways.
SCOP - Structural Classification of Proteins
http://pdb.weizmann.ac.il/scop/
The scop database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in Brookhaven National Laboratory's Protein Data Bank (PDB). It is available as a set of tightly linked hypertext documents which make the large database comprehensible and accessible. In addition, the hypertext pages offer a panoply of representations of proteins, including links to PDB entries, sequences, references, images and interactive display systems.
CATH - Protein Structure Classification
http://www.biochem.ucl.ac.uk/bsm/cath/
The CATH database is a hierarchical domain classification of protein structures in the Brookhaven protein databank. Non-protein, model, and "C-alpha only" structures are not classified in CATH. Only crystal structures solved to a resolution better than 3.0 angstroms are considered, together with NMR structures.
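The inclusion criteria above can be written as a simple filter. This is only a sketch: the record type, its field names, and the example entries are hypothetical, and real PDB entries would need to be parsed to obtain these values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructureEntry:
    entry_id: str
    method: str                  # "X-RAY" or "NMR"
    resolution: Optional[float]  # in angstroms; None for NMR structures
    is_protein: bool = True
    is_model: bool = False
    calpha_only: bool = False


def eligible_for_cath(entry):
    """Apply the stated criteria: protein, not a model, not C-alpha only,
    and either NMR or a crystal structure better than 3.0 angstroms."""
    if not entry.is_protein or entry.is_model or entry.calpha_only:
        return False
    if entry.method == "NMR":
        return True
    if entry.method == "X-RAY":
        # "Better" resolution means a smaller number.
        return entry.resolution is not None and entry.resolution < 3.0
    return False


# Hypothetical entries, not real PDB identifiers.
entries = [
    StructureEntry("toy1", "X-RAY", 1.8),
    StructureEntry("toy2", "X-RAY", 3.5),
    StructureEntry("toy3", "NMR", None),
    StructureEntry("toy4", "X-RAY", 2.0, calpha_only=True),
    StructureEntry("toy5", "X-RAY", 2.2, is_model=True),
]
accepted = [e.entry_id for e in entries if eligible_for_cath(e)]
```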
FSSP - Fold classification based on Structure-Structure alignment of Proteins
http://www2.ebi.ac.uk/dali/fssp/fssp.html
The FSSP database is based on exhaustive all-against-all 3D structure comparison of protein structures currently in the Protein Data Bank (PDB). The classification and alignments are automatically maintained and continuously updated using the Dali search engine.
 
3Dee - Database of Protein Domain Definitions
http://jura.ebi.ac.uk:8080/3Dee/help/help_intro.html
3Dee contains structural domain definitions for all protein chains in the Brookhaven Protein Databank (PDB) that have 20 or more residues and are not theoretical models. In addition, the domains have been clustered on sequence similarity and structural similarity. The resulting families are stored as a hierarchy.
PRESAGE
http://presage.berkeley.edu/
PRESAGE is a collaborative resource for structural genomics. It provides a database of proteins, each of which has a collection of annotations reflecting current experimental status, structural assignments, models, and suggestions. PRESAGE is a tool for scientists to keep track of structural knowledge of their proteins of interest.
GeneCensus Genome Comparisons
http://bioinfo.mbb.yale.edu/genome/
GeneCensus is intended to give a comprehensive statistical accounting of protein features, particularly structural ones, in genomes -- in the sense of a demographic census.
TRRD - Transcription Regulatory Region Database
http://wwwmgs.bionet.nsc.ru/mgs/dbases/trrd4/
The Transcription Regulatory Regions Database (TRRD 4.x) collects information on the structural and functional organisation of transcription regulatory regions of eukaryotic genes. The hierarchical organisation of transcription regulatory regions of eukaryotic genomes is reflected in the database schema. It includes the following information: transcription factor binding sites, eukaryotic gene promoters, enhancers, transcription regulatory regions, and gene expression regulation.
TargetDB
http://molbio.nmsu.edu:81/
A database of peptides targeting proteins to cellular locations.
Metabolic Pathways of Biochemistry
http://www.media.gwu.edu/~mpb/index.html
This site is designed to graphically represent all major metabolic pathways, primarily those important to human biochemistry.
 
COMPEL
http://compel.bionet.nsc.ru/
COMPEL collects information about composite regulatory elements (CEs) - pairs of closely situated sites and the transcription factors binding to them. We define a composite element as a minimal functional unit within which both protein-DNA and protein-protein interactions contribute to a highly specific pattern of gene transcriptional regulation. The factors that cooperate at an individual CE mostly belong to different classes with respect to the structure of protein domains, namely DNA-binding and activation domains. The factors also differ in their functional properties: cell-specificity, inducibility and others. Thus, composite regulatory elements contribute to one of the fundamental principles of genome functioning - the combinatorial nature of gene transcriptional regulation.
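The "pairs of closely situated sites" idea can be illustrated by listing pairs of annotated binding sites that lie within a chosen distance of each other and are bound by different factors. The distance cutoff, the coordinate convention, the factor names, and the function name are all assumptions for illustration; proximity alone does not establish a composite element, which also requires evidence of functional cooperation.

```python
from itertools import combinations


def candidate_composite_pairs(sites, max_gap):
    """List factor pairs whose binding sites are close together.

    sites: list of (factor, start, end) with half-open coordinates.
    max_gap: largest allowed distance between the end of the upstream
    site and the start of the downstream one (overlaps count as close).
    """
    ordered = sorted(sites, key=lambda s: s[1])
    pairs = []
    for upstream, downstream in combinations(ordered, 2):
        gap = downstream[1] - upstream[2]
        if upstream[0] != downstream[0] and gap <= max_gap:
            pairs.append((upstream[0], downstream[0]))
    return pairs


# Hypothetical site annotations on one promoter.
toy_sites = [
    ("factor_A", 100, 107),
    ("factor_B", 110, 118),
    ("factor_C", 300, 310),
]
close_pairs = candidate_composite_pairs(toy_sites, 10)
```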
 
RegulonDB: a database on transcriptional regulation in Escherichia coli
http://tula.cifn.unam.mx:8850/regulondb/regulon_intro.framese
RegulonDB is a database on transcription regulation and operon organization in Escherichia coli. It describes regulatory signals of transcription initiation, promoters, regulatory binding sites of specific regulators, ribosome binding sites and terminators, as well as information on genes clustered in operons. These annotations have been gathered from an ongoing search of the literature, as well as from computational sequence predictions. The genomic coordinates of all these objects in the E. coli K-12 chromosome are clearly indicated. Every known object has a link to at least one MEDLINE reference.