ORF finding - the value of cDNA libraries and good ORF-finding tools, including BLAST

Gene structure

 

Organism                    Genome size       # of genes         Genetic unit
                            (in megabases)                       (average gene size)

Prokaryota:
Mycoplasma genitalium       0.58              473                1,235 bp
Haemophilus influenzae      1.8               1,709              1,042 bp
Escherichia coli            4.6               4,288
Myxococcus xanthus          9.5               8,000

Archaea:
Methanococcus jannaschii    1.7               1,738

Eukaryota:
Saccharomyces cerevisiae    13                6,241              2,100 bp
Neurospora crassa           42.9              10,000 - 13,000    3,000 - 4,000 bp
Drosophila melanogaster     165               13,601             10,000 bp
Caenorhabditis elegans      100               18,424
Homo sapiens                2,910             30,000 - 40,000
Arabidopsis thaliana        125               25,498
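
The genetic unit figures above appear to be roughly the genome size divided by the number of genes. That reading is an assumption, since the table does not say how they were derived. A minimal Python sketch of the arithmetic for the first two rows follows; the function name genetic_unit_bp is illustrative only.

```python
def genetic_unit_bp(genome_size_mb, n_genes):
    """Approximate genetic unit in bp: genome size (megabases) / gene count."""
    return genome_size_mb * 1_000_000 / n_genes


# Genome sizes and gene counts taken from the table above.
rows = {
    "Mycoplasma genitalium": (0.58, 473),
    "Haemophilus influenzae": (1.8, 1709),
}

for organism, (size_mb, genes) in rows.items():
    print(f"{organism}: about {round(genetic_unit_bp(size_mb, genes))} bp per gene")
```

The results come out close to, but not exactly at, the tabulated 1,235 bp and 1,042 bp, presumably because the genome sizes in the table are rounded.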

ORF finding:

Gene-finding algorithms
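
As a point of reference for what "ORF finding" means at its simplest, here is a naive six-frame scan in Python: it reports every stretch that starts at an ATG and runs in-frame to the first stop codon, on both strands. This is only a sketch, not any particular gene-finding program. The names find_orfs and min_codons are made up for illustration, and the scan assumes the standard stop codons and an uninterrupted coding sequence, so it ignores introns. That limitation is one reason cDNAs are so useful for eukaryotic genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Naive ORF scan over all six reading frames.

    Returns a list of (strand, frame, start, end, orf_sequence) tuples.
    start/end are 0-based offsets on the scanned strand; end is exclusive
    and includes the stop codon. min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the current open ATG, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ATG in this frame
    return orfs


# Toy sequence with one short ORF (ATG AAA TTT GGG TAA) on the forward strand.
for orf in find_orfs("CCATGAAATTTGGGTAACC"):
    print(orf)
```

Raising min_codons filters out short chance ORFs. Real gene-finding algorithms add much more on top of this, such as codon usage statistics and splice-site models.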

Using cDNAs to identify genes:

UniGene - NCBI

TIGR Human Gene Index (includes Tentative Human Consensus (THC) sequences, assembled using the TIGR Assembler)

TIGR EGAD - Expressed Gene Anatomy Database