Genomes and Genomic Analysis Notes: Week 1, Fall 2008 © MWW 2008 (updated 1/27/2008)
- Grading: 1/3 quizzes and final; 1/3 paper; 1/3 in-class participation (this also includes office hours, which are mandatory for people taking 544 and strongly suggested for 444 students). This is where you will begin to integrate genomics into how you think about biology.
- Make sure you have a Gmail account. How should we do this for each week's homework - add on to the bottom? Is there a way to bookmark? I am open to suggestions.
We have a book - which we will use this semester. I've taught a form of this class for about 9 years and really haven't ever found a book that is great. Genomics is in such a developing and innovative state that in the time it takes to get a book out, many things have changed significantly. However, I think having some text that you can go to for insight is helpful, so we'll try once more with this book. If it doesn't work, maybe we should think about writing a book together. I think the Google docs and other innovations might make this a better way for people to do this. If you look at the webpage for our text, you can see the presentation is really not great - at least that's my opinion.
Overview of the issues:
Genomics is a sea change in biology that started around 1988 and took off after 1996. Before genomics, we could only study one or a few genes or proteins or mRNAs at a time. It was not that more classical approaches weren't powerful - they are - but the ability to ask questions about all the genes, proteins, transcripts, and metabolites at one time provides a capability for integration that has never been possible before.
Imagine a cell: it is capable of replicating itself, it has many levels of organization, and they all have to be integrated. How does a cell respond to nutrients or stress? What are the programs for this and for development, and how do they interact?
The questions we want to eventually ask in Genomics are very broad. We want to know how evolution happens, how cells work, how communities communicate, and, of course, what is the genomic basis of health and disease. Right now, we are a bit like Christopher Columbus, sailing off in his ships. We talk a lot about emergent properties in the data - things we wouldn't have thought about before getting and analyzing these new levels of data. For Columbus, the emergent property he discovered was that the earth was round and that another continent existed. Anyway, because of this aspect of genomics, we will also be working on ramping up our imaginations, increasing our logical approaches to data we haven't seen before, and identifying the interdisciplinary interactions needed to discover the America of genomics.
The difference between genetics and genomics is analogous to the difference between writing a single letter and having the internet - maybe even greater. The genomic age started sometime between 1988 and 1995 and is continuing to grow. Classes in genomics are fairly recent, and they are difficult because the literature is growing so fast.
There are three parts to this class:
1) an overview of genomics - with a focus on yeast and C. elegans genomics
2) whole genome analysis, including rapid changes happening as the result of high-throughput sequencing advances
3) functional genomics and systems biology, including studies of the transcriptome and proteome and computational and statistical approaches used to identify emergent properties in the data.
At the end of the semester, I want you to have more than just lists in your heads and papers you have read.
You should have a larger, integrated picture of this field: what approaches are used in genomics, what kind of information each type of analysis gives, what are the limitations and potential of each approach, and what are the neatest questions we can think of and what do we need to answer them. We will do this primarily with yeast and C. elegans, but analogous developments are occurring in all model organisms - and surprisingly rapidly in humans.
Does Genomics come in different flavors?
Let's start this process by talking about cells and how they work.
Cellular organization:
Levels of organization in cells: genes -> RNA -> protein -> structures -> interactions -> localization -> modifications -> etc. -> metabolites. Can you think of others?
All of these levels of organization have been given "ome" names!!
Then there are larger levels of organization and developmental/temporal changes. What are they?
Ways to look at genome organization:
(a) the physical genome: the full DNA sequence; the map of genes, regulatory regions, and non-coding regions. The narrowest definition of genomics is the study of whole genome (DNA) sequence.
(b) the functional genome: an understanding of what the genes and gene products do
(c) the population genome: variation of genes in populations, including humans (This is becoming more important as we get more sequence);
(d) the comparative genome: the comparison of the human genome with other, less-related genomes;
(e) the integrative genome: the functional interaction of genes and gene products between genomes - metagenomics
"genomics is both the science of understanding the structure and evolution of genomes and a tool for learning about the functions of the genes therein. Genetics and genomics differ in scale and focus. Genetics uses mutants to identify and study the few genes that control a particular phenotype, whereas genomics collects data on all the genes in an organism. I have taken genomics to include all methods that collect and analyze comprehensive data about genes, including the sequence and abundance of nucleic acids and the properties of the proteins they encode (often called proteomics)."
- What is a genome? Glossary of genetic terms (NHGRI)
- What does a genome look like? Genome sizes; Comprehensive Microbial Resource
- Genomics is a web-based, data-driven science: databases are fundamental to what we do. It was the driving force for open-access journals.
Genomics journals:
- Genome Biology (great, web-based journal)
- Nature Genome Gateway (papers, etc.)
- Science Functional Genomics Page (useful links, papers, etc.)
History of getting to genomics
Historical development of genomics (not exhaustive)
Elucidation of heredity - one of the most important advances in Biology in the past 1000 years. On the practical side, many indigenous peoples understood the importance of heredity - e.g. taro and banana species, chili, and grains.
1900-1910 advances in genetics and identification of chromosomes and the development of biochemistry. This marriage of biology and chemistry sought to understand life by isolating molecules and reconstituting living processes in the nonliving extracts prepared from cells.
1960 DNA, RNA, protein, structure of a gene, promoter, etc.
1970 to 1980 - increased ability to study DNA, including identifying restriction enzymes, DNA sequencing improvements, ability to clone genes, ability to knock out genes (homologous recombination), etc.
The study of evolution was revolutionized by this sequence analysis. As more sequence became available, connections became increasingly apparent: oncogenes and growth regulators, phylogenetic conservation, and recognition of gene families.
1988, The Human Genome Project - started by DOE, it included Mendelian analysis (phenotypes), physical mapping, and sequencing. It seemed impossible - 3 billion (3 x 10^9) base pairs when one could get only 300 bases per sequencing run.
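To get a feel for why this seemed impossible, here is a back-of-the-envelope sketch using only the two numbers above. The function name and the coverage parameter are my own illustrative assumptions (real projects read each base several times over, and the exact coverage used is not stated in these notes).

```python
# Rough estimate built only from the two figures quoted in the notes.
GENOME_SIZE_BP = 3 * 10**9   # human genome size in base pairs (from the notes)
BASES_PER_RUN = 300          # bases obtained per sequencing run (from the notes)


def runs_needed(genome_bp, bases_per_run, coverage=1):
    """Minimum number of runs to read every base `coverage` times.

    `coverage` is a hypothetical parameter added for illustration;
    coverage=1 means each base is read exactly once with no overlap.
    """
    total_bases = genome_bp * coverage
    # Integer ceiling division: a partial run still counts as a whole run.
    return (total_bases + bases_per_run - 1) // bases_per_run


single_pass = runs_needed(GENOME_SIZE_BP, BASES_PER_RUN)
print(f"{single_pass:,} runs for a single pass over the genome")
```

Even a single perfect pass with no overlaps comes out at millions of runs, and any redundancy multiplies that number.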
David Botstein and colleagues proposed in 1980 that one could construct a complete genetic map of the human chromosomes by following the inheritance of common DNA sequence variations, termed DNA polymorphisms (16). Single-nucleotide polymorphisms are now a common way to approach this.
Development of the concept of "model organism" - supported by the ability to move functionally conserved genes/proteins from organism to organism. Non-mammalian models were needed because mammalian genomes are large and problematic, and mammalian cells themselves are not so tractable by molecular genetics.
Yeast (Saccharomyces cerevisiae) was the first eukaryote to be sequenced (1996). H. influenzae was the first whole-organism sequencing project and was done in 1995. Methanococcus was the first archaeal species to be sequenced. Next: C. elegans, Drosophila, mouse, E. coli, zebrafish, Arabidopsis, etc.
- MODEL ORGANISMS ARE THE ROSETTA STONES FOR DECIPHERING BIOLOGICAL SYSTEMS (What does this mean?)
- At first, these were organisms that were tractable by molecular genetic techniques, had been well studied, and were distributed across the tree of life.
Now the field has begun to open up to a large number of different organisms, with smaller numbers of researchers per organism.
But, sequence is not enough: A vision of what to do - Systems Biology (Lee Hood)
- Systems biology systematically perturbs biological systems (biologically, genetically, or chemically); monitors the gene, protein, and informational pathway responses; integrates these data; and ultimately formulates mathematical models that describe the structure of the system and its response to individual perturbations.
- Hypothesis (classical science) vs. discovery (genomics).
- Experimentalists and modelers - working together
- Biology plus computer science, engineering, etc.
- Ultimate goal is to define parts, how they interact, and model a whole organism and, eventually, communities, etc.
- "All of this information is hierarchical in nature: DNA -> mRNA -> protein -> protein interactions -> informational pathways -> informational networks -> cells -> tissues or networks of cells -> an organism -> populations -> ecologies". Is this right or is it still too simple?
Biological information has several important features:
- It operates on multiple hierarchical levels of organization.
- It is processed in complex networks
- These information networks are typically robust, such that many single perturbations will not greatly affect them
- There are key nodes in the network where perturbations may have profound effects; these offer powerful targets for the understanding and manipulation of the system.
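The last two points can be illustrated with a toy example. This is only a sketch: the network below is made up (the node names HUB, A, B, C, D are hypothetical, not real genes or proteins), and "size of the largest connected piece" is just one simple stand-in for whether the network still holds together after a perturbation.

```python
from collections import deque

# Hypothetical toy network: one highly connected node plus peripheral nodes.
EDGES = [
    ("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("HUB", "D"),
    ("A", "B"), ("C", "D"),
]


def build_graph(edges, removed=None):
    """Adjacency sets for the network, optionally with one node deleted."""
    nodes = {n for edge in edges for n in edge if n != removed}
    graph = {n: set() for n in nodes}
    for u, v in edges:
        if removed not in (u, v):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def largest_component_size(graph):
    """Size of the largest connected component, found by breadth-first search."""
    seen = set()
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


intact = largest_component_size(build_graph(EDGES))
without_a = largest_component_size(build_graph(EDGES, removed="A"))
without_hub = largest_component_size(build_graph(EDGES, removed="HUB"))
print("intact:", intact, "| A removed:", without_a, "| HUB removed:", without_hub)
```

Deleting a peripheral node leaves the rest of this toy network in one piece, while deleting the hub splits it into small fragments. Real biological networks are far larger and the measure of "effect" is far subtler, but the same kind of question applies.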
What is needed to do systems biology?
1. Genetics parts list - sequence motifs, promoters, other sequence and structural information. SNPs (single nucleotide polymorphisms) and other mutations. With these components now in hand, the immediate challenge is to place them in the context of their informational pathways and networks.
2. HIGH-THROUGHPUT: Genome-scale technologies that allow you to monitor each level of organization - each "ome"
DNA sequencing - the $1000 genome, SNPs, etc.
robotics, "hands off" tools
two-hybrid analysis: protein-protein interactions
Mass spectrometry, other inventions...
3. SYSTEMATIC GENE MUTATIONS
Gene knockouts and knockdowns - PERTURBATIONS, e.g. deletion of all yeast genes, experiments to find the minimal genome, RNAi experiments with C. elegans.
4. Computational biology/Bioinformatics-Databases
- GenomeWeb Genome Databases: The GenomeWeb is the authoritative collection of the best genome related sites on the Web. It is up to date, relevant, fully searchable and extensive. You can find the databases like Magpie, etc. here, but it might take some time. This is really a Database of Databases.
- ENTREZ-Genome: NCBI site for searching - pubmed, genomes, proteins, etc.
- SGD (Saccharomyces Genome Database): scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae at Stanford Genomic Resources
- GOLD (Genomes OnLine Database): a WWW resource for comprehensive access to information regarding complete and ongoing genome projects around the world. It provides the most detailed and accurate monitoring of genome sequencing projects. Its genomic links
- Science: What makes a great genome viewer: NAR 2004 database issue; NAR 2003 database issue
- PROTEOME
Computational biology - what else is in the data?
CS Branches:
- Probability and statistics: data mining and pattern analysis
- Algorithms and data structures: combinatorial optimization algorithms; graph theory
- Computer graphics: high-throughput data visualization
- Databases and systems management: high-throughput data storage and access
Where are we going?
- What happens to genomes over time and what are the consequences? How does the genome direct the function and development of a cell or organism? What kind of hypotheses do genomic data lead to?
- Predicting protein function, identifying new functions, regulation, interactions, emergent properties
- Is there a core set of genes for each organism? (TIGR minimal genome.) How many genes are there on earth? Other great questions?
Leading to "modular" and systems biology approaches.
Synthetic biology - goal is to make new microbial workhorses
Whither Genomics - Andrew Murray
"Functional modules are biological pathways: collections of molecules, both large and small, that co-operate to perform a given function, such as protein synthesis, signal transduction, or the biosynthesis of small molecules.
We want to know what the properties of a module are (such as the detailed, quantitative correlation between its inputs and outputs), how its parts are chemically and structurally connected to each other, how these connections explain the properties, and how different modules are connected to or insulated from each other."
Ross Overbeek, Genomics: What is realistically achievable? (Moore's law for sequencing.)
"Most readers will be aware that the amount of actual sequence data available to the research community has doubled roughly every 1.5 years for at least the last decade. This is completely analogous to the doubling of microprocessor speed every 1.5 years in the computing community. The question in such situations is: how long can such growth be sustained? There are actually three closely related 'laws' that should be considered: (1) the amount of available DNA sequence data will double every 18 months; (2) the number of available genomes will double every 18 months; (3) the cost of sequence will drop by a factor of 2 every 18 months."
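To see what "doubling every 18 months" adds up to, here is a small sketch of the arithmetic. It assumes clean exponential growth with a fixed 1.5-year doubling time, which is an idealization of the quotation rather than measured data; the function names are my own.

```python
import math

DOUBLING_TIME_YEARS = 1.5  # "every 18 months", taken from the quotation


def fold_increase(years, doubling_time=DOUBLING_TIME_YEARS):
    """Growth factor after `years` if the quantity doubles every `doubling_time`."""
    return 2 ** (years / doubling_time)


def years_to_reach(fold, doubling_time=DOUBLING_TIME_YEARS):
    """Years needed to grow by a factor of `fold` under the same assumption."""
    return doubling_time * math.log2(fold)


for years in (1.5, 3, 10):
    growth = fold_increase(years)
    # Under 'law' (3), cost per base falls by the same factor that data grows.
    print(f"after {years} years: {growth:.1f}x the data, {1 / growth:.4f}x the cost")

print(f"years to reach 1000x: {years_to_reach(1000):.1f}")
```

Under this idealized model, a decade of doubling gives roughly a hundredfold increase, which is why the question of how long such growth can be sustained matters.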
What hasn't been mentioned:
- Ethics, NIH ethics; the role of business vs. federal investment
Training needs
- Genome technology/genome web site: deals, acquisitions, etc.