Chapter 2
Whole genome sequencing
Sequence Similarity Searching:
Why is sequence useful?
If you work with one or a few proteins or genes, sequence can tell you about their conservation, active sites, structure, and regulation in other organisms, etc.
If you do genomics:
How is sequencing done?
PCR-based sequencing, also called dideoxy (Sanger) sequencing
- fragment length is an issue
- algorithms for assembly
- different approaches to scaffolds
- challenges of repeat sequences
- telomeres and centromeres
- what is "finished" sequence?
- cost has gone from $10 per base in 1985 to 0.1 cent per base in 2006.
New types of sequencing are being developed all the time. The goal is the $1,000 genome (we'll talk about this next time). What cost per base does that imply for a human genome?
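The cost-per-base question can be worked as simple arithmetic. This is a rough sketch that assumes a haploid human genome of about 3 billion base pairs (a round figure not given in these notes) and ignores the extra coverage a real sequencing project needs.

```python
# Assumption: ~3 billion base pairs in a haploid human genome (round figure).
HUMAN_GENOME_BP = 3_000_000_000
TARGET_COST_USD = 1_000          # the "$1,000 genome" goal
COST_PER_BASE_2006_USD = 0.001   # 0.1 cent per base, from the notes

# Dollars per base implied by a $1,000 genome.
cost_per_base = TARGET_COST_USD / HUMAN_GENOME_BP

# How many times cheaper than the 2006 figure that would be.
fold_drop = COST_PER_BASE_2006_USD / cost_per_base

print(f"Cost per base at $1,000 per genome: {cost_per_base:.2e} dollars")
print(f"Fold decrease relative to 2006: {fold_drop:.0f}")
```

Under these assumptions, a $1,000 genome works out to a few ten-millionths of a dollar per base.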
What do you do with sequence? If you are just looking at one or a few genes, there are global (Needleman-Wunsch) and local (Smith-Waterman) alignments.
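To make the difference concrete, here is a minimal sketch of both dynamic-programming scores. A global alignment (Needleman-Wunsch) must span both sequences end to end, while a local alignment (Smith-Waterman) keeps only the best-matching region. The function name and the scoring values (match +1, mismatch -1, linear gap -1) are illustrative assumptions, not values from the course.

```python
def alignment_score(a, b, local, match=1, mismatch=-1, gap=-1):
    """Return the best alignment score of a and b.

    local=False: global (Needleman-Wunsch) score over the full sequences.
    local=True:  local (Smith-Waterman) score of the best matching region.
    Scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment pays for gaps at the sequence ends.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                # Local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
            H[i][j] = cell
            best = max(best, cell)

    # Global score sits in the bottom-right corner; local score is the matrix maximum.
    return best if local else H[-1][-1]


short, long_ = "ACGT", "GGGGACGTGGGG"
print("global:", alignment_score(short, long_, local=False))
print("local: ", alignment_score(short, long_, local=True))
```

The short sequence is embedded in the long one, so the local score rewards the shared region while the global score is dragged down by the end gaps.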
Global versus local alignments
- Dot plots (people have been making these for 30 years!)
- Used for analysis of gene structure and genome organization, detection of internal sequence repeats, RNA folding, molecular evolution
- Dots are placed at the intersection of each row and column where the bases or amino acids are identical
- Sequence similarity searching algorithms have resulted from a dialectic process of iterative improvement and refinement.
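The dot-plot rule in the list above (a dot wherever the row and column residues are identical) can be sketched in a few lines. The function names and the text rendering are illustrative choices. Real dot-plot programs usually add a sliding window and a threshold to suppress noise, which this sketch leaves out.

```python
def dot_plot(seq_x, seq_y):
    """Return a matrix of booleans: True where seq_y[row] == seq_x[col]."""
    return [[x == y for x in seq_x] for y in seq_y]


def render(matrix, seq_x, seq_y):
    """Draw the matrix as text, with '*' for a dot and '.' for no dot."""
    lines = ["  " + " ".join(seq_x)]
    for residue, row in zip(seq_y, matrix):
        lines.append(residue + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


seq = "GATTACA"
matrix = dot_plot(seq, seq)
print(render(matrix, seq, seq))
```

Plotting a sequence against itself always gives a full main diagonal. Off-diagonal runs of dots indicate internal repeats.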
BLAST tutorial from Geospiza and Geospiza tutorial site (these are useful to look at)
BLAST - reading frames, what do you start with, why would you have vector contamination? (What do molecular biologists mean by the word "vector"?)
BLAST overview: Basic Local Alignment Search Tool
Why do BLAST searches?
- Introduction to searches at NCBI - tutorials for queries, including PSI-BLAST
- Definitions - What is similarity searching?
- BLAST tutorial - What do the parameters mean?
- BLAST course - NCBI
- BLAST guide - helps plan your experiment
Program - Description
- blastp - Compares an amino acid query sequence against a protein sequence database.
- blastn - Compares a nucleotide query sequence against a nucleotide sequence database.
- blastx - Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
- tblastn - Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
- tblastx - Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. Please note that the tblastx program cannot be used with the nr database on the BLAST Web page because it is computationally intensive.
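blastx, tblastn, and tblastx all depend on translating a nucleotide sequence in every reading frame: three on the given strand and three on its reverse complement. The sketch below shows what that six-frame translation looks like, assuming the standard genetic code. The function names are illustrative and this is not how BLAST itself is implemented.

```python
from itertools import product

# Standard genetic code, in the conventional T, C, A, G ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return translations for frames +1, +2, +3 and -1, -2, -3."""
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


for name, protein in six_frame_translation("ATGGCCTAA").items():
    print(name, protein)
```

Only one of the six frames usually encodes the real protein, which is why a translated search has to try them all.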
How do we decide the score for substitutions? Scoring matrices make the analysis objective.
Example: scoring an alignment with PAM120

Sequence 1:   K    A    L    M    R
Sequence 2:   V    A    K    N    S
Score:       -4    3   -4   -3   -1     Total: -9
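Reading the PAM120 example above as two aligned five-residue sequences (KALMR over VAKNS) with one score per column, the total is just the sum of the per-pair matrix entries. The sketch below reproduces that sum. The dictionary holds only the five entries quoted in the example, not the full PAM120 matrix, and the function names are illustrative.

```python
# Excerpt only: the five PAM120 entries quoted in the example above,
# not the full 20 x 20 matrix.
PAM120_EXCERPT = {
    ("K", "V"): -4,
    ("A", "A"): 3,
    ("L", "K"): -4,
    ("M", "N"): -3,
    ("R", "S"): -1,
}


def pair_score(x, y, matrix):
    """Look up a substitution score; substitution matrices are symmetric."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]


def ungapped_score(seq1, seq2, matrix):
    """Sum the substitution scores column by column (no gaps allowed)."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped scoring needs sequences of equal length")
    return sum(pair_score(x, y, matrix) for x, y in zip(seq1, seq2))


total = ungapped_score("KALMR", "VAKNS", PAM120_EXCERPT)
print("Alignment score:", total)
```

A negative total like this one means the aligned residues look less alike than expected by chance under the PAM120 model.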
Evolutionary model schemes (simple mutation matrix - Yale)
Chemical similarity models (Yale)
Additive matrix - 1 point for match - see also Matrix Discussion, written by a student last semester
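$E = m n \, 2^{-S}$, where E is the expected number of chance hits, m is the length of the query, n is the effective length of the database, and S is the bit score, a normalized measure of alignment quality (not percent identity). A higher S gives a smaller E.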
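The E-value relation above, read as E = m * n * 2^(-S), is easy to explore numerically. The sketch below shows how E shrinks as the bit score rises. The function name and the query length, database length, and bit scores are made-up illustrative values, not figures from a real search.

```python
def expect_value(query_length, database_length, bit_score):
    """E = m * n * 2**(-S): expected number of chance hits with bit score >= S."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical numbers chosen only for illustration.
m = 250              # query length
n = 1_000_000_000    # effective database length

for S in (30, 40, 50, 60):
    print(f"bit score {S}: E = {expect_value(m, n, S):.3g}")
```

Each additional bit of score halves E, while doubling the database size doubles it, so the same alignment becomes less significant in a larger database.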
Whole genome sequencing, assembly, and annotation Powerpoint
Science Breakthrough of 2007 Human Genetic Variation: We will watch this video ~ 15 minutes.
Here's a link to the Encode project.
Genomics has been part of many of the Science Breakthroughs in the past 10 years.
HOMEWORK: Go to SGD and look up ADY2, SNZ1, and TOR1.
End of homework
© MWW 2008