Applied Biological Statistics with R

UNM Course BIOL 519.

Biostats for the 21st Century

Living systems are complex. Myriad interacting parts, transient and latent processes, and continuous adaptation to changing conditions occur at nearly every level of biological organization.

Complex biological systems yield complicated data, presenting challenges for data analysis. Often, the assumptions and requirements of standard statistical methods simply don't hold. Yet, until very recently, textbooks and courses on biological statistics focused almost entirely on standard methods. This has led to misunderstanding and false confidence and, in some cases, to conflict between the goals and methods of analysis and to improper interpretation of results.

Biologists, faced with difficult data, have turned to Free and Open Source Software (FOSS) as an aid to analysis. A virtually unlimited pool of developers gives FOSS an advantage over commercial software: its support for obscure and specialized methods is unparalleled. R, in particular, is popular among biologists. But as a programming language, R can be difficult for non-programmers to master. R was built by statisticians, who expected users to be well versed in both stats and programming. Few biologists have both backgrounds, so learning R has become a common additional challenge on top of the data analysis itself.

The aim of this course is to help rectify this situation by introducing students to statistical methods better suited to the kinds of analytical problems encountered by biologists, while also imparting essential skills for working with R.

Difficult analytical issues commonly encountered in biology include:

  • small sample sizes
  • huge sample sizes
  • missing data
  • unbalanced designs
  • spatially nested designs
  • repeated measures
  • non-constant error variance
  • non-Normal error distributions
  • too many zeros (e.g. species absence)
  • random covariates
  • multiple uncontrolled or unquantified confounding variables

The Applied Biostatistics course will zero in on recent advances in statistical modeling while addressing common errors biologists make when applying statistics. To achieve this goal, the course will build on material typical of introductory statistics courses. For this reason, participation will be limited to students who have completed an introductory statistics course.

Course Requirements

(To receive instructor approval)

  1. Have completed one or more college courses in introductory statistics, or
  2. Demonstrate knowledge equivalent to an introductory statistics course
  3. Provide your UNM ID

To request instructor approval, click the "Send Permission Request" link below, which will open an email message to Professor Fuller. In your message, explain how you meet the course requirements and include your UNM ID.

Send Permission Request

Course Topics

The course is designed to appeal to students in evolutionary biology, molecular biology, and ecology. For the Fall 2016 edition of this course, I intend to cover the following topics; this is a preliminary list and may change.

Statistical Topics
  • Experimental Design
  • Linear Mixed Models (models with fixed and random factors)
  • Logistic Regression (Generalized Linear Models)
  • Smoothing methods (e.g. splines)
  • Classification and Clustering methods
  • Model validation and testing
  • Variance structures for problematic data
  • Variance partitioning

R Topics
  • Computer programming fundamentals
  • R language essentials
  • Data management
  • R session management
  • Analysis methods in R that address the above statistical topics

Data Analysis Topics
  • Essential philosophy of statistical modeling
  • Critical assumptions of data analysis
  • Major paradigms of statistics and modeling
  • Common mistakes, and how to avoid them
  • P-Hacking and Researcher Degrees of Freedom

This is a Hands-On Course:
(You're gonna need a laptop)

To help you learn this technical subject effectively, I intend to provide separate sections for lecture and lab. Generally, lecture will focus on concepts and lab will focus on practical skills. However, you will not be sitting passively during lecture: lecture sessions will include examples in R that you will be asked to work through in class. You will therefore need access to a laptop for both lecture and lab.