5 Million Records to Analyze?
Machine Learning to the Rescue!

Making sense of 50 years of BBS Data

First posted by Mike Fuller on October 5, 2017
Updated November 20

Machine Learning in Long-Term Research

Long-term research in ecology has deepened our understanding of the processes that govern natural systems (and arguably changed the lives of researchers -- see my book review in Bioscience). It's no surprise that long-term projects can generate large amounts of data. This is both a boon and a curse -- while it can reveal trends missed by short-term studies, piles of data also present challenges for analysis. Here, I provide an example of how machine learning methods can supplement traditional approaches to the analysis of long-term data.

The ultimate goal of ecology is to understand the underlying causes of species abundance and distribution. But to do that, we must first separate random noise from process-driven patterns. Therein lies the potential of Machine Learning: a powerful tool for uncovering predictive patterns in data.

Ecological systems are complex, constantly shifting entities, where patterns arise from the interplay of thousands of variables. To make sense of this complexity, ecologists have adopted the divide-and-conquer approach of hypothesis testing. The world is divided into experimental groups, defined by the question at hand. Want to know if climate-change-induced drought is changing vegetation patterns? Set up plots in a natural landscape, and add water to some plots but not others. Are species ranges shifting in response to climate change? Track abundance over time, effectively creating a moving window of "before" and "after" subgroups. To extract meaningful information, we impose artificial order on otherwise intractable chaos.
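To make the hypothesis-testing logic concrete, here is a minimal sketch of a watered-vs-control comparison using a permutation test. The biomass numbers, plot counts, and group labels are entirely made up for illustration; a real analysis would use the study's actual design and a purpose-built statistical package.

```python
import random
import statistics

# Hypothetical plot biomass values (kg per plot) -- illustrative numbers only
watered = [12.1, 13.4, 12.8, 13.0, 12.5]
control = [10.2, 10.9, 11.1, 10.5, 10.8]

observed = statistics.mean(watered) - statistics.mean(control)

# Permutation test: shuffle the plot labels and ask how often a difference
# at least as large as the observed one arises by chance alone.
random.seed(0)
combined = watered + control
n_extreme = 0
n_shuffles = 5000
for _ in range(n_shuffles):
    random.shuffle(combined)
    diff = statistics.mean(combined[:5]) - statistics.mean(combined[5:])
    if diff >= observed:
        n_extreme += 1

p_value = n_extreme / n_shuffles
print(f"observed difference: {observed:.2f}, p ~= {p_value:.4f}")
```

The small p-value here simply reflects that the two invented groups barely overlap; the point is the divide-and-conquer structure, not the numbers.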

For this blog post, I consider the North American Breeding Bird Survey (BBS) program, which annually tracks the distribution and abundance of over 400 species of birds in the US and Canada. Inaugurated in 1966, the BBS has amassed over 5 million records on avian populations -- a trove of data which has been used to inform conservation practices and environmental policy. It is a famous example of citizen science, where members of the public contribute to large scientific projects. Anyone can join in to help identify and count birds for the BBS.

The Moving Frontier of Data Analysis

As is the tradition in ecology, analysis of BBS data has relied heavily on statistical models. Approaches have mirrored larger trends in ecology, beginning with relatively simple frequentist approaches in the 80s and 90s, and moving to more sophisticated Bayesian approaches in the 2000s and beyond. These methods have proven effective for identifying species declines and landscape-scale trends in avian community structure.

Machine Learning is an alternative approach to analysis that has become the de facto method in the world of Big Data. How does Machine Learning differ from standard statistical approaches? Is it suitable for ecological analysis? What can it tell us that traditional methods can't?

A Tool for Long-Term Research

The divide-and-conquer approach works when there are clearly defined subgroups, which is why most research builds on the results of past studies; having established predictable patterns, we can construct narrowly defined questions that are amenable to hypothesis testing.

By contrast, the questions posed by long-term research are often broadly defined. This is intentional, as we expect surprise when following a complex system through time. Long-term research is by nature exploratory, not confirmatory. What's more, compared to short-term studies, we expect long-term data to be messy (more variable); given enough time, species abundances will fluctuate wildly. It's precisely this greater complexity, and the desire to uncover the drivers of both short and long term patterns, that motivates long-term research.

Where Machine Learning can contribute to this endeavor is in its ability to:

  • Discover previously unknown patterns of association
  • Reveal or verify temporal or spatial trends

The key strength of Machine Learning is its ability to identify hidden structure in large, messy data sets. That can mean finding previously unknown relationships among entities, or assigning entities to subgroups based on a large number of variables, where the sheer number of variables poses a barrier to standard analytical methods.
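As a toy illustration of this kind of unsupervised pattern discovery, here is a minimal k-means clustering sketch in pure Python. The "site" coordinates, the choice of two clusters, and the starting centroids are all invented for illustration; real analyses would use a dedicated library and many more variables.

```python
import math

def kmeans(points, centroids, iters=10):
    """Toy k-means: assign points to the nearest centroid, then move each
    centroid to the mean of its cluster. (No empty-cluster handling --
    fine for this illustrative data, not for production use.)"""
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [tuple(sum(v) / len(pts) for v in zip(*pts))
                     for pts in clusters]
    return clusters, centroids

# Hypothetical sites described by two habitat variables; two groups emerge.
sites = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6),
         (5.0, 5.0), (5.3, 4.8), (4.7, 5.2)]
clusters, centers = kmeans(sites, centroids=[(0.0, 0.0), (5.0, 5.0)])
print([len(c) for c in clusters])
```

No subgroup labels were supplied; the algorithm recovers the two groups from the data alone, which is exactly the "hidden structure" argument made above.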

Citizen Science = More Data!

Quoting the Oxford Dictionary, the Citizen Science Association defines CitSci as:

“scientific work undertaken by members of the general public.”

Work in this context most often involves helping with data collection. By inviting the public to assist researchers in the field, CitSci accelerates the rate that data can be gathered: we get more data in less time. It also transforms the culture of science by diversifying what it means to be a scientist. After all, data collection is fundamental to scientific research, and anyone who contributes to the process becomes an integral member of the team, no?

CitSci transforms research in other ways, too, not all of which are well understood. For one thing, we can no longer rely on the expectation that field workers have been trained by years of formal preparation. Now, it's true that you don't need a degree to be an expert -- years of experience can be a valid form of education. But what separates a degree from self-gained knowledge is that with the former, skills and knowledge are formally tested and validated. A degree is an objective measure of experience that is arguably a more reliable metric than the statement "I've been birding all my life".

Citizen Science = Good Data?

Does the quality of data from CitSci projects differ from that of conventional research? This can be difficult to assess. For the BBS, decades' worth of data yielded the statistical power required to distinguish annual variation from true population declines. But how reliable are those data? Recent studies indicate there may be problems. For example, studies have revealed declines in observers' hearing ability over time that affect the accuracy and completeness of species lists[1,2]. Understanding how differences in birding skill among observers influence estimates of bird abundance is crucial for drawing accurate conclusions about observed trends in bird populations.

Detecting Observer Bias

Before we can quantify differences in skill among observers, we need to separate observer effects from route effects. BBS counts are recorded for a combination of route + observer + year. We want to disentangle the route and year components of bias from the observer component. Which means we need data from multiple observers for a particular route.

It would be helpful if count data from two or more observers were recorded at the same time. Unfortunately, although simultaneous observers are permitted, the BBS database doesn't track their individual contributions -- only the sum total is recorded. As an alternative, to quantify observer differences, we can use single-observer records for a given route, and look for a bias between different observers that is consistent over time (i.e. using a repeated measures approach). Which means we need to find routes surveyed by different people over time. Where might that be?
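The route-screening step described above is straightforward to sketch. The records below are hypothetical (route IDs, observer IDs, and years are invented); the real BBS database has its own schema, but the logic -- group single-observer records by route, then keep routes with two or more distinct observers -- is the same.

```python
from collections import defaultdict

# Hypothetical single-observer BBS records: (route_id, observer_id, year)
records = [
    ("NY-101", "obs_A", 1998), ("NY-101", "obs_A", 1999),
    ("NY-101", "obs_B", 2004), ("NY-101", "obs_B", 2005),
    ("NV-007", "obs_C", 2001), ("NV-007", "obs_C", 2002),
]

# Collect the set of distinct observers seen on each route
observers_by_route = defaultdict(set)
for route, observer, year in records:
    observers_by_route[route].add(observer)

# Keep only routes surveyed by two or more different observers over time
multi_observer_routes = sorted(r for r, obs in observers_by_route.items()
                               if len(obs) >= 2)
print(multi_observer_routes)
```

Routes surviving this filter are the ones that permit a repeated-measures comparison between observers.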

Lonesome Cowbirdboys?
Continental Trends in Birder Density

We are more likely to find multi-observer routes in regions that support many BBS observers. From 1966 to 2015, over 7,000 people participated in at least one BBS survey in the US. The map below shows the distribution of these observers. As might be expected, participation in the BBS program tends to follow human population density, with a distinct decreasing trend in BBS participation away from the coasts, and from east to west.

[Map of US observer density]

But human density is not the whole story. Evidently, birding is most popular in the northeast, and least popular in Nevada! The next step is to examine multi-observer routes for evidence of observer bias.

Can We Quantify Birding Skills?

Having found where to look for routes with multiple observers, we next require a measure of birding ability. Now, it's one thing to compare species counts of a given person over time[1,2]. But how do you quantify differences between people?

One approach is to estimate observer error rates for bird identification. For example, one study[3] asked birders to listen to recordings of bird calls and songs, recorded from BBS survey routes, and report all the species they heard. The study revealed an average error rate of 14 percent, with some participants missing nearly 40 percent of recorded calls and songs.

These results suggest that, given suitable data (recorded bird calls, in this case), we can indeed measure differences between birders. Unfortunately, audio recordings are not available for more than a handful of BBS routes. To understand the influence of observer bias more generally, we must find a different approach -- one that can exploit the vast base of records established by the BBS.

Shannon Entropy as a Metric
for Birding Ability

Here, I argue that Shannon Entropy provides an objective measure of birder ability. Most ecologists are familiar with Shannon Entropy (H) as an index of species diversity. H is calculated from the proportions of different species in a sample. It is widely used to compare different communities, or to assess changes in community structure over time. But Shannon's original purpose for H was as a relative measure of the information contained in a signal, and that makes it a great metric for comparing birder skills.
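For readers who haven't computed it recently, H is simply the negative sum of p * ln(p) over the species proportions in a sample. A minimal sketch, with invented species counts:

```python
import math

def shannon_entropy(counts):
    """Shannon entropy H = -sum(p_i * ln(p_i)) over species proportions."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)

# Hypothetical species counts from one observer's route -- illustrative only
counts = {"American Robin": 10, "House Wren": 10}
h = shannon_entropy(counts)
print(f"H = {h:.3f}")  # two equally common species: H = ln(2) ~ 0.693
```

More species, and more even counts among them, yield a higher H -- which is what lets us read H as the information content of an observer's species list.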

Think of the calls, songs, and sightings of the birds at a site as a signal, which encodes information about which birds are present. Now consider an observer as an imperfect signal recorder, whose error rate varies randomly from person to person (here, age could be considered a non-random covariate; we will get to that later). Just as with the audio-recordings study, two observers may generate different species lists based on the same signal.

Differences in error rates could be due to differences in hearing ability, knowledge of local species, or general identification skill. Weather conditions on a given day, such as noise from high winds or low light due to dark clouds, could also influence an observer's error rate on that day. But on average, if error rates differ among observers, their species lists will show consistent differences, too. Assuming each person's error rate (and how it changes with age or experience) is relatively constant, then on average, differences in H between observers who are working the same sites provide a good indication of relative birding skill.
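The repeated-measures comparison described above can be sketched in a few lines. The per-route H values below are invented for illustration, standing in for entropies computed from each observer's actual species lists on routes both surveyed:

```python
import statistics

# Hypothetical per-route Shannon entropy (H) for two observers who each
# surveyed the same routes in different years -- illustrative numbers only
h_observer_a = {"route_1": 2.1, "route_2": 1.9, "route_3": 2.3}
h_observer_b = {"route_1": 1.8, "route_2": 1.7, "route_3": 2.0}

# Paired (repeated-measures) differences, route by route
shared = sorted(set(h_observer_a) & set(h_observer_b))
diffs = [h_observer_a[r] - h_observer_b[r] for r in shared]

mean_diff = statistics.mean(diffs)
print(f"mean H difference (A - B): {mean_diff:.2f}")
```

A difference that is consistently positive across shared routes, as in this toy example, would suggest observer A's lists carry more information -- i.e., higher apparent birding skill -- rather than a quirk of any one route or year.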

Choosing a Machine Learning Approach

With our comparison metric in hand, it's time to choose a specific method for analysis. Machine Learning is not a single approach -- as a mature discipline, it encompasses many different methods. And as with conventional statistics, one often has a choice of several possible methods for a given data type and problem. Which one will we use? That will be the topic of the next installment!

More to follow ...


  1. Farmer et al. 2014. Observer aging and long-term avian survey data quality. Ecology and Evolution 4: 2563-2576.
  2. Kelling et al. 2015. Can Observation Skills of Citizen Scientists Be Estimated Using Species Accumulation Curves? PLoS One 10: e0139600.
  3. Campbell, M. and C.M. Francis. 2011. Using Stereo-Microphones To Evaluate Observer Variation In North American Breeding Bird Survey Point Counts. The Auk 128: 303-312.

Trips to fairly unknown regions should be made twice:
once to make mistakes and once to correct them.
- John Steinbeck