Michael Fuller

Graph Theoretic Approach to Quantifying Community Structure

For my Ph.D. dissertation, I developed a graph theoretic approach to link spatial variation at small scales to the structure of competitive relationships at the community scale. I constructed species networks based on the empirical spatial relationships of species in a tropical dry forest, and compared them to networks in which species were randomly distributed in space.

To better understand how variation in the physical interactions among trees in neighborhoods influences neighborhood composition, and how this relationship is influenced by body size, I analyzed the effect of increasing crown overlap on the composition of neighborhoods representing trees grouped in four different body size classes. To determine whether species niche differences influenced the spatial relationships of species, I analyzed the effect of increasing crown overlap on the randomness of the networks.

The graph at right depicts the nearest neighbor structure of 40cm diameter trees in the San Emilio forest. Circles (nodes) represent species, coded by a number. Lines (edges) link two species if they occur at least once as nearest neighbors.

tree network figure

Measures of Network Topology

I used five measures of graph topology to quantify the effects of body size on the probability that two species were found together in a neighborhood. I calculated the size (number of nodes) of the largest connected subnetwork, the network order (number of edges), the node degree distribution, the characteristic path length (CPL), and the average clustering coefficient (CC). For each dbh class, I used undirected graphs to calculate the degree of each node and the value of CC. I calculated CPL from directed graphs in which reference trees were sources and neighbors were targets. A subnetwork is a graph that represents a subset of the nodes and edges of a larger graph. A connected subnetwork is a subnetwork in which a path (a sequence of edges) can be traced from each node to every other node in the subnetwork. In other words, none of the nodes in a connected subnetwork have zero degree.

Node Degree Distribution

The node degree distribution characterizes the likelihood that a randomly chosen node will have a degree of a given size. In a species network, node degree represents the total number of heterospecifics that are neighbors of each reference species. In a randomly assembled community, the number of edges a node has is proportional to the relative abundance of the species it represents and the relative abundance of the other species in the community. As can be seen in the figure, I found no consistent differences between the observed (light bars) and random (dark bars) species networks. Click figure for larger view.

node degree figure

Clustering Coefficient

If every species in the neighborhood of a particular reference species also occurs in the neighborhoods defined by the reference trees of the other species in that neighborhood, then the species form a clique (second figure). This could occur, for example, if the species are restricted to a specific soil type. The prevalence of cliques in a network is quantified by the clustering coefficient, CC, which is the average of the clustering coefficients of each node, (CCi):

clustering coeff equation

where Ei is the number of undirected edges connected to node i, and the denominator shows the maximum possible number of edges between all k nodes connected to i. The maximum possible value of CC = 1.0, which occurs when every node in a network is connected to every other node. However, in networks that represent the spatial relationships of species in communities, uneven species relative abundance and non-random factors can constrain the maximum attainable value of CC to < 1.0 in the same way that it constrains the mean node degree. CC is generally proportional to mean node degree. Our analysis revealed the observed species networks (closed circles) to have much higher clustering relative to random networks (open squares), a difference that increased with tree size. Vertical bars = 95% confidence limits. Asterisks indicate level of statistical significance (*=0.05, ** = 0.01, ** = 0.001). Click figure for larger view.

clustering coeff. figure

Characteristic Path Length

Path length refers to the distance between two nodes in a graph, measured by the number of edges that separate them. The characteristic path length, CPL, of a graph is the average of the smallest number of edges that separate each node from every other node. I used the breadth-first search algorithm to calculate CPL. This algorithm requires edges to be equally weighted (i.e. have equal length). Therefore, I set all edges to unit length in the species networks. CPL is generally inversely proportional to mean node degree. The differences between observed (circles) and random (squares) networks in CPL were less substantial, but also increased with increasing tree size. Vertical bars = 95% confidence limits. Asterisks indicate level of statistical significance (*=0.05, ** = 0.01, ** = 0.001). Click figure for larger view.

path length figure