Point Location Data

My previous discussions of range maps (see part 1 here) focused on areal (or linear) representations of where different species might be found. Another type of location data has been part of the site for a while but was never explicitly compiled into a useful form…until now.

Individual records of species usages (such as those found on the citation pages) often list the particular location to which the name applies. This location can be fairly specific (e.g., a particular river mouth) or extremely broad (e.g., a country or ocean basin). These location names are stored as part of the database; a number of years ago I decided to standardize the names to allow for better analysis and consistency. As much as possible, the names found on the website reflect the current name of the location, rather than the historical name that might have appeared in the original publication, although the historical names are kept as synonyms (invisible until now).

Beyond just standardizing the names, every location was given a latitudinal and longitudinal representation. These coordinates lack precision, but represent a quick-and-dirty estimate of what the location name might represent. For a broad area, the coordinates usually match a major coastal city or the rough center of the coastal area; for smaller islands, the center of the island might have been used.

I’ve been using the coordinates for a few years to help identify misidentified records and/or errors in the database. By plotting the locations associated with different names, it is fairly easy to find major outliers; these can then be examined to determine if the error was a recording error on my part or a (potential) error made by the original authors.
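The outlier check described above can be sketched roughly as follows. This is only an illustration, not the code used for the site: the function names, the 3000 km threshold, and the sample records are all assumptions made up for the example. It flags any point whose great-circle distance from the component-wise median of the points exceeds the threshold (a simplification that would misbehave for ranges straddling the antimeridian).

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_outliers(points, max_km=3000.0):
    """Return labels of points far from the median position.

    points: list of (label, lat, lon) tuples. The threshold is arbitrary
    and would need tuning per species.
    """
    mid_lat = median(lat for _, lat, _ in points)
    mid_lon = median(lon for _, _, lon in points)
    return [label for label, lat, lon in points
            if haversine_km(lat, lon, mid_lat, mid_lon) > max_km]


# Hypothetical records for one name: four plausible points and one typo.
records = [
    ("site A", 8.9, -79.5),
    ("site B", 9.4, -79.9),
    ("site C", 10.4, -75.5),
    ("site D", 10.6, -66.9),
    ("site E", 23.0, 12.0),  # made-up mistyped coordinate
]
flagged = flag_outliers(records)
print(flagged)
```

Anything flagged this way still needs a human look to decide whether the mistake is in the database or in the original publication.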

In this new release, the point location data is now public and is used in a variety of different ways and places on the site.

Species Pages

The species pages (as well as the general geography page) now show a point location map in addition to the traditional range map. These point location maps represent all of the places where the database believes a particular species is actually found (as opposed to where an author said it was found). These should generally match the range maps, although there are certainly cases where they diverge; at some point these differences will be used to update/fix the range maps and/or the underlying point-species references. Additionally, the point maps are only up-to-date based on citation records added to the database. Aberrant locations where no fiddler crabs are found are indicated by a different color symbol.

Every recorded point location of a fiddler crab in the database. Blue points represent clearly erroneous entries where no fiddler crabs are actually found. The point in central northern Africa (border of Libya and Niger) was a lat/lon error in the database which has already been corrected.

Name Pages

The name pages (both compound and specific name) now show a point map indicating all of the places where a particular name was used, if any (some names are not associated with any specific location, so maps are left off of those pages). This is different from the species pages, because it indicates author usage rather than the algorithmically and/or expertly adjusted species meanings found on the species pages.

Location Pages

Every location has its own page as well. Each location page includes the latitude and longitude used to indicate its position (as well as a map of the point), cross-references to locations that contain or are contained within the location based on the hierarchy mentioned below, lists of currently recognized species found within the location, and lists of names (both compound and specific) which have been used for species at the location.

The species and name lists automatically fill in missing names by including all names found at any sublocation of the hierarchy (names inferred as such are indicated by a symbol). This should allow one to more readily access complete lists of names or species for an area without having to worry about the vagaries of historical records.
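The fill-in logic can be illustrated with a small sketch. It assumes the hierarchy is a simple tree stored as a parent-to-children dictionary; the location and species labels are placeholders, and the real database structure may well differ.

```python
def species_at(location, children, direct):
    """Return (recorded, inferred) species sets for a location.

    children: dict mapping a location to its list of sublocations
    direct:   dict mapping a location to species recorded directly there
    Inferred species are those found only at some sublocation.
    """
    recorded = set(direct.get(location, ()))
    below = set()
    for child in children.get(location, ()):
        child_recorded, child_inferred = species_at(child, children, direct)
        below |= child_recorded | child_inferred
    return recorded, below - recorded


# Placeholder hierarchy: a city within a state within a country.
children = {"Country": ["State 1", "State 2"], "State 1": ["City"]}
direct = {
    "Country": {"species a"},
    "State 2": {"species b"},
    "City": {"species a", "species c"},
}

recorded, inferred = species_at("Country", children, direct)
print(sorted(recorded), sorted(inferred))
```

Keeping the inferred set separate from the recorded set is what makes it possible to mark inferred names with a symbol on the page.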

Location pages can be accessed in two ways. First, from the general geography page, there are now a pair of indices of all locations. The first index represents a rough hierarchical sorting of locations, lumping sub-locations into larger locations (e.g., a city within a state within a country). The second index is a pure alphabetical listing of all locations, including the historical names, not just the modern names. Second, all location pages can also be accessed from any page/record which mentions that location; the locations in the tabular records automatically link to the associated location page.

Problems and Limitations

These location data are not perfect. Some known problems include:

  • Species may be missing from locations just due to holes in observations. For example, Uca pugilator is not recorded for Delaware, even though the state falls right in the middle of the known range of the species and the species is found in neighboring states.
  • Less precise records may have a point location which seems to indicate the presence of a species well outside its actual range. Thus a species recorded from “Central America” might use a point location to indicate this area well outside the actual range of the species in question if it is limited to only one end of the region.
  • The hierarchy is imperfect in a number of ways. This limits how readily it can combine and infer records from subregions into large regions. Some of this can be fixed and updated over time, but other parts may be very difficult to fix, or impossible to put into a strict hierarchy. For example, take the country of Panama. We can combine all of the subrecords of Panama under the country entry to indicate all of the species found in Panama. However, once we do that, we cannot readily also combine the coasts of Panama into the individual Atlantic or Pacific regions which have separate species assemblages. Under the hierarchical structure, we either have to keep the two coasts of Panama separate or we have to abandon the Atlantic and Pacific basins as part of the natural hierarchy (for now I’ve done the latter).
  • Most of the Google maps of individual point locations default to a higher zoom than one would prefer (essentially, they begin maximally zoomed). There’s an easier, imperfect solution to this and a tedious, better solution. I’m working on the latter, but it’ll be a while before the default zoom will be fixed for all locations.

All of the point location data in the database are directly based on what has been reported in publications. There are other potential sources of location records that might be integrated at some point, including more formal museum specimen records such as those found in GBIF or more informal, citizen-science records such as those from iNaturalist (although I note that at least some of the latter records are included in the former). For now I’d prefer not to mix and match these with the data on this site, but in the long run it will be worthwhile to explore how to combine them.

I’ve been playing with these location data for a few years and the new pages offline for a few months. It seems time to make them generally available; if you find obvious problems or flaws, please let me know so we can endeavor to make them better.

Constructing Fiddler Range Maps, part 3: A Better Way?

This is the third of four planned posts about how I construct the range maps for fiddler crabs. The first part gave the history and background of how these maps were drawn in the first place. The second part discussed where the maps become problematic when we want to use them as input data for analysis. This part will present a possible solution to the problem detailed in the second part. The fourth and final part will step back and ask if we’re actually thinking about range maps the wrong way entirely.

As highlighted in part two, even beyond general questions of their accuracy there are some potential problems with using the range maps as data, particularly if we are thinking about estimating the sizes of their ranges. Three parts of the solution are completely obvious. First, for any question involving coastline length (whether to measure available space or estimate a fiddler range), one must be very specific about the map scale and background map used to make such measures since changing these could change the results. Second, when comparing species and/or regions, one must use the same base maps (optimally) or maps constructed from identical scales (suboptimal, but tolerable if necessary) to calculate values for each species or region. Third, species ranges and coastline lengths have to be determined from the identical maps if they are to be at all compared.

This last part is where the most work suddenly looms. As detailed in the previous post, currently our species ranges are generated from one map set while our coastlines are generated from a different one. To make them match, we would likely need to recreate the range data…again…on our new coastlines. Doing so highlights the fragility of the current process…but also leads to the realization that there is likely a better way to store the base information about fiddler crab ranges.

I’ve mentioned a few times that fiddler crab ranges should likely be viewed as one-dimensional lines rather than two-dimensional areas. But we can take advantage of areas and polygons to define a fiddler crab range in an easy and flexible manner. The idea here (which one can view as theoretical since I have not implemented it yet) is that for a given species we only need to define the general boundaries of the range—define a loose polygon which includes all of the within-range coastline and none of the outside-range coastline. Whenever we need to draw the actual range or estimate a distance, we then just need an algorithm which compares that polygon to a coastline map and extracts just the coastline inside the polygon.

There are a number of clear advantages to this approach.

  1. The range data is not fixed to a specific background coastline map, allowing any coastline map to be used to generate the actual mapped or measured range. Changing maps would not require redoing the range data.
  2. Updating the range data for a species simply requires updating the enclosing polygon, likely a vastly simpler operation in most cases than the current system.
  3. The polygon that describes the range data does not require tremendous precision over most of its boundaries. A simple rectangle might be adequate for many species (some will require more complicated shapes, unfortunately).

In theory, for some species with extremely simple ranges (a single contiguous coastline without any outlying islands), one could define the entire range by just noting the end points. In reality, we’ll likely define these by a simple shape that intersects the coast at those two points.

For example, below is a range map for Uca maracoani which I used in the first post of this series, with the addition of a blue rectangle to serve as the polygon denoting its “range.”

This rectangle adequately represents the range of the species, as long as we recognize that the range is the coastline within the rectangle and not the area of the rectangle itself. We need not concern ourselves with how far into the Atlantic the rectangle extends (as long as it isn’t so wide as to clip Africa), nor that it includes two landlocked countries without any shoreline. This rectangle would serve as the masking template for the species, to be applied to any coastline map. An algorithm would simply need to extract the coastline in the rectangle (marked in red) as needed for display or analytical purposes. If we need to change the range of the species we make the rectangle bigger or smaller or use a more complicated polygon as necessary (for example, we could not extend the rectangle west to encompass the Atlantic coast of Colombia and Panama without it intersecting the Pacific coast of Peru…in that case we’d have to use a slightly more complicated polygon that avoided intersecting the Pacific coast).
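For the rectangular case, the extraction step might look something like the sketch below. It is a minimal illustration rather than an implementation of the (not yet written) site code: it treats longitude and latitude as plain x and y, ignores the antimeridian and any map projection, and uses a Liang-Barsky style parametric clip. All names and coordinates are invented for the example.

```python
def clip_segment(p, q, box):
    """Clip segment p->q to box = (xmin, ymin, xmax, ymax).

    Returns the clipped (start, end) pair, or None if the segment lies
    entirely outside the box.
    """
    (x0, y0), (x1, y1) = p, q
    xmin, ymin, xmax, ymax = box
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0  # parametric extent of the visible part
    for d, lo, hi, start in ((dx, xmin, xmax, x0), (dy, ymin, ymax, y0)):
        if d == 0:
            # Segment is parallel to this pair of box edges.
            if start < lo or start > hi:
                return None
            continue
        ta, tb = (lo - start) / d, (hi - start) / d
        if ta > tb:
            ta, tb = tb, ta
        t0, t1 = max(t0, ta), min(t1, tb)
        if t0 > t1:
            return None
    return (x0 + t0 * dx, y0 + t0 * dy), (x0 + t1 * dx, y0 + t1 * dy)


def coastline_in_box(coast, box):
    """Return the pieces of a coastline polyline that fall inside the box."""
    pieces = []
    for p, q in zip(coast, coast[1:]):
        piece = clip_segment(p, q, box)
        if piece is not None:
            pieces.append(piece)
    return pieces


# Invented coastline (x, y pairs) and range rectangle.
coast = [(-2.0, 0.0), (2.0, 0.0), (2.0, 2.0), (6.0, 2.0)]
box = (0.0, -1.0, 4.0, 3.0)
pieces = coastline_in_box(coast, box)
print(pieces)
```

Because the rectangle is stored separately from the coastline, the same `box` could be applied to a coarser or finer coastline without touching the range definition.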

The biggest question mark about this approach is how complicated the computation will need to be to extract the correct coastline from a complex polygon of an arbitrary shape (simple polygons would be pretty easy), but my presumption is not too complicated. If nothing else, this problem is not unique and has been solved in many other applications (e.g., masking or clipping figures using complex shapes in vector drawing programs such as Illustrator or Inkscape) so likely a workable solution already exists.
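One deliberately crude way to handle an arbitrary polygon is a point-in-polygon test (ray casting) applied to the midpoint of each coastline segment. The sketch below is an approximation under stated assumptions: it keeps or drops whole segments rather than cutting them exactly at the polygon edge, it treats coordinates as planar, and the L-shaped polygon and coastline are invented. With fine-scale coastline data the error at the polygon boundary should be small, but exact clipping would need true segment-polygon intersection.

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: True if pt lies strictly inside the polygon.

    poly is a list of (x, y) vertices; points exactly on the boundary
    are not handled consistently by this simple version.
    """
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge cross the horizontal ray going right from pt?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def segments_inside(coast, poly):
    """Keep coastline segments whose midpoint falls inside the polygon."""
    kept = []
    for p, q in zip(coast, coast[1:]):
        mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
        if point_in_polygon(mid, poly):
            kept.append((p, q))
    return kept


# Invented L-shaped polygon that avoids the upper-right corner.
poly = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0), (2.0, 4.0), (0.0, 4.0)]
coast = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.5), (0.5, 3.5)]
kept = segments_inside(coast, poly)
print(kept)
```

The L shape plays the same role as the more complicated polygon mentioned earlier: it lets the mask reach one stretch of coast while steering around another.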

While not solving every problem, using this simpler bounding concept with algorithmic coastline extraction seems like a much more flexible manner of storing the range data. Of course, maybe the way we are thinking about range maps is completely wrong to begin with. Stay tuned for final thoughts…

Constructing Fiddler Range Maps, part 2: The Problem of Ranges as Data

This is the second of four planned posts about how I constructed the range maps for fiddler crabs. The first part gave the history and background of how these maps were drawn in the first place. This part will discuss where the maps become problematic when we want to use them as input data for analysis. The third part will present a possible solution to the problem detailed in the second part. The fourth and final part will step back and ask if we’re actually thinking about range maps the wrong way entirely.

In the first post I discussed how the range maps were originally created and have evolved over time. As general tools for the display of information, they work perfectly fine (there are some limitations that will be raised in part 4).

But what if we want to go beyond the display itself and think about the ranges as input data for other analyses? What type of analysis? As an example, a few years ago, Jeff Levinton (my PhD advisor) published a study on the latitudinal diversity relationships of fiddler crabs. In general, fiddler crab diversity declines as latitude increases (as it does for many other groups of species), and this paper explored potential factors that explain this pattern. In this paper, the species ranges themselves made up a key piece of the raw data.

For the purposes of that sort of study, there’s nothing wrong with the ranges as data. You mostly only need to know the upper and lower latitudes between which a species is found, and while there may be some uncertainty on the precise boundaries (I’ll come back to this in part four), small errors are not likely to make a huge difference in the results. Other aspects of the ranges may become more problematic if looked at too closely, however.

For example, one of the results that can be found in the above paper is a slight northern bias to fiddler crab species diversity: globally, peak diversity is not found at the equator, but rather about 10° north (regionally, the northern bias is strongest in the Americas, but largely absent from the Indo-West Pacific). There are any number of reasons why there may be a slight northern bias, but one hypothesis that could be suggested is that there is more land in the northern hemisphere than the southern and diversity is partially tracking habitat availability. Since land area as a whole is fairly meaningless to fiddler crabs, this hypothesis can only make sense if the increased land mass in the northern hemisphere corresponds to an increased coastline. There are a number of reasons I suspect this hypothesis about fiddler diversity is likely incorrect (the simplest of which is that the greatest species diversity is often found in very small areas), but what if we wanted to test it? We would need a way of measuring the amount of coastline available. Because fiddlers only live on coastlines, this also leads to the idea of measuring the “size” of a fiddler crab range by the total length of coastline it inhabits. Unlike those of most other species, fiddler crab ranges can be thought of as one-dimensional lengths measured in km, rather than two-dimensional areas measured in km² (this 1D argument might fall apart for some of the species which range over large parts of the western Pacific islands, but that is a discussion for another time).

So, whether we are interested in the range of a species or the potential habitat it inhabits, we are looking at measuring the length of the coastline. How do we do this? The coastlines and species ranges on our maps are recorded as a series of connected coordinates, so it’s simple to imagine simply calculating the distances between connected pairs and adding them together. Voilà, species range and/or coastline length! Except, now we need to go back and look at our map data more closely.
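The naive calculation is short enough to write out. The sketch below assumes coordinates are (latitude, longitude) pairs in degrees and uses the haversine formula on a spherical Earth of radius 6371 km; both choices are assumptions for illustration, and the sample path is simply two one-degree steps along the equator.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical approximation


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def path_length_km(coords):
    """Sum the distances between consecutive coordinates of a polyline."""
    return sum(haversine_km(p, q) for p, q in zip(coords, coords[1:]))


# Two degrees of longitude along the equator, about 222.4 km.
equator = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
total = path_length_km(equator)
print(round(total, 1))
```

The arithmetic is the easy part; the result depends entirely on how many coordinates the map uses to describe the same stretch of coast.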

The coastlines and country boundaries in our new cartoon maps (see previous post) came from the Natural Earth data sets. These maps come in three different scales: 1:10 m, 1:50 m, and 1:110 m. Essentially, fine scale to rough scale. For simple display purposes, most of the fiddler crab ranges could use the medium or roughest scale; the finer scale maps are only really needed if we need to zoom in to fairly small regions. For example, Uca osa is a recently described species of fiddler crab known only from the Gulf of Dulce, in Costa Rica. Below is a zoomed-in look at the Gulf drawn from two of these data sets.

Maps of the Gulf of Dulce, Costa Rica. The top map is from the 1:50 m scale, the bottom map 1:10 m scale.

At this level, the two maps are strikingly different. Because the gulf is so small, it does not even show up in the 1:110 m map data (not shown)! But it is more than just a visual difference. The measurement of coastline will be different in each of these. The finer the scale of the map, the longer the coastline.

This issue has been known for a long time; in fact, coastline length is considered to be a fractal mathematics problem called the coastline paradox. Theoretically (if not practically), if you could keep measuring a coastline at greater and greater accuracy, its length would continue to increase…all the way to infinity. One of my favorite oddities of fractal mathematics is the proof that a finite area can contain an infinitely long line (e.g., see the Koch Snowflake). If we want to measure the length of the coastline, we need to be concerned with the scale at which we measure it.
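The effect of scale is easy to demonstrate with a Koch curve standing in for a coastline. In the sketch below (function names and the choice of five refinement steps are arbitrary), the same finely detailed curve is measured using every 256th, 64th, 16th, 4th, and finally every vertex, mimicking maps of increasingly fine scale. Each finer sampling multiplies the measured length by 4/3.

```python
import math


def koch(points):
    """One refinement step: replace each segment with four shorter ones."""
    sin60 = math.sqrt(3) / 2
    out = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = (x1 - x0) / 3, (y1 - y0) / 3
        a = (x0 + dx, y0 + dy)
        b = (x0 + 2 * dx, y0 + 2 * dy)
        # Peak of the bump: the middle third rotated by 60 degrees about a.
        peak = (a[0] + dx * 0.5 - dy * sin60, a[1] + dx * sin60 + dy * 0.5)
        out += [a, peak, b, (x1, y1)]
    return out


def length(points):
    """Total length of a polyline."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


# Build a detailed "coastline" between two fixed end points.
curve = [(0.0, 0.0), (1.0, 0.0)]
for _ in range(5):
    curve = koch(curve)

# Measure it at coarser and finer sampling, like maps of different scales.
for step in (256, 64, 16, 4, 1):
    print(step, round(length(curve[::step]), 3))
```

The end points never move, yet the measured length keeps growing as more detail is included, which is exactly the behavior seen when switching between map scales.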

Since our species ranges are also based on coastlines, they have the same issue. But here, a secondary problem arises. The ranges are currently based on yet a different map set (this one extracted from Google Earth). If we draw the coastline data for Uca osa on top of one of our Gulf maps…

we immediately find that it is at yet a different scale than any of our background maps.

This all just highlights how we need to be careful about thinking of these species ranges as data. They’re perfectly good for generally asking about where species are and questions of overlap, but if we want to translate these ranges into measures of distance or area, more thought is needed. Some of those further thoughts in part three…

Constructing Fiddler Range Maps, part 1: History

This is the first of four planned posts about how I constructed the range maps for fiddler crabs. This first part (likely the longest) will give the history and background of how these maps were drawn in the first place. The second part will discuss where the maps become problematic when we want to use them as input data for analysis. The third part will present a possible solution to the problem detailed in the second part. The fourth and final part will step back and ask if we’re actually thinking about range maps the wrong way entirely.

1. The Original Maps

In mid-2002 I decided to add maps depicting the distributions of each fiddler crab species to the website. I don’t recall why; possibly I just wanted updated information, possibly there was another motivation. This might be around when I expanded the site beyond its original collection of photos, videos, and references to be a bit more species-information-centric. The maps that ended up on the site were constructed in the roughest way possible.

The base data that went into each map started with the maps from Crane’s 1975 monograph.

Map 9 from Crane (1975) depicting the range of four fiddler crab taxa.

Using this as the starting range, I pored through the literature post-1975 (more or less) to look for publications that might have information altering the ranges presented by Crane. Once I’d established an updated range for a species, a background map of the coastline was drawn for the appropriate area using the first version of my spatial software PASSaGE. This map was imported into a graphics program (Photoshop? MS Paint?) and the range was hand drawn with a transparent brush and saved as a raster image (GIF of all things) for display on the site. Not precisely high-tech, but it got the job done.

Original hand drawn range map of Uca maracoani.

One obvious problem with this approach was updating a map often required redrawing it from scratch, particularly if the range needed to be reduced rather than expanded. Not the most efficient approach and not the best quality.

2. The Interactive Maps

At some point between 2011 and 2013, the entire website was rebuilt from the ground up; this is when the site was transitioned from hand-coded to dynamically created from a data back-end. How to handle the range maps was an interesting question, and after some exploration Google Maps seemed to be a good solution. By creating a custom KML layer for each species, I could use the Google Maps API to insert an interactive map into the webpages.

The challenge was how best to create the KML layers with the species ranges. The solution I came up with had a couple of different parts. One key element was to take advantage of the fact that (adult) fiddler crabs are essentially restricted to marine coastlines (they go a little bit inland into river mouths and estuaries and the like, but on the global scale this is still within the margin of error of the “coast”). Juveniles presumably get a bit more into the ocean, but even there the limited data suggests most of them seem to stick close to the shore.

Thus, a set of coastline data can provide the entire framework on which to build a fiddler crab range: basically, any given piece of coastline can either be in the range or out of the range, and you don’t have to particularly worry about any space in between coastlines (whether land or open ocean). To create these, I imported a full set of world country borders from within Google Earth (Google Earth made it easy to quickly display, check, and update distributions without fighting with the vagaries of the online Google Maps API). For each country whose coastline had fiddler crabs, I exported the individual country to a KML file, then manually removed the parts of the boundaries that represented interior land borders, leaving only coastlines. Countries with borders that included multiple major oceanic regions were split into their constituent parts (e.g., an Atlantic Panama coast and a Pacific Panama coast). All of these coast outlines were re-imported into Google Earth. For each species it was then just a matter of copying the appropriate country borders into a new folder to represent that species; when only part of a country was in the species range, a copy of that country’s coastline was again exported, manually trimmed to the correct range, then re-imported. While seemingly a lot of up front work, for most species it became a fairly quick and easy method to produce the ranges. Each species had its own folder within Google Earth and the whole set was exported as a single KML file. I then wrote a program to extract all of the individual maps from this file, standardizing certain style elements, and exporting each one to its own individual file for integrating with the Google Maps API on the website. This also allowed the creation of the map which overlaps all of the species ranges with high transparency to get a worldwide view of species density.
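The extraction step can be sketched with Python’s standard XML parser. This is not the actual program: it assumes one top-level folder per species, coastlines stored as LineString elements, and the standard KML 2.2 namespace, and the sample document and species names are placeholders. Nested folders would need extra handling.

```python
import xml.etree.ElementTree as ET

KML_NS = {"k": "http://www.opengis.net/kml/2.2"}

# Placeholder KML with one folder per species (coordinates are lon,lat,alt).
SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
  <Folder><name>Species one</name>
    <Placemark><LineString>
      <coordinates>-50.0,0.0,0 -48.0,-1.0,0</coordinates>
    </LineString></Placemark>
  </Folder>
  <Folder><name>Species two</name>
    <Placemark><LineString>
      <coordinates>-80.0,8.0,0 -79.0,9.0,0</coordinates>
    </LineString></Placemark>
    <Placemark><LineString>
      <coordinates>-78.0,7.5,0 -77.5,7.0,0</coordinates>
    </LineString></Placemark>
  </Folder>
</Document>
</kml>"""


def ranges_by_folder(kml_text):
    """Map each folder name to its list of (lon, lat) polylines."""
    root = ET.fromstring(kml_text)
    result = {}
    for folder in root.iterfind(".//k:Folder", KML_NS):
        name = folder.findtext("k:name", default="", namespaces=KML_NS).strip()
        lines = []
        for coords in folder.iterfind(".//k:LineString/k:coordinates", KML_NS):
            points = []
            for token in (coords.text or "").split():
                lon, lat = token.split(",")[:2]  # ignore altitude
                points.append((float(lon), float(lat)))
            lines.append(points)
        result[name] = lines
    return result


ranges = ranges_by_folder(SAMPLE)
for species, lines in ranges.items():
    print(species, len(lines), "coastline piece(s)")
```

From a dictionary like this, writing one styled KML file per species is a matter of looping over the entries.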

Uca maracoani range displayed on Google Map embedded in website.

Updating or editing the maps, while still a bit of work, was substantially easier than with the original ones since it only required changing a subset of the coastline data, rather than redrawing an entire map by hand.

Interestingly, a few people complained about the new maps. As rough as they were, the old ones displayed the ranges in a fairly simple cartoon form which is lost with the more complicated Google Map backdrop; also, the new images are not easily exportable for use in another format (beyond doing a screen grab, as I had to do to display the above figure).

3. The New Static Maps

Recently, as part of some potential and planned updates to the site, I came to the realization that the inability to automatically export the maps to an image for use outside a webpage had become somewhat problematic. After a lot of time spent trying (and failing) to come up with ways to automatically export the layered Google Map, it finally occurred to me that the obvious solution was to back away from the Google Map approach entirely for non-web-based use. In fact, part of the solution was to go back to the beginning. As mentioned above, I’ve written code in the past that can draw a background map given a set of lines or polygons describing coastal outlines (data readily available from a number of online sources). And although it was never part of the original plan, the KML file that is being used to add the Google Map layers has all of the boundary data for the ranges in an already parsable format. Combining these into the site creation code with a drawing module that could export directly to a file suddenly allows us to automatically draw higher quality cartoon maps, much like the site originally had, but automated from the code rather than hand painted. (1) Use the world map data to draw a nice representation of key coastlines. (2) Draw the range data from the KML file on top of this in a different color and with a slightly thicker line to help make it stand out. (3) Export as a vector file format (SVG) rather than raster file to allow scalability for high quality figures. I still use the Google Maps and KML layers on the website for interactivity, but these new maps can be seen and downloaded by clicking on the link directly below the corresponding Google map (the link can be seen in the previous figure).
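The three steps can be sketched in a few lines without any drawing library, since SVG is just text. The example below is an assumption-laden illustration: it uses a plain equirectangular mapping of longitude and latitude to pixels, arbitrary colors and line widths, and invented coordinates, and the function names are hypothetical rather than taken from the site code.

```python
def polyline(points, bounds, width, height, stroke, stroke_width):
    """Return an SVG polyline element for a list of (lon, lat) points."""
    min_lon, min_lat, max_lon, max_lat = bounds
    sx = width / (max_lon - min_lon)
    sy = height / (max_lat - min_lat)
    # SVG's y axis points down, so latitude is flipped.
    pts = " ".join(f"{(lon - min_lon) * sx:.1f},{(max_lat - lat) * sy:.1f}"
                   for lon, lat in points)
    return (f'<polyline points="{pts}" fill="none" '
            f'stroke="{stroke}" stroke-width="{stroke_width}"/>')


def range_map_svg(coastlines, ranges, bounds, width=600, height=300):
    """Draw coastlines, then overlay range lines thicker and in color."""
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
             f'height="{height}" viewBox="0 0 {width} {height}">']
    for line in coastlines:   # step 1: background coastline
        parts.append(polyline(line, bounds, width, height, "black", 1))
    for line in ranges:       # step 2: range on top, thicker and red
        parts.append(polyline(line, bounds, width, height, "red", 3))
    parts.append("</svg>")
    return "\n".join(parts)   # step 3: vector output as SVG text


# Invented coastline and range, drawn on a 200 x 200 canvas.
bounds = (-60.0, -10.0, -40.0, 10.0)  # min_lon, min_lat, max_lon, max_lat
coast = [(-60.0, 5.0), (-50.0, 0.0), (-40.0, -5.0)]
species_range = [(-55.0, 2.5), (-50.0, 0.0)]
svg = range_map_svg([coast], [species_range], bounds, width=200, height=200)
print(svg)
```

Writing the returned string to a file ending in `.svg` gives an image that scales cleanly, and changing colors or line widths only means changing the arguments.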

New range map for Uca maracoani.

On these new maps I decided to use a background map with country boundaries to help make the ranges more obvious, and filled in the land areas to make the maps clearer for species with more limited ranges, but unlike with the original maps, these decisions can be quickly changed and the maps redrawn with minimal effort since they are created programmatically. The range data itself is identical to that from the Google Maps.

For display purposes outside of webpages, these newer maps are quite nice. But all is not perfect if we actually start thinking about using these ranges as data rather than just visual guides. To be continued in part two…