Thursday, February 6, 2014

Scatter for Terrain Database Generation

Part 2 - Storing Scatter Points in Shapefiles

In part 1 of this series, I described two general storage methods that could be used for scatter points.  These two methods apply equally to shapefiles, geodatabases and SQL databases:
  1. All scatter points stored in a single container and attributed with a type
  2. Scatter points stored within individual containers according to their types
The big difference between these two methods is how many objects you decide to store in a single container.  Regardless of which method we choose, we need to choose a container.  In addition, we would like this container to be compatible with popular GIS tools.  For this tutorial, I will store the points in shapefiles.

Storing data in a shapefile usually consists of segregating geometry and attributes.  Geometry is stored in the .shp and .shx files and attributes are stored in the .dbf file.  Let's try this out to see how it performs.  In this case, we will create one million point features that each have an integer attribute value:

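#include <shapefil.h>

/* "scatter" is a placeholder base name; field 0 holds the integer feature type */
SHPHandle shp_handle = SHPCreate("scatter", SHPT_POINT);
DBFHandle dbf_handle = DBFCreate("scatter");
DBFAddField(dbf_handle, "type", FTInteger, 2, 0);

for (int j = 0; j < 1000000; ++j)
{
    double x = (j / 360) % 360;  /* spread the points over a 360 x 360 grid */
    double y = j % 360;
    int type = j % 10;           /* feature type, 0-9 */
    SHPObject *shape = SHPCreateObject(SHPT_POINT, -1, 0, NULL, NULL, 1, &x, &y, NULL, NULL);
    SHPWriteObject(shp_handle, -1, shape);
    DBFWriteIntegerAttribute(dbf_handle, j, 0, type);
    SHPDestroyObject(shape);
}
SHPClose(shp_handle);
DBFClose(dbf_handle);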

On my 2.5 GHz MacBook Pro (with several standard desktop programs running), this operation took 19 seconds.  I stress that I am using a single integer attribute to persist the feature type.  Using attributes to categorize a feature's characteristics is a traditionally accepted practice in GIS communities.  Let's think outside the box for a minute and consider what would happen if we did not encode this data in an attribute table.  What other options do we have?  Suppose we encoded this data using a measure, aka m-value.  This would completely eliminate the attribute table overhead while only mildly impacting our shapefile write performance.  In the snippet below, notice that the DBF write call has been eliminated:

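/* shp_handle is assumed to come from SHPCreate(..., SHPT_POINTM); no .dbf is written */
for (int j = 0; j < 1000000; ++j)
{
    double x = (j / 360) % 360;  /* spread the points over a 360 x 360 grid */
    double y = j % 360;
    double z = 0.0;              /* placeholder z-value */
    double m = j % 10;           /* feature type encoded as the measure */
    SHPObject *shape = SHPCreateObject(SHPT_POINTM, -1, 0, NULL, NULL, 1, &x, &y, &z, &m);
    SHPWriteObject(shp_handle, -1, shape);
    SHPDestroyObject(shape);
}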


This version took 7 seconds.  When you're talking about only a million points, 7 seconds vs 19 seconds is a huge deal.  This benchmark can easily be replicated for 100 million+ features (you just need to wait a while).
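To put rough numbers on the storage side of this trade-off, here is a back-of-the-envelope sketch.  The record sizes are the ones given by the shapefile and dBASE formats (8-byte record header, 8-byte index entry, 1-byte deletion flag per .dbf row); the function names and the field widths used below are made up for illustration, and fixed file headers are ignored.

```c
#include <assert.h>

enum {
    SHP_RECORD_HEADER = 8,   /* record number + content length */
    SHX_RECORD        = 8,   /* offset + content length in the index file */
    POINT_CONTENT     = 20,  /* shape type + x + y */
    POINTM_CONTENT    = 28,  /* shape type + x + y + m */
    DBF_DELETE_FLAG   = 1    /* leading flag byte of every .dbf record */
};

/* Bytes written per point across .shp, .shx and .dbf when the type is
   kept in a single attribute field of the given width. */
long bytes_per_point_with_attribute(int field_width)
{
    assert(field_width > 0);
    return SHP_RECORD_HEADER + POINT_CONTENT + SHX_RECORD
         + DBF_DELETE_FLAG + field_width;
}

/* Bytes written per point across .shp and .shx when the type is kept
   in the m-value and no attribute record is written. */
long bytes_per_point_with_measure(void)
{
    return SHP_RECORD_HEADER + POINTM_CONTENT + SHX_RECORD;
}
```

With a 2-character field the attribute approach comes to 39 bytes per point and with an 11-character field to 48, against a fixed 44 for the measure.  So the measure is not automatically the smaller file; the benchmark above compares write time, not size.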

What have we sacrificed to achieve this gain?  We have sacrificed the comfort of being able to use an attribute table to manipulate a feature's type ("Oh no!" screams the GIS analyst).  In the case of scattering features for terrain database generation, this is not a huge deal.  The only issue is figuring out how to encode feature types into m-values.  This is where a simple configuration file would come into play.  This config file would define the key-value relationships needed to derive the feature type.

[trees]
red oak = 0
sugar maple = 1

[other]
... = 10
... = 11
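Once loaded, a mapping like the [trees] section above could live in a small table.  The sketch below hard-codes the two entries rather than parsing the file, and every name in it (ScatterType, scatter_m_from_name, scatter_name_from_m) is hypothetical; it only shows the two lookups a reader and a writer would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the configuration: a type name and the m-value that encodes it. */
typedef struct {
    const char *name;
    double      m_value;
} ScatterType;

/* Hypothetical in-memory copy of the [trees] section. */
static const ScatterType scatter_types[] = {
    { "red oak",     0.0 },
    { "sugar maple", 1.0 },
};
static const size_t scatter_type_count =
    sizeof scatter_types / sizeof scatter_types[0];

/* Returns the m-value for a type name, or -1.0 if the name is unknown. */
double scatter_m_from_name(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < scatter_type_count; ++i) {
        if (strcmp(scatter_types[i].name, name) == 0)
            return scatter_types[i].m_value;
    }
    return -1.0;
}

/* Returns the type name for an m-value, or NULL if no entry matches. */
const char *scatter_name_from_m(double m)
{
    for (size_t i = 0; i < scatter_type_count; ++i) {
        if (scatter_types[i].m_value == m)
            return scatter_types[i].name;
    }
    return NULL;
}
```

A writer would call scatter_m_from_name("sugar maple") to get the value to store, and a reader would call scatter_name_from_m on each m-value it reads back.  A linear scan is enough here because the table is expected to stay small.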

The values could also be auto-generated from the type names if a static definition was not required.  From a technical standpoint, this is very efficient because the lookup table will typically be small.  The structure can be represented using a map in C++ or a dictionary in Python.  This also raises a rather interesting (and perhaps new) concept for the M&S industry, which is the idea of real-number mappings.  The suggestion is to use real number ranges to represent the set of all possible enumeration values for a particular type of feature.  Meaning, we could map all of our point trees to values within the range [0,1) and buildings within the range [1,2).
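One way the range idea could work is sketched below.  The scheme is my own assumption, not an established convention: category k owns the range [k, k+1), and subtype i out of n is placed at k + i/n.  The names encode_m, decode_category and decode_subtype are hypothetical.

```c
#include <assert.h>

/* Hypothetical category numbering: each category k owns the range [k, k+1). */
enum { CATEGORY_TREE = 0, CATEGORY_BUILDING = 1 };

/* Packs a category and a subtype index into a single real number. */
double encode_m(int category, int subtype, int subtype_count)
{
    assert(category >= 0 && subtype_count > 0);
    assert(subtype >= 0 && subtype < subtype_count);
    return (double)category + (double)subtype / (double)subtype_count;
}

/* The whole part of a non-negative m-value is the category. */
int decode_category(double m)
{
    assert(m >= 0.0);
    return (int)m;
}

/* The fractional part, scaled back up and rounded, is the subtype index. */
int decode_subtype(double m, int subtype_count)
{
    assert(m >= 0.0 && subtype_count > 0);
    double fraction = m - (double)(int)m;
    return (int)(fraction * subtype_count + 0.5);
}
```

For example, encode_m(CATEGORY_BUILDING, 3, 8) gives 1.375, which decodes back to category 1 and subtype 3.  The decoder has to know the subtype count for each category, which is one more thing the config file would need to carry.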

At this point it looks like we have everything we need: where to scatter the feature, the type of feature to scatter and how to orient it.  Oh wait... how to orient it...?  We completely missed that!  Perhaps this means that we will now be forced into using an attribute and have to pay the attribute table penalties.  Well, not quite.  Review the second code snippet again and observe the z-value.  Scatter features typically do not use z-values.  This is a mighty convenient place to store orientation.  One caveat: a SHPT_POINTM record stores only x, y and m, so the shape type needs to be SHPT_POINTZ (which stores x, y, z and m) for the z-value to actually reach the file.  That costs eight more bytes per point, but the write loop is otherwise unchanged, which means we have very nearly already paid the price.
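A minimal sketch of what one such point might look like in memory follows.  The ScatterPoint struct and both function names are hypothetical, and storing the heading in degrees wrapped to [0, 360) is just one reasonable convention, not something the shapefile format prescribes.

```c
#include <assert.h>

/* The four doubles written per point: position, heading in z, type in m. */
typedef struct {
    double x, y;
    double z;   /* heading in degrees, wrapped to [0, 360) */
    double m;   /* encoded feature type */
} ScatterPoint;

/* Wraps a heading in degrees into [0, 360). */
double normalize_heading(double degrees)
{
    double turns = (double)(long)(degrees / 360.0);  /* whole turns, truncated */
    degrees -= turns * 360.0;
    if (degrees < 0.0)
        degrees += 360.0;
    if (degrees >= 360.0)   /* guards against rounding up to exactly 360 */
        degrees = 0.0;
    return degrees;
}

/* Builds a point whose z carries the heading and whose m carries the type. */
ScatterPoint make_scatter_point(double x, double y,
                                double heading_deg, double type_m)
{
    assert(type_m >= 0.0);
    ScatterPoint p = { x, y, normalize_heading(heading_deg), type_m };
    return p;
}
```

In the write loop, &p.x, &p.y, &p.z and &p.m would then be passed to SHPCreateObject in place of the separate x, y, z and m variables, with the shape type set to SHPT_POINTZ.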

I understand that this approach is not for everyone.  In fact, from a pure GIS standpoint, this practice would be frowned upon because the data cannot be trivially filtered and styled.  However, from a development standpoint, this method can be used to store a large quantity of simple points that have a type and orientation.  It is a convenient and clever method that utilizes a standard, portable container.
