Selecting & Filtering by Attributes; More Plotting

This time, we'll work a little more on manipulating and exploring the attributes. This is akin to working with the attribute table in desktop GIS software.

Start by importing pandas, geopandas, matplotlib.pyplot, and os:

>>> import pandas as pd
>>> import geopandas as gpd
>>> import matplotlib.pyplot as plt
>>> import os
In [1]:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import os
Important! Reset your working directory to your home folder, and only run this once!
>>> os.chdir('../')
>>> os.getcwd()
In [2]:
os.chdir('../')
os.getcwd()
Out[2]:
'/home/jupyter-phwh9568'

Import the OSMP_Trails shapefile as trails:

>>> trails = gpd.read_file('workshopdata/OSMP_Trails.shp')
In [3]:
trails = gpd.read_file('workshopata/OSMP_Trails.shp')

Use .head() to view only the first few rows of the dataframe:

>>> trails.head()
In [4]:
trails.head()
Out[4]:
OBJECTID OWNER BICYCLES DISPLAY TRAILTYPE DATEFROM DATETO SEGMENTID HORSES RID TRLID MILEAGE MEASUREDFE TRAILNAME SHAPELEN GLOBALID geometry
0 2568 OSMP No Yes Hiking Trail 2005-12-31T00:00:00.000Z 2099-12-31T00:00:00.000Z 296-357-356 No 1000 296 0.106 -9999.0 Boulder Falls 562.147648 {118369E0-68A9-4089-8B04-295FBCA62CBB} LINESTRING (-105.40585 40.00468, -105.40606 40...
1 2569 OSMP Yes Yes Multi-Use Trail 2016-10-26T00:00:00.000Z 2099-12-31T00:00:00.000Z 116-424-421 Yes 1135 116 0.876 -9999.0 Flatirons Vista North 4625.547170 {FCFF5BB0-A953-4151-AA12-FB5EFD7B7C22} LINESTRING (-105.23878 39.92365, -105.23881 39...
2 2570 OSMP Yes Yes Multi-Use Trail 2016-10-25T00:00:00.000Z 2099-12-31T00:00:00.000Z 116-430-428 Yes 1140 116 0.019 -9999.0 Flatirons Vista North 102.623588 {A2917E15-8708-416C-8694-8A3E0871855B} LINESTRING (-105.23569 39.92407, -105.23570 39...
3 2571 OSMP Yes Yes Multi-Use Trail 2016-10-25T00:00:00.000Z 2099-12-31T00:00:00.000Z 116-425-424 Yes 1544 116 0.178 -9999.0 Flatirons Vista North 939.742819 {64EA49F9-8293-4859-AD3E-05244C78CAEB} LINESTRING (-105.23655 39.92554, -105.23660 39...
4 2572 OSMP Yes Yes Multi-Use Trail 2016-10-25T00:00:00.000Z 2099-12-31T00:00:00.000Z 117-429-428 Yes 1141 117 0.004 -9999.0 Prairie Vista 22.315176 {BE8E1241-78E1-4F43-9EA0-968591282E6D} LINESTRING (-105.23555 39.92430, -105.23556 39...

You can also view all column headers:

>>> trails.columns
In [5]:
trails.columns
Out[5]:
Index(['OBJECTID', 'OWNER', 'BICYCLES', 'DISPLAY', 'TRAILTYPE', 'DATEFROM',
       'DATETO', 'SEGMENTID', 'HORSES', 'RID', 'TRLID', 'MILEAGE',
       'MEASUREDFE', 'TRAILNAME', 'SHAPELEN', 'GLOBALID', 'geometry'],
      dtype='object')

You can also call up just one column:

>>> trails.TRAILTYPE
In [6]:
trails.TRAILTYPE
Out[6]:
0         Hiking Trail
1      Multi-Use Trail
2      Multi-Use Trail
3      Multi-Use Trail
4      Multi-Use Trail
            ...       
574       Hiking Trail
575       Hiking Trail
576       Hiking Trail
577       Hiking Trail
578       Hiking Trail
Name: TRAILTYPE, Length: 579, dtype: object

Now take a look at the trail data by plotting it:

>>> trails.plot()
In [7]:
trails.plot()
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f54e4c3f278>

You can also filter by attributes and create a new variable from the result:

>>> bikeTrails = trails.loc[trails['BICYCLES']=='Yes']
>>> bikeTrails
In [8]:
bikeTrails = trails.loc[trails['BICYCLES']=='Yes']
bikeTrails
Out[8]:
OBJECTID OWNER BICYCLES DISPLAY TRAILTYPE DATEFROM DATETO SEGMENTID HORSES RID TRLID MILEAGE MEASUREDFE TRAILNAME SHAPELEN GLOBALID geometry
1 2569 OSMP Yes Yes Multi-Use Trail 2016-10-26T00:00:00.000Z 2099-12-31T00:00:00.000Z 116-424-421 Yes 1135 116 0.876 -9999.0 Flatirons Vista North 4625.547170 {FCFF5BB0-A953-4151-AA12-FB5EFD7B7C22} LINESTRING (-105.23878 39.92365, -105.23881 39...
2 2570 OSMP Yes Yes Multi-Use Trail 2016-10-25T00:00:00.000Z 2099-12-31T00:00:00.000Z 116-430-428 Yes 1140 116 0.019 -9999.0 Flatirons Vista North 102.623588 {A2917E15-8708-416C-8694-8A3E0871855B} LINESTRING (-105.23569 39.92407, -105.23570 39...
3 2571 OSMP Yes Yes Multi-Use Trail 2016-10-25T00:00:00.000Z 2099-12-31T00:00:00.000Z 116-425-424 Yes 1544 116 0.178 -9999.0 Flatirons Vista North 939.742819 {64EA49F9-8293-4859-AD3E-05244C78CAEB} LINESTRING (-105.23655 39.92554, -105.23660 39...
4 2572 OSMP Yes Yes Multi-Use Trail 2016-10-25T00:00:00.000Z 2099-12-31T00:00:00.000Z 117-429-428 Yes 1141 117 0.004 -9999.0 Prairie Vista 22.315176 {BE8E1241-78E1-4F43-9EA0-968591282E6D} LINESTRING (-105.23555 39.92430, -105.23556 39...
5 2573 OSMP Yes Yes Multi-Use Trail 2016-10-25T00:00:00.000Z 2099-12-31T00:00:00.000Z 117-427-423 Yes 1545 117 0.199 -9999.0 Prairie Vista 1049.725608 {74B9B1A6-1368-4BA2-9A6A-9106F7C56135} LINESTRING (-105.23563 39.92439, -105.23563 39...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
443 3370 OSMP Yes Yes Multi-Use Trail 2015-04-09T00:00:00.000Z 2099-12-31T00:00:00.000Z 403-622-674 Yes 1592 403 0.032 138.0 Foothills South 166.390898 {2004BF80-764B-4277-B81A-D50A331C634A} LINESTRING (-105.28883 40.06240, -105.28883 40...
459 3387 OSMP Yes Yes Multi-Use Trail 2019-06-20T00:00:00.000Z 2099-12-31T00:00:00.000Z 403-618-621 Yes 1694 403 0.219 NaN Foothills South 1157.710963 {9C7A1C42-0A4B-4F9E-84F7-166C8FE38FD5} LINESTRING (-105.29140 40.06012, -105.29138 40...
460 3388 OSMP Yes Yes Multi-Use Trail 2019-06-20T00:00:00.000Z 2099-12-31T00:00:00.000Z 403-617-618 Yes 1695 403 0.072 NaN Foothills South 378.164900 {457AD6EC-A830-40C5-89FF-1C531C78C487} LINESTRING (-105.29144 40.05908, -105.29144 40...
461 3389 OSMP Yes Yes Multi-Use Trail 2019-06-20T00:00:00.000Z 2099-12-31T00:00:00.000Z 403-615-617 Yes 1696 403 0.452 NaN Foothills South 2388.989417 {36D74C83-7AB4-4328-ACF6-1BE2D9E0B6C9} LINESTRING (-105.29247 40.05301, -105.29247 40...
481 3409 OSMP Yes Yes Multi-Use Trail 2019-07-10T00:00:00.000Z 2099-12-31T00:00:00.000Z 112-434-433 Yes 1708 112 2.667 NaN High Plains 14083.727879 {E6F13C70-97C3-44A2-BB61-784FC0FE9628} LINESTRING (-105.20617 39.91431, -105.20617 39...

130 rows × 17 columns

Nice. You have a new geodataframe that is just bike trails. Plot them:

>>> bikeTrails.plot()
In [9]:
bikeTrails.plot()
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f54e4bd7ac8>

Now, do the same for hiking trails, then count the results:

>>> hikeTrails = trails.loc[trails['TRAILTYPE']=='Hiking Trail']
>>> len(hikeTrails)
In [10]:
hikeTrails = trails.loc[trails['TRAILTYPE']=='Hiking Trail']
len(hikeTrails)
Out[10]:
439

Do the same for horse trails:

>>> horseTrails = trails.loc[trails['HORSES']=='Yes']
>>> len(horseTrails)
In [11]:
horseTrails = trails.loc[trails['HORSES']=='Yes']
len(horseTrails)
Out[11]:
467

Now, select all trails whose mileage is greater than 2 miles:

>>> trails.loc[trails['MILEAGE']>=2]
In [12]:
trails.loc[trails['MILEAGE']>=2]
Out[12]:
OBJECTID OWNER BICYCLES DISPLAY TRAILTYPE DATEFROM DATETO SEGMENTID HORSES RID TRLID MILEAGE MEASUREDFE TRAILNAME SHAPELEN GLOBALID geometry
49 2922 OSMP Yes Yes Multi-Use Trail 2005-12-31T00:00:00.000Z 2099-12-31T00:00:00.000Z 408-652-675 Yes 1009 408 2.779 14775.0 Left Hand 14671.966605 {24656ACD-CE01-4D16-A322-DB00832F328C} LINESTRING (-105.26275 40.08011, -105.26276 40...
117 2990 OSMP Yes Yes Multi-Use Trail 2016-02-01T00:00:00.000Z 2099-12-31T00:00:00.000Z 501-537-555 Yes 1050 501 2.227 11862.0 East Boulder - White Rocks 11758.165168 {5D08B3D6-5CA7-4B84-A015-F5918D3D8732} LINESTRING (-105.14279 40.04001, -105.14333 40...
481 3409 OSMP Yes Yes Multi-Use Trail 2019-07-10T00:00:00.000Z 2099-12-31T00:00:00.000Z 112-434-433 Yes 1708 112 2.667 NaN High Plains 14083.727879 {E6F13C70-97C3-44A2-BB61-784FC0FE9628} LINESTRING (-105.20617 39.91431, -105.20617 39...

Interesting. Only 3 trails are 2 miles or greater in length?

Not exactly. These are trail segments. Note in the original geodataframe that there are multiple rows with the same trail name.

Let's all the unique trail names:

>>> trails.TRAILNAME.unique()
In [13]:
trails.TRAILNAME.unique()
Out[13]:
array(['Boulder Falls', 'Flatirons Vista North', 'Prairie Vista',
       'Flatirons Vista South', 'Doudy Draw', 'Mount Sanitas', 'Anemone',
       'IBM Connector', 'Cottontail', 'Eldorado Canyon',
       'Green Mountain West Ridge', 'Bear Peak', '7th Street Connector',
       '8th Street Connector', '9th Street Connector', 'Baseline',
       '6th Street Connector West', 'Baseline Connector',
       '6th Street Connector', 'Joder Ranch', 'Buckingham Park',
       'Left Hand', 'Sage', 'North Rim', 'Eagle', 'Mesa Reservoir',
       'Degge', 'Cobalt', 'Hidden Valley', 'Foothills North',
       'Hogback Ridge', 'Foothills South', 'Foothills Spur', 'Old Kiln',
       'Old Kiln Spur', 'Wonderland Lake',
       'Foothills - Wonderland Lake spur', 'Wonderland Hill',
       'East Boulder - Gunbarrel', 'East Boulder - White Rocks',
       'East Boulder - Teller Farm', 'East Boulder - Teller Lake #5',
       'East Boulder - Teller Farm Spur', 'East Boulder - Teller Spur',
       'Sawhill Ponds', 'Cottonwood', 'Sombrero Marsh', 'Dry Creek',
       'Centennial - Equestrian', 'South Boulder Creek', 'Cherryvale',
       'Mayhoffer-Singletree', 'Cowdrey Draw', 'Community Ditch',
       'Marshall Mesa', 'Marshall Valley', 'Coal Seam',
       'Community Ditch Hwy 93 Spur', 'Greenbelt Connector', 'Coalton',
       'Greenbelt Plateau', 'Marshall Valley - Accessible',
       'Doudy Draw Picnic Area', 'Spring Brook Loop North',
       'Spring Brook Loop South', 'Fowler', 'Goshawk Ridge',
       'Doudy Draw Connector', 'Northern Quarry', "Lion's Lair",
       "Lion's Lair Spur", 'Sanitas Valley', 'East Ridge', 'Dakota Ridge',
       'Goat', 'Sanitas Spur', 'Sunshine Canyon', 'Sanitas Connector',
       'Red Rocks', 'Red Rocks Spur', 'Viewpoint', 'Flagstaff',
       'Halfway House', 'Panorama', 'Boy Scout', 'Chapman Drive',
       'Plains Overlook', 'Sensory', 'Tenderfoot', 'Ute', 'Range View',
       'Crown Rock', 'Crown Rock Picnic Loop', 'Sacred Cliffs',
       'Gregory Canyon Spur', 'Green Mountain Summit', 'Bear Canyon',
       'Bear Peak West Ridge', 'Fern Canyon', 'Green Bear',
       'Gregory Canyon', 'Long Canyon', 'Lost Gulch', 'Ranger',
       'Shadow Canyon', 'South Boulder Peak', 'Saddle Rock',
       'E.M. Greenman', 'South Mesa Connector', 'Shadow Canyon North',
       'Shanahan - Mesa', 'Mallory Cave', 'Royal Arch', 'Enchanted Mesa',
       'Enchanted-Kohler Spur', 'Enchanted - McClintock Spur',
       'Four Pines', 'McClintock Lower', 'McClintock Upper', 'Mesa',
       "Wood's Quarry", 'Amphitheater Express', 'Flatirons Loop',
       '1st/2nd Flatiron', 'Amphitheater', 'Baseline-Gregory Connector',
       'Bluebell Mesa', 'Bluebell Road', 'Bluebell Spur',
       'Bluebell-Baird', 'Meadow', 'Ski Jump', 'KOA Lake', 'Artist Point',
       "May's Point", 'NCAR', 'NIST Service Rd Connector', 'Chautauqua',
       'Shadow Canyon South', 'Dunn House Loop', 'NCAR - Bear Connector',
       'High Plains', 'NCAR - Skunk Canyon', 'Kohler Mesa', 'Holly Berry',
       'Skunk Connector', 'Skunk Canyon', 'NIST Service Rd',
       'Kohler Spur', 'Skunk Canyon Spur', 'NCAR - Bear Canyon',
       'Devils Thumb Access', 'Cragmoor Connector', 'South Fork Shanahan',
       'Shanahan Connector', 'North Fork Shanahan',
       'Hardscrabble Connector', 'NCAR - Table Mesa',
       'NCAR - Table Mesa Bear', 'Lehigh Connector - South',
       'Fern Meadow - Cragmoor', 'Greenbriar Connector',
       'Lower Big Bluestem', 'Upper Big Bluestem', 'Bluestem Connector',
       'Shadow Canyon South Spur', 'Homestead', 'Towhee',
       'Mesa Connector', 'South Boulder Creek West',
       'Lehigh Connector - North', 'Big Bluestem West', 'South Mesa Spur',
       'NCAR Water Tank Rd'], dtype=object)

Find out how many unique trail names there are using the same len()

>>> len(trails.TRAILNAME.unique())
In [14]:
len(trails.TRAILNAME.unique())
Out[14]:
174

Now, let's take each row, group them by trail name, then sum up the total mileage of each trail by trail name:

Sounds hard, but not really!

>>> trailLengths = trails.groupby('TRAILNAME')['MILEAGE'].sum()
>>> trailLengths
In [16]:
trailLengths = trails.groupby('TRAILNAME')['MILEAGE'].sum()
trailLengths
Out[16]:
TRAILNAME
1st/2nd Flatiron             0.852
6th Street Connector         0.283
6th Street Connector West    0.033
7th Street Connector         0.017
8th Street Connector         0.018
                             ...  
Ute                          0.776
Viewpoint                    1.112
Wonderland Hill              0.983
Wonderland Lake              1.083
Wood's Quarry                0.399
Name: MILEAGE, Length: 174, dtype: float64

Resetting the index will bring it into a Pandas dataframe:

>>> trailLengths = trailLengths.reset_index()
>>> trailLengths
In [17]:
trailLengths = trailLengths.reset_index()
trailLengths
Out[17]:
TRAILNAME MILEAGE
0 1st/2nd Flatiron 0.852
1 6th Street Connector 0.283
2 6th Street Connector West 0.033
3 7th Street Connector 0.017
4 8th Street Connector 0.018
... ... ...
169 Ute 0.776
170 Viewpoint 1.112
171 Wonderland Hill 0.983
172 Wonderland Lake 1.083
173 Wood's Quarry 0.399

174 rows × 2 columns

Now you can export your dataframe to a csv:

>>> trailLengths.to_csv('trailLengths.csv')
In [18]:
trailLengths.to_csv('trailLengths.csv')

You should find it in your home folder.

Now let's do a little more plotting:

Import OSMP.shp and Boulder_Co.shp:

>>> osmp = gpd.read_file('workshopdata/OSMP.shp')
>>> county = gpd.read_file('workshopdata/Boulder_Co.shp')
In [19]:
osmp = gpd.read_file('workshopdata/OSMP.shp')
county = gpd.read_file('workshopdata/Boulder_Co.shp')

Reset the county layer's crs to match the osmp layer:

>>> county = county.to_crs(osmp.crs)
In [20]:
county = county.to_crs(osmp.crs)

To plot more than two layers together, you must use 'fig' and a "blank" axis like so:

>>> fig, ax = plt.subplots(1, 1, figsize = (15,15))
>>> county.plot(ax=ax, color = '#F0E68C', edgecolor='black')
>>> osmp.plot(ax=ax, color = 'green', alpha=0.5)
>>> hikeTrails.plot(ax=ax, color='red')
>>> bikeTrails.plot(ax=ax, color='yellow')
In [25]:
fig, ax = plt.subplots(1, 1, figsize = (15,15))
county.plot(ax=ax, color = '#F0E68C', edgecolor='black')
osmp.plot(ax=ax, color='green', alpha=0.5)
hikeTrails.plot(ax=ax, color='red')
bikeTrails.plot(ax=ax, color='yellow')
Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f54e44720b8>

Nice job!