Okay--I admit this is slightly complicated, but doing GIS in the wild can be that way sometimes!
There's a bunch of ways you may want to clip something, so consider this a primer on what it really takes to clip a raster.
Essentially, you need:
Let's do it!
import rasterio as rio
from rasterio.plot import show
from rasterio.mask import mask #this is the masking aka clip module
import matplotlib.pyplot as plt
import geopandas as gpd #to read the input shapefile
import shapely #to transform extent into coordinate geometries
import json
import os
Important: only run this once or you'll wind up in the wrong directory.
os.chdir('../')
os.getcwd()
os.chdir('../')
First, let's read in the image:
src = rio.open('workshopdata/NAIP_Boulder.tif')
src = rio.open('workshopdata/NAIP_Boulder.tif')
Let's take a peek:
show(src.read(1))
show(src.read(1))
<AxesSubplot: >
Big image! In fact, too big for us to all run stuff on it together... so let's clip so that we're only working with the campus area...
Read in the campus shapefile:
shape = gpd.read_file('workshopdata/UCB_MainCampus_Boundaries.shp')
shape = gpd.read_file('workshopdata/UCB_MainCampus_Boundaries.shp')
Let's take a look:
shape.plot(figsize = (12,12))
shape.plot(figsize = (12,12,))
<AxesSubplot: >
Okay, but if we're going to clip with this, it needs to be in the same CRS as the image, this is easy to change using geopandas:
src.crs
src.crs
CRS.from_epsg(26913)
Change it to EPSG 26913
shape = shape.to_crs(26913)
shape = shape.to_crs(26913)
Now let's get the extent of this and use it as our clipping geometry...
First, we'll use geopandas to get the extent:
geoseries.total_bounds Returns a tuple containing minx, miny, maxx, maxy values for the bounds of the series as a whole.
shape.total_bounds
shape.total_bounds
array([ 476016.29658655, 4427821.07743908, 477920.19326694, 4430124.84186719])
Does this seem right?
src.bounds
src.bounds
BoundingBox(left=473124.0, bottom=4427586.0, right=478878.0, top=4434942.0)
Save it as a variable:
extent = shape.total_bounds
extent = shape.total_bounds
But what we have here are really xmin, ymin, xmax, ymax...
What we need are coordinates of the four corners of this box...
So we'll use the Shapely library to convert these values to a box with actual point coordinates in the corners.
bbox = shapely.geometry.box(*extent)
bbox = shapely.geometry.box(*extent)
Look right?
print(bbox)
print(bbox)
POLYGON ((477920.19326693704 4427821.077439084, 477920.19326693704 4430124.841867187, 476016.2965865457 4430124.841867187, 476016.2965865457 4427821.077439084, 477920.19326693704 4427821.077439084))
Now we have a couple of more steps... although there are probably many ways to do this, I find this reasonably easy.
First bring it into a geopandas dataframe:
geom = gpd.GeoDataFrame({'geometry': bbox}, index=[0], crs=26913)
geom = gpd.GeoDataFrame({'geometry':bbox}, index=[0], crs = 26913)
Now convert that geopandas geodataframe into geojson (which is the same as a Python dictionary.)
geojson = geom.to_json()
geojson = geom.to_json()
Take a peek if you're not familiar:
geojson
geojson
'{"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {}, "geometry": {"type": "Polygon", "coordinates": [[[477920.19326693704, 4427821.077439084], [477920.19326693704, 4430124.841867187], [476016.2965865457, 4430124.841867187], [476016.2965865457, 4427821.077439084], [477920.19326693704, 4427821.077439084]]]}}]}'
Now, here what we're doing is using the json library to just get the part we need... that is, only the geometry element, not all the other stuff.
So, here we select what we need (parse) from the json:
bbox_geom = [json.loads(geojson)['features'][0]['geometry']]
bbox_geom = [json.loads(geojson)['features'][0]['geometry']]
Okay, so you should now have a list that contains the bounding box coordinate geometry:
print(bbox_geom)
print(bbox_geom)
[{'type': 'Polygon', 'coordinates': [[[477920.19326693704, 4427821.077439084], [477920.19326693704, 4430124.841867187], [476016.2965865457, 4430124.841867187], [476016.2965865457, 4427821.077439084], [477920.19326693704, 4427821.077439084]]]}]
Perfect!
You may be asking: Why does it want it this way???
Because, even though this time we're only clipping by one geometry, if it's formatted as a dict/geojson you can have multiple geometries aka multiple vector features to clip with... (see below)
Now finally, we can run the mask function to get the clipped image. mask() will return two things: the image, and it's affine transformation:
out_img, out_transform = mask(src, bbox_geom, crop=True)
out_img, out_transform = mask(src, bbox_geom, crop=True)
Let's take a look...
show(out_img)
show(out_img)
<AxesSubplot: >
Cool, clip done... if you think the color looks weird, don't worry we'll sort that out when we write the final image...
... now to save it we'll use same approach as before by repurposing the profile:
src.profile
src.profile
{'driver': 'GTiff', 'dtype': 'uint8', 'nodata': None, 'width': 9590, 'height': 12260, 'count': 4, 'crs': CRS.from_epsg(26913), 'transform': Affine(0.6, 0.0, 473124.0, 0.0, -0.6, 4434942.0), 'blockysize': 1, 'tiled': False, 'interleave': 'pixel'}
Copy it to a variable:
profile = src.profile.copy()
(Note, this is the same as the kwargs variable we used in the last notebook)
profile = src.profile.copy()
Grab the height and the width of the output image:
height = out_img.shape[1]
width = out_img.shape[2]
height = out_img.shape[1]
width = out_img.shape[2]
Update the profile. Since we have a new transform, a new width & height, we need to use the .update
method to take the old profile and update certain elements:
profile.update(transform = out_transform,
width = width,
height = height,
photometric = 'rgb',
alpha = 'no')
(more on photometric & alpha in the next notebook)
profile.update(transform = out_transform, width = width, height = height, photometric ='rgb', alpha='no')
Open a new empty dataset:
new = rio.open('NAIP_Campus_Clip.tif', 'w', **profile)
new = rio.open('NAIP_Campus_Clip.tif', 'w', **profile)
Write it to disc:
new.write(out_img)
new.write(out_img)
Close it...
new.close()
new.close()
Though, I have a hard time coming up with scenarios when you would do it this way, but maybe!
What will happen here is, everything beyond the extent of the image (aka, bounds) will be gone, but areas within the extent that are outside of the polygons will remain as no data values...
First, rather than the bounding box, we'll take the original shapefile geodataframe and convert it to geojson:
geojson = shape.to_json()
geojson = shape.to_json()
Add the geometries of each polygon feature to a list:
clip_geoms = []
for i in json.loads(geojson)['features']:
clip_geoms.append(i['geometry'])
clip_geoms = []
for i in json.loads(geojson)['features']:
clip_geoms.append(i['geometry'])
slightly faster list comprehension method:
clip_geoms = [q['geometry'] for q in json.loads(geojson)['features']]
clip_geoms = [q['geometry'] for q in json.loads(geojson)['features']]
Then, same process as we used before:
out_img, out_transform = mask(src, clip_geoms, crop=True)
show(out_img)
<AxesSubplot: >
profile = src.profile.copy()
height = out_img.shape[1]
width = out_img.shape[2]
Except this time, we update the 'nodata'
value in the profile to 0
:
profile.update(transform = out_transform,
width = width,
height = height,
nodata = 0,
photometric = 'rgb',
alpha = 'no')
new = rio.open('NAIP_Campus_Clip2.tif', 'w', **profile)
new.write(out_img)
new.close()