When you're working in with rasters in Rasterio, a typical workflow is often reading data, doing something to it, and then writing out a new dataset. To perform these i/o type tasks, accessing descriptive info about the dataset is also essential.
Start by importing libraries:
import numpy as np
import rasterio as rio
from rasterio.plot import show
import os
import numpy as np
import rasterio as rio
from rasterio.plot import show
import os
Important: only run this once or you'll wind up in the wrong directory.
os.chdir('../')
os.getcwd()
os.chdir('../')
os.getcwd()
'/home/jupyter-chain.saw'
First, we'll read in a raster as a Rasterio dataset object:
src = rio.open('workshopdata/Flatirons_DEM_1m.tif')
src = rio.open('workshopdata/Flatirons_DEM_1m.tif')
Just so you know what you're looking at...
show(src)
show(src)
<AxesSubplot: >
src.name
'workshopdata/Flatirons_DEM_1m.tif'
src.width
src.width
2702
src.height
src.height
2522
Check the coordinate reference system (CRS)
src.crs
src.crs
CRS.from_epsg(6342)
How many bands are in this image?
src.count
src.count
1
The unit is part of the coordinate system:
src.crs.linear_units
src.crs.linear_units
'metre'
Now, let's get the extent AKA bounds:
src.bounds
src.bounds
BoundingBox(left=473821.016811, bottom=4425081.87378, right=476522.853286, top=4427603.58783)
Affine transformation:
src.transform
src.transform
Affine(0.9999394800148107, 0.0, 473821.016811, 0.0, -0.999886617763522, 4427603.58783)
Transform is super important, but what the heck is it?
(cell width, x origin, x origin coordinate, y origin, cell height, y origin coordinate)
From Rasterio's docs:
A dataset’s DatasetReader.transform
is an affine transformation matrix that maps pixel locations in (row, col) coordinates to (x, y) spatial positions. The product of this matrix and (0, 0), the row and column coordinates of the upper left corner of the dataset, is the spatial position of the upper left corner.
Upper left corner position is (0,0), so...
src.transform * (0,0)
src.transform * (0,0)
(473821.016811, 4427603.58783)
There's the coordinates of our top left corner...
For funsies, we could take a stab at the center point, or at least pretty darn close:
centerX = src.width/2
centerY = src.height/2
centerX = src.width/2
centerY = src.height/2
src.transform * (centerX,centerY)
src.transform * (centerX, centerY)
(475171.9350485, 4426342.730805)
So, that's how you apply a transformation--it does the math over the pixels to "georeference" them to a coordinate system, or change it from one crs to another... which we'll work on in the next notebook.
You can access parts of the transform as well, such as cell dimensions:
cellWidth = src.transform[0]
cellHeight = src.transform[4]
cellWidth = src.transform[0]
cellHeight = src.transform[4]
cellWidth, cellHeight
cellWidth, cellHeight
(0.9999394800148107, -0.999886617763522)
Okay, and finally two related and important descriptive tools:
src.meta
src.meta
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': -3.4028234663852886e+38, 'width': 2702, 'height': 2522, 'count': 1, 'crs': CRS.from_epsg(6342), 'transform': Affine(0.9999394800148107, 0.0, 473821.016811, 0.0, -0.999886617763522, 4427603.58783)}
And similar...
src.profile
src.profile
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': -3.4028234663852886e+38, 'width': 2702, 'height': 2522, 'count': 1, 'crs': CRS.from_epsg(6342), 'transform': Affine(0.9999394800148107, 0.0, 473821.016811, 0.0, -0.999886617763522, 4427603.58783), 'blockysize': 1, 'tiled': False, 'interleave': 'band'}
Note that you can access these:
src.meta['dtype']
src.meta['dtype']
'float32'
Now let's do a little bit of work on the raster layer and save it's output to a new file.
Frequently, Numpy is used to work with arrays--and a raster layer is just a big ol' array...
To read a Rasterio dataset object as a Numpy array use src.read(). The parameter is the band you want to read:
array = src.read(1)
array = src.read(1)
And here it is...
array
array
array([[2035.8704, 2035.8304, 2035.7504, ..., 1698.5481, 1698.5231, 1698.4481], [2035.899 , 2035.8453, 2035.7727, ..., 1698.5516, 1698.5345, 1698.4625], [2035.9617, 2035.9552, 2035.8711, ..., 1698.5631, 1698.5507, 1698.5112], ..., [2258.158 , 2258.8215, 2259.3848, ..., 1799.3021, 1799.3027, 1799.2911], [2258.2954, 2259.0105, 2259.6692, ..., 1799.037 , 1799.0354, 1799.0189], [2258.469 , 2259.1514, 2259.8518, ..., 1798.7863, 1798.7714, 1798.76 ]], dtype=float32)
Now, let's do something super basic for now: multiply every value in the array times 2...
array2 = array * 2
array2 = array *2
Did it work?
print(array.max())
print(array2.max())
print(array.max())
print(array2.max())
2479.9094 4959.819
Complex analysis, right?
Let's grab the height and width of our array2 and save them for later... note that these should be exactly the same as our source DEM, so in this case, you could also use src.width and src.height...
But we'll use the Numpy method:
height = np.size(array2,0)
width = np.size(array2,1)
height = np.size(array2, 0)
width = np.size(array2, 1)
Okay, let's take our new data and write it to a new GeoTiff file.
The steps are:
So, begin by creating and opening a new GeoTiff that will store our data...
First let's take a look at the documentation: https://rasterio.readthedocs.io/en/latest/quickstart.html#opening-a-dataset-in-writing-mode
The parameters we need are:
Note that several of these are optional, but you probably want to use at least these. And there are more optional params as well...
For starters let's do this:
new = rio.open('new_dem.tif', 'w',
driver='GTiff',
height = height,
width = width,
count = 1,
dtype = src.meta['dtype'],
crs = src.crs,
transform = src.transform)
new = rio.open('new_dem.tif', 'w', driver='GTiff', height = height, width = width, count = 1, dtype = src.meta['dtype'], crs = src.crs, transform = src.transform)
Now write the array to the open data set:
new.write(array2,1)
new.write(array2, 1)
Close it when you're done:
new.close()
new.close()
The basic gist is that you need the descriptive info (meta, profile, crs, transform, width, height, dtype, etc) to write a new dataset.
There is a slightly more streamlined approach where you use the meta/profile of the source image and repurpose it (and modify if needed) and use it in the destination image....
First, note that the .meta
and .profile
information has all the same parameters that you need to open a brand new dataset in write mode:
src.meta
src.meta
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': -3.4028234663852886e+38, 'width': 2702, 'height': 2522, 'count': 1, 'crs': CRS.from_epsg(6342), 'transform': Affine(0.9999394800148107, 0.0, 473821.016811, 0.0, -0.999886617763522, 4427603.58783)}
Same stuff!
Okay, make a copy of it to a new variable, we'll call it kwargs, which is short for key word arguments:
kwargs = src.meta.copy()
kwargs = src.meta.copy()
Okay, now we have all the parameters necessary saved in a dict, which is of course, key:value pairs....
In this case, none of them need updating because all we did was double the values in the array, we didn't modify it's shape at all so...
Just open a new Rasterio data oject in write mode and apply the kwargs like this:
new = rio.open('new_dem.tif', 'w', **kwargs)
new = rio.open('new_dem.tif', 'w', **kwargs)
That's called a double star expression--it is used to unpack keyword arguments as parameters of a function.
And now, same as before, write and close:
new.write(array2,1)
new.close()
new.write(array2, 1)
new.close()