import numpy as np
Numpy arrays can have multiple dimensions.
To create a one dimensional array from scratch, use np.array()
with a list of numbers (integers or floats) as the sole parameter:
np.array([4,6,4,7,2])
np.array([4,6,4,7,2])
array([4, 6, 4, 7, 2])
Usually, though an array is two-dimensional.
To create a 2D array from scratch, you need nested lists:
np.array([[4,6,4,7,2],[5,8,4,9,7],[6,8,3,4,2],[9,5,8,6,7],[5,2,7,9,1],[7,4,2,6,7]])
np.array([[4,6,4,7,2],[5,8,4,9,7],[6,8,3,4,2],[9,5,8,6,7],[5,2,7,9,1],[7,4,2,6,7]])
array([[4, 6, 4, 7, 2], [5, 8, 4, 9, 7], [6, 8, 3, 4, 2], [9, 5, 8, 6, 7], [5, 2, 7, 9, 1], [7, 4, 2, 6, 7]])
So what is an array? You can think of it as a grid.
What types of data that you might work with are gridded?
Store the above array into a variable:
arr = np.array(([4,6,4,7,2],[5,8,4,9,7],[6,8,3,4,2],[9,5,8,6,7],[5,2,7,9,1],[7,4,2,6,7]))
arr = np.array(([4,6,4,7,2],[5,8,4,9,7],[6,8,3,4,2],[9,5,8,6,7],[5,2,7,9,1],[7,4,2,6,7]))
arr
array([[4, 6, 4, 7, 2], [5, 8, 4, 9, 7], [6, 8, 3, 4, 2], [9, 5, 8, 6, 7], [5, 2, 7, 9, 1], [7, 4, 2, 6, 7]])
Often, you need to know the shape of your array, particularly for transforming it to different data formats such as netCDF or GeoTiff.
arr.shape
arr.shape
(6, 5)
Note that shape is in (rows, columns) notation.
Use the .size
method to get the size:
arr.size
arr.size
30
Use the .ndim
method to get it's dimensions.
arr.ndim
2
Indexing arrays is similar, with some quirks.
arr[0]
Retruns the row in 0 index position.
arr[0]
array([4, 6, 4, 7, 2])
arr[0:3]
Returns a slice of the first three rows.
arr[0:3]
array([[4, 6, 4, 7, 2], [5, 8, 4, 9, 7], [6, 8, 3, 4, 2]])
What about columns?
First, get the row or rows you want, then column position like this:
arr[0:6,0] # first 6 rows before comma, first column after comma
arr[0:6,0]
array([4, 5, 6, 9, 5, 7])
So, array indexing is [rows,columns]
or in other words [vertical axis, horizontal axis]
. Vertical axis = 0, Horizontal axis = 1. (typically)
arr[0,0:] # first row, all columns
arr[1,0:3] # second row, columns 0, 1, 2
arr[0:3,1] # first three rows from column 2
arr[3,2] # value from fourth row, third column
arr[3,2]
np.int64(8)
A 3x3 slice of the top left corner:
arr[0:3,0:3]
arr[0:3,0:3]
array([[4, 6, 4], [5, 8, 4], [6, 8, 3]])
What will this do?
arr[0:3,0:3] + 100
arr[0:3, 0:3] + 100
array([[104, 106, 104], [105, 108, 104], [106, 108, 103]])
arr
arr
array([[4, 6, 4, 7, 2], [5, 8, 4, 9, 7], [6, 8, 3, 4, 2], [9, 5, 8, 6, 7], [5, 2, 7, 9, 1], [7, 4, 2, 6, 7]])
Note that arrays are "mutable", meaning the values can be reassigned. Modify the same code:
arr[0:3,0:3] += 100
arr[0:3,0:3] += 100
arr
arr
array([[104, 106, 104, 7, 2], [105, 108, 104, 9, 7], [106, 108, 103, 4, 2], [ 9, 5, 8, 6, 7], [ 5, 2, 7, 9, 1], [ 7, 4, 2, 6, 7]])
Modify the same code to return it to it's original form:
arr[0:3,0:3] -= 100
You can use various methods to broadcast to an array, such as np.ones()
or np.zeros()
:
np.ones((10,10)) # shape of the desired array is first parameter, which is provided in parentheses aka tuple
np.ones((10,10))
array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
Of course, you can cast to the size of another array easily:
arr1 = np.ones((arr.shape))
arr1 = np.ones((arr.shape))
arr1
array([[1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.]])
You can do arithmetic operations on arrays. If you want to apply a formula to a whole array (array to scalar), it's very easy:
arr2 = arr1 * 2
arr2 = arr1 * 2
arr2
array([[2., 2., 2., 2., 2.], [2., 2., 2., 2., 2.], [2., 2., 2., 2., 2.], [2., 2., 2., 2., 2.], [2., 2., 2., 2., 2.], [2., 2., 2., 2., 2.]])
It's also easy to do arithmetic operations on multiple arrays of the same shape:
arr * arr2
arr * arr2
array([[ 8., 12., 8., 14., 4.], [10., 16., 8., 18., 14.], [12., 16., 6., 8., 4.], [18., 10., 16., 12., 14.], [10., 4., 14., 18., 2.], [14., 8., 4., 12., 14.]])
Note those values are now float type. Sometimes you may need to cast to a different data type, such as integer to float or vice versa:
arr3 = (arr * arr2).astype(int)
arr3 = (arr * arr2).astype(int)
arr3
array([[ 8, 12, 8, 14, 4], [10, 16, 8, 18, 14], [12, 16, 6, 8, 4], [18, 10, 16, 12, 14], [10, 4, 14, 18, 2], [14, 8, 4, 12, 14]])
A useful method is np.where()
. This comes in handy in a variety of scenarios where you need to select based on conditions.
First parameter is the condition, second parameter is the result if true, third parameter if the result is false.
np.where(arr>5,1,0)
np.where(arr>5,1,0)
array([[0, 1, 0, 1, 0], [0, 1, 0, 1, 1], [1, 1, 0, 0, 0], [1, 0, 1, 1, 1], [0, 0, 1, 1, 0], [1, 0, 0, 1, 1]])
np.where(arr>5,arr,0)
array([[0, 6, 0, 7, 0], [0, 8, 0, 9, 7], [6, 8, 0, 0, 0], [9, 0, 8, 6, 7], [0, 0, 7, 9, 0], [7, 0, 0, 6, 7]])
Masking:
mask = np.where(arr>5,1,0)
arr * mask
array([[0, 6, 0, 7, 0], [0, 8, 0, 9, 7], [6, 8, 0, 0, 0], [9, 0, 8, 6, 7], [0, 0, 7, 9, 0], [7, 0, 0, 6, 7]])
Create a random array:
This array will be populated by whole numbers between 1 and 100.
np.random.randint(1,100,(10,10))
np.random.randint(1,100,(10,10))
array([[92, 73, 9, 17, 60, 54, 22, 77, 29, 45], [15, 66, 13, 65, 14, 14, 81, 61, 60, 6], [52, 13, 60, 13, 42, 34, 13, 18, 65, 65], [86, 86, 34, 37, 35, 22, 71, 86, 41, 76], [69, 31, 36, 5, 99, 76, 17, 91, 89, 54], [72, 30, 93, 55, 31, 67, 44, 8, 65, 38], [ 4, 87, 44, 46, 59, 69, 3, 45, 16, 26], [48, 87, 85, 20, 10, 69, 11, 76, 64, 18], [99, 21, 48, 91, 57, 55, 39, 40, 67, 6], [52, 95, 50, 71, 60, 48, 35, 18, 97, 7]], dtype=int32)
Multidimensional arrays¶
You can stack arrays of the same size on top of one another for a multidimensional array.
a1 = np.random.randint(1,100,(10,10))
a2 = np.random.randint(1,100,(10,10))
a3 = np.random.randint(1,100,(10,10))
a1 = np.random.randint(1,100,(10,10))
a2 = np.random.randint(1,100,(10,10))
a3 = np.random.randint(1,100,(10,10))
stack = np.stack([a1,a2,a3])
stack = np.stack([a1,a2,a3])
Or, a better way:
arrays = []
for n in range(3):
arrays.append(np.random.randint(1,100,(10,10)))
stack = np.stack(arrays)
arrays = []
for n in range(3):
arrays.append(np.random.randint(1,100,(10,10)))
stack = np.stack(arrays)
stack.shape
stack.shape
(3, 10, 10)
Now, 0 axis is the multidimensional one.
Generate a derived mean array from all three:
np.mean(stack, axis=0)
np.mean(stack,axis=0)
array([[83. , 11.33333333, 54.33333333, 62.66666667, 69.66666667, 31. , 38.66666667, 42. , 73.66666667, 53. ], [47.66666667, 48.66666667, 80.33333333, 56.66666667, 54.66666667, 27.66666667, 45. , 74.66666667, 56. , 25.66666667], [59.33333333, 29. , 56. , 52. , 47.66666667, 49.33333333, 47.66666667, 58.66666667, 28.66666667, 22. ], [37.33333333, 57. , 72.33333333, 40.33333333, 65.66666667, 36.33333333, 60.66666667, 78. , 28.66666667, 82.66666667], [47.33333333, 20.66666667, 58.33333333, 37.66666667, 42.66666667, 60.33333333, 35.33333333, 84. , 46. , 25. ], [52. , 53.66666667, 28.66666667, 47. , 56.33333333, 69. , 37. , 51. , 20.33333333, 40. ], [43. , 31.66666667, 76.33333333, 42.66666667, 73. , 43.33333333, 47.66666667, 33.66666667, 40.66666667, 40.33333333], [15. , 55.66666667, 50.66666667, 77.33333333, 33.66666667, 48.33333333, 49.66666667, 46.66666667, 58.66666667, 44.33333333], [52.66666667, 45.33333333, 65.66666667, 41.66666667, 43.66666667, 65. , 56.66666667, 42. , 29.66666667, 37. ], [66.66666667, 28. , 35.66666667, 31. , 56.33333333, 74. , 58.66666667, 63. , 26. , 43. ]])
Or,
stack.mean((0))
stack.mean((0))
array([[83. , 11.33333333, 54.33333333, 62.66666667, 69.66666667, 31. , 38.66666667, 42. , 73.66666667, 53. ], [47.66666667, 48.66666667, 80.33333333, 56.66666667, 54.66666667, 27.66666667, 45. , 74.66666667, 56. , 25.66666667], [59.33333333, 29. , 56. , 52. , 47.66666667, 49.33333333, 47.66666667, 58.66666667, 28.66666667, 22. ], [37.33333333, 57. , 72.33333333, 40.33333333, 65.66666667, 36.33333333, 60.66666667, 78. , 28.66666667, 82.66666667], [47.33333333, 20.66666667, 58.33333333, 37.66666667, 42.66666667, 60.33333333, 35.33333333, 84. , 46. , 25. ], [52. , 53.66666667, 28.66666667, 47. , 56.33333333, 69. , 37. , 51. , 20.33333333, 40. ], [43. , 31.66666667, 76.33333333, 42.66666667, 73. , 43.33333333, 47.66666667, 33.66666667, 40.66666667, 40.33333333], [15. , 55.66666667, 50.66666667, 77.33333333, 33.66666667, 48.33333333, 49.66666667, 46.66666667, 58.66666667, 44.33333333], [52.66666667, 45.33333333, 65.66666667, 41.66666667, 43.66666667, 65. , 56.66666667, 42. , 29.66666667, 37. ], [66.66666667, 28. , 35.66666667, 31. , 56.33333333, 74. , 58.66666667, 63. , 26. , 43. ]])
Working with data arrays in practice.¶
Here is a quick geospatial demonstration using a digital elevation model.
I'll use the rasterio library to transform a geoTiff file into an array, run a processing operation over the array, and write a new output geoTiff.
Note, you probably don't have rasterio installed, but you can always open your command line and type conda install rasterio
.
import rasterio as rio
import rasterio as rio
import os
import matplotlib.pyplot as plt
data_dir = r'C:\Users\phwh9568\Workshops\Python_Data_Camp\data'
file = 'Flatirons_DEM_1m.tif'
dem = rio.open(os.path.join(data_dir, file))
plt.imshow(dem.read(1))
<matplotlib.image.AxesImage at 0x1f27fa337d0>
Or:
from rasterio.plot import show
from rasterio.plot import show
show(dem)
show(dem)
<Axes: >
Note matplot lib's colormaps: https://matplotlib.org/stable/tutorials/colors/colormaps.html
show(dem, cmap='gray')
<Axes: >
Plotting matplotlib style:
fig, ax = plt.subplots(1,figsize=(12,12))
show(dem.read(1), transform=dem.transform, cmap='plasma', ax=ax)
fig, ax = plt.subplots(1,figsize=(12,12))
show(dem.read(1), transform=dem.transform, cmap='plasma', ax=ax)
<Axes: >
When working with geospatial raster data, knowing the coordinate system, size, and transformation information are all crucial for writing outputs:
dem.profile
dem.profile
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': -3.4028234663852886e+38, 'width': 2702, 'height': 2522, 'count': 1, 'crs': CRS.from_epsg(6342), 'transform': Affine(0.9999394800148107, 0.0, 473821.016811, 0.0, -0.999886617763522, 4427603.58783), 'blockysize': 1, 'tiled': False, 'interleave': 'band'}
Translate the DEM to a numpy array:
demArray = dem.read(1)
demArray = dem.read(1)
demArray
array([[2035.8704, 2035.8304, 2035.7504, ..., 1698.5481, 1698.5231, 1698.4481], [2035.899 , 2035.8453, 2035.7727, ..., 1698.5516, 1698.5345, 1698.4625], [2035.9617, 2035.9552, 2035.8711, ..., 1698.5631, 1698.5507, 1698.5112], ..., [2258.158 , 2258.8215, 2259.3848, ..., 1799.3021, 1799.3027, 1799.2911], [2258.2954, 2259.0105, 2259.6692, ..., 1799.037 , 1799.0354, 1799.0189], [2258.469 , 2259.1514, 2259.8518, ..., 1798.7863, 1798.7714, 1798.76 ]], dtype=float32)
demArray.size
6814444
demArray.shape
(2522, 2702)
Now that the DEM is an array, there are endless ways we could manipulate it. Let's perform a common geospatial operation, a smoothing filter based on the mean of a cells based the cell and it's 8 neighboring cells' values.
There are several approaches to this including iterating through every cell (slow), takign the mean from slices of the array (fast), or using the SciPy library which already includes mean filter functions (easy).
SciPy is frequently used alongside NumPy to extend it's functionality. A lot of things do-able in NumPy are made easier with SciPy's more advanced suite of algorithms.
import scipy.ndimage
import scipy.ndimage
demMean = scipy.ndimage.uniform_filter(demArray, size=3)
demMean
array([[2035.8651, 2035.8243, 2035.7487, ..., 1698.558 , 1698.5096, 1698.4775], [2035.8993, 2035.8618, 2035.792 , ..., 1698.5618, 1698.5215, 1698.4948], [2035.9618, 2035.9213, 2035.8488, ..., 1698.5713, 1698.5398, 1698.5184], ..., [2258.3694, 2258.7927, 2259.4094, ..., 1799.2964, 1799.2906, 1799.2833], [2258.5364, 2258.979 , 2259.6223, ..., 1799.0466, 1799.0338, 1799.0277], [2258.6423, 2259.1023, 2259.766 , ..., 1798.8768, 1798.8585, 1798.8507]], dtype=float32)
Now, write it out as a GeoTiff:
with rio.open('demMean.tif', 'w', **dem.profile) as demMean_out:
demMean_out.write(demMean,1)