Primer: Working with Arrays using Numpy¶

"Nearly every scientist working in Python draws on the power of Numpy."
Numpy.org

Numpy is widely used across scientific disciplines for numerical computing, and is the most common means of working with data arrays.

import numpy as np
In [50]:
import numpy as np

Numpy arrays can have multiple dimensions.

To create a one dimensional array from scratch, use np.array() with a list of numbers (integers or floats) as the sole parameter:

np.array([4,6,4,7,2])
In [51]:
np.array([4,6,4,7,2])
Out[51]:
array([4, 6, 4, 7, 2])

Usually, though an array is two-dimensional.

To create a 2D array from scratch, you need nested lists:

np.array([[4,6,4,7,2],[5,8,4,9,7],[6,8,3,4,2],[9,5,8,6,7],[5,2,7,9,1],[7,4,2,6,7]])
In [52]:
np.array([[4,6,4,7,2],[5,8,4,9,7],[6,8,3,4,2],[9,5,8,6,7],[5,2,7,9,1],[7,4,2,6,7]])
Out[52]:
array([[4, 6, 4, 7, 2],
       [5, 8, 4, 9, 7],
       [6, 8, 3, 4, 2],
       [9, 5, 8, 6, 7],
       [5, 2, 7, 9, 1],
       [7, 4, 2, 6, 7]])

So what is an array? You can think of it as a grid.

What types of data that you might work with are gridded?

Store the above array into a variable:

arr = np.array(([4,6,4,7,2],[5,8,4,9,7],[6,8,3,4,2],[9,5,8,6,7],[5,2,7,9,1],[7,4,2,6,7]))
In [53]:
arr = np.array(([4,6,4,7,2],[5,8,4,9,7],[6,8,3,4,2],[9,5,8,6,7],[5,2,7,9,1],[7,4,2,6,7]))
In [54]:
arr
Out[54]:
array([[4, 6, 4, 7, 2],
       [5, 8, 4, 9, 7],
       [6, 8, 3, 4, 2],
       [9, 5, 8, 6, 7],
       [5, 2, 7, 9, 1],
       [7, 4, 2, 6, 7]])

Often, you need to know the shape of your array, particularly for transforming it to different data formats such as netCDF or GeoTiff.

arr.shape
In [55]:
arr.shape
Out[55]:
(6, 5)

Note that shape is in (rows, columns) notation.

Use the .size method to get the size:

arr.size
In [56]:
arr.size
Out[56]:
30

Use the .ndim method to get it's dimensions.

In [57]:
arr.ndim
Out[57]:
2

Indexing arrays is similar, with some quirks.

arr[0]

Retruns the row in 0 index position.

In [58]:
arr[0]
Out[58]:
array([4, 6, 4, 7, 2])
arr[0:3]

Returns a slice of the first three rows.

In [59]:
arr[0:3]
Out[59]:
array([[4, 6, 4, 7, 2],
       [5, 8, 4, 9, 7],
       [6, 8, 3, 4, 2]])

What about columns?

First, get the row or rows you want, then column position like this:

arr[0:6,0] # first 6 rows before comma, first column after comma
In [60]:
arr[0:6,0]
Out[60]:
array([4, 5, 6, 9, 5, 7])

So, array indexing is [rows,columns] or in other words [vertical axis, horizontal axis]. Vertical axis = 0, Horizontal axis = 1. (typically)

arr[0,0:] # first row, all columns
arr[1,0:3] # second row, columns 0, 1, 2
arr[0:3,1] # first three rows from column 2
arr[3,2] # value from fourth row, third column
In [26]:
arr[3,2]
Out[26]:
np.int64(8)

A 3x3 slice of the top left corner:

arr[0:3,0:3]
In [27]:
arr[0:3,0:3]
Out[27]:
array([[4, 6, 4],
       [5, 8, 4],
       [6, 8, 3]])

What will this do?

arr[0:3,0:3] + 100
In [61]:
arr[0:3, 0:3] + 100
Out[61]:
array([[104, 106, 104],
       [105, 108, 104],
       [106, 108, 103]])
arr
In [62]:
arr
Out[62]:
array([[4, 6, 4, 7, 2],
       [5, 8, 4, 9, 7],
       [6, 8, 3, 4, 2],
       [9, 5, 8, 6, 7],
       [5, 2, 7, 9, 1],
       [7, 4, 2, 6, 7]])

Note that arrays are "mutable", meaning the values can be reassigned. Modify the same code:

arr[0:3,0:3] += 100
In [63]:
arr[0:3,0:3] += 100
arr
In [64]:
arr
Out[64]:
array([[104, 106, 104,   7,   2],
       [105, 108, 104,   9,   7],
       [106, 108, 103,   4,   2],
       [  9,   5,   8,   6,   7],
       [  5,   2,   7,   9,   1],
       [  7,   4,   2,   6,   7]])

Modify the same code to return it to it's original form:

In [65]:
arr[0:3,0:3] -= 100

You can use various methods to broadcast to an array, such as np.ones() or np.zeros():

np.ones((10,10)) # shape of the desired array is first parameter, which is provided in parentheses aka tuple
In [66]:
np.ones((10,10))
Out[66]:
array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])

Of course, you can cast to the size of another array easily:

arr1 = np.ones((arr.shape))
In [72]:
arr1 = np.ones((arr.shape))
arr1
Out[72]:
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

You can do arithmetic operations on arrays. If you want to apply a formula to a whole array (array to scalar), it's very easy:

arr2 = arr1 * 2
In [68]:
arr2 = arr1 * 2
In [73]:
arr2
Out[73]:
array([[2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.],
       [2., 2., 2., 2., 2.]])

It's also easy to do arithmetic operations on multiple arrays of the same shape:

arr * arr2
In [74]:
arr * arr2
Out[74]:
array([[ 8., 12.,  8., 14.,  4.],
       [10., 16.,  8., 18., 14.],
       [12., 16.,  6.,  8.,  4.],
       [18., 10., 16., 12., 14.],
       [10.,  4., 14., 18.,  2.],
       [14.,  8.,  4., 12., 14.]])

Note those values are now float type. Sometimes you may need to cast to a different data type, such as integer to float or vice versa:

arr3 = (arr * arr2).astype(int)
In [75]:
arr3 = (arr * arr2).astype(int)
arr3
Out[75]:
array([[ 8, 12,  8, 14,  4],
       [10, 16,  8, 18, 14],
       [12, 16,  6,  8,  4],
       [18, 10, 16, 12, 14],
       [10,  4, 14, 18,  2],
       [14,  8,  4, 12, 14]])

A useful method is np.where(). This comes in handy in a variety of scenarios where you need to select based on conditions.

First parameter is the condition, second parameter is the result if true, third parameter if the result is false.

np.where(arr>5,1,0)
In [79]:
np.where(arr>5,1,0)
Out[79]:
array([[0, 1, 0, 1, 0],
       [0, 1, 0, 1, 1],
       [1, 1, 0, 0, 0],
       [1, 0, 1, 1, 1],
       [0, 0, 1, 1, 0],
       [1, 0, 0, 1, 1]])
In [80]:
np.where(arr>5,arr,0)
Out[80]:
array([[0, 6, 0, 7, 0],
       [0, 8, 0, 9, 7],
       [6, 8, 0, 0, 0],
       [9, 0, 8, 6, 7],
       [0, 0, 7, 9, 0],
       [7, 0, 0, 6, 7]])

Masking:

In [81]:
mask = np.where(arr>5,1,0)
arr * mask
Out[81]:
array([[0, 6, 0, 7, 0],
       [0, 8, 0, 9, 7],
       [6, 8, 0, 0, 0],
       [9, 0, 8, 6, 7],
       [0, 0, 7, 9, 0],
       [7, 0, 0, 6, 7]])

Create a random array:

This array will be populated by whole numbers between 1 and 100.

np.random.randint(1,100,(10,10))
In [84]:
np.random.randint(1,100,(10,10))
Out[84]:
array([[92, 73,  9, 17, 60, 54, 22, 77, 29, 45],
       [15, 66, 13, 65, 14, 14, 81, 61, 60,  6],
       [52, 13, 60, 13, 42, 34, 13, 18, 65, 65],
       [86, 86, 34, 37, 35, 22, 71, 86, 41, 76],
       [69, 31, 36,  5, 99, 76, 17, 91, 89, 54],
       [72, 30, 93, 55, 31, 67, 44,  8, 65, 38],
       [ 4, 87, 44, 46, 59, 69,  3, 45, 16, 26],
       [48, 87, 85, 20, 10, 69, 11, 76, 64, 18],
       [99, 21, 48, 91, 57, 55, 39, 40, 67,  6],
       [52, 95, 50, 71, 60, 48, 35, 18, 97,  7]], dtype=int32)

Multidimensional arrays¶

You can stack arrays of the same size on top of one another for a multidimensional array.

a1 = np.random.randint(1,100,(10,10))
a2 = np.random.randint(1,100,(10,10))
a3 = np.random.randint(1,100,(10,10))
In [3]:
a1 = np.random.randint(1,100,(10,10))
a2 = np.random.randint(1,100,(10,10))
a3 = np.random.randint(1,100,(10,10))
stack = np.stack([a1,a2,a3])
In [4]:
stack = np.stack([a1,a2,a3])

Or, a better way:

arrays = []
for n in range(3):
    arrays.append(np.random.randint(1,100,(10,10)))

stack = np.stack(arrays)
In [87]:
arrays = []
for n in range(3):
    arrays.append(np.random.randint(1,100,(10,10)))

stack = np.stack(arrays)
stack.shape
In [88]:
stack.shape
Out[88]:
(3, 10, 10)

Now, 0 axis is the multidimensional one.

Generate a derived mean array from all three:

np.mean(stack, axis=0)
In [89]:
np.mean(stack,axis=0)
Out[89]:
array([[83.        , 11.33333333, 54.33333333, 62.66666667, 69.66666667,
        31.        , 38.66666667, 42.        , 73.66666667, 53.        ],
       [47.66666667, 48.66666667, 80.33333333, 56.66666667, 54.66666667,
        27.66666667, 45.        , 74.66666667, 56.        , 25.66666667],
       [59.33333333, 29.        , 56.        , 52.        , 47.66666667,
        49.33333333, 47.66666667, 58.66666667, 28.66666667, 22.        ],
       [37.33333333, 57.        , 72.33333333, 40.33333333, 65.66666667,
        36.33333333, 60.66666667, 78.        , 28.66666667, 82.66666667],
       [47.33333333, 20.66666667, 58.33333333, 37.66666667, 42.66666667,
        60.33333333, 35.33333333, 84.        , 46.        , 25.        ],
       [52.        , 53.66666667, 28.66666667, 47.        , 56.33333333,
        69.        , 37.        , 51.        , 20.33333333, 40.        ],
       [43.        , 31.66666667, 76.33333333, 42.66666667, 73.        ,
        43.33333333, 47.66666667, 33.66666667, 40.66666667, 40.33333333],
       [15.        , 55.66666667, 50.66666667, 77.33333333, 33.66666667,
        48.33333333, 49.66666667, 46.66666667, 58.66666667, 44.33333333],
       [52.66666667, 45.33333333, 65.66666667, 41.66666667, 43.66666667,
        65.        , 56.66666667, 42.        , 29.66666667, 37.        ],
       [66.66666667, 28.        , 35.66666667, 31.        , 56.33333333,
        74.        , 58.66666667, 63.        , 26.        , 43.        ]])

Or,

stack.mean((0))
In [90]:
stack.mean((0))
Out[90]:
array([[83.        , 11.33333333, 54.33333333, 62.66666667, 69.66666667,
        31.        , 38.66666667, 42.        , 73.66666667, 53.        ],
       [47.66666667, 48.66666667, 80.33333333, 56.66666667, 54.66666667,
        27.66666667, 45.        , 74.66666667, 56.        , 25.66666667],
       [59.33333333, 29.        , 56.        , 52.        , 47.66666667,
        49.33333333, 47.66666667, 58.66666667, 28.66666667, 22.        ],
       [37.33333333, 57.        , 72.33333333, 40.33333333, 65.66666667,
        36.33333333, 60.66666667, 78.        , 28.66666667, 82.66666667],
       [47.33333333, 20.66666667, 58.33333333, 37.66666667, 42.66666667,
        60.33333333, 35.33333333, 84.        , 46.        , 25.        ],
       [52.        , 53.66666667, 28.66666667, 47.        , 56.33333333,
        69.        , 37.        , 51.        , 20.33333333, 40.        ],
       [43.        , 31.66666667, 76.33333333, 42.66666667, 73.        ,
        43.33333333, 47.66666667, 33.66666667, 40.66666667, 40.33333333],
       [15.        , 55.66666667, 50.66666667, 77.33333333, 33.66666667,
        48.33333333, 49.66666667, 46.66666667, 58.66666667, 44.33333333],
       [52.66666667, 45.33333333, 65.66666667, 41.66666667, 43.66666667,
        65.        , 56.66666667, 42.        , 29.66666667, 37.        ],
       [66.66666667, 28.        , 35.66666667, 31.        , 56.33333333,
        74.        , 58.66666667, 63.        , 26.        , 43.        ]])

Working with data arrays in practice.¶

Here is a quick geospatial demonstration using a digital elevation model.

I'll use the rasterio library to transform a geoTiff file into an array, run a processing operation over the array, and write a new output geoTiff.

Note, you probably don't have rasterio installed, but you can always open your command line and type conda install rasterio.

import rasterio as rio
In [18]:
import rasterio as rio
import os
import matplotlib.pyplot as plt
In [16]:
data_dir = r'C:\Users\phwh9568\Workshops\Python_Data_Camp\data'
file = 'Flatirons_DEM_1m.tif'
In [17]:
dem = rio.open(os.path.join(data_dir, file))
In [24]:
plt.imshow(dem.read(1))
Out[24]:
<matplotlib.image.AxesImage at 0x1f27fa337d0>
No description has been provided for this image

Or:

from rasterio.plot import show
In [25]:
from rasterio.plot import show
show(dem)
In [26]:
show(dem)
No description has been provided for this image
Out[26]:
<Axes: >

Note matplot lib's colormaps: https://matplotlib.org/stable/tutorials/colors/colormaps.html

In [36]:
show(dem, cmap='gray')
No description has been provided for this image
Out[36]:
<Axes: >

Plotting matplotlib style:

fig, ax = plt.subplots(1,figsize=(12,12))
show(dem.read(1), transform=dem.transform, cmap='plasma', ax=ax)
In [37]:
fig, ax = plt.subplots(1,figsize=(12,12))
show(dem.read(1), transform=dem.transform, cmap='plasma', ax=ax)
Out[37]:
<Axes: >
No description has been provided for this image

When working with geospatial raster data, knowing the coordinate system, size, and transformation information are all crucial for writing outputs:

dem.profile
In [48]:
dem.profile
Out[48]:
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': -3.4028234663852886e+38, 'width': 2702, 'height': 2522, 'count': 1, 'crs': CRS.from_epsg(6342), 'transform': Affine(0.9999394800148107, 0.0, 473821.016811,
       0.0, -0.999886617763522, 4427603.58783), 'blockysize': 1, 'tiled': False, 'interleave': 'band'}

Translate the DEM to a numpy array:

demArray = dem.read(1)
In [38]:
demArray = dem.read(1)
In [39]:
demArray
Out[39]:
array([[2035.8704, 2035.8304, 2035.7504, ..., 1698.5481, 1698.5231,
        1698.4481],
       [2035.899 , 2035.8453, 2035.7727, ..., 1698.5516, 1698.5345,
        1698.4625],
       [2035.9617, 2035.9552, 2035.8711, ..., 1698.5631, 1698.5507,
        1698.5112],
       ...,
       [2258.158 , 2258.8215, 2259.3848, ..., 1799.3021, 1799.3027,
        1799.2911],
       [2258.2954, 2259.0105, 2259.6692, ..., 1799.037 , 1799.0354,
        1799.0189],
       [2258.469 , 2259.1514, 2259.8518, ..., 1798.7863, 1798.7714,
        1798.76  ]], dtype=float32)
In [40]:
demArray.size
Out[40]:
6814444
In [41]:
demArray.shape
Out[41]:
(2522, 2702)

Now that the DEM is an array, there are endless ways we could manipulate it. Let's perform a common geospatial operation, a smoothing filter based on the mean of a cells based the cell and it's 8 neighboring cells' values.

There are several approaches to this including iterating through every cell (slow), takign the mean from slices of the array (fast), or using the SciPy library which already includes mean filter functions (easy).

SciPy is frequently used alongside NumPy to extend it's functionality. A lot of things do-able in NumPy are made easier with SciPy's more advanced suite of algorithms.

import scipy.ndimage
In [44]:
import scipy.ndimage
In [46]:
demMean = scipy.ndimage.uniform_filter(demArray, size=3)
In [47]:
demMean
Out[47]:
array([[2035.8651, 2035.8243, 2035.7487, ..., 1698.558 , 1698.5096,
        1698.4775],
       [2035.8993, 2035.8618, 2035.792 , ..., 1698.5618, 1698.5215,
        1698.4948],
       [2035.9618, 2035.9213, 2035.8488, ..., 1698.5713, 1698.5398,
        1698.5184],
       ...,
       [2258.3694, 2258.7927, 2259.4094, ..., 1799.2964, 1799.2906,
        1799.2833],
       [2258.5364, 2258.979 , 2259.6223, ..., 1799.0466, 1799.0338,
        1799.0277],
       [2258.6423, 2259.1023, 2259.766 , ..., 1798.8768, 1798.8585,
        1798.8507]], dtype=float32)

Now, write it out as a GeoTiff:

In [49]:
with rio.open('demMean.tif', 'w', **dem.profile) as demMean_out:
    demMean_out.write(demMean,1)