Ok, so in the last notebook we loaded a single file and looked at what's in there. But Iris can do much more. In this notebook we'll look at loading a subset of MOGREPS-G data.
Last time we retrieved Met Office data by downloading from public links. This time, let's instead use boto3, a Python module for interacting with Amazon Web Services, to download some files. Note that to use Boto you will need an AWS account, and you will need to include your credentials in your boto requests (more here).
We'll look at the model run started at midnight on June 01, 2016, taking the forecasts at 12:00, 15:00 and 18:00. For each forecast we'll grab the unperturbed forecast (the 0th realisation) and three ensemble members - all in all, about 600 MB of data.
First, let's generate the names of the relevant data objects.
def make_data_object_name(dataset_name, year, month, day, hour, realization, forecast_period):
    template_string = "prods_op_{}_{:02d}{:02d}{:02d}_{:02d}_{:02d}_{:03d}.nc"
    return template_string.format(dataset_name, year, month, day, hour, realization, forecast_period)
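As a quick sanity check, calling the function for the first realisation and forecast period produces the name below (assuming the template above matches the bucket's naming scheme):
make_data_object_name('mogreps-g', 2016, 6, 1, 0, 0, 12)
# -> 'prods_op_mogreps-g_20160601_00_00_012.nc'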
from itertools import product
realizations = [0, 1, 2, 3]
forecast_periods = [12, 15, 18]
key_names = [
    make_data_object_name('mogreps-g', 2016, 6, 1, 0, r, f)
    for r, f in product(realizations, forecast_periods)
]
key_names
Now to download those files using boto3. Note the environment variables detailing your AWS credentials.
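If you haven't set them already, here's a minimal sketch for doing so from within the notebook (the names ACCESS_KEY and SECRET_KEY are simply what the download function below expects; the values are placeholders for your own credentials):
import os
os.environ['ACCESS_KEY'] = 'YOUR-AWS-ACCESS-KEY-ID'      # placeholder
os.environ['SECRET_KEY'] = 'YOUR-AWS-SECRET-ACCESS-KEY'  # placeholder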
import os
import boto3
from botocore.client import Config
def boto_download(key_name, download_dir='./'):
    session = boto3.session.Session(aws_access_key_id=os.environ['ACCESS_KEY'],      # these need to be set as environment
                                    aws_secret_access_key=os.environ['SECRET_KEY'])  # variables holding your AWS credentials
    s3 = session.resource('s3', config=Config(signature_version='s3v4'))
    bucket = s3.Bucket('mogreps-g')
    fpath = os.path.join(download_dir, key_name)
    if os.path.isfile(fpath):
        print('{} already exists, skipping...'.format(fpath))
    else:
        with open(fpath, 'wb') as outf:  # write into download_dir rather than the bare key name
            print('Downloading {}...'.format(key_name))
            bucket.download_fileobj(Key=key_name, Fileobj=outf)
    return fpath
file_paths = [boto_download(key) for key in key_names]
And now to load them (including a callback to adjust ensemble realisation metadata - a common problem; see here for more on this). You will need Iris, which you can install with !conda install -y -c conda-forge iris.
import iris
# Opt in to Iris's future netCDF loading behaviour; this silences a deprecation warning at load time
iris.FUTURE.netcdf_promote = True
def format_cube(cube, field, filename):
    """
    This dataset is a raw dump of our archived data.
    As such we need to do some metadata parsing to make our lives easier.
    In this case we're just going to add an extra 'realization' coordinate,
    but we can put anything in here, as long as a cube is returned at the end.
    """
    if not cube.coords('realization'):
        realization = int(filename.split('_')[-2])
        realization_coord = iris.coords.AuxCoord(
            realization, standard_name='realization', var_name='realization')
        cube.add_aux_coord(realization_coord)
    return cube
# We can also pass constraints to the load function; in this case we'll load only one parameter
constraint = 'dew_point_temperature'
cube = iris.load_cube(file_paths, constraint=constraint, callback=format_cube)
print(cube)
With a little help from our callback function, Iris has loaded these 12 input files, extracted the dew_point_temperature parameter and merged the metadata to give us a unified view of the dataset.
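A quick way to confirm the merge is to inspect the coordinates it produced (the exact printout will depend on your Iris version):
print(cube.coord('realization').points)  # should list the four realisations: 0, 1, 2, 3
print(cube.coord('time'))                # and the three forecast validity times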
Iris is lazy by default, so until we access the data the memory footprint is much smaller than the ~600 MB dataset. When we access a piece of data, Iris reads it from the individual source file. This is very powerful, as it allows us to explore datasets that are much larger than will fit in memory. The downside is that if we delete the on-disk files, this cube will break.
cube.has_lazy_data()
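As a rough illustration (exact laziness behaviour varies a little between Iris versions), touching the data of a small slice only pulls that slice off disk:
subset = cube[0, 0]            # first realisation, first forecast time
print(subset.has_lazy_data())  # True - nothing has been read yet
_ = subset.data                # forces just this 2D field to be read
print(subset.has_lazy_data())  # now False for the slice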
The HoloViews and GeoViews libraries are powerful tools for creating interactive plots. They work directly with cubes, and will auto-create widgets for navigating these 3- and 4-dimensional datasets. I'm just going to cover the absolute basics here, but they're great tools for visualising all sorts of data.
!conda install -y -c ioam geoviews holoviews pandas
import cartopy # the Met Office's open source map projection module
import datetime
import geoviews as gv
import holoviews as hv
hv.Dimension.type_formatters[datetime.datetime] = "%m/%y %Hh"
hv.notebook_extension()
First we'll convert our cube into a GeoViews 'Dataset' object. All we have to do is tell GeoViews about the dimensions of our data.
dataset = gv.Dataset(
    cube,
    kdims=['realization', 'time', 'latitude', 'longitude'],
    vdims=['dew_point_temperature'])
dataset
Once we have a Dataset we can use the to method to render it in various ways, from box plots to maps like this one.
%%opts Image [colorbar=True fig_size=400 projection=cartopy.crs.PlateCarree()] (cmap='viridis')
dataset.to(gv.Image, ['longitude', 'latitude']) * gv.feature.coastline
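The same data can be rendered with other element types too. As a sketch along the same lines (FilledContours is another standard GeoViews element; the options may need tweaking for your setup):
%%opts FilledContours [colorbar=True fig_size=400 projection=cartopy.crs.PlateCarree()] (cmap='viridis')
dataset.to(gv.FilledContours, ['longitude', 'latitude']) * gv.feature.coastline
In both cases the remaining key dimensions (realization and time) become interactive widgets for stepping through the ensemble and the forecast times.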