Comparing MOM6 data to hydrographic section

In this notebook, we show how to compare model’s results with observations using xESMF.

[1]:
import xarray as xr
import xesmf
import zipfile
import pandas
[2]:
import warnings
warnings.filterwarnings("ignore")
[3]:
%matplotlib inline

First we load the sample model data:

[4]:
dataurl = 'http://35.188.34.63:8080/thredds/dodsC/OM4p5/'

ds = xr.open_dataset(f'{dataurl}/ocean_monthly_z.200301-200712.nc4',
                     chunks={'time':1, 'z_l': 1}, engine='pydap')

Then we download data from Calcofi:

[5]:
!curl -O https://cappuccino.ucsd.edu/downloads/2012/20-1202NH_CTDFinalQC.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 36.9M  100 36.9M    0     0  3933k      0  0:00:09  0:00:09 --:--:-- 5815k  0:00:03  0:00:19 1672k07  0:00:05 4261k

We need to play a bit with the zip file to read the data into a pandas dataframe:

[6]:
with zipfile.ZipFile('20-1202NH_CTDFinalQC.zip', 'r') as archive:
    data = archive.extract('db_csvs/20-1202NH_CTDBTL_001-075D.csv')
    df = pandas.read_csv('db_csvs/20-1202NH_CTDBTL_001-075D.csv')
[7]:
df
[7]:
Project Study Ord_Occ Event_Num Cast_ID Date_Time_UTC Date_Time_PST Lat_Dec Lon_Dec Sta_ID ... SaltB OxB OxBuM Chl-a Phaeo NO3 NO2 NH4 PO4 SIL
0 CalCOFI 1202NH 1 7.0 1202_001d 27-Jan-2012 23:57:08 27-Jan-2012 15:57:08 32.95690 -117.30751 093.3 026.7 ... 33.3669 6.546 285.89 1.826 0.239 0.0 0.00 0.00 0.24 0.5
1 CalCOFI 1202NH 1 7.0 1202_001d 27-Jan-2012 23:57:43 27-Jan-2012 15:57:43 32.95685 -117.30740 093.3 026.7 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 CalCOFI 1202NH 1 7.0 1202_001d 27-Jan-2012 23:57:44 27-Jan-2012 15:57:44 32.95684 -117.30740 093.3 026.7 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 CalCOFI 1202NH 1 7.0 1202_001d 27-Jan-2012 23:57:45 27-Jan-2012 15:57:45 32.95684 -117.30740 093.3 026.7 ... 33.3678 6.589 287.79 1.581 0.300 0.0 0.00 0.00 0.24 0.5
4 CalCOFI 1202NH 1 7.0 1202_001d 27-Jan-2012 23:57:46 27-Jan-2012 15:57:46 32.95684 -117.30738 093.3 026.7 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
33408 CalCOFI 1202NH 75 882.0 1202_075d 12-Feb-2012 03:57:53 11-Feb-2012 19:57:53 35.08678 -120.77914 076.7 049.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
33409 CalCOFI 1202NH 75 882.0 1202_075d 12-Feb-2012 03:58:00 11-Feb-2012 19:58:00 35.08677 -120.77916 076.7 049.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
33410 CalCOFI 1202NH 75 882.0 1202_075d 12-Feb-2012 03:58:07 11-Feb-2012 19:58:07 35.08676 -120.77920 076.7 049.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
33411 CalCOFI 1202NH 75 882.0 1202_075d 12-Feb-2012 03:58:23 11-Feb-2012 19:58:23 35.08674 -120.77926 076.7 049.0 ... 33.6131 3.468 151.27 0.405 1.092 18.2 0.28 0.52 1.64 21.7
33412 CalCOFI 1202NH 75 882.0 1202_075d 12-Feb-2012 03:58:28 11-Feb-2012 19:58:28 35.08675 -120.77926 076.7 049.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

33413 rows × 82 columns

Let’s define a function to plot the longitude/latitude of the stations:

[8]:
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
import cartopy.feature as cfeature
import cartopy.crs as ccrs

def plot_stations(lon, lat):
    fig = plt.figure(figsize=[16,10])
    ax = plt.axes(projection=ccrs.PlateCarree())
    ax.add_feature(cfeature.COASTLINE)
    ax.add_feature(cfeature.LAND)
    ax.set_extent([-126, -115, 29, 36])

    # parallels/meridiens
    gl = ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True,
                      linewidth=2, color='gray', alpha=0.5, linestyle='--')
    gl.xlabels_top = False
    gl.ylabels_right = False
    gl.xlabel_style = {'size': 18, 'color': 'black'}
    gl.ylabel_style = {'size': 18, 'color': 'black'}

    plt.plot(lon, lat, 'ko')
    return fig
[9]:
fig = plot_stations(df['Lon_Dec'], df['Lat_Dec'])
../_images/notebooks_Comparing_to_section_data_14_0.png

Calcofi is an array with multiple Lines, we can extract lon/lat on a single line using the surface values:

[10]:
df_linesurf = df[['Lon_Dec', 'Lat_Dec']].where(df['Line'] == 93.3).where(df['Depth'] <= 2.0).drop_duplicates().dropna()
[11]:
df_linesurf
[11]:
Lon_Dec Lat_Dec
0 -117.30751 32.95690
59 -117.28240 32.95315
60 -117.28262 32.95292
89 -117.39411 32.91489
608 -117.53097 32.84862
1123 -117.86989 32.68423
1639 -118.21614 32.51461
2156 -118.55562 32.34692
2672 -118.89647 32.18020
3190 -119.23289 32.01236
3705 -119.57253 31.84655
4221 -120.24702 31.51537
4736 -120.91814 31.17939
4737 -120.91836 31.17935
5253 -121.58557 30.84750
5770 -122.25949 30.51269
5771 -122.25971 30.51267
6286 -122.91835 30.18028
6801 -123.58863 29.84797
[12]:
fig = plot_stations(df_linesurf['Lon_Dec'], df_linesurf['Lat_Dec'])
../_images/notebooks_Comparing_to_section_data_18_0.png

We can now use these locations to create a xesmf Regrid object and interpolate the model’s data onto these stations. This uses the LocStream capabilities of ESMF. More details can be found in the xESMF documentation on LocStream. The first part is to create a xarray dataset with the stations:

[13]:
ds_locs = xr.Dataset()
ds_locs['lon'] = xr.DataArray(data=df_linesurf['Lon_Dec'], dims=('stations'))
ds_locs['lat'] = xr.DataArray(data=df_linesurf['Lat_Dec'], dims=('stations'))
[14]:
ds_locs
[14]:
<xarray.Dataset>
Dimensions:   (stations: 19)
Coordinates:
  * stations  (stations) int64 0 59 60 89 608 1123 ... 5253 5770 5771 6286 6801
Data variables:
    lon       (stations) float64 -117.3 -117.3 -117.3 ... -122.3 -122.9 -123.6
    lat       (stations) float64 32.96 32.95 32.95 32.91 ... 30.51 30.18 29.85

We can now create the regridding weights and apply to a variable (for example salinity):

[15]:
regridder = xesmf.Regridder(ds.rename({'geolon': 'lon', 'geolat': 'lat'}), ds_locs, 'bilinear', locstream_out=True)
Create weight file: bilinear_576x720_1x19.nc
[16]:
salt_calcofi_line = regridder(ds['so'])

Let’s use the first time record to make a test plot and compare to observations:

[17]:
salt_0 = salt_calcofi_line.sel(time='2003-01-16')
_ = salt_0.load()

We extract and plot the data on the Calcofi Line:

[18]:
df_line = df.where(df['Line'] == 93.3)

plt.figure(figsize=[9, 6])
plt.scatter(df_line['Lon_Dec'], - df_line['Depth'], c=df_line['Salt1'],
            vmin=33, vmax=34.2, cmap='nipy_spectral')
plt.colorbar()
_ = plt.axis([-125, -117, -600, 0])
_ = plt.title('Observations')
../_images/notebooks_Comparing_to_section_data_28_0.png

And now the data interpolated from the model:

[19]:
plt.figure(figsize=[9, 6])
plt.contourf(salt_0['lon'], - salt_0['z_l'], salt_0.values.squeeze(), 30,
            vmin=33, vmax=34.2, cmap='nipy_spectral')
plt.colorbar()
_ = plt.axis([-125, -117, -600, 0])
_ = plt.title('Model')
../_images/notebooks_Comparing_to_section_data_30_0.png

and that’s it!