1. Overview

Who Is This Tutorial For?

Many people who use OPeNDAP services have a workflow that relies on the data being held in local netcdf files. Many of these users come to an OPeNDAP service endpoint for a particular data granule and ask the server to "rewrite" the entire granule as a netcdf-3 or netcdf-4 file. If you are one of these people, then this tutorial is for you!

Tutorial Examples Language

The examples are written as bash scripts.

Why nccopy and ncdump?

The nccopy and ncdump applications are command line tools that can be used in a shell scripting environment to:

  • Build a custom netcdf file locally from an existing local netcdf file.

  • Build a custom netcdf file locally from a remote DAP2 or DAP4 data service.

  • Fine tune the compression and chunking parameters in the resulting file.

When nccopy and ncdump access remote data via the DAP2 or DAP4 protocol, the server is able to stream the response back, so data transmission begins almost immediately. In contrast, when a user asks the server to return a response encoded as a netcdf-3 or netcdf-4 file, the server cannot stream the response because of the way a netcdf-4 file is structured: bytes at the beginning of the file are modified when the file is closed after creation. This means that there can be a lengthy delay before the server can begin transmitting the netcdf-4 result, and the delay can be long enough that the request fails with a timeout. When nccopy is used the delay is eliminated, and the result is still a netcdf-4 file on local disk.

In this tutorial we will work with the retrieval of remote data from a DAP4 data service.

This tutorial assumes that the reader has a basic grasp of bash shell programming.

The NASA examples in this document require that the user configure their client applications to authenticate with the NASA EDL OAuth2 service. Since the authentication setup is complex, yet similar for many clients, we have covered it in a separate document.

Note
Both the nccopy and the ncdump applications discussed in this tutorial need to be configured to authenticate with EDL in order to run some of these tutorial examples.

1.1. nccopy details

The nccopy application is a linux command line program that is typically installed with the netcdf-c library, which can usually be acquired from a package manager such as brew, yum, dnf, or apt-get. It allows the user to rewrite netcdf files on the local disk and, using the DAP2 and DAP4 protocols, to rewrite remote datasets that have a DAP access URL and store the results on local disk. It provides options for choosing the output file format, customizing the internal data compression and chunk sizes, and selecting data and/or definitions so that not everything is brought from the source dataset.

In this tutorial we will focus on the remote data access aspects of nccopy. We leave the nuances of the compression and chunking controls to you, the capable user.

Here is nccopy’s usage output:

nccopy: nccopy [-k kind] [-[3|4|6|7]] [-d n] [-s] [-c chunkspec] [-u] [-w] [-[v|V] varlist] [-[g|G] grplist] [-m n] [-h n] [-e n] [-r] [-F filterspec] [-Ln] [-Mn] infile outfile
[-k kind] specify kind of netCDF format for output file, default same as input
kind strings: 'classic', '64-bit offset', 'cdf5',
'netCDF-4', 'netCDF-4 classic model'
[-3]      netCDF classic output (same as -k 'classic')
[-6]      64-bit-offset output (same as -k '64-bit offset')
[-4]      netCDF-4 output (same as -k 'netCDF-4')
[-7]      netCDF-4-classic output (same as -k 'netCDF-4 classic model')
[-5]      CDF5 output (same as -k 'cdf5)
[-d n]    set output deflation compression level, default same as input (0=none 9=max)
[-s]      add shuffle option to deflation compression
[-c chunkspec] specify chunking for variable and dimensions, e.g. "var:N1,N2,..." or "dim1/N1,dim2/N2,..."
[-u]      convert unlimited dimensions to fixed-size dimensions in output copy
[-w]      write whole output file from diskless netCDF on close
[-v var1,...] include data for only listed variables, but definitions for all variables
[-V var1,...] include definitions and data for only listed variables
[-g grp1,...] include data for only variables in listed groups, but all definitions
[-G grp1,...] include definitions and data only for variables in listed groups
[-m n]    set size in bytes of copy buffer, default is 5000000 bytes
[-h n]    set size in bytes of chunk_cache for chunked variables
[-e n]    set number of elements that chunk_cache can hold
[-r]      read whole input file into diskless file on open (classic or 64-bit offset or cdf5 formats only)
[-F filterspec] specify a compression algorithm to apply to an output variable (may be repeated).
[-Ln]     set log level to n (>= 0); ignored if logging isn't enabled.
[-Mn]     set minimum chunk size to n bytes (n >= 0)
infile    name of netCDF input file
outfile   name for netCDF output file

netCDF library version 4.9.0 of Oct  2 2022 23:17:14 $
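
As one hedged illustration of the compression options listed in the usage output above, the sketch below wraps nccopy with the -d (deflate level) and -s (shuffle) flags. The function name copy_compressed, the NCCOPY override variable, the output file name, and the default level of 5 are assumptions made for this example; they are not part of nccopy.

```bash
#!/bin/bash
# Sketch only: copy_compressed and NCCOPY are names invented for this example.
# Set NCCOPY=echo to preview the command line without contacting a server.
NCCOPY="${NCCOPY:-nccopy}"

# Usage: copy_compressed <input url or file> <output file> [deflate level 0-9]
copy_compressed() {
    local src="$1" out="$2" level="${3:-5}"
    # -4 selects netCDF-4 output, -d sets the deflate level, -s adds shuffle.
    "${NCCOPY}" -4 -d "${level}" -s "${src}" "${out}"
}

# Example (requires network access):
# copy_compressed "dap4://test.opendap.org/opendap/data/nc/coads_climatology.nc" coads_deflated.nc4 5
```

Running with NCCOPY=echo prints the arguments that would be passed to nccopy, which is a cheap way to check a script before any data is transferred.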

1.1.1. Environment

You will need to have installed the netcdf-c library in order to complete this tutorial. Because the tutorial examples are primarily DAP4, the netcdf-c library version 4.9.0 or newer is recommended. This tutorial was developed using netcdf library version 4.9.0 of Feb 13 2023 10:14:14 $

With the netcdf-c library installed on your system, you should be able to run both ncdump and nccopy from any terminal window.
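
A quick way to confirm that both tools are reachable before running the examples might look like the sketch below. The helper name require_tools is hypothetical; it only checks that each named command is on the PATH and does not verify the library version.

```bash
#!/bin/bash
# Sketch only: require_tools is a helper name invented for this example.

# Usage: require_tools <command>...
# Returns 0 when every named command is on the PATH, 1 otherwise.
require_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "${tool}" >/dev/null 2>&1; then
            echo "missing required tool: ${tool}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

# Example:
# require_tools nccopy ncdump || exit 1
```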

1.2. The Data

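In these examples we will use the following remote datasets:

  • The COADS Climatology granule hosted at test.opendap.org (no authentication required).

  • A precipitation granule hosted at earthdata.nasa.gov (NASA EDL authentication required).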

Both of the example granules are DAP datasets. The precipitation dataset contains one or more Groups, so we need to tell the nccopy software to use the DAP4 protocol to access the data. We do this by changing the http:// or https:// at the beginning of the dataset URL to dap4://.

This dataset URL:

http://test.opendap.org/opendap/data/nc/coads_climatology.nc

Becomes this DAP4 URL:

dap4://test.opendap.org/opendap/data/nc/coads_climatology.nc
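
The scheme change described above can be scripted. The following is a minimal sketch, assuming the dataset URL starts with http://, https://, or dap4://; the function name to_dap4_url is invented for this example.

```bash
#!/bin/bash
# Sketch only: to_dap4_url is a helper name invented for this example.

# Usage: to_dap4_url <dataset url>
# Prints the URL with an http:// or https:// scheme replaced by dap4://.
to_dap4_url() {
    case "$1" in
        https://*) printf 'dap4://%s\n' "${1#https://}" ;;
        http://*)  printf 'dap4://%s\n' "${1#http://}" ;;
        dap4://*)  printf '%s\n' "$1" ;;
        *) echo "unsupported URL scheme: $1" >&2; return 1 ;;
    esac
}

# Example:
# d4_url="$(to_dap4_url "http://test.opendap.org/opendap/data/nc/coads_climatology.nc")"
```

Only the scheme is rewritten; the rest of the URL, including any percent-encoded characters, passes through untouched.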

2. Examples

Since nccopy is a linux command line tool, I have written the examples in the bash shell.

2.1. Full Granule Rewrite

This is the simplest usage of nccopy, in which we retrieve all the data from remote DAP4 serviced granules, one of which contains a Group hierarchy.

2.1.1. COADS Climatology (No Authentication)

#!/bin/bash
#
# The dataset URLs should dereference to a DAP service endpoint. In the case of
# these servers the DAP2 and DAP4 endpoints are the same.
#
# When we invoke nccopy we use the "-4" option to tell nccopy to make a
# netcdf-4 file. This is important because netcdf-3 (classic) does not support
# Groups and we want to preserve them.
#

# This is the URL of the COADS Climatology data granule hosted at test.opendap.org,
# no authentication required for data access

test_d4_url="dap4://test.opendap.org/opendap/data/nc/coads_climatology.nc"

#
# Get the entire COADS data granule, using the DAP4 protocol, and save it
# to test_coads.nc4

nccopy -4 "${test_d4_url}" test_coads.nc4

# fini

2.1.2. NGAP Precipitation Data (EDL Authentication)

#!/bin/bash
#
# The dataset URLs should dereference to a DAP service endpoint. In the case of
# these servers the DAP2 and DAP4 endpoints are the same.
#
# When we invoke nccopy we use the "-4" option to tell nccopy to make a
# netcdf-4 file. This is important because netcdf-3 (classic) does not support
# Groups and we want to preserve them.

#
# This is the precipitation granule hosted at earthdata.nasa.gov,
# NASA EDL authentication mandatory.

ngap_d4_url="dap4://opendap.uat.earthdata.nasa.gov/collections/C1225808238-GES_DISC/granules/GPM_3IMERGHH.06%3A3B-HHR.MS.MRG.3IMERG.20200101-S000000-E002959.0000.V06B.HDF5"

#
# Get the entire precipitation granule using the DAP4 protocol, and save it
# to ngap_precip.nc4

nccopy -4 "${ngap_d4_url}" ngap_precip.nc4

# fini
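
Because a remote transfer can fail partway through, a script that fetches many granules may want to check the result of each copy. The sketch below is one way to do that, under the assumption that a non-zero exit status or an empty output file means the copy failed. The names fetch_granule and NCCOPY are invented for this example.

```bash
#!/bin/bash
# Sketch only: fetch_granule and NCCOPY are names invented for this example.
NCCOPY="${NCCOPY:-nccopy}"

# Usage: fetch_granule <dap4 url> <output file>
# Copies the whole granule to a netcdf-4 file and reports failures.
fetch_granule() {
    local url="$1" out="$2"
    if ! "${NCCOPY}" -4 "${url}" "${out}"; then
        echo "nccopy failed for ${url}" >&2
        rm -f "${out}"        # do not leave a partial file behind
        return 1
    fi
    if [ ! -s "${out}" ]; then
        echo "nccopy produced no output for ${url}" >&2
        return 1
    fi
    echo "wrote ${out}"
}

# Example (requires network access):
# fetch_granule "dap4://test.opendap.org/opendap/data/nc/coads_climatology.nc" test_coads.nc4
```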

2.2. Remote Inventory Inspection

How do we see an inventory of the variables in a remote dataset?

In preparation for performing an inventory sub-setting example we need to inspect the remote dataset to see what variables it may contain. The nccopy command comes bundled with a sibling command, ncdump. The ncdump command allows you to inspect the contents of the remote dataset, and to make a complete Common Data Language (CDL) version, including data values, of the remote dataset.

For sub-setting we are more interested in the inspection aspect of ncdump.

2.2.1. Using ncdump to view remote Dataset inventory

The usage is:

ncdump -h dap4_url

For the COADS dataset on test.opendap.org:

#!/bin/bash
#

# This is the URL of the COADS Climatology data granule hosted at test.opendap.org, no
# authentication required for data access
test_d4_url="dap4://test.opendap.org/opendap/data/nc/coads_climatology.nc"

ncdump -h "${test_d4_url}"

Which returns:

netcdf coads_climatology {
dimensions:
	COADSX = 180 ;
	COADSY = 90 ;
	TIME = 12 ;
variables:
	double COADSX(COADSX) ;
		string COADSX:units = "degrees_east" ;
		string COADSX:modulo = " " ;
		string COADSX:point_spacing = "even" ;
	double COADSY(COADSY) ;
		string COADSY:units = "degrees_north" ;
		string COADSY:point_spacing = "even" ;
	double TIME(TIME) ;
		string TIME:units = "hour since 0000-01-01 00:00:00" ;
		string TIME:time_origin = "1-JAN-0000 00:00:00" ;
		string TIME:modulo = " " ;
	float SST(TIME, COADSY, COADSX) ;
		SST:missing_value = -1.e+34f ;
		SST:_FillValue = -1.e+34f ;
		string SST:long_name = "SEA SURFACE TEMPERATURE" ;
		string SST:history = "From coads_climatology" ;
		string SST:units = "Deg C" ;
		string SST:_edu.ucar.maps = "/TIME", "/COADSY", "/COADSX" ;
	float AIRT(TIME, COADSY, COADSX) ;
		AIRT:missing_value = -1.e+34f ;
		AIRT:_FillValue = -1.e+34f ;
		string AIRT:long_name = "AIR TEMPERATURE" ;
		string AIRT:history = "From coads_climatology" ;
		string AIRT:units = "DEG C" ;
		string AIRT:_edu.ucar.maps = "/TIME", "/COADSY", "/COADSX" ;
	float UWND(TIME, COADSY, COADSX) ;
		UWND:missing_value = -1.e+34f ;
		UWND:_FillValue = -1.e+34f ;
		string UWND:long_name = "ZONAL WIND" ;
		string UWND:history = "From coads_climatology" ;
		string UWND:units = "M/S" ;
		string UWND:_edu.ucar.maps = "/TIME", "/COADSY", "/COADSX" ;
	float VWND(TIME, COADSY, COADSX) ;
		VWND:missing_value = -1.e+34f ;
		VWND:_FillValue = -1.e+34f ;
		string VWND:long_name = "MERIDIONAL WIND" ;
		string VWND:history = "From coads_climatology" ;
		string VWND:units = "M/S" ;
		string VWND:_edu.ucar.maps = "/TIME", "/COADSY", "/COADSX" ;
}

For the precipitation dataset at earthdata.nasa.gov:

#!/bin/bash
#

# This is the precipitation granule hosted at earthdata.nasa.gov,
# NASA EDL authentication mandatory.
ngap_d4_url="dap4://opendap.uat.earthdata.nasa.gov/collections/C1225808238-GES_DISC/granules/GPM_3IMERGHH.06%3A3B-HHR.MS.MRG.3IMERG.20200101-S000000-E002959.0000.V06B.HDF5"

ncdump -h "${ngap_d4_url}"

Which returns the following CDL:

netcdf GPM_3IMERGHH.06%3A3B-HHR.MS.MRG.3IMERG.20200101-S000000-E002959.0000.V06B {

// global attributes:
		string :FileHeader = "DOI=10.5067/GPM/IMERG/3B-HH/06;\nDOIauthority=http://dx.doi.org/;\nDOIshortName=3IMERGHH;\nAlgorithmID=3IMERGHH;\nAlgorithmVersion=3IMERGH_6.3;\nFileName=3B-HHR.MS.MRG.3IMERG.20200101-S000000-E002959.0000.V06B.HDF5;\nSatelliteName=MULTI;\nInstrumentName=MERGED;\nGenerationDateTime=2020-05-04T06:20:10.000Z;\nStartGranuleDateTime=2020-01-01T00:00:00.000Z;\nStopGranuleDateTime=2020-01-01T00:29:59.999Z;\nGranuleNumber=;\nNumberOfSwaths=0;\nNumberOfGrids=1;\nGranuleStart=;\nTimeInterval=HALF_HOUR;\nProcessingSystem=PPS;\nProductVersion=V06B;\nEmptyGranule=NOT_EMPTY;\nMissingData=;\n" ;
		string :FileInfo = "DataFormatVersion=6a;\nTKCodeBuildVersion=0;\nMetadataVersion=6a;\nFormatPackage=HDF5-1.8.9;\nBlueprintFilename=GPM.V6.3IMERGHH.blueprint.xml;\nBlueprintVersion=BV_62;\nTKIOVersion=3.93;\nMetadataStyle=PVL;\nEndianType=LITTLE_ENDIAN;\n" ;

group: Grid {
  dimensions:
  	time = 1 ;
  	lon = 3600 ;
  	lat = 1800 ;
  	latv = 2 ;
  	lonv = 2 ;
  	nv = 2 ;
  variables:
  	float precipitationQualityIndex(time, lon, lat) ;
  		string precipitationQualityIndex:DimensionNames = "time,lon,lat" ;
  		string precipitationQualityIndex:coordinates = "time lon lat" ;
  		precipitationQualityIndex:_FillValue = -9999.904f ;
  		string precipitationQualityIndex:CodeMissingValue = "-9999.9" ;
  	short IRkalmanFilterWeight(time, lon, lat) ;
  		string IRkalmanFilterWeight:DimensionNames = "time,lon,lat" ;
  		string IRkalmanFilterWeight:coordinates = "time lon lat" ;
  		IRkalmanFilterWeight:_FillValue = -9999s ;
  		string IRkalmanFilterWeight:CodeMissingValue = "-9999" ;
  	short HQprecipSource(time, lon, lat) ;
  		string HQprecipSource:DimensionNames = "time,lon,lat" ;
  		string HQprecipSource:coordinates = "time lon lat" ;
  		HQprecipSource:_FillValue = -9999s ;
  		string HQprecipSource:CodeMissingValue = "-9999" ;
  	float lon(lon) ;
  		string lon:DimensionNames = "lon" ;
  		string lon:Units = "degrees_east" ;
  		string lon:units = "degrees_east" ;
  		string lon:standard_name = "longitude" ;
  		string lon:LongName = "Longitude at the center of\n\t\t\t0.10 degree grid intervals of longitude \n\t\t\tfrom -180 to 180." ;
  		string lon:bounds = "lon_bnds" ;
  		string lon:axis = "X" ;
  	float precipitationCal(time, lon, lat) ;
  		string precipitationCal:DimensionNames = "time,lon,lat" ;
  		string precipitationCal:Units = "mm/hr" ;
  		string precipitationCal:units = "mm/hr" ;
  		string precipitationCal:coordinates = "time lon lat" ;
  		precipitationCal:_FillValue = -9999.904f ;
  		string precipitationCal:CodeMissingValue = "-9999.9" ;
  	int time(time) ;
  		string time:DimensionNames = "time" ;
  		string time:Units = "seconds since 1970-01-01 00:00:00 UTC" ;
  		string time:units = "seconds since 1970-01-01 00:00:00 UTC" ;
  		string time:standard_name = "time" ;
  		string time:LongName = "Representative time of data in \n\t\t\tseconds since 1970-01-01 00:00:00 UTC." ;
  		string time:bounds = "time_bnds" ;
  		string time:axis = "T" ;
  		string time:calendar = "julian" ;
  	float lat_bnds(lat, latv) ;
  		string lat_bnds:DimensionNames = "lat,latv" ;
  		string lat_bnds:Units = "degrees_north" ;
  		string lat_bnds:units = "degrees_north" ;
  		string lat_bnds:coordinates = "lat latv" ;
  	float precipitationUncal(time, lon, lat) ;
  		string precipitationUncal:DimensionNames = "time,lon,lat" ;
  		string precipitationUncal:Units = "mm/hr" ;
  		string precipitationUncal:units = "mm/hr" ;
  		string precipitationUncal:coordinates = "time lon lat" ;
  		precipitationUncal:_FillValue = -9999.904f ;
  		string precipitationUncal:CodeMissingValue = "-9999.9" ;
  	float lat(lat) ;
  		string lat:DimensionNames = "lat" ;
  		string lat:Units = "degrees_north" ;
  		string lat:units = "degrees_north" ;
  		string lat:standard_name = "latitude" ;
  		string lat:LongName = "Latitude at the center of\n\t\t\t0.10 degree grid intervals of latitude\n\t\t\tfrom -90 to 90." ;
  		string lat:bounds = "lat_bnds" ;
  		string lat:axis = "Y" ;
  	float HQprecipitation(time, lon, lat) ;
  		string HQprecipitation:DimensionNames = "time,lon,lat" ;
  		string HQprecipitation:Units = "mm/hr" ;
  		string HQprecipitation:units = "mm/hr" ;
  		string HQprecipitation:coordinates = "time lon lat" ;
  		HQprecipitation:_FillValue = -9999.904f ;
  		string HQprecipitation:CodeMissingValue = "-9999.9" ;
  	short probabilityLiquidPrecipitation(time, lon, lat) ;
  		string probabilityLiquidPrecipitation:DimensionNames = "time,lon,lat" ;
  		string probabilityLiquidPrecipitation:Units = "percent" ;
  		string probabilityLiquidPrecipitation:units = "percent" ;
  		string probabilityLiquidPrecipitation:coordinates = "time lon lat" ;
  		probabilityLiquidPrecipitation:_FillValue = -9999s ;
  		string probabilityLiquidPrecipitation:CodeMissingValue = "-9999" ;
  	short HQobservationTime(time, lon, lat) ;
  		string HQobservationTime:DimensionNames = "time,lon,lat" ;
  		string HQobservationTime:Units = "minutes" ;
  		string HQobservationTime:units = "minutes" ;
  		string HQobservationTime:coordinates = "time lon lat" ;
  		HQobservationTime:_FillValue = -9999s ;
  		string HQobservationTime:CodeMissingValue = "-9999" ;
  	float randomError(time, lon, lat) ;
  		string randomError:DimensionNames = "time,lon,lat" ;
  		string randomError:Units = "mm/hr" ;
  		string randomError:units = "mm/hr" ;
  		string randomError:coordinates = "time lon lat" ;
  		randomError:_FillValue = -9999.904f ;
  		string randomError:CodeMissingValue = "-9999.9" ;
  	int time_bnds(time, nv) ;
  		string time_bnds:DimensionNames = "time,nv" ;
  		string time_bnds:Units = "seconds since 1970-01-01 00:00:00 UTC" ;
  		string time_bnds:units = "seconds since 1970-01-01 00:00:00 UTC" ;
  		string time_bnds:coordinates = "time nv" ;
  	float IRprecipitation(time, lon, lat) ;
  		string IRprecipitation:DimensionNames = "time,lon,lat" ;
  		string IRprecipitation:Units = "mm/hr" ;
  		string IRprecipitation:units = "mm/hr" ;
  		string IRprecipitation:coordinates = "time lon lat" ;
  		IRprecipitation:_FillValue = -9999.904f ;
  		string IRprecipitation:CodeMissingValue = "-9999.9" ;
  	float lon_bnds(lon, lonv) ;
  		string lon_bnds:DimensionNames = "lon,lonv" ;
  		string lon_bnds:Units = "degrees_east" ;
  		string lon_bnds:units = "degrees_east" ;
  		string lon_bnds:coordinates = "lon lonv" ;

  // group attributes:
  		string :GridHeader = "BinMethod=ARITHMETIC_MEAN;\nRegistration=CENTER;\nLatitudeResolution=0.1;\nLongitudeResolution=0.1;\nNorthBoundingCoordinate=90;\nSouthBoundingCoordinate=-90;\nEastBoundingCoordinate=180;\nWestBoundingCoordinate=-180;\nOrigin=SOUTHWEST;\n" ;
  } // group Grid
}
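
When a dataset has many variables, it can help to reduce the CDL header to a plain list of names. The sketch below is a rough filter, not a CDL parser: it assumes the layout that ncdump -h produced above (one declaration per line, "group: NAME {" to open a Group and "} // group NAME" to close it) and it skips scalar variables declared without dimensions. The function name list_cdl_fqns is invented for this example.

```bash
#!/bin/bash
# Sketch only: list_cdl_fqns is a helper name invented for this example.

# Usage: ncdump -h <dap4 url> | list_cdl_fqns
# Reads a CDL header on stdin and prints one Fully Qualified Name per
# dimensioned variable, e.g. /TIME or /Grid/lon.
list_cdl_fqns() {
    awk '
        # Entering a Group: append its name to the current path.
        /^[ \t]*group:[ \t]/ { path = path "/" $2; next }
        # Leaving a Group: drop the last path component.
        $0 ~ "^[ \t]*[}][ \t]*//[ \t]*group" { sub("/[^/]*$", "", path); next }
        # Variable declaration: "<type> <name>(<dims>) ;"
        /^[ \t]*(char|byte|ubyte|short|ushort|int|uint|int64|uint64|float|double|string)[ \t]+[A-Za-z_][A-Za-z0-9_]*\(.*\)[ \t]*;[ \t]*$/ {
            name = $2
            sub(/\(.*/, "", name)
            print path "/" name
        }
    '
}

# Example (requires network access):
# ncdump -h "dap4://test.opendap.org/opendap/data/nc/coads_climatology.nc" | list_cdl_fqns
```

Attribute lines are ignored because the name in an attribute is followed by ":" rather than "(".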

2.3. Inventory Sub-setting

There are two ways to perform inventory sub-setting with nccopy: the nccopy way and the DAP4 way. The nccopy application has options that allow you to select one or more variables and/or Groups (and their children) so that the resulting local netcdf file created by nccopy contains only the desired data.

2.3.1. The nccopy way

Returning to our example datasets we’ll form an nccopy command in which we will utilize the -V option to request a subset of the variables held in each dataset to be saved to a local netcdf-4 file.

COADS Climatology Data (No Authentication)

In which we request the domain coordinates TIME, COADSX, and COADSY along with the range variable SST (sea surface temperature).

#!/bin/bash
#

# This is the URL of the COADS Climatology data granule hosted at test.opendap.org,
# authentication is not required for data access
test_d4_url="dap4://test.opendap.org/opendap/data/nc/coads_climatology.nc"

#
# !! DAP4 FQNs did not work for this dataset; I had to use the unadorned names.
# FQNs do work in the NGAP example (where there are Groups).
# !! I think that's a bug in nccopy !!

request_vars="TIME,COADSX,COADSY,SST"

#
# We use the "-4" option to tell nccopy to make a netcdf-4 file.
# We use the "-V" option to specify what to get.

nccopy -4 -V "${request_vars}" "${test_d4_url}" coads_subset_1.nc4

# fini

NGAP Precipitation Data (EDL Authentication)

In which we request the domain variables for time, latitude, and longitude, and the range variables precipitationCal and IRprecipitation. Because each of these variables is a member of the Group named "Grid", we must include the Group’s name in the Fully Qualified Name (FQN) of each item requested:

/Grid/time,/Grid/lat,/Grid/lon,/Grid/precipitationCal,/Grid/IRprecipitation

#!/bin/bash
#
# This is the precipitation granule hosted at earthdata.nasa.gov,
# NASA EDL authentication mandatory.

ngap_d4_url="dap4://opendap.uat.earthdata.nasa.gov/collections/C1225808238-GES_DISC/granules/GPM_3IMERGHH.06%3A3B-HHR.MS.MRG.3IMERG.20200101-S000000-E002959.0000.V06B.HDF5"

#
# Because this is a DAP4 transaction the name of each variable in the list of
# requested variables submitted to nccopy must be expressed as a Fully Qualified
# Name (FQN). And because each variable in this example is a member of the Group
# named "Grid", each requested variable's name is prefixed with "/Grid/" as below:

request_vars="/Grid/time,/Grid/lat,/Grid/lon,/Grid/precipitationCal,/Grid/IRprecipitation"

#
# We use the "-4" option to tell nccopy to make a netcdf-4 file. This is
# important because netcdf-3 does not support Groups
# We use the "-V" option to specify what to get.

nccopy -4 -V "${request_vars}" "${ngap_d4_url}" ngap_precip_subset_1.nc4

# fini
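
The comma-separated list given to -V can be assembled from plain variable names. The sketch below assumes all requested variables live in the same Group; passing an empty Group yields the unadorned names used in the COADS example. The function name make_varlist is invented for this example.

```bash
#!/bin/bash
# Sketch only: make_varlist is a helper name invented for this example.

# Usage: make_varlist <group path or ""> <variable name>...
# Prints a comma-separated list suitable for nccopy's -V option.
make_varlist() {
    local group="$1" name item list=""
    shift
    for name in "$@"; do
        if [ -n "${group}" ]; then
            item="${group}/${name}"
        else
            item="${name}"
        fi
        # Add a comma only between items, never before the first one.
        list="${list:+${list},}${item}"
    done
    printf '%s\n' "${list}"
}

# Example:
# request_vars="$(make_varlist /Grid time lat lon precipitationCal IRprecipitation)"
```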

2.3.2. The DAP4 Way

The DAP4 way means using a DAP4 constraint expression (d4_ce) to tell the server which things to get. The difference is subtle, and this example may seem redundant, but this technique can be used in other contexts.

COADS Climatology Data (No Authentication)

In which we request the domain coordinates TIME, COADSX, and COADSY along with the range variable SST (sea surface temperature):

/TIME;/COADSX;/COADSY;/SST

#!/bin/bash
#
# This is the URL of the COADS Climatology data granule hosted at test.opendap.org,
# authentication is not required for data access
d4_url="dap4://test.opendap.org/opendap/data/nc/coads_climatology.nc"

#
# The DAP4 constraint expression to use with the request. (Note: the FQNs are
# separated by ";" and not "," as in the argument to nccopy's "-V" option.)

d4_ce="dap4.ce=/TIME;/COADSX;/COADSY;/SST"

#
# We use the "-4" option to tell nccopy to make a netcdf-4 file.

nccopy -4 "${d4_url}?${d4_ce}" coads_subset_2.nc4

# fini

NGAP Precipitation Data (EDL Authentication)

In which we request the domain variables for time, latitude, and longitude, and the range variables precipitationCal and IRprecipitation. Because each of these variables is a member of the Group named "Grid", we must include the Group’s name in the Fully Qualified Name (FQN) of each item requested:

/Grid/time;/Grid/lat;/Grid/lon;/Grid/precipitationCal;/Grid/IRprecipitation

#!/bin/bash
#
# This is the precipitation granule hosted at earthdata.nasa.gov,
# NASA EDL authentication mandatory.

d4_url="dap4://opendap.uat.earthdata.nasa.gov/collections/C1225808238-GES_DISC/granules/GPM_3IMERGHH.06%3A3B-HHR.MS.MRG.3IMERG.20200101-S000000-E002959.0000.V06B.HDF5"

#
# The DAP4 constraint expression to use with the request. (Note: the FQNs are
# separated by ";" and not "," as in the argument to nccopy's "-V" option.)

d4_ce="dap4.ce=/Grid/time;/Grid/lat;/Grid/lon;/Grid/precipitationCal;/Grid/IRprecipitation"

#
# And we apply the dap4 constraint to the URL we submit to nccopy and the
# subsetting just happens :)

nccopy -4 "${d4_url}?${d4_ce}" ngap_precip_subset_2.nc4

# fini
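
The constrained URL can also be composed from a list of FQNs rather than written by hand. The following is a minimal sketch, assuming the dataset URL does not already carry a query string; the function names make_d4_ce and constrained_url are invented for this example.

```bash
#!/bin/bash
# Sketch only: make_d4_ce and constrained_url are names invented for this example.

# Usage: make_d4_ce <fqn>...
# Prints a DAP4 constraint expression with the FQNs joined by ";".
make_d4_ce() {
    local fqn ce=""
    for fqn in "$@"; do
        ce="${ce:+${ce};}${fqn}"
    done
    printf 'dap4.ce=%s\n' "${ce}"
}

# Usage: constrained_url <dap4 url> <constraint expression>
# Assumes the URL has no existing query string.
constrained_url() {
    printf '%s?%s\n' "$1" "$2"
}

# Example (requires network access):
# d4_url="dap4://test.opendap.org/opendap/data/nc/coads_climatology.nc"
# nccopy -4 "$(constrained_url "${d4_url}" "$(make_d4_ce /TIME /COADSX /COADSY /SST)")" coads_subset_2.nc4
```

Keeping the result quoted when it is passed to nccopy matters here, because an unquoted ";" would be treated by the shell as a command separator.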