This week, I needed a daily-average surface flux subset of MERRA data. Locally, we have the 1-hourly data on a mass store system, but that is primarily an archive and not well suited to routine analysis. The source data files are 269 MB each, and I needed one for each day from July 1987 through December 2007, which would have been a 2 TB request. So, I used the Data Subsetter at the MDISC to create data files specifically for the comparison.
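The size estimate is easy to check. Here is a minimal Python sketch, assuming the range runs from 1 July 1987 through 31 December 2007 inclusive and that every daily file is 269 MB (the variable names are mine, for illustration only):

```python
from datetime import date

# Assumed inclusive date range: 1 July 1987 through 31 December 2007.
start = date(1987, 7, 1)
end = date(2007, 12, 31)
n_days = (end - start).days + 1  # +1 so both endpoints are counted

file_size_mb = 269  # size of one source file in MB, as quoted above
total_mb = n_days * file_size_mb
total_tb = total_mb / 1_000_000  # decimal units: 1 TB = 1,000,000 MB

print(n_days, "files,", round(total_tb, 2), "TB")  # 7489 files, 2.01 TB
```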
With the subsetter, I selected the 4 variables needed for the experiment and the time range (Jul 1987-Dec 2007, or 7489 days). The region could have been trimmed, but I left it at the default (global). Daily averages, not the 1-hourly averages, were preferable, so the daily mean box was checked. HDF was suitable, so the format was left at the default (the alternative is NetCDF, and more formats may be added later). The subsetter provided a text file with the HTTP links to the reduced data request. The links activate a program that does the subsetting and streams the requested data back. This text file is used as input to a Linux call to wget, which does the work of opening the HTTP links.
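For anyone without wget, the same download step can be sketched in Python. This is an illustration only, not what I ran. The function names and the numbered output file names are hypothetical, and it assumes the list file holds one URL per line:

```python
import os
import urllib.request

def read_url_list(path):
    """Return the non-empty, non-comment lines of the URL list file."""
    with open(path) as f:
        lines = [line.strip() for line in f]
    return [u for u in lines if u and not u.startswith("#")]

def fetch_all(urls, out_dir):
    """Request each URL in turn and save the response under out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for i, url in enumerate(urls):
        # The links may not end in a usable file name, so use a simple
        # numbered name instead (a hypothetical naming scheme).
        target = os.path.join(out_dir, f"subset_{i:05d}.hdf")
        if os.path.exists(target):
            continue  # finished earlier; lets an interrupted run resume
        partial = target + ".part"
        urllib.request.urlretrieve(url, partial)
        os.replace(partial, target)  # rename only after a complete download
```

With the list from the subsetter saved as, say, subset_urls.txt, calling fetch_all(read_url_list("subset_urls.txt"), "merra_daily") would work through the links one at a time, much as wget does when given the file as input.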
The download took only about 16 hours (mostly overnight), and the result occupied only 23 GB of disk space. The daily mean check box also saved me the time of computing those daily means myself. Lastly, I used the Mirador search to find and download one of the unaltered source data files, just to verify the variables and the daily averaging. There was no difference between the subsetter's daily averages and daily averages computed manually.
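The check itself is simple: average the 24 hourly fields for one day and compare the result against the subsetter's daily mean. Here is a rough sketch, with small made-up lists standing in for the real gridded fields (reading the HDF files is left out, and the function names and the tolerance are mine):

```python
def daily_mean(hourly_fields):
    """Average a list of hourly fields (each a flat list of grid values)."""
    n = len(hourly_fields)
    return [sum(values) / n for values in zip(*hourly_fields)]

def max_abs_diff(a, b):
    """Largest absolute difference between two fields of equal length."""
    return max(abs(x - y) for x, y in zip(a, b))

# Made-up stand-in data: 24 hourly fields on a 3-point "grid".
hourly = [[float(h), 2.0 * h, 10.0] for h in range(24)]
manual = daily_mean(hourly)

# Made-up stand-in for the daily mean read from the subsetter output.
subset = [11.5, 23.0, 10.0]

assert max_abs_diff(manual, subset) < 1e-6  # illustrative tolerance
```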
The subsetter has evolved a lot over this past year, and additional functionality is planned. However, in its present form, it should be a very useful tool for accessing MERRA data.