Data Download Scripts

For information on how to work with download tokens, see How to Access LAADS Data.


If all you need is one file and you know which file it is, it is much easier to go to the LAADS archive and click to download as needed. If you need many files (e.g., all of last month's MOD09 data), you might prefer to rely on scripts. We provide samples for Shell Script, Perl, and Python.

Code Samples

Most current programming languages support HTTPS communication or can call on applications that do. See the sample scripts below; we provide support for wget, Linux shell script, Perl, and Python. When recursively downloading entire directories of files, wget typically requires the least setup.

To use these, click "Download source" to download, or copy and paste the code into a file with an extension reflecting the programming language (.sh for Shell Script, .pl for Perl, .py for Python). Be sure the Unix execute permission is set on the file (e.g., chmod +x). Lastly, open a terminal or shell and execute the file. Command-line examples are also included below.
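As an aside, the execute-permission step can itself be scripted. This is a minimal sketch in Python; the helper name make_executable is ours, not part of the LAADS samples:

```python
import os
import stat

def make_executable(path):
    """Equivalent of 'chmod +x': add the execute bits to an existing file."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

On Windows the Unix execute bit is not used; invoke the interpreter directly instead (e.g., python download.py).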


wget

wget is an open-source utility that can download entire directories of files with just one command. The only path required is the root directory; wget automatically traverses the directory tree and downloads any files it locates.

wget is free and available for Linux, macOS, and Windows.


  1. Linux
    1. Launch a command-line terminal
    2. Type sudo yum install wget -y (or use your distribution's package manager, e.g., sudo apt-get install wget)
  2. macOS
    1. Install Homebrew (admin privileges required)
    2. Launch Applications > Utilities > Terminal
    3. Type brew install wget
  3. Windows
    1. Download the latest 32-bit or 64-bit binary (.exe) for your system.
    2. Move it to C:\Windows\System32 (admin privileges will be required).
    3. Click the Windows Menu > Run.
    4. Type cmd and hit Enter.
    5. In the Windows Command Prompt, type wget -h to verify the binary is being executed successfully.
    6. If any errors appear, the wget.exe binary may not be located in the correct directory, or you may need to switch between the 32-bit and 64-bit builds.

Command-Line/Terminal Usage:

wget -e robots=off -m -np -R .html,.tmp -nH --cut-dirs=3 "PATH_TO_DATA_DIRECTORY" --header "Authorization: Bearer TOKEN" -P TARGET_DIRECTORY_ON_YOUR_FILE_SYSTEM

Be sure to replace the following:

  • PATH_TO_DATA_DIRECTORY: location of source directory in LAADS Archive
  • TOKEN: Your token
  • TARGET_DIRECTORY_ON_YOUR_FILE_SYSTEM: Where you would like to download the files. Examples include /Users/jdoe/data for macOS and Linux or C:\Users\jdoe\data for Windows
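If you drive wget from another program, the same invocation can be assembled as an argument list. This is a sketch in Python using the placeholder values above; the helper name build_wget_command is ours, and the command is built but not executed here:

```python
def build_wget_command(source_path, token, target_dir):
    """Assemble the recursive wget download command described above."""
    return [
        'wget', '-e', 'robots=off', '-m', '-np',
        '-R', '.html,.tmp',      # reject listing pages and temp files
        '-nH', '--cut-dirs=3',   # drop hostname and leading path components
        source_path,
        '--header', 'Authorization: Bearer ' + token,
        '-P', target_dir,
    ]

cmd = build_wget_command('PATH_TO_DATA_DIRECTORY', 'TOKEN', '/Users/jdoe/data')
# run with, e.g., subprocess.run(cmd)
```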

Linux Shell Script

Download source (remove .txt extension when downloaded)

Script source:


#!/bin/bash

function usage {
  echo "Usage:"
  echo "  $0 [options]"
  echo ""
  echo "Description:"
  echo "  This script will recursively download all files if they don't exist"
  echo "  from a LAADS URL and store them to the specified path"
  echo ""
  echo "Options:"
  echo "    -s|--source [URL]         Recursively download files at [URL]"
  echo "    -d|--destination [path]   Store directory structure to [path]"
  echo "    -t|--token [token]        Use app token [token] to authenticate"
  echo ""
  echo "Dependencies:"
  echo "  Requires 'jq', which is available as a standalone executable"
  echo ""
}

function recurse {
  local src=$1
  local dest=$2
  local token=$3
  echo "Querying ${src}.json"

  for dir in $(curl -L -b session -s -g -H "Authorization: Bearer ${token}" -C - ${src}.json | jq '.content | .[] | select(.size==0) | .name' | tr -d '"')
  do
    echo "Creating ${dest}/${dir}"
    mkdir -p "${dest}/${dir}"
    echo "Recursing ${src}/${dir}/ for ${dest}/${dir}"
    recurse "${src}/${dir}/" "${dest}/${dir}" "${token}"
  done

  for file in $(curl -L -b session -s -g -H "Authorization: Bearer ${token}" -C - ${src}.json | jq '.content | .[] | select(.size!=0) | .name' | tr -d '"')
  do
    if [ ! -f ${dest}/${file} ]
    then
      echo "Downloading $file to ${dest}"
      # replace '-s' with '-#' below for download progress bars
      curl -L -b session -s -g -H "Authorization: Bearer ${token}" -C - ${src}/${file} -o ${dest}/${file}
    else
      echo "Skipping $file ..."
    fi
  done
}

POSITIONAL=()
while [[ $# -gt 0 ]]
do
  key="$1"
  case $key in
    -s|--source)
    src="$2"
    shift # past argument
    shift # past value
    ;;
    -d|--destination)
    dest="$2"
    shift # past argument
    shift # past value
    ;;
    -t|--token)
    token="$2"
    shift # past argument
    shift # past value
    ;;
    *)    # unknown option
    POSITIONAL+=("$1") # save it in an array for later
    shift # past argument
    ;;
  esac
done

if [ -z ${src+x} ]
then
  echo "Source is not specified"
  usage
  exit 1
fi

if [ -z ${dest+x} ]
then
  echo "Destination is not specified"
  usage
  exit 1
fi

if [ -z ${token+x} ]
then
  echo "Token is not specified"
  usage
  exit 1
fi

# need to make sure destination directory exists
if [ ! -d "${dest}" ] ; then
    echo "Creating ${dest}"
    mkdir -p "${dest}"
fi

# and download...
recurse "$src" "$dest" "$token"
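The script above (like the Perl and Python samples below) discovers the archive layout by appending .json to a directory URL and treating entries whose size is 0 as subdirectories. A minimal sketch of that listing logic, using an illustrative (not real) listing:

```python
import json

def split_listing(listing_json):
    """Split a LAADS-style .json directory listing into (files, dirs).

    An entry with size 0 denotes a subdirectory; anything else is a file.
    """
    content = json.loads(listing_json)['content']
    files = [e['name'] for e in content if int(e['size']) != 0]
    dirs  = [e['name'] for e in content if int(e['size']) == 0]
    return files, dirs

sample = '{"content": [{"name": "061", "size": 0}, {"name": "MOD09.A2024001.hdf", "size": 123}]}'
files, dirs = split_listing(sample)
# files == ['MOD09.A2024001.hdf'], dirs == ['061']
```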


Perl

Download source (remove .txt extension when downloaded)

Script source:

#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long qw( :config posix_default bundling no_ignore_case no_auto_abbrev);
use JSON;
use File::Path qw(make_path);

my $source      = undef;
my $destination = undef;
my $token       = undef;

GetOptions( 's|source=s' => \$source, 'd|destination=s' => \$destination, 't|token=s' => \$token) or die usage();

sub usage {
  print "Usage:\n";
  print "  $0 [options]\n\n";
  print "Description:\n";
  print "  This script will recursively download all files if they don't exist\n";
  print "  from a LAADS URL and store them to the specified path\n\n";
  print "Options:\n";
  print "  -s|--source [URL]         Recursively download files at [URL]\n";
  print "  -d|--destination [path]   Store directory structure to [path]\n";
  print "  -t|--token [token]        Use app token [token] to authenticate\n";
  exit;
}

sub recurse {
  my $src   = $_[0];
  my $dest  = $_[1];
  my $token = $_[2];
  print "Recursing $src\n";

  my $cmd = "curl -L -b session -s -g --header 'Authorization: Bearer $token' -C - '$src.json'";
  my $message = qx{$cmd};
  my $listing = decode_json($message);

  for my $entry (@{$listing->{content}}){
    if($entry->{size} == 0){
      # size 0 marks a directory: create the local copy and descend into it
      make_path("$dest/$entry->{name}") unless -d "$dest/$entry->{name}";
      recurse("$src/$entry->{name}", "$dest/$entry->{name}", $token);
    }
  }

  for my $entry (@{$listing->{content}}){
    if($entry->{size} != 0 and ! -e "$dest/$entry->{name}"){
      print "Downloading $src/$entry->{name} to $dest/$entry->{name}\n";
      my $cmd = "curl -L -b session -s -g --header 'Authorization: Bearer $token' -C - '$src/$entry->{name}' > $dest/$entry->{name}";
      my $result = system($cmd);
      die "FAIL : $cmd" unless $result == 0;
    } else {
      print "Skipping $entry->{name} ...\n";
    }
  }
}

if(!defined $source){
  print "Source not set\n";
  usage();
}

if(!defined $destination){
  print "Destination not set\n";
  usage();
}

if(!defined $token){
  print "Token not set\n";
  usage();
}

# make sure the destination exists
make_path($destination) unless -d $destination;

# and download...
recurse($source, $destination, $token);


gftp: an interactive Perl simulation of ftp that uses HTTP:

% perl gftp 
# This script simulates the interactive behavior of the ftp tool that is
# available on linux machines, but uses HTTP instead of FTP to communicate
# with the server. Since it uses only core perl modules, it should run
# anywhere that perl is available.
# NOTE: this script does use the "curl" command for downloading
#       resources from the server. You must have curl installed on
#       your system. curl is freely available for all major platforms.
use strict;
use warnings;

use Cwd;
use File::Basename;
use File::Path qw(make_path);
use JSON::PP;
use Term::ReadLine;
use Term::ANSIColor qw(:constants :pushpop);

# base URL of the LAADS archive; set this before use
my $http_url = "";

my $pwd = '/';

# check earthdata token
my $TOKEN = undef;
if(0 != loadToken()){
  saveToken();
}

my $TERM = Term::ReadLine->new('GFTP');

while(1){
#  print "$pwd>";
#  my $input = <>;
  my $input = $TERM->readline("$pwd> ");
  chomp $input;
  my @parms = split(/\s+/, $input);
  next unless scalar @parms;
  my $cmd = lc(shift @parms);
  next unless $cmd;

  if("$cmd" eq "q" or "$cmd" eq "exit" or "$cmd" eq "bye"){
    last;
  }

  if("$cmd" eq "ls"){
    my $dir = $parms[0];
    $dir = '.' unless $dir;

    my $pattern = '';
    if($dir =~ /\*$/){
      my $dir2 = dirname($dir);
      $pattern = basename($dir);

      $pattern =~ s/\*//g;
      $dir = $dir2;
    }

    my $src = normalize_path($pwd, $dir);
    my ($files, $dirs) = check($src, [$pattern]);
    if (! defined $files && ! defined $dirs) {
      print "no such directory: $src\n";
    }
    else {
      foreach (@$dirs){
        print BLUE, "$_\n", RESET;
      }

      foreach (@$files){
        print "$_\n";
      }
    }
  } # ls

  elsif("$cmd" eq "cd"){
    $pwd = normalize_path($pwd, $parms[0]);
  } # cd

  elsif("$cmd" eq "lcd"){
    my $dir = $parms[0];
    if (! -r $dir ){
      print RED, "Error: local dir [$dir] does not exist.\n", RESET;
    } else {
      chdir $dir;
    }
  } # lcd

  elsif("$cmd" eq "pwd"){
    print "[$pwd]\n";
  } #pwd

  elsif("$cmd" eq "lpwd"){
    print "Now in local dir: [", cwd(), "]\n";
  } #lpwd

  elsif("$cmd" eq "get"){
    print "[get] : no file specified.\n" if scalar @parms < 1;
    foreach my $item (@parms){
      my $src = $pwd;

      if($item =~ m|^/|){
        my @parts = split(m|/+|, normalize_path($pwd, $item));
        $item = pop @parts;
        $src = join('/', @parts);
        $src = '/' unless $src;
      }
      http_get($src, [$item]);
    }
  } # get
  elsif("$cmd" eq "mget"){
    @parms = ('.*') unless @parms and scalar @parms > 0;
    http_get($pwd, [@parms], 1);
  } # mget

  elsif("$cmd" eq "token"){
    saveToken();
  } # token
  elsif("$cmd" eq "?"){
    help();
  } #?

  print "\n";
}


# remove .. and . directory pieces from the path so that it is in normalized form.
sub normalize_path {
  my ($current_working_dir, $path) = @_;
  $path = '.' unless $path;
  my $nocheck = 0;
  # remove trailing '/'
  $current_working_dir =~ s|/+$||;
  $path =~ s|/+$||;
  my $_pwd = $current_working_dir;
  if($path =~ m|^/|){
    $_pwd = $path;
  }
  elsif($path ne '.'){
    $_pwd = join('/', $_pwd, $path);
  }

  # handle requests for parent directories
  if($_pwd =~ /\.\./){
    while ((my $pos = index($_pwd, '..')) >= 0) {
      my $start = rindex($_pwd, '/', $pos-2);
      $start = 0 unless $start >= 0;
      my $new_dir = substr($_pwd, 0, $start);
      $pos += 2;
      my $end_str = substr($_pwd, $pos);
      $new_dir = join('', $new_dir, $end_str) if $end_str;
      $_pwd = $new_dir;
    }
  }

  # handle requests for current directories
  $_pwd =~ s|^\./||;
  while ($_pwd =~ m|/\./|) {
    $_pwd =~ s|/\./|/|g;
  }
  $_pwd =~ s|/\.$||;
  $_pwd = '/' unless $_pwd;

  if($nocheck || defined check($_pwd)) {
    return $_pwd;
  }
  print RED, "Error: [$_pwd] does not exist.\n", RESET;
  return $current_working_dir;
}

# get the contents of a directory
sub check {
  my ($from, $patterns) = @_;
  die "no from" unless $from;
  $patterns = [''] unless $patterns && scalar @$patterns;
  my $json_str = `curl -L -b session -s -g -H "Authorization: Bearer $TOKEN" -C - "${http_url}/${from}.json"`;
  if ($json_str =~ /</){
    return undef;  # got html, probably an error page
  }
  my $json = decode_json($json_str);
  my $files = [];
  my $dirs = [];
  foreach my $row (@{$json->{content}}) {
    if ($row->{size} == 0) {
      foreach my $regex (@$patterns) {
        chomp $regex;
        $regex = '.*' unless $regex;
        push @$dirs, $row->{name} if $row->{name} =~ /$regex/;
      }
    }
    else {
      foreach my $regex (@$patterns) {
        chomp $regex;
        $regex = '.*' unless $regex;
        push @$files, $row->{name} if $row->{name} =~ /$regex/;
      }
    }
  }
  return ($files, $dirs);
}

# get the specified file(s) from the specified directory
sub http_get {
  my ($from, $patterns, $recursive) = @_;
  die "no from" unless $from;
  my ($files, $dirs) = check($from, $patterns);
  foreach my $file (@$files){
    print("fetching $from/$file\n");
    my $out_location = "$from";
    $out_location =~ s|^/||;
    if (-d $out_location) {
      $out_location = "$out_location/$file";
    }
    else {
      $out_location = $file;
    }
    my $cmd = join(' ',
      'curl', '-L', '-s', '-g',
      '-b session',
      qq{-o "$out_location"},
      qq{-H "Authorization: Bearer $TOKEN"},
      qq{-C - "$http_url/$from/$file"},
    );
    my $result = system($cmd);
    if ($result != 0) {
      print RED, "FAIL: $cmd\n", RESET;
    }
  }
  if ($recursive) {
    foreach my $dir (@$dirs){
      # this is recursive and can get a LOT of files, so ask user and make sure
      # it's what they want.
      print GREEN, "    $dir is a directory. Download all matching files from it?[ynq]> ", RESET;
      my $input = $TERM->readline();
      chomp $input;
      if ($input =~ /^[yY]/) {
        my $path = "$from/$dir";
        $path =~ s|^/||;
        make_path($path) unless -d $path;
        http_get("/$path", $patterns, $recursive);
      }
      last if $input =~ /^[qQ]/;
    }
  }
}

# load the URS authentication token from special file if there is one
sub loadToken{
  my $home = glob('~/');
  my $tokenFile = "$home/.earthdatatoken";

  if(-r $tokenFile){
    open (IN, '<', $tokenFile) || die "Can't open $tokenFile: $!\n";
    $TOKEN = <IN>;
    close IN;
    chomp $TOKEN;
    print "Token loaded: [$TOKEN].\n";
    return 0;
  }
  return 9;
}

# prompt user for token, and save it in special file
sub saveToken{
  my $home = glob('~/');
  my $tokenFile = "$home/.earthdatatoken";

  print "Input token:\n";
  $TOKEN = <>;
  chomp $TOKEN;
  open (OUT, '>', $tokenFile) || die "Can't open $tokenFile: $!\n";
  print OUT "$TOKEN\n";
  close OUT;
  print "Token saved.\n";
}

# print out command menu
sub help{
  print "Supported cmd: [ls] [cd] [lcd] [pwd] [lpwd] [get] [mget] [token] [q]\n";
  print "[ls]: list dirs and files in remote dir\n";
  print "[cd]: go to remote dir\n";
  print "[lcd]: go to local dir\n";
  print "[pwd]: print the current remote dir\n";
  print "[lpwd]: print the current local dir\n";
  print "[get]: download one or more specified files to current local dir\n";
  print "[mget]: download files that match a pattern to current local dir.\n";
  print "        Don't use *; e.g.: mget h12v04; mget hdf\n";
  print "        Will also recursively download files from matching subdirectories.\n";
  print "[token]: change token\n";
  print "[q]: quit\n";
  print "[?]: show help message\n";
}

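The normalize_path routine in the script above resolves '.' and '..' segments by hand; the same behavior can be sketched in Python with posixpath.normpath (the helper name normalize is ours):

```python
import posixpath

def normalize(cwd, path):
    """Resolve path against cwd and collapse '.' and '..' segments."""
    joined = path if path.startswith('/') else posixpath.join(cwd, path)
    return posixpath.normpath(joined)

# normalize('/archive/allData', '..')  -> '/archive'
# normalize('/archive/allData', '.')   -> '/archive/allData'
```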


Python

Download source (remove .txt extension when downloaded)

Script source:

#!/usr/bin/env python

# script supports either python2 or python3. You might need to change the above
# line to "python3" depending on your installation.
# Attempts to do HTTP GETs with urllib2 (py2) or urllib.request (py3), or falls
# back to subprocess+curl if TLSv1.1+ isn't supported by the python ssl module.
# Will download csv or json depending on which python module is available.

from __future__ import (division, print_function, absolute_import, unicode_literals)

import argparse
import os
import os.path
import shutil
import sys
import time

try:
    from StringIO import StringIO   # python2
except ImportError:
    from io import StringIO         # python3


USERAGENT = 'tis/download.py_1.0--' + sys.version.replace('\n','').replace('\r','')

# this is the choice of last resort, when other attempts have failed
def getcURL(url, headers=None, out=None):
    # OS X Python 2 and 3 don't support tlsv1.1+ therefore... cURL
    import subprocess
    try:
        print('trying cURL', file=sys.stderr)
        args = ['curl', '--fail', '-sS', '-L', '-b', 'session', '--get', url]
        for (k,v) in headers.items():
            args.extend(['-H', ': '.join([k, v])])
        if out is None:
            # python3's subprocess.check_output returns stdout as a byte string
            result = subprocess.check_output(args)
            return result.decode('utf-8') if isinstance(result, bytes) else result
        else:
            subprocess.call(args, stdout=out)
    except subprocess.CalledProcessError as e:
        print('curl GET error message: ' + (e.message if hasattr(e, 'message') else e.output), file=sys.stderr)
    return None
# read the specified URL and output to a file
def geturl(url, token=None, out=None):
    headers = { 'user-agent' : USERAGENT }
    if not token is None:
        headers['Authorization'] = 'Bearer ' + token
    try:
        import ssl
        CTX = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
        if sys.version_info.major == 2:
            import urllib2
            try:
                fh = urllib2.urlopen(urllib2.Request(url, headers=headers), context=CTX)
                if out is None:
                    return fh.read()
                else:
                    shutil.copyfileobj(fh, out)
            except urllib2.HTTPError as e:
                print('TLSv1_2 sys 2 : HTTP GET error code: %d' % e.code, file=sys.stderr)
                return getcURL(url, headers, out)
            except urllib2.URLError as e:
                print('TLSv1_2 sys 2 : Failed to make request: %s, RETRYING' % e.reason, file=sys.stderr)
                return getcURL(url, headers, out)
            return None

        else:
            from urllib.request import urlopen, Request, URLError, HTTPError
            try:
                fh = urlopen(Request(url, headers=headers), context=CTX)
                if out is None:
                    return fh.read().decode('utf-8')
                else:
                    shutil.copyfileobj(fh, out)
            except HTTPError as e:
                print('TLSv1_2 : HTTP GET error code: %d' % e.code, file=sys.stderr)
                return getcURL(url, headers, out)
            except URLError as e:
                print('TLSv1_2 : Failed to make request: %s' % e.reason, file=sys.stderr)
                return getcURL(url, headers, out)
            return None

    except AttributeError:
        # ssl.PROTOCOL_TLSv1_2 not available in this python build
        return getcURL(url, headers, out)


DESC = "This script will recursively download all files if they don't exist from a LAADS URL and will store them to the specified path"

def sync(src, dest, tok):
    '''synchronize src url with dest directory'''
    try:
        import csv
        files = {}
        files['content'] = [ f for f in csv.DictReader(StringIO(geturl('%s.csv' % src, tok)), skipinitialspace=True) ]
    except ImportError:
        import json
        files = json.loads(geturl(src + '.json', tok))

    # use os.path since python 2/3 both support it while pathlib is 3.4+
    for f in files['content']:
        # currently we use filesize of 0 to indicate directory
        filesize = int(f['size'])
        path = os.path.join(dest, f['name'])
        url = src + '/' + f['name']
        if filesize == 0:                 # size FROM RESPONSE
            try:
                print('creating dir:', path)
                os.mkdir(path)
                sync(src + '/' + f['name'], path, tok)
            except IOError as e:
                print("mkdir `%s': %s" % (e.filename, e.strerror), file=sys.stderr)
                sys.exit(-1)
        else:
            try:
                if not os.path.exists(path) or os.path.getsize(path) == 0:    # filesize FROM OS
                    print('\ndownloading: ' , path)
                    with open(path, 'w+b') as fh:
                        geturl(url, tok, fh)
                else:
                    print('skipping: ', path)
            except IOError as e:
                print("open `%s': %s" % (e.filename, e.strerror), file=sys.stderr)
                sys.exit(-1)
    return 0

def _main(argv):
    parser = argparse.ArgumentParser(prog=argv[0], description=DESC)
    parser.add_argument('-s', '--source', dest='source', metavar='URL', help='Recursively download files at URL', required=True)
    parser.add_argument('-d', '--destination', dest='destination', metavar='DIR', help='Store directory structure in DIR', required=True)
    parser.add_argument('-t', '--token', dest='token', metavar='TOK', help='Use app token TOK to authenticate', required=True)
    args = parser.parse_args(argv[1:])
    if not os.path.exists(args.destination):
        os.makedirs(args.destination)
    return sync(args.source, args.destination, args.token)

if __name__ == '__main__':
    try:
        sys.exit(_main(sys.argv))
    except KeyboardInterrupt:
        sys.exit(-1)
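All three scripts authenticate the same way: the app token travels in an Authorization: Bearer header. Stripped down to that one detail, the header construction looks like this (the URL and token below are placeholders, not real values):

```python
from urllib.request import Request

def build_request(url, token):
    """Build a request carrying a LAADS app token as a Bearer header."""
    return Request(url, headers={'Authorization': 'Bearer ' + token})

req = build_request('https://example.invalid/archive/somefile.hdf', 'MY_TOKEN')
# req.get_header('Authorization') == 'Bearer MY_TOKEN'
```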