/ TECHNOLOGY

Nasdaq Data Link / Quandl Python API Definitive Guide

Learn what Nasdaq Data Link / Quandl is, and how to use the web and Python API to access a vast array of free and paid financial, economic and alternative datasets.

Table of Contents

Nasdaq Data Link, previously known as Quandl, is a premier marketplace for financial, economic, and alternative data sets. Users can download free datasets, pay for premium data sets, and even get paid for providing data sets to Nasdaq Data Link users.

As a retail algorithmic trader, I use Quandl as one of my data providers and I’m overall quite happy with them. You might also notice I said one of my data providers, as while Quandl is a fantastic source, it’s not the be-all-end-all. Let’s cover the pros and cons of Quandl.

Pros of Quandl

  • Massive amounts of data - Nasdaq Data Link brings together millions of financial and economic indicators from over 250 sources.
  • Unified API - Most datasets on Nasdaq Data Link are available via a standardized API, irrespective of who originally published the data and its format.
  • High quality - With 650,000 user and 12 of the top 15 largest hedge funds using the platform, the data is well vetted.

Cons of Quandl

  • Quandl Pricing - Nasdaq Data Link is often more expensive than alternatives
  • No Master Asset List - You’ll need to create your own master database to associate like assets from different data sets.
  • Limited API - For some large datasets, API calls are limited and the data must be exported.

How to Browse Available Datasets

Before we dig into the code, let’s get an understanding of the vast amount of data residing on the Nasdaq Data Link. You can search for financial data, economic, and alternative datasets using the Nasdaq Data Link Search.

Nasdaq Data Link Search Page

Screenshot of the Nasdaq Data Link Search Page

With a dataset or two in mind, let’s log into Nasdaq Data Link and create a free account.

How to Create a Quandl Account

You need to create a Nasdaq Data Link account in order to obtain a free API key. You can do this at the Nasdaq Data Link sign-up page. Here are the steps:

Sign Up for Quandl Step 1 Sign Up for Quandl Step 2 Sign Up for Quandl Step 3 Sign Up for Quandl Step 4

You can always find your API key in your account settings, also.

Understanding the Quandl API

With your free API key in hand, we can start using Quandl. While we’ll cover most of what you need, if you run into questions at any point, please review the Nasdaq Data Link API Documentation.

There are four ways to access the data using the API and various packages:

Nasdaq Data Link / Quandl Data Access Options

I will cover connecting to the API directly and using the Python API. Sorry, Microsoft users – after learning Pandas, using Excel feels underwhelming.

It’s rare to access the Nasdaq Data Link API directly, but it’s simple to do.

All you need to do is make a GET request in the following format:

GET https://data.nasdaq.com/api/v3/datasets/{database_code}/{dataset_code}/data.{return_format}?api_key=your_api_key 

So to get Facebook’s price data in JSON format, you would send the following GET request using Insomnia, Curl, or simply put it in your browser window using the following url structure:

https://data.nasdaq.com/api/v3/datasets/WIKI/FB/data.json?api_key=YOURAPIKEY

Nasdaq Data Link Quandl API Web Requests

Example using Insomnia to GET data from Quandl API

Installation & Authentication

Installing the Quandl Python API package is simple. You can install it from GitHub or from PyPy by running with your virtual environment activated:

!pip install quandl

Authenticating is just as easy. Import quandl and set its api_key attribute to your API key.

import quandl
quandl.ApiConfig.api_key = YOUR_API_KEY_HERE

Now that we’ve performed the authentication, let’s grab some data!

Getting Data

Quandl returns data as either a time series or a dataframe. This is fitting because using Pandas is my favorite tool for data analysis. Let’s go through a few examples.

We’ll start the quandl.get method, which returns a time series.

Quarterly GDP Data

Below we grab the FRED/GDP data.

data = quandl.get("FRED/GDP", start_date="2011-01-01", end_date="2021-12-31")
data.head(10)
                Value
Date                  
2011-01-01  15351.444
2011-04-01  15557.535
2011-07-01  15647.681
2011-10-01  15842.267
2012-01-01  16068.824
2012-04-01  16207.130
2012-07-01  16319.540
2012-10-01  16420.386
2013-01-01  16629.050
2013-04-01  16699.551
...

Instead of returning a dataframe, we can return a ndarray, or more specifically, a recarry.

data = quandl.get("FRED/GDP", start_date="2011-01-01", end_date="2021-12-31", returns=numpy)
rec.array([('2011-01-01T00:00:00.000000000', 15351.444),
          ('2011-04-01T00:00:00.000000000', 15557.535),
          ('2011-07-01T00:00:00.000000000', 15647.681),
          ('2011-10-01T00:00:00.000000000', 15842.267),
          ('2012-01-01T00:00:00.000000000', 16068.824),
          ('2012-04-01T00:00:00.000000000', 16207.13 ),
          ('2012-07-01T00:00:00.000000000', 16319.54 ),
          ('2012-10-01T00:00:00.000000000', 16420.386),
          ('2013-01-01T00:00:00.000000000', 16629.05 ),
          ('2013-04-01T00:00:00.000000000', 16699.551)],
         dtype=[('Date', '<M8[ns]'), ('Value', '<f8')])

Now let’s see a few examples of the quandl.get_table method.

Quandl WIKI Prices

The below grabs data from the ‘WIKI/PRICES’ table, and uses the qopts parameter to select the desired columns, the AAPL and MSFT tickers, and applies a date filter.

Please note that due to the large volume of API calls, the free stock prices program was discontinued on April 11, 2018.

data = quandl.get_table('WIKI/PRICES',
                        qopts = { 'columns': ['ticker', 'date', 'close'] },
                        ticker = ['AAPL', 'MSFT'],
                        date = { 'gte': '2016-01-01', 'lte': '2016-12-31' })
     ticker       date   close
None                           
0      MSFT 2016-12-30   62.14
1      MSFT 2016-12-29   62.90
2      MSFT 2016-12-28   62.99
3      MSFT 2016-12-27   63.28
4      MSFT 2016-12-23   63.24
...     ...        ...     ...
499    AAPL 2016-01-08   96.96
500    AAPL 2016-01-07   96.45
501    AAPL 2016-01-06  100.70
502    AAPL 2016-01-05  102.71
503    AAPL 2016-01-04  105.35

You can download an entire table with get_table:

quandl.get_table('MER/F1', paginate=True)

If there are more than 1,000,000 rows, you’ll need to use quandl.export_table, which will save it as a zipped csv file.

quandl.export_table('MER/F1', filename='/my/path/db.zip')

At this point, you might say, Leo, these are great, but they are basic examples from the help documentation. You’re right. Let’s get to something more useful. Let’s build our own RESTClient for Quandl using the popular paid SHARADAR database. It’s time for some python code!

Using the Paid Quandl Price Database (SHARADAR)

We’ll create a RESTClient to access the Sharadar data from Quandl/Nasdaq Data Link. This will be very similar to get historical price data from polygon.io

Creating our own RESTClient makes our lives easier in the long run. The __init__ method initializes quandl with various settings. We’ll then create a get_tickers method and a get_bars method for our convenience.

Install Quandl Python API

If you haven’t already, activate your virtual environment and install quandl. We covered this above, but I’m keeping it here for completeness.

!pip install quandl

Import Required Packages

from datetime import date
import quandl
import numpy as np
import pandas as pd
from local_settings import quandl as settings

Create MyRESTCLient

class MyRESTClient():
    def __init__(self, auth_key: str=None):
        '''
        Input: dictionary with configuration parameters
        Returns: Configured quandl connection
        '''
        if not ('api_key' in settings.keys() and
            'number_of_retries' in settings.keys() and 
            'max_wait_between_retries' in settings.keys() and
            'retry_backoff_factor' in settings.keys() and
            'retry_status_codes' in settings.keys()):
            raise Exception('Bad quandl config file.')

        # https://github.com/quandl/quandl-python#configuration
        quandl.ApiConfig.api_key = settings['api_key']
        quandl.ApiConfig.NUMBER_OF_RETRIES = settings['number_of_retries']
        quandl.ApiConfig.MAX_WAIT_BEWTEEN_RETRIES = settings['max_wait_between_retries']
        quandl.ApiConfig.RETRY_BACKOFF_FACTOR = settings['retry_backoff_factor']
        quandl.ApiConfig.RETRY_STATUS_CODES = settings['retry_status_codes']

        self._session = quandl

Create Get Tickers Method

    def get_tickers(self) -> pd.DataFrame:

        tickers = self._session.get_table('SHARADAR/TICKERS', paginate=True)

        # Filter tickers for equities and funds. Remove instituational tables.
        tickers = tickers[(tickers['table'] == 'SEP') | \
                          (tickers['table'] == 'SFP')]


        # Set NaNs to None and strings to boolean
        tickers.replace({np.nan: None}, inplace=True)

        # Convert isdelated to active
        tickers['active'] = tickers['isdelisted'].apply(lambda x: bool(x == 'N'))

        # Rename and get only fields of interest
        tickers = tickers.rename(columns={'permaticker':'quandl_id',
                                          'siccode':'sic'})

        # Set type of quandl_id to int64
        tickers['quandl_id'] = tickers['quandl_id'].astype(int)

        # Return only columns we want
        cols = ['ticker', 'name', 'active', 'sic',
                'sector', 'industry', 'quandl_id', 'category']
        tickers = tickers[cols]

        # Prevents duplicate data sent by API provider
        tickers = tickers.drop_duplicates(subset='ticker')
        return tickers

Create Get Bars Method

    def get_bars(self, market:str='stock', ticker:str=None,
                 from_:date=None, to:date=None) -> pd.DataFrame:

        # Convert np.NaT to None
        from_ = None if pd.isnull(from_) else from_
        to = None if pd.isnull(to) else to

        # Set datea to most recent year if None
        to = to if to else date.today()
        from_ = from_ if from_ else date(2000,1,1)

        tables = ['SHARADAR/SEP', 'SHARADAR/SFP']
        
        for table in tables:
            df = self._session.get_table(table,
                                         ticker=ticker,
                                         date={'gte':from_,'lte':to},
                                         paginate=True)

            if not df.empty:
                df['date'] = pd.to_datetime(df['date'])
                df = df.sort_values(by='date')
                df = df[['date', 'open','high','low','close','volume']]
                return df

        return None

The Python Code

class MyRESTClient():
    def __init__(self, auth_key: str=None):
        '''
        Input: dictionary with configuration parameters
        Returns: Configured quandl connection
        '''
        if not ('api_key' in settings.keys() and
            'number_of_retries' in settings.keys() and 
            'max_wait_between_retries' in settings.keys() and
            'retry_backoff_factor' in settings.keys() and
            'retry_status_codes' in settings.keys()):
            raise Exception('Bad quandl config file.')

        # https://github.com/quandl/quandl-python#configuration
        quandl.ApiConfig.api_key = settings['api_key']
        quandl.ApiConfig.NUMBER_OF_RETRIES = settings['number_of_retries']
        quandl.ApiConfig.MAX_WAIT_BEWTEEN_RETRIES = settings['max_wait_between_retries']
        quandl.ApiConfig.RETRY_BACKOFF_FACTOR = settings['retry_backoff_factor']
        quandl.ApiConfig.RETRY_STATUS_CODES = settings['retry_status_codes']

        self._session = quandl
        
    def get_tickers(self) -> pd.DataFrame:
        tickers = self._session.get_table('SHARADAR/TICKERS', paginate=True)

        # Filter tickers for equities and funds. Remove instituational tables.
        tickers = tickers[(tickers['table'] == 'SEP') | \
                          (tickers['table'] == 'SFP')]

        # Set NaNs to None and strings to boolean
        tickers.replace({np.nan: None}, inplace=True)

        # Convert isdelated to active
        tickers['active'] = tickers['isdelisted'].apply(lambda x: bool(x == 'N'))

        # Rename and get only fields of interest
        tickers = tickers.rename(columns={'permaticker':'quandl_id',
                                          'siccode':'sic'})
        
        # Set type of quandl_id to int64
        tickers['quandl_id'] = tickers['quandl_id'].astype(int)

        # Return only columns we want
        cols = ['ticker', 'name', 'active', 'sic',
                'sector', 'industry', 'quandl_id', 'category']
        tickers = tickers[cols]

        # Prevents duplicate data sent by API provider
        tickers = tickers.drop_duplicates(subset='ticker')
        return tickers
    
    def get_bars(self, market:str='stock', ticker:str=None,
                  from_:date=None, to:date=None) -> pd.DataFrame:

        # Convert np.NaT to None
        from_ = None if pd.isnull(from_) else from_
        to = None if pd.isnull(to) else to

        # Set datea to most recent year if None
        to = to if to else date.today()
        from_ = from_ if from_ else date(2000,1,1)

        tables = ['SHARADAR/SEP', 'SHARADAR/SFP']
        
        for table in tables:
            df = self._session.get_table(table,
                                         ticker=ticker,
                                         date={'gte':from_,'lte':to},
                                         paginate=True)

            if not df.empty:
                df['date'] = pd.to_datetime(df['date'])
                df = df.sort_values(by='date')
                df = df[['date', 'open','high','low','close','volume']]
                return df

        return None

Get Apple’s Prices

client = MyRESTClient()
client.get_bars(ticker='AAPL')
date open high low close volume
5489 2000-01-03 0.936 1.004 0.908 1.000 535796800.0
5488 2000-01-04 0.967 0.988 0.903 0.915 512377600.0
5487 2000-01-05 0.926 0.987 0.920 0.928 778321600.0
5486 2000-01-06 0.948 0.955 0.848 0.848 767972800.0
5485 2000-01-07 0.862 0.902 0.853 0.888 460734400.0
... ... ... ... ... ... ...
4 2021-10-20 148.700 149.754 148.120 149.260 58085532.0
3 2021-10-21 148.810 149.640 147.870 149.480 61247633.0
2 2021-10-22 149.690 150.180 148.640 148.690 58703843.0
1 2021-10-25 148.680 149.370 147.621 148.640 50376102.0
0 2021-10-26 149.330 150.840 149.010 149.320 60630899.0

5490 rows × 6 columns

leo

Leo Smigel

Based in Pittsburgh, Analyzing Alpha is a blog by Leo Smigel exploring what works in the markets.