Get Historical Crypto Price Data (Free & Paid)

Getting accurate cryptocurrency price data is critical when backtesting discretionary and algorithmic crypto trading strategies. This post covers what I’ve found to be the best free and paid crypto data sources. I also demonstrate how to use Python and Pandas to resample the freely available Kaggle data into one-minute bars, filling in the missing bars so the data is ready for backtesting.

Getting Cryptocurrency Price Data

Multiple free and paid sources exist for historical data for Bitcoin and other cryptocurrencies. The paid sources are generally more extensive and accurate than the free ones; however, unlike with equities data, free crypto data is typically good enough for backtesting.

You can also connect directly to an exchange’s API to download historical data, typically with a more limited history.
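Whichever exchange you pull from, the candles usually need reshaping before use. Here is a minimal sketch of that step, assuming the response has already been decoded into rows of [millisecond timestamp, open, high, low, close, volume]. That layout, the variable names, and the sample values are assumptions for illustration; check your exchange's documentation for the real format.

```python
import pandas as pd

# Hypothetical payload: each row is [timestamp_ms, open, high, low, close, volume].
# The numbers are invented for illustration and are not real market data.
raw_candles = [
    [1593561600000, 9130.0, 9135.5, 9128.0, 9133.2, 12.4],
    [1593561660000, 9133.2, 9140.0, 9131.0, 9138.7, 8.1],
    [1593561780000, 9138.7, 9139.0, 9120.3, 9125.0, 20.9],
]

cols = ['time', 'open', 'high', 'low', 'close', 'volume']
candles = pd.DataFrame(raw_candles, columns=cols)

# Convert the millisecond Unix timestamps into datetimes and index by them.
candles['date'] = pd.to_datetime(candles['time'], unit='ms')
candles = candles.drop(columns='time').set_index('date')
```

The column names here match the ones used for the Kaggle files later in this post, so data from either source can go through the same resampling step.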

SOURCE                                  PAID OR FREE
Polygon.io                              Paid
CoinMarketCap.com                       Paid
CryptoDataDownload.com                  Free
Kaggle.com                              Free
Analyzing Alpha (Kaggle Resampled)      Free

Download Free Historical Cryptocurrency Price Data

If you’re just here for the CSV data, the following downloads are offered freely under a Creative Commons license:

  1. Analyzing Alpha Historical Crypto Price Data CSV (zipped & resampled)
  2. Kaggle Historical Price Data CSVs (zipped)

Create Your Own Dataset Using Python & Pandas

Learn how to download free crypto data and convert it into minute bars using Python and Pandas for backtesting purposes. Follow along below or download the Get Historical Cryptocurrency Price Data Jupyter Notebook.

1. Get Imports

import datetime as dt
import numpy as np
import pandas as pd

2. Import Universe

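from zipfile import ZipFile

# Path to the downloaded Kaggle archive; point this at your own copy.
zf = ZipFile('/home/leosmigel/Downloads/archive.zip')
cols = ['time', 'open', 'high', 'low', 'close', 'volume']

# Read every CSV in the archive into one frame, keyed by file name (the ticker).
dfs = pd.concat({text_file.filename.split('.')[0]: pd.read_csv(zf.open(text_file.filename),
                                                              usecols=cols)
                for text_file in zf.infolist()
                if text_file.filename.endswith('.csv')
                })
dfs

# Turn the outer index level (the file name) into a 'ticker' column.
df = dfs.droplevel(1).reset_index().rename(columns={'index':'ticker'})
# Keep USD pairs only; .copy() avoids a SettingWithCopyWarning on the assignment below.
df = df[df['ticker'].str.contains('usd')].copy()
# The 'time' column holds Unix timestamps in milliseconds.
df['date'] = pd.to_datetime(df['time'], unit='ms')
df = df.sort_values(by=['date','ticker'])
df = df.drop(columns='time')
df = df.set_index(['date','ticker'])
# Restrict the data to the second half of 2020.
df = df['2020-07-01':'2020-12-31']
df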
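To see what the dict-keyed pd.concat and the ticker filter are doing, here is a small sketch that runs without the archive. The two in-memory CSVs stand in for files inside the zip. Their names and values are made up.

```python
import io
import pandas as pd

# Two tiny in-memory CSVs standing in for files in the archive (hypothetical data).
fake_files = {
    'btcusd': "time,open,high,low,close,volume\n1593561600000,1,2,0.5,1.5,10\n",
    'ethbtc': "time,open,high,low,close,volume\n1593561600000,3,4,2.5,3.5,20\n",
}

# Passing a dict to pd.concat uses the keys as the outer level of a MultiIndex.
combined = pd.concat({name: pd.read_csv(io.StringIO(text))
                      for name, text in fake_files.items()})

# Drop the inner level (each file's row number) and expose the key as 'ticker'.
flat = combined.droplevel(1).reset_index().rename(columns={'index': 'ticker'})

# Keep only the tickers whose name contains 'usd'.
usd_only = flat[flat['ticker'].str.contains('usd')]
```

Note that str.contains matches 'usd' anywhere in the file name. If you only want pairs quoted in USD, a stricter test such as str.endswith('usd') may be closer to what you need.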

4. Resample Timeframes

An exchange produces no bar for a period in which an asset isn’t traded, so the raw data has “missing” bars. We forward fill the price information and set the volume to zero for those bars. Many backtesting engines require that there are no missing bars, but filling them can cause issues with various statistical methods if you only consider prices and not volume. For example, the repeated prices show up as zero returns, which can understate volatility.

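price_cols = ['open', 'high', 'low', 'close']
# Put each ticker on a continuous one-minute grid; minutes with no trades become NaN rows.
bars1m = df.reset_index().set_index('date').groupby('ticker').resample('1min').last()
# Select columns by name (some pandas versions also carry 'ticker' along as a column).
bars1m = bars1m[price_cols + ['volume']].copy()
# Forward fill prices within each ticker and give the filled bars zero volume.
bars1m[price_cols] = bars1m.groupby(level='ticker')[price_cols].ffill()
bars1m['volume'] = bars1m['volume'].fillna(value=0.0)
# Restore the (date, ticker) index order.
bars1m = bars1m.reorder_levels(['date', 'ticker'])
bars1m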
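The effect of the fill is easier to check on a toy frame. This sketch uses a single made-up ticker with its 00:02 bar missing and an explicit loop over tickers instead of a grouped resample. The data and variable names are hypothetical.

```python
import pandas as pd

# Made-up one-minute bars for one hypothetical ticker; the 00:02 bar is missing.
toy = pd.DataFrame(
    {
        'ticker': ['btcusd'] * 3,
        'open':   [100.0, 101.0, 103.0],
        'high':   [101.5, 102.0, 104.0],
        'low':    [99.5, 100.5, 102.5],
        'close':  [101.0, 101.5, 103.5],
        'volume': [5.0, 3.0, 7.0],
    },
    index=pd.to_datetime(['2020-07-01 00:00', '2020-07-01 00:01', '2020-07-01 00:03']),
)
toy.index.name = 'date'

price_cols = ['open', 'high', 'low', 'close']
parts = {}
for ticker, bars in toy.groupby('ticker'):
    # Resampling inserts a NaN row for every minute without a bar.
    bars = bars.drop(columns='ticker').resample('1min').last()
    # Carry the previous bar's prices forward and mark the gap with zero volume.
    bars[price_cols] = bars[price_cols].ffill()
    bars['volume'] = bars['volume'].fillna(0.0)
    parts[ticker] = bars

# Stack the per-ticker frames and order the index as (date, ticker).
filled = pd.concat(parts, names=['ticker']).swaplevel()
```

The filled 00:02 bar repeats the previous bar's open, high, low, and close. Some people prefer to set all four prices to the previous close instead. Which is right depends on how your backtesting engine treats bars with no trades.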

5. Export CSV File

bars1m.to_csv('crypto-price-data.csv')
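When you load the file again for a backtest, the dates come back as plain text unless you ask pandas to parse them. Here is a minimal round-trip sketch, using a small made-up frame and a temporary file rather than the real export.

```python
import os
import tempfile
import pandas as pd

# Made-up bars with the same (date, ticker) index layout as the exported file.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(['2020-07-01 00:00', '2020-07-01 00:01']), ['btcusd', 'ethusd']],
    names=['date', 'ticker'],
)
sample = pd.DataFrame({'close': [1.0, 2.0, 3.0, 4.0],
                       'volume': [5.0, 0.0, 6.0, 7.0]}, index=index)

# Write to a temporary location so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), 'sample-bars.csv')
sample.to_csv(path)

# parse_dates restores the datetime type; set_index rebuilds the (date, ticker) index.
restored = pd.read_csv(path, parse_dates=['date']).set_index(['date', 'ticker'])
```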
