Get Historical Cryptocurrency Price Data (Free & Paid)

Accurate cryptocurrency price data is critical when backtesting both discretionary and algorithmic crypto trading strategies. In this post, I cover what I’ve found to be the best free and paid crypto data sources. I also demonstrate how to use Python and Pandas to resample the freely available Kaggle data into one-minute bars, filling in missing data so it is ready for backtesting.

Getting Cryptocurrency Price Data

Multiple free and paid sources of historical data exist for Bitcoin and other cryptocurrencies. The paid sources are more extensive and accurate than the free ones; however, unlike free equities data, free crypto data is typically suitable for backtesting purposes.

You can also connect directly to an exchange’s API to download historical data, although the available history is typically much more limited.

SOURCE                              PAID OR FREE
Polygon.io                          Paid
CoinMarketCap.com                   Paid
CryptoDataDownload.com              Free
Kaggle.com                          Free
Analyzing Alpha (Kaggle Resampled)  Free

Download Free Historical Cryptocurrency Price Data

If you’re just here for the CSV data, please use the following files, which are offered freely under a Creative Commons license:

  1. Analyzing Alpha Historical Crypto Price Data CSV (zipped & resampled)
  2. Kaggle Historical Price Data CSVs (zipped)

Create Your Own Dataset Using Python & Pandas

Learn how to download free crypto data and convert it into minute bars for backtesting purposes using Python and Pandas. Follow along below or download the Get Historical Cryptocurrency Price Data Jupyter Notebook.

1. Get Imports

import datetime as dt
import numpy as np
import pandas as pd

2. Import Universe

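from zipfile import ZipFile

# Path to the downloaded archive; adjust it for your machine.
zf = ZipFile('/home/leosmigel/Downloads/archive.zip')
cols = ['time', 'open', 'high', 'low', 'close', 'volume']

# Read every CSV in the archive into one DataFrame. The file name (the ticker)
# becomes the outer level of the resulting MultiIndex.
dfs = pd.concat({text_file.filename.split('.')[0]: pd.read_csv(zf.open(text_file.filename),
                                                              usecols=cols)
                for text_file in zf.infolist()
                if text_file.filename.endswith('.csv')
                })
dfs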
                          time     open    close     high      low       volume
1inchusd 0       1627916520000  2.38090  2.38240  2.38240  2.38090    29.697338
         1       1627916580000  2.37750  2.38730  2.38730  2.37740   115.688719
         2       1627916700000  2.37980  2.37980  2.37980  2.37980     0.041380
         3       1627921080000  2.37940  2.37930  2.37940  2.37930  1296.114682
         4       1627921140000  2.37930  2.37990  2.37990  2.37930     0.026510
...      ...               ...      ...      ...      ...      ...          ...
zrxusd   407696  1628630940000  0.97622  0.97622  0.97622  0.97622    59.570146
         407697  1628631420000  0.97525  0.97377  0.97525  0.97377   228.824921
         407698  1628631900000  0.97182  0.97182  0.97182  0.97182    60.145555
         407699  1628632080000  0.96957  0.97008  0.97008  0.96957  4231.639947
         407700  1628632680000  0.97205  0.97205  0.97205  0.97205     7.550140

61784313 rows × 6 columns
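
To see what the dictionary passed to pd.concat does without downloading the full archive, here is a minimal sketch on a tiny in-memory zip. The two file names and their single rows are made-up stand-ins, not the real Kaggle files; only the column layout mirrors the output above.

```python
import io
from zipfile import ZipFile

import pandas as pd

# Build a tiny in-memory archive with two hypothetical per-ticker CSV files.
header = "time,open,close,high,low,volume\n"
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("btcusd.csv", header + "1593561600000,100.0,101.0,102.0,99.0,1.5\n")
    archive.writestr("ethusd.csv", header + "1593561600000,10.0,10.5,11.0,9.5,12.0\n")
buffer.seek(0)

cols = ["time", "open", "high", "low", "close", "volume"]
with ZipFile(buffer) as zf:
    # The dict keys (file names without the extension) become the outer index level.
    toy = pd.concat({
        info.filename.split(".")[0]: pd.read_csv(zf.open(info.filename), usecols=cols)
        for info in zf.infolist()
        if info.filename.endswith(".csv")
    })

print(toy)
```

Note that usecols selects columns but does not reorder them, which is why the output keeps the order found in the files (open, close, high, low) rather than the order of the cols list.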

3. Filter and Index Data

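# Flatten the (ticker, row number) index into a 'ticker' column.
df = dfs.droplevel(1).reset_index().rename(columns={'index':'ticker'})
# Keep only pairs whose name contains 'usd'; copy to avoid chained-assignment warnings.
df = df[df['ticker'].str.contains('usd')].copy()
# Convert epoch milliseconds to timestamps.
df['date'] = pd.to_datetime(df['time'], unit='ms')
df = df.sort_values(by=['date','ticker'])
df = df.drop(columns='time')
df = df.set_index(['date','ticker'])
# Slice the sorted index by date (2020-07-01 to 2020-12-31).
df = df['2020-07-01':'2020-12-31']
df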
                           open        close          high           low        volume
date       ticker
2020-07-01 btcusd   9150.646722   9147.30000   9150.646722   9147.300000      1.452704
           btgusd     10.403000     10.40300     10.403000     10.403000    141.000000
           eosusd      2.370600      2.37060      2.370600      2.370600    136.577291
           ethusd    225.880000    225.69000    225.880000    225.671073     12.266386
           gotusd      0.042020      0.04380      0.043800      0.042020    160.000000
...        ...              ...          ...           ...           ...           ...
2020-12-31 xlmusd      0.131800      0.13180      0.131800      0.131800     91.628590
           xrpusd      0.211350      0.21094      0.211350      0.209870  50985.665741
           xtzusd      1.988200      1.98820      1.988200      1.988200    125.735553
           yfiusd  21792.000000  21792.00000  21792.000000  21792.000000      0.148400
           zecusd     64.057000     64.09400     64.094000     64.057000      1.263063

3842732 rows × 5 columns
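
As a small sanity check of the timestamp conversion and the date filter, here is a sketch on toy rows. The first timestamp is taken from the raw output earlier in the post; the other rows and all prices are made up. It also swaps the label slice for a boolean mask with an explicit, exclusive end date, which is an alternative I'm assuming here (not the code used above) for cases where you want the end boundary spelled out.

```python
import pandas as pd

# Toy rows: the first timestamp comes from the raw data shown earlier;
# the remaining timestamps and all prices are made-up values.
raw = pd.DataFrame({
    "ticker": ["1inchusd", "btcusd", "btcusd", "btcusd"],
    "time": [1627916520000, 1593561600000, 1609459140000, 1609459200000],
    "close": [2.3824, 100.0, 101.0, 102.0],
})

# Epoch milliseconds -> timestamps.
raw["date"] = pd.to_datetime(raw["time"], unit="ms")

# Half-open window [start, end): everything in the second half of 2020.
start, end = pd.Timestamp("2020-07-01"), pd.Timestamp("2021-01-01")
in_range = (raw["date"] >= start) & (raw["date"] < end)

subset = (raw.loc[in_range]
             .drop(columns="time")
             .sort_values(["date", "ticker"])
             .set_index(["date", "ticker"]))
print(subset)
```

With this mask, a bar stamped 2020-12-31 23:59 is kept and one stamped 2021-01-01 00:00 is dropped, so the window does not depend on how a date string is interpreted at the end of the range.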

4. Resample Timeframes

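price_cols = ['open', 'close', 'high', 'low']
# Resample each ticker onto a one-minute grid; minutes without a bar become NaN.
bars1m = (df.reset_index().set_index('date')
            .groupby('ticker')[price_cols + ['volume']]
            .resample('1min').last())
# Carry the last known prices forward within each ticker; empty bars get zero volume.
bars1m[price_cols] = bars1m.groupby(level='ticker')[price_cols].ffill()
bars1m['volume'] = bars1m['volume'].fillna(0.0)
# Restore the (date, ticker) index order used above.
bars1m = bars1m.swaplevel()
bars1m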
                               open    close     high      low     volume
date                ticker
2020-08-06 10:11:00 adausd  0.14270  0.14270  0.14270  0.14270    10.0000
2020-08-06 10:12:00 adausd  0.14270  0.14270  0.14270  0.14270     0.0000
2020-08-06 10:13:00 adausd  0.14270  0.14270  0.14270  0.14270     0.0000
2020-08-06 10:14:00 adausd  0.14270  0.14270  0.14270  0.14270     0.0000
2020-08-06 10:15:00 adausd  0.14251  0.14251  0.14251  0.14251  7557.1124
...                 ...         ...      ...      ...      ...        ...
2020-12-30 23:51:00 zrxusd  0.36008  0.36008  0.36008  0.36008     0.0000
2020-12-30 23:52:00 zrxusd  0.36008  0.36008  0.36008  0.36008     0.0000
2020-12-30 23:53:00 zrxusd  0.36008  0.36008  0.36008  0.36008     0.0000
2020-12-30 23:54:00 zrxusd  0.36008  0.36008  0.36008  0.36008     0.0000
2020-12-30 23:55:00 zrxusd  0.36039  0.36039  0.36039  0.36039    33.1428

32011924 rows × 5 columns
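
To make the gap filling easier to follow, here is a minimal sketch on three made-up bars for a hypothetical ticker (abcusd) with two missing minutes. It follows the same idea as the step above, forward-filling prices per ticker and setting volume to zero, but the data and names are illustrative only.

```python
import pandas as pd

# Made-up minute bars for one hypothetical ticker; 10:02 and 10:03 are missing.
toy = pd.DataFrame(
    {
        "ticker": ["abcusd"] * 3,
        "open": [1.00, 1.02, 1.05],
        "close": [1.02, 1.01, 1.06],
        "high": [1.03, 1.02, 1.06],
        "low": [1.00, 1.01, 1.05],
        "volume": [10.0, 5.0, 7.0],
    },
    index=pd.to_datetime(
        ["2020-08-06 10:00", "2020-08-06 10:01", "2020-08-06 10:04"]
    ).rename("date"),
)

price_cols = ["open", "close", "high", "low"]

# One row per minute per ticker; minutes without a bar are NaN at this point.
filled = (toy.groupby("ticker")[price_cols + ["volume"]]
             .resample("1min")
             .last())

# Forward-fill prices within each ticker and mark the empty bars with zero volume.
filled[price_cols] = filled.groupby(level="ticker")[price_cols].ffill()
filled["volume"] = filled["volume"].fillna(0.0)
print(filled)
```

One design choice to be aware of: forward-filling every price column copies the previous bar's open, high, and low into the empty minutes. If you would rather have flat bars at the previous close, you could forward-fill only close and assign it to the other three columns.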

5. Export CSV File

bars1m.to_csv('crypto-price-data.csv')
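
When you load the exported file later for a backtest, the dates come back as plain strings unless you ask pandas to parse them. Here is a small round-trip sketch using an in-memory buffer in place of the file; the two rows reuse adausd values from the output above, and the variable names are my own.

```python
import io

import pandas as pd

# Two bars standing in for the full bars1m frame.
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2020-08-06 10:11"), "adausd"),
        (pd.Timestamp("2020-08-06 10:12"), "adausd"),
    ],
    names=["date", "ticker"],
)
bars = pd.DataFrame({"close": [0.1427, 0.1427], "volume": [10.0, 0.0]}, index=index)

# Write to an in-memory buffer instead of 'crypto-price-data.csv'.
buffer = io.StringIO()
bars.to_csv(buffer)
buffer.seek(0)

# Parse the date column and rebuild the (date, ticker) index on load.
restored = pd.read_csv(buffer, parse_dates=["date"], index_col=["date", "ticker"])
print(restored)
```

For the real file, you would pass the CSV path to read_csv instead of the buffer.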

Leo Smigel

Based in Pittsburgh, Analyzing Alpha is a blog by Leo Smigel exploring what works in the markets.