Get Historical Price Data Using Quandl & Sharadar
Dec 11, 2024
Learn what Nasdaq Data Link / Quandl is, and how to use the API and the Quandl Python package to access a vast array of free and paid financial, economic and alternative datasets. Subscribe for more: https://bit.ly/3lLybeP Follow along: https://analyzingalpha.com/nasdaq-data-link-quandl-python-api/ 00:00 Intro 00:10 What is Nasdaq Data Link / Quandl? 03:29 Install The Quandl Python Package 03:59 Import Required Packages 05:50 Create REST Client Class 08:18 Create Get Tickers Method 15:05 Create Get Bars Method 20:24 Get Bar Data #quandl #nasdaq-data-link #algotrading #python
0:00
hello world today we're going to learn
0:02
about the quandl python api now if
0:05
you're not familiar with what quandl is
0:07
i like to think of quandl as a data
0:09
exchange but before we move any further
0:12
quandl is now named nasdaq datalink so i
0:16
might say quandl i might say nasdaq
0:18
datalink they're the same thing i'm
0:19
going to probably stick with quandl
0:21
because that's what i'm used to but just
0:23
know that they're the same thing so
0:25
i am a paying customer of quandl i use
0:27
quandl's sharadar fundamentals database
0:30
i like how
0:32
the sharadar team handles the financial
0:35
restatements and things like that but
0:37
i'm digressing
0:39
you can go on to quandl and go to their
0:41
search and browse their vast data
0:44
set library essentially and then you can
0:47
go ahead and pay some money
0:49
for each one of those as a subscription
0:51
fee to then use in your analysis
0:54
but that's actually not the only way or
0:56
the only you know customer
0:58
dynamic you can actually be a vendor to
1:00
quandl if you have uh alternative data
1:02
or your own data set you can sell that
1:04
to quandl and they will pay you for
1:06
it
1:07
so um so that's essentially what it is
1:09
it's just a bunch of data providers you
1:12
know providing data to quandl which is
1:14
then a platform that you know we can
1:17
then use
1:18
in our analysis now quandl's massive i
1:21
think they have 650 000 users and i
1:23
think 12 of the top or the largest 15
1:27
hedge funds are are using their data
1:30
double check that on the website but the
1:31
point is it's a massive service the nice
1:34
thing about it is because it's so
1:36
massive tons of different data and a lot
1:38
of the data is vetted so it's usually
1:40
correct
1:41
and
1:42
you only need to have one api so it
1:44
doesn't matter what the data set is
1:46
you just use the one api to get the
1:49
data no matter what you know the
1:51
data set was uploaded as right so it's
1:53
pretty nice
1:54
so what are we going to do in this video
1:56
well i'm not going to walk you through
1:57
getting an api key you just go to the
1:59
website login sign up right
2:01
same thing with some of the basic stuff
2:03
so i'll cover that
2:05
basic stuff you know again their
2:07
documentation is pretty good so i don't
2:08
feel like i can really add to it but
2:11
where i'm going to go different is that
2:12
i'm going to
2:14
essentially show you how to use or show
2:16
you how you can use
2:19
you know a paid database or use that
2:21
polygon
2:22
polygon that was the last video use the
2:24
quandl
2:25
api to you know kind of create your own
2:28
methods to get the data in a more
2:31
in an easier way right so or a better
2:34
format for you in your analysis so one
2:37
of the things when you're
2:38
using tons of different data sources you
2:40
want to standardize and make everything
2:42
conventional so you'll notice as we go
2:45
through here's the polygon one if you
2:46
haven't watched that and it goes into
2:48
detail
2:49
but basically we use all the same method
2:51
names and methods same for data format
2:53
etc
2:55
you know when combining all of these
2:56
different vendors together so it's just
2:58
easier to remember i don't have to think
3:00
well is this a data frame or is this a
3:02
data frame what was this method called
3:03
what was this method called it's all the
3:05
same and that's what i'm doing here so
3:07
hopefully that makes sense to you if
3:09
you're not a paid subscriber um you
3:12
might want to be that's pretty good
3:13
service but if you're not you're still
3:15
going to learn how to use the uh the api
3:19
in this video so hopefully you're
3:21
excited let's uh create some code
3:23
because i think i've been talking long
3:24
enough
3:25
we'll open up jupyter notebook and the
3:27
first thing that we're going to want to
3:28
do is install quandl now you should have
3:31
your own virtual environment active if
3:33
you don't have a virtual environment yet
3:35
or don't know how to create one i'll put
3:37
a link in the description below you
3:39
definitely don't want to be installing
3:40
everything into your global environment
3:43
so to install quandl super easy you
3:46
could use github but in this case we're
3:48
just going to use
3:49
pip we'll do bang pip space install
3:52
quandl and that will install quandl into
3:55
the active environment
3:57
simple enough
3:58
the next thing we'll want to do is get
4:00
the imports
4:01
we're going to use a few imports in this
4:04
program so we're going to want date
4:07
we'll get from
4:09
date time import date we're going to use
4:12
date to set the start and end date for
4:14
the dates we're going to kind of get for
4:16
for quandl right for our price bars
4:19
we're going to want to import quandl
4:22
because we need that to connect to
4:24
quandl we'll import numpy
4:27
as np because we're going to want to
4:29
handle
4:30
uh nans or not-a-numbers
4:32
we'll import pandas as pd to be able to
4:36
check for
4:37
null values and just use data frames and
4:40
from local settings
4:43
import quandl as settings
4:46
now if you follow my videos you'll know
4:47
what this is but basically the local
4:49
settings is just a file that we don't
4:52
upload to our
4:54
github repository because we don't ever
4:56
want to upload
4:57
you know sensitive information such as
4:59
api keys to github for
5:01
prying eyes to maliciously steal all of
5:04
our api calls and do other wacky things
5:07
so if you're wondering what the format
5:09
of this file looks like it's quite
5:10
simple
5:11
it's just
5:13
you know just like this we've already
5:14
done it twice now
5:16
you just have in this case would be
5:18
quandl and then you have a dictionary
5:20
where the api key would be the key and
5:22
the actual api key would be
5:25
the value but this is essentially what
5:28
we needed actually it's just like
5:29
polygon only
5:32
you would substitute the polygon text
5:34
with quandl and obviously change
5:37
your polygon api to the quandl api
5:40
okay
5:41
perfect
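as described, a minimal local_settings.py could look like the sketch below. the key name api_key and the placeholder value are assumptions following the pattern described in the video:

```python
# local_settings.py -- kept out of version control (add it to .gitignore)
# so the real api key never reaches github. the value below is a placeholder.
quandl = {
    'api_key': 'YOUR_QUANDL_API_KEY',
}
```

the notebook then reads it with `from local_settings import quandl as settings` and `settings['api_key']`.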
5:43
so with the imports out of the way
5:45
hit enter just to make sure it can
5:46
locate them all let's create our rest
5:49
client so then create my
5:52
rest
5:53
client
5:56
awesome
5:58
so let's create our rest
6:00
client class
6:02
a class my rest client
6:05
be super easy
6:07
well
6:08
def the dunder init
6:11
self
6:13
auth key
6:14
and then if you're not familiar with
6:15
type hints
6:16
um that's okay
6:18
basically all they do is they tell you
6:21
what type of variable you're passing in
6:23
and passing out just you know gives you
6:25
a hint of the type right so auth key
6:27
which is our api key
6:30
is a string and if we don't set it it
6:32
gets the default value from settings.api
6:35
key so we actually don't have to pass
6:36
this in it'll essentially know
6:40
what our api key is but if we do want to
6:42
pass a different api key in for whatever
6:44
reason we can
6:45
okay so now what we do is we just simply
6:48
take our quandl package that we imported
6:52
we'll type quandl
6:54
dot api config api key
6:58
set that equal to the auth key
7:01
right which is our api key here
7:04
and now what we'll do um
7:07
is we'll
7:08
go ahead and set self
7:11
underscore session equal to quandl
7:14
that's it now uh whenever i upload this
7:17
jupyter notebook i'll add some more
7:18
information because you can have like a
7:21
number of retries max wait between
7:23
retries again a retry strategy right
7:25
where polygon didn't provide any of that
7:27
stuff quandl does
7:29
but i'll provide some documentation i
7:31
just feel like going through all that
7:32
here might not be the best use of your
7:34
time
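for reference, the retry knobs the quandl package exposes look roughly like this. the attribute names below come from the package's readme, so treat this as a sketch to verify against your installed version:

```python
import quandl

# opt in to automatic retries on throttling / transient server errors
quandl.ApiConfig.use_retries = True
quandl.ApiConfig.number_of_retries = 5          # attempts before giving up
quandl.ApiConfig.max_wait_between_retries = 8   # seconds cap on the backoff
quandl.ApiConfig.retry_backoff_factor = 0.5     # exponential backoff multiplier
quandl.ApiConfig.retry_status_codes = [429, 500, 501, 502, 503, 504]
```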
7:36
okay so i'll hit enter and let's test to
7:38
see if we can get this to work
7:41
we'll create a client from my rest
7:44
client
7:46
and we don't have to pass the api key
7:47
because it's default
7:49
okay and then
7:50
enter so we did get a client
7:53
and now we should be able to access all
7:55
of the quandl
7:57
you know method so we'll do client
7:59
dot underscore session
8:02
period hit tab and we can see
8:05
all of this stuff that the quandl
8:09
package provides
8:11
awesome
8:12
so now what we want to do is create our
8:14
get tickers method
8:17
by clearing up some of this
8:20
and let's go ahead and create a new
8:21
title
8:23
create
8:24
get tickers method
8:29
all right perfect
8:30
now let's think about what we want to do
8:32
here
8:33
right
8:34
so we want to
8:36
grab all of the tickers from the
8:38
sharadar tickers database there's going
8:41
to be a lot of them so we'll need to set
8:43
paginate equal to true so you know
8:45
it'll only give us one page if we don't
8:48
uh but the paginate allows it to
8:50
essentially continue paging and adding
8:53
to that ticker's data frame until all of
8:56
the pages are complete or you hit the
8:58
api limit but in this case we won't i
9:00
think that's one million rows
9:02
so let's go ahead and start there
9:06
we're going to need to grab our client
9:08
because this is a class method i'm sorry
9:10
grab our class because it's a class
9:12
method
9:14
type def
9:15
get tickers
9:18
and then that'll output a data frame
9:24
okay and now let's grab our ticker so do
9:26
tickers equal self
9:29
session again we created that session up
9:31
here
9:33
get table we're going to get the sharadar
9:36
tickers table sharadar slash
9:40
tickers
9:42
then we paginate
9:44
equal to true
9:46
okay and now let's filter for only the
9:49
equity
9:50
and fund tickers we'll do tickers equals
9:54
tickers
9:56
and let's do this
10:02
okay we'll do tickers table
10:05
equals sep
10:07
and tickers
10:09
table
10:12
equals sfp
10:14
okay
10:16
there's no space after that okay so
10:18
that'll filter where the table is the
10:21
equity pricing table or the fund pricing
10:23
table
10:24
now we'll also want to do some more data
10:26
cleanup let's uh fix the nand values
10:29
i'll do tickers replace
10:34
the np nan
10:37
to none we'll do that in place
10:48
there we go and now what we want to do
10:50
is this is one of my largest gripes
10:52
about this data set
10:54
the
10:55
there's a field or column called
10:57
is delisted and it's not boolean it's
11:00
actually character yes or no so let's
11:02
fix that we'll make it active so i'll
11:04
make a note here convert is delisted
11:08
to active and i'll make this cleaner
11:10
whenever i upload it to github
11:13
we'll do uh tickers
11:15
active
11:16
that's the field we want
11:18
we'll do tickers is delisted because
11:20
that's the field we need to check
11:22
we'll put apply an anonymous function
11:25
as a lambda function x
11:27
then bool x is equal to n
11:30
let's walk through that right so if it
11:33
is delisted or active if it is not
11:36
delisted it's active so if we set
11:40
uh for each row if x is delisted if it's
11:43
n it means it's active if n is equal
11:45
equal to n that'll return true which is
11:48
active
11:48
if it is delisted
11:51
this will be a
11:53
y i think
11:54
or t i'm pretty sure it's a y and y
11:57
equal equal to n is false that means is delisted
12:00
is yes and that will return false to
12:02
active that's how we fix
12:05
that field
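on a toy frame the conversion just described behaves like this. the two-row miniature of the tickers table here is made up purely for illustration:

```python
import pandas as pd

# hypothetical two-row slice of the sharadar tickers table
tickers = pd.DataFrame({'ticker': ['AAPL', 'ENRN'],
                        'isdelisted': ['N', 'Y']})

# 'N' (not delisted) becomes active=True; 'Y' (delisted) becomes active=False
tickers['active'] = tickers['isdelisted'].apply(lambda x: x == 'N')

print(tickers[['ticker', 'active']])
#   ticker  active
# 0   AAPL    True
# 1   ENRN   False
```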
12:06
so let's also do some basic renaming
12:09
we'll do rename
12:11
and get rename fields
12:13
okay
12:14
we'll do tickers equal tickers rename
12:18
columns
12:21
perma ticker
12:23
call this the quandl id even though
12:25
technically it's the sharadar id but
12:28
again
12:29
you know it is what it is this is just
12:30
for learning purposes code
12:33
and
12:35
basic right there okay
12:37
and now let's make sure that our quandl
12:40
id is a type integer we'll do
12:43
set the type of quandl id to
12:47
int64
12:48
for tickers
12:50
quandl id
12:52
is equal to tickers
12:55
quandl id
12:57
as type int
13:00
okay so we fixed that now let's only get
13:02
the columns that we want so we'll return
13:04
only columns
13:05
of interest
13:07
we'll do cols equal
13:10
ones we want
13:33
perfect now so there's one last thing
13:35
that we need to do and this happens and
13:38
it's almost like a
13:39
de facto standard at this point to
13:41
prevent
13:43
duplicates
13:45
because it does happen even with paid
13:47
sources tickers equal tickers dot drop
13:51
duplicates
13:52
do subset equals ticker so if for some
13:56
reason there is a row that has the same
13:59
ticker it will drop the second one and
14:02
return tickers
14:03
and keep our fingers crossed i didn't
14:05
make any mistakes
14:07
so what did i do here invalid syntax
14:10
of course i made a mistake line 17
14:14
active equal is delisted
14:19
y
14:20
lam
14:22
lambda x
14:26
okay and let's give our
14:28
client a test i'll do client
14:32
equals
14:33
my rest client
14:37
enter
14:38
that does look like
14:40
a cree oh no
14:42
there we go
14:45
now let's see so we'll have everything
14:46
that we had before right with our
14:48
session
14:49
okay but now we should have it tickers
14:53
so we'll capture that into a data frame
14:56
and hit enter and see if we get all of
14:59
the tickers in the format that we want
15:02
it looks like we did
15:05
now let's get the bars
15:07
we're more than 50 percent done
15:09
say
15:10
create get bars method
15:15
let's think about what we want to do we
15:17
want to be able to pass a ticker to this
15:19
method and it gets all of the bars for
15:21
us
15:22
makes sense
15:23
and we also want to be able to pass the
15:26
start and end date
15:28
and we probably should pass the market
15:30
too just to make sure we don't
15:31
accidentally pull in
15:33
you know the wrong ticker although the
15:35
ticker set is unique but we'll just keep
15:38
that as is for now when you're designing
15:40
your system you can you know design it
15:42
however you want
15:43
so we'll go ahead and grab
15:45
all of our
15:47
code so far
15:50
we'll paste this
15:52
now we'll start getting to work on the
15:54
get bars method
15:56
okay
15:57
class method so pass self
16:00
market is a string i'm going to say
16:02
it's equal to stock
16:04
for a default
16:05
then ticker
16:09
and that should be a string
16:12
um pass is none that could just be
16:14
optional well no it's you have to have a
16:16
string so from
16:18
would be a date
16:19
none and to is date
16:22
none and that'll
16:24
output a pd
16:26
data frame
16:28
okay
16:29
all right so let's first handle the
16:31
start and end date so we'll say from
16:33
underscores equal to none
16:36
if pd is null
16:39
from so basically we want to make
16:40
sure it's not a nan
16:42
so else none
16:45
this just handles this common case
16:47
we're handling um nans
16:51
because well i'm not going to get into
16:53
the technicals here but basically we're
16:54
just handling nans uh with pd
16:58
is null
17:02
okay so now that we know that the data
17:05
is not a nan because that can mess us up
17:08
we know that it's uh at least a date's in
17:10
there or it's none
17:13
and now let's set
17:14
uh the date
17:16
to let's set to to today if it wasn't set
17:19
so to equals to
17:21
if to right so that means to is equal to
17:24
whatever it was set to else date today
17:28
right now from we'll do something
17:30
similar so we'll do
17:32
from
17:34
underscore if from
17:36
underscore right so that means if from
17:39
exists right at the from else make it a
17:42
date
17:43
just start in the 2000 right
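the two defaults just described can be isolated into a tiny helper. this is only a sketch and the function name is made up here:

```python
from datetime import date

def default_range(from_=None, to=None):
    """fill in the defaults used above: end today, start in the year 2000."""
    to = to if to else date.today()           # end date falls back to today
    from_ = from_ if from_ else date(2000, 1, 1)  # start falls back to 2000-01-01
    return from_, to
```

calling `default_range()` with no arguments returns the full 2000-to-today window, while either bound can be overridden independently.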
17:46
okay now we also have the two tables
17:49
that we're interested in right we got
17:51
the tickers from those two you know from
17:53
the sep and sfp
17:56
table so let's just create a list for
17:58
that
17:58
tables
18:00
sharadar
18:02
sep
18:04
and sharadar
18:08
sfp
18:10
perfect
18:11
okay so now basically what we want to do
18:13
we've got our two set up correctly now
18:15
we've got our from set up correctly now
18:17
what we want to do is we want to
18:20
loop through
18:21
the tables so for
18:24
table in tables
18:27
df equal self
18:30
underscore
18:32
session
18:33
dot get table
18:34
and we pass in the table name
18:37
the ticker which is what we provided
18:40
the date is equal to
18:43
uh gte or greater than or equal to from
18:47
and then we have
18:49
our less than or equal to
18:52
to
18:54
and then paginate
18:56
when it means we'll loop through all of
18:58
it if some reason the first page didn't
19:01
get everything it'll page
19:03
through all of them so i'll do if not df empty
19:08
right so we have our our data frame now
19:10
with everything if it's not empty we'll
19:13
change the date
19:16
into something more readable we'll do pd
19:18
do date
19:20
time df
19:21
date
19:22
all right so we're just setting that to
19:24
a date date time and then df equal
19:27
df sort values by
19:30
date
19:32
and then we select the columns of
19:33
interest so we'll do
19:35
date
19:36
open
19:37
high
19:38
low
19:39
close
19:41
and volume and then we'll return the df
19:45
and we'll also just say return
19:48
none
19:51
we don't have to actually do that but i
19:53
don't i i like to make sure things are
19:55
explicit
19:58
right there
19:59
and then hit
20:00
i'll enter and see if we got it okay now
20:03
let's see
20:04
9 47 syntax error
20:15
now let's test it out
20:17
create a new client it'll be my rest
20:21
client
20:24
we'll say df equals client and see both
20:28
of the methods now get bars
20:31
ticker is apple
20:33
and why don't we try to get all of the
20:36
data
20:37
okay
20:38
i'll type df here and this may take a
20:40
minute so i might oh
20:42
that was actually pretty quick
20:44
and it does look like the adjusted prices
20:47
appear to be working
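putting the pieces together, the client built over the course of this video comes out roughly as below. this is a sketch, not the video's exact code: the `session` parameter is an addition here so the class can be exercised with a stub instead of a live api key, and the column names follow the sharadar docs, so double-check them against your subscription:

```python
from datetime import date

import numpy as np
import pandas as pd


class MyRestClient:
    """sketch of the client from the video; `session` is an addition
    so the class can run without a live quandl api key."""

    def __init__(self, auth_key=None, session=None):
        if session is None:
            import quandl  # deferred: only needed when talking to the real service
            quandl.ApiConfig.api_key = auth_key
            session = quandl
        self._session = session

    def get_tickers(self) -> pd.DataFrame:
        tickers = self._session.get_table('SHARADAR/TICKERS', paginate=True)
        # keep only the equity (SEP) and fund (SFP) pricing tables
        tickers = tickers[(tickers['table'] == 'SEP') |
                          (tickers['table'] == 'SFP')]
        tickers = tickers.replace({np.nan: None})
        # isdelisted is 'Y'/'N' text; flip it into a boolean active column
        tickers['active'] = tickers['isdelisted'].apply(lambda x: x == 'N')
        tickers = tickers.rename(columns={'permaticker': 'quandl_id'})
        tickers['quandl_id'] = tickers['quandl_id'].astype('int64')
        # even paid sources can contain duplicates; keep the first of each ticker
        tickers = tickers.drop_duplicates(subset='ticker')
        return tickers

    def get_bars(self, market='stocks', ticker=None,
                 from_=None, to=None) -> pd.DataFrame:
        # default the range: end today, start in the year 2000
        to = to if to else date.today()
        from_ = from_ if from_ else date(2000, 1, 1)
        for table in ['SHARADAR/SEP', 'SHARADAR/SFP']:
            df = self._session.get_table(table, ticker=ticker,
                                         date={'gte': from_, 'lte': to},
                                         paginate=True)
            if not df.empty:
                df['date'] = pd.to_datetime(df['date'])
                df = df.sort_values(by='date')
                return df[['date', 'open', 'high', 'low', 'close', 'volume']]
        return None
```

injecting the session is the design choice that makes the sketch testable: anything with a matching `get_table` method can stand in for the quandl module, which also mirrors how the polygon client in the earlier video kept the same method names and return formats.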
20:48
perfect and that's it i can hear you say
20:51
it already leo that was one of the
20:53
easiest videos we've done so far and
20:55
you're right the nice thing about the
20:57
quandl api is that so many people have
20:59
used it it's pretty fleshed out at this
21:01
point right there's not much we had to
21:03
do we just created you know our own
21:05
methods just to make our lives easier so
21:07
i hope you found this video valuable if
21:09
you did please subscribe and hit the
21:12
thumbs up button and also if you're
21:14
interested this video right here google
21:16
thinks you'll like so i hope you have a
21:18
wonderful day and i'll see you in the
21:20
next one thanks
#Data Management
#Finance
#Investing
#Web Stats & Analytics