yfinance Python Tutorial [2022]

Yahoo Finance offers an excellent range of market data on stocks, bonds, currencies, and cryptocurrencies. It also offers news reports with a variety of insights into different markets from around the world, and much of this data is accessible through the yfinance Python library.

Ran Aroussi is the man behind yfinance, a Python library that gives you easy access to financial data available on Yahoo Finance. After Yahoo decommissioned their API on May 15th, 2017 (a move that left developers searching for an adequate alternative), Ran’s yfinance fit the bill. The software gained traction and has been downloaded over 100k times with around 300k+ installs per month, according to PyPI!

Read on if you’re interested in learning how to use the yfinance API to download financial data for free.

You can even follow along with The yfinance Python Tutorial Jupyter Notebook.

But before you get too excited, you need to ask yourself:

Should You Use the Yahoo Finance API?

I wouldn’t recommend using Yahoo Finance data for making live trading decisions. Why?

All Yahoo Finance APIs are unofficial solutions.

If the look of Yahoo Finance is ever changed, it’ll break many of the APIs, as the web scraping code will need to be updated. Yahoo might also rate limit or blacklist you if you make too many requests.

The data is good, not great. Good paid data sources generally offer a higher level of reliability than freely available datasets.
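Because requests can be rate limited, it may help to space out retries rather than hammering the server. The following is a minimal sketch, not part of yfinance: call_with_backoff is a hypothetical helper name, and catching every Exception is a simplification, since the exact error raised on a rate limit is an assumption that depends on the library version.

```python
import time


def call_with_backoff(fetch, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() and retry with exponentially growing pauses on failure.

    `fetch` is any zero-argument callable, for example
    lambda: yf.Ticker('goog').history()  (yf being the imported yfinance module).
    `sleep` is injectable so the pauses can be skipped in tests.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            # Out of attempts: let the caller see the last error.
            if attempt == retries - 1:
                raise
            # Pause base_delay, then 2x, then 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)
```

Wrapping each download in a helper like this keeps the retry policy in one place, so the delays can be tuned without touching the rest of the code.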

Assuming you’re okay with these drawbacks, Yahoo Finance is arguably the best freely available data source. And yfinance is one of the most popular ways to access this incredible data.

Speaking of, today you’re going to learn:

When Should You Use yfinance?

If you’ve decided to use Yahoo Finance as a data source, yfinance is the way to go. It’s the most popular way to access Yahoo Data, and the API is open-source and free to use. There are other free and paid APIs to access Yahoo’s data, but yfinance is the best place to start, and here’s why.

  1. It’s simple to use
  2. It returns data as Pandas DataFrames
  3. One-minute bar granularity

If you’re using AI to perform sentiment analysis, you can’t use yfinance. You’ll have to grab that data directly or use another API.

How to Install yfinance

Installing yfinance is incredibly easy. As with most packages, there are two steps:

  1. Load your Python virtual environment
  2. Install yfinance using pip or conda

If you’re not familiar with virtual environments, read: Python Virtual Environments: Setup & Usage.

The following packages are required:

  • Python >= 2.7, 3.4+
  • Pandas (tested to work with >=0.23.1)
  • Numpy >= 1.11.1
  • requests >= 2.14.2
  • lxml >= 4.5.1

The following package is optional and used for backward compatibility:

  • pandas_datareader >= 0.4.0

With your virtual environment loaded, you’re now ready to install yfinance.

Install yfinance Using Pip:

$ pip install yfinance --upgrade --no-cache-dir

Install yfinance Using Conda:

$ conda install -c ranaroussi yfinance
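To confirm the install worked, you can ask Python which version it sees. This is a small sketch using the standard library's importlib.metadata, which assumes Python 3.8 or newer; installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if yfinance is installed in this environment,
# otherwise None.
print(installed_version("yfinance"))
```

If this prints None, the active environment is probably not the one you installed into.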

yfinance Classes

After loading yfinance, you’ll have access to the following:

yfinance Classes & Methods

You’ll mainly use the following:

  • Ticker
  • Tickers
  • download

How to Download Historical Price Data Using yfinance

We can download data for one ticker using the Ticker object and multiple tickers using the download method.

Download One Ticker Using yfinance

First, we need to create a ticker object and then use that object to get our data. Creating a ticker object is straightforward:

import yfinance as yf
obj = yf.Ticker('goog')

Now we can use the various methods to grab the data we want.

yfinance Ticker Methods

Most of the methods are self-explanatory, but here are a few that might trip new users up:

  1. Actions - Corporate actions such as dividends and splits
  2. Analysis - EPS targets and revisions
  3. Info - Commonly queried data as a dictionary
  4. Recommendations - Analyst buy, hold and sell ratings

Let’s download historical market data using the history method. We can see that history takes the following parameters:

def history(self, period="1mo", interval="1d",
            start=None, end=None, prepost=False, actions=True,
            auto_adjust=True, back_adjust=False,
            proxy=None, rounding=False, tz=None, timeout=None, **kwargs):
    """
    :Parameters:
        period : str
            Valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
            Either Use period parameter or use start and end
        interval : str
            Valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
            Intraday data cannot extend last 60 days
        start: str
            Download start date string (YYYY-MM-DD) or _datetime.
            Default is 1900-01-01
        end: str
            Download end date string (YYYY-MM-DD) or _datetime.
            Default is now
        prepost : bool
            Include Pre and Post market data in results?
            Default is False
        auto_adjust: bool
            Adjust all OHLC automatically? Default is True
        back_adjust: bool
            Back-adjusted data to mimic true historical prices
        proxy: str
            Optional. Proxy server URL scheme. Default is None
        rounding: bool
            Round values to 2 decimal places?
            Optional. Default is False = precision suggested by Yahoo!
        tz: str
            Optional timezone locale for dates.
            (default data is returned as non-localized dates)
        timeout: None or float
            If not None stops waiting for a response after given number of
            seconds. (Can also be a fraction of a second e.g. 0.01)
            Default is None.
        **kwargs: dict
            debug: bool
                Optional. If passed as False, will suppress
                error message printing to console.
    """

Don’t feel overwhelmed. The defaults are great, and in most cases, we’ll only be changing the period or dates and the interval.
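Since the docstring says to use either period or start and end, a small guard can catch mixed arguments before a request is made. The sketch below is illustrative only: history_kwargs is a hypothetical helper, and the valid values are copied from the docstring above rather than read from the library.

```python
# Valid values as listed in the history docstring above.
VALID_PERIODS = {"1d", "5d", "1mo", "3mo", "6mo", "1y", "2y", "5y", "10y", "ytd", "max"}
VALID_INTERVALS = {"1m", "2m", "5m", "15m", "30m", "60m", "90m", "1h",
                   "1d", "5d", "1wk", "1mo", "3mo"}


def history_kwargs(period=None, start=None, end=None, interval="1d"):
    """Build keyword arguments for Ticker.history, rejecting mixed or unknown values."""
    if interval not in VALID_INTERVALS:
        raise ValueError(f"unknown interval: {interval!r}")
    if period is not None and (start is not None or end is not None):
        raise ValueError("use either period or start/end, not both")
    if period is not None:
        if period not in VALID_PERIODS:
            raise ValueError(f"unknown period: {period!r}")
        return {"period": period, "interval": interval}
    if start is None and end is None:
        # Nothing given: fall back to the default period shown in the signature.
        return {"period": "1mo", "interval": interval}
    kwargs = {"interval": interval}
    if start is not None:
        kwargs["start"] = start
    if end is not None:
        kwargs["end"] = end
    return kwargs
```

With a Ticker object named goog, the result could be passed along as goog.history(**history_kwargs(period="5d")).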

Let’s grab the most recent month of daily data for Google. Remember, data is returned as a pandas DataFrame:

goog = yf.Ticker('goog')
data = goog.history()
data.head()
                   Open         High          Low        Close   Volume  Dividends  Stock Splits
Date                                                                                            
2021-12-10  2982.000000  2988.000000  2947.149902  2973.500000  1081700          0             0
2021-12-13  2968.879883  2971.250000  2927.199951  2934.090088  1205200          0             0
2021-12-14  2895.399902  2908.840088  2844.850098  2899.409912  1238900          0             0
2021-12-15  2887.320068  2950.344971  2854.110107  2947.370117  1364000          0             0
2021-12-16  2961.540039  2971.030029  2881.850098  2896.770020  1370000          0             0

That was easy!

Now let’s download the most recent week’s minute data; only this time, we’ll use the start and end dates instead of the period.

Keep in mind the following restrictions when using minute data:

  1. The period must be within the last 30 days
  2. Only 7 days of 1m granularity are allowed per request

data = goog.history(interval='1m', start='2022-01-03', end='2022-01-10')
data.head()
                                  Open         High          Low        Close  Volume  Dividends  Stock Splits
Datetime                                                                                                      
2022-01-03 09:30:00-05:00  2889.510010  2901.020020  2887.733398  2899.060059   67320          0             0
2022-01-03 09:31:00-05:00  2900.520020  2906.060059  2900.489990  2904.580078    8142          0             0
2022-01-03 09:32:00-05:00  2904.719971  2904.719971  2896.310059  2899.209961    7069          0             0
2022-01-03 09:33:00-05:00  2898.699951  2898.699951  2898.699951  2898.699951     623          0             0
2022-01-03 09:34:00-05:00  2896.209961  2896.330078  2894.913086  2896.239990    3443          0             0
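If you need more than seven days of one-minute bars, one approach is to split the range into windows of at most seven days and request each window separately. The helper below is a sketch under that assumption: minute_windows is a hypothetical name, and treating the end date as exclusive is an assumption about how the date range is interpreted.

```python
from datetime import date, timedelta


def minute_windows(start, end, max_days=7):
    """Split [start, end) into consecutive windows of at most max_days days.

    Returns a list of (start, end) pairs as 'YYYY-MM-DD' strings.
    """
    windows = []
    cursor = start
    while cursor < end:
        # Never run past the requested end date.
        stop = min(cursor + timedelta(days=max_days), end)
        windows.append((cursor.isoformat(), stop.isoformat()))
        cursor = stop
    return windows


for window_start, window_end in minute_windows(date(2022, 1, 3), date(2022, 1, 20)):
    print(window_start, window_end)
```

Each pair could then be passed as start and end to goog.history(interval='1m', ...), with the resulting DataFrames combined using pd.concat.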

Download Multiple Tickers Using yfinance

Downloading multiple tickers is similar to downloading a single ticker using the Ticker object.

Please note that you’re limited to the daily granularity when downloading multiple tickers. If you want to get more granular, up to minute granularity, you’ll need to use the Ticker object above.

Now back to multiple ticker downloading…

We need to pass download a list of tickers instead of a single ticker and optionally let the method know how to group the tickers – by ticker or column (column is the default). We can also optionally use threads to download the tickers faster.

def download(tickers, start=None, end=None, actions=False, threads=True,
             group_by='column', auto_adjust=False, back_adjust=False,
             progress=True, period="max", show_errors=True, interval="1d", prepost=False,
             proxy=None, rounding=False, timeout=None, **kwargs):
    """Download yahoo tickers
    :Parameters:
        tickers : str, list
            List of tickers to download
        period : str
            Valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
            Either Use period parameter or use start and end
        interval : str
            Valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
            Intraday data cannot extend last 60 days
        start: str
            Download start date string (YYYY-MM-DD) or _datetime.
            Default is 1900-01-01
        end: str
            Download end date string (YYYY-MM-DD) or _datetime.
            Default is now
        group_by : str
            Group by 'ticker' or 'column' (default)
        prepost : bool
            Include Pre and Post market data in results?
            Default is False
        auto_adjust: bool
            Adjust all OHLC automatically? Default is False
        actions: bool
            Download dividend + stock splits data. Default is False
        threads: bool / int
            How many threads to use for mass downloading. Default is True
        proxy: str
            Optional. Proxy server URL scheme. Default is None
        rounding: bool
            Optional. Round values to 2 decimal places?
        show_errors: bool
            Optional. Doesn't print errors if True
        timeout: None or float
            If not None stops waiting for a response after given number of
            seconds. (Can also be a fraction of a second e.g. 0.01)
    """

Let’s download the most recent month of data for Google and Facebook (META).

data = yf.download(['GOOG','META'], period='1mo')
data.head()
              Adj Close               Close                High                 Low                 Open          Volume         
                   GOOG   META         GOOG   META         GOOG   META         GOOG    META         GOOG   META     GOOG     META
Date                                                                                                                             
2021-12-10  2973.500000  15.52  2973.500000  15.52  2988.000000  15.83  2947.149902  15.390  2982.000000  15.77  1081700  1845200
2021-12-13  2934.090088  15.24  2934.090088  15.24  2971.250000  15.55  2927.199951  15.130  2968.879883  15.53  1205200  2178500
2021-12-14  2899.409912  15.06  2899.409912  15.06  2908.840088  15.17  2844.850098  14.850  2895.399902  15.02  1238900  2662900
2021-12-15  2947.370117  15.28  2947.370117  15.28  2950.344971  15.28  2854.110107  14.615  2887.320068  14.95  1364000  2356300
2021-12-16  2896.770020  14.79  2896.770020  14.79  2971.030029  15.46  2881.850098  14.680  2961.540039  15.45  1370000  2511100

Let’s group by the ticker, and provide start and end dates for the same tickers.

data = yf.download(['GOOG','META'], start='2021-12-10', end='2021-12-30', group_by='ticker')
data.head()
             META                                                  GOOG                                                             
             Open   High     Low  Close Adj Close   Volume         Open         High          Low        Close    Adj Close   Volume
Date                                                                                                                                
2021-12-10  15.77  15.83  15.390  15.52     15.52  1845200  2982.000000  2988.000000  2947.149902  2973.500000  2973.500000  1081700
2021-12-13  15.53  15.55  15.130  15.24     15.24  2178500  2968.879883  2971.250000  2927.199951  2934.090088  2934.090088  1205200
2021-12-14  15.02  15.17  14.850  15.06     15.06  2662900  2895.399902  2908.840088  2844.850098  2899.409912  2899.409912  1238900
2021-12-15  14.95  15.28  14.615  15.28     15.28  2356300  2887.320068  2950.344971  2854.110107  2947.370117  2947.370117  1364000
2021-12-16  15.45  15.46  14.680  14.79     14.79  2511100  2961.540039  2971.030029  2881.850098  2896.770020  2896.770020  1370000

How to Download Fundamental Data Using yfinance

We use the Ticker object to download fundamental data.

Download Fundamentals for One Ticker Using yfinance

We can loop through multiple Ticker objects to download fundamental data for various tickers.

Let’s get the fundamental information for Danaher.

Ticker Fundamental Methods

We can see that the Ticker object ‘dhr’ provides a lot of data to consume. Many of the get_ methods give us exciting fundamental data.

Using one of my favorite industrial companies, Danaher, let’s run through some examples.

We can get Danaher’s general and frequently-used information using the info property, which returns a dictionary.

dhr = yf.Ticker('DHR')
info = dhr.info
info.keys()
dict_keys(['zip', 'sector', 'fullTimeEmployees', 'longBusinessSummary',
 'city', 'phone', 'state', 'country', 'companyOfficers', 'website',
'maxAge', 'address1', 'fax', 'industry', 'address2', 'ebitdaMargins',
'profitMargins', 'grossMargins', 'operatingCashflow', 'revenueGrowth',
'operatingMargins', 'ebitda', 'targetLowPrice', 'recommendationKey',
'grossProfits', 'freeCashflow', 'targetMedianPrice', 'currentPrice',
'earningsGrowth', 'currentRatio', 'returnOnAssets', 'numberOfAnalystOpinions',
'targetMeanPrice', 'debtToEquity', '...'])

We can access this data using the dictionary.

info['sector']
'Healthcare'
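Indexing the dictionary directly raises a KeyError when a key is missing, and the available keys may differ between tickers or library versions. A defensive sketch using dict.get follows; pick_fields is a hypothetical helper, and sample_info is a stand-in for dhr.info so the example runs offline.

```python
def pick_fields(info, fields, default=None):
    """Return a dict holding only `fields`, using `default` for missing keys."""
    return {name: info.get(name, default) for name in fields}


# Stand-in for dhr.info: the 'sector' value is taken from the output above,
# and 'notARealKey' is deliberately absent to show the fallback.
sample_info = {"sector": "Healthcare"}
summary = pick_fields(sample_info, ["sector", "notARealKey"])
print(summary)
```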

Let’s grab Danaher’s annual revenue and earnings using the earnings method.

dhr.earnings
          Revenue    Earnings
Year                         
2017  15518800000  2492100000
2018  17049000000  2651000000
2019  17911000000  3008000000
2020  22284000000  3646000000

And if the provided methods don’t work, we can calculate financial ratios using the financial statements.

dhr.get_financials()
                                           2020-12-31     2019-12-31     2018-12-31     2017-12-31
Research Development                     1348000000.0   1126000000.0   1059000000.0    956400000.0
Effect Of Accounting Charges                     None           None           None           None
Income Before Tax                        4495000000.0   3305000000.0   2962000000.0   2543200000.0
Minority Interest                          11000000.0     11000000.0     12300000.0      9600000.0
Net Income                               3646000000.0   3008000000.0   2651000000.0   2492100000.0
Selling General Administrative           6880000000.0   5577000000.0   5356000000.0   5011900000.0
Gross Profit                            12932000000.0   9984000000.0   9505000000.0   8571300000.0
Ebit                                     4704000000.0   3281000000.0   3090000000.0   2603000000.0
Operating Income                         4704000000.0   3281000000.0   3090000000.0   2603000000.0
Other Operating Expenses                         None           None           None           None
Interest Expense                         -275000000.0   -108000000.0   -137000000.0   -140100000.0
Extraordinary Items                              None           None           None           None
Non Recurring                                    None           None           None           None
Other Items                                      None           None           None           None
Income Tax Expense                        849000000.0    873000000.0    556000000.0    371000000.0
Total Revenue                           22284000000.0  17911000000.0  17049000000.0  15518800000.0
Total Operating Expenses                17580000000.0  14630000000.0  13959000000.0  12915800000.0
Cost Of Revenue                          9352000000.0   7927000000.0   7544000000.0   6947500000.0
Total Other Income Expense Net           -209000000.0     24000000.0   -128000000.0    -59800000.0
Discontinued Operations                           NaN    576000000.0    245000000.0    319900000.0
Net Income From Continuing Ops           3646000000.0   2432000000.0   2406000000.0   2172200000.0
Net Income Applicable To Common Shares   3510000000.0   2940000000.0   2651000000.0   2492100000.0
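As an example of calculating ratios from a statement, the sketch below computes a few margins from the 2020 column shown above. The figures are copied by hand into a plain dictionary so the example runs offline; margin is a hypothetical helper name.

```python
# Figures copied from the 2020-12-31 column of the statement above.
dhr_2020 = {
    "Total Revenue": 22284000000.0,
    "Gross Profit": 12932000000.0,
    "Operating Income": 4704000000.0,
    "Net Income": 3646000000.0,
}


def margin(statement, numerator, denominator="Total Revenue"):
    """Return statement[numerator] / statement[denominator] as a fraction."""
    return statement[numerator] / statement[denominator]


for line in ("Gross Profit", "Operating Income", "Net Income"):
    print(f"{line} margin: {margin(dhr_2020, line):.1%}")
```

With the real DataFrame, the same idea should apply to whole rows at once, for example dividing the 'Gross Profit' row by the 'Total Revenue' row, assuming the row labels match those printed above.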

We can also concatenate all financial statements together to calculate the ratios more easily.

import pandas as pd

pnl = dhr.financials
bs = dhr.balancesheet
cf = dhr.cashflow
fs = pd.concat([pnl,bs,cf])
print(fs)
                                             2020-12-31    2019-12-31    2018-12-31    2017-12-31
Research Development                       1348000000.0  1126000000.0  1059000000.0   956400000.0
Effect Of Accounting Charges                       None          None          None          None
Income Before Tax                          4495000000.0  3305000000.0  2962000000.0  2543200000.0
Minority Interest                            11000000.0    11000000.0    12300000.0     9600000.0
Net Income                                 3646000000.0  3008000000.0  2651000000.0  2492100000.0
...                                                 ...           ...           ...           ...
Change To Inventory                        -123000000.0   -22000000.0  -134000000.0     3100000.0
Change To Account Receivables              -264000000.0  -157000000.0   -55000000.0  -142500000.0
Other Cashflows From Financing Activities   -29000000.0   369000000.0   -18000000.0  -124200000.0
Change To Netincome                         182000000.0  -122000000.0   271000000.0   139300000.0
Capital Expenditures                       -791000000.0  -636000000.0  -584000000.0  -570700000.0

[68 rows x 4 columns]

I also often find it helpful to transpose the data and have the time as the index and the column as the data field.

fs.T
           Research Development Effect Of Accounting Charges  ... Change To Netincome Capital Expenditures
                                                              ...                                         
2020-12-31         1348000000.0                         None  ...         182000000.0         -791000000.0
2019-12-31         1126000000.0                         None  ...        -122000000.0         -636000000.0
2018-12-31         1059000000.0                         None  ...         271000000.0         -584000000.0
2017-12-31          956400000.0                         None  ...         139300000.0         -570700000.0

[4 rows x 68 columns]

And while there’s no download method for downloading multiple symbols fundamentals at once, we can loop through the tickers we’re interested in and aggregate the data.

Download Fundamentals for Multiple Tickers Using yfinance

The first thing we want to do when attempting to download data for multiple tickers is to come up with a list of tickers!

Let’s create a new list called fang:

fang = ['FB','AMZN','NFLX','GOOG']
fang
['FB', 'AMZN', 'NFLX', 'GOOG']

Now let’s turn this list into a list of ticker objects using list comprehension.

tickers = [yf.Ticker(ticker) for ticker in fang]
[yfinance.Ticker object <FB>,
 yfinance.Ticker object <AMZN>,
 yfinance.Ticker object <NFLX>,
 yfinance.Ticker object <GOOG>]

Now let’s concatenate all of the financial data together. We’ll loop through each ticker, aggregating the profit and loss, balance sheet, and cash flow statement. We’ll then add this data to a list.

Once we have a list of each company’s aggregated financial statements, we’ll concatenate them, removing duplicate headings.

dfs = [] # list for each ticker's dataframe
for ticker in tickers:
    # get each financial statement
    pnl = ticker.financials
    bs = ticker.balancesheet
    cf = ticker.cashflow
    
    # concatenate into one dataframe
    fs = pd.concat([pnl, bs, cf])
    
    # make dataframe format nicer
    # Swap dates and columns
    data = fs.T 
    # reset index (date) into a column
    data = data.reset_index() 
    # Rename old index from '' to Date
    data.columns = ['Date', *data.columns[1:]]
    # Add ticker to dataframe
    data['Ticker'] = ticker.ticker
    dfs.append(data)
data.iloc[:, :3]  # for display purposes
  Date        Research Development  Effect Of Accounting Charges ...
0 2020-12-31  27573000000.0         None ...
1 2019-12-31  26018000000.0         None ...
2 2018-12-31  21419000000.0         None ...
3 2017-12-31  16625000000.0         None ...

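Now that we have a list of dataframes, we need to iterate through them, fix any duplicate column headers, and concatenate them. The deduplication uses a private helper from pandas.io.parsers, so it may not be available in every pandas version.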

We’ll also reindex the dataframe to make it cleaner to use.

parser = pd.io.parsers.base_parser.ParserBase({'usecols': None})

for df in dfs:
     df.columns = parser._maybe_dedup_names(df.columns)
df = pd.concat(dfs, ignore_index=True)
df = df.set_index(['Ticker','Date'])
df.iloc[:,:5] # for display purposes
                   Research Development Effect Of Accounting Charges Income Before Tax
Ticker Date
FB     2020-12-31         18447000000.0                         None     33180000000.0
       2019-12-31         13600000000.0                         None     24812000000.0
       2018-12-31         10273000000.0                         None     25361000000.0
       2017-12-31          7754000000.0                         None     20594000000.0
AMZN   2020-12-31         42740000000.0                         None     24194000000.0
       2019-12-31         35931000000.0                         None     13962000000.0
       2018-12-31         28837000000.0                         None     11270000000.0
       2017-12-31         22620000000.0                         None      3802000000.0
NFLX   2020-12-31          1829600000.0                         None      3199349000.0
       2019-12-31          1545149000.0                         None      2062231000.0
       2018-12-31          1221814000.0                         None      1226458000.0
       2017-12-31           953710000.0                         None       485321000.0
GOOG   2020-12-31         27573000000.0                         None     48082000000.0
       2019-12-31         26018000000.0                         None     39625000000.0
       2018-12-31         21419000000.0                         None     34913000000.0
       2017-12-31         16625000000.0                         None     27193000000.0
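If you would rather not depend on a private pandas helper for the duplicate headers, the renaming can be done with a few lines of plain Python. This is a sketch of one possible scheme, not the pandas implementation: dedup_names is a hypothetical function that appends .1, .2, and so on to repeated names.

```python
from collections import Counter


def dedup_names(names):
    """Append .1, .2, ... to repeated names, leaving first occurrences untouched."""
    seen = Counter()
    result = []
    for name in names:
        count = seen[name]
        # The first occurrence keeps its name; later ones get a numeric suffix.
        result.append(f"{name}.{count}" if count else name)
        seen[name] += 1
    return result


print(dedup_names(["Net Income", "Date", "Net Income", "Net Income"]))
```

Inside the loop over the dataframes, this could be applied as df.columns = dedup_names(df.columns). Note that it does not guard against a column that is already literally named "Net Income.1".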

Congratulations! Now you have the ticker’s financial information organized by ticker and date. You can now use Pandas to pull out any data of interest.

How to Get Options Data Using yfinance

Options give traders the right but not the obligation to buy or sell underlying assets at a specific price at a predetermined date.

You’ll need to use the Ticker.options and Ticker.option_chain methods to download options data.

yfinance Options Methods

  • options returns the options expiry dates as a tuple.
  • option_chain returns a yfinance.ticker.Options object holding the calls and puts for one expiry. Pass an expiry date to choose it; if you don’t specify a date, you get the chain for the nearest expiry rather than every expiry.

aapl = yf.Ticker('aapl')
options = aapl.option_chain()
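To request a specific expiry, you first need to choose one of the dates from the options tuple. The sketch below picks the first expiry on or after a target date. next_expiry is a hypothetical helper, and sample_expiries is made-up data shaped like the 'YYYY-MM-DD' strings the tuple is assumed to contain.

```python
from datetime import date


def next_expiry(expiries, on_or_after):
    """Return the earliest 'YYYY-MM-DD' expiry on or after the given date, or None."""
    target = on_or_after.isoformat()
    # ISO date strings sort chronologically, so plain string comparison works.
    candidates = [expiry for expiry in expiries if expiry >= target]
    return min(candidates) if candidates else None


# Hypothetical expiry tuple shaped like the value of aapl.options.
sample_expiries = ("2022-01-14", "2022-01-21", "2022-02-18")
print(next_expiry(sample_expiries, date(2022, 1, 15)))
```

The chosen string could then be passed on as aapl.option_chain(next_expiry(aapl.options, date.today())).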

With a chain object, you’ll have the following available to you.

yfinance Option chain methods

Get yfinance Options Call Data

Use calls on the options object to get the call data.

calls = options.calls
calls

yfinance Apple call data

Get yfinance Options Put Data

Getting puts is just as easy. We’ll use options.puts to get the put data.

puts = options.puts
puts

yfinance Apple put data

How to Get Institutional Holders Using yfinance

You can also gauge institutional sentiment using yfinance.

aapl.institutional_holders
                      Holder      Shares Date Reported   % Out         Value
0           Vanguard Group, Inc. (The)  1266332667    2021-09-29  0.0775  179186072380
1                       Blackrock Inc.  1026223983    2021-09-29  0.0628  145210693594
2              Berkshire Hathaway, Inc   887135554    2021-09-29  0.0543  125529680891
3             State Street Corporation   622163541    2021-09-29  0.0381   88036141051
4                             FMR, LLC   350617759    2021-09-29  0.0215   49612412898
5        Geode Capital Management, LLC   259894947    2021-09-29  0.0159   36775135000
6           Northern Trust Corporation   195321532    2021-09-29  0.0120   27637996778
7        Price (T.Rowe) Associates Inc   188489966    2021-09-29  0.0115   26671330189
8    Norges Bank Investment Management   167580974    2020-12-30  0.0103   22236319440
9  Bank Of New York Mellon Corporation   149381117    2021-09-29  0.0091   21137428055

Why You Shouldn’t Use Yahoo Finance for Live Trading

Let’s grab the data for Facebook. Facebook recently changed its name to Meta.

fb = yf.Ticker('fb')
meta = yf.Ticker('meta')
fb.get_cashflow()
                                            2020-12-31    2019-12-31    2018-12-31    2017-12-31
Investments                               -1.452000e+10 -4.254000e+09  2.449000e+09 -1.325000e+10
Change To Liabilities                      9.100000e+07  2.360000e+08  2.740000e+08  4.700000e+07
Total Cashflows From Investing Activities -3.005900e+10 -1.986400e+10 -1.160300e+10 -2.011800e+10
Net Borrowings                            -5.800000e+08 -7.750000e+08  5.000000e+08  5.000000e+08
Total Cash From Financing Activities      -1.029200e+10 -7.299000e+09 -1.557200e+10 -5.235000e+09
Change To Operating Activities            -1.302000e+09  8.975000e+09  9.100000e+07  3.449000e+09
Net Income                                 2.914600e+10  1.848500e+10  2.211200e+10  1.593400e+10
Change In Cash                            -1.325000e+09  9.155000e+09  1.920000e+09 -9.050000e+08
Repurchase Of Stock                       -9.836000e+09 -6.539000e+09 -1.608700e+10 -5.222000e+09
Effect Of Exchange Rate                    2.790000e+08  4.000000e+06 -1.790000e+08  2.320000e+08
Total Cash From Operating Activities       3.874700e+10  3.631400e+10  2.927400e+10  2.421600e+10
Depreciation                               6.862000e+09  5.741000e+09  4.315000e+09  3.025000e+09
Other Cashflows From Investing Activities -3.600000e+07 -3.600000e+07 -3.600000e+07 -1.300000e+07
Change To Account Receivables             -1.512000e+09 -1.961000e+09 -1.892000e+09 -1.609000e+09
Other Cashflows From Financing Activities  1.240000e+08  1.500000e+07  1.500000e+07 -1.300000e+07
Change To Netincome                        5.462000e+09  4.838000e+09  4.374000e+09  3.370000e+09
Capital Expenditures                      -1.511500e+10 -1.510200e+10 -1.391500e+10 -6.733000e+09

And now for Meta…

meta.get_cashflow()
Empty DataFrame
Columns: [Open, High, Low, Close, Adj Close, Volume]
Index: []

Facebook and Meta are the same company, but they return different data. This is just one of the many risks of using Yahoo Finance.
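One way to soften this kind of surprise is to check whether a result is empty and fall back to an alternative symbol. The helper below is a generic sketch: first_non_empty is a hypothetical name, and the fetch argument stands in for a call such as lambda s: yf.Ticker(s).get_cashflow().

```python
def first_non_empty(symbols, fetch):
    """Return (symbol, data) for the first symbol whose data is non-empty.

    Returns (None, None) when every symbol comes back empty.
    """
    for symbol in symbols:
        data = fetch(symbol)
        # pandas DataFrames expose .empty; fall back to len() for other containers.
        is_empty = data.empty if hasattr(data, "empty") else len(data) == 0
        if not is_empty:
            return symbol, data
    return None, None


# Offline stand-in for the behaviour shown above: 'meta' empty, 'fb' populated.
fake_results = {"meta": [], "fb": ["cash flow rows"]}
print(first_non_empty(["meta", "fb"], fake_results.get))
```

A fallback like this hides the symptom rather than fixing the data, so it suits research scripts better than anything that trades real money.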

The Bottom Line

yfinance is a fantastic tool to grab data from Yahoo Finance. Yahoo Finance is probably the best source for free data.

Free data is free, though. And as I discussed and demonstrated above, I wouldn’t recommend it for live trading.

But if you’re looking to do some high-level research and free data is all you need, yfinance has got you covered.

Leo Smigel

Based in Pittsburgh, Analyzing Alpha is a blog by Leo Smigel exploring what works in the markets.