The S&P 500 Historical Components & Changes

Because of survivorship bias, getting an accurate list of the S&P 500's components over time is critical when developing a trading strategy. You can pay for the historical constituents or use various free sources.

Getting the S&P 500 Historical Constituents

Multiple paid and free data providers publish the S&P 500 constituents list. Finding the components of other indices can be more complex and generally requires a paid source. Below are the best free and paid resources I've found for S&P 500 constituents, along with Analyzing Alpha's files created from Wikipedia for convenience:

Source | Paid or Free
Siblis Research S&P 500 Historical Components | Paid
Norgate Data Historical Index Constituents | Paid
iShares Core S&P 500 ETF | Free
Wikipedia List of S&P 500 Companies | Free
Analyzing Alpha (Components without History, from Wikipedia) | Free
Analyzing Alpha (Components with History, from Wikipedia) | Free

Download the S&P 500 Historical Components

If you're just here for the CSV data, use the files below, which are offered freely under a Creative Commons license:

  1. CSV File of S&P 500 Constituents
  2. CSV File of S&P 500 Historical Changes
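If you go this route, here's a minimal sketch of loading the files with pandas. The filenames are my assumption; use whatever names you saved the downloads under:

import pandas as pd

# Hypothetical local filenames for the two downloads above. The
# parse_dates argument assumes a 'date' column, matching the dataset
# we build later in this article.
constituents = pd.read_csv('sp500_constituents.csv', parse_dates=['date'])
history = pd.read_csv('sp500_history.csv', parse_dates=['date'])

print(constituents.head())
print(history.head())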

Creating Your Own S&P 500 Components List

I'll show you how to create your own S&P 500 constituents dataset using Python by web scraping Wikipedia, as it provides more S&P 500 history than most free sources. If you're following along, a Jupyter notebook is the best tool for this sort of data manipulation and cleanup.

You’ll notice that Wikipedia stays up-to-date and includes the S&P 500 additions and deletions for 2022.

Scraping the Constituents with Pandas

pandas.read_html scrapes a web page for HTML tables and returns them as a list of dataframes. I import the required libraries and grab the data.

import datetime as dt
import pandas as pd

# Scrape every HTML table on the page into a list of dataframes.
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
data = pd.read_html(url)

If you check out the Wikipedia List of S&P 500 Companies page, you'll notice one table contains the current S&P 500 components and another lists the historical changes.
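Since read_html returns the tables in page order, it's worth a quick sanity check that data[0] and data[1] are the tables we expect; a future Wikipedia edit could add a table and shift the positions:

# data is the list returned by pd.read_html above.
print(len(data))       # number of tables found on the page
print(data[0].head())  # current S&P 500 components
print(data[1].head())  # historical changes (additions and removals)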

Let's grab the first table and use pandas to manipulate it into the format we want. I verify the dates with a regular expression, flagging any row where the date is missing or not formatted as YYYY-MM-DD (such as AT&T's "1983-11-30 (1957-03-04)", which includes the index's first day of trading in parentheses).

# Grab the current constituents table and keep the ticker, name,
# date-added, and CIK columns.
sp500 = data[0].iloc[:, [0, 1, 6, 7]]
sp500.columns = ['ticker', 'name', 'date', 'cik']

# Flag rows where the date is missing or not in YYYY-MM-DD format.
mask = ~sp500['date'].str.strip().str.fullmatch(r'\d{4}-\d{2}-\d{2}', na=False)
sp500[mask].head()
ticker 	name 	date 	cik
7 	AMD 	Advanced Micro Devices 	NaN 	2488
51 	T 	AT&T 	1983-11-30 (1957-03-04) 	732717
126 	ED 	Consolidated Edison 	NaN 	1047862
130 	GLW 	Corning 	NaN 	24741
138 	DHR 	Danaher Corporation 	NaN 	313616
139 	DRI 	Darden Restaurants 	NaN

Fill the Missing Data

Next, we'll use zfill to zero-pad the CIK code, as it's a ten-digit string rather than an integer, and set all missing dates to 1900-01-01. Hopefully, the community can help fill in these gaps!

# Fill flagged dates with a 1900-01-01 sentinel, convert the column
# to datetimes, and zero-pad the CIK to ten digits.
current = sp500.copy()
current.loc[mask, 'date'] = '1900-01-01'
current.loc[:, 'date'] = pd.to_datetime(current['date'])
current.loc[:, 'cik'] = current['cik'].apply(str).str.zfill(10)

With the current table organized, it's time to work on the historical adjustments, again using pandas to wrangle the data. We'll create a dataframe for additions and one for removals, then concatenate them.

# Get the adjustments dataframe and rename columns
adjustments = data[1]
columns = ['date', 'ticker_added','name_added', 'ticker_removed', 'name_removed', 'reason']
adjustments.columns = columns

# Create additions dataframe.
additions = adjustments[~adjustments['ticker_added'].isnull()][['date','ticker_added', 'name_added']]
additions.columns = ['date','ticker','name']
additions['action'] = 'added'

# Create removals dataframe.
removals = adjustments[~adjustments['ticker_removed'].isnull()][['date','ticker_removed','name_removed']]
removals.columns = ['date','ticker','name']
removals['action'] = 'removed'

# Merge the additions and removals into one dataframe.
historical = pd.concat([additions, removals])
historical.head()
 	date 	ticker 	name 	action
0 	September 20, 2021 	MTCH 	Match Group 	added
1 	September 20, 2021 	CDAY 	Ceridian 	added
2 	September 20, 2021 	BRO 	Brown & Brown 	added
3 	August 30, 2021 	TECH 	Bio-Techne 	added
4 	July 21, 2021 	MRNA 	Moderna 	added

Now that we have both the current and historical data, let's add any tickers that are in the S&P 500 index but missing from Wikipedia's change history.

# Find current constituents with no entry in the change history and
# record them as additions. The CIK was already zero-padded above.
missing = current[~current['ticker'].isin(historical['ticker'])].copy()
missing['action'] = 'added'
missing = missing[['date', 'ticker', 'name', 'action', 'cik']]
missing.head()
date 	ticker 	name 	action 	cik
0 	1976-08-09 	MMM 	3M 	added 	0000066740
1 	1964-03-31 	ABT 	Abbott Laboratories 	added 	0000001800
6 	1997-05-05 	ADBE 	Adobe 	added 	0000796343
9 	1998-10-02 	AES 	AES Corp 	added 	0000874761
10 	1999-05-28 	AFL 	Aflac 	added 	0000004977

Merge and Dedup the Data

We'll now combine the historical changes with the missing constituents and then dedupe them.

# Convert the scraped dates (e.g., 'September 20, 2021') to datetimes
# so the sort below is chronological, not alphabetical.
historical['date'] = pd.to_datetime(historical['date'])

sp500_history = pd.concat([historical, missing])
sp500_history = sp500_history.sort_values(by=['date', 'ticker'], ascending=[False, True])
sp500_history = sp500_history.drop_duplicates(subset=['date', 'ticker'])
sp500_history
 	date 	ticker 	name 	action 	cik
112 	September 8, 2016 	CHTR 	Charter Communications 	added 	NaN
112 	September 8, 2016 	EMC 	EMC Corporation 	removed 	NaN
113 	September 6, 2016 	MTD 	Mettler Toledo 	added 	NaN
113 	September 6, 2016 	TYC 	Tyco International 	removed 	NaN
208 	September 5, 2012 	LYB 	LyondellBasell 	added 	NaN
... 	... 	... 	... 	... 	...
484 	1900-01-01 00:00:00 	WAT 	Waters Corporation 	added 	0001000697
493 	1900-01-01 00:00:00 	WHR 	Whirlpool Corporation 	added 	0000106640
483 	1900-01-01 00:00:00 	WM 	Waste Management 	added 	0000823768
491 	1900-01-01 00:00:00 	WRK 	WestRock 	added 	0001732845
492 	1900-01-01 00:00:00 	WY 	Weyerhaeuser 	added 	0000106535
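Since survivorship bias is the whole reason to build this dataset, here's a minimal sketch of how you might query point-in-time membership. The members_as_of helper is my own illustration, not part of the article's notebook; it assumes the datetime-converted history above, and rows we filled with the 1900-01-01 sentinel are treated as members for any date:

def members_as_of(history, when):
    """Return tickers whose most recent add/remove event on or
    before `when` is an addition."""
    events = history[history['date'] <= pd.Timestamp(when)]
    latest = events.sort_values('date').groupby('ticker').tail(1)
    return set(latest.loc[latest['action'] == 'added', 'ticker'])

# For example, approximate the index membership at the start of 2020.
len(members_as_of(sp500_history, '2020-01-01'))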

Export Data to CSV

And finally, we'll export both dataframes to CSV files for download, which you can find in this notebook and the associated files on the Analyzing Alpha GitHub.
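The export itself is a one-liner per dataframe; here's a minimal sketch, with filenames as my assumption:

# Write the current constituents and the full change history to CSV.
# index=False keeps the pandas row index out of the files.
current.to_csv('sp500_constituents.csv', index=False)
sp500_history.to_csv('sp500_history.csv', index=False)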
