How To Scrape Yahoo Finance Stock Data with Python

Yahoo Finance is one of the top sources for financial data on the web. It provides a wealth of information on stocks, bonds, commodities and more – all available for free. By web scraping Yahoo Finance with Python, you can collect years of historical stock prices and fundamental financial data to analyze in your models.

In this in-depth tutorial, we'll walk through how to build a robust Python web scraper for extracting data from Yahoo Finance. We'll cover the required tools and libraries, how to use Selenium to scrape dynamic page content, and the full code to store the extracted data in CSV format.

Whether you're a financial analyst, trader, data scientist, or developer, scraping financial websites like Yahoo Finance provides invaluable data for your analysis. Let's dive in and learn how to scrape it with Python!

Why Scrape Financial Data from Yahoo Finance?

There are numerous reasons you may want to collect data from Yahoo Finance programmatically:

  • Fundamental analysis: Scrape key financial metrics like P/E ratios, market cap, dividend yields etc. to evaluate and compare different stocks
  • Historical prices: Extract years of daily stock prices for back-testing trading strategies and building financial models
  • Financial news: Scrape the latest finance news to stay abreast of market-moving events and shifting sentiment
  • Bulk data: Quickly grab financial data for hundreds or thousands of stocks for large-scale analysis
  • Live prices: Get up-to-date, real-time stock quotes to power your trading algorithms

Manually obtaining this data is extremely tedious and time-consuming. By automating it with web scraping, you can collect huge amounts of valuable financial data with minimal effort.

Python is the ideal language for this, with its simple syntax and powerful web scraping libraries. Let's look at the tools we'll need.

Required Tools & Libraries for Web Scraping

Since Yahoo Finance is a dynamic website that heavily uses JavaScript to load data, we'll need tools capable of rendering and interacting with JS content. Here are the key libraries we'll use:

  • Python – Make sure you have Python 3.6+ installed on your machine. We'll be using Python for the entirety of the scraping script.

  • Selenium – Selenium is a powerful web automation tool that can fully render and interact with websites, including clicking buttons, filling forms, and scrolling pages. We'll use it to load the JavaScript content and navigate Yahoo Finance.

  • BeautifulSoup – A Python library for parsing HTML and XML content. After extracting the page source with Selenium, we'll use BeautifulSoup to locate and extract the specific stock data we want.

  • CSV – A built-in Python module for reading and writing CSV files. We'll use it to save our extracted stock data in a structured format for further analysis.

Make sure you have these libraries installed in your Python environment:

pip install selenium beautifulsoup4

We'll also need a web driver for Selenium to interface with the browser. I recommend using Firefox and geckodriver:

  1. Install Firefox
  2. Download geckodriver and add the executable to your system PATH
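
Before writing any scraping code, it's worth confirming that Selenium can actually launch headless Firefox on your machine. A minimal smoke test (assuming geckodriver is installed and on your PATH) might look like this:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# Launch Firefox without a visible window and load the Yahoo Finance homepage
options = Options()
options.add_argument("--headless")

driver = webdriver.Firefox(options=options)
try:
    driver.get("https://finance.yahoo.com")
    print(driver.title)  # should print the page title if everything is wired up
finally:
    driver.quit()

If this prints a page title, Selenium, Firefox, and geckodriver are all talking to each other correctly.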

With our tools ready, let's start building the Yahoo Finance scraper!

Scraping Yahoo Finance with Python and Selenium

Here is the complete Python script to scrape stock data from a Yahoo Finance page. We'll walk through it step-by-step:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import csv
import time

def scrape_stock(ticker):
    """
    Scrape financial data from a Yahoo Finance stock page
    :param ticker: str, stock symbol e.g. 'AAPL'
    :return: dict, scraped stock data
    """
    print(f"Scraping data for {ticker}...")

    # Set up Selenium web driver
    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Firefox(options=options)

    try:
        # Load the page
        driver.get(f"https://finance.yahoo.com/quote/{ticker}")

        # Wait for the page elements to load
        WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "quote-summary")))

        # Parse the page HTML
        soup = BeautifulSoup(driver.page_source, 'html.parser')

        # Extract the relevant data using BeautifulSoup's find() with attribute filters
        stock_data = {
            'symbol': ticker,
            'name': soup.find('h1', {'class': 'D(ib)'}).text,
            'price': soup.find('fin-streamer', {'data-test': 'qsp-price'}).text,
            'change': soup.find('fin-streamer', {'data-test': 'qsp-price-change'}).text,
            'percent_change': soup.find('fin-streamer', {'data-field': 'regularMarketChangePercent'}).text,
            'market_cap': soup.find('td', {'data-test': 'MARKET_CAP-value'}).text,
            'pe_ratio': soup.find('td', {'data-test': 'PE_RATIO-value'}).text,
            'dividend_yield': soup.find('td', {'data-test': 'DIVIDEND_AND_YIELD-value'}).text
        }

        return stock_data

    except Exception as e:
        print(f"Could not scrape {ticker}: {e}")
    finally:
        driver.quit()

def save_data(stock_data):
    """Save a list of scraped stock data to a CSV file"""
    print("Saving data to stock_data.csv")

    fieldnames = ['symbol', 'name', 'price', 'change', 'percent_change', 'market_cap', 'pe_ratio', 'dividend_yield']
    with open('stock_data.csv', 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(stock_data)

if __name__ == '__main__':
    # The stock symbols to scrape
    stocks = ['AAPL', 'MSFT', 'AMZN', 'GOOGL', 'META']

    stock_data = []
    for symbol in stocks:
        data = scrape_stock(symbol) 
        if data:
            stock_data.append(data)
        time.sleep(1)  # be nice and don't hammer the site

    save_data(stock_data)

Step 1 – Setup

First we import the required libraries. We'll use Selenium to load the Yahoo Finance page and render its JavaScript content. BeautifulSoup will then parse the HTML and extract the data we want. Finally, the csv module will let us save the data to a CSV file.

Next, set up the Selenium web driver to load the Yahoo Finance page. We configure it in headless mode to run in the background.

Step 2 – Navigating to the stock page

Inside the scrape_stock() function, we use Selenium's driver.get() method to load the Yahoo Finance page for the given stock ticker.

We wrap this in a try/except block to catch any errors while loading the page. The finally block ensures we always quit the web driver, so no headless browser processes are left running in the background.
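
As a side note, recent Selenium releases also let you use the driver as a context manager, which calls quit() automatically when the block exits. If you prefer that style over an explicit finally block, a sketch of the equivalent setup (not what the script above uses) looks like this:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument("--headless")

ticker = "AAPL"  # example ticker

# The with-block quits the browser automatically, even if an exception is raised
with webdriver.Firefox(options=options) as driver:
    driver.get(f"https://finance.yahoo.com/quote/{ticker}")
    html = driver.page_source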

Step 3 – Waiting for the page to render

Since the stock data loads dynamically, we need to wait for the relevant page elements to appear before scraping.

Using Selenium's WebDriverWait class together with the expected_conditions module (imported as EC), we explicitly wait up to 10 seconds for the 'quote-summary' div to be present on the page. This signals that the stock data has loaded.
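
If Yahoo ever renames the quote-summary container, you can wait on a more specific element instead, such as the price itself. A sketch using a CSS selector is below; it assumes the fin-streamer markup shown in the next step, so verify it against the live page:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Inside scrape_stock(), wait up to 10 seconds for the price element to render.
# The selector assumes a <fin-streamer data-test="qsp-price"> element exists;
# inspect the live page before relying on it.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'fin-streamer[data-test="qsp-price"]'))
)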

Step 4 – Extracting data with BeautifulSoup

Now that the full page content is loaded, we parse it into a BeautifulSoup object. We can then use BeautifulSoup's find() method with attribute filters to pinpoint and extract the specific stock data.

After inspecting the page source, we see the data we want is contained in elements like:

<h1 class="D(ib)">Apple Inc. (AAPL)</h1>
<fin-streamer data-test="qsp-price" class="Fw(b)">131.56</fin-streamer>  
<fin-streamer data-test="qsp-price-change" class="Fw(500)">-2.64</fin-streamer>
<fin-streamer data-field="regularMarketChangePercent" class="Fw(500)">(-1.96%)</fin-streamer>
<td data-test="MARKET_CAP-value" class="Ta(end)">2.117T</td>
<td data-test="PE_RATIO-value" class="Ta(end)">27.19</td>
<td data-test="DIVIDEND_AND_YIELD-value" class="Ta(end)">0.92 (0.70%)</td>

Using BeautifulSoup's find() with those attributes, we extract each piece of data and store it in a dictionary.
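
One caveat: soup.find() returns None when an element is missing, so any change to Yahoo's markup would make the dictionary above raise an AttributeError. A small helper, shown here as an optional sketch rather than part of the original script, makes the extraction more forgiving:

def get_text(soup, tag, attrs):
    """Return the stripped text of the first matching element, or None if not found."""
    element = soup.find(tag, attrs)
    return element.text.strip() if element else None

# Example usage inside scrape_stock(): these return None instead of raising
price = get_text(soup, 'fin-streamer', {'data-test': 'qsp-price'})
pe_ratio = get_text(soup, 'td', {'data-test': 'PE_RATIO-value'})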

Step 5 – Saving to CSV

After scraping data for each stock, we store it in a list of dictionaries. The save_data() function then uses csv.DictWriter to save this list of dictionaries to a CSV file.

We specify the CSV column names in the 'fieldnames' list to ensure the proper ordering. Finally, we call writeheader() to output the CSV header row, followed by writerows() to write out each row of stock data.
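
Once the file is written, you can read it back with the same csv module for a quick sanity check (or load it into pandas for analysis). For example:

import csv

# Print a few columns from the saved file to confirm the scrape worked
with open('stock_data.csv', newline='') as f:
    for row in csv.DictReader(f):
        print(row['symbol'], row['price'], row['pe_ratio'])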

Step 6 – Scraping multiple stocks

To scrape data for multiple stocks, simply populate the 'stocks' list with the desired ticker symbols. The script will loop through these, calling scrape_stock() for each one.

In between scraping each stock, we use time.sleep(1) to pause for 1 second. This is to avoid sending too many requests to Yahoo Finance in a short time period. We want to be respectful and avoid overloading their servers.
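
If you scrape a long list of tickers, a slightly randomized delay looks less mechanical than a fixed one-second pause. As an optional tweak, you could swap the time.sleep(1) call for something like:

import random
import time

# Pause between 1 and 3 seconds so requests are spaced out irregularly
time.sleep(random.uniform(1, 3))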

Final thoughts

With this script, you now have a reusable tool to automatically scrape Yahoo Finance and collect stock data for analysis. Selenium handles rendering the dynamic page content, while BeautifulSoup makes it easy to surgically extract just the data points we want.

This is just a starting point – feel free to customize and expand it to pull all the financial data you need. You could scrape historical prices, income statements, analyst ratings, and more. Just be sure to inspect the page HTML to find the right CSS selectors to extract the desired data.
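
For example, the historical prices for a ticker live at https://finance.yahoo.com/quote/{ticker}/history and are rendered as an HTML table once the page loads. A rough sketch of scraping them is below; it reuses the headless driver from the main script, and the table layout is an assumption you should verify against the live markup:

from bs4 import BeautifulSoup

ticker = "AAPL"
driver.get(f"https://finance.yahoo.com/quote/{ticker}/history")
soup = BeautifulSoup(driver.page_source, "html.parser")

# Assumes the history page renders its data as the first <table> on the page
table = soup.find("table")
if table:
    for row in table.find_all("tr")[1:]:  # skip the header row
        cells = [td.text for td in row.find_all("td")]
        if cells:
            print(cells)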

I hope this tutorial helps you get started with web scraping Yahoo Finance using Python and Selenium! Let me know in the comments if you have any questions.

