Pulling yahoo finance data with python

Python is fun. It's probably the language that is the most fun to play around with data in. I've been messing around with Yahoo finance data recently and I've always wanted to make it big on the stock market so let's get started making millions!

You'll need Python, the requests library and the pandas library.

Grab Tesla Motor's price history

Yahoo has a nice rest api for grabbing share market price history. For instance to download the price history for Tesla Motors we just need to download the link: http://ichart.finance.yahoo.com/table.csv?s=TSLA

import requests
import StringIO

def downloadFile(url):
    req = requests.get(url)
    if req.status_code != 200:
        raise Exception(req.status_code, 'Error downloading file ' + url)
    else:
        return req.text

csv = downloadFile('http://ichart.finance.yahoo.com/table.csv?s=TSLA')

This should print the contents of the TSLA csv price history:

>>> print(csv)
Date,Open,High,Low,Close,Volume,Adj Close
2013-09-27,187.52,191.28,186.43,190.90,5916400,190.90
2013-09-26,186.70,189.68,185.61,188.64,6614400,188.64
2013-09-25,183.56,186.30,180.50,185.24,8239000,185.24
2013-09-24,179.14,184.96,177.65,182.33,6273400,182.33
2013-09-23,184.48,185.48,177.11,181.11,8173400,181.11
2013-09-20,178.90,185.83,178.56,183.39,13376700,183.39
2013-09-19,170.80,180.47,169.08,177.92,15578300,177.92
2013-09-18,167.07,167.45,164.20,166.22,5330600,166.22
2013-09-17,165.08,168.42,163.36,166.23,5496900,166.23
2013-09-16,168.00,170.85,165.85,166.58,7574900,166.58
... and so on

So that gives us our data. Now we can use the excellent pandas library to load it and do some calculations.

But first we need to reverse the order of the lines so that when we're calculating our moving averages we don't miss the most recent days.

lines = csv.split('\n')
lines.reverse()
lines[0] = lines[-1] #move the header row back to the top
orderedcsv = "\n".join(lines[0:-1])

Now we can read the file properly:

from pandas import *

tsla = read_csv(StringIO.StringIO(orderedcsv))

tsla is now a pandas data frame. It has columns matching the header line from our csv.

>>> tsla
<class 'pandas.core.frame.DataFrame'>
Int64Index: 819 entries, 0 to 818
Data columns (total 7 columns):
Date         819  non-null values
Open         819  non-null values
High         819  non-null values
Low          819  non-null values
Close        819  non-null values
Volume       819  non-null values
Adj Close    819  non-null values
dtypes: float64(5), int64(1), object(1)

Let's check out the first 10 rows of tsla

>>> DataFrame(tsla, index=range(0,10))
         Date   Open   High    Low  Close    Volume  Adj Close
0  2010-06-29  19.00  25.00  17.54  23.89  18766300      23.89
1  2010-06-30  25.79  30.42  23.30  23.83  17187100      23.83
2  2010-07-01  25.00  25.92  20.27  21.96   8218800      21.96
3  2010-07-02  23.00  23.10  18.71  19.20   5139800      19.20
4  2010-07-06  20.00  20.00  15.83  16.11   6866900      16.11
5  2010-07-07  16.40  16.63  14.98  15.80   6921700      15.80
6  2010-07-08  16.14  17.52  15.57  17.46   7711400      17.46
7  2010-07-09  17.58  17.90  16.55  17.40   4050600      17.40
8  2010-07-12  17.95  18.07  17.00  17.05   2202500      17.05
9  2010-07-13  17.39  18.64  16.90  18.14   2680100      18.14

Now we can use the built in pandas rolling_mean() function to calculate some moving averages:

tsla['CloseMA20'] = rolling_mean(tsla.Close, 20)
tsla['CloseMA52'] = rolling_mean(tsla.Close, 52)

#Note that I cut out a couple columns here for brevity
>>> DataFrame(tsla, index=range(0,10), columns=['Date', 'Open', 'Close', 'Volume', 'CloseMA20', 'CloseMA52'])
         Date   Open  Close    Volume  CloseMA20  CloseMA52
0  2010-06-29  19.00  23.89  18766300        NaN        NaN
1  2010-06-30  25.79  23.83  17187100        NaN        NaN
2  2010-07-01  25.00  21.96   8218800        NaN        NaN
3  2010-07-02  23.00  19.20   5139800        NaN        NaN
4  2010-07-06  20.00  16.11   6866900        NaN        NaN
5  2010-07-07  16.40  15.80   6921700        NaN        NaN
6  2010-07-08  16.14  17.46   7711400        NaN        NaN
7  2010-07-09  17.58  17.40   4050600        NaN        NaN
8  2010-07-12  17.95  17.05   2202500        NaN        NaN
9  2010-07-13  17.39  18.14   2680100        NaN        NaN

Wait a minute, why are our new columns not numbers?

This is because we calculated rolling averages for these values but we didnt yet have enough previous days to calculate the average so they become NaN. If we look a bit further in to the dataframe we'll see that we start getting values as we expect:

>>> DataFrame(tsla, index=range(18,28), columns=['Date', 'Open', 'Close', 'Volume', 'CloseMA20', 'CloseMA52'])
          Date   Open  Close   Volume  CloseMA20  CloseMA52
18  2010-07-26  21.50  20.95   922200        NaN        NaN
19  2010-07-27  20.91  20.55   619700    19.8715        NaN
20  2010-07-28  20.55  20.72   467200    19.7130        NaN
21  2010-07-29  20.77  20.35   616000    19.5390        NaN
22  2010-07-30  20.20  19.94   426900    19.4380        NaN
23  2010-08-02  20.50  20.92   718100    19.5240        NaN
24  2010-08-03  21.00  21.95  1230500    19.8160        NaN
25  2010-08-04  21.95  21.26   913000    20.0890        NaN
26  2010-08-05  21.54  20.45   796200    20.2385        NaN
27  2010-08-06  20.10  19.59   741900    20.3480        NaN

>>> DataFrame(tsla, index=range(48,58), columns=['Date', 'Open', 'Close', 'Volume', 'CloseMA20', 'CloseMA52'])
          Date   Open  Close   Volume  CloseMA20  CloseMA52
48  2010-09-07  20.61  20.54   243400    19.4285        NaN
49  2010-09-08  20.66  20.90   288400    19.5220        NaN
50  2010-09-09  21.00  20.71   376200    19.6625        NaN
51  2010-09-10  20.75  20.17   386600    19.7910  19.856923
52  2010-09-13  20.89  20.72   360800    19.9110  19.795962
53  2010-09-14  20.54  21.12   654700    20.0280  19.743846
54  2010-09-15  20.98  21.98   684600    20.1695  19.744231
55  2010-09-16  22.15  20.94  2684500    20.2780  19.777692
56  2010-09-17  21.02  20.23  1198500    20.3500  19.856923
57  2010-09-20  20.67  21.06   947500    20.4480  19.958077

We now have a column of 20 day moving averages and a column of 52 day moving averages of the Close price.

Let's make ourselves a simple moving average cross trading system. We'll add a column called Crossover that tells us if today is a buy signal (when the 20 day average crossed above the 52 day average) or whether today is a sell signal (when the 20 day average crossed below the 52 day average). The values for the Crossover column will be: - 1 = Buy signal - 0 = No change - -1 = Sell signal

import math

def crossover(seriesA, seriesB):
  if len(seriesA) != len(seriesB):
    raise Exception(0, 'seriesA and seriesB are not the same length')

  comparisons = (seriesA - seriesB) / abs(seriesA - seriesB)
  cross = [0]
  for i in range(1,len(comparisons)):
    current = comparisons[i]
    prev = comparisons[i-1]
    if (math.isnan(current) or math.isnan(prev)) or current == prev:
      # No change so no crossover.  (or NaN so no crossover)
      cross.append(0)
    elif current != prev:
      cross.append(current)

  return cross

Now we have our function to calculate the crossover row, it's just a matter of adding the row to the tsla data frame:

tsla['Crossover'] = crossover(tsla.CloseMA20, tsla.CloseMA52)

We can now see which days would have been moving average crossover days and whether we needed to buy or sell:

>>> DataFrame(tsla[tsla['Crossover'] != 0], columns=['Date', 'Close', 'CloseMA20', 'CloseMA52', 'Crossover'])
           Date  Close  CloseMA20  CloseMA52  Crossover
52   2010-09-13  20.72    19.9110  19.795962          1
135  2011-01-10  28.45    28.7520  28.941346         -1
192  2011-04-01  26.66    23.7345  23.691154          1
278  2011-08-04  24.75    27.9745  28.105385         -1
323  2011-10-07  26.99    25.1540  24.968077          1
378  2011-12-27  28.57    30.4135  30.479423         -1
410  2012-02-13  31.49    29.3660  29.333654          1
460  2012-04-25  32.91    34.1180  34.261346         -1
508  2012-07-03  30.66    31.0105  30.980000          1
530  2012-08-03  27.27    30.4110  30.540385         -1
570  2012-10-01  29.16    29.2970  29.228654          1
581  2012-10-16  28.06    28.9760  29.161154         -1
599  2012-11-13  31.61    29.1675  29.115385          1
685  2013-03-20  35.95    36.3400  36.453462         -1
689  2013-03-26  37.86    36.7370  36.683462          1
>>>

So if we had been trading this particular strategy on TSLA for the past two years we would have had 15 trading events. Of course it's easy to pick a stock that has been trending heavily upward show that a simple moving average cross strategy would have made money, but hopefully this post has shown how fun and easy it is to do surprisingly powerful things with python and pandas.