Green-Hand-Zhang Database





Kerry Back

Green, Hand, and Zhang 2017

  • 100+ predictors studied in other papers
  • monthly data, predictors known at beginning of month
  • linear regressions of returns on predictors
  • predictors are described in appendix of paper
  • Green, Hand, and Zhang, 2017

Gu, Kelly, and Xiu, 2020

  • Same predictors plus some macro variables, monthly data
  • Neural networks and random forests
  • Very high Sharpe ratios
  • Gu, Kelly, and Xiu, 2020

SQL Database

  • GHZ predictors and returns, monthly 2000-2021
  • all non-micro-cap U.S. stocks
    • meaning stocks with market caps above the NYSE 20th percentile
    • number of stocks varies by month in the range of 1,500 to 2,700
  • hosted on CloudClusters.net, no need to be on Rice network

Connect to the database

from sqlalchemy import create_engine
import pymssql
import pandas as pd

server = "mssql-82792-0.cloudclusters.net:16272"
username = "user"
password = "RiceOwls1912" 
database = "ghz"
string = "mssql+pymssql://" + username + ":" + password + "@" + server + "/" + database

conn = create_engine(string).connect()

Example: ROE (quarterly) and book-to-market

df = pd.read_sql(
    """
    select ticker, date, ret, roeq, bm
    from data
    order by ticker, date
    """, 
    conn
)
df = df.dropna()
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1018855 entries, 11 to 1022114
Data columns (total 5 columns):
 #   Column  Non-Null Count    Dtype  
---  ------  --------------    -----  
 0   ticker  1018855 non-null  object 
 1   date    1018855 non-null  object 
 2   ret     1018855 non-null  float64
 3   roeq    1018855 non-null  float64
 4   bm      1018855 non-null  float64
dtypes: float64(3), object(2)
memory usage: 46.6+ MB

df[df.ticker=="AAPL"].head()
ticker date ret roeq bm
2490 AAPL 2000-01 0.009119 0.093333 0.289471
2491 AAPL 2000-02 0.104819 0.093333 0.289471
2492 AAPL 2000-03 0.184842 0.037374 0.289471
2493 AAPL 2000-04 -0.086516 0.037374 0.290160
2494 AAPL 2000-05 -0.322922 0.037374 0.290160