Predicting Current Quarterback Win Rates

By: Michael Del Bene

Introduction¶

In the NFL, and in football in general, the quarterback is said to be the most important player on the field. This is because he has the ball in his hands on every offensive play and also makes a lot of the play calls. He is usually the leader of his team, on and off the field. With that in mind, I am interested in how a quarterback's statistics affect his win rate, so I am going to attempt to predict QB win percentages from those statistics using data analysis techniques.

Data Scraping¶

To begin my data science project, I first need to import all of the libraries I am going to be using, which is what the next cell accomplishes. After this I will scrape the data, that is, gather it from a website that has the information I need for the analysis. I will be using www.pro-football-reference.com, a website that has all the data I want about quarterbacks from the 2019 season and the 2020 season so far.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
import requests
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
from statsmodels.formula.api import ols
import seaborn as sns
!pip3 install lxml

Requirement already satisfied: lxml in /opt/conda/lib/python3.8/site-packages (4.6.2)

url = "https://www.pro-football-reference.com/years/2019/passing.htm"
reqs = []

for i in range(2019, 2021):
    reqs.append(requests.get(url))
    # advance the season in the url: 2019 becomes 2020
    url = url.replace(str(i), str(i+1))
tables = []
for r in reqs:
    root = BeautifulSoup(r.content, "html.parser")
    tables.append(root.find("table"))
# convert each table to a dataframe. tables[0], the first page we scraped, is the
# 2019 season and tables[1] is the 2020 season
temp = pd.read_html(str(tables[0]))[0]
data_2020 = pd.read_html(str(tables[1]))[0]
temp

My variable temp is storing the 2019 passer data and the variable data_2020 is storing the 2020 passer dataframe. I also want to look at QB rushing statistics so I will be scraping from a different table from the same website to accomplish this.

url = "https://www.pro-football-reference.com/years/2019/rushing.htm"
reqs = []

for i in range(2019, 2021):
    reqs.append(requests.get(url))
    # advance the season in the url: 2019 becomes 2020
    url = url.replace(str(i), str(i+1))
tables = []
for r in reqs:
    root = BeautifulSoup(r.content, "html.parser")
    tables.append(root.find("table"))
# convert each table to a dataframe. tables[0], the first page we scraped, is the
# 2019 season and tables[1] is the 2020 season
rushing = pd.read_html(str(tables[0]))[0]
rushing_2020 = pd.read_html(str(tables[1]))[0]
rushing

The variable rushing now stores the rushing data for 2019 and rushing_2020 for the 2020 season.
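The two scraping cells differ only in the stat name and the season, so the page address can also be built directly instead of being rewritten with `replace`. This is a minimal sketch, assuming the site keeps the `/years/<season>/<stat>.htm` pattern seen in the URLs above; `season_url` is a hypothetical helper and is not part of the original notebook.

```python
BASE = "https://www.pro-football-reference.com/years"

def season_url(stat: str, season: int) -> str:
    """Build the page URL for one stat table ("passing" or "rushing") and one season."""
    return f"{BASE}/{season}/{stat}.htm"

# One URL for every (stat, season) pair used in this project
urls = {
    (stat, season): season_url(stat, season)
    for stat in ("passing", "rushing")
    for season in (2019, 2020)
}
print(urls[("rushing", 2020)])
```

Building the address from its parts avoids accidentally replacing a "2019" that appears somewhere else in a longer URL.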

Data Tidying¶

Now that I have scraped all of the data I need, I will tidy it, which means preparing it to be analyzed. First, I will remove unnecessary rows, and then the columns I do not need, for each of the 4 dataframes I am working with. As part of the analysis, I only look at QBs who started at least half of the games that season, to reduce possible bias and to keep the stats consistent. Also, because we care about win ratio, we need to parse the column that contains wins, losses, and ties so that we can divide wins (counting each tie as half a win) by games started and create a column called win percentage. This becomes the most important column in our dataframes because it is what we compare each statistic to.

# the column titles are repeated as rows in the middle of the table. this removes them
temp = temp[~temp["GS"].str.contains("GS")]

# preparing QBrec for further analysis 

temp["QBrec"] = temp["QBrec"].astype(str)
temp = temp[~temp["QBrec"].str.contains("NaN")]
temp = temp[~temp["QBrec"].str.contains("nan")]

temp["GS"] = pd.to_numeric(temp["GS"]) 


data_2020 = data_2020[~data_2020["GS"].str.contains("GS")]
data_2020["QBrec"] = data_2020["QBrec"].astype(str)
data_2020 = data_2020[~data_2020["QBrec"].str.contains("NaN")]
data_2020 = data_2020[~data_2020["QBrec"].str.contains("nan")]

data_2020["GS"] = pd.to_numeric(data_2020["GS"])

<ipython-input-4-f7cabff60bdc>:6: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  temp["QBrec"] = temp["QBrec"].astype(str)

rushing.columns = ['Rk', 'Player', 'Tm', 'Age','Pos','G','GS','RAtt','RYds', 'RTD', '1D', 'Lng', 'RY/A', 'RY/G', 'Fmb']


# remove column titles from middle of dataframe
rushing = rushing[~rushing["GS"].str.contains("GS")]

rushing["GS"] = pd.to_numeric(rushing["GS"]) 

rushing = rushing.loc[rushing['Pos'] == 'QB']
rushing = rushing.drop(["1D", "Lng", "Tm", "Age", "Pos", "G", "GS", "Rk"], axis = 1)


rushing_2020.columns = ['Rk', 'Player', 'Tm', 'Age','Pos','G','GS','RAtt','RYds', 'RTD', '1D', 'Lng', 'RY/A', 'RY/G', 'Fmb']


# remove column titles from middle of dataframe
rushing_2020 = rushing_2020[~rushing_2020["GS"].str.contains("GS")]

rushing_2020["GS"] = pd.to_numeric(rushing_2020["GS"]) 

rushing_2020 = rushing_2020.loc[rushing_2020['Pos'] == 'QB']
rushing_2020 = rushing_2020.drop(["1D", "Lng", "Tm", "Age", "Pos", "G", "GS", "Rk"], axis = 1)
rushing_2020.head()

<ipython-input-5-c4041bf072e4>:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rushing["GS"] = pd.to_numeric(rushing["GS"])

Here, we have a list of all of the statistics represented by column heads in our dataframes.

Rk -- Rank This is a count of the rows from top to bottom. It is recalculated following the sorting of a column.

Age -- Player's age on December 31st of that year

Pos -- Position

G -- Games played

GS -- Games started as an offensive or defensive player

QBrec -- Team record in games started by this QB (regular season)

Cmp -- Passes completed

Att -- Passes attempted

Cmp% -- Percentage of Passes Completed

Yds -- Yards Gained by Passing (For teams, sack yardage is deducted from this total)

TD -- Passing Touchdowns

TD% -- Percentage of Touchdowns Thrown when Attempting to Pass

Int -- Interceptions thrown

Int% -- Percentage of Times Intercepted when Attempting to Pass

1D -- First downs passing

Lng -- Longest Completed Pass Thrown (complete since 1975)

Y/A -- Yards gained per pass attempt

AY/A -- Adjusted Yards gained per pass attempt: (Passing Yards + 20 * Passing TD - 45 * Interceptions) / (Passes Attempted)

Y/C -- Yards gained per pass completion (Passing Yards) / (Passes Completed)

Y/G -- Yards gained per game played

Rate -- Quarterback Rating

QBR -- QBR (ESPN's Total Quarterback Rating, calculated since 2006)

Sk -- Times Sacked (first recorded in 1969, player per game since 1981)

Yds -- Yards lost due to sacks (first recorded in 1969, player per game since 1981). This second Yds column appears as Yds.1 in the dataframes.

NY/A -- Net Yards gained per pass attempt (Passing Yards - Sack Yards) / (Passes Attempted + Times Sacked)

ANY/A -- Adjusted Net Yards per Pass Attempt: (Passing Yards - Sack Yards + (20 * Passing TD) - (45 * Interceptions)) / (Passes Attempted + Times Sacked)

Sk% -- Percentage of Time Sacked when Attempting to Pass: Times Sacked / (Passes Attempted + Times Sacked)

4QC -- Comebacks led by quarterback. Must be an offensive scoring drive in the 4th quarter, with the team trailing by one score, though not necessarily a drive to take the lead. Only games ending in a win or tie are included.

GWD -- Game-winning drives led by quarterback. Must be an offensive scoring drive in the 4th quarter or overtime that puts the winning team ahead for the last time.
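The per-attempt formulas in the list above can be checked against a row of the scraped table. The sketch below writes them as plain functions (the function names are my own) and plugs in Jameis Winston's 2019 line: 5109 passing yards, 33 touchdowns, 626 attempts, 47 sacks, and 282 sack yards. His 30 interceptions are not visible in the truncated output, so that value is an assumption inferred from Int% (4.8) times attempts.

```python
def ay_per_att(pass_yds, pass_td, ints, att):
    """AY/A: adjusted yards per pass attempt."""
    return (pass_yds + 20 * pass_td - 45 * ints) / att

def ny_per_att(pass_yds, sack_yds, att, sacks):
    """NY/A: net yards per pass attempt."""
    return (pass_yds - sack_yds) / (att + sacks)

def any_per_att(pass_yds, sack_yds, pass_td, ints, att, sacks):
    """ANY/A: adjusted net yards per pass attempt."""
    return (pass_yds - sack_yds + 20 * pass_td - 45 * ints) / (att + sacks)

def sack_pct(sacks, att):
    """Sk%: share of dropbacks that end in a sack, as a percentage."""
    return 100 * sacks / (att + sacks)

# Jameis Winston, 2019 (interceptions inferred from Int% * Att, an assumption)
print(round(ay_per_att(5109, 33, 30, 626), 1))
print(round(ny_per_att(5109, 282, 626, 47), 2))
print(round(any_per_att(5109, 282, 33, 30, 626, 47), 2))
print(round(sack_pct(47, 626), 1))
```

The rounded results should line up with the AY/A, NY/A, ANY/A, and Sk% columns shown for him in the table output.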

# Only looking at quarterbacks who started at least half of the season's games

data = temp[temp["GS"] >= 8]

# Drop unnecessary columns
data = data.drop(["Cmp", "Att", "TD%", "1D", "Lng", "NY/A", "ANY/A", "4QC", "GWD"], axis=1)

data_2020 = data_2020[data_2020["GS"] >= 6]

data_2020 = data_2020.drop(["Cmp", "Att", "TD%", "1D", "Lng", "ANY/A", "4QC", "GWD"], axis=1)

data.head()

# Parse the record column so we can form win percentage
data = (data
         .assign(Wins= data.QBrec.str.split('-').str.get(0),
                 Losses = data.QBrec.str.split('-').str.get(-2),
                 Ties = data.QBrec.str.split('-').str.get(-1)
            )
          )

# Creation of win percentage column based on record
data["Win Percentage"] = 100 * (data["Wins"].astype(int) + data["Ties"].astype(int)*0.5) / data["GS"].astype(int)


data = data.drop(["QBrec", "Wins", "Losses", "Ties"], axis=1)

# Merge passing and rushing data 
final = pd.merge(data, rushing, how = "inner", left_on = "Player", right_on = "Player")

final = final.fillna(0)
# renumber rows 0..n-1 without hard-coding the number of quarterbacks
final = final.reset_index(drop=True)
final.head()

# Repeat process for 2020 dataframe

data_2020 = (data_2020
         .assign(Wins= data_2020.QBrec.str.split('-').str.get(0),
                 Losses = data_2020.QBrec.str.split('-').str.get(-2),
                 Ties = data_2020.QBrec.str.split('-').str.get(-1)
            )
          )

data_2020["Win Percentage"] = 100 * (data_2020["Wins"].astype(int) + data_2020["Ties"].astype(int)*0.5) / data_2020["GS"].astype(int)


data_2020 = data_2020.drop(["QBrec", "Wins", "Losses", "Ties"], axis=1)

final_2020 = pd.merge(data_2020, rushing_2020, how = "inner", left_on = "Player", right_on = "Player")

final_2020 = final_2020.fillna(0)
final_2020.head()
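The record parsing used above can be illustrated on a single `QBrec` string. This is a small sketch with a hypothetical helper, `win_percentage`; it assumes the record always has the form wins-losses-ties and that games started equals the sum of the three parts, whereas the notebook divides by the `GS` column.

```python
def win_percentage(record: str) -> float:
    """Turn a 'W-L-T' record string into a win percentage, counting a tie as half a win."""
    wins, losses, ties = (int(part) for part in record.split("-"))
    games = wins + losses + ties  # assumed to match the GS column
    return 100 * (wins + 0.5 * ties) / games

print(win_percentage("9-7-0"))   # Jared Goff's 2019 record
print(win_percentage("12-4-0"))  # Tom Brady's 2019 record
print(win_percentage("10-5-1"))  # hypothetical record with a tie
```

The first two values match the Win Percentage column shown for those players in the merged 2019 table.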

Exploratory Data Analysis¶

In this section I will try to answer the question, "Which quarterback statistics affect win ratio the most?" This will allow me to then perform a linear regression analysis on these statistics so that I can find weights for them in my formula to predict win ratio. I will do this by plotting data from the dataframe using matplotlib so that I can make visual comparisons, and I will also use a line of best fit to visualize trends.
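For readers who have not met a line of best fit before, a degree-1 fit is an ordinary least-squares line. The sketch below computes the slope and intercept by hand on made-up numbers; `fit_line` is a hypothetical helper, and the plots in this section use `np.polyfit` with `deg = 1` rather than this function.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y move together around their means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data lying exactly on y = 2x, so the fit should recover that line
slope, intercept = fit_line([1, 2, 3], [2, 4, 6])
print(slope, intercept)
```

A positive slope means win percentage tends to rise with the statistic, and a negative slope means it tends to fall.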

The first statistic I will compare with win ratio is total yards per game (passing plus rushing), because it is generally thought that the more yards a quarterback gains the better, but is this true?

# Yards compared to win rate 

x = final['Y/G'].astype(float) + final['RY/G'].astype(float)
y = final['Win Percentage'].astype(float)
z = np.polyfit(x = x, y = y, deg = 1)
f = np.poly1d(z)
x2 = np.linspace(x.min(), x.max(), 100)
y2 = f(x2)
plt.figure(figsize=(15,10))
plt.plot(x, y,'o', x2, y2)


plt.title("Winning Percentage vs. Total Yards per Game")
plt.xlabel("Total Yards per Game")
plt.ylabel("Winning Percentage")

for i, txt in enumerate(final["Player"]):
        plt.annotate(txt, (x[i], y[i]), size = 8)

plt.show()

Apparently, gaining more yards may not be for the better. I can actually see a slight negative trend on this plot, which is extremely interesting. An overwhelming majority of quarterbacks gain between 220 and 300 yards per game, but their win rates fluctuate widely, so this statistic does not appear to drive win rate. Quarterbacks like Jameis Winston, Dak Prescott, and Matthew Stafford had the most total yards per game, but were only winning about 50% or less of their games. Let's look deeper and see why this might be.

The next statistic I will look at is touchdowns per game.

x = final['TD'].astype(float)/final['G'].astype(float)
y = final['Win Percentage'].astype(float)
z = np.polyfit(x = x, y = y, deg = 1)
f = np.poly1d(z)
x2 = np.linspace(x.min(), x.max(), 100)
y2 = f(x2)
plt.figure(figsize=(15,10))
plt.plot(x, y,'o', x2, y2)


plt.title("Winning Percentage vs. Touchdowns per Game")
plt.xlabel("Touchdowns per Game")
plt.ylabel("Winning Percentage")

for i, txt in enumerate(final["Player"]):
        plt.annotate(txt, (x[i], y[i]), size = 8)

plt.show()

As expected, I can see a clear trend that the more touchdowns a quarterback throws per game, the higher his win percentage will be. This makes a lot of sense because you have to score points to win games, and the quarterbacks who are throwing more touchdowns for their teams are winning more games. However, Jameis Winston, who averaged the most yards per game, also had a high touchdowns per game statistic, but his win ratio was still only around 44%. So clearly touchdowns are not the only thing that contributes to an NFL quarterback's success.

I will now factor in turnovers, and compare a Quarterback's touchdowns to his fumbles and interceptions, to see if I can get more insights on why some Quarterbacks with high statistics are losing so many games.

# turnovers are interceptions plus fumbles
x = final['TD'].astype(float)/(final['Int'].astype(float) + final['Fmb'].astype(float))
y = final['Win Percentage'].astype(float)
z = np.polyfit(x = x, y = y, deg = 1)
f = np.poly1d(z)
x2 = np.linspace(x.min(), x.max(), 100)
y2 = f(x2)
plt.figure(figsize=(15,10))
plt.plot(x, y,'o', x2, y2)


plt.title("Winning Percentage vs. Touchdowns per Turnover")
plt.xlabel("Touchdowns per Turnover")
plt.ylabel("Winning Percentage")

for i, txt in enumerate(final["Player"]):
        plt.annotate(txt, (x[i], y[i]), size = 8)

plt.show()

This looks to be the most telling statistic yet. I finally see the beloved Jameis Winston at the bottom end of this plot. I can conclude that the reason he is not winning games, despite throwing for so many yards and touchdowns, is that he has fewer touchdowns than turnovers. His 33 touchdowns against 30 interceptions and 12 fumbles come to a rate of about 0.8:1, which helps explain why his win percentage is so low. The most efficient quarterbacks had the highest win percentages. Russell Wilson, Drew Brees, Patrick Mahomes, Lamar Jackson, and Aaron Rodgers threw between 5 and 7 touchdowns per interception and also won at least 60% of their games. It is also worth noting that all of these players were selected to the Pro Bowl and Lamar was the league MVP. So, I can conclude here that touchdowns per turnover has a clear relationship with win ratio and is a very important statistic for quarterbacks. However, there is still more to discover, because we still see some quarterbacks with high win percentages but low touchdowns per turnover.
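The ratio on the x-axis is simple enough to spell out, and doing so shows one edge case the plot hides: a quarterback with no interceptions and no fumbles would cause a division by zero. The sketch below uses a hypothetical helper, `td_per_turnover`, and returns NaN in that case, which is my own choice and not something the notebook does.

```python
import math

def td_per_turnover(td, ints, fumbles):
    """Touchdowns per turnover, where turnovers = interceptions + fumbles."""
    turnovers = ints + fumbles
    if turnovers == 0:
        # No turnovers at all: the ratio is undefined, so flag it instead of crashing
        return math.nan
    return td / turnovers

# Jameis Winston, 2019: 33 TD, 30 Int, 12 Fmb
print(round(td_per_turnover(33, 30, 12), 2))
```

Note that the Fmb column counts all fumbles, including ones the offense recovered, so this ratio is somewhat harsher than a count of true turnovers would be.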

Let's take a look at another negative statistic for Quarterbacks and see how many times they are sacked per game.

x = final['Sk'].astype(float)/final['G'].astype(float)
y = final['Win Percentage'].astype(float)
z = np.polyfit(x = x, y = y, deg = 1)
f = np.poly1d(z)
x2 = np.linspace(x.min(), x.max(), 100)
y2 = f(x2)
plt.figure(figsize=(15,10))
plt.plot(x, y,'o', x2, y2)


plt.title("Winning Percentage vs. Sacks per Game")
plt.xlabel("Sacks per Game")
plt.ylabel("Winning Percentage")

for i, txt in enumerate(final["Player"]):
        plt.annotate(txt, (x[i], y[i]), size = 8)

plt.show()

I can clearly see here that there is a negative trend: the more sacks a quarterback takes per game, the lower his win percentage tends to be. I see similar names like Lamar Jackson, Patrick Mahomes, and Drew Brees at the better end of my data once again, and names like Jameis Winston and Andy Dalton towards the worse half. This is definitely an important statistic and I will use it in my calculation for rating the best winning quarterbacks.

Let's now look at completion percentages and see how that statistic plays a factor.

x = final['Cmp%'].astype(float)
y = final['Win Percentage'].astype(float)
z = np.polyfit(x = x, y = y, deg = 1)
f = np.poly1d(z)
x2 = np.linspace(x.min(), x.max(), 100)
y2 = f(x2)
plt.figure(figsize=(15,10))
plt.plot(x, y,'o', x2, y2)


plt.title("Winning Percentage vs. Completion Percentage")
plt.xlabel("Completion Percentage")
plt.ylabel("Winning Percentage")

for i, txt in enumerate(final["Player"]):
        plt.annotate(txt, (x[i], y[i]), size = 8)

plt.show()

I see a slight trend here in better completion percentage resulting in a higher win percentage, but it is not overwhelming. Some Quarterbacks with very high win percentages have low completion percentages. This could be the result of them throwing more passes, so I will have to consider that in our calculation.

Linear Regression Models¶

I have used plots to visualize winning percentage vs. QB stats, but I need to be more precise so that I can put weights on stats in my formula for predicting QB win percentage. So, I will now fit a linear regression with the statsmodels API to get an exact coefficient for how each stat affects win percentage. The p-value reported for each coefficient also lets me do some hypothesis testing on whether a stat affects win percentage or not. I can use the coefficient as the weight for that stat in the equation I make, and hopefully I will be able to create a rating that tracks win percentage at a 1:1 ratio.

X = final['Sk'].astype(float)/final['G'].astype(float)
Y = final["Win Percentage"].astype(float)

X = sm.add_constant(X)

model = sm.OLS(endog=Y,exog=X)

results = model.fit()
results.summary()

I see a clear effect here: win percentage decreases by 13.162 percentage points for each additional sack per game. With a p-value of 0.009, I reject the null hypothesis that sacks per game does not affect win rate.
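The columns of the regression summary are tied together by simple arithmetic, which is worth seeing once. In this sketch, the t value is the coefficient divided by its standard error, and the 95% interval is the coefficient plus or minus a critical value times the standard error. The critical value 2.048 is the two-sided 95% point of a t distribution with 28 degrees of freedom (30 observations minus 2 parameters); the function names are hypothetical.

```python
def t_statistic(coef, std_err):
    """t value reported in the summary: coefficient over its standard error."""
    return coef / std_err

def conf_interval(coef, std_err, t_crit=2.048):
    """95% confidence interval, using the t critical value for 28 degrees of freedom."""
    margin = t_crit * std_err
    return coef - margin, coef + margin

# Sacks-per-game model: coef = -13.1620, std err = 4.682
print(round(t_statistic(-13.1620, 4.682), 3))
low, high = conf_interval(-13.1620, 4.682)
print(round(low, 2), round(high, 2))
```

Because the whole interval sits below zero, the slope is distinguishable from zero at the 5% level, which agrees with the small p-value.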

X = final["Cmp%"].astype(float)
Y = final["Win Percentage"].astype(float)

X = sm.add_constant(X)

model = sm.OLS(endog=Y,exog=X)

results = model.fit()
results.summary()

I can see from the regression summary that win percentage increases by about 1.94 percentage points for each one-point increase in completion percentage. With a p-value of 0.025, I reject the null hypothesis that completion percentage does not affect win percentage.

X = final["TD"].astype(float)/(final["Int"].astype(float) + final["Fmb"].astype(float))
Y = final["Win Percentage"].astype(float)

X = sm.add_constant(X)

model = sm.OLS(endog=Y,exog=X)

results = model.fit()
results.summary()

As I saw in my plot, this statistic is very important. Winning percentage increases by about 8 percentage points as touchdowns per turnover increases by 1. With a p-value of 0.002, I again reject the null hypothesis.

X = final['Y/G'].astype(float) + final['RY/G'].astype(float)
Y = final["Win Percentage"].astype(float)

X = sm.add_constant(X)

model = sm.OLS(endog=Y,exog=X)

results = model.fit()
results.summary()

I see a coefficient of basically 0 here (-0.0429), and its p-value of 0.669 is far above 0.05, so I fail to reject the null hypothesis that total yards does not affect win percentage, and I will not be using it in my formula.
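The decision in each of the four tests depends on the p-value of the slope, not on the size of the coefficient. The sketch below applies the usual 0.05 threshold to the p-values read from the four regression summaries; the dictionary keys and the `decision` helper are my own naming.

```python
ALPHA = 0.05  # significance level

# p-values of the slope, read from the four regression summaries
p_values = {
    "sacks per game": 0.009,
    "completion percentage": 0.025,
    "touchdowns per turnover": 0.002,
    "total yards per game": 0.669,
}

def decision(p_value, alpha=ALPHA):
    """Reject the 'no effect' null hypothesis only when p is below alpha."""
    return "reject" if p_value < alpha else "fail to reject"

# Statistics that earn a place in the rating formula
kept = [name for name, p in p_values.items() if decision(p) == "reject"]
print(kept)
```

Failing to reject is not proof that yards have no effect; it only means that this sample of 30 quarterbacks does not show one.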

Now that I have coefficients for the statistics I found to have effects on win percentage, I want to see how accurate my formula actually is.

x = (8 * (final["TD"].astype(float)/(final["Int"].astype(float) + final["Fmb"].astype(float))) + \
          1.94 * (final["Cmp%"].astype(float)) - \
          13.162 * final['Sk'].astype(float)/final['G'].astype(float))/2
        
y = final['Win Percentage'].astype(float)
z = np.polyfit(x = x, y = y, deg = 1)
f = np.poly1d(z)
x2 = np.linspace(x.min(), x.max(), 100)
y2 = f(x2)
plt.figure(figsize=(15,10))
plt.plot(x, y,'o', x2, y2)


plt.title("Winning Percentage vs. Rating")
plt.xlabel("Rating")
plt.ylabel("Winning Percentage")

for i, txt in enumerate(final["Player"]):
        plt.annotate(txt, (x[i], y[i]), size = 8)

plt.show()

X = (8 * (final["TD"].astype(float)/(final["Int"].astype(float) + final["Fmb"].astype(float))) + \
          1.94 * (final["Cmp%"].astype(float)) - \
          13.162 * final['Sk'].astype(float)/final['G'].astype(float))/2
Y = final["Win Percentage"].astype(float)

X = sm.add_constant(X)

model = sm.OLS(endog=Y,exog=X)

results = model.fit()
results.summary()
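Written as a function of one player's season totals, the rating expression used in the cells above looks like the sketch below. The weights 8, 1.94, and 13.162 and the division by 2 come from the notebook; the function name is hypothetical. The example uses Jared Goff's 2019 line, where the 16 interceptions are an assumption inferred from Int% (2.6) times 626 attempts because the Int column is hidden in the truncated output.

```python
def qb_win_rating(td, ints, fumbles, cmp_pct, sacks, games):
    """Predicted win percentage from the three weighted statistics."""
    td_per_turnover = td / (ints + fumbles)
    sacks_per_game = sacks / games
    return (8 * td_per_turnover + 1.94 * cmp_pct - 13.162 * sacks_per_game) / 2

# Jared Goff, 2019: 22 TD, 10 Fmb, 62.9 Cmp%, 22 sacks, 16 games (Int assumed to be 16)
goff_rating = qb_win_rating(td=22, ints=16, fumbles=10, cmp_pct=62.9, sacks=22, games=16)
print(round(goff_rating, 2))
```

Goff's actual 2019 win percentage in the merged table is 56.25, so the rating lands close to it for this player.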

My formula looks pretty good. I was able to get a coefficient of about 1 (1.0374), which means that the average win percentage goes up by about 1 point as my rating goes up by 1, which was my goal. Let's calculate how accurate the prediction was by taking the difference between my rating and the actual win percentage for each player.

final["QBwin"] = (8 * (final["TD"].astype(float)/(final["Int"].astype(float) + final["Fmb"].astype(float))) + \
          1.94 * (final["Cmp%"].astype(float)) - \
          13.162 * final['Sk'].astype(float)/final['G'].astype(float))/2


final["difference"] = abs(final["Win Percentage"] - final["QBwin"])

final["difference"].mean()

11.914310751189037

So my model is off by about 12 percentage points on average, which is actually pretty good: there are only 16 games in a season, so on average my model predicts within a margin of error of about 2 games. Let's take a deeper look at this distribution with a violin plot to get a better understanding of the data.
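The accuracy figure above is a mean absolute error, and turning it into games is a single multiplication. A small sketch with hypothetical helper names follows; the first call uses made-up numbers, and the second uses the 11.914 figure printed above.

```python
def mean_abs_error(actual, predicted):
    """Average absolute gap between actual and predicted win percentages."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def error_in_games(error_pct, games_in_season):
    """Convert an error in percentage points into a number of games."""
    return error_pct / 100 * games_in_season

# Made-up example: two players missed by 5 and 15 points
print(mean_abs_error([50.0, 75.0], [55.0, 60.0]))

# The 2019 result above, expressed in games of a 16-game season
print(round(error_in_games(11.914310751189037, 16), 2))
```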

sns.violinplot(x=final["difference"])

<matplotlib.axes._subplots.AxesSubplot at 0x7f9612b90520>

I do not see a normal distribution here, which is okay, because the biggest chunk of the errors actually falls between 0 and 10 percent, meaning I was a bit more accurate than the average for a large share of players. The median was around 12% and the IQR was from 5% to 20%. Let's see how the formula does on this year's season.
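The median and IQR can also be computed directly instead of being read off the violin plot. The sketch below uses Python's `statistics` module on made-up error values; with the real data, the same call could be applied to the `difference` column converted to a list. The helper name is hypothetical, and other tools may use slightly different quartile conventions.

```python
import statistics

def median_and_iqr(values):
    """Return (q1, median, q3, iqr) using the statistics module's default quartile method."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return q1, median, q3, q3 - q1

# Made-up errors in percentage points, for illustration only
errors = [1, 2, 3, 4, 5, 6, 7]
q1, median, q3, iqr = median_and_iqr(errors)
print(q1, median, q3, iqr)
```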

# apply the 2019-derived weights to the 2020 statistics
final_2020["QBwin"] = (8 * (final_2020["TD"].astype(float)/(final_2020["Int"].astype(float) + final_2020["Fmb"].astype(float))) + \
          1.94 * (final_2020["Cmp%"].astype(float)) - \
          13.162 * final_2020['Sk'].astype(float)/final_2020['G'].astype(float))/2

prediction = abs(final_2020["Win Percentage"] - final_2020["QBwin"])

prediction.mean()

19.35355429143091

Considering that the model is based on the 2019 season, this is not too bad. There have only been 14 games so far this season, so this could get a little better once the season ends. Also, 19% represents a smaller number of games in a 14-game sample, so we are still around a margin of error of 2-3 games.

Policy and Insight Decisions¶

After analyzing the data from the 2019 season and comparing quarterback statistics to win rates, I have found that the most important statistics are touchdowns per turnover, completion percentage, and sacks per game. Given that, teams should focus on acquiring efficient quarterbacks. Quarterbacks who throw more touchdowns, have fewer turnovers, do not get sacked often, and complete a higher percentage of their passes simply win more games. There is a false narrative that the quarterbacks who throw for the most yards and touchdowns are the better ones, but higher raw stats do not mean that they are efficient. Negatives seem to outweigh the positives, as we saw with Jameis Winston, who threw for the most yards and a lot of touchdowns, and yet his win rate was poor. This is because he had a poor completion percentage and also had more turnovers than touchdowns. Efficiency reigned supreme for the quarterbacks who won the most games in the 2019 season, and teams should draft and acquire efficient quarterbacks if they want to win more games.

Conclusion¶

And the data science process is now complete. First, I scraped data from tables on www.pro-football-reference.com that had all of the data I needed. Then I tidied this data and got rid of what I did not need. I merged two dataframes so that I could analyze passing and rushing data for quarterbacks in 2019. After all of this the data was ready to be analyzed, so I used matplotlib to plot the statistics I hypothesized to affect win rate and then analyzed what I saw. With a line of best fit I was able to see the average effect that these statistics had on a quarterback's win ratio that season. Once I saw which statistics had larger effects on win rate, I used statsmodels to fit regression models on the data. Here, I was able to get exact coefficients that I could then use as weights in my formula. I was also able to test my hypotheses about which statistics affected win rate. Once this was completed, I could make my formula for predicting quarterback win rate and analyze how accurate it was. I found my rating to have an average error of about 12% for the 2019 season and about 19% for the 2020 season. I consider this a success, since it is only a few games off in each season. After all of this I can conclude that quarterback win rate can be predicted with relative accuracy using quarterback statistics, but there are many more factors that play into whether a quarterback will win a game or not. Yes, the quarterback definitely plays a huge role in whether his team is successful, but he is not the only player on the team. Some teams' defenses and the players that surround a quarterback are better or worse than others, and they also play a role in whether the team wins.
So, in conclusion, I have found the statistics that affect the win rate of a quarterback the most, and have created a formula to predict win rate based on these statistics, but I understand that more data would need to be considered to predict the exact win rate of a quarterback.

	Rk	Player	Tm	Age	Pos	G	GS	QBrec	Cmp	Att	...	Y/G	Rate	QBR	Sk	Yds.1	NY/A	ANY/A	Sk%	4QC	GWD
0	1	Jared Goff	LAR	25	QB	16	16	9-7-0	394	626	...	289.9	86.5	50.2	22	170	6.90	6.46	3.4	1	2
1	2	Jameis Winston	TAM	25	QB	16	16	7-9-0	380	626	...	319.3	84.3	59.1	47	282	7.17	6.15	7.0	2	2
2	3	Matt Ryan	ATL	34	QB	15	15	7-8-0	408	616	...	297.7	92.1	60.4	48	316	6.25	6.08	7.2	3	2
3	4	Tom Brady	NWE	42	QB	16	16	12-4-0	373	613	...	253.6	88.0	54.5	27	185	6.05	6.24	4.2	1	1
4	5	Carson Wentz	PHI	27	QB	16	16	9-7-0	388	607	...	252.4	93.1	64.8	37	230	5.91	6.26	5.7	2	4
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
100	98	Emmanuel Sanders	2TM	32	NaN	17	16	NaN	1	1	...	2.1	158.3	NaN	0	0	35.00	55.00	0.0	NaN	NaN
101	99	Steven Sims	WAS	22	NaN	16	2	NaN	0	1	...	0.0	39.6	NaN	0	0	0.00	0.00	0.0	NaN	NaN
102	100	Courtland Sutton *	DEN	24	WR	16	14	NaN	1	1	...	2.4	118.7	100.0	0	0	38.00	38.00	0.0	NaN	NaN
103	101	Alex Tanney	NYG	32	NaN	1	0	NaN	1	1	...	1.0	79.2	1.6	0	0	1.00	1.00	0.0	NaN	NaN
104	102	James White	NWE	27	NaN	15	1	NaN	1	1	...	2.3	118.7	99.9	0	0	35.00	35.00	0.0	NaN	NaN

	Unnamed: 0_level_0	Unnamed: 1_level_0	Unnamed: 2_level_0	Unnamed: 3_level_0	Unnamed: 4_level_0	Games		Rushing							Unnamed: 14_level_0
	Rk	Player	Tm	Age	Pos	G	GS	Att	Yds	TD	1D	Lng	Y/A	Y/G	Fmb
0	1	Derrick Henry *	TEN	25	RB	15	15	303	1540	16	73	74	5.1	102.7	5
1	2	Ezekiel Elliott*	DAL	24	RB	16	16	301	1357	12	78	33	4.5	84.8	3
2	3	Nick Chubb*	CLE	24	RB	16	16	298	1494	8	62	88	5.0	93.4	3
3	4	Christian McCaffrey*+	CAR	23	RB	16	16	287	1387	15	57	84	4.8	86.7	1
4	5	Chris Carson	SEA	25	RB	15	15	278	1230	7	75	59	4.4	82.0	7
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
340	330	Danny Vitale	GNB	26	NaN	15	4	1	3	0	0	3	3.0	0.2	0
341	331	Greg Ward	PHI	24	NaN	7	3	1	5	0	0	5	5.0	0.7	0
342	332	Trevon Wesco	NYJ	24	NaN	16	1	1	2	0	1	2	2.0	0.1	0
343	333	Mike Williams	LAC	25	WR	15	15	1	2	0	0	2	2.0	0.1	0
344	334	Jarius Wright	CAR	30	WR/wr	16	9	1	-7	0	0	-7	-7.0	-0.4	0

	Player	RAtt	RYds	RTD	RY/A	RY/G	Fmb
24	Lamar Jackson	135	828	7	6.1	63.7	9
30	Kyler Murray	123	741	11	6.0	52.9	9
31	Cam Newton	122	489	11	4.0	37.6	6
44	Josh Allen	96	383	8	4.0	27.4	9
56	Deshaun Watson	82	394	3	4.8	28.1	6

	Rk	Player	Tm	Age	Pos	G	GS	QBrec	Cmp%	Yds	...	Int%	Y/A	AY/A	Y/C	Y/G	Rate	QBR	Sk	Yds.1	Sk%
0	1	Jared Goff	LAR	25	QB	16	16	9-7-0	62.9	4638	...	2.6	7.4	7.0	11.8	289.9	86.5	50.2	22	170	3.4
1	2	Jameis Winston	TAM	25	QB	16	16	7-9-0	60.7	5109	...	4.8	8.2	7.1	13.4	319.3	84.3	59.1	47	282	7.0
2	3	Matt Ryan	ATL	34	QB	15	15	7-8-0	66.2	4466	...	2.3	7.3	7.1	10.9	297.7	92.1	60.4	48	316	7.2
3	4	Tom Brady	NWE	42	QB	16	16	12-4-0	60.8	4057	...	1.3	6.6	6.8	10.9	253.6	88.0	54.5	27	185	4.2
4	5	Carson Wentz	PHI	27	QB	16	16	9-7-0	63.9	4039	...	1.2	6.7	7.0	10.4	252.4	93.1	64.8	37	230	5.7

	Rk	Player	Tm	Age	Pos	G	GS	Cmp%	Yds	TD	...	Sk	Yds.1	Sk%	Win Percentage	RAtt	RYds	RTD	RY/A	RY/G	Fmb
0	1	Jared Goff	LAR	25	QB	16	16	62.9	4638	22	...	22	170	3.4	56.250000	33	40	2	1.2	2.5	10
1	2	Jameis Winston	TAM	25	QB	16	16	60.7	5109	33	...	47	282	7.0	43.750000	59	250	1	4.2	15.6	12
2	3	Matt Ryan	ATL	34	QB	15	15	66.2	4466	26	...	48	316	7.2	46.666667	34	147	1	4.3	9.8	9
3	4	Tom Brady	NWE	42	QB	16	16	60.8	4057	24	...	27	185	4.2	75.000000	26	34	3	1.3	2.1	4
4	5	Carson Wentz	PHI	27	QB	16	16	63.9	4039	27	...	37	230	5.7	56.250000	62	243	1	3.9	15.2	16

	Rk	Player	Tm	Age	Pos	G	GS	Cmp%	Yds	TD	...	Yds.1	NY/A	Sk%	Win Percentage	RAtt	RYds	RTD	RY/A	RY/G	Fmb
0	1	Matt Ryan	ATL	35	QB	14	14	64.2	4016	22	...	227	6.50	6.2	28.571429	25	92	1	3.7	6.6	3
1	2	Patrick Mahomes	KAN	25	QB	14	14	67.3	4462	36	...	147	7.62	3.9	92.857143	59	287	2	4.9	20.5	5
2	3	Tom Brady	TAM	43	QB	14	14	65.1	3886	32	...	128	6.70	3.4	64.285714	25	3	3	0.1	0.2	4
3	4	Justin Herbert	LAC	22	QB	13	13	66.5	3781	27	...	171	6.47	4.8	30.769231	45	199	4	4.4	15.3	7
4	5	Ben Roethlisberger	PIT	38	QB	13	13	66.2	3292	29	...	97	6.01	2.1	84.615385	21	13	0	0.6	1.0	3

Dep. Variable:	Win Percentage	R-squared:	0.220
Model:	OLS	Adj. R-squared:	0.192
Method:	Least Squares	F-statistic:	7.903
Date:	Mon, 21 Dec 2020	Prob (F-statistic):	0.00891
Time:	17:34:45	Log-Likelihood:	-124.79
No. Observations:	30	AIC:	253.6
Df Residuals:	28	BIC:	256.4
Df Model:	1
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
const	84.7075	11.075	7.649	0.000	62.022	107.393
0	-13.1620	4.682	-2.811	0.009	-22.752	-3.572

Omnibus:	1.019	Durbin-Watson:	2.053
Prob(Omnibus):	0.601	Jarque-Bera (JB):	0.821
Skew:	0.009	Prob(JB):	0.663
Kurtosis:	2.190	Cond. No.	10.4

Omnibus:	0.250	Durbin-Watson:	2.324
Prob(Omnibus):	0.883	Jarque-Bera (JB):	0.444
Skew:	0.050	Prob(JB):	0.801
Kurtosis:	2.412	Cond. No.	1.12e+03

Omnibus:	0.378	Durbin-Watson:	2.032
Prob(Omnibus):	0.828	Jarque-Bera (JB):	0.539
Skew:	-0.152	Prob(JB):	0.764
Kurtosis:	2.419	Cond. No.	3.84

Omnibus:	0.284	Durbin-Watson:	1.972
Prob(Omnibus):	0.868	Jarque-Bera (JB):	0.469
Skew:	-0.066	Prob(JB):	0.791
Kurtosis:	2.402	Cond. No.	2.15e+03

Omnibus:	2.158	Durbin-Watson:	2.163
Prob(Omnibus):	0.340	Jarque-Bera (JB):	1.273
Skew:	-0.175	Prob(JB):	0.529
Kurtosis:	2.054	Cond. No.	296.

	coef	std err	t	P>|t|	[0.025	0.975]
const	-70.1253	52.882	-1.326	0.196	-178.450	38.199
Cmp%	1.9427	0.822	2.364	0.025	0.259	3.626

	coef	std err	t	P>|t|	[0.025	0.975]
const	42.1624	4.638	9.092	0.000	32.663	51.662
0	7.9962	2.364	3.383	0.002	3.154	12.839

	coef	std err	t	P>|t|	[0.025	0.975]
const	66.0814	26.558	2.488	0.019	11.681	120.482
0	-0.0429	0.099	-0.433	0.669	-0.246	0.160

	coef	std err	t	P>|t|	[0.025	0.975]
const	-0.8900	14.531	-0.061	0.952	-30.655	28.875
0	1.0374	0.267	3.891	0.001	0.491	1.584