NBA data analysis 1913-1997 (Python code)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
import seaborn as sns

from sklearn.preprocessing import scale
import sklearn.linear_model as skl_lm
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm
import statsmodels.formula.api as smf

%matplotlib inline
sns.set_style('white')  # the old 'seaborn-white' matplotlib style name is deprecated in recent matplotlib releases
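a = pd.read_csv('/.../1913-1933nba.csv') # add your location for your file in ...
b = pd.read_csv('/.../1934-1959nba.csv')
c = pd.read_csv('/.../1960-1979 nba.csv')
d = pd.read_csv('/.../1980-1997 nba.csv')

a.head()
# The data is broken down into 4 csv files: a - 1913-1933; b - 1934-1959; c - 1960-1979; d - 1980-1997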
# One regression line per era; point colors use matplotlib color letters ('g', 'r', 'b', 'y')
# (x= and y= must be passed as keywords in seaborn 0.12+)
sns.regplot(x=a.weight_lbs, y=a.height_ft, order=1, ci=None, scatter_kws={'color':'g', 's':12})
sns.regplot(x=b.weight_lbs, y=b.height_ft, order=1, ci=None, scatter_kws={'color':'r', 's':12})
sns.regplot(x=c.weight_lbs, y=c.height_ft, order=1, ci=None, scatter_kws={'color':'b', 's':12})
sns.regplot(x=d.weight_lbs, y=d.height_ft, order=1, ci=None, scatter_kws={'color':'y', 's':12})
plt.xlim(140, 325)
plt.ylim(bottom=5.5);

# Simple linear regression of height (ft) on weight (lbs), 1913-1933
regr = skl_lm.LinearRegression()

X = a.weight_lbs.values.reshape(-1,1)
y = a.height_ft

regr.fit(X,y)
print(regr.intercept_)
print(regr.coef_)

# Repeat the fit with b, c, and d to get the regression coefficients for each era
# Birth year vs. height for each era
sns.regplot(x=a.height_ft, y=a.born, order=1, ci=None, scatter_kws={'color':'g', 's':9})
sns.regplot(x=b.height_ft, y=b.born, order=1, ci=None, scatter_kws={'color':'r', 's':9})
sns.regplot(x=c.height_ft, y=c.born, order=1, ci=None, scatter_kws={'color':'b', 's':9})
sns.regplot(x=d.height_ft, y=d.born, order=1, ci=None, scatter_kws={'color':'y', 's':9})
plt.xlim(5.5, 7.5)
plt.ylim(1913, 1997);

# green data points and blue line indicate 1913-1933; red data points and orange line indicate 1934-1959
# blue data points and green line indicate 1960-1979; yellow data points and red line indicate 1980-1997
# Descriptive statistics for every era
for name, df in zip('abcd', [a, b, c, d]):
    print(name)
    print(df[['weight_lbs', 'height_ft']].describe())
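# Fit a multiple regression born ~ weight_lbs + height_ft on all four eras
# (regr above used weight alone, so it has no regr.coef_[1] for the surface)
nba_all = pd.concat([a, b, c, d]).dropna(subset=['weight_lbs', 'height_ft', 'born'])
regr3 = skl_lm.LinearRegression()
regr3.fit(nba_all[['weight_lbs', 'height_ft']], nba_all.born)

# Create a coordinate grid covering the plotted weight and height ranges
weight_lbs = np.linspace(150, 350, 50)
height_ft = np.linspace(5.5, 8, 50)

B1, B2 = np.meshgrid(weight_lbs, height_ft, indexing='xy')
# Predicted birth year at every grid point
Z = regr3.intercept_ + B1*regr3.coef_[0] + B2*regr3.coef_[1]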
# Create plot
fig = plt.figure(figsize=(12,8))
fig.suptitle('NBA players born between 1910 - 1997', fontsize=20)

ax = fig.add_subplot(111, projection='3d')  # axes3d.Axes3D(fig) no longer adds itself to the figure in recent matplotlib

ax.plot_surface(B1, B2, Z, rstride=10, cstride=5, alpha=0.4)
ax.scatter3D(a.weight_lbs, a.height_ft, a.born, c='g')
ax.scatter3D(b.weight_lbs, b.height_ft, b.born, c='r')
ax.scatter3D(c.weight_lbs, c.height_ft, c.born, c='b')
ax.scatter3D(d.weight_lbs, d.height_ft, d.born, c='y')

ax.set_xlabel('weight_lbs')
ax.set_xlim(350,150)
ax.set_ylabel('height_ft')
ax.set_ylim(5.5,8)
ax.set_zlabel('born')
ax.set_zlim(1910,1997);
sns.pairplot(a[['height_ft','weight_lbs']]);
sns.pairplot(b[['height_ft','weight_lbs']]);
sns.pairplot(c[['height_ft','weight_lbs']]);
sns.pairplot(d[['height_ft','weight_lbs']]);
sns.jointplot(x='weight_lbs', y='height_ft', data=a, kind='hex')
# swap b, c, or d in for a to plot the other eras
# The same hexagonal binning with pandas' plot.hexbin, using the real data in a
a.plot.hexbin(x='height_ft', y='weight_lbs', gridsize=25, cmap='Oranges')
# interchange the other data sets into this code
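The regression step above prints two numbers: `regr.intercept_` (the predicted height in feet at 0 lbs) and `regr.coef_` (the slope, in feet per lb). As a minimal sketch of what that least-squares fit computes for each era, here is a pure-Python version that runs without sklearn. The function name `fit_height_on_weight` is hypothetical, and it assumes you pass plain lists with missing values already dropped.

```python
def fit_height_on_weight(weights, heights):
    """Hypothetical helper: ordinary least squares for height = intercept + slope * weight.

    Returns (intercept, slope), the same two numbers sklearn's LinearRegression
    stores in intercept_ and coef_[0] for a single feature.
    """
    n = len(weights)
    if n != len(heights) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_w = sum(weights) / n
    mean_h = sum(heights) / n
    # slope = covariance(weight, height) / variance(weight)
    sxy = sum((w - mean_w) * (h - mean_h) for w, h in zip(weights, heights))
    sxx = sum((w - mean_w) ** 2 for w in weights)
    if sxx == 0:
        raise ValueError("all weights are identical; the slope is undefined")
    slope = sxy / sxx
    # the least-squares line always passes through (mean weight, mean height)
    intercept = mean_h - slope * mean_w
    return intercept, slope
```

For example, with `clean = a[['weight_lbs', 'height_ft']].dropna()`, calling `fit_height_on_weight(clean.weight_lbs.tolist(), clean.height_ft.tolist())` should match `regr.intercept_` and `regr.coef_[0]` up to floating-point rounding. Because the fitted line passes through the means, the intercept, slope, and mean values of each era are linked: intercept = mean height - slope * mean weight.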

Trends in emission data from 1990 to 2015


Source for the data: https://catalog.data.gov/dataset/greenhouse-gas-emissions-from-fuel-combustion-million-metric-tons-beginning-1990

The following plots were made with Seaborn's jointplot. Python code:

sns.jointplot(x='Year', y='Residential', data=df, kind='reg')

Figure 1: Transportation emissions (million metric tons) from 1990 to 2015.

year-trans.png

Figure 2: Residential emissions (million metric tons) from 1990 to 2015.

res v year.png

Figure 3: Commercial emissions (million metric tons) from 1990 to 2015.

com v year.png

Figure 4: Electricity Generated emissions (million metric tons) from 1990 to 2015. There is a fairly strong trend: emissions from electricity generation have decreased considerably. A natural question is how the US is still generating enough electricity, and whether that is sustainable.

E v year.png

Figure 5: Net Electricity emissions (million metric tons) from 1990 to 2015. I am curious where the net electricity is being stored, and whether it is coming from dams, solar, or wind.

EN v year.png

Figure 6: Total emissions per year (million metric tons) from 1990 to 2015. Overall, with all sectors combined, the trend has a negative slope.

year v yt.png
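Each `kind='reg'` jointplot overlays an ordinary least-squares line, so the "strong trend" and "negative slope" observations above can also be checked numerically. A minimal sketch, assuming the `'Year'` column and one emission column (for example `'Residential'`) have been pulled out of `df` as plain lists; the helper name `linear_trend` is hypothetical. It returns the fitted slope (million metric tons per year) and Pearson's r.

```python
import math

def linear_trend(years, values):
    """Hypothetical helper: least-squares slope (units per year) and Pearson r for one series."""
    n = len(years)
    if n != len(values) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(years) / n
    my = sum(values) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, values))
    sxx = sum((x - mx) ** 2 for x in years)
    syy = sum((y - my) ** 2 for y in values)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    slope = sxy / sxx                  # change per year; negative means declining
    r = sxy / math.sqrt(sxx * syy)     # between -1 and 1; near -1 is a strong downward trend
    return slope, r
```

For example, `linear_trend(df['Year'].tolist(), df['Residential'].tolist())`. A negative slope with r close to -1 indicates a strong, steady decline; a negative slope with r near 0 indicates a weak one.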


NBA metrics on height and weight

In my first blog post last week, I posted a 3D linear regression plot. It showed all the data points for year born, height, and weight of NBA players.

Here is the figure again.

nba born~weight x height.png

This figure is a bit confusing since it lumps all the data together. (I will be posting all my Python code soon.)

Question: comparing the data in roughly 20-year intervals, are NBA players really getting taller and heavier?

To answer this, I divided the data into 4 groups:

a (1913-1933); b (1934-1959); c (1960-1979); d (1980-1997)
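If all players are in one table instead of four CSV files, the same grouping can be reproduced from each player's birth year. A minimal sketch, assuming the groups are defined on the `born` column (the 3D plot labels its third axis `born`) and that both boundary years of each group are inclusive as listed; the function name `era_of` is hypothetical.

```python
def era_of(born):
    """Hypothetical helper: map a birth year to group 'a'-'d', or None if outside 1913-1997."""
    eras = [("a", 1913, 1933), ("b", 1934, 1959), ("c", 1960, 1979), ("d", 1980, 1997)]
    for label, start, end in eras:
        if start <= born <= end:  # boundaries are inclusive
            return label
    return None  # outside every group (a missing year, NaN, also ends up here)
```

With pandas this could be applied as `players['era'] = players['born'].map(era_of)`, where `players` is a hypothetical single DataFrame holding every player.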

                    A (1913-1933)   B (1934-1959)   C (1960-1979)   D (1980-1997)
Regression coef.    4.24            4.3             4.6             4.66
Mean height (ft)    6.31            6.48            6.56            6.598
Mean weight (lbs)   192.7           201.8           213.1           219.9
Number of players   470             1188            1288            975

NBA weight by height over time.png

(a) green data = 1913-1933; (b) red data = 1934-1959; (c) blue data = 1960-1979; (d) yellow data = 1980-1997

Since 1960, NBA players have gotten noticeably heavier: the mean weight rose from 201.8 lbs (1934-1959) to 213.1 lbs (1960-1979) and 219.9 lbs (1980-1997). One likely explanation is that as the NBA game evolved toward more physical play, players compensated by putting on more weight and muscle. The height of NBA players is likely controlled by other factors, such as teams wanting to move towards larger and more dominant players. By analyzing NBA draft metrics of height and weight, we could start to build a picture of when height became a major asset in the NBA.
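The size of these changes can be read straight off the table. A short sketch that computes the change in mean height and mean weight from each group to the next, and from A to D, using only the values in the table:

```python
# Mean values copied from the table (groups A-D)
groups = ["A (1913-1933)", "B (1934-1959)", "C (1960-1979)", "D (1980-1997)"]
mean_height_ft = [6.31, 6.48, 6.56, 6.598]
mean_weight_lbs = [192.7, 201.8, 213.1, 219.9]

def step_changes(values):
    """Change from each group to the next, rounded to 3 decimals to hide float noise."""
    return [round(later - earlier, 3) for earlier, later in zip(values, values[1:])]

height_steps = step_changes(mean_height_ft)    # [0.17, 0.08, 0.038]
weight_steps = step_changes(mean_weight_lbs)   # [9.1, 11.3, 6.8]
total_height = round(mean_height_ft[-1] - mean_height_ft[0], 3)    # 0.288 ft
total_height_in = round(total_height * 12, 1)                      # 3.5 inches
total_weight = round(mean_weight_lbs[-1] - mean_weight_lbs[0], 1)  # 27.2 lbs

for (g1, g2), dh, dw in zip(zip(groups, groups[1:]), height_steps, weight_steps):
    print(f"{g1} -> {g2}: {dh:+.3f} ft, {dw:+.1f} lbs")
print(f"A -> D total: {total_height:+.3f} ft ({total_height_in} in), {total_weight:+.1f} lbs")
```

The largest single step in mean weight is from B to C (+11.3 lbs), and the +6.8 lbs change is from C to D; from A to D the mean weight rises by about 27 lbs and the mean height by about 3.5 inches.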

Jointplot in Seaborn: the darker hexagons indicate a higher frequency of players in that height and weight range.


NBA 1913-1933 – looking at height and weight

1913.png

NBA 1980-1997 – looking at height and weight

1980.png

Stay tuned for more interesting data analyses in the near future.


NBA metric height and weight (python code)

Python 3.6 using Jupyter Notebook

# %load ../standard_import.txt
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
import seaborn as sns

from sklearn.preprocessing import scale
import sklearn.linear_model as skl_lm
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm
import statsmodels.formula.api as smf

%matplotlib inline
sns.set_style('white')  # the old 'seaborn-white' matplotlib style name is deprecated in recent matplotlib releases

a = pd.read_csv('/…/1913-1933nba.csv') # add your location for your file in …
a.head()

b = pd.read_csv('/…/1934-1959nba.csv') # add your location for your file in …
b.head()

c = pd.read_csv('/…/1960-1979 nba.csv') # add your location for your file in …
c.head()

d = pd.read_csv('/…/1980-1997 nba.csv') # add your location for your file in …
d.head()

# One regression line per era; point colors use matplotlib color letters ('g', 'r', 'b', 'y')
# (x= and y= must be passed as keywords in seaborn 0.12+)
sns.regplot(x=a.weight_lbs, y=a.height_ft, order=1, ci=None, scatter_kws={'color':'g', 's':12})
sns.regplot(x=b.weight_lbs, y=b.height_ft, order=1, ci=None, scatter_kws={'color':'r', 's':12})
sns.regplot(x=c.weight_lbs, y=c.height_ft, order=1, ci=None, scatter_kws={'color':'b', 's':12})
sns.regplot(x=d.weight_lbs, y=d.height_ft, order=1, ci=None, scatter_kws={'color':'y', 's':12})
plt.xlim(140, 325)
plt.ylim(bottom=5.5);

# Simple linear regression of height (ft) on weight (lbs), 1913-1933

regr = skl_lm.LinearRegression()

X = a.weight_lbs.values.reshape(-1,1)
y = a.height_ft

regr.fit(X,y)
print(regr.intercept_)
print(regr.coef_)

# regression coefficient for 1913-1933 (4.24)

a[['weight_lbs', 'height_ft']].describe()
# 1913-1933: note the 6.31 ft mean and 192.7 lbs mean

d[['weight_lbs', 'height_ft']].describe()
# 1980-1997: note the 6.598 ft mean and 219.9 lbs mean
# versus 1913-1933 that is about +0.29 ft and +27.2 lbs in the means
# (versus 1960-1979 it is about +0.04 ft and +6.8 lbs)

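# Fit a multiple regression born ~ weight_lbs + height_ft on all four eras
# (regr above used weight alone, so it has no regr.coef_[1] for the surface)
nba_all = pd.concat([a, b, c, d]).dropna(subset=['weight_lbs', 'height_ft', 'born'])
regr3 = skl_lm.LinearRegression()
regr3.fit(nba_all[['weight_lbs', 'height_ft']], nba_all.born)

# Create a coordinate grid covering the plotted weight and height ranges
weight_lbs = np.linspace(150, 350, 50)
height_ft = np.linspace(5.5, 8, 50)

B1, B2 = np.meshgrid(weight_lbs, height_ft, indexing='xy')
# Predicted birth year at every grid point
Z = regr3.intercept_ + B1*regr3.coef_[0] + B2*regr3.coef_[1]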

# Create plot
fig = plt.figure(figsize=(12,8))
fig.suptitle('NBA players born between 1910 - 1997', fontsize=20)

ax = fig.add_subplot(111, projection='3d')  # axes3d.Axes3D(fig) no longer adds itself to the figure in recent matplotlib

ax.plot_surface(B1, B2, Z, rstride=10, cstride=5, alpha=0.4)
ax.scatter3D(a.weight_lbs, a.height_ft, a.born, c='g')
ax.scatter3D(b.weight_lbs, b.height_ft, b.born, c='r')
ax.scatter3D(c.weight_lbs, c.height_ft, c.born, c='b')
ax.scatter3D(d.weight_lbs, d.height_ft, d.born, c='y')

ax.set_xlabel('weight_lbs')
ax.set_xlim(350,150)
ax.set_ylabel('height_ft')
ax.set_ylim(5.5,8)
ax.set_zlabel('born')
ax.set_zlim(1910,1997);

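sns.jointplot(x='weight_lbs', y='height_ft', data=a, kind='hex')  # 1913-1933: joint distribution of height and weight
# Pearson r measures the strength of the linear relationship between the two variables (0.79 here).
# Recent seaborn versions no longer print r on the jointplot, so compute it directly:
print(a['weight_lbs'].corr(a['height_ft']))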

Data Science 3/27/2018

Regression analysis of NBA players since the 1950s, looking at height, weight, and year born. This figure suggests a trend since the 1950s toward taller and heavier NBA players.


nba born~weight x height.png