Community Health Status Indicators (CHSI) to combat obesity, heart disease, and cancer are a major component of the Community Health Data Initiative. The dataset provides key health indicators for local communities and encourages dialogue about actions that can be taken to improve community health.

The CHSI report and dataset were designed not only for public health professionals but also for community members who are interested in the health of their community. The report contains over 200 measures for each of the 3,141 United States counties. Although CHSI presents indicators such as deaths due to heart disease and cancer, it is important to understand that behavioral factors such as obesity, tobacco use, diet, physical activity, alcohol and drug use, and sexual behavior contribute substantially to these deaths.

Machine Learning Python Code

Data and Resources

%matplotlib inline
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
X = pd.read_csv('/Users/annettechiu/Desktop/Health_indicators/RISKFACTORSANDACCESSTOCARE.csv')
X.head()
5 rows × 31 columns
Data cleaning: remove rows where No_Exercise or Diabetes is below -100. Values that low are not valid percentages and appear to be placeholder codes for missing data.
X = X[X['No_Exercise'] > -100]
X = X[X['Diabetes'] > -100]
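The filtering step above can also be written as a single mask over both columns. The following is a minimal sketch on a made-up toy frame; the frame `toy`, its values, and the placeholder value -1111.1 are assumptions for illustration, not taken from the CHSI file.

```python
import pandas as pd

# Toy frame standing in for the CHSI table; all values are made up,
# and -1111.1 is an assumed placeholder for "missing".
toy = pd.DataFrame({
    "No_Exercise": [25.1, -1111.1, 30.4, 22.0],
    "Diabetes": [7.8, 9.1, -1111.1, 6.5],
})

cols = ["No_Exercise", "Diabetes"]

# Keep values above the threshold, turn everything else into NaN.
masked = toy[cols].where(toy[cols] > -100)

# Keep only the rows where both columns survived the mask.
cleaned = toy.loc[masked.notna().all(axis=1)]

print(cleaned)
```

Masking both columns at once keeps the threshold in one place, which helps if more columns need the same treatment later.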
I picked No_Exercise as the variable of interest because its histogram resembles a normal distribution.
X.hist('No_Exercise');
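A visual check of the histogram can be backed up with two summary numbers. The sketch below is only an illustration on simulated data (the series `sample` is generated, not read from the CHSI file): for a roughly normal variable, both the skewness and the excess kurtosis reported by pandas should be close to zero.

```python
import numpy as np
import pandas as pd

# Simulated stand-in for a roughly normal column; not CHSI data.
rng = np.random.RandomState(0)
sample = pd.Series(rng.normal(loc=25.0, scale=5.0, size=5000), name="No_Exercise")

skewness = sample.skew()         # 0 for a symmetric distribution
excess_kurtosis = sample.kurt()  # pandas reports excess kurtosis, 0 for a normal

print("skewness = {0:.3f}".format(skewness))
print("excess kurtosis = {0:.3f}".format(excess_kurtosis))
```

On the real table the same two calls could be applied to `X['No_Exercise']` after cleaning.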
In [132]:
X.hist('Diabetes');
In [134]:
# remove any rows with a missing No_Exercise value
filtered_data = X[~np.isnan(X["No_Exercise"])]
filtered_data.head(3)
filtered_data.columns
Out[134]:
Index([u'State_FIPS_Code', u'County_FIPS_Code', u'CHSI_County_Name',
       u'CHSI_State_Name', u'CHSI_State_Abbr', u'Strata_ID_Number',
       u'No_Exercise', u'CI_Min_No_Exercise', u'CI_Max_No_Exercise',
       u'Few_Fruit_Veg', u'CI_Min_Fruit_Veg', u'CI_Max_Fruit_Veg', u'Obesity',
       u'CI_Min_Obesity', u'CI_Max_Obesity', u'High_Blood_Pres',
       u'CI_Min_High_Blood_Pres', u'CI_Max_High_Blood_Pres', u'Smoker',
       u'CI_Min_Smoker', u'CI_Max_Smoker', u'Diabetes', u'CI_Min_Diabetes',
       u'CI_Max_Diabetes', u'Uninsured', u'Elderly_Medicare',
       u'Disabled_Medicare', u'Prim_Care_Phys_Rate', u'Dentist_Rate',
       u'Community_Health_Center_Ind', u'HPSA_Ind'],
      dtype='object')
In [135]:
filtered_data[['No_Exercise','Disabled_Medicare']].corr()
Out[135]:
                   No_Exercise  Disabled_Medicare
No_Exercise           1.000000          -0.058422
Disabled_Medicare    -0.058422           1.000000
In [136]:
filtered_data[['No_Exercise','High_Blood_Pres']].corr()
Out[136]:
                 No_Exercise  High_Blood_Pres
No_Exercise         1.000000        -0.171865
High_Blood_Pres    -0.171865         1.000000
In [137]:
filtered_data[['No_Exercise','Elderly_Medicare']].corr()
Out[137]:
                  No_Exercise  Elderly_Medicare
No_Exercise          1.000000         -0.101842
Elderly_Medicare    -0.101842          1.000000
In [138]:
filtered_data[['No_Exercise','Obesity']].corr()
Out[138]:
             No_Exercise   Obesity
No_Exercise     1.000000  0.066733
Obesity         0.066733  1.000000
In [139]:
filtered_data[['No_Exercise','Diabetes']].corr()
Out[139]:
             No_Exercise  Diabetes
No_Exercise     1.000000  0.484777
Diabetes        0.484777  1.000000
In [140]:
filtered_data[['No_Exercise','Prim_Care_Phys_Rate']].corr()
Out[140]:
                     No_Exercise  Prim_Care_Phys_Rate
No_Exercise             1.000000            -0.305625
Prim_Care_Phys_Rate    -0.305625             1.000000
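The pairwise checks above can be collapsed into one call that ranks every column by the strength of its correlation with No_Exercise. This is a minimal sketch on a made-up toy frame (the frame `toy` and its numbers are assumptions, not CHSI values).

```python
import pandas as pd

# Toy frame with made-up numbers; column names mirror the CHSI table.
toy = pd.DataFrame({
    "No_Exercise": [20.0, 25.0, 30.0, 35.0, 40.0],
    "Diabetes": [5.0, 6.5, 6.0, 8.0, 9.5],
    "Prim_Care_Phys_Rate": [90.0, 80.0, 85.0, 60.0, 50.0],
})

# Pearson correlation of every other column with No_Exercise.
corr_with_target = toy.corr()["No_Exercise"].drop("No_Exercise")

# Order by absolute strength while keeping the sign of each coefficient.
order = corr_with_target.abs().sort_values(ascending=False).index
corr_with_target = corr_with_target.loc[order]

print(corr_with_target)
```

The real table also holds text columns such as CHSI_County_Name, so it would likely need a numeric selection first, for example `filtered_data.select_dtypes(include="number")`.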
In [147]:
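# Select the columns by name; positional indexing such as [:,0] and [:,1] would pick
# the first two columns of the table (State_FIPS_Code and County_FIPS_Code).
pair = filtered_data[['No_Exercise', 'Prim_Care_Phys_Rate']].dropna()
No_Exercise = pair[['No_Exercise']]  # 2-D feature matrix
Prim_Care_Phys_Rate = pair['Prim_Care_Phys_Rate']  # 1-D target
mdl = LinearRegression().fit(No_Exercise, Prim_Care_Phys_Rate)
m = mdl.coef_[0]
b = mdl.intercept_
print("formula: y = {0}x + {1}".format(m, b))  # slope-intercept form
Output recorded from the original run, which used positional columns 0 and 1:
formula: y = [ 0.45243906]x + [ 75.08070777]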
In [148]:
plt.scatter(No_Exercise,Prim_Care_Phys_Rate, color='blue')
plt.plot([0,100],[b,m*100+b],'r')
plt.title('Linear Regression', fontsize = 20)
plt.xlabel('No_Exercise', fontsize = 15)
plt.ylabel('Prim_Care_Phys_Rate', fontsize = 15)
Out[148]:
<matplotlib.text.Text at 0x111f4ea50>
In [141]:
# Select the columns by name; positional indexing such as [:,0] and [:,1] would pick
# the first two columns of the table (State_FIPS_Code and County_FIPS_Code).
pair = filtered_data[['No_Exercise', 'Diabetes']].dropna()
No_Exercise = pair[['No_Exercise']]  # 2-D feature matrix
Diabetes = pair['Diabetes']  # 1-D target
mdl = LinearRegression().fit(No_Exercise, Diabetes)
m = mdl.coef_[0]
b = mdl.intercept_
print("formula: y = {0}x + {1}".format(m, b))  # slope-intercept form
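For a regression with a single predictor and an intercept, the coefficient of determination equals the squared Pearson correlation, so the 0.484777 correlation reported earlier for No_Exercise and Diabetes corresponds to an R² of roughly 0.235. The sketch below checks that identity on a made-up toy frame (the frame `toy` and its numbers are assumptions, not CHSI values).

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy frame with made-up numbers; column names mirror the CHSI table.
toy = pd.DataFrame({
    "No_Exercise": [20.0, 25.0, 30.0, 35.0, 40.0],
    "Diabetes": [5.0, 6.5, 6.0, 8.0, 9.5],
})

x = toy[["No_Exercise"]]  # 2-D: scikit-learn expects (n_samples, n_features)
y = toy["Diabetes"]       # 1-D target

mdl = LinearRegression().fit(x, y)

r = toy["No_Exercise"].corr(toy["Diabetes"])  # Pearson correlation
r_squared = mdl.score(x, y)                   # R^2 of the fitted line

print("slope = {0:.4f}, intercept = {1:.4f}".format(mdl.coef_[0], mdl.intercept_))
print("R^2 = {0:.4f}, r^2 = {1:.4f}".format(r_squared, r ** 2))
```

Reporting R² next to the formula makes it easier to judge how much of the variation in the target the fitted line explains.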
