python project 5
import pandas as pd
import statsmodels.api as sm
import seaborn as sns
import matplotlib.pyplot as plt
# Create a DataFrame from the provided Excel file
df = pd.read_excel("Facebook Friends.xlsx")
# Create a subset DataFrame with non-binary numerical variables
subset_df = df[['Age', 'Photos', '# of Tags', 'Albums', 'Posts', 'Replies', 'Children', 'Likes', 'Edu', 'Events', 'Friends']]
# Calculate the correlation matrix
correlation_matrix = subset_df.corr()
# Create a heatmap of the correlation matrix
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", linewidths=.5)
plt.title("Correlation Heatmap")
plt.show()
# Perform a single-variable linear regression of Friends on each variable.
# Note: subset_df includes 'Friends', so the last iteration regresses Friends on itself.
for column in subset_df.columns:
    X = sm.add_constant(df[column])  # predictor plus an intercept term
    y = df['Friends']  # Friends is the dependent variable
    model = sm.OLS(y, X).fit()
    print(f"Regression Analysis for {column}:")
    print(model.summary())
    print("\n")
Regression Analysis for Age:
                            OLS Regression Results
==============================================================================
Dep. Variable:                Friends   R-squared:                       0.039
Model:                            OLS   Adj. R-squared:                  0.038
Method:                 Least Squares   F-statistic:                     29.27
Date:                Sun, 01 Oct 2023   Prob (F-statistic):           8.61e-08
Time:                        17:48:08   Log-Likelihood:                -5572.9
No. Observations:                 715   AIC:                         1.115e+04
Df Residuals:                     713   BIC:                         1.116e+04
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       1112.7647     80.137     13.886      0.000     955.431    1270.098
Age          -17.0816      3.157     -5.410      0.000     -23.280     -10.883
==============================================================================
Omnibus:                      511.614   Durbin-Watson:                   1.641
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             8730.929
Skew:                           3.036   Prob(JB):                         0.00
Kurtosis:                      19.006   Cond. No.                         92.6
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
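The t statistic and confidence interval in the Age summary follow directly from the coefficient and its standard error. The sketch below recomputes them from the rounded values printed above. The critical value 1.963 is an assumption (an approximate two-sided 95% t quantile for about 713 residual degrees of freedom), so the results should agree with the summary only up to rounding.

```python
# Values copied from the Age summary above.
coef = -17.0816   # slope for Age
std_err = 3.157   # standard error of the slope
t_crit = 1.963    # assumed 95% two-sided critical value for ~713 degrees of freedom

# t statistic: the coefficient measured in units of its standard error
t_stat = coef / std_err

# 95% confidence interval: coefficient plus or minus t_crit standard errors
ci_low = coef - t_crit * std_err
ci_high = coef + t_crit * std_err

print(f"t = {t_stat:.3f}")
print(f"95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Read this way, each additional year of age is associated with roughly 17 fewer friends on average, with an interval that stays well below zero.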
Regression Analysis for Photos:
                            OLS Regression Results
==============================================================================
Dep. Variable:                Friends   R-squared:                       0.060
Model:                            OLS   Adj. R-squared:                  0.059
Method:                 Least Squares   F-statistic:                     45.79
Date:                Sun, 01 Oct 2023   Prob (F-statistic):           2.75e-11
Time:                        17:48:08   Log-Likelihood:                -5565.0
No. Observations:                 715   AIC:                         1.113e+04
Df Residuals:                     713   BIC:                         1.114e+04
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        611.5835     25.063     24.402      0.000     562.378     660.789
Photos         0.1164      0.017      6.766      0.000       0.083       0.150
==============================================================================
Omnibus:                      528.533   Durbin-Watson:                   1.639
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            10056.607
Skew:                           3.139   Prob(JB):                         0.00
Kurtosis:                      20.267   Cond. No.                     1.68e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.68e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
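With only one predictor plus a constant, a large condition number usually reflects the scale of the predictor relative to the constant column rather than collinearity between several predictors. The sketch below illustrates this with a hypothetical list of photo counts (not taken from the Facebook Friends data) and a small helper, `condition_number`, written for this example only. It computes the ratio of the largest to the smallest singular value of the design matrix [1, x], before and after standardizing x.

```python
import math

def condition_number(x):
    """Ratio of the largest to smallest singular value of the matrix [1, x]."""
    n = len(x)
    # Entries of the 2x2 matrix X'X = [[a, b], [b, c]]
    a, b, c = float(n), sum(x), sum(v * v for v in x)
    # Largest eigenvalue of X'X from the closed form for a symmetric 2x2 matrix
    large = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    # Smallest eigenvalue via the determinant, which avoids cancellation
    small = (a * c - b * b) / large
    # Singular values of X are the square roots of the eigenvalues of X'X
    return math.sqrt(large / small)

# Hypothetical photo counts, for illustration only
photos = [12, 85, 340, 990, 1500, 2750, 4100]

# Standardize: subtract the mean and divide by the (population) standard deviation
mean = sum(photos) / len(photos)
std = math.sqrt(sum((v - mean) ** 2 for v in photos) / len(photos))
photos_std = [(v - mean) / std for v in photos]

raw_cond = condition_number(photos)
std_cond = condition_number(photos_std)
print(f"raw: {raw_cond:.1f}, standardized: {std_cond:.3f}")
```

Standardizing a predictor changes the units of its coefficient but not the fit of a one-predictor model, so note [2] in this summary is likely a scaling artifact rather than a sign of a real problem.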
Regression Analysis for # of Tags:
                            OLS Regression Results
==============================================================================
Dep. Variable:                Friends   R-squared:                       0.057
Model:                            OLS   Adj. R-squared:                  0.055
Method:                 Least Squares   F-statistic:                     42.87
Date:                Sun, 01 Oct 2023   Prob (F-statistic):           1.12e-10
Time:                        17:48:08   Log-Likelihood:                -5566.4
No. Observations:                 715   AIC:                         1.114e+04
Df Residuals:                     713   BIC:                         1.115e+04
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        605.1193     25.825     23.432      0.000     554.418     655.821
# of Tags      0.1981      0.030      6.547      0.000       0.139       0.257
==============================================================================
Omnibus:                      519.930   Durbin-Watson:                   1.636
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             9709.290
Skew:                           3.071   Prob(JB):                         0.00
Kurtosis:                      19.976   Cond. No.                     1.01e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.01e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
Regression Analysis for Albums:
                            OLS Regression Results
==============================================================================
Dep. Variable:                Friends   R-squared:                       0.052
Model:                            OLS   Adj. R-squared:                  0.051
Method:                 Least Squares   F-statistic:                     39.25
Date:                Sun, 01 Oct 2023   Prob (F-statistic):           6.45e-10
Time:                        17:48:08   Log-Likelihood:                -5568.1
No. Observations:                 715   AIC:                         1.114e+04
Df Residuals:                     713   BIC:                         1.115e+04
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        580.5135     28.567     20.321      0.000     524.428     636.599
Albums         6.0857      0.971      6.265      0.000       4.179       7.993
==============================================================================
Omnibus:                      525.737   Durbin-Watson:                   1.672
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             9573.127
Skew:                           3.134   Prob(JB):                         0.00
Kurtosis:                      19.795   Cond. No.                         38.5
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Regression Analysis for Posts:
                            OLS Regression Results
==============================================================================
Dep. Variable:                Friends   R-squared:                       0.002
Model:                            OLS   Adj. R-squared:                  0.001
Method:                 Least Squares   F-statistic:                     1.723
Date:                Sun, 01 Oct 2023   Prob (F-statistic):              0.190
Time:                        17:48:08   Log-Likelihood:                -5586.4
No. Observations:                 715   AIC:                         1.118e+04
Df Residuals:                     713   BIC:                         1.119e+04
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        683.6227     24.270     28.168      0.000     635.974     731.272
Posts          0.3255      0.248      1.313      0.190      -0.161       0.812
==============================================================================
Omnibus:                      494.645   Durbin-Watson:                   1.621
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             7580.646
Skew:                           2.935   Prob(JB):                         0.00
Kurtosis:                      17.832   Cond. No.                         106.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Regression Analysis for Replies:
                            OLS Regression Results
==============================================================================
Dep. Variable:                Friends   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                 -0.000
Method:                 Least Squares   F-statistic:                    0.6610
Date:                Sun, 01 Oct 2023   Prob (F-statistic):              0.416
Time:                        17:48:08   Log-Likelihood:                -5587.0
No. Observations:                 715   AIC:                         1.118e+04
Df Residuals:                     713   BIC:                         1.119e+04
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        688.2570     24.295     28.329      0.000     640.559     735.955
Replies        0.2182      0.268      0.813      0.416      -0.309       0.745
==============================================================================
Omnibus:                      494.853   Durbin-Watson:                   1.620
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             7609.023
Skew:                           2.935   Prob(JB):                         0.00
Kurtosis:                      17.864   Cond. No.                         98.1
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Regression Analysis for Children:
                            OLS Regression Results
==============================================================================
Dep. Variable:                Friends   R-squared:                       0.027
Model:                            OLS   Adj. R-squared:                  0.025
Method:                 Least Squares   F-statistic:                     19.62
Date:                Sun, 01 Oct 2023   Prob (F-statistic):           1.09e-05
Time:                        17:48:08   Log-Likelihood:                -5577.6
No. Observations:                 715   AIC:                         1.116e+04
Df Residuals:                     713   BIC:                         1.117e+04
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        727.6564     23.270     31.271      0.000     681.971     773.341
Children    -150.5914     33.995     -4.430      0.000    -217.334     -83.849
==============================================================================
Omnibus:                      500.357   Durbin-Watson:                   1.648
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             7956.183
Skew:                           2.969   Prob(JB):                         0.00
Kurtosis:                      18.225   Cond. No.                         1.65
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Regression Analysis for Likes:
                            OLS Regression Results
==============================================================================
Dep. Variable:                Friends   R-squared:                       0.054
Model:                            OLS   Adj. R-squared:                  0.052
Method:                 Least Squares   F-statistic:                     40.56
Date:                Sun, 01 Oct 2023   Prob (F-statistic):           3.42e-10
Time:                        17:48:08   Log-Likelihood:                -5567.5
No. Observations:                 715   AIC:                         1.114e+04
Df Residuals:                     713   BIC:                         1.115e+04
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        618.8046     24.954     24.798      0.000     569.813     667.796
Likes          0.5321      0.084      6.369      0.000       0.368       0.696
==============================================================================
Omnibus:                      456.035   Durbin-Watson:                   1.635
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             5831.785
Skew:                           2.685   Prob(JB):                         0.00
Kurtosis:                      15.920   Cond. No.                         342.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Regression Analysis for Edu:
                            OLS Regression Results
==============================================================================
Dep. Variable:                Friends   R-squared:                       0.002
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                     1.351
Date:                Sun, 01 Oct 2023   Prob (F-statistic):              0.246
Time:                        17:48:08   Log-Likelihood:                -5586.6
No. Observations:                 715   AIC:                         1.118e+04
Df Residuals:                     713   BIC:                         1.119e+04
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        711.5821     26.184     27.176      0.000     660.175     762.989
Edu          -58.8805     50.661     -1.162      0.246    -158.343      40.582
==============================================================================
Omnibus:                      497.249   Durbin-Watson:                   1.628
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             7766.707
Skew:                           2.949   Prob(JB):                         0.00
Kurtosis:                      18.030   Cond. No.                         2.46
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Regression Analysis for Events:
                            OLS Regression Results
==============================================================================
Dep. Variable:                Friends   R-squared:                       0.004
Model:                            OLS   Adj. R-squared:                  0.003
Method:                 Least Squares   F-statistic:                     3.057
Date:                Sun, 01 Oct 2023   Prob (F-statistic):             0.0808
Time:                        17:48:08   Log-Likelihood:                -5585.8
No. Observations:                 715   AIC:                         1.118e+04
Df Residuals:                     713   BIC:                         1.118e+04
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        681.4055     23.865     28.552      0.000     634.551     728.260
Events         1.6319      0.933      1.748      0.081      -0.201       3.465
==============================================================================
Omnibus:                      491.715   Durbin-Watson:                   1.621
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             7522.138
Skew:                           2.910   Prob(JB):                         0.00
Kurtosis:                      17.785   Cond. No.                         27.3
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Regression Analysis for Friends:
                            OLS Regression Results
==============================================================================
Dep. Variable:                Friends   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 2.299e+33
Date:                Sun, 01 Oct 2023   Prob (F-statistic):               0.00
Time:                        17:48:08   Log-Likelihood:                 19526.
No. Observations:                 715   AIC:                        -3.905e+04
Df Residuals:                     713   BIC:                        -3.904e+04
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const      -2.132e-14   1.92e-14     -1.113      0.266   -5.89e-14    1.63e-14
Friends        1.0000   2.09e-17   4.79e+16      0.000       1.000       1.000
==============================================================================
Omnibus:                      522.883   Durbin-Watson:                   0.689
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             9335.266
Skew:                           3.117   Prob(JB):                         0.00
Kurtosis:                      19.568   Cond. No.                     1.41e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.41e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
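To compare the single-variable fits at a glance, the sketch below ranks the predictors by the R-squared values reported in the summaries above. It also recovers the implied Pearson correlation with Friends, using the fact that in a one-predictor OLS with an intercept, R-squared is the squared correlation and the slope gives its sign. The numbers are the rounded values from the output, so the correlations are approximate. The Friends-on-Friends fit is left out because it is trivially perfect.

```python
import math

# (R-squared, sign of the slope) copied from the single-variable summaries above
results = {
    "Age": (0.039, -1), "Photos": (0.060, +1), "# of Tags": (0.057, +1),
    "Albums": (0.052, +1), "Posts": (0.002, +1), "Replies": (0.001, +1),
    "Children": (0.027, -1), "Likes": (0.054, +1), "Edu": (0.002, -1),
    "Events": (0.004, +1),
}

# Sort predictors from the largest to the smallest R-squared
ranked = sorted(results.items(), key=lambda item: item[1][0], reverse=True)

for name, (r_squared, sign) in ranked:
    # Implied correlation: sign of the slope times the square root of R-squared
    r = sign * math.sqrt(r_squared)
    print(f"{name:<10} R^2 = {r_squared:.3f}  r = {r:+.3f}")
```

Even the strongest single predictor (Photos) explains only about 6% of the variation in Friends, which is worth keeping in mind when reading the very small p-values.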
import pandas as pd
import statsmodels.api as sm
# Create a DataFrame from the provided Excel file
df = pd.read_excel("Facebook Friends.xlsx")
# Define the dependent variable and independent variables
dependent_variable = df['Friends']
independent_variables = df[['Age', 'Photos', '# of Tags', 'Albums', 'Posts', 'Replies', 'Children', 'Likes', 'Edu', 'Events']]
# Add a constant (intercept) to the independent variables
independent_variables = sm.add_constant(independent_variables)
# Perform multiple linear regression (one dependent variable, several predictors)
model = sm.OLS(dependent_variable, independent_variables).fit()
# Print the regression summary
print(model.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                Friends   R-squared:                       0.145
Model:                            OLS   Adj. R-squared:                  0.133
Method:                 Least Squares   F-statistic:                     11.99
Date:                Sun, 01 Oct 2023   Prob (F-statistic):           2.90e-19
Time:                        17:51:38   Log-Likelihood:                -5531.1
No. Observations:                 715   AIC:                         1.108e+04
Df Residuals:                     704   BIC:                         1.113e+04
Df Model:                          10
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        782.3715     94.103      8.314      0.000     597.616     967.127
Age           -9.5473      3.719     -2.567      0.010     -16.849      -2.245
Photos         0.0559      0.028      2.008      0.045       0.001       0.111
# of Tags      0.1037      0.036      2.908      0.004       0.034       0.174
Albums         0.8066      1.604      0.503      0.615      -2.343       3.956
Posts          1.0858      0.617      1.759      0.079      -0.126       2.298
Replies       -1.1618      0.670     -1.735      0.083      -2.476       0.153
Children     -51.3321     39.267     -1.307      0.192    -128.426      25.762
Likes          0.4073      0.082      4.944      0.000       0.246       0.569
Edu          -53.9129     47.731     -1.130      0.259    -147.625      39.799
Events         1.0351      0.876      1.182      0.238      -0.684       2.754
==============================================================================
Omnibus:                      517.634   Durbin-Watson:                   1.687
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             9446.379
Skew:                           3.060   Prob(JB):                         0.00
Kurtosis:                      19.722   Cond. No.                     7.31e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.31e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
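The adjusted R-squared and the F-statistic in this summary can be checked from the R-squared, the number of observations, and the number of predictors using the standard formulas. The sketch below uses the rounded R-squared of 0.145 printed above, so the F value only approximately matches the reported 11.99.

```python
# Values copied from the ten-predictor summary above.
n = 715            # No. Observations
k = 10             # Df Model (predictors, not counting the constant)
r_squared = 0.145  # rounded R-squared from the summary

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Overall F statistic: explained variance per predictor over
# unexplained variance per residual degree of freedom
f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))

print(f"Adj. R-squared = {adj_r_squared:.3f}")
print(f"F-statistic    = {f_stat:.2f}")
```

The gap between 0.145 and 0.133 is the penalty for fitting ten predictors, several of which (Albums, Children, Edu, Events) have p-values above 0.05 once the others are in the model.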