Estimating Models Using Dummy Variables
Instructions: Respond to at least one of your colleagues' posts and provide a constructive comment on their assessment of diagnostics. Were all assumptions tested? Are there some violations that the model might be robust against? Why or why not? Explain, and provide any additional resources (e.g., web links, articles) to help your colleague address diagnostic issues.

Student's response:

Using the General Social Survey dataset (mean AGE = 48.27), this paper examines the research question: How do different categories of employment affect the number of hours spent watching television? The question is answered through analysis of the scale variable TVHOURS (hours per day watching TV) and the categorical variable WRKSTAT (labor force status). According to Frankfort-Nachmias et al. (2020), linear regression is the most suitable method for analyzing these two variables. However, because WRKSTAT is categorical, several dummy variables must first be created from it (Laureate Education, 2016).

WRKSTAT sorts responses into one of nine categories: WORKING FULL TIME; WORKING PART TIME; TEMP NOT WORKING; UNEMPL, LAID OFF; RETIRED; SCHOOL; KEEPING HOUSE; OTHER; and NA. SPSS allows WRKSTAT to be recoded into separate dummy variables, with WORKING FULL TIME as the reference category. OTHER and NA were excluded from this analysis. Once the dummy variables were created, they were entered as predictors in a linear regression. The coefficients of the model are shown in Table 1.

Table 1. Coefficients (a)

Model 1        B       Std. Error   Beta    t        Sig.    Tolerance   VIF
(Constant)     2.312   .170                 13.632   .000
parttime       .063    .426         .009    .148     .882    .937        1.067
temporary      .438    .800         .031    .548     .584    .980        1.020
unemployed     1.951   .535         .207    3.648    .000    .958        1.044
retired        .713    .385         .107    1.852    .065    .927        1.079
inschool       .617    .615         .057    1.003    .317    .967        1.034
stayathome     .723    .444         .093    1.627    .105    .942        1.062

Note. B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are the collinearity statistics.
a. Dependent Variable: HOURS PER DAY WATCHING TV

Table 1 lists each dummy variable created from WRKSTAT. Each unstandardized coefficient (B) is the estimated difference in TVHOURS between that category and the reference category, WORKING FULL TIME. The constant (2.312) is the estimated mean for the reference category itself. For example, PARTTIME has an unstandardized coefficient of .063, meaning that part-time workers are estimated to watch .063 more hours of television per day than full-time workers. Comparing all of the dummy variables, PARTTIME is closest to the reference category and UNEMPLOYED is furthest from it, with a difference of 1.951 hours per day. UNEMPLOYED is also the only category whose difference from the reference category is statistically significant at the .05 level (p < .001).

Testing the assumptions of multiple regression is important to ensure proper interpretation of the results (Laureate Education, 2016). In this model, all assumptions were judged to be met; the most pertinent ones are explained below. The analysis of variance (ANOVA) output indicated that the overall model was significant, with a reported p-value of .011, which falls below the conventional .05 threshold for significance (ASA, 2016). The regression diagnostics provide evidence about the assumptions.
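The dummy coding and coefficient interpretation described above can be illustrated outside SPSS. The following is a minimal sketch, not the procedure used in the post: the rows are invented for illustration (they are not GSS data), and the names `ROWS`, `dummy_code`, and `dummy_regression_coefficients` are hypothetical. It relies on the fact that when a regression contains only an intercept and a full set of dummies for one categorical variable, the constant equals the mean of the reference group and each B equals that group's mean minus the reference mean.

```python
from statistics import mean

# Hypothetical respondents: (labor force status, TV hours per day).
# These rows are invented for illustration; they are NOT GSS data.
ROWS = [
    ("fulltime", 2.0), ("fulltime", 3.0), ("fulltime", 1.0),
    ("parttime", 2.0), ("parttime", 4.0),
    ("unemployed", 5.0), ("unemployed", 3.0),
    ("retired", 3.0), ("retired", 4.0), ("retired", 2.0),
]

REFERENCE = "fulltime"  # the reference category gets no dummy variable


def dummy_code(status, categories, reference=REFERENCE):
    """Return 0/1 indicators, one per non-reference category."""
    return {c: int(status == c) for c in categories if c != reference}


def dummy_regression_coefficients(rows, reference=REFERENCE):
    """Least-squares coefficients for a model whose only predictors are the dummies.

    The constant is the reference group's mean; each B is the difference
    between a group's mean and the reference group's mean.
    """
    groups = {}
    for status, hours in rows:
        groups.setdefault(status, []).append(hours)
    constant = mean(groups[reference])
    coefs = {c: mean(v) - constant for c, v in groups.items() if c != reference}
    return constant, coefs


categories = sorted({status for status, _ in ROWS})
print(dummy_code("retired", categories))

constant, coefs = dummy_regression_coefficients(ROWS)
print(f"constant = {constant:.3f}")
for name, b in coefs.items():
    print(f"{name:>10}: B = {b:+.3f}")
```

Read the same way, Table 1 implies an estimated mean of 2.312 + 1.951 = 4.263 hours per day for the unemployed group, compared with 2.312 hours for full-time workers.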
The first piece of diagnostic output is the Model Summary, shown in Table 2.

Table 2. Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .229 (a)   .052       .034                2.211                        2.177

a. Predictors: (Constant), stayathome, temporary, inschool, unemployed, parttime, retired
b. Dependent Variable: HOURS PER DAY WATCHING TV

In Table 2, the value of interest is the Durbin-Watson statistic. According to Laureate Education (2016), the Durbin-Watson statistic ranges from 0 to 4.0 and provides information about the independence of errors. The value of 2.177 is close to 2.0, which suggests that the residuals are not serially correlated. By contrast, a Durbin-Watson value below 1.0 or above 3.0 indicates a high probability of serial correlation (Laureate Education, 2016).

Further review of the diagnostics returns to Table 1 and the Variance Inflation Factor (VIF). As a general rule, VIF values close to 10, and certainly values greater than 10, indicate multicollinearity in the model. The highest VIF is 1.079, for the RETIRED group, and all other groups are likewise far below the threshold of 10. The predictors therefore show little correlation with one another, and the assumption of no multicollinearity is accepted.

Cook's distance identifies undue influence, that is, outlying cases that may have a disproportionate impact on the model. According to Laureate Education (2016), a Cook's distance greater than 1.0 is considered problematic. The Cook's distance values for this model range from a minimum of .000 to a maximum of .424, well below the threshold of 1.0. This supports the assumption that no case exerts undue influence on the model.

References

American Statistical Association. (2016). American Statistical Association releases statement on statistical significance and p-values. Retrieved from https://www.amstat.org/asa/files/pdfs/P-ValueStatement.pdf

Frankfort-Nachmias, C., Leon-Guerrero, A., & Davis, G. (2020). Social statistics for a diverse society (9th ed.). Sage Publications.

Laureate Education (Producer). (2016). Regression diagnostics and model evaluation [Video file]. Author.

Laureate Education (Producer). (2016). Dummy variables [Video file]. Author.
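As a supplement to the diagnostics discussed in the post, the rules of thumb can be checked with a few lines of code. This is a minimal sketch under stated assumptions: the Tolerance, VIF, and maximum Cook's distance values are copied from the post, the residuals passed to the Durbin-Watson function are invented for illustration, and all names (`TABLE_1_COLLINEARITY`, `vif_from_tolerance`, `durbin_watson`) are hypothetical. It uses two standard definitions: VIF is the reciprocal of tolerance, and the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals.

```python
# Tolerance and VIF values copied from Table 1 of the post.
TABLE_1_COLLINEARITY = {
    "parttime": (0.937, 1.067),
    "temporary": (0.980, 1.020),
    "unemployed": (0.958, 1.044),
    "retired": (0.927, 1.079),
    "inschool": (0.967, 1.034),
    "stayathome": (0.942, 1.062),
}

VIF_THRESHOLD = 10.0     # rule of thumb cited in the post
COOKS_D_THRESHOLD = 1.0  # rule of thumb cited in the post
MAX_COOKS_D = 0.424      # maximum Cook's distance reported in the post


def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of tolerance."""
    return 1.0 / tolerance


def durbin_watson(residuals):
    """Squared successive differences over squared residuals (range 0 to 4)."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


for name, (tolerance, reported_vif) in TABLE_1_COLLINEARITY.items():
    vif = vif_from_tolerance(tolerance)
    flag = "problematic" if vif >= VIF_THRESHOLD else "ok"
    print(f"{name:>10}: 1/tolerance = {vif:.3f} (reported {reported_vif:.3f}) {flag}")

print("Cook's distance below threshold:", MAX_COOKS_D < COOKS_D_THRESHOLD)

# Hypothetical residuals in case order, invented for illustration only.
example_residuals = [0.5, -0.3, 0.2, -0.4, 0.1, -0.1]
print(f"Durbin-Watson = {durbin_watson(example_residuals):.3f}")
```

Residuals that keep the same sign from one case to the next push the Durbin-Watson statistic toward 0, and residuals that alternate in sign push it toward 4, which is why the reported value of 2.177, near the middle of the range, is read as little evidence of serial correlation.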