Longitudinal data analysis in RStudio

Spring 2024
Assignment 2: SOC386L/SDS385
Briefly answer the following questions. Please include selected segments of computer code you used
in your write-up if needed. You may consult with other students, but the work you hand in should
reflect your own thinking.
Background. The data set (RiesbyWide.dta) posted on Canvas comes from a study by Riesby
et al. (1977) in which hospitalized depressed patients were treated for 5 weeks with imipramine.
Clinical rating of depressive symptoms was carried out once weekly by Hamilton’s Rating Scale
(HRS). The data are in wide form as follows: id, HamD0-HamD5, EndoG, where id is the subject
identifier, HamD0-HamD5 are depression measures (HamD0 is the baseline or initial measurement),
and EndoG identifies subjects whose depression, as classified on the WHO Depression Scale, was
“endogenous” or “non-endogenous” depression, with “endogenous” roughly pertaining to clinical
depression in current practice.
10 points each.
1. Convert these data from the individual-level format to the “person-period” format (i.e., from
wide to long). Carry out a basic exploratory analysis by graphing the raw trajectories for
each subject (or randomly selected subjects) along with the linear fit through each individual’s
measurements in side-by-side plots (e.g., spaghetti plots). Comment on whether a linear growth
curve model would seem appropriate given the observed and fitted patterns.
2. Fit an unconditional means model to these data. Assess whether this model is an improvement over a simple regression model, and explain why or why not.
3. Fit an unconditional growth model to these data using a full model specification for the variance
components (i.e., a model that estimates the variance of the random slope and intercept as
well as their covariance). Use a likelihood ratio test to determine if this model offers an
improvement over the unconditional means model.
4. Using a likelihood ratio test, test whether a more parsimonious unconditional growth model fits
these data as well as the model used in Question 3. Use the preferred model as the “working”
model. Interpret the fixed effects and variance components from your working unconditional
growth model.
5. Under the assumptions of normality of the MLEs (i.e., asymptotic normality) we can determine
some aspects of the distribution of random slopes in the depressed population. Using the slope
fixed effect and the estimated variance of the random slope as a gauge of variability of the
individual slopes around the fixed effect, about what proportion of the random slopes in the
depressed population would you expect to be greater than 0?
6. Using the properties of MLEs we can determine some aspects of the distribution of random
intercepts (i.e., initial levels) in the depressed population. Using the intercept fixed effect and
the estimated variance of the random intercept as a gauge of variation in initial depression levels,
about what proportion of the random intercepts in the depressed population would be expected
to be less than 20?
7. Obtain estimates of the random slope by predicting the random effect $U_{1i}$ for each subject and
adding these to the fixed effect $\hat{\beta}_1$. Obtain estimates of the random intercept by predicting the
random effect $U_{0i}$ for each subject and adding these to the fixed effect $\hat{\beta}_0$. Provide a histogram
of the random slopes and intercepts.
8. Using your preferred model from Question 4, assess whether there is evidence that the “endogenous depression” classification is able to statistically differentiate between depression patients’
initial levels of depression and also whether there is evidence that this classification moderates
the change in depression scores over time.
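Two of the mechanical steps in the questions above can be sketched in a few lines: the wide-to-long reshape from Question 1 and the normal-tail proportions from Questions 5 and 6. The sketch below is only an illustration in standard-library Python, which the assignment does not prescribe. The function names (`wide_to_long`, `prop_above`, `prop_below`), the long-format column names `week` and `hamd`, and every number are assumptions made up for the example, not estimates from the Riesby data.

```python
from math import sqrt
from statistics import NormalDist


def wide_to_long(rows, prefix="HamD", n_waves=6):
    """Turn one-record-per-subject rows into person-period rows."""
    long_rows = []
    for row in rows:
        for week in range(n_waves):
            score = row.get(f"{prefix}{week}")
            if score is None:  # drop weeks with no rating
                continue
            long_rows.append(
                {"id": row["id"], "week": week, "hamd": score, "EndoG": row["EndoG"]}
            )
    return long_rows


def prop_above(cutoff, fixed_effect, re_variance):
    """Share of subject-specific coefficients above `cutoff`, treating them
    as Normal(fixed_effect, re_variance)."""
    return 1.0 - NormalDist(fixed_effect, sqrt(re_variance)).cdf(cutoff)


def prop_below(cutoff, fixed_effect, re_variance):
    """Share of subject-specific coefficients below `cutoff` under the same
    normality assumption."""
    return NormalDist(fixed_effect, sqrt(re_variance)).cdf(cutoff)


# Made-up illustration values, NOT estimates from the Riesby data.
toy_wide = [
    {"id": 1, "HamD0": 26, "HamD1": 22, "HamD2": None, "HamD3": 18,
     "HamD4": 15, "HamD5": 12, "EndoG": 1},
]
toy_long = wide_to_long(toy_wide)
share_positive_slopes = prop_above(0.0, fixed_effect=-2.0, re_variance=4.0)
share_low_intercepts = prop_below(20.0, fixed_effect=24.0, re_variance=16.0)
print(len(toy_long), round(share_positive_slopes, 3), round(share_low_intercepts, 3))
```

In an actual answer, the fixed effect and random-effect variance passed to `prop_above` and `prop_below` would come from the working model chosen in Question 4.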
Extra Credit Problems. (2 points) Consider the OLS estimates of the individual-level slopes
obtained from the marital adjustment data (see handouts 1-2). Use the information in the handout
to obtain estimates of the slopes and their standard errors. We can follow the steps spelled out in
the Bryk and Raudenbush article (available on Canvas) to improve upon these estimates as follows:
• Denote the slope for the $i$th subject by $b_i$. The total variance in the individual slopes is a
function of the sampling variance and the parameter variance.
• The sampling variance is simply the squared standard error of the slope from the OLS model,
denoted by $\mathrm{se}(b_i)^2 = v_i$.
• The parameter variance is the variance in the slopes, estimated as $\tau = \sum_i (b_i - \bar{b})^2/(n - 1)$.
• Define the weight function
$W_i = \tau/(\tau + v_i)$
• An improved estimate can be obtained as a weighted average of the OLS slope and the average
slope. That is,
$b_i^* = b_i W_i + (1 - W_i)\bar{b}$
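The weighting steps in the bullets above can be sketched as a short function. This is a minimal illustration in standard-library Python that follows the formulas exactly as stated in this handout. The function name `shrink_slopes` and the toy slopes and standard errors are hypothetical and are not taken from the marital adjustment handouts.

```python
def shrink_slopes(slopes, std_errors):
    """Weighted average of each OLS slope and the mean slope.

    slopes      -- OLS slope b_i for each subject
    std_errors  -- standard error se(b_i) for each subject
    Returns (improved estimates, mean slope, tau).
    """
    n = len(slopes)
    b_bar = sum(slopes) / n
    # tau: variance of the slopes, as defined in the bullets above
    tau = sum((b - b_bar) ** 2 for b in slopes) / (n - 1)
    improved = []
    for b, se in zip(slopes, std_errors):
        v = se ** 2            # squared standard error of this slope
        w = tau / (tau + v)    # weight W_i, between 0 and 1
        improved.append(w * b + (1 - w) * b_bar)
    return improved, b_bar, tau


# Toy inputs for illustration only (not values from the handouts).
toy_slopes = [1.0, 2.0, 3.0]
toy_ses = [1.0, 1.0, 1.0]
improved, b_bar, tau = shrink_slopes(toy_slopes, toy_ses)
print(improved, b_bar, tau)
```

A slope with a large standard error gets a small weight $W_i$ and is pulled toward the mean slope, while a precisely estimated slope stays close to its OLS value.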
1. Use the individual slope estimates and their std. errors from handouts 1-2 to obtain the
improved estimates and plot them against the original OLS estimates. Explain in what sense
these estimates are “improved.”
2. A simplified growth model can be fit using “centered” data, where the centered dependent
variable is $y^c_{ij} = y_{ij} - \bar{y}_i$ and the centered time variable is $t^c_{ij} = t_{ij} - \bar{t}_i$. Consider the following model
$y^c_{ij} = \beta_i t^c_{ij} + \varepsilon$
and
$\beta_i = \beta + U_i$
Discuss the differences between this model and the standard linear growth model. Fit this
model as a linear mixed model.
3. Compare the slope estimate (fixed effect) to the one from the centered model. Explain why
they are similar or different.
4. Obtain the empirical Bayes estimates of the level-2 residuals, the $\hat{U}_i$. Compute the EB estimates
of the slope as $\hat{\beta} + \hat{U}_i$. Compare these to the ones computed in the first extra-credit
problem.