The researchers follow 44 gender dysphoric (GD) adolescents (24 male and 19 female) treated with puberty blockers and given psychosocial support at the Gender Identity Development Service (GIDS) in the UK. They measure bone density, mental health, self-harm behaviour, and reported life changes at baseline, 12, 24, and 36 months. Subjects could progress to cross-sex hormones soon after age 16, so only 24 remained on blockers at 12 months and only 14 at 36 months.
The study is the follow-up to the initial results reported in Carmichael (2015).
Controversy and Errors
The study received more media coverage than is typical and has been scrutinised in depth. The research team were defendants in a High Court judicial review of GIDS and released the study preprint on December 1st, 2020, the same day the High Court handed down the Bell Judgement. The defendants claimed to be unable to release the results during the review proceedings.
The study reports that the review by Chew et al. (2018) "included data on the physical and mental health outcomes of pubertal suppression using GnRHa in over 500 young people". However, an analysis of the citations reveals that the quoted 500 figure double-counts some study subjects. The true figure is closer to 300 (Cederblom, 2021).
Baseline mental health scores were non-clinical, and there was no significant change in mental health outcomes. Measures included the parental questionnaire Child Behaviour Checklist (CBCL) and the Youth Self Report (YSR).
An odd aspect of the recent clinical practice of puberty suppression, and of the research literature, is that despite the seriousness of the intervention, clinicians do not expect blockers to lessen the child's current GD. Earlier literature, in contrast, hypothesised that blockers would reduce GD (Viner et al., 2010; Biggs, 2021). Instead, clinicians propose that puberty suppression will lessen (a) potential future dysphoria from further pubertal development, or (b) dysphoria the child may have as an adult when they have committed to further opposite sex imitation. In other words, the intervention is experimental, but the benefits are hypothetical and impossible to measure without a (non-existent) control. In any case, GD did not improve during the follow-up period.
The study reports median and interquartile self-harm scores from the CBCL. CBCL scores can be 0 (no harm or not true), 1 (somewhat or sometimes true), or 2 (very true or often true). The median self-harm score is always 0 at baseline and in the follow-up intervals. The upper quartile is always 1, except at the 12 month interval, when it is 2, and at 36 months for the YSR, when it is 0. The median of a score with a 0-2 range is coarse, and it's not surprising that there were no significant changes during follow-up.
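To illustrate why a median on a 0-2 scale is coarse, here is a minimal sketch using two hypothetical cohorts (these are invented lists, not the study's data). The helper name `share_scoring` is an assumption made for this example.

```python
from statistics import median

def share_scoring(scores, value):
    """Fraction of subjects whose item score equals `value`."""
    return sum(1 for s in scores if s == value) / len(scores)

# Hypothetical cohorts of 40 subjects each, NOT the study's data.
nobody_harms = [0] * 40
many_harm = [0] * 24 + [2] * 16  # 40% answer "very true or often true"

for label, scores in [("nobody_harms", nobody_harms), ("many_harm", many_harm)]:
    # The median is the same for both cohorts; only the frequency differs.
    print(label, "median:", median(scores), "share scoring 2:", share_scoring(scores, 2))
```

Both lists have the same median even though the share of subjects scoring 2 differs greatly, which is why a frequency table is more informative than a median for an item like this.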
However, an upper quartile score of 2 at 24 months implies that at least 10 out of 39 subjects reported that they "very often" harmed themselves. Given the seriousness of this result, it is odd that the researchers did not report the self-harm score as a frequency or break out the results by sex, as Carmichael (2015) does for the interim results of the same cohort. In the interim results, "a significant increase was found in the first item “I deliberately try to hurt or kill self...especially natal girls”". The researchers correct their prior results, stating:
"These data correct reports from a recent letter by [Biggs, (2020)] which used preliminary data from our study which were uncleaned and incomplete data used for internal reporting. In addition there were many statistical comparisons which inflated the risk of type 1 error...Contrary to Biggs’s letter, we found no evidence of reductions over time in any psychological outcomes, and no material differences by sex."
The researchers could have strengthened this point by reporting the frequencies and the breakdown by sex. We note that reporting an increase in suicidal behaviour in children under their care would have been unlikely to benefit GIDS's legal defense.
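The "at least 10 out of 39" figure above follows from the definition of a quartile. The sketch below checks the arithmetic under one assumed definition of the upper quartile (the smallest score with at least 75% of subjects at or below it). The study does not state which quartile method it used, and other methods can give a slightly different minimum. The function names are hypothetical.

```python
import math

def upper_quartile(scores):
    """75th percentile: smallest score with at least 75% of values at or below it."""
    ordered = sorted(scores)
    rank = math.ceil(0.75 * len(ordered))  # 1-based rank of the quartile
    return ordered[rank - 1]

def min_top_scorers(n, top=2):
    """Smallest number of subjects scoring `top` that makes the upper quartile equal `top`."""
    for k in range(n + 1):
        # Everyone else is given the lowest score, the most favourable case.
        scores = [0] * (n - k) + [top] * k
        if upper_quartile(scores) == top:
            return k
    return None

print(min_top_scorers(39))
```

With 39 subjects, nine subjects scoring 2 are not enough to lift the upper quartile to 2 under this definition, while ten are.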
The study includes self-reported life changes at 6-15 months and 15-24 months.
49% of participants reported positive changes in mood at 6-15 months, and 24% reported negative changes in mood. At 15-24 months, the proportion reporting positive mood changes fell to around 30%. The proportions reporting a mix of positive and negative mood changes, and reporting negative mood changes, were each also around 30% at 15-24 months.
Overall wellbeing was reported as only positive by 46% and 55% at the two time periods respectively. 37% had a mix of positive and negative changes, and 12% had overall negative changes at 6-15 months.
Without a control, it is difficult to interpret these results as meaningful. The participants also reported a net improvement in friendships, so the positive changes could be related to extraneous factors such as these. An initial spike in positive mood changes in the first time period, followed by a rebalancing, is consistent with a placebo effect.
Participants in unblinded studies are also susceptible to the observer-expectancy effect, where the researchers' expectations influence the participants' responses.
During childhood, Bone Mineral Density (BMD) increases until peak bone mass is reached in early adulthood (Boot et al., 1997).
The majority of BMD accrual occurs in puberty. Blocking puberty can halt or adversely impact BMD accrual, with uncertain impacts on adult BMD, risk of fracture, and osteoporosis.
Measuring absolute changes in BMD under treatment does not account for BMD impacts: a stable BMD indicates a failure to accrue adult levels. And indeed, absolute levels of BMD did not change much during the study period.
Therefore, the study includes relative differences in BMD in comparison to a reference population. Reference population comparisons are measured with z-scores. A z-score of 0 represents the mean (50th percentile) for that population. A z-score of -1 represents BMD 1 standard deviation below the mean, or approximately the 16th percentile. In adults, a BMD z-score between -1 and -2.5 is considered to be osteopenia. A z-score below -2.5 (about the 0.6th percentile) is considered osteoporosis.
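The percentile figures quoted above can be checked with a short sketch. It assumes that BMD in the reference population is normally distributed, which is the usual basis for reading a z-score as a percentile. The function name `percentile_of` is hypothetical.

```python
from statistics import NormalDist

# Assumption: the reference population's BMD follows a standard normal
# distribution once expressed as z-scores (mean 0, standard deviation 1).
REFERENCE = NormalDist(mu=0.0, sigma=1.0)

def percentile_of(z_score):
    """Percentage of the reference population with a z-score at or below `z_score`."""
    return 100 * REFERENCE.cdf(z_score)

for z in (0.0, -1.0, -2.5):
    print(f"z = {z:+.1f} -> percentile {percentile_of(z):.1f}")
```

Under that assumption, a z-score of -1 lands near the 16th percentile and a z-score of -2.5 near the 0.6th percentile, matching the figures in the text.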
After 12 months of treatment, the mean z-score was around -1. Assuming the participants' z-scores are roughly normally distributed with a standard deviation near 1, this means about half of participants had clinically low BMD relative to a same-age reference population. At 24 months, the mean z-score was around -1.3, meaning that about 60% had clinically low BMD. Z-scores did not change at 36 months.
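The step from a mean z-score to a share of participants with low BMD rests on assumptions the study does not report directly. The sketch below makes them explicit: participants' z-scores are treated as normally distributed with a standard deviation of 1, and "clinically low" is taken to mean a z-score below -1, as in the text above. The function name and default values are hypothetical.

```python
from statistics import NormalDist

def share_below(threshold, mean_z, sd_z=1.0):
    """Estimated fraction of participants with a z-score below `threshold`.

    Assumption: participants' z-scores are normally distributed around
    `mean_z` with standard deviation `sd_z` (1.0 is a guess, not a reported value).
    """
    return NormalDist(mu=mean_z, sigma=sd_z).cdf(threshold)

LOW_BMD_THRESHOLD = -1.0  # threshold used in the text, not a universal cut-off

for months, mean_z in [(12, -1.0), (24, -1.3)]:
    share = share_below(LOW_BMD_THRESHOLD, mean_z)
    print(f"{months} months: mean z = {mean_z}, share below -1 = {share:.0%}")
```

A wider or narrower spread of z-scores among participants would change the 24 month estimate, so the "about 60%" figure should be read as an approximation.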
Five years later, and under the duress of a judicial review, the researchers finally report that their experimental intervention may have impacted participants' lifetime BMD and provided no firm mental health benefits.