Polygenic risk score-analysis of thromboembolism in patients with acute lymphoblastic leukemia

performed polygenic risk score (PRS) analysis on TE development in the cohort, progressively expanding the PRS by increasing the p-value threshold of single nucleotide polymorphism inclusion. Results and conclusion: Eighty-nine of 1252 patients with ALL developed TE, 2.5 year cumulative incidence 7.2%. PRS of genome-wide significant SNPs from the INVENT and UK Biobank data were not significantly associated with TE, HR 1.16 (p 0.14) and 1.02 (p 0.86), respectively. Expanding PRS by increasing p-value threshold did not reveal polygenic overlap. However, subgroup analysis of adolescents 10.0-17.9 years (n = 231) revealed significant polygenic overlap with the INVENT GWAS. The best fit PRS, including 16,144 SNPs, was associated with TE with HR 1.76 (95% CI 1.23-2.52, empirical p-value 0.02). Our results support an underlying genetic predisposition for TE in adolescents with ALL and should be explored further in future TE risk prediction models.


Introduction
Patients with acute lymphoblastic leukemia (ALL) are at increased risk of thromboembolism (TE) due primarily to the cancer, the chemotherapy treatment (not least asparaginase), and the presence of central venous lines. A meta-analysis from 2006 found a 5.2% incidence of TE in children with ALL, and several studies have found that this risk increases in adolescents > 10 years and in adults [1][2][3]. To study the genetics of TE, large sample sizes are needed [4], which is challenging in the setting of patients with ALL. A genome-wide association study (GWAS) on TE in children with ALL did not reveal genome-wide significant hits, but was also underpowered [5]. In contrast, there exist large GWAS on venous TE in the general adult population, for example the 2015 GWAS from the International Network of VENous Thromboembolism Clinical Research Networks (INVENT) and the 2017 UK Biobank GWAS, which have found several single nucleotide polymorphisms (SNPs) robustly associated with venous TE from the F5, FGG, F11, ABO, F2, PROCR, TSPAN15, SLC44A2, and ZFPM2 genes.
Attempts to extrapolate from the genetics of TE in the general population to the genetics of TE in patients with ALL have been many and the results diverging, making it difficult to draw sound conclusions. However, by creating polygenic risk scores (PRS) based on large GWAS on TE in the general population, we can summarize the effects of multiple SNPs to identify individuals at increased risk. Because PRS, in contrast to GWAS, are not challenged by the need for correction for multiple testing, the power to detect association is higher than when testing the SNPs individually. The PRS also allow the inclusion of SNPs that do not reach genome-wide significance, but may still play an additive role in a complex phenotype. Importantly, the PRS also allow us to investigate the extent of common genetic etiology between venous TE in the general adult population and TE in patients with ALL, which is currently unknown. Identification of overlapping etiology can subsequently be used to identify SNPs to be incorporated in prediction models to identify patients eligible for thromboprophylaxis.
We aimed to test the hypothesis that TE in ALL and TE in the general adult population have a shared genetic etiology. Secondly, we explored if this would be different for patients with ALL in different age groups. Based on large GWAS on TE in the general adult population, we performed PRS analyses on individual-level data in the Nordic Society of Pediatric Hematology and Oncology (NOPHO) ALL2008 cohort. We also investigated the effect of the 37-SNP venous TE-associated PRS from the 2019 INVENT GWAS [6] on risk of TE in ALL.

Patient population
From 7/2008 to 7/2016, patients diagnosed with ALL and treated according to the NOPHO ALL2008 study were invited to participate in genetic add-on studies. The NOPHO ALL2008 study, which was a population-based treatment and research protocol for ALL patients 1.0-45.9 years old in Denmark, Estonia, Finland, Iceland, Lithuania, Norway, and Sweden, has been described in detail elsewhere [1][7][8][9][10], and was approved by the national authorities and the relevant national or regional ethical committees in each participating country. Participation in the genetic add-on study required additional informed consent and was approved by the ethical committees in the participating countries. The study was conducted in accordance with the Declaration of Helsinki. Data on patient demographics, ALL characteristics, and treatment were collected from the NOPHO registry on October 10th 2017, and patients with ALL predisposition syndromes, such as ataxia telangiectasia or Down syndrome, patients with bilineage or ambiguous phenotype, and patients not following the treatment protocol were excluded (Fig. 1). Samples from 1812 individuals were sent for genotyping, and after clinical and genetic quality control, 1252 patients were included in the genetic study, which has been described previously.

TE events
TE events during ALL treatment were registered as part of the mandatory prospective toxicity registration in NOPHO ALL2008 [9]. TE was defined as first-time symptomatic arterial or venous TE verified by imaging or asymptomatic arterial or venous TE diagnosed by imaging due to non-TE related symptoms and requiring anticoagulation treatment. The date of TE was defined as the date of diagnostic image analysis or the date of death if diagnosed at autopsy.

Genetic data
SNP profiling of post-remission DNA in the NOPHO cohort was done using the Omni 2.5exome-8-BeadChip arrays (Illumina, San Diego, CA, USA). The imputation procedure has been described previously [11]. Standard quality control procedures were performed according to previously published criteria [12,13], excluding individuals with: (i) sex mismatch; (ii) > 2% missing genotyped SNPs; (iii) excess heterozygosity; or (iv) high relatedness/duplicate samples. SNPs were excluded based on: (i) > 2% missing genotyped individuals; (ii) minor allele frequency (MAF) < 0.01; or (iii) Hardy-Weinberg-equilibrium (p < 0.00001). Genetic ancestry was determined according to identity by state clustering analysis, removing individuals > 15 standard deviations away from the HapMap defined CEU (Northern European) centroid mean. The threshold for certainty of the SNP imputation was set at 0.7.
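The SNP-level exclusion criteria above can be illustrated with a minimal sketch. The record type `SnpQc`, its field names, and the function names are hypothetical and introduced only for illustration; the thresholds are the ones stated in the text. Treating the imputation certainty of 0.7 as an inclusive lower bound, and assigning genotyped SNPs a certainty of 1.0, are assumptions.

```python
from dataclasses import dataclass

# Thresholds taken from the text; the names are illustrative only.
MAX_MISSING_RATE = 0.02      # exclude SNPs with > 2% missing individuals
MIN_MAF = 0.01               # exclude SNPs with MAF < 0.01
MIN_HWE_P = 1e-5             # exclude SNPs with Hardy-Weinberg p < 0.00001
MIN_IMPUTATION_CERTAINTY = 0.7  # assumed to be an inclusive lower bound


@dataclass
class SnpQc:
    """Hypothetical per-SNP quality control summary."""
    snp_id: str
    missing_rate: float
    maf: float
    hwe_p: float
    imputation_certainty: float  # assumed 1.0 for directly genotyped SNPs


def passes_qc(snp: SnpQc) -> bool:
    """Return True if the SNP survives all four filters."""
    return (
        snp.missing_rate <= MAX_MISSING_RATE
        and snp.maf >= MIN_MAF
        and snp.hwe_p >= MIN_HWE_P
        and snp.imputation_certainty >= MIN_IMPUTATION_CERTAINTY
    )


def filter_snps(snps):
    """Keep only the SNPs that pass quality control."""
    return [s for s in snps if passes_qc(s)]
```

The individual-level filters (sex mismatch, missingness, heterozygosity, relatedness, ancestry) follow the same pattern but need genotype-level data and are not shown.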
We obtained summary statistics from two large-scale genomic studies: the 2015 INVENT GWAS [14], comprising 7507 venous TE cases and 52,632 controls, and the 2017 UK Biobank GWAS [15], including 3920 venous TE cases and 116,868 controls, all of European ancestry. Details on the inclusion criteria and phenotype characteristics are described in the original publications [14,15].

Statistical analyses
The PRS were constructed from the summary statistics using the PRSice software [16], and the PRS weights were standardized to a z-score with a mean of 0 and a standard deviation (SD) of 1. We first considered only genome-wide significant SNPs (p < 5 × 10⁻⁸), and then gradually relaxed the p-value threshold to include more SNPs in the PRS to capture polygenicity. Power calculations performed using the Avengeme package in R [17] showed that for an overall alpha of 0.05 (supplementary material) [18], we have power > 80% when including up to 17,300 and 8300 SNPs from the 2015 INVENT and the 2017 UK Biobank summary statistics, respectively. The PRS plots were stopped at p-value threshold 0.03 (including 34-22,700 and 6-27,000 SNPs from the 2015 INVENT and 2017 UK Biobank summary statistics, respectively). The PRSice software automatically performs linkage disequilibrium (LD)-clumping. Due to the strong effect sizes of a few well-known SNPs, we repeated the analyses excluding the LD-regions around F5 rs6025, F11 rs2036914, FGG rs2066865, and ABO rs8176719 as a sensitivity analysis (supplementary material).
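The idea of relaxing the p-value threshold step by step can be sketched as follows. This is not the PRSice implementation: it assumes the summary statistics have already been LD-clumped, it interprets the standardization as applying to the per-patient scores, and all function names and the toy numbers are hypothetical.

```python
from statistics import mean, pstdev


def raw_prs(dosages, betas, pvalues, threshold):
    """Sum of effect-allele dosage times effect size over SNPs with p < threshold.

    dosages: one list per patient, one effect-allele count (0-2) per SNP.
    Returns the raw scores and the number of SNPs included.
    """
    keep = [j for j, p in enumerate(pvalues) if p < threshold]
    scores = [sum(row[j] * betas[j] for j in keep) for row in dosages]
    return scores, len(keep)


def standardize(scores):
    """Rescale scores to mean 0 and (population) SD 1."""
    m = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have no variation and cannot be standardized")
    return [(s - m) / sd for s in scores]


def prs_by_threshold(dosages, betas, pvalues, thresholds):
    """Standardized PRS for each p-value threshold, from strict to relaxed."""
    result = {}
    for t in thresholds:
        scores, n_snps = raw_prs(dosages, betas, pvalues, t)
        result[t] = (standardize(scores), n_snps)
    return result


# Toy data (made up): 4 patients, 3 SNPs.
TOY_BETAS = [0.3, 0.1, -0.2]
TOY_PVALUES = [1e-9, 0.001, 0.02]
TOY_DOSAGES = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 2]]
TOY_THRESHOLDS = [5e-8, 0.01, 0.03]
```

Calling `prs_by_threshold(TOY_DOSAGES, TOY_BETAS, TOY_PVALUES, TOY_THRESHOLDS)` returns one standardized score vector per threshold, which is the input a regression model would then be fitted to at each step.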
Based on the 2019 INVENT GWAS meta-analysis, which also included the UK Biobank data, a 37 SNP PRS was proposed for venous TE in adults including 34 genome-wide significant variants and 3 previously identified variants with p-value < 5.6 × 10⁻³ without reaching genome-wide significance [6]. Of the 37 SNPs from the 2019 INVENT PRS [6], we had missing data on 3 SNPs (F9 rs6048 and F8 rs143478537 on the X chromosome, and rs191945075 downstream of F2 with MAF 0.01), leaving 34 SNPs in the PRS. The PRS 34 was calculated as the sum of the effect alleles per patient, each multiplied by their reported effect size, and standardized to a z-score.
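A fixed-weight score of this kind, restricted to the SNPs actually available in the cohort, might be computed as in the sketch below. The SNP identifiers and weights are placeholders invented for illustration and are not the published effect sizes; `build_fixed_prs` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def build_fixed_prs(genotypes, weights):
    """Weighted allele sum over available SNPs, standardized to a z-score.

    genotypes: {patient_id: {snp_id: effect-allele count}}
    weights:   {snp_id: reported effect size (log scale)}
    SNPs missing from any patient's genotype data are dropped from the score.
    Returns the standardized scores and the list of SNPs used.
    """
    available = set(weights)
    for calls in genotypes.values():
        available &= set(calls)
    used = sorted(available)

    raw = {
        pid: sum(calls[snp] * weights[snp] for snp in used)
        for pid, calls in genotypes.items()
    }
    values = list(raw.values())
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("scores have no variation and cannot be standardized")
    return {pid: (v - m) / sd for pid, v in raw.items()}, used


# Placeholder weights (NOT real effect sizes) and toy genotypes.
TOY_WEIGHTS = {"snp_a": 0.40, "snp_b": 0.15, "snp_x": 0.10}
TOY_GENOTYPES = {
    "patient_1": {"snp_a": 0, "snp_b": 1},
    "patient_2": {"snp_a": 1, "snp_b": 2},
    "patient_3": {"snp_a": 2, "snp_b": 0},
}
```

In the toy data `snp_x` is absent from the genotype calls, so it is dropped in the same way that 3 of the 37 SNPs were dropped to give the PRS 34.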
The PRS were analyzed using a Cox regression model of time to TE event. Patients were censored at end of ALL treatment (n = 977), loss to follow-up (n = 4), date of hematopoietic cell transplantation (n = 61), date of data collection (10.10.2017) (n = 138) or date of competing event (death, relapse or second primary malignancy) (n = 72), whichever came first. In all Cox models we controlled for age as a categorical variable (1.0-9.9 years, 10.0-17.9 years, or 18.0-45.9 years) based on our previous NOPHO study [11], sex, and the first two genetic principal components. We used a stratification approach to the subgroup analysis by age group since our study is underpowered for modeling an age-PRS interaction. PRS HRs are reported for 1 SD from the mean. The null hypothesis was that TE in ALL and TE in adults in the general population do not have a shared genetic etiology. Since we looked at a wide range of PRS, we used 10,000 permutations to calculate empirical p-values; thus properly controlling the type 1 error rate for calculating PRS at a large number of evenly spaced p-value thresholds.
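The logic of an empirical p-value for the best of many thresholds can be sketched as below. To stay self-contained, the sketch swaps in a simple statistic (absolute difference in mean PRS between patients with and without TE) in place of the Cox model used in the study, and it uses the common (exceedances + 1) / (permutations + 1) convention; both choices are assumptions, as are all function names.

```python
import random


def abs_mean_difference(scores, events):
    """Stand-in association statistic: |mean PRS in cases - mean PRS in others|."""
    cases = [s for s, e in zip(scores, events) if e]
    others = [s for s, e in zip(scores, events) if not e]
    return abs(sum(cases) / len(cases) - sum(others) / len(others))


def best_statistic(prs_per_threshold, events):
    """Strongest association across all p-value thresholds."""
    return max(abs_mean_difference(scores, events) for scores in prs_per_threshold)


def empirical_p_value(prs_per_threshold, events, n_permutations=10000, seed=0):
    """Permutation p-value for the best-threshold statistic.

    The outcome labels are shuffled, the best statistic over all thresholds is
    recomputed, and the observed best statistic is compared with this null
    distribution. Selecting the best threshold inside every permutation is
    what accounts for having tested many thresholds.
    Requires at least one case and one non-case.
    """
    rng = random.Random(seed)
    observed = best_statistic(prs_per_threshold, events)
    shuffled = list(events)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if best_statistic(prs_per_threshold, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)
```

The score vectors passed in should be standardized so that the statistic is comparable across thresholds. With 10,000 permutations the smallest attainable empirical p-value under this convention is 1/10,001.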
A drop-out analysis was performed comparing patients in the genetic cohort with patients not included in the genetic cohort, but meeting the same clinical criteria without excluding non-European ancestry. All statistical analyses were performed using R computing software, version 3.4.3.

Results
There were 658 patients treated according to the NOPHO ALL2008 study that met the clinical inclusion criteria of the genetic cohort, but were not included due to lack of consent, failed genetic quality control or non-European ancestry, while 1252 patients were included. Drop-out analysis (see Supplementary Table S1) revealed more adults and patients with T-cell ALL among the non-included patients. There was no significant difference in number of TE events (p 0.11). Eighty-nine of 1252 patients with ALL in the genetic cohort developed TE (2.5 year cumulative incidence 7.2%, 95% confidence interval (CI) 5.7-8.6) at a median 12.7 weeks (50% range: 7.4-18.1 weeks) from diagnosis. There were 50 (56.2%) deep vein thromboses, including one combined arterial and venous event in the portal hepatic system, 25 (28.1%) cerebral sinovenous thromboses (CSVT), and 14 (15.7%) pulmonary embolisms. Data on patient characteristics and cumulative incidences of TE are displayed in Table 1. PRS of genome-wide significant SNPs from the 2015 INVENT GWAS [14], including 34 SNPs, and from the 2017 UK Biobank GWAS [15], including 6 SNPs, were not associated with increased risk of TE in the NOPHO cohort; HR 1.16 (95% CI 0.95-1.42, p 0.14) and 1.02 (95% CI 0.83-1.26, p 0.86), respectively.
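As a reading aid, the reported HRs, confidence intervals, and p-values can be checked for internal consistency. The sketch below assumes a Wald-type interval that is symmetric on the log-HR scale, which is the usual output of a Cox model but is not stated explicitly in the text; the function name is hypothetical. Because the reported numbers are rounded, only approximate agreement should be expected.

```python
import math
from statistics import NormalDist


def wald_p_from_hr_ci(hr, lower, upper, level=0.95):
    """Approximate two-sided p-value from a hazard ratio and its CI.

    Assumes the interval is exp(log(HR) +/- z * SE), so that SE can be
    recovered from the width of the interval on the log scale.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    return 2 * (1 - std_normal.cdf(abs(z)))


# Values reported in the text for the two genome-wide significant PRS.
p_invent_2015 = wald_p_from_hr_ci(1.16, 0.95, 1.42)
p_uk_biobank_2017 = wald_p_from_hr_ci(1.02, 0.83, 1.26)
```

Under these assumptions both back-calculated p-values fall close to the reported 0.14 and 0.86, with small discrepancies attributable to rounding of the published HRs and interval limits.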
A stepwise increase in the significance threshold to include more SNPs from the 2015 INVENT and the 2017 UK Biobank GWAS did not reveal evidence of polygenic overlap with TE in patients with ALL (Fig. 2).
The 2019 INVENT PRS 34 was not associated with TE development in patients with ALL overall (HR 1.12, 95% CI 0.91-1.38, p 0.28), nor in the subgroup analysis of patients 10.0-17.9 years old (Table 2).

Discussion
In this large cohort of patients with ALL and TE, we found neither an association for PRS including only genome-wide significant SNPs from the 2015 INVENT or the 2017 UK Biobank GWAS with TE in ALL nor an association for the proposed 2019 INVENT PRS 34 [6]. A lot of focus in the field of PRS is on identifying individuals at high risk for use in precision medicine. However, PRS can also be useful in helping us understand the shared genetic etiology between diseases, especially in situations where one of the patient populations is rare. We did not see evidence of overlapping polygenic etiology between the adult summary statistics and TE in the full NOPHO cohort. However, there was evidence of polygenic overlap with the INVENT GWAS in the subgroup of adolescents. In the subgroup analysis by age group the absolute numbers were small. In addition, the polygenic overlap curve between the INVENT summary statistics and TE in adolescents with ALL was broad with no clear peak, making it difficult to ascertain the ideal number of SNPs to include in the PRS. Using permutation analysis to correct for the multiple testing, the top PRS had p-value 0.02. This means that given that there is no association, there is still a 2% chance of finding a result like ours or more extreme. Thus, it is possible that it is a chance finding, but we believe it is interesting enough to merit further investigation. We know that adolescents are at increased risk of TE compared to children < 10 years [1]. In our previous candidate SNP study, we also found the strongest effect of the significant SNPs in adolescents [11]. Adolescents go through changes in hormonal profile and it is possible that this affects the genetic risk. It is more surprising that we do not find evidence of genetic overlap in adults. However, we know that in general TE occurs more frequently in adults as part of natural aging, while children and adolescents are physiologically protected.
It is reasonable that genetics might play a stronger role in the high-risk situation of cancer and chemotherapy in adolescents who do not have as many additional exogenous risk factors as adults. It is also possible that some adults were protected by anticoagulation prophylaxis, which is more commonly used in adults; however, the previous clinical study on TE in the NOPHO ALL2008 cohort found that about 17% of patients ≥17 years received anticoagulation prophylaxis, but the incidence of TE was similar to that in patients without prophylaxis [1].
We found no effect of the top SNPs associated with venous TE in the general adult population, despite high power. Many of these top variants have been explored individually in previous studies of TE in patients with ALL, but the results have been conflicting. An example is the role of blood group O (rs8176719) and the factor V Leiden mutation (rs6025), both of which are among the strongest genome-wide significant variants in adult studies. We recently found no effect of rs8176719 and rs6025 on risk of TE in this population of patients with ALL [11].
We did not see any evidence of polygenic overlap with the 2017 UK Biobank GWAS, as the HRs were stable around 1. This may be related to slightly inferior power of the UK Biobank GWAS compared to the 2015 INVENT GWAS due to fewer cases. In addition, the 2015 INVENT meta-analysis was based on mostly case-control studies with venous TE cases confirmed through image analysis, while the UK Biobank data were based on electronic health records and medical code-based phenotypes, which may result in a less precise phenotype [15]. Though the 2015 INVENT and 2017 UK Biobank GWAS included both deep vein thromboses and pulmonary embolisms, they did not include CSVT, which made up 28% of the NOPHO TE events. We did not exclude the CSVT cases as they are also venous thromboembolic events and because it would reduce the power; however, if the CSVT cases have different genetic risk factors this may have attenuated the polygenic overlap.
Strengths of this study are that we have a large cohort of patients with the same diagnosis and uniform treatment with prospective toxicity registration and high registration compliance [9]. The events occur early during ALL treatment and there is hardly any loss of patients to follow-up before end of treatment. The patients censored while still on treatment at date of data collection had all completed at least the first year of treatment and were thus past the period of highest risk of TE. Additionally, we had excellent power to detect overlapping polygenic etiology between TE in the general adult population and TE in patients with ALL for a limited number of SNPs. However, due to limited numbers we did not have power to test all PRS with p-value thresholds up to 1.0, nor a formal test of a PRS-age interaction. A limitation to the study is a delay in consent to participation in the genetic add-on study in some cases; thus there may have been early deaths that were not included. This might explain why drop-out analysis revealed more adults and patients with T-cell ALL among those not included. However, as patients who died early would have contributed very little observation time and due to our prospective study design, this should not cause a large bias to our study.
[Table footnote: HRs are calculated comparing those with high PRS with the remainder of the population in a Cox regression model controlling for sex and the first two genetic principal components.]

Conclusion
Part of the unexplained variation in TE development in patients with ALL may be due to genetics. However, this exploratory analysis shows that the main genetic factors associated with TE in the general adult population are not important in the setting of TE in patients with ALL. Patients with ALL are a unique group with clear clinical risk factors for TE, and the usefulness of genetic studies on TE in the general adult population is limited when it comes to understanding the etiology of TE in patients with ALL. However, we found evidence of polygenic overlap in subgroup analysis of adolescents aged 10.0-17.9 years with ALL, and we believe the genetics of TE in this group should be further explored in future risk prediction models for identification of those who might benefit from thromboprophylaxis.

Declarations of competing interest
The authors declare no conflicts of interest.