Introduction

Group A Streptococcus (GAS, Streptococcus pyogenes) is a human-restricted pathogen causing diseases ranging from sore throat and scarlet fever to more serious invasive infections, including soft tissue infections, pneumonia, and toxic shock, as well as autoimmune sequelae1. Although advanced age and specific presentations such as necrotising fasciitis increase the risk of death from invasive infection, the genetic background of S. pyogenes strains also contributes to the risk of mortality2,3, underlining the role of strain genotype and virulence in disease outcome. Among more than 250 recognised emm types, the emm1 genotype is most frequently associated with invasive infections in high-income countries4. emm1 strains are considered highly virulent5,6 and, during invasive infection, often acquire inactivating mutations in the covRS two-component regulator that de-repress key virulence factors7. In the 1980s, emm1 emerged as a leading cause of invasive infection following several genomic changes that altered phage content and streptolysin O (SLO) expression, giving rise to a new clone that spread globally8.

In England, prompt notification and antibiotic treatment are advocated for scarlet fever and invasive GAS (iGAS) infections9; however, guidelines recommending a no-treatment or delayed-treatment approach to sore throat were introduced in 2008 to limit unnecessary use of antibiotics10. Unexpectedly large seasonal upsurges in scarlet fever were documented annually in England between 2014 and 201811,12, coinciding with the expansion and recognition of a new emm1 lineage, termed M1UK, among S. pyogenes isolates5. M1UK differed from other globally circulating emm1 strains8 (hereafter referred to as M1global) by 27 signature SNPs and was characterised by increased expression of the scarlet fever toxin streptococcal pyrogenic exotoxin A (SpeA)5,6,13. Two intermediate lineages, M113SNPs and M123SNPs, which share subsets of the 27 SNPs, were also identified5,6. M123SNPs expresses SpeA at levels typical of M1UK, whereas M113SNPs does not6. By 2016, the M1UK lineage represented 84% of all invasive emm1 strains in England5, increasing to 91.5% by 202014.
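
Because these lineages are distinguished by how many of the 27 M1UK signature SNPs a genome carries, the assignment can be sketched as a simple lookup. The function below is a hypothetical illustration, not the pipeline used in this study: it assumes the signature-SNP count and phi5005.3 phage status have already been determined upstream (for example by read mapping), and it treats a 26-SNP genome as an M1UK derivative only when the phage is absent, since, as described in the Results, the missing SNP lies in a phi5005.3 phage portal protein.

```python
from typing import Optional

# Hypothetical sketch: lineage labels keyed on how many of the 27 M1UK
# signature SNPs a genome carries. Counting the SNPs themselves (positions,
# alleles, mapping) is assumed to have been done upstream.
def classify_emm1(n_signature_snps: int, has_phi5005_3: bool) -> Optional[str]:
    """Return a lineage label, or None if the pattern fits no defined lineage."""
    if n_signature_snps == 27:
        return "M1UK"
    if n_signature_snps == 26 and not has_phi5005_3:
        # The missing SNP sits in the phi5005.3 phage portal protein,
        # so losing the phage also removes that signature SNP.
        return "M1UK (26SNPs)"
    if n_signature_snps == 23:
        return "M1 (23SNPs)"
    if n_signature_snps == 13:
        return "M1 (13SNPs)"
    if n_signature_snps == 0:
        return "M1global"
    return None  # unexpected count: flag for manual review


print(classify_emm1(27, has_phi5005_3=True))   # M1UK
print(classify_emm1(26, has_phi5005_3=False))  # M1UK (26SNPs)
```

Real data would need tolerance for missing or low-quality sites, which this exact-count rule deliberately ignores.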

The onset of the COVID-19 pandemic and the implementation of non-pharmaceutical interventions (NPI) to limit SARS-CoV-2 transmission triggered a reduction in scarlet fever and iGAS notifications in 202012. However, in late 2022, a highly pronounced out-of-season upsurge in both scarlet fever and iGAS cases was reported in England, with an unexpected increase in paediatric pleural empyema and several fatalities15. Similar increases in severe paediatric iGAS infections were reported worldwide16.

In this article, we show that the S. pyogenes upsurge in England and Wales was predominantly associated with M1UK, a lineage we estimate to have emerged around 2008, and in particular with three emergent clades that are now widely dispersed. The expansion of M1UK followed a population bottleneck, likely related to reduced transmission during the COVID-19 pandemic.

Results

Trends in S. pyogenes-positive samples, England 2016–2023

S. pyogenes-positive samples from non-sterile and sterile sites are recorded through a national laboratory reporting system, the Second Generation Surveillance System (SGSS). The typical pattern of seasonal spring-time peaks (Q1-Q2) in S. pyogenes infections was interrupted abruptly in April 2020, coinciding with the NPI introduced at the onset of the COVID-19 pandemic (Fig. 1). A profound reduction in S. pyogenes-positive samples from both sterile and non-sterile sites lasted almost two years, ending in Q1 2022. Following cessation of widespread NPI in February 2022, a delayed seasonal increase in microbiologically confirmed S. pyogenes infections began in April 2022, subsiding only in Q3 2022, in keeping with the UK summer vacation period. Unexpectedly, a second, exponential increase in S. pyogenes-positive samples occurred in Q4 2022 (Fig. 1). This marked increase in microbiologically confirmed infections peaked in week 49 of 2022, when 8906 non-sterile site and 241 sterile site S. pyogenes-positive samples were recorded (Fig. 1), coinciding with increased disease notifications15,17.

Fig. 1: Trend in S. pyogenes-positive samples, England 2016–2023.

Data show absolute numbers of weekly S. pyogenes-positive samples from non-sterile sites (light blue bars, left hand, positive axis) and sterile sites (dark blue bars, right hand, negative axis) recorded by the Second Generation Surveillance System (SGSS) in England, by week and by year. Timing of non-pharmaceutical interventions (NPI) related to COVID-19 in England is indicated by the horizontal bar: red, lockdown periods; orange, legally enforced NPI including no mixing; yellow, non-severe NPI. Schools were closed during lockdown periods and between the two later lockdown periods except for children of key workers and vulnerable children. Source data are provided as a Source Data file Fig. 1.

S. pyogenes isolates cultured from iGAS cases are submitted to the national reference laboratory for emm typing. Between Q1 2017 and Q1 2020, emm1 was the leading cause of iGAS, responsible for 16-28% of all iGAS cases; emm1 dominance was greater in children than in adults (Fig. 2). During the period of COVID-19-related NPI, annual iGAS isolate numbers fell ~6.5-fold in children (274 isolates/year in 2017-2019; 44 in 2021) and ~2.5-fold in adults (1944 isolates/year in 2017-2019; 785 in 2021) (Fig. 2). The proportion of iGAS isolates that were emm1 also fell significantly (p < 0.001), to less than 8% of all iGAS cases. From Q1 2022, emm1 showed a sustained quarterly increase in frequency, peaking in Q1 2023. For over nine months, emm1 accounted for >50% of all iGAS cases, coinciding with the period of upsurge (Fig. 2). Indeed, emm1 was the only genotype to expand significantly during this time, increasing from 20% to 55%. In children (<15 years), this increase was more pronounced, with emm1 accounting for between 60% and 70% of iGAS over the same period (Fig. 2).

Fig. 2: Contribution of emm1 S. pyogenes to invasive group A streptococcal (iGAS) infections 2017–2023.

emm1 isolates are shown as proportions of the total number of isolates from iGAS cases submitted to and genotyped at the national reference laboratory in each quarter of each year. A Adults (≥15 years); B Children (<15 years). The total number of isolates from iGAS cases received and genotyped by the reference laboratory in each quarter is shown above each bar; emm1 proportions are shown in navy blue. The pink shaded region highlights the period of COVID-19 non-pharmaceutical interventions. Q1, January-March; Q2, April-June; Q3, July-September; Q4, October-December. Statistical analysis applied to 2022-2023: one-tailed proportion test of emm1 from Q1 2022 to Q3 2023 (p-values indicated in the figure). Source data are provided as a Source Data file Fig. 2.

Population genomics of emm1 S. pyogenes strains comprising the upsurge

To investigate any genetic basis for the increase in emm1 iGAS cases, all 1092 emm1 iGAS isolates submitted to the reference laboratory from January 2022 to March 2023 were whole genome sequenced. Phylogenetic analysis revealed clustering of emm1 genomes into the expected lineages. The vast majority (1001/1092, 91.8%) of isolates were M1UK, 4.1% (44/1092) were M1UK derivatives that had lost the phi5005.3 phage (and therefore lacked the phage portal protein SNP that is typical of, but not essential to, M1UK), and 4.2% (46/1092) were M1global. Taken together, 95.7% of all emm1 strains from the 2022/2023 upsurge period were M1UK or its 26SNP derivative, representing an overall expansion of the lineage since 202014 (Fig. 3A). To look for evidence of recent genomic change, isolates from 2022/2023 were further compared with 723 emm1 iGAS strains sequenced in the same reference laboratory between 2013 and 2021. Phylogenetic analysis of these 1815 emm1 S. pyogenes genomes associated with iGAS showed M1UK isolates from 2022/2023 to be broadly distributed across the pre-existing M1UK population, with three emergent dominant clades and several small clades formed almost exclusively of isolates from 2022/2023 (Fig. 3A). These three clades accounted for over half (54.8%) of all M1UK from 2022/2023. Clade 1 comprised 123 invasive strains, all from 2022/2023, and was characterised by two SNPs (Supplementary Table 1). Clade 2 comprised 166 invasive strains, all from 2022/2023, and was characterised by six SNPs, including three non-synonymous mutations (in sic1.01, pyrC and M5005_Spy1146). Clade 3 comprised 284 strains from 2022/2023, plus a single strain collected in February 2020, and was defined by three non-synonymous mutations (in xerD, hutU and secA). Clade 3 was enriched for invasive strains collected in southern England (70%), consistent with regional transmission.
In contrast, Clades 1 and 2 had similar proportions of strains from northern (26% and 35%, respectively), southern (43% and 35%), and central regions including Wales (23% and 28%), consistent with a wider national outbreak (Fig. 3A). The average genetic distance between any two strains was just 2 SNPs in Clade 1 and 3 SNPs in Clades 2 and 3 (Supplementary Table 2). This low diversity is consistent with rapid emergence from a recent common ancestor and dispersal across the country within the year.
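
The within-clade figures above are mean pairwise SNP distances. As a minimal sketch, assuming each genome is represented as an aligned string of core SNP sites (a hypothetical input format in which gaps and ambiguous bases, written '-' or 'N', are skipped pair by pair), the statistic can be computed as:

```python
from itertools import combinations

MISSING = {"N", "-"}  # ambiguous base or gap: site skipped for that pair


def snp_distance(a: str, b: str) -> int:
    """Number of sites at which two aligned SNP strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x not in MISSING and y not in MISSING
    )


def mean_pairwise_distance(seqs: list[str]) -> float:
    """Average SNP distance over all unordered pairs of genomes."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two genomes")
    return sum(snp_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy clade: three genomes over five variable sites.
clade = ["ACGTA", "ACGTT", "TCGTT"]
print(mean_pairwise_distance(clade))  # 4 differences over 3 pairs
```

Dedicated tools that compute SNP distance matrices from whole alignments would normally be used at the scale of 1815 genomes; the sketch only shows what the reported average means.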

Fig. 3: Genetic analysis of 1815 emm1 S. pyogenes isolates from invasive group A streptococcal (iGAS) infections 2013-2023.

A Phylogenetic tree of emm1 isolates associated with invasive infections (iGAS) from 2013-2023, sequenced at the reference laboratory. Maximum-likelihood phylogenetic tree constructed from 278 core SNPs (excluding recombination regions) extracted after mapping 1815 emm1 isolates to the MGAS5005 reference genome. The tree is drawn in a circular layout and rooted on the outgroup genome NCTC8198. Bars in concentric circles represent (from inside to outside) the regional location of the isolate; collection period (pre-upsurge 2013-2021 or upsurge 2022-2023); emm1 lineage; and presence/absence of the phi5005.3 phage. Regional data have been grouped for visualisation as follows: Northern (North-East England, North-West England, Yorks & Humber); Central and Wales (East Midlands, West Midlands, Wales); Southern (South-East England, South-West England, London); and Northern Ireland and others (regions with fewer than 5 isolates, including Scotland, Eire, Jersey, Malta). B Frequency of covR and covS non-synonymous and other mutations within M1UK and M1global isolates from invasive infections. The percentage of strains with non-synonymous mutations, deletions/insertions, or an inactive protein among 1552 M1UK and 189 M1global isolates is shown. Mutation types are indicated by coloured bars. Strains for which sequence quality precluded analysis (sequence identity <90%) are shown in dark grey. Differences in covR and covS mutation frequency between M1global (covR 15/189; covS 34/189) and M1UK (covR 38/1552; covS 106/1552) are significant (one-tailed proportion test: covR p < 0.001; covS p < 0.001). Ten M1global isolates formed a previously unrecognised clade with covRS mutations; if all strains from this cluster are removed, the covS mutation frequency within M1global (24/179) remains significantly greater than in M1UK (106/1552) (one-tailed proportion test: covS p < 0.001). Source data are provided as a Source Data file Fig. 3.

Among the 1815 emm1 genomes associated with iGAS from 2013-2023, the clinical source was known for most isolates: 67.7% (1229/1815) were from blood, and 6.9% (125/1815) were from the lower respiratory tract, of which 71.2% (89/125) were from pleural samples, indicative of empyema (Supplementary Table 3). Overall, a higher proportion of M1UK (5.0%) than M1global (2.6%) isolates came from pleural samples, particularly in Clade 3 (8.4%) (Supplementary Table 4). Considering only infections occurring in 2022/2023, inter-lineage differences were not significant; however, M1global isolate numbers were very low (Supplementary Table 4). Pleural sample isolates were notably more frequent at the time of the upsurge. Despite the marked impact of the upsurge on children, no single clade was uniquely associated with a specific age group, and closely related strains (<3 SNPs apart) caused invasive infections in both adults and children (Supplementary Fig. 1).

The average pairwise distance within M1UK increased from 16 SNPs in 2013-2021 to 22 SNPs in 2022/2023, while the average pairwise distance within the M1global lineage increased from 39 SNPs in 2013-2021 to 55 SNPs in 2022/2023 (Supplementary Table 2). Despite this recent increase in the genetic diversity of both lineages, M1UK remained more stable than M1global in terms of point mutations. Most mutations (excluding the 27 M1UK signature SNPs) were unique to individual strains outside the main clades (Supplementary Fig. 2), consistent with rapid population expansion. The four indels previously reported13 were present in 99% of M1UK isolates but were not lineage-specific (Supplementary Data 1).
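
An excess of mutations carried by single strains is a classic signature of recent population expansion, because a rapidly growing population tends to produce a star-like genealogy with many long terminal branches. Assuming aligned SNP strings as input (a hypothetical format) and ignoring which allele is ancestral, biallelic sites at which one allele occurs in exactly one genome can be counted as follows:

```python
from collections import Counter

MISSING = {"N", "-"}  # ambiguous base or gap: ignored when counting alleles


def count_singleton_sites(alignment: list[str]) -> int:
    """Count biallelic sites where one allele is carried by exactly one genome."""
    n_sites = len(alignment[0])
    singletons = 0
    for i in range(n_sites):
        alleles = Counter(seq[i] for seq in alignment if seq[i] not in MISSING)
        if len(alleles) == 2 and min(alleles.values()) == 1:
            singletons += 1
    return singletons


# Hypothetical alignment: sites 0 and 3 each have an allele found in one genome only.
toy = ["AAAA", "AAAT", "AACT", "GACT"]
print(count_singleton_sites(toy))  # 2
```

Comparing this count with the number expected under a constant-size population is the idea behind summary statistics such as Tajima's D; the sketch stops at the raw count.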

Recombination and pangenome analyses showed little evidence of gain or loss of mobile genetic elements between M1UK and M1global, and no genomic feature(s) associated exclusively with M1UK or M1global isolates from 2022/2023, or with the three M1UK clades described above. Most strains carried the three prophages typical of emm1: Φ5005.1, which encodes speA; Φ5005.2, which encodes spd3 or spd4; and Φ5005.3, which encodes another DNase, sdaD2/sda1, reported to contribute to the virulence of modern M1global strains8. Although M1UK strains are characterised by increased SpeA expression, 9/1552 (0.6%) invasive M1UK strains had a partial deletion of phage Φ5005.1 including speA (Supplementary Table 5). Furthermore, 43/1552 (2.8%) invasive M1UK strains had lost Φ5005.3 and therefore cannot express sdaD2/sda1. Prophage Φ370.1, carrying speC and spd1, was present in ~10% (174/1815) of emm1 strains: 9% (139/1552) of M1UK and 16% (31/189) of M1global. Only 4/1815 emm1 strains (one M1global and three M1UK) from 2014-2020 carried the ΦSP1380.vir phage (with speC, ssa and spd1) reported in Australia13 and Hong Kong18.

emm1 invasiveness has been associated with regulatory gene mutations arising in vivo7. Among iGAS clinical isolates, mutations in the two-component regulatory genes covR and covS were significantly more frequent in M1global (7.9% and 18%, respectively) than in M1UK (2.4% and 6.8%) (one-tailed proportion test: covR p < 0.001; covS p < 0.001) (Fig. 3B), pointing to greater selection pressure on M1global strains during invasive infection. Although this difference in covS mutation frequency was observed in both sterile and non-sterile site isolates from invasive infections, it could not be replicated by in vivo passage of non-invasive M1global and M1UK isolates in mice; however, only five strains from each lineage were tested, using intramuscular inoculation (Supplementary Fig. 3). Mutations in rgg1 and rgg4 were frequent but did not differ between lineages (Supplementary Fig. 4). Resistance to common antimicrobials among emm1 isolates was rare (<0.5%), and the pbp2x missense mutations (T553K and P601L)19,20 were absent from our dataset (Supplementary Data 2).
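
The Methods report a one-tailed proportion test computed with an online calculator. Assuming that calculator implements the standard pooled two-proportion z-test (an assumption, not stated in the source), the covR and covS comparisons can be reproduced from the counts in the Fig. 3B legend. With these counts, both comparisons give p < 0.001 (z ≈ 4.1 for covR and ≈ 5.3 for covS).

```python
import math


def one_tailed_two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test for H1: p1 > p2. Returns (z, one-tailed p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0: p1 == p2
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Counts from the Fig. 3B legend: mutated/total in M1global vs M1UK.
comparisons = {"covR": (15, 189, 38, 1552), "covS": (34, 189, 106, 1552)}
for gene, (x1, n1, x2, n2) in comparisons.items():
    z, p = one_tailed_two_proportion_z(x1, n1, x2, n2)
    print(f"{gene}: z = {z:.2f}, one-tailed p = {p:.1e}")
```

The normal approximation is reasonable here because every cell of the two-by-two table has more than ten observations; with sparser counts an exact test would be preferable.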

Relationship between non-invasive and invasive emm1 isolates

To extend our analysis to the reservoir of non-invasive S. pyogenes isolates from pharyngitis, we sequenced 133 emm1 strains collected sequentially from pharyngitis cases in west London in 2022. Of these emm1 throat isolates, 14.3% (19/133) were M1global, while 85.7% were either M1UK (111/133) or M1UK lacking the phi5005.3 phage (3/133). Interestingly, the proportion of M1global among non-invasive and invasive isolates was higher in London than observed nationally during the same period. Phylogenetic analysis of invasive and non-invasive isolates showed that non-invasive M1UK isolates from west London clustered mostly within Clade 3 (62/111, 55.9%), with the remainder scattered throughout the wider M1UK population, including Clade 1 (8/111, 7.2%) and Clade 2 (4/111, 3.6%) (Fig. 4). The average distance between two isolates from the same clade (Clade 1, 2 or 3) was 2-3 SNPs. Overall, 48% (64/133) of non-invasive isolates were identical to at least one invasive isolate (0 SNPs apart; Fig. 4). Point mutations in bacterial regulatory genes were rare (<5%) in non-invasive emm1 sore throat isolates compared with invasive isolates. Five of the 133 (4%) non-invasive isolates collected in London in 2022 carried the ΦSP1380.vir phage.
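
The "identical to at least one invasive isolate" statistic is a nearest-neighbour query: for each non-invasive genome, find the smallest SNP distance to any invasive genome and count those at zero. A brute-force sketch on hypothetical toy data, assuming genomes are given as aligned SNP strings, is shown below; a quadratic scan over roughly 133 × 1815 pairs is still feasible at this scale.

```python
MISSING = {"N", "-"}  # ambiguous base or gap: site skipped for that pair


def snp_distance(a: str, b: str) -> int:
    """SNP differences between two aligned genomes, ignoring gaps/ambiguities."""
    return sum(1 for x, y in zip(a, b)
               if x != y and x not in MISSING and y not in MISSING)


def nearest_invasive_distances(non_invasive: dict[str, str],
                               invasive: dict[str, str]) -> dict[str, int]:
    """For each non-invasive genome, the SNP distance to its closest invasive genome."""
    return {name: min(snp_distance(seq, other) for other in invasive.values())
            for name, seq in non_invasive.items()}


# Hypothetical toy data (aligned SNP strings).
throat = {"t1": "ACGT", "t2": "ACGA", "t3": "TTTT"}
blood = {"i1": "ACGT", "i2": "ACGG"}
nearest = nearest_invasive_distances(throat, blood)
identical = sum(d == 0 for d in nearest.values())
print(nearest, f"{identical}/{len(nearest)} identical to an invasive isolate")
```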

Fig. 4: emm1 phylogenetic tree showing non-invasive sore throat isolates collected in London in 2022 together with isolates from invasive infections in the UK, 2013–2023.

Maximum-likelihood phylogenetic tree constructed from a core alignment of 274 SNPs extracted after mapping 1815 invasive and 133 non-invasive emm1 isolates to MGAS5005. Insets show the relationship between invasive and non-invasive isolates within Clades 1-3: (A) Clade 3; (B) Clade 2; (C) Clade 1. Source data are provided as a Source Data file Fig. 4.

Time and place of emergence of M1UK and intermediate lineages

To elucidate the origin and time of emergence of the M1UK lineage, we constructed a dated phylogenetic tree using a newly sequenced M1UK reference strain, H1490 (NCTC14935). The tree comprised 2364 M1UK and intermediate (M113SNPs, M123SNPs and M126SNPs) genomes collected between March 2005 and July 2023 in Europe (Denmark21, Iceland21, the Netherlands22), the United Kingdom (previously published genomes5,23 plus the isolates from the current study), North America (Canada24 and USA25) and Australia13. This showed that M113SNPs and M123SNPs share a common ancestor with the M1UK lineage, while M126SNPs are derivatives of M1UK that have lost the Φ5005.3 phage (Fig. 5A). According to the inferred ancestral dates in the tree, the M113SNPs lineage diverged in 2002 (95% confidence interval (CI): 2000–2004), followed by M123SNPs in 2006 (95% CI: 2004–2007) and M1UK in 2008 (95% CI: 2006–2009), prior to rapid expansion. The genome-wide mutation rate was estimated at 1.49 nucleotide substitutions per genome per year.
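
The estimated rate allows a back-of-the-envelope check on the within-clade diversity reported earlier. Under a strict molecular clock, two genomes sampled at about the same time whose common ancestor lived t years earlier are expected to differ by roughly 2μt SNPs, where μ is the per-genome substitution rate, so t ≈ d / (2μ) for an observed distance d. This sketch ignores rate variation, differences in sampling dates and the Poisson noise of small SNP counts, and it is not the dated-phylogeny method used in the study.

```python
def years_to_common_ancestor(pairwise_snps: float, rate_per_genome_per_year: float) -> float:
    """Strict-clock estimate: d ≈ 2 * rate * t, so t ≈ d / (2 * rate)."""
    if rate_per_genome_per_year <= 0:
        raise ValueError("rate must be positive")
    return pairwise_snps / (2 * rate_per_genome_per_year)


RATE = 1.49  # substitutions per genome per year, as estimated for this tree

for d in (2, 3):  # mean within-clade distances reported for Clades 1-3
    print(f"{d} SNPs -> ~{years_to_common_ancestor(d, RATE):.1f} years")
```

Mean distances of 2-3 SNPs correspond to roughly 0.7-1.0 years back to a typical pair's common ancestor, in line with the recent common ancestry proposed for the three clades.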

Fig. 5: Global distribution and potential introduction events of M1UK and intermediate populations.

A Phylogenetic tree of 2364 M1UK and intermediate strains collected globally from March 2005 to July 2023. The tree was built from 3406 SNPs in a core genome alignment against the M1UK reference genome H1490 (NCTC14935) and rooted on a closely related M1global genome, gas81595 (also included in the tree). Leaves are coloured by country of collection. Shading indicates the three emergent clades (Clade 1, green; Clade 2, blue; Clade 3, red). Coloured bars in concentric circles represent (from inside to outside): collection years (pre-upsurge 2013-2021 and upsurge 2022-2023); emm1 lineage; and presence/absence of the phi5005.3 phage. B Simplified transmission tree from PastML showing the inferred ancestral location of M1UK and intermediate lineages. Each node represents a cluster of leaves sharing the same probable ancestral location and is labelled with the range of leaf counts per cluster. Arrows indicate inferred international transmission events; arrow width and labels indicate the number of transmission events with the same origin and destination. For example, the arrow labelled “6” pointing to the node “1–4” (USA) indicates six clusters of 1 to 4 leaves in the USA that were likely imported from the UK. C Estimated effective population size (Ne) of M1UK in the UK through time. The red line and pink shading indicate the mean and 95% confidence interval of Ne, respectively, at each time point. Source data are provided as a Source Data file Fig. 5.

Ancestral state reconstruction of geographical location was necessarily limited to regions that undertake and report sequencing of S. pyogenes. Within this constraint, it indicated that M1UK, M113SNPs and M123SNPs originated in the UK and then dispersed, with multiple independent introductions into Australia, North America, the Netherlands, Iceland and Denmark (Fig. 5A-B). Danish and UK strains collected in 2022-2023 were dispersed throughout the circulating M1UK population, including Clade 3, whereas almost all Icelandic isolates from 2022-2023 grouped together in Clade 2.
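
Location reconstruction in Fig. 5B was performed with PastML, which supports both maximum-likelihood and parsimony methods. As a simpler, swapped-in illustration of the underlying idea, the Fitch parsimony algorithm below infers the set of most parsimonious locations at the root of a binary tree and the minimum number of location changes required, each change loosely corresponding to one international introduction. The toy tree and country labels are hypothetical.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf location label, or a (left, right) pair of subtrees


def fitch(tree: Tree) -> tuple[set[str], int]:
    """Bottom-up Fitch pass: candidate ancestral locations at this node,
    plus the minimum number of location changes within the subtree."""
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one location: no change needed here.
        return shared, left_cost + right_cost
    # Disagreement: keep both options and count one change.
    return left_states | right_states, left_cost + right_cost + 1


toy = ((("UK", "UK"), "UK"), (("USA", "UK"), ("Denmark", "UK")))
states, changes = fitch(toy)
print(states, changes)  # {'UK'} 2
```

Unlike maximum-likelihood approaches, parsimony ignores branch lengths and gives no probabilities for alternative ancestral locations, which is one reason likelihood-based methods are usually preferred for dated trees.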

Bayesian inference of the effective population size of M1UK in the UK through time showed rapid population growth from 2008 until 2015, followed by a progressive decline until 2019 and then a sharp decline in early 2020 (Fig. 5C). Strikingly, the population dynamics suggested a transmission bottleneck in M1UK during the implementation of severe NPI designed to limit the spread of COVID-19 (April 2020–March 2021). Over this period, the mean effective population size dropped to one-fifth of the pre-pandemic maximum, and it then rose steeply after the lockdown and other NPI measures were lifted. Importantly, the inferred patterns of population growth and decline were not driven by any variation in the number of sequenced M1UK isolates in the UK through time (Supplementary Fig. 5).
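
The Bayesian model behind Fig. 5C is not detailed in this section. To illustrate the underlying idea, the sketch below implements the simpler, non-Bayesian classic skyline estimator: in each interval between coalescent events during which k lineages are present, the product of effective population size and generation time (Ne·g) is estimated as the interval length multiplied by k(k - 1)/2. It assumes an ultrametric binary tree with all tips sampled at the same time, an assumption the real, serially sampled M1UK data do not satisfy.

```python
def classic_skyline(coalescent_times: list[float], n_tips: int) -> list[tuple[float, float, float]]:
    """Classic skyline estimates for an ultrametric tree with all tips sampled at time 0.

    coalescent_times: ages (time before present) of the n_tips - 1 coalescences.
    Returns (interval_start, interval_end, Ne*g estimate) for each interval.
    """
    times = sorted(coalescent_times)
    if len(times) != n_tips - 1:
        raise ValueError("a binary tree with n tips has n - 1 coalescences")
    estimates = []
    start, k = 0.0, n_tips  # k = number of lineages present in the current interval
    for end in times:
        estimates.append((start, end, (end - start) * k * (k - 1) / 2))
        start, k = end, k - 1  # each coalescence merges two lineages into one
    return estimates


print(classic_skyline([1.0, 3.0], n_tips=3))
# [(0.0, 1.0, 3.0), (1.0, 3.0, 2.0)]
```

Because each interval gets its own estimate from a single waiting time, the classic skyline is very noisy; Bayesian approaches smooth these estimates and propagate uncertainty in the tree itself, which is what produces confidence bands like those in Fig. 5C.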

Discussion

The marked increase in bacteriologically confirmed S. pyogenes infections in England in late 2022-2023 coincided with the reported national upsurge in notifications of both scarlet fever and iGAS15,16. The upsurge in invasive infections was associated specifically with a significant increase in emm1 S. pyogenes, the vast majority (95.7%) of which belonged to the emergent M1UK lineage or its derivatives. No substantial genomic changes in M1UK were observed during the upsurge, but three new clades emerged and expanded within M1UK, accounting for 53% of emm1 iGAS in 2022-2023.

Several countries have now reported similar iGAS upsurges in 2022-2023, temporally associated with the end of mitigation strategies implemented during the COVID-19 pandemic21,26,27,28,29, including associations with emm126 or M1UK28,29. Although emm12 infections were prominent in early 202215, as reported elsewhere29,30, the very marked increase in iGAS observed in England in the second half of 2022 was accounted for by emm1. The dominance of emm1 among invasive isolates (>50% overall and almost 70% in children) is unprecedented in UK records. By contrast, during the period of COVID-19 pandemic-related NPI in 2020-2022, bacteriologically confirmed S. pyogenes infections were rare. While the reduced detection of non-invasive infections might be explained by fewer consultations, this would not explain the reduction in sterile site isolates. Furthermore, during the period of COVID-19 NPI, invasive infections due to emm1 were exceedingly rare, with no emm1 isolates identified in some quarters of 2020-2021 in either adults or children. We posit that this points to differing modes of transmission, whereby ‘throat specialist’ strains31 such as emm1 require respiratory transmission in order to circulate, while others may spread via skin contact.

The reported increase in iGAS in late 2022 was particularly evident in children, with complicated clinical presentations including meningitis28 and, specifically, rapidly progressive pleural empyema in countries where such data are collected15,29. Isolates from empyema are often not cultured because of antibiotic pre-treatment; the pleural sample isolates in the current study therefore represent only a subset of all pleural empyema cases. Nevertheless, pleural isolates were significantly associated with the emergent M1UK clades. The timing of the upsurge in Q3 2022 is very likely to have contributed to the pleural empyema phenotype: respiratory viral infections were identified in 25% of paediatric empyema cases15 and may have contributed to progression to lower respiratory tract infection. Because of the design of our study and the widespread adoption of point-of-care tests for respiratory viral infection in 2022, we are unable to assess how respiratory viral infection contributed to empyema over time.

M1UK is increasingly dominant in the UK. Our findings are mirrored to varying degrees in other countries, where the proportion of emm1 isolates that are M1UK ranges from 41.5% to 78%21,28,29. The fitness of M1UK has been attributed to its increased expression of SpeA, a superantigen that can promote pharyngeal infection5. Increased SpeA expression is associated with a SNP in the leader sequence of ssrA13, which is present not only in M1UK but also in the near-extinct intermediate M123SNPs lineage. The contraction of the M123SNPs lineage suggests that M1UK possesses additional fitness advantages6. Genome stability appeared greater in M1UK than in M1global, suggesting that the 27 SNPs accumulated by M1UK may be sufficient to confer a fitness advantage during human infection, including increased transmissibility. Indeed, in one study, the mean secondary attack rate was 40% among asymptomatic contacts of M1UK infection in two classes of schoolchildren, compared with 22.8% in classroom outbreaks involving other emm types32, supporting a potential transmission advantage. In the current study, M1UK invasive isolates were significantly less likely than M1global strains to carry covRS mutations, suggesting a fitness advantage in invasive infection as well. Although we were unable to reproduce this difference experimentally, the intramuscular route of infection in mice does not reflect the bottleneck of natural mucosal infection in humans, and the experiment was necessarily limited to five strains per group.

A comparison of non-invasive emm1 isolates from London with invasive emm1 isolates from across the country showed the two groups to be interspersed and tightly clustered in the phylogenetic tree, indicating a common genetic pool. The analysis indicated that invasive isolates can arise repeatedly from the population of pharyngitis strains. The presence of identical strains in both groups underlines the potential for direct, often unnoticed, transmission from cases of pharyngitis and scarlet fever leading to dangerous invasive infections. Consistent with previous reports33, our data suggest that diversifying selection within the invasive population, especially in M1global, drives the accumulation of mutations in covRS.

Our study evaluated the origin, dispersal and population dynamics of M1UK using the most comprehensive global collection of M1UK strains assembled to date. The analysis showed M1UK to be internationally distributed, with nearly identical strains found in multiple countries and multiple introductions from the UK population. The 2022 upsurge in the UK was characterised by the rapid expansion of three clades within M1UK, two of which dispersed swiftly to at least two other European countries. In Iceland, a single introduction event appeared responsible for the reported M1UK cases, whereas in Denmark multiple introductions seemed likely. We found no evidence of importation of a new lineage recently reported in Denmark (M1DK)21.

The origin of M1UK was estimated to date from 2008, the year in which national guidelines to reduce swab testing and unnecessary antibiotic treatment of sore throat were introduced in England10. An exponential increase in the M1UK population commenced around 2010. Given the propensity of M1UK to spread readily in classrooms32, it is conceivable that new lineages can emerge and expand rapidly if active S. pyogenes throat infections are not detected and treated with antibiotics, and transmission is not controlled. The antecedent intermediate lineages emerged in 2002 (M113SNPs) and 2006 (M123SNPs), a period during which secular changes in sore throat management were ongoing in the UK34,35.

Our dataset is limited to the UK and other high-income countries with temperate climates, so no inferences about the importation of M1UK into low-income countries were possible. This underlines the importance of global surveillance to monitor the evolution and epidemiology of emerging variants with increased pathogenic potential. Although the UK was identified as the geographic origin of M1UK, it was the only country with genomes available from the time of emergence; we therefore cannot exclude an alternative origin.

The phylodynamic analysis of M1UK in the UK showed a decline in population size between 2015 and 2019 after the initial rapid rise, consistent with the cyclical changes known to occur in S. pyogenes populations36; however, the population size plummeted in early 2020 when NPI to combat the spread of COVID-19 were introduced. The marked M1UK population bottleneck was followed by rapid expansion in 2022 and 2023, raising the question of whether strain-specific survival advantages exist during periods of such low S. pyogenes population activity. Global reductions in other bacterial respiratory pathogens were seen during the period of COVID-19 NPI37. However, the scale of the resurgence in invasive S. pyogenes following relaxation of NPI appears so far to be unique, perhaps related to the lack of a vaccine for S. pyogenes compared with the other pathogens studied37. The observed magnitude and severity of the upsurge could be explained by the coincidence of enhanced M1UK pathogenicity and diminished human population immunity to S. pyogenes, a predictable but perhaps unintended consequence of interventions to limit the spread of COVID-1938. The role of exposure-driven human immunity in shaping cyclical and post-COVID-19 changes in S. pyogenes epidemiology is the subject of ongoing research. Scarlet fever frequently affects children in their first year of school39, an experience that was delayed for many during two years of COVID-19-related NPI. We hypothesise that this resulted in a ~3-fold increase in susceptible children starting school in Q3 2022, with a similar reduction in immunity among siblings and adults. We posit that the transmissibility and invasiveness of M1UK facilitated the exponential and unprecedented increase in invasive S. pyogenes infections.

Methods

Surveillance of S. pyogenes detection in clinical samples in England

UK Health Security Agency surveillance of infections for health protection purposes is approved under Regulation 3 of The Health Service (Control of Patient Information) Regulations 2020 and under Section 251 of the NHS Act 2006. All reports of S. pyogenes-positive clinical samples, including post-mortem samples, from ISO-8601 week 1 2016 to week 30 2023 reported by English laboratories were extracted from the UK Health Security Agency (UKHSA) Second Generation Surveillance System (SGSS) on 7 December 2023. SGSS captures approximately 98% of electronically supplied hospital microbiology laboratory data in England and is the primary route for statutory reporting40 of laboratory-confirmed invasive S. pyogenes infections. Invasive S. pyogenes samples are defined as samples from a normally sterile site that are positive by culture or molecular detection. S. pyogenes-positive samples were deduplicated where a patient had more than one S. pyogenes-positive sample of a similar specimen type taken on the same date.
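
A minimal sketch of the deduplication and weekly aggregation described above, assuming a hypothetical record layout (the SGSS data model is not described here): records sharing patient, specimen type and sample date are counted once, and the remainder are tallied by ISO-8601 year and week using Python's `date.isocalendar()`. Matching on "similar" specimen types is simplified to an exact match on a specimen-type label.

```python
from collections import Counter
from datetime import date

# Hypothetical record layout: (patient_id, specimen_type, sample_date, sterile_site)
Record = tuple[str, str, date, bool]


def weekly_counts(records: list[Record]) -> Counter:
    """Deduplicate by patient, specimen type and date, then count per ISO-8601 week."""
    seen: set[tuple[str, str, date]] = set()
    counts: Counter = Counter()
    for patient, specimen, sampled, sterile in records:
        key = (patient, specimen, sampled)
        if key in seen:
            continue  # repeat sample of the same type on the same day
        seen.add(key)
        iso_year, iso_week, _ = sampled.isocalendar()
        site = "sterile" if sterile else "non-sterile"
        counts[(iso_year, iso_week, site)] += 1
    return counts


records = [
    ("p1", "blood", date(2022, 12, 5), True),
    ("p1", "blood", date(2022, 12, 5), True),  # duplicate, dropped
    ("p2", "throat swab", date(2022, 12, 6), False),
    ("p3", "throat swab", date(2022, 12, 7), False),
    ("p3", "throat swab", date(2022, 12, 12), False),
]
for (year, week, site), n in sorted(weekly_counts(records).items()):
    print(year, week, site, n)
```

ISO weeks start on Monday, and the ISO year can differ from the calendar year around 1 January, so grouping on the (ISO year, ISO week) pair avoids mixing the last days of December with week 1 of the following year.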

Invasive Streptococcus pyogenes isolates

S. pyogenes isolates from invasive disease (iGAS) cases in England, Wales, and Northern Ireland are routinely submitted to the national reference laboratory (SSRS, Staph and Strep Reference Section, UKHSA, London, UK) for emm genotyping using standard methods (https://www.cdc.gov/streplab/groupa-strep/emm-typing-protocol.html). Processes and reporting requirements for isolate submission, including clinical sample source, were unchanged during the study period. The proportion of invasive isolates that were emm1 was calculated relative to the total number of isolates genotyped. As part of the investigation into the upsurge of S. pyogenes, all S. pyogenes isolates from 2022/23 were whole genome sequenced (WGS). For this study, we included all emm1 isolates from invasive infections that had been genome sequenced at the reference laboratory from 2014–2023, including a small number from other regions: emm1 isolates from 2014-2015 that were previously reported (n = 516)41; emm1 isolates from 2016–2021 (n = 207), sequenced intermittently as part of service delivery; and all emm1 isolates (n = 1092) submitted to the reference laboratory from January 2022 to March 2023 and sequenced as part of this outbreak investigation. Metadata and accessions for all isolate genome sequences are listed in Supplementary Data 3. Isolate WGS data were linked to the reported clinical sample type. Differences in the proportion of emm1 between time points were evaluated using a one-tailed proportion test (https://www.socscistatistics.com/tests/ztest/).

Non-invasive S. pyogenes isolates

The collection and analysis at Imperial College London of fully anonymised bacterial isolates from a diagnostic laboratory, previously linked to routine data, was approved by a national research ethics committee (West London Research Ethics Committee 06/Q0406/20). S. pyogenes throat isolates were identified by MALDI-Biotyper (Bruker) from swabs submitted to the Diagnostic Laboratory at Imperial College Healthcare NHS Trust (London, UK) during 2022 (1 January to 31 December). This laboratory serves northwest London, a population of ~2 million people and ~3.5% of the population of England. S. pyogenes isolates were cultured on Columbia Blood Agar (CBA, Oxoid, Basingstoke, UK) or in Todd Hewitt broth (Oxoid) at 37 °C with 5% CO₂. Demographic data were linked to all isolates and anonymised in accordance with the approved protocol (06/Q0406/20). All emm1 pharyngitis isolates (from throat swabs) were genome sequenced at the national reference laboratory (Supplementary Data 3).

Genomic data contextualisation

Three genomic datasets were used in this study. The first comprised 1815 emm1 isolates from invasive infections, collected nationally and sequenced at the UKHSA national reference laboratory between 2013 and 2023. Of these, 1092 were newly sequenced isolates from 2022–2023 and 723 were from 2013–2021 (Supplementary Data 3), and 12 came from outbreak investigations. The second dataset comprised these 1815 invasive isolates plus 133 newly sequenced non-invasive emm1 isolates collected in London during 2022 (1 January to 31 December) as part of this study. This gave a total of 1948 S. pyogenes whole genome sequences (WGS) (Supplementary Data 3). The third dataset was created only to provide phylogenetic context for the global M1UK population and the intermediate strains. It added the following previously published M1UK or intermediate WGS:
- 385 from the UK, sequenced at the Wellcome Trust Sanger Institute (2005–2018)5,23;
- 163 collected in Australia (2010–2022)13;
- 16 collected in Canada (2016–2019)24;
- 120 collected in Denmark (2018–2023)21;
- 18 from Iceland (2023)21;
- 27 collected in the Netherlands (2019)22;
- 10 collected in the USA (2015–2018)25.

Data collection ended in July 2023, so genomes reported after that date were not included. The final global dataset contained 2365 M1UK and intermediate genomes (Supplementary Data 3).
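
The dataset sizes above can be cross-checked with a short tally. The counts are taken from the text. The dictionary keys are our own labels and are not identifiers from Supplementary Data 3.

```python
# Counts as reported in the text; the keys are illustrative labels.
national_invasive = {"2022-2023 (new)": 1092, "2013-2021": 723}
london_non_invasive = 133
external_m1uk_intermediate = {
    "UK (Sanger)": 385, "Australia": 163, "Canada": 16, "Denmark": 120,
    "Iceland": 18, "Netherlands": 27, "USA": 10,
}

dataset1 = sum(national_invasive.values())        # invasive emm1 isolates
dataset2 = dataset1 + london_non_invasive         # plus London throat isolates
external = sum(external_m1uk_intermediate.values())
print(dataset1, dataset2, external)  # 1815 1948 739
```

Assume the 2365-genome global dataset is the union of these 739 external genomes with the M1UK and intermediate genomes from the first two datasets. On that assumption, subtraction implies that 2365 - 739 = 1626 of its genomes come from the UKHSA and London collections.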

Generation of new M1UK reference genome: Reference strain NCTC14935

Genomic DNA from S. pyogenes M1UK isolate H1490 and M1global isolate H1499 (both sore throat isolates) was sheared using a Megaruptor to prepare 20–22 kb PacBio Single Molecule Real-Time (SMRT) libraries, following the manufacturer’s recommendations. The libraries were sequenced on one SMRT cell on a PacBio RSII platform (Pacific Biosciences of California, Inc., Menlo Park, CA, USA) at the University of Edinburgh. The data were demultiplexed using Lima v2.2.0 (https://lima.how/). The demultiplexed CLR data were converted to circular consensus sequence (CCS) reads using the ccs tool v6.3.0. HiFi reads (CCS reads > Q20) were then extracted using the extracthifi tool from the same package. Genome assemblies were generated from the HiFi reads using Redbean v2.2542 and Trycycler v0.5.343. Assembly quality was assessed using QUAST v5.0.244 and BUSCO v5.3.045, and annotation was performed using Prokka v1.14.646. PacBio sequencing reads and data are deposited in the European Nucleotide Archive under BioProject accession PRJEB68198 (M1UK, H1490 - ERR12378139 and M1global, H1499 - ERR12378140). The two isolates have been deposited in the National Collection of Type Cultures (NCTC) with the accessions NCTC14935 (M1UK, H1490) and NCTC14936 (M1global, H1499).
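
The Q20 HiFi threshold corresponds to a Phred-scaled quality, Q = -10·log10(1 - accuracy), so Q20 means a predicted read accuracy of at least 99%. The sketch below applies that threshold. It assumes each CCS read already has a predicted accuracy between 0 and 1. The read IDs, the accuracy values and the cap on Q for perfect reads are hypothetical. This is not the PacBio tool's implementation.

```python
import math

HIFI_MIN_Q = 20  # HiFi threshold stated in the text (> Q20)


def phred(accuracy: float, cap: float = 60.0) -> float:
    """Phred-scaled quality from predicted accuracy: Q = -10 * log10(1 - accuracy)."""
    error = 1.0 - accuracy
    if error <= 0:
        return cap  # a perfect predicted accuracy has no finite Phred value
    return min(cap, -10.0 * math.log10(error))


def select_hifi(reads: dict[str, float], min_q: float = HIFI_MIN_Q) -> list[str]:
    """IDs of reads whose predicted accuracy reaches the Q threshold."""
    min_accuracy = 1.0 - 10 ** (-min_q / 10)  # Q20 -> 0.99
    # compare in accuracy space to avoid log10 rounding at the boundary
    return [rid for rid, acc in reads.items() if acc >= min_accuracy]


reads = {"read_a": 0.995, "read_b": 0.98, "read_c": 1.0}
print(select_hifi(reads))      # ['read_a', 'read_c']
print(round(phred(0.999), 1))  # 30.0
```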

Illumina genome sequencing, assembly, and annotation

For this study, whole genome sequencing of all clinical isolates (invasive and non-invasive) was performed by the UKHSA reference laboratory on the Illumina NextSeq 1000 platform with 100-base paired-end chemistry. Adaptor sequences and low-quality bases were trimmed with Trimmomatic v0.3947. Contamination was assessed by classifying reads with Kraken248 against a standard bacterial database, and genomes with less than 90% of reads assigned to S. pyogenes were excluded. Draft genomes were assembled using SPAdes v3.15.449. Assembly quality was assessed using QUAST v5.0.244. Assemblies were excluded if the genome size exceeded 2.1 Mbp and/or the assembly had more than 400 contigs. Genome annotation was performed with Prokka v1.14.646.
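
The three exclusion criteria can be expressed as a single filter. This is a minimal sketch. It assumes the Kraken2 species fraction and the QUAST assembly length and contig count have already been collected per isolate into a hypothetical `AssemblyQC` record. The thresholds are the ones stated above.

```python
from dataclasses import dataclass

# Thresholds as stated in the text.
MIN_SPYOGENES_READ_FRACTION = 0.90
MAX_GENOME_SIZE_BP = 2_100_000
MAX_CONTIGS = 400


@dataclass
class AssemblyQC:
    # Hypothetical per-isolate summary of Kraken2 and QUAST results.
    isolate: str
    spyogenes_read_fraction: float  # fraction of reads classified as S. pyogenes
    genome_size_bp: int             # total assembly length
    n_contigs: int


def passes_qc(qc: AssemblyQC) -> bool:
    """Apply the contamination and assembly filters described above."""
    if qc.spyogenes_read_fraction < MIN_SPYOGENES_READ_FRACTION:
        return False  # likely contaminated or mixed culture
    if qc.genome_size_bp > MAX_GENOME_SIZE_BP or qc.n_contigs > MAX_CONTIGS:
        return False  # oversized or highly fragmented assembly
    return True


samples = [
    AssemblyQC("iso1", 0.97, 1_850_000, 60),
    AssemblyQC("iso2", 0.85, 1_850_000, 60),   # fails the contamination check
    AssemblyQC("iso3", 0.98, 2_300_000, 120),  # fails the genome-size check
    AssemblyQC("iso4", 0.99, 1_900_000, 450),  # fails the contig-count check
]
print([s.isolate for s in samples if passes_qc(s)])  # ['iso1']
```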

Identification of single nucleotide variations and phylogenetic analysis

A core genome alignment was obtained by mapping the trimmed reads of each S. pyogenes genome to the MGAS5005 reference genome (GenBank accession: CP000017.2) using snippy v4.6.0 (https://github.com/tseemann/snippy). Settings were a minimum coverage of 10, a minimum variant fraction of 0.9 and a minimum VCF variant call quality of 100. The SNP distance matrix was obtained using snp-dists (https://github.com/tseemann/snp-dists). Snippy classified the identified SNPs as non-coding, missense or synonymous according to their location in the genome and their effect on the protein sequence. Gubbins v3.3.050 was used to identify and remove recombinant regions. A maximum-likelihood (ML) phylogenetic tree was constructed from the multiple sequence alignment using RAxML-NG v1.0.151 as implemented in Gubbins v3.3.0 (substitution and rate variation model: GTR + Gamma). The ML tree was rooted on NCTC8198 (GenBank accession: GCA_002055535.1), the reference genome of the old emm1 lineage. Phylogenetic trees and associated data were visualised using iTOL v6.8.152.
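
A pairwise SNP distance matrix counts, for each pair of aligned genomes, the positions at which they differ. This is a minimal sketch, not the snp-dists implementation. It assumes, as we understand snp-dists does by default, that only A/C/G/T differences are counted and that gaps and ambiguous bases are skipped. The isolate names and sequences are made up.

```python
from itertools import combinations

ACGT = frozenset("ACGT")


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where both bases are A/C/G/T and differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a.upper(), b.upper())
        if x in ACGT and y in ACGT and x != y  # gaps and ambiguous bases skipped
    )


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Symmetric pairwise SNP distances for every pair of aligned sequences."""
    matrix: dict[tuple[str, str], int] = {(name, name): 0 for name in alignment}
    for (n1, s1), (n2, s2) in combinations(alignment.items(), 2):
        matrix[(n1, n2)] = matrix[(n2, n1)] = snp_distance(s1, s2)
    return matrix


aln = {"iso1": "ACGTA", "iso2": "ACTTN", "iso3": "TCGT-"}
m = distance_matrix(aln)
print(m[("iso1", "iso2")], m[("iso1", "iso3")], m[("iso2", "iso3")])  # 1 1 2
```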

Characterisation of genomic features of interest

The presence of AMR genes was predicted by combining the results from ABRicate (https://github.com/tseemann/abricate), Ariba53 and srst254. The pbp gene sequences (pbp1a, pbp1b and pbp2x) were retrieved using a BLASTN (NCBI BLAST+ v2.7.1) search, translated into amino acid sequences and examined for non-synonymous mutations. None of the non-synonymous mutations previously associated with penicillin resistance in S. pyogenes were identified. The same approach was used to identify non-synonymous mutations in S. pyogenes regulatory genes (covR, covS, fasA, fasB, fasC, rgg1, rgg2, rgg3, rgg4, rivR, rofA and rocA). Regulatory gene sequences with <90% similarity to the reference genome were excluded from this analysis. The presence of superantigen genes (smeZ, speA2, speC, speG, speH, speI, speJ, speK, speL, speM, speN, speO, speP, speQ, speR, ssa) and DNase genes (sda2, sdn1, spdn1, spd3, spd4, spdB, spnA) was assessed by BLASTN (NCBI BLAST+ v2.7.1) analysis with default parameters. Differences between lineages (M1global and M1UK) in the number and type of mutations found in regulatory and pbp genes were evaluated using a one-tailed proportion test (https://www.socscistatistics.com/tests/ztest/).
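
Screening pbp and regulatory genes for non-synonymous mutations amounts to translating each gene and listing amino acid substitutions relative to the reference. This is a minimal sketch under simplifying assumptions. The query and reference are taken to be in-frame, gap-free coding sequences of equal length, so indels and frameshifts are not handled. The standard bacterial genetic code is assumed. The sequences are toy examples rather than real pbp or cov genes.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate an in-frame coding sequence; ambiguous codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def nonsynonymous_changes(reference: str, query: str) -> list[str]:
    """Amino acid substitutions (e.g. 'K2R') between two same-length CDSs."""
    if len(reference) != len(query):
        raise ValueError("this sketch assumes gap-free, same-length sequences")
    ref_aa, qry_aa = translate(reference), translate(query)
    return [
        f"{r}{pos}{q}"
        for pos, (r, q) in enumerate(zip(ref_aa, qry_aa), start=1)
        if r != q
    ]


ref = "ATGAAACTG"                                 # M K L
print(nonsynonymous_changes(ref, "ATGAAGCTG"))  # [] (AAA->AAG, both Lys)
print(nonsynonymous_changes(ref, "ATGAGACTG"))  # ['K2R']
```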

Pangenome analysis

A pangenome graph was constructed with Panaroo v1.3.055 in its moderate decontamination mode. Its input was the annotated genome assemblies of MGAS5005 and of the 1815 emm1 isolates collected across the UK between 2013 and 2023. Clusters of orthologous genes (COGs) were defined by a minimum nucleotide identity of 98%. Core genes were defined as those present in at least 95% of genomes. The resulting gene presence-absence matrix was filtered to remove pseudogenes, fragmented genes and genes of unusual length. For visualisation in Cytoscape v3.10.156, the pangenome graph was simplified with Panaroo’s helper script reference_based_layout.py, using the MGAS5005 genome as the reference. Presence-absence of COGs was compared between M1UK and M1global, and between the pre-2022/2023 and 2022/2023 groups, using Python v3.11.6.
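
The core-gene definition and the between-group comparison can be sketched on a presence-absence matrix. This is a minimal sketch. It assumes the filtered Panaroo matrix has already been loaded into a mapping from each gene cluster to the set of isolates carrying it; the file parsing is omitted. The gene and isolate names are hypothetical.

```python
def gene_frequency(carriers: set[str], group: set[str]) -> float:
    """Fraction of isolates in `group` that carry the gene."""
    return len(carriers & group) / len(group)


def core_genes(presence: dict[str, set[str]], isolates: set[str],
               threshold: float = 0.95) -> list[str]:
    """Genes present in at least `threshold` of all isolates (95% core)."""
    return sorted(g for g, c in presence.items()
                  if gene_frequency(c, isolates) >= threshold)


def compare_groups(presence: dict[str, set[str]], group_a: set[str],
                   group_b: set[str]) -> dict[str, tuple[float, float]]:
    """Per-gene carriage frequency in two groups, e.g. M1UK vs M1global."""
    return {g: (gene_frequency(c, group_a), gene_frequency(c, group_b))
            for g, c in presence.items()}


m1uk = {"uk1", "uk2"}
m1global = {"gl1", "gl2"}
presence = {
    "geneA": {"uk1", "uk2", "gl1", "gl2"},  # carried by every isolate
    "geneB": {"uk1", "uk2"},                # M1UK only
    "geneC": {"uk1", "gl1", "gl2"},
}
print(core_genes(presence, m1uk | m1global))              # ['geneA']
print(compare_groups(presence, m1uk, m1global)["geneB"])  # (1.0, 0.0)
```

The same functions apply unchanged to the pre-2022/2023 versus 2022/2023 comparison, with the two time-period sets passed in as the groups.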

Phylodynamic analysis of M1UK

A recombination-corrected maximum-likelihood (ML) phylogenetic tree was constructed using RAxML-NG v1.0.151 as implemented in Gubbins v3.3.050 (model: GTR + Gamma). The input was the multiple sequence alignment of global M1UK and intermediate genomes, mapped against the M1UK reference genome H1490. The ML tree was rooted on M1global isolate gas81595 (ERS17508611), the M1global isolate most closely related to the M1UK and intermediate lineages by SNP distance. A dated phylogenetic tree was generated from the ML tree using the least-squares dating method implemented in the LSD2 module of IQ-TREE v2.2.2.7 (model: GTR + I + G4)57,58. Ancestral geographical locations were inferred from the dated tree and isolate metadata using the MPPA method and the F81 model as implemented in PastML59.

To reconstruct the population dynamics of the M1UK lineage in the UK, a UK-specific subtree of M1UK genomes was extracted from the dated tree. The effective population size (Ne) of M1UK was then modelled through time using a skygrowth model60 implemented in the R package mlesky61. The model used 60 time intervals, as determined by the package’s parameter-optimisation algorithm based on the Akaike Information Criterion. To evaluate whether variation in sample size over time could affect the inference of Ne, the same model was also fitted iteratively to 40 subtrees of randomly sampled UK M1UK genomes. These subsamples were capped at a maximum of 76, 22 or 14 genomes per year, corresponding to the sample sizes in 2019, 2020 and 2021, respectively.
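
The subsampling step can be sketched as drawing, for each year, at most a fixed number of genomes at random, repeated 40 times per cap. This is a minimal sketch. It assumes each genome is tagged with its sampling year. The genome IDs and yearly counts are hypothetical. Pruning the dated tree to each subsample and refitting mlesky in R are not shown.

```python
import random
from collections import defaultdict


def subsample_per_year(genomes: dict[str, int], cap: int, rng: random.Random) -> list[str]:
    """Randomly keep at most `cap` genomes from each sampling year."""
    by_year: dict[int, list[str]] = defaultdict(list)
    for genome_id, year in genomes.items():
        by_year[year].append(genome_id)
    kept: list[str] = []
    for year in sorted(by_year):
        ids = sorted(by_year[year])  # fixed order so a seeded rng is reproducible
        kept.extend(rng.sample(ids, min(cap, len(ids))))
    return kept


# Hypothetical UK M1UK genomes: 100 from 2019, 30 from 2020, 10 from 2021.
genomes = {f"g{i}": 2019 for i in range(100)}
genomes.update({f"h{i}": 2020 for i in range(30)})
genomes.update({f"k{i}": 2021 for i in range(10)})

rng = random.Random(1)
replicates = {cap: [subsample_per_year(genomes, cap, rng) for _ in range(40)]
              for cap in (76, 22, 14)}
print([len(replicates[cap][0]) for cap in (76, 22, 14)])  # [116, 54, 38]
```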

In vivo screening for covR/S mutations using five representative strains of M1UK and M1global

All animal experiments were undertaken using protocols approved by the Imperial College Animal Welfare Advisory Board (AWERB) and authorised by a UK Home Office Project Licence. Mice were maintained in a standard 12 h light/12 h dark cycle with food and water available ad libitum.

Five M1global and five M1UK strains were used in this experiment (Supplementary Fig. 3). Strains were selected from 2022 isolates that were broadly representative of each lineage and had no existing covR or covS mutations. Experimental soft tissue infections were performed in 6-week-old female BALB/c mice (Charles River, UK). Bacteria were cultured on CBA overnight and resuspended in sterile PBS. Mice were infected in the thigh muscle with 5 × 10⁸ CFU of one of the 10 strains (3 mice per strain). At 24 h after infection, 150 μl of heparinised blood was obtained from each mouse by cardiac puncture prior to euthanasia and plated onto CBA. Each spleen was removed, homogenised in 1 ml PBS using a FastPrep-24™ 5G and plated on CBA for enumeration. SpeB activity was determined with an agar-based casein digestion assay to infer covS mutations. Up to 50 colonies cultured on CBA from each spleen were patched onto Todd Hewitt agar (THA) containing 2% w/v skim milk. One spleen sample that yielded only a single colony was excluded from analysis, while three samples with 16, 33 and 36 colonies were included. Fifty colonies from the inoculum of each strain were also patched onto skim milk THA to rule out covS mutations arising before inoculation. SpeB (caseinolytic) activity was determined by comparing zones of clearance around S. pyogenes colonies with those of positive controls on the same plates. Colonies were repatched to confirm the phenotype. Statistical analysis was performed with GraphPad Prism 10, and the two groups were compared using a two-tailed nested t-test.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.