Whole-genome sequencing for MRSA epidemiology: Transmission and "clouds of variation"

It's an unusual sensation to wake up in the morning and hear Moira Stuart on the Radio 2 breakfast show talking about bacterial genomics and whole-genome sequencing. But it wasn't a lucid dream: the publication of a new paper from Simon Harris and Sharon Peacock (of Cambridge University and the Sanger Centre, respectively) in Lancet Infectious Diseases yesterday triggered a wave of great publicity proclaiming the power of sequencing bacterial genomes to perform hospital epidemiology of meticillin-resistant Staphylococcus aureus (MRSA). Indeed, my parents were listening to Radio 4 and excitedly emailed to say "they are talking about the kind of stuff you do on the Today programme!". With parents, somehow your work is only really valid when it is featured on the news.

If you haven't read about this study, Ewan Callaway at Nature News provides a neat overview. In fact, I'd heard about this story before, as Sharon Peacock came and gave our departmental seminar to a packed audience last week. We were enthralled by the level of discrimination shown and by the translational benefits of such a study: the results of WGS combined with "shoe-leather epidemiology" could actually end an ongoing outbreak, and change hospital practice and policy. I tweeted:

I then didn't check Twitter for a few hours, and came back to a flurry of tweets, triggered by some comments from Ed Feil:

This led to a long and hard-to-follow discussion on Twitter debating the finer points of this paper.

Now, working on the basis that the Twitter conversation was hard to follow, but this was an important subject, I asked Ed to summarise his thoughts in a blog post. Ed is another pioneer in the field of bacterial genomic epidemiology and has co-authored a seminal paper on intercontinental spread of the ST239 clone of MRSA with some of the same team responsible for the Lancet study.

Personally I am extremely interested in his thoughts as we are now at the point where WGS studies might start to be applied routinely in clinical practice and any remaining issues in the practical usage deserve a decent airing. Ed was a little reluctant as he'd only written one blog post before (about football) but I encouraged him to give it a try anyway ... here's what he said.

Ed Feil guest post

Harris et al (LID 14 Nov 2012) describe the use of rapid new sequencing technologies to investigate an outbreak of MRSA on a special care baby unit (SCBU). There is much that is very impressive about this work. The authors very powerfully demonstrate how this technology is set to revolutionise real time outbreak investigation and epidemiological surveillance. The data identify epidemiologically linked cases (belonging to a single outbreak), broadly confirming inferences from antibiotic resistance profiling; at least in those cases where the antibiotic resistance data was correct. The data also reveal this outbreak to be due to a novel clone, closely related to EMRSA-15 but with the addition of a PVL toxin gene. The presence of this gene can lead to skin and soft tissue infections, and an epidemiological shift from the hospital to the community. This study demonstrates that this technology is fast enough to realistically generate informative data during the course of an outbreak (i.e. within 24 hours). This is at least as quick as current methods and allows real-time interventions. It is also cheap, and getting cheaper. This is a good thing.

Notably, the study also implicates a single healthcare worker in maintaining the outbreak, and perhaps even as its ultimate source:

"Moreover, use of bacterial whole-genome sequencing in real time was able to identify the potential source of an ongoing MRSA outbreak".

This individual was taken off the ward and decolonised - this almost certainly prevented onward infections. This is clearly all good.

[caption id="attachment_1458" align="aligncenter" width="483"] Part of one of the brilliant figures from this paper, showing the patients affected by MRSA and possible transmission events. The healthcare worker is marked in orange.[/caption]

There is no doubt in my mind that this technology will be used routinely for outbreak investigation within hospitals in the near to mid-term future. It is inevitable (and indeed desirable) that single individuals will be implicated as a major source of infection in this way. However, this places an extra burden of responsibility to ensure inferences concerning transmission are as accurate as possible, or at least that a clear measure of uncertainty is provided. The identification of single individuals in health care settings as "potential sources" of outbreaks has personal (and indeed legal) implications which have barely begun to be addressed. Pioneering high profile papers such as this need to provide clear signposts and templates for the medical microbiology and infection control communities to follow and develop.

In fact, the role of this individual in maintaining the outbreak is far from clear, and it seems very unlikely to me that the authors have correctly identified this individual as the source or even "a potential source".

The basic problem stems from the fact that the staff member in question and the cases (infected babies) were not treated the same way. Whereas only a single colony was sequenced from each of the infected babies on the outbreak ward, 20 colonies were sequenced from the member of staff. These 20 isolates from the staff member corresponded to a cloud of variation. This was interpreted to reflect prolonged carriage, and thus to be consistent with a key role of this individual in spreading this MRSA clone over a reasonable length of time, possibly the entire outbreak. Although there are no data to support this, the paper argues that, had 20 colonies also been sequenced from each infected baby, the level of diversity in each set of 20 would have been considerably lower than that observed in the staff member. That work really needs to be done.
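To make the sampling asymmetry concrete, here is a minimal Python sketch of how within-host diversity might be summarised as the mean pairwise SNP distance among colonies from one host. The sequences are hypothetical toy alignments, not data from the paper, and the function names are my own. The point is simply that a single colony yields no pairs, so it says nothing about diversity either way.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_distance(colonies):
    """Mean SNP distance over all pairs of colonies.

    Returns None when fewer than two colonies were sequenced, because
    diversity cannot be estimated from a single genome.
    """
    pairs = list(combinations(colonies, 2))
    if not pairs:
        return None
    return sum(snp_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignments (illustration only, not data from the study)
staff_colonies = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
baby_colonies = ["ACGTACGT"]  # one colony per baby, as in the study design

print(mean_pairwise_distance(staff_colonies))
print(mean_pairwise_distance(baby_colonies))
```

With one colony per baby the function returns `None`: the absence of observed diversity in the babies is a consequence of the sampling, not evidence that their populations were uniform.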

[caption id="attachment_1457" align="aligncenter" width="965"] Phylogeny of the outbreak using awesome "bullseye" diagrams. On the right the isolates from the healthcare worker are plotted in orange.[/caption]

There are two concerns here:

1. Detectable within-host variation does not necessarily reflect prolonged carriage. The uncomfortable truth is that the ultra-high resolution in the data means we can no longer think in terms of transmission resulting in the movement of one "strain" from patient A to B. Whereas variation can, and will, accumulate within a single individual by de novo mutation over time, it seems highly unlikely that the starting point will be a single pure founding genotype with no variation. Instead, transmission by droplet infection or skin contact may well mean that these clouds of variation are transmitted en masse in a single event. Currently we simply don't have the data to quantify this. However, the upshot is that it is entirely possible that the staff member was in fact colonised from an infected baby on the ward immediately prior to being sampled, rather than the other way around. If so, this individual may have had absolutely no role at all in the infections that occurred earlier on in the outbreak.

2. The "cloud of variation" present within the staff member does not encompass the whole variation observed within the outbreak isolates, but is restricted to a single branch of the tree. If this individual was indeed the source of infection throughout the course of the outbreak, then an explanation is required as to why the variation within this individual is not scattered throughout the tree. Similarly, the authors argue that the most recent common ancestor of the isolates recovered from the staff member existed prior to the start of the outbreak, thus supporting the possibility that this individual was the source of infections from the start. The clear problem with this logic is that the estimated time to the most recent common ancestor of all the genomes isolated during the outbreak (from infected babies), which is a much more meaningful estimate of the time the outbreak really did emerge, would be much earlier than this.
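The dating argument in the second concern can be sketched with a toy tree. This is a hypothetical topology with made-up node ages, not the phylogeny from the paper; it only illustrates that when the staff isolates sit on a single branch, their most recent common ancestor (MRCA) is necessarily younger than the MRCA of all outbreak isolates.

```python
# Hypothetical toy tree (child -> parent); names and ages are invented.
parent = {
    "baby1": "n1", "baby2": "n1",
    "baby3": "n2", "n3": "n2",
    "staff1": "n3", "staff2": "n3",
    "n1": "root", "n2": "root",
}

# Age of each internal node, in arbitrary time units before sampling.
age = {"root": 10, "n1": 6, "n2": 7, "n3": 3}


def ancestors(node):
    """Return the ancestors of a node, ordered from nearest to the root."""
    path = []
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(tips):
    """Return the most recent common ancestor of a collection of tips."""
    tips = list(tips)
    common = set(ancestors(tips[0]))
    for tip in tips[1:]:
        common &= set(ancestors(tip))
    # Walking up from any one tip, the first shared node is the MRCA.
    for node in ancestors(tips[0]):
        if node in common:
            return node
    raise ValueError("tips share no ancestor")


staff = ["staff1", "staff2"]
outbreak = ["baby1", "baby2", "baby3", "staff1", "staff2"]

print(mrca(staff), age[mrca(staff)])
print(mrca(outbreak), age[mrca(outbreak)])
```

In this toy example the staff MRCA is an internal node nested inside the outbreak tree, so dating it tells us when the staff member's cloud began to diverge, not when the outbreak itself began.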

It is quite clear that we need a great deal more data on within-host variation, and on the proportion of this variation that is likely to be transmitted to new hosts, to get a firmer grip on the transmission dynamics of this pathogen.
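As a rough illustration of why the size of the transmitted inoculum matters, here is a small simulation sketch. Everything in it is an assumption for the sake of argument: the donor carries five genotypes at equal frequency, and transmission is modelled as random sampling with replacement. We do not currently know the real values for MRSA.

```python
import random


def transmit(donor_population, bottleneck_size, rng):
    """Draw the cells that found the recipient's population (with replacement)."""
    return rng.choices(donor_population, k=bottleneck_size)


def founding_genotypes(donor_population, bottleneck_size, rng):
    """Number of distinct genotypes present in the recipient at time zero."""
    return len(set(transmit(donor_population, bottleneck_size, rng)))


# Hypothetical donor: a cloud of five genotypes, 20 cells of each.
donor = [f"genotype_{i}" for i in range(5) for _ in range(20)]
rng = random.Random(2012)  # fixed seed so the sketch is repeatable

narrow = founding_genotypes(donor, 1, rng)   # a single founding cell
wide = founding_genotypes(donor, 50, rng)    # a cloud transmitted en masse

print("narrow bottleneck:", narrow)
print("wide bottleneck:", wide)
```

A single founding cell always gives exactly one genotype, so any later diversity must have arisen by mutation within the new host. A wide bottleneck can hand over several genotypes at once, in which case a newly colonised individual would show a cloud of variation from day one.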

How about the "transmissome"? You heard it here first ...

Harris, S., Cartwright, E., Török, M., Holden, M., Brown, N., Ogilvy-Stuart, A., Ellington, M., Quail, M., Bentley, S., Parkhill, J., & Peacock, S. (2012). Whole-genome sequencing for analysis of an outbreak of meticillin-resistant Staphylococcus aureus: a descriptive study. The Lancet Infectious Diseases. DOI: 10.1016/S1473-3099(12)70268-2