Next-generation sequencing (NGS) applied to clinical microbiology: reality or fiction?

Direct experiences in the study of hepatitis C infection

By Francisco Rodriguez Frias, Liver Pathology Lab, Vall d’Hebron Clinical Laboratories, Biochemistry and Microbiology Departments, Vall d’Hebron University Hospital, Barcelona, Spain; Vall d’Hebron Institute of Research, VHIR; CIBERehd, Carlos III Institute of Health; Autonomous University of Barcelona.

Co-authors: Josep Quer, VHIR and CIBERehd; and Josep Gregori, VHIR and Roche Diagnostics Spain

The Vall d’Hebron Clinical Laboratories, located in the Vall d’Hebron University Hospital, centralise the in vitro diagnostic studies of both hospital activity and primary care for a population of 1.2 million people in the cosmopolitan city of Barcelona. This intense activity gives us a very broad view of every aspect of in vitro diagnosis, including Clinical Microbiology.

I will draw on this activity, and the experience it entails, to answer the question posed in the title, namely how real or fictional the application of next-generation sequencing (NGS) technologies to Clinical Microbiology is, illustrating it with our own experience in the study of hepatitis C.

Researchers in our centre have been studying this disease since 1990, almost immediately after the characterisation of its causative agent, hepatitis C virus (HCV), in 1989. To give an idea of the magnitude of our care activity, we perform approximately 70,000 HCV serology studies every year, of which 10% are positive, and around 7,000 viral load determinations, 45% of them with detectable viral RNA. We also carry out more than 1,000 HCV subtyping studies by NGS per year, which we usually call “high-resolution hepatitis C virus subtyping”. As I will discuss later, subtyping is required before starting treatment of the infection. To date, our hospital has treated more than 1,000 patients.

Introduction to HCV infection

Viral hepatitis represents a major global health problem. Around 500 million people worldwide are actively infected by one of the five viruses responsible for these infections (hepatitis A, B, C, D and E viruses). Of particular relevance are hepatitis B virus (HBV) and HCV, with 250 and 100 million infected people, respectively. These five viruses have different transmission routes (enteral or parenteral) and virological characteristics (e.g., they belong to different families, and their genomes may be RNA or DNA). Nevertheless, the same parameters are considered when studying their genomic characteristics, which are of great clinical relevance: phylogenetic classification to establish genotypes/subtypes, detection and quantification of the proportions of variants possibly associated with treatment failure, and studies to discern possible transmissions. All these studies can rely on essentially the same technologies.

This similarity is particularly advantageous in Clinical Microbiology, given the intensity of the care workload and the need to obtain results for these determinations as fast as possible, for example to decide on a treatment strategy. This is precisely the case of HCV, as detailed later. We must also take into account the enormous variability and complexity of the viral populations of these agents (quasispecies), which calls for population studies by clonal sequencing. This fact alone seems to justify the application of NGS techniques, since they yield thousands of clonal sequences of the infectious agent's genome from the same sample, whereas Sanger direct sequencing provides only the average (consensus) sequence of the population.
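
The contrast between a consensus sequence and clonal sequences can be sketched in a few lines. This is a minimal illustration, assuming the reads are already aligned and of equal length; the sequences are hypothetical toy data, and the majority-base rule is only a rough stand-in for what direct Sanger sequencing reports.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Majority base at each aligned position: roughly what direct (Sanger) sequencing reports."""
    return "".join(
        Counter(read[i] for read in reads).most_common(1)[0][0]
        for i in range(len(reads[0]))
    )


def haplotype_frequencies(reads: list[str]) -> dict[str, float]:
    """Proportion of each distinct clonal sequence (haplotype), as NGS can report it."""
    counts = Counter(reads)
    return {haplotype: n / len(reads) for haplotype, n in counts.most_common()}


# Hypothetical toy quasispecies: a major variant (90 reads) and a minority one (10 reads).
reads = ["ACGTAC"] * 90 + ["ACTTAC"] * 10
print(consensus(reads))              # ACGTAC: the minority variant disappears
print(haplotype_frequencies(reads))  # {'ACGTAC': 0.9, 'ACTTAC': 0.1}
```

The consensus hides the 10% variant entirely, while the per-haplotype view keeps both the variant and its proportion.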

To give an idea of the impact of HCV infection, the most recent World Health Organization data (July 2017) state that around 70 million people suffer from chronic infection, which causes almost 400,000 deaths per year. The virus is spontaneously cleared in only 15-45% of acute infections, a proportion influenced by factors such as the host IL28B genotype. In the remaining 55-85%, the infection becomes chronic, with a 15-30% risk of progressing to cirrhosis after 20 years of infection and a 2-7% risk of progressing to hepatocellular carcinoma. HCV infection is pandemic, with a prevalence of 1.5-2.3% in Europe (recent preliminary studies indicate 1.1% in Spain), and it is higher in some areas such as Egypt or Pakistan.

HCV is an enveloped virus belonging to the Flaviviridae family. It has a single-stranded RNA genome of approximately 9.6 kb that encodes a single polyprotein, which, once translated, is proteolytically processed into the different structural and non-structural components of the virus. From the N-terminal to the C-terminal end, the structural components are C (core, the viral capsid component) and E1 and E2 (envelope components), followed by the non-structural or functional components: p7 (ion channel), NS2 (autoprotease), NS3 (serine protease and helicase), NS4A (NS3 cofactor), NS4B, NS5A (regulatory phosphoprotein involved in replication) and NS5B (RNA-dependent RNA polymerase). The NS3, NS5A and NS5B proteins are the therapeutic targets of the direct-acting antivirals (DAA) currently used against this infection. The virus circulates associated with lipoproteins and replicates in the cell cytoplasm, within a virus-induced membranous web. Therefore, unlike other agents such as hepatitis B virus (HBV) or human immunodeficiency virus (HIV), HCV has no nuclear reservoir.

HCV has a very high mutation rate because the viral polymerase lacks error-correction (proofreading) activity (1.5 × 10⁻³ substitutions/base/replication cycle). As a result, up to 6 mutations are produced in each replicative cycle, so that new viral genomes practically always differ from the previous ones. The viral population infecting a patient therefore consists of a very complex mixture of different but related genomes known as a "quasispecies". The variants constituting the quasispecies differ by amino acid polymorphisms that arise by mutation during replication and are subsequently selected on the basis of their effects on viral fitness (replicative capacity). Among these polymorphisms, some confer reduced susceptibility to antiviral treatments such as DAA; we refer to them as resistance-associated substitutions (RAS). These are often present in minority populations with lower fitness than the wild-type or major variants. When a DAA is administered, these variants with reduced susceptibility are positively selected, resulting in viral resistance, i.e., treatment failure.

Sustained virological response (SVR12/SVR24) is achieved when plasma viral RNA remains undetectable (below 15 IU/mL by ultrasensitive real-time PCR) for 12/24 weeks after the end of treatment. In HCV infection, SVR is generally taken to mean that the infection has been cured. This situation, however, is exceptional among major chronic viral infections such as HBV or HIV, in which undetectable viral genomes (virological response) indicate only inhibition of viral activity, not necessarily cure of the infection.
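
The SVR definition above can be written as a simple rule. This is a sketch under stated assumptions, not a clinical algorithm: the data layout (pairs of weeks after end of treatment and viral load), recording undetectable results as 0.0, and requiring at least one visit at or after the target week are all choices made for illustration; the 15 IU/mL threshold is the one quoted in the text.

```python
# Threshold quoted in the text for ultrasensitive real-time PCR.
UNDETECTABLE_BELOW_IU_ML = 15.0


def has_svr(measurements: list[tuple[float, float]], weeks: int) -> bool:
    """True if every viral load from end of treatment (week 0) up to `weeks`
    is below the detection threshold and follow-up reaches `weeks`.

    measurements: (weeks after end of treatment, HCV RNA in IU/mL) pairs,
    with undetectable results recorded as 0.0 (an assumption of this sketch).
    """
    in_window = [load for week, load in measurements if 0 <= week <= weeks]
    followed_up = any(week >= weeks for week, _ in measurements)
    return followed_up and bool(in_window) and all(
        load < UNDETECTABLE_BELOW_IU_ML for load in in_window
    )


cured = [(4, 0.0), (12, 0.0), (24, 0.0)]
relapse = [(4, 0.0), (12, 0.0), (20, 520000.0)]  # hypothetical late relapse
print(has_svr(cured, 12), has_svr(cured, 24))      # True True
print(has_svr(relapse, 12), has_svr(relapse, 24))  # True False
```

The hypothetical relapse case shows why SVR12 and SVR24 can disagree: a patient can meet the 12-week criterion and still fail the 24-week one.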

The standard of care for hepatitis C has changed rapidly since the introduction of DAA. These drugs are inhibitors of the NS3 protease (names ending in -previr), the regulatory protein NS5A (-asvir) and the NS5B polymerase (-buvir). They achieve cure rates above 95% with very short treatments (generally 12 weeks, and even 8 weeks). Although DAA are cheap to produce, they remain expensive (15,000 euros per 12 weeks of treatment), which restricts their universal use. Access to HCV treatment is improving but remains limited. By 2015, of the 71 million people living with HCV infection worldwide, 20% (14 million) were aware of their diagnosis, and of these only 7.4% (1.1 million) had started treatment before 2015. In Spain, an estimated 300,000 individuals have chronic HCV infection, of whom only 40% have been diagnosed. Among them, about 70,000 have already been treated, representing almost 40% treatment coverage, well above the world average.
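
The naming convention described above maps a drug's suffix to its viral target. A minimal sketch, assuming only the three suffixes given in the text; the drug names are real DAA names used purely as examples, and ribavirin (not a DAA) shows the no-match case.

```python
from typing import Optional

# Suffix conventions described in the text: -previr, -asvir, -buvir.
DAA_SUFFIX_TO_TARGET = {
    "previr": "NS3 protease",
    "asvir": "NS5A",
    "buvir": "NS5B polymerase",
}


def daa_target(drug_name: str) -> Optional[str]:
    """Return the viral target implied by a DAA's name suffix, or None if no suffix matches."""
    name = drug_name.strip().lower()
    for suffix, target in DAA_SUFFIX_TO_TARGET.items():
        if name.endswith(suffix):
            return target
    return None


for drug in ["sofosbuvir", "velpatasvir", "glecaprevir", "ribavirin"]:
    print(drug, "->", daa_target(drug))
```

A name-based lookup is only a convenience; the authoritative target of any drug is its prescribing information.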

Applications of NGS techniques to the study of HCV infection: a clear advantage over conventional techniques

Despite the high effectiveness of DAA treatments (an average SVR of 95% in Spain), the number of patients in whom treatment fails should not be overlooked. These patients must be correctly classified by HCV genotype/subtype, and possible resistant variants (RAS) must be detected in them to inform new treatment strategies, at least in specific regions such as NS5A, as the international EASL guidelines indicate: “Physicians who have easy access to reliable test assessing HCV resistance to NS5A inhibitors (spanning amino acids 24 to 93) can use these results to guide their decisions”.

This motivates the study of these variants with clonal population sequencing methods such as NGS. In this regard, the same international guidelines state that the test “should be based on population sequencing (‘Sanger’) reporting RASs as ‘present’ or ‘absent’), or deep sequencing (NGS) with a cut-off of 15% (RASs that are present in more than 15% of the sequences generated must be considered)”. However, no clear evidence supports this 15% threshold, which had previously been recommended by JP Pawlotsky. In a cohort of 1,000 DAA-treated patients analysed by NGS, Sarrazin et al reported an SVR12 of 93.3% among patients with baseline RAS detected at the 1% level, dropping to 88.2% among those whose baseline RAS exceeded 15%. In patients without baseline RAS, by contrast, SVR12 was 98.4%. This suggests that RAS present in proportions below 15% do influence SVR (about 5 percentage points lower), and such low proportions can only be detected by NGS.
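
How the choice of cut-off changes which RAS are reported can be sketched as follows. The read counts and the 10,000x depth are hypothetical; Y93H, L31M and Q30R are used only as labels for NS5A positions within the 24 to 93 range quoted above. Following the guideline wording ("more than 15%"), the comparison is strict.

```python
def call_ras(ras_reads: dict[str, int], depth: int, cutoff: float) -> dict[str, float]:
    """Return each RAS whose read frequency is above `cutoff` (a fraction, e.g. 0.15)."""
    frequencies = {ras: n / depth for ras, n in ras_reads.items()}
    return {ras: f for ras, f in frequencies.items() if f > cutoff}


# Hypothetical NS5A read counts at 10,000x coverage.
observed = {"Y93H": 2100, "L31M": 420, "Q30R": 60}
print(call_ras(observed, 10_000, 0.15))  # {'Y93H': 0.21}
print(call_ras(observed, 10_000, 0.01))  # {'Y93H': 0.21, 'L31M': 0.042}
```

With a 15% cut-off the 4.2% variant goes unreported, although proportions in that range are the ones the Sarrazin data associate with lower SVR.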

The potential usefulness of population sequencing (‘Sanger’) stated in the EASL guidelines seems to rest merely on regarding this 15% as the sensitivity limit of direct sequencing. We should keep in mind that population sequencing is not quantitative, and its lower limit of detection depends heavily on the observer. NGS (deep sequencing), in contrast, is quantitative. Given that the 15% cut-off stems from expert recommendation, we should remember that “medicine must be based on evidence and not on eminence”.
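
One way to make the quantitative nature of NGS concrete is to attach a confidence interval to each variant proportion. The sketch below uses the Wilson score interval, a standard binomial method chosen here for illustration (not necessarily what any particular pipeline uses); it models only read-sampling uncertainty and ignores PCR and sequencing errors and the number of input templates.

```python
import math


def wilson_interval(variant_reads: int, depth: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval (default ~95%) for a variant's true proportion."""
    p = variant_reads / depth
    denom = 1 + z**2 / depth
    centre = (p + z**2 / (2 * depth)) / denom
    half_width = z * math.sqrt(p * (1 - p) / depth + z**2 / (4 * depth**2)) / denom
    return centre - half_width, centre + half_width


for depth in (100, 1_000, 10_000):
    low, high = wilson_interval(round(0.15 * depth), depth)
    print(f"{depth:>6} reads, 15% observed: 95% CI {low:.3f} to {high:.3f}")
```

The interval narrows as read depth grows, which is what allows a deep-sequencing result to be reported as a number with a stated precision rather than as "present" or "absent".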

Since viral genotype influences SVR, optimising treatment success and avoiding therapeutic failure through RAS selection require correct determination of the HCV genotype/subtype, as international guidelines such as those from EASL indicate: “The HCV genotype and genotype 1 subtype (1a or 1b) must be assessed prior to treatment initiation and will determine the choice of therapy”. However, the genotyping techniques available on the market have substantial error rates, whereas NGS of the NS5B region of the viral genome appears to be free of these errors.

In fact, in our centre, where almost a thousand viral genotype/subtype studies are performed by NGS every year, SVR after more than 1,000 treatments is 98.4%, about 3 percentage points above the Spanish average. These figures clearly seem to support the suitability of NGS for genotyping and resistant-variant analysis. But are these techniques also suitable for routine daily work? In our laboratory, we have developed NGS protocols that optimise and automate the initially complex process of building amplicon libraries, purifying and titrating them, and so on.
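
Titration feeds into equimolar pooling of the amplicon libraries. The sketch below uses the generic conversion from ng/µL to nM with an average of 660 g/mol per base pair of double-stranded DNA; it is not the authors' protocol, and the library concentrations, the 450 bp amplicon length and the 100 fmol target are hypothetical.

```python
# Common approximation for the average mass of one double-stranded DNA base pair.
AVG_G_PER_MOL_PER_BP = 660.0


def library_molarity_nM(conc_ng_per_ul: float, amplicon_bp: int) -> float:
    """Convert a dsDNA library concentration (ng/µL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_G_PER_MOL_PER_BP * amplicon_bp)


def pooling_volumes_ul(libraries: dict[str, tuple[float, int]], fmol_each: float) -> dict[str, float]:
    """Volume (µL) of each library that adds `fmol_each` femtomoles to the pool (1 nM = 1 fmol/µL)."""
    return {
        name: fmol_each / library_molarity_nM(conc, length)
        for name, (conc, length) in libraries.items()
    }


# Hypothetical amplicon libraries: (concentration in ng/µL, length in bp).
libraries = {"patient_01": (10.0, 450), "patient_02": (5.0, 450)}
for name, volume in pooling_volumes_ul(libraries, fmol_each=100.0).items():
    print(f"{name}: {volume:.2f} µL")
```

Equal molar contributions, rather than equal masses, keep read depth roughly balanced across patients when amplicon lengths differ.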

Our NGS procedure was initially developed for ultra-deep pyrosequencing (UDPS) on the 454 platform (Roche), in collaboration with the Vall d’Hebron Institute of Research and Roche itself. After the 454 platform was discontinued, however, these methods were easily adapted to sequencing by synthesis (SBS, MiSeq, Illumina). Automated extraction of viral genomes, universal molecular adapters and molecular identifiers in preloaded plates, all handled by robotic systems, allow us to apply this technology to routine care. We have recently reported our experience with more than 1,400 clinical HCV subtyping assays.
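
Molecular identifiers let many patients share one sequencing run; afterwards each read must be assigned back to its sample. A minimal demultiplexing sketch, assuming each identifier sits at the start of the read; the 10-nt barcodes and reads are hypothetical, not the identifiers used in our plates, and allowing one mismatch is an illustrative choice.

```python
# Hypothetical 10-nt molecular identifiers (MIDs); not the authors' barcodes.
MIDS = {"ACGTACGTAC": "patient_01", "TGCATGCATG": "patient_02"}


def mismatches(a: str, b: str) -> int:
    """Hamming distance, counting any length difference as extra mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def demultiplex(reads: list[str], mids: dict[str, str], max_mismatches: int = 1) -> dict[str, list[str]]:
    """Assign each read to a sample by its leading MID and trim the MID off."""
    bins: dict[str, list[str]] = {sample: [] for sample in mids.values()}
    bins["unassigned"] = []
    for read in reads:
        hits = [
            (mid, sample)
            for mid, sample in mids.items()
            if mismatches(read[: len(mid)], mid) <= max_mismatches
        ]
        if len(hits) == 1:  # keep only unambiguous assignments
            mid, sample = hits[0]
            bins[sample].append(read[len(mid):])
        else:
            bins["unassigned"].append(read)
    return bins


reads = ["ACGTACGTAC" + "GGCCTTAA", "TGCATGCTTG" + "AATT", "GGGGGGGGGG" + "AA"]
print(demultiplex(reads, MIDS))
```

Barcodes that differ at many positions let a single sequencing error be tolerated without reads being assigned to the wrong patient.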

This methodology is based on computerised phylogenetic analysis of a fragment of the NS5B region, the region recommended for HCV classification. It currently supports the characterisation of all 67 HCV subtypes recognised to date and readily accommodates future discoveries, such as a new genotype 1 subtype identified in our own laboratory. The sensitivity and quantitative nature of this technology allow us to detect mixtures of subtypes and quantify their relative proportions. Detecting such mixtures would be challenging with any other methodology, and quantifying them practically impossible.
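
The idea of calling a subtype per read and then quantifying a mixture can be sketched with a deliberately simplified method. This swaps in nearest-reference classification by p-distance in place of the phylogenetic analysis described above; the 10-nt "references", the 0.15 distance threshold and the reads are all hypothetical.

```python
from collections import Counter


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences of equal length."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_subtype(read: str, references: dict[str, str], max_distance: float = 0.15) -> str:
    """Nearest-reference call; reads too far from every reference stay unclassified."""
    best = min(references, key=lambda subtype: p_distance(read, references[subtype]))
    return best if p_distance(read, references[best]) <= max_distance else "unclassified"


def subtype_proportions(reads: list[str], references: dict[str, str]) -> dict[str, float]:
    """Relative proportion of each subtype call among all reads."""
    calls = Counter(assign_subtype(read, references) for read in reads)
    return {subtype: n / len(reads) for subtype, n in calls.items()}


# Hypothetical 10-nt "references" and a mixed infection (80% 1a-like, 20% 1b-like reads).
references = {"1a": "AAAAAAAAAA", "1b": "AAAAACCCCC"}
reads = ["AAAAAAAAAA"] * 80 + ["AAAAACCCCG"] * 20
print(subtype_proportions(reads, references))  # {'1a': 0.8, '1b': 0.2}
```

Because every read is classified individually, a mixed infection appears directly as two subtypes with their relative proportions, and reads unlike any reference remain flagged for closer study.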

On the other hand, our NGS procedure, although applicable to care studies, still involves a level of complexity that is not feasible in many laboratories. For this reason, we have recently validated a new real-time PCR system based on the Roche Cobas 4800 analyser. This method performs well in discriminating subtypes 1a and 1b, as well as the other genotypes (though obviously not their subtypes), leaving only 4% of cases undetermined, which must then be processed by sequencing. The continuing evolution of DAA treatment points toward pangenotypic regimens that would make genotype/subtype assessment unnecessary. The same prediction, however, was already made for some of the currently available DAA, and experience has shown that SVR still differs between genotypes, as the international guidelines recognise.

The RAS study, the most recent addition to our care activity (only 150 care studies this year, although more than 300 are in the research and development phase), is based on the analysis of four amplicons covering the therapeutic target regions (NS3, NS5A and NS5B). Its processing is practically identical to that of the subtyping study, except that more amplicons are analysed per patient. To optimise this type of study and avoid biases due to the amplification process, we use subtype-specific primers, so the HCV subtype is determined before the RAS study. Most of the samples we process come from other centres in the rest of Catalonia or other Spanish regions, where genotyping has been performed by conventional techniques. This has allowed us to confirm our previous findings on the high error rate of these conventional techniques (>10%). Our RAS care data also confirm that most treatment failures are associated with variants in the NS5A region, detected in 72% of cases, followed by variants in the NS3 region, in 52% of cases. Furthermore, 36% of failures show RAS in several regions at once, since treatments usually combine several DAA.
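
Per-region percentages like these are computed per patient, so they can add up to more than 100% (72% + 52% already exceeds it) whenever patients carry RAS in several regions. A small sketch of the tally, using a hypothetical four-patient toy cohort rather than the data above.

```python
REGIONS = ("NS3", "NS5A", "NS5B")


def ras_region_summary(failures: list[set[str]]) -> dict[str, float]:
    """Fraction of failures with RAS in each region, plus the fraction with RAS in >1 region.

    Each element is the set of regions in which RAS were found for one patient.
    The per-region fractions can add up to more than 1 because regions overlap.
    """
    n = len(failures)
    summary = {region: sum(region in found for found in failures) / n for region in REGIONS}
    summary["multiple regions"] = sum(len(found) > 1 for found in failures) / n
    return summary


# Hypothetical toy cohort of four treatment failures (not the authors' data).
failures = [{"NS5A"}, {"NS5A", "NS3"}, {"NS3"}, {"NS5A", "NS5B"}]
print(ras_region_summary(failures))
# {'NS3': 0.5, 'NS5A': 0.75, 'NS5B': 0.25, 'multiple regions': 0.5}
```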

A further application of these NGS technologies may be the study of infection transmission between patients, as in nosocomial infections. Here, a simple alignment and phylogenetic study of the quasispecies obtained from the possible source patient and the recipient(s) makes it possible to infer a common origin of the infections with greater confidence, by observing shared haplotypes.
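
The core comparison can be sketched as a search for shared haplotypes and the closest pair of haplotypes between two patients. This is a simplification, not a phylogenetic analysis: it assumes aligned, equal-length haplotypes, uses hypothetical sequences, and a shared haplotype should be read as supporting evidence of a common origin, not proof.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing positions between two aligned, equal-length haplotypes."""
    return sum(x != y for x, y in zip(a, b))


def compare_quasispecies(source: list[str], recipient: list[str]) -> tuple[set[str], int]:
    """Return the haplotypes shared by both patients and the closest pairwise distance."""
    src, rec = set(source), set(recipient)
    closest = min(hamming(a, b) for a in src for b in rec)
    return src & rec, closest


# Hypothetical aligned haplotypes from a possible source and two possible recipients.
source = ["ACGTTA", "ACGTTG", "ACCTTA"]
linked = ["ACGTTG", "ACGTCG"]
unlinked = ["TTGACA", "TTGACC"]
print(compare_quasispecies(source, linked))    # ({'ACGTTG'}, 0)
print(compare_quasispecies(source, unlinked))  # (set(), 4)
```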

In conclusion, our experience in the clinical study of HCV infection allows us to assert the immense usefulness of NGS techniques in Clinical Microbiology, since all three applications in which we have incorporated them (subtyping, RAS detection and transmission studies) are perfectly applicable to other pathogens.

References available on request.