Next-generation DNA sequencers are still pricey, but they are dramatically reducing the cost and time it takes to sequence the human genome.
Thanks to a steady stream of technology advances, the cost of DNA sequencing has plummeted in recent years. When the human genome was first sequenced in 2001, the cost was estimated at $3 billion, and the work took more than a decade to complete with the traditional Sanger sequencing approach and automated capillary-based sequencing instruments [1]. While this was a phenomenal achievement with broad implications, it was clearly a long way from being a practical technique for disease diagnosis or drug development.
Today, many dream of being able to routinely sequence a person’s entire genome for just $1000 in a matter of weeks or even days. Researchers and healthcare professionals alike believe that achieving this goal would “democratize” DNA sequencing and bring with it the anticipated health benefits of knowing someone’s complete genetic makeup and the associated risks of disease and drug susceptibilities (see sidebar, “For $1000, you can get your genome scanned”).
The path to the $1000 genome and a possible new era of predictive and preventive medicine is being paved by the introduction of some dramatic new DNA sequencing technologies, notably massively parallel next-generation sequencing instruments. In fact, in 2007 the genome of Nobel Laureate James Watson (codiscoverer of the DNA double helix and father of the Human Genome Project) was sequenced in just two months at a cost of $2 million to $3 million using a massively parallel sequencing system [2].
Several equipment manufacturers have begun introducing various massively parallel approaches, all with the goal of supplanting traditional Sanger sequencing by reducing costs and increasing throughput. Some favor pyrosequencing, while others swear by two-base encoding. Whatever the approach, they all have the ability to perform extraordinarily large numbers of sequencing reactions at the same time, according to Elaine Mardis, codirector of the Genome Sequencing Center at Washington University (St. Louis, MO). “Having all these different instruments with their different capabilities is opening up all kinds of possibilities in terms of biological inquiry,” Mardis says.
Eric Green, scientific director of the National Human Genome Research Institute and director of the NIH Intramural Sequencing Center, agrees. “The sheer volume of data is quite impressive and raises possibilities of doing things that you couldn’t even think about doing at a practical level with Sanger sequencing,” he says.
The Genome Sequencer FLX System from 454 Life Sciences/Roche (Branford, CT) uses an approach called pyrosequencing. Each time a base is added to the growing DNA strand, a pyrophosphate molecule is released from the incoming nucleotide. This pyrophosphate is enzymatically converted to ATP, which the enzyme luciferase uses to produce light via a chemiluminescence reaction (see Fig. 1). This light is captured in real time by fiber optics and a CCD camera and associated with the position on a picotiter plate in which the reactions take place. Successive exposures to the four different bases allow the stepwise establishment of the DNA sequence. This sequencing approach is a form of sequencing-by-synthesis, meaning that the sequencing is accomplished by a DNA polymerase synthesizing a new strand, base by base, reading off of a template fragment.
FIGURE 1. A single well within a picotiter plate includes DNA copies bound to the bead as sequencing reagents are added. (Courtesy of Roche)
The Roche system uses native, unmodified DNA bases in its process. In the DNA preparation step, the DNA sample is sheared into small fragments that are then attached to 26 µm beads, one fragment per bead. Then, in a process called emulsion PCR, the DNA is amplified so that each bead carries 100,000 copies of the original DNA fragment. The DNA-coated beads are then loaded into the wells of a 1.6 million-well picotiter plate so that, on average, there is one bead per well. The wells of the picotiter plate are made of fiber-optic material so that they can transmit the light signals that indicate the DNA sequence to a coupled CCD imager. The sequencing reagents and one of the four DNA bases are then added to start the pyrosequencing process. The camera records every well in which that base was incorporated, and the intensity of the signal is used to infer how many copies of that base were added in a row to the growing strand. That base is then washed away and the second base is added, and so on, until the sequence of the fragment is established.
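The final steps of this process, in which one base is flowed at a time and the signal intensity is read as a count of incorporations, can be illustrated with a short Python sketch. It is only an illustration and not Roche's software: the function name, the flow order "TACG," and the idea that signals are normalized so that a value near 1.0 means one incorporated base are all assumptions made here for clarity.

```python
def call_bases_from_flowgram(signals, flow_order="TACG"):
    """Turn per-flow light intensities from one well into a base sequence.

    signals: normalized intensities, one per nucleotide flow. A value near
    1.0 is assumed to mean one incorporated base, near 2.0 two bases, and
    so on (an illustrative assumption, not a published calibration).
    flow_order: hypothetical order in which the four bases are flowed.
    """
    sequence = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        # Round to the nearest whole number of incorporations, never below 0.
        count = max(0, int(signal + 0.5))
        sequence.append(base * count)
    return "".join(sequence)


# Eight flows (two passes through T, A, C, G) recorded from a single well.
flow_signals = [1.02, 0.05, 1.97, 0.96, 0.03, 1.10, 0.08, 3.05]
print(call_bases_from_flowgram(flow_signals))  # TCCGAGGG
```

The result is the newly synthesized strand, which is complementary to the template fragment on the bead. The sketch also hints at why long runs of the same base are harder to call with this kind of chemistry: the signal for seven identical bases must be distinguished from the signal for eight.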
According to Roche, the GS FLX can achieve read lengths of 250 to 300 bases, a throughput of 100 megabases per 7.5-hour run at a cost of approximately $8000, and an estimated accuracy of 99.5%. At this time, it would cost $1 million to $2 million to sequence a complete human genome using the Roche instrument. The U.S. list price of the GS FLX is $500,000.
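The gap between a run that costs thousands of dollars and a genome that costs millions can be checked with back-of-the-envelope arithmetic. The Python sketch below treats the quoted $8000 as a per-run cost and assumes a human genome of roughly 3 billion bases. The coverage value (how many times, on average, each base is read) is a hypothetical parameter; the article does not say what coverage lies behind the vendor estimates.

```python
import math


def genome_run_estimate(throughput_bases_per_run, cost_per_run,
                        coverage, genome_size_bases=3_000_000_000):
    """Estimate how many runs, and how much money, one genome would take.

    coverage: average number of times each base is read (hypothetical).
    genome_size_bases: assumed size of the human genome, about 3 gigabases.
    """
    bases_needed = genome_size_bases * coverage
    runs = math.ceil(bases_needed / throughput_bases_per_run)
    return runs, runs * cost_per_run


# GS FLX figures quoted above: 100 megabases per run, about $8000 per run.
runs, cost = genome_run_estimate(100_000_000, 8000, coverage=6)
print(runs, cost)  # 180 1440000
```

With an assumed sixfold coverage, the estimate lands inside the $1 million to $2 million range quoted for the Roche instrument, but that agreement depends entirely on the coverage chosen here.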
The next-generation system from Applied Biosystems (Foster City, CA) is the SOLiD system (Sequencing by Oligonucleotide Ligation and Detection). Like the Roche instrument, the SOLiD system uses bead-based emulsion PCR for its amplification step. Its beads are smaller (1 µm vs. 26 µm) and they are packed on an open glass slide (the flow cell) rather than in wells. Applied Biosystems notes that an advantage of this system is the ability to accommodate increasing densities of beads per flow cell, which will result in a higher level of throughput from the same system.
The chemistry in this approach is very different from sequencing-by-synthesis. The SOLiD system uses sequencing by ligation, with two-base encoding. According to the company, this approach provides very high accuracy because each base in the sequence is actually interrogated twice. Each of the 16 possible two-base combinations is produced as part of an otherwise universal 8-mer probe that is tagged with one of four fluorescent dyes. In each cycle, the oligomer probes with the 16 possible two-base combinations are added, and the DNA ligase joins the single correctly matching probe to the template fragment. The color tag of that two-base combination is then read as a fluorescent signal given off in response to light stimulation from a xenon lamp, and the signal is recorded by a fluorescence microscope with filter cubes. The bound oligomer probe is then cleaved back to a 5-mer with a ligatable end, and the cycle is continued until the sequence is established.
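Two-base encoding is easier to follow with a small example. The article does not say which of the 16 two-base combinations share which dye, so the Python sketch below assumes the mapping commonly described for this "color space" scheme: each base gets a two-bit code, and the color of a pair is the XOR of the two codes. The color numbers 0 to 3 stand in for the four dyes, and the function names are hypothetical.

```python
# Two-bit codes for the bases (assumed); a pair's color is the XOR of its codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}


def encode_colors(sequence):
    """Return one color (0-3) for each overlapping pair of adjacent bases."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(sequence, sequence[1:])]


def decode_colors(first_base, colors):
    """Rebuild the bases from a known first base and the list of colors."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGCA")
print(colors)                      # [3, 1, 0, 3, 1]
print(decode_colors("A", colors))  # ATGGCA

# Changing one interior base (G to C at the fourth position) alters
# two adjacent colors, not one.
print(encode_colors("ATGCCA"))     # [3, 1, 3, 0, 1]
```

Because every interior base contributes to two neighboring colors, a real single-base difference shows up as two adjacent color changes. Under the assumptions of this sketch, an isolated single-color change therefore looks like a measurement error instead of a genuine variant, which is one way to read the claim that each base is interrogated twice.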
Applied Biosystems says the SOLiD system can produce average read lengths of 35 bases and generate more than 4 gigabases of sequence data per run with an estimated accuracy of 99.94% (with two-base encoding). It is estimated that the current SOLiD system could sequence an entire human genome in two months for approximately $300,000. The U.S. list price is $600,000.
The next-generation Genome Analyzer from Illumina (San Diego, CA) uses a process called cluster amplification. DNA fragments are attached to a flow cell, and the amplification is carried out on the slide with the reagents flowing over the bound fragments. The amplification of each DNA fragment occurs in a unique cluster so that ultimately each cluster contains thousands of copies of the starting fragment. This amplification is carried out in a device called a cluster station; the flow cell, with the amplified fragments, is then placed in the Genome Analyzer for sequencing.
The Illumina instrument performs sequencing-by-synthesis using a DNA polymerase to extend the complementary DNA strand off the template, base by base. Unlike Roche, Illumina uses proprietary reversible terminator nucleotides that carry removable fluorescent labels (see Fig. 2). The terminators are similar to those used in the current Sanger sequencing process but are reversible, allowing Illumina to read the terminating base, reverse the termination, and add the next base in the next round of synthesis, and so on, until the sequence of the fragment is determined. Reading of each added base is accomplished by high-sensitivity fluorescence detection using laser excitation and total-internal-reflection optics.
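Because a reversible terminator allows exactly one base to be added per round, reading a cluster reduces to deciding which label is brightest after each round. The Python sketch below illustrates that idea only. It assumes a separate dye, and therefore a separate detection channel, for each of the four bases, and it uses made-up intensity values; real base-calling software must also correct for effects such as overlap between dye signals, which are ignored here.

```python
CHANNELS = "ACGT"  # assumed: one dye, and so one channel, per base


def call_cluster(intensities_per_cycle):
    """Call one base per synthesis cycle for a single cluster.

    intensities_per_cycle: list of 4-tuples holding the fluorescence
    measured in the A, C, G and T channels after each cycle.
    """
    read = []
    for intensities in intensities_per_cycle:
        # The brightest channel identifies the base added in this cycle.
        brightest = max(range(4), key=lambda ch: intensities[ch])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Four cycles of hypothetical, normalized intensities for one cluster.
cycles = [
    (0.91, 0.05, 0.08, 0.03),
    (0.04, 0.07, 0.88, 0.06),
    (0.02, 0.93, 0.05, 0.04),
    (0.06, 0.03, 0.04, 0.90),
]
print(call_cluster(cycles))  # AGCT
```

In this picture the read length simply equals the number of cycles, and runs of identical bases need no intensity counting because each base is read in its own cycle.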
FIGURE 2. These individual clusters have undergone one round of DNA sequencing-by-synthesis using fluorescently labeled nucleotides and the Illumina Genome Analyzer. (Courtesy of Illumina)
The Illumina instrument achieves an average read length of 35 bases and a throughput of 1.4 gigabases per three-day run at a cost of approximately $3000, with 99.999% accuracy. It is estimated that it would cost approximately $200,000 and take two to three months to sequence an entire human genome using the Illumina system. The U.S. list price is $430,000, including the cluster station that automates sample preparation.
The HeliScope Single Molecule Sequencer from Helicos BioSciences (Cambridge, MA) directly sequences single molecules of DNA rather than large numbers of DNA molecule copies (see Fig. 3). This reduces sample preparation requirements but places significant demands on the sensitivity of optical detection. The DNA fragments to be sequenced are anchored to a proprietary surface on a glass slide (the flow cell) at a density of 1 strand/µm² (approximately 1.5 billion strands/slide). The special characteristics of this surface, including its low optical noise, are crucial to the ability to achieve true single-molecule sequencing, according to Parris Wellman, director of engineering at Helicos.
FIGURE 3. The HeliScope Single Molecule Sequencer directly sequences single molecules of DNA; insert at right shows a close-up of individual single molecules with base calls below. (Courtesy of Helicos)
The system uses a form of sequencing-by-synthesis, with a DNA polymerase and special fluorescently labeled DNA bases being flowed over bound fragments, one base per cycle. The bases are called “virtual terminator nucleotides” because they inhibit the DNA polymerase from adding more than one base. Only one fluorescent label is used, so four cycles are necessary to read all four bases. For sensitive detection of fluorescence from single molecules, Helicos employs laser stimulation of the fluorescent dye, with detection by a custom total-internal-reflection microscope and a CCD camera.
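The consequence of using a single label and adding at most one base per cycle can be seen in a small simulation. The Python sketch below is an illustration under stated assumptions and not the Helicos algorithm: the order in which the four bases are flowed ("CTAG") and the function names are hypothetical, and each cycle is reduced to a yes-or-no answer about whether the molecule lit up.

```python
def simulate_cycles(synthesized, flow_order="CTAG"):
    """Return one True/False per cycle: did the strand take up the base?

    synthesized: the strand the polymerase will build, base by base.
    flow_order: hypothetical order in which single bases are flowed.
    """
    if set(synthesized) - set(flow_order):
        raise ValueError("sequence contains a base that is never flowed")
    hits, position, cycle = [], 0, 0
    while position < len(synthesized):
        base = flow_order[cycle % len(flow_order)]
        incorporated = synthesized[position] == base
        hits.append(incorporated)
        if incorporated:
            position += 1  # the terminator allows only one base per cycle
        cycle += 1
    return hits


def call_single_color(hits, flow_order="CTAG"):
    """Rebuild the read: a lit molecule means the flowed base was added."""
    return "".join(flow_order[i % len(flow_order)]
                   for i, lit in enumerate(hits) if lit)


hits = simulate_cycles("GGA")
print(len(hits))                # 11
print(call_single_color(hits))  # GGA
```

Because only one base can be added per cycle, the two consecutive Gs in this example are read in separate passes through the four bases, so even a three-base read takes 11 cycles under the assumed flow order.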
Helicos specifies its throughput as 90 megabases per hour in single-pass mode and 25 megabases per hour in two-pass mode. Typical run times are seven days in single-pass mode and 14 days in two-pass mode, with accuracy better than 99% for a two-pass run, according to Chip Leveille, vice president of sales and marketing. It is estimated that it would cost $75,000 to $100,000 and take six to eight weeks to sequence an entire human genome using the HeliScope sequencer. The price of the Helicos Genetic Analysis System, including the HeliScope, is $1.35 million.
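Hourly throughput figures are easier to compare with the per-run numbers quoted for the other instruments once they are converted to total output per run. The short Python sketch below does that conversion. It assumes the instrument sustains its specified rate for the entire run, which the article does not state, and the function name is hypothetical.

```python
def run_yield_gigabases(megabases_per_hour, run_days):
    """Total output of one run, assuming a constant hourly rate throughout."""
    return megabases_per_hour * 24 * run_days / 1000


single_pass = run_yield_gigabases(90, 7)   # seven-day single-pass run
two_pass = run_yield_gigabases(25, 14)     # 14-day two-pass run
print(single_pass, two_pass)  # 15.12 8.4
```

Under that assumption, a single-pass run would produce roughly 15 gigabases and a two-pass run roughly 8 gigabases.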
Sequencing technology has clearly come a long way, but it still has a long way to go to truly achieve the $1000 genome. Still, many are confident this will happen soon. Among the optimists is Leroy Hood, president of the Institute for Systems Biology and a member of the Helicos scientific advisory board. He believes the $1000 genome will be achieved in less than ten years.
“The ability of these companies to scale up the capacity for DNA sequencing is going to be really transformational in terms of genomics, no question about it,” he says. “What this will lead to as the scale of sequencing increases over the next five to ten years and as the cost continues to come down is the sequencing of individual genomes. I see that as being one of the foundational data sources for predictive medicine.”
1. Nature 409, 860 (Feb. 15, 2001).
2. N. Wade, “Genome of DNA Pioneer is Deciphered,” New York Times, May 31, 2007.
MICHAEL O’NEILL is a science writer based in San Francisco, CA. He has covered biotechnology for many years and has written articles for Laser Focus World, The Scientist, Wired News, Genetic Engineering News, Medscape, the New York Academy of Sciences, and the American Society of Human Genetics.
For $1000, you can get your genome scanned
The race to the $1000 genome sequence has been preceded by the race to make the $1000 genome scan available to the public. These services use microarray technology to scan individual samples of DNA (from cheek swabs or saliva samples) for hundreds of thousands to more than 1 million DNA markers. The goal is to provide information on the risk of various common diseases and conditions with genetic components, such as asthma, diabetes, hypertension, heart attack, multiple sclerosis, certain cancers, and stroke.
Last November, three companies launched genome test services. Navigenics (Redwood Shores, CA) said it would begin taking orders for its new genome scan service in early 2008. This service, which will focus on identifying genetic disease risks for 20 diseases, will include consultation with a genetic counselor and will cost about $2500.
The new deCODEme service from deCODE Genetics (Reykjavík, Iceland) is focused not only on disease risk (17 diseases), but also on ancestry and the inheritance of certain nondisease traits such as hair and eye color. The service is priced at $985 and includes the opportunity for expert consultation.
23andMe (Mountain View, CA), a start-up company with funding from Genentech and Google, is marketing the Personal Genome Service, which focuses on disease risk (10 diseases), ancestry, and the inheritance of nondisease traits such as bitter-taste perception and athletic ability. The cost is $999.
Each company emphasizes that its service does not provide diagnostic tests but offers information on the risks one might face from various diseases. This information can be used to guide changes in behavior, such as diet, exercise, and medical surveillance, that might reduce an identified risk.