By Mike May
Popping open a door on the left-hand side of an instrument, Scott Hunicke-Smith reveals the imaging end of a DNA-sequencing machine. As the director of genomic sequencing at the University of Texas at Austin, Hunicke-Smith oversees projects ranging from unraveling the genes in algae and coral to looking for changes in proteins that arise, more or less, from cutting and pasting sections of RNA in various ways (alternative splicing). To get the DNA code right, though, Hunicke-Smith and his colleagues must orchestrate a series of steps.
Simply put, DNA sequencing determines the order of the four kinds of nucleotides, or bases, that make up a stretch of a chromosome. The first automated DNA sequencers, now often called first-generation sequencers, could find differences in a gene from one person to the next. So-called second-generation sequencers, the type that Hunicke-Smith uses, can look for differences between whole genomes, such as a human genome and a chimpanzee's. Second-generation sequencers are not necessarily better than first-generation ones, just different. Still, their capacity is impressive. "We can sequence 5 or 10 bacterial genomes in just one run," Hunicke-Smith explains. Further generations of sequencers lie just ahead.
FIGURE 1. Scott Hunicke-Smith loads samples into a 454 Life Sciences sequencer. (Photo by Mike May)
Despite technological advances, DNA sequencing still requires some preparation. With second-generation sequencers, a researcher must first break a whole genome into thousands of pieces, probably tens of thousands for a human genome. A single fragment is too small to detect, so the fragments are amplified: the polymerase chain reaction (PCR) makes many copies of each one. The amplified pieces then go into a sequencer, which uses some kind of marker, such as four fluorescent markers, one for each kind of nucleotide, to label the bases that make up some length of DNA. Imaging technology collects the signals, and software converts them into lists of nucleotides.
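The first two preparation steps, fragmenting and amplifying, can be sketched in a few lines. This is a toy model under stated assumptions: the random-length chopping stands in for physical shearing, the length range is made up, and `pcr_copies` assumes ideal PCR in which every cycle multiplies the copy count by (1 + efficiency), so perfect efficiency means doubling each cycle. The function names are hypothetical.

```python
import random

def fragment(genome: str, mean_len: int, seed: int = 0) -> list[str]:
    """Chop a sequence into random-length pieces (a stand-in for shearing).

    Lengths are drawn uniformly from [mean_len // 2, 3 * mean_len // 2];
    this distribution is an illustrative assumption, not a lab protocol.
    """
    rng = random.Random(seed)
    pieces, start = [], 0
    while start < len(genome):
        length = rng.randint(mean_len // 2, 3 * mean_len // 2)
        pieces.append(genome[start:start + length])
        start += length
    return pieces

def pcr_copies(initial: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after PCR: each cycle multiplies the count by (1 + efficiency).

    efficiency=1.0 models the ideal case of perfect doubling every cycle.
    """
    return initial * (1.0 + efficiency) ** cycles

# A made-up 10,000-base "genome" of random bases.
genome = "".join(random.Random(1).choice("ACGT") for _ in range(10_000))
pieces = fragment(genome, mean_len=400)
assert "".join(pieces) == genome  # fragmentation loses no bases
print(len(pieces), "fragments")
print(f"{pcr_copies(1, 30):.3g} copies of each fragment after 30 ideal cycles")
```

The exponential form is why a modest number of cycles is enough to turn one faint molecule into a detectable population; real reactions fall short of perfect doubling, which the `efficiency` parameter lets you explore.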
As new sequencers, even modifications of ones considered part of the same generation, come online, they read nucleotides faster. "The speed and throughput of first-generation sequencers has progressed from approximately 25,000 bases per day in the mid-1980s to 2 million bases per day today," explains George Golda, principal engineer of SOLiD instrumentation at Life Technologies (Carlsbad, CA). "Second-generation sequencers are now approaching 100 billion bases per run, with runs taking approximately three to seven days." He adds, "Of course, optics were not the only reason for this improvement, but they play a very significant role."
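Golda's figures can be put on a common per-day footing with simple arithmetic. The sketch below uses only the numbers quoted above; it treats 100 billion bases as the per-run output (the quote says machines are "approaching" it) and assumes output accrues evenly over the run. The comparison is in raw base counts only.

```python
# Throughput figures quoted by George Golda in the article.
FIRST_GEN_1980S = 25_000        # bases per day, mid-1980s
FIRST_GEN_TODAY = 2_000_000     # bases per day, current first-generation
SECOND_GEN_PER_RUN = 100e9      # bases per run ("approaching" this figure)

def per_day(bases_per_run: float, run_days: float) -> float:
    """Average daily output, assuming a run's bases accrue evenly over its length."""
    return bases_per_run / run_days

print(f"First-generation improvement: {FIRST_GEN_TODAY / FIRST_GEN_1980S:.0f}x")
for days in (3, 7):  # the quoted run-length range
    daily = per_day(SECOND_GEN_PER_RUN, days)
    print(f"{days}-day run: {daily:.2g} bases/day, "
          f"about {daily / FIRST_GEN_TODAY:,.0f}x a current first-generation machine")
```

On these assumptions, first-generation throughput rose 80-fold, and a second-generation run works out to roughly 7,000 to 17,000 times the daily output of a current first-generation instrument.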
Enhancing the optics
In the history of DNA sequencing, enhancements in optics have played a role all along the way. In the first sequencers, a scientist simply looked at photographic film that had been exposed to radioactively labeled DNA. Analysis by eye was soon replaced with a scanner that quantified the DNA bands on the film.
The first automated DNA sequencers used a single line-scanning laser, a four-color filter wheel, and a photomultiplier tube detector. "It was not an imaging system per se," says Golda. "It simply detected the intensity of the four-color emitted fluorescent light. In this way, with appropriate multicomponent analysis software, the four bases could be discerned." Later first-generation sequencers used CCD cameras as the detector and ran samples in capillary tubes instead of slab gels, reducing the area and time required to scan.
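One common way to frame multicomponent analysis is linear spectral unmixing. Each dye leaks some light into every filter channel, so the four measured intensities are a mixture of the four dye signals. The sketch below assumes a made-up 4×4 crosstalk matrix (real instruments calibrate their own), solves the linear system for the dye amounts, and calls the base with the largest amount. It illustrates the idea and is not necessarily what the software on those instruments did; `CROSSTALK`, `solve`, and `call_base` are hypothetical names.

```python
BASES = "ACGT"

# Hypothetical crosstalk matrix: CROSSTALK[channel][dye] is the fraction of
# one dye's emission landing in each filter channel. Columns are the dyes
# labeling A, C, G, T; values are illustrative, and each column sums to 1.
CROSSTALK = [
    [0.80, 0.15, 0.05, 0.00],
    [0.15, 0.70, 0.15, 0.05],
    [0.05, 0.15, 0.70, 0.20],
    [0.00, 0.00, 0.10, 0.75],
]

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def call_base(channel_intensities):
    """Unmix four filter-channel readings and return (base, dye amounts)."""
    dye_amounts = solve(CROSSTALK, channel_intensities)
    best = max(range(4), key=lambda i: dye_amounts[i])
    return BASES[best], dye_amounts

# A pure "G" signal of strength 100 spreads across channels as column 2.
readings = [CROSSTALK[ch][2] * 100 for ch in range(4)]
print(call_base(readings))
```

Without unmixing, simply picking the brightest channel can misfire when dye spectra overlap strongly; solving for the dye amounts removes the known crosstalk before the decision.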
One technique employed by "next-gen," or second-generation, sequencers uses high-density arrays of 1-µm polystyrene beads. A chemical process binds a DNA-grabbing molecule to each bead, and the bead can also bond chemically to a glass surface. The beads are attached to the substrate at densities of hundreds of thousands of beads per image. Golda notes that this requires "the CCD cameras and associated optics to have high enough resolution to separate adjacent beads." He adds, "Imaging stages need to have high repeatability and fast settling times."
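What "high enough resolution to separate adjacent beads" implies can be estimated with two textbook rules of thumb: the Rayleigh limit, d = 0.61λ/NA, and sampling each bead pitch with at least two pixels. The sketch assumes touching 1-µm beads on a square grid (1-µm pitch), a 0.6-µm emission wavelength, and 300,000 beads per image; all three are illustrative assumptions, not SOLiD specifications. Because the Rayleigh rule is stated for point sources and beads are extended objects, treat the results as rough lower bounds.

```python
def min_numerical_aperture(wavelength_um: float, spacing_um: float) -> float:
    """Smallest NA whose Rayleigh limit, 0.61 * wavelength / NA, equals spacing_um."""
    return 0.61 * wavelength_um / spacing_um

def max_pixel_at_sample(spacing_um: float) -> float:
    """Nyquist-style rule of thumb: at least two pixels across each bead pitch."""
    return spacing_um / 2.0

def min_camera_pixels(n_beads: int, spacing_um: float) -> int:
    """Pixels needed if beads sit on a square grid at spacing_um pitch,
    each pitch sampled at max_pixel_at_sample() in both directions."""
    pixels_per_bead = (spacing_um / max_pixel_at_sample(spacing_um)) ** 2
    return int(n_beads * pixels_per_bead)

EMISSION_UM = 0.6    # assumed emission wavelength
BEAD_PITCH_UM = 1.0  # assumed: 1-um beads packed edge to edge
N_BEADS = 300_000    # assumed: "hundreds of thousands" per image

print(f"NA >= {min_numerical_aperture(EMISSION_UM, BEAD_PITCH_UM):.3f}")
print(f"pixel at sample <= {max_pixel_at_sample(BEAD_PITCH_UM)} um")
print(f"camera pixels >= {min_camera_pixels(N_BEADS, BEAD_PITCH_UM):,}")
```

The pixel-count estimate shows why bead density pushes camera format: at two pixels per pitch, every bead consumes at least four pixels before any margin for registration or stage error.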
FIGURE 2. Zero mode waveguides help isolate DNA for sequencing. (Image courtesy Pacific Biosciences)
Third-generation DNA sequencers and beyond will demand even more advanced systems. As Golda describes, "For the third generation and beyond, four-color, simultaneous detection to enable sequencing of large arrays of single molecules… may require a new generation of low-noise, high-resolution, high-speed, large-area CCD cameras in order to reach its full potential."
To isolate individual DNA molecules for sequencing, Pacific Biosciences (Menlo Park, CA) uses zero mode waveguides (ZMWs), nanoscale holes in a metal film that confine illumination and detection to a tiny volume. Past versions of this company's sequencers used 3000 ZMWs; its newest incorporates 80,000.
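Why a ZMW confines light can be estimated with textbook waveguide theory. Treating the hole as an ideal, perfectly conducting circular waveguide (a simplification, since real metal films are lossy), its lowest mode, TE11, propagates only when the wavelength inside the filling medium is shorter than πD/1.8412, about 1.706 times the diameter D. Below that cutoff the field decays exponentially into the hole, so only molecules near the bottom are illuminated. The 100-nm diameter, 532-nm wavelength, and water filling (n = 1.33) are illustrative assumptions, not Pacific Biosciences specifications.

```python
import math

P11 = 1.8412  # first zero of the derivative of Bessel J1 (sets TE11 cutoff)

def te11_cutoff_nm(diameter_nm: float) -> float:
    """TE11 cutoff wavelength inside the filling medium for an ideal
    circular waveguide: lambda_c = pi * D / p'11 (about 1.706 * D)."""
    return math.pi * diameter_nm / P11

def decay_length_nm(diameter_nm: float, vacuum_wavelength_nm: float,
                    n_medium: float) -> float:
    """1/e decay length of the TE11 field amplitude below cutoff.

    Returns math.inf when the mode propagates (no evanescent decay)."""
    k_c = 2.0 * P11 / diameter_nm                          # cutoff wavenumber, 1/nm
    k = 2.0 * math.pi * n_medium / vacuum_wavelength_nm    # wavenumber in medium, 1/nm
    if k >= k_c:
        return math.inf
    return 1.0 / math.sqrt(k_c ** 2 - k ** 2)

D_NM, WAVELENGTH_NM, N_WATER = 100.0, 532.0, 1.33  # assumed values
print(f"cutoff (vacuum) wavelength: {te11_cutoff_nm(D_NM) * N_WATER:.0f} nm")
print(f"field decay length: {decay_length_nm(D_NM, WAVELENGTH_NM, N_WATER):.0f} nm")
```

With these assumptions the cutoff falls far below visible wavelengths, so green excitation light penetrates only a few tens of nanometers, which is the confinement the name "zero mode" refers to.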
To modify sequencers to use ZMWs, explains Steve Turner, founder and CTO of Pacific Biosciences, "Every optical component received attention." For example, the company uses an objective lens that is completely custom. Although Turner can't divulge details of the objective, he says, "It's a major move forward, a significant advance in the state of optics."
FIGURE 3. SOLiD is a second-generation sequencer. (Image courtesy Life Technologies)
The next step in DNA sequencing from Pacific Biosciences will include millions of ZMWs. "With this," says Turner, "we'll be able to sequence an entire human genome in 15 minutes." He says that instrument is only a few years in the future. (For more on Pacific Biosciences' technology, see "Toward personalized medicine: 3G DNA sequencing," http://bit.ly/aDj4qm.)
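Turner's 15-minute target can be turned into a per-ZMW read rate with simple arithmetic. The sketch assumes a human genome of about 3.1 billion bases, every ZMW occupied and productive for the whole run, and a `coverage` factor for reading each base more than once; the ZMW counts and coverage values are illustrative assumptions, and the function name is hypothetical.

```python
HUMAN_GENOME_BASES = 3.1e9  # approximate human genome size (assumption)

def bases_per_second_per_zmw(n_zmws: float, minutes: float,
                             coverage: float = 1.0) -> float:
    """Read rate each ZMW must sustain to read `coverage` x the genome in `minutes`.

    Assumes every ZMW is occupied and productive for the entire run."""
    total_bases = HUMAN_GENOME_BASES * coverage
    return total_bases / (n_zmws * minutes * 60.0)

for n_zmws in (1e6, 3e6):      # "millions of ZMWs"; exact counts assumed
    for coverage in (1, 10):   # illustrative coverage levels
        rate = bases_per_second_per_zmw(n_zmws, minutes=15, coverage=coverage)
        print(f"{n_zmws:.0e} ZMWs, {coverage}x coverage: {rate:.1f} bases/s per ZMW")
```

The required rate scales linearly with coverage and inversely with the number of productive ZMWs, which is why the count of waveguides matters as much as the speed of each one.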
All sequencers read DNA in pieces. Whether Hunicke-Smith and his colleagues use a sequencer from 454 Life Sciences, a Roche company (Branford, CT), or a SOLiD system, they need a high-performance computer to work out how the sequenced segments fit together. Imagine a pearl necklace a few miles long that you chop into 10,000 pieces. Amplification creates many copies of each piece. After that, a sequencer dumps out strings of nucleotides for sections of the pieces, and those sections overlap. A computer tries various arrangements of the data, sliding sections of nucleotides back and forth relative to each other, and works out the order of nucleotides that make up that DNA. To do that, Hunicke-Smith uses a Dell R900 with 16 cores and 1.6 Tbytes of memory, or ships his data to the nearby Texas Advanced Computing Center, which maintains a collection of supercomputing resources.
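The sliding-and-matching step can be illustrated with a toy greedy assembler: repeatedly find the pair of reads with the longest suffix-prefix overlap and merge them. This is a textbook simplification chosen for clarity, not the method Hunicke-Smith's software uses; production assemblers rely on more sophisticated approaches (such as overlap graphs or de Bruijn graphs) and must cope with sequencing errors and repeats, which this sketch ignores. The reads and the minimum-overlap threshold are made up.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`
    (0 if no such overlap is at least min_len long)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads by repeatedly joining the pair with the largest overlap.

    Returns the resulting contigs (more than one if some reads never overlap)."""
    # Drop reads wholly contained in another read; they add no ordering information.
    contigs = [r for r in reads if not any(r != o and r in o for o in reads)]
    contigs = list(dict.fromkeys(contigs))  # remove exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        for k in sorted((best_i, best_j), reverse=True):  # pop higher index first
            contigs.pop(k)
        contigs.append(merged)

# Overlapping reads from the made-up sequence "ATGCGTACCTTAGGCA", shuffled,
# plus one read ("TGCGTA") contained entirely in another.
reads = ["CTTAGGCA", "ATGCGTAC", "GTACCTTA", "TGCGTA"]
print(greedy_assemble(reads))
```

Greedy merging is easy to follow but can be misled by repeated sequence, because the longest overlap is not always the true one; that is one reason real assembly of a genome demands far more computing than this toy suggests.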
Even with advanced tools, Hunicke-Smith says, "It's still cowboy science."
So far, sequencing technology depends on the researcher gathering up the little dogies of data and herding the nucleotides into a chromosome corral.