January 22, 2008, Bethesda, MD--An international research consortium has announced the 1000 Genomes Project, an ambitious effort that will involve sequencing the genomes of at least 1000 people from around the world to create the most detailed and medically useful picture to date of human genetic variation. The project will receive major support from the Wellcome Trust Sanger Institute in Hinxton, England, the Beijing Genomics Institute, Shenzhen (BGI Shenzhen) in China and the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH).
Drawing on the expertise of multidisciplinary research teams, the 1000 Genomes Project will develop a new map of the human genome that will provide a view of biomedically relevant DNA variations at a resolution unmatched by current resources. As with other major human genome reference projects, data from the 1000 Genomes Project will be made swiftly available to the worldwide scientific community through freely accessible public databases.
"The 1000 Genomes Project will examine the human genome at a level of detail that no one has done before," said Richard Durbin, Ph.D., of the Wellcome Trust Sanger Institute, who is co-chair of the consortium. "Such a project would have been unthinkable only two years ago. Today, thanks to amazing strides in sequencing technology, bioinformatics and population genomics, it is now within our grasp. So we are moving forward to build a tool that will greatly expand and further accelerate efforts to find more of the genetic factors involved in human health and disease."
Recently developed catalogs of human genetic variation, such as the HapMap, have proved valuable in human genetic research. Using the HapMap and related resources, researchers already have discovered more than 100 regions of the genome containing genetic variants that are associated with risk of common human diseases such as diabetes, coronary artery disease, prostate and breast cancer, rheumatoid arthritis, inflammatory bowel disease and age-related macular degeneration.
However, because existing maps lack fine-grained detail, researchers often must follow those studies with costly and time-consuming DNA sequencing to pinpoint the precise causative variants. The new map would enable researchers to zero in more quickly on disease-related genetic variants, speeding efforts to use genetic information to develop new strategies for diagnosing, treating and preventing common diseases.
"This new project will increase the sensitivity of disease discovery efforts across the genome five-fold and within gene regions at least 10-fold," said NHGRI Director Francis S. Collins, M.D., Ph.D. "Our existing databases do a reasonably good job of cataloging variations found in at least 10% of a population. By harnessing the power of new sequencing technologies and novel computational methods, we hope to give biomedical researchers a genome-wide map of variation down to the 1% level. This will change the way we carry out studies of genetic disease."
Going a major step beyond the HapMap, the 1000 Genomes Project will map not only the single-letter differences in people's DNA, called single nucleotide polymorphisms (SNPs), but will also produce a high-resolution map of larger differences in genome structure called structural variants. Structural variants are rearrangements, deletions, or duplications of segments of the human genome. The importance of these variants has become increasingly clear with surveys completed in the past 18 months that show these differences in genome structure may play a role in susceptibility to certain conditions, such as mental retardation and autism.
The sequencing work will be carried out at the Sanger Institute, BGI Shenzhen and NHGRI's Large-Scale Sequencing Network, which includes the Broad Institute of MIT and Harvard; the Washington University Genome Sequencing Center at the Washington University School of Medicine in St. Louis; and the Human Genome Sequencing Center at the Baylor College of Medicine in Houston. The consortium may add other participants over time.
The project depends on large-scale implementation of several new sequencing platforms. Using standard DNA sequencing technologies, the effort would likely cost more than $500 million. With the new platforms, however, leaders of the 1000 Genomes Project expect the cost to be far lower, in the range of $30 million to $50 million.
During its two-year production phase, the 1000 Genomes Project will deliver sequence data at an average rate of about 8.2 billion bases per day, the equivalent of more than two human genomes every 24 hours. The volume of data, and the interpretation of those data, will pose a major challenge for leading experts in the fields of bioinformatics and statistical genetics.
"This project will examine the human genome in a detail that has never been attempted – the scale is immense. At 6 trillion DNA bases, the 1000 Genomes Project will generate 60-fold more sequence data over its three-year course than have been deposited into public DNA databases over the past 25 years," said Gil McVean, Ph.D., of the University of Oxford in England, one of the co-chairs of the consortium's analysis group. "In fact, when up and running at full speed, this project will generate more sequence in two days than was added to public databases for all of the past year."
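The throughput figures quoted above can be cross-checked with simple arithmetic. The short Python sketch below is illustrative only: the daily rate, the two-year production phase and the 6 trillion base total come from this release, while the 365-day year and the round figure of 3 billion bases per human genome are assumptions added for the calculation.

```python
# Figures quoted in the release.
BASES_PER_DAY = 8.2e9        # average production rate, bases per day
PRODUCTION_YEARS = 2         # length of the production phase
TOTAL_BASES_QUOTED = 6e12    # "6 trillion DNA bases"

# Assumptions made for this sketch (not stated in the release).
DAYS_PER_YEAR = 365          # ignores leap days
HUMAN_GENOME_BASES = 3e9     # approximate size of one human genome


def production_total(bases_per_day=BASES_PER_DAY,
                     years=PRODUCTION_YEARS,
                     days_per_year=DAYS_PER_YEAR):
    """Total bases sequenced during the production phase."""
    return bases_per_day * years * days_per_year


def genomes_per_day(bases_per_day=BASES_PER_DAY,
                    genome_bases=HUMAN_GENOME_BASES):
    """Daily output expressed in human-genome equivalents."""
    return bases_per_day / genome_bases


if __name__ == "__main__":
    total = production_total()
    print(f"Production-phase total: {total:.3e} bases")
    print(f"Ratio to quoted total:  {total / TOTAL_BASES_QUOTED:.3f}")
    print(f"Genomes per day:        {genomes_per_day():.2f}")
```

Under these assumptions, the production-phase total comes to just under 6 trillion bases, and the daily output works out to more than two genome equivalents, in line with the figures given in the release.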
The data generated by the 1000 Genomes Project will be held by and distributed from the European Bioinformatics Institute and the National Center for Biotechnology Information, which is part of NIH. There will also be a mirror site for data access at BGI Shenzhen.