Only 2% of the human genome gives rise to proteins. Stretches of DNA in the remaining 98%, the non-coding sequences, tell genes when and where to turn on or off. Len Pennacchio of the Lawrence Berkeley National Laboratory, California, went fishing for enhancers, a particularly elusive type of regulatory sequence.

Unlike promoters, which typically sit immediately before a gene and signal where transcription should begin, enhancers can lie before, after or even within the genes they regulate. In some cases, they function millions of base pairs away from the genes they help control. Scientists don't yet know enough about enhancers to use mathematical algorithms to distinguish them from other sequences. “We haven't identified enough distant enhancers to enable computational scientists to train on this data set,” explains Pennacchio. Instead, he and his colleagues came up with a brute-force approach for testing a large number of DNA fragments for enhancer activity. Their work is described on page 499.

First, the team needed to select DNA fragments to test. They reasoned that bits of the genome that do not encode proteins but are extremely well conserved among vertebrates must have a fundamental function and thus be good candidates. So they scoured the human genome for DNA segments that are either conserved with a distantly related species, such as the pufferfish, or ultra-conserved, meaning they are 100% identical with a more closely related species, such as the mouse, for at least 200 base pairs. Their initial list contained 167 such elements, which they tested for function.
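The ultra-conservation criterion described above (100% identity over at least 200 base pairs) can be illustrated with a short sketch. This is not the team's actual pipeline. It assumes the two genomes have already been aligned column by column, with gaps written as "-". The function name `ultraconserved_runs` and its parameters are hypothetical, chosen only for illustration.

```python
def ultraconserved_runs(human, mouse, min_len=200):
    """Find stretches where two pre-aligned sequences are identical.

    Returns half-open (start, end) intervals of alignment columns in which
    both sequences carry the same base (A, C, G or T) for at least
    min_len consecutive columns. Gaps and ambiguous bases break a run.
    """
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")

    runs = []
    start = None  # column where the current identical run began
    for i, (h, m) in enumerate(zip(human.upper(), mouse.upper())):
        match = h == m and h in "ACGT"
        if match and start is None:
            start = i
        elif not match and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None

    # Close a run that extends to the end of the alignment.
    if start is not None and len(human) - start >= min_len:
        runs.append((start, len(human)))
    return runs


if __name__ == "__main__":
    # Toy sequences, not real genomic data; min_len is lowered so the
    # example fits on one line.
    toy_human = "ACGTACGTAC" + "G" + "TTTT"
    toy_mouse = "ACGTACGTAC" + "T" + "TTTT"
    print(ultraconserved_runs(toy_human, toy_mouse, min_len=5))
```

In practice the 200-base-pair threshold would be applied to whole-genome alignments, and the looser conservation criterion used for distant species such as the pufferfish would need a percent-identity measure rather than exact matching.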

Pennacchio's team fused each putative enhancer fragment to a reporter gene engineered to turn cells blue when expressed, and then injected the fused constructs into fertilized mouse eggs. They implanted the eggs into female mice and let the embryos develop for 12 days. Then, they examined the resulting embryos for spots of blue in different organs and tissues. “Traditionally, people generate transgenic mice, wait for the mice to grow to maturity and then look for gene expression in the resulting transgenic offspring,” says Pennacchio. “We streamlined the process, making it faster and cheaper.”

Over about two years, the team identified 75 enhancer elements (45% of the tested fragments), whose sequences and expression patterns are available at http://enhancer.lbl.gov. A scientist wanting to engineer a gene so that it is expressed only in the heart, for example, can use the public resource to obtain a list of enhancer sequences that drive expression specifically in that tissue. The data set also allows scientists to search for patterns in the sequences of all heart enhancers and develop computational tools to predict where to find other heart enhancers in the genome. In addition, Pennacchio's team is testing potential enhancer elements that other groups have identified. “I receive numerous e-mails per day from people asking us for help,” he says.

The group will spend the next five years testing about 2,000 additional elements to add to the resource, with the goal of finding enhancers for all human genes. And they might try other methods of enhancer identification. “For now, comparative genomics alone has worked well for us, but there are surely other ways to further prioritize possible enhancers,” says Pennacchio.