Dealing with sequencing data, Jim Kent
Interviewee: Jim Kent. With so much information being generated, both the public and private sequencing efforts had to store and process the data in new ways. Here, Jim Kent, who wrote the assembly program for the public sequence, talks about handling data on that scale.
To start out with, you had about four hundred thousand pieces, and the size of the whole thing is pretty overwhelming: it was about four billion bases before the assembly. There was a lot of overlap between the pieces, so when we finally put it together it was only, I think, about 2.7 billion bases. So just dealing with data on that scale is hard. If you want to copy that data from one place to another, it can take an hour or two just to make the copy. We actually had to get it to run on a whole farm of computers; we had a hundred computers to do this. It was kind of an interesting farm, too, because it was one we had borrowed. They were machines that had arrived a little bit early for use in the instructional labs at UCSC [the University of California, Santa Cruz], so we absconded with them for about three months to work on the Human Genome Project instead, because they weren't needed until the next quarter.
- ID: 15304
- Source: DNALC.DNAi