Dealing with sequencing data, Jim Kent
Interviewee: Jim Kent. With so much information being generated, both the public and private consortia had to store and process the data in new ways. Here, Jim Kent, who wrote the assembly program for the public sequence, talks about dealing with that amount of data.
To start out with you had about four hundred thousand pieces and the size of the whole thing is pretty overwhelming, it's about four billion bases and, well, it was about four billion bases before the assembly. There was a lot of overlap between the pieces, so that when we finally put it together it was only, I think it was about 2.7 billion bases. And so just dealing with data on that scale, I mean if you want to copy that data from one place to another, you know, it can take an hour or two just to make the copy. We had to get it to run on actually a whole farm of computers. We had a hundred computers to do this. And this was actually, it was kind of an interesting farm. It was one we had borrowed. They were machines that had arrived a little bit early for use in the instructional labs at UCSC [The University of California, Santa Cruz] so we absconded with them for about three months to work on the Human Genome Project instead, because they weren't needed till the next quarter.
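The drop from roughly four billion bases of input to about 2.7 billion bases of assembled sequence comes from counting overlapping stretches only once. The toy sketch below illustrates that arithmetic. It is not the public assembly program: it assumes each piece has already been placed at a known position, represented as a hypothetical half-open `(start, end)` tuple, which is the hard part a real assembler has to work out. Summing piece lengths counts overlaps repeatedly, while merging overlapping intervals counts each covered position once.

```python
def total_bases(pieces):
    """Sum of piece lengths; overlapping bases are counted more than once."""
    return sum(end - start for start, end in pieces)


def covered_bases(pieces):
    """Distinct positions covered after merging overlapping pieces.

    Assumes pieces are hypothetical half-open (start, end) intervals that
    have already been placed on a common coordinate axis.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(pieces):
        if cur_end is None or start > cur_end:
            # Gap before this piece: close out the previous merged run.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Piece overlaps (or abuts) the current run: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered


# Toy example: three overlapping pieces.
pieces = [(0, 100), (60, 180), (150, 250)]
print(total_bases(pieces))    # 320
print(covered_bases(pieces))  # 250
```

On real data the same idea applies at a vastly larger scale, which is one reason the work had to be spread across a farm of machines.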
Jim Kent talks about a farm of computers.
Robert Sinsheimer, then chancellor of the University of California, Santa Cruz, brought experts together in 1985 to discuss the possibility of a Human Genome Project. He talks about his idea.
Jim Kent, the author of the assembly program for the public sequence, talks about the difficulties of reassembling small pieces of the genome when there are so many repeat sequences.
Jim Kent, the author of the assembly program for the public sequence, talks about the challenge of reassembling the genome.
For the first draft of the genome sequence, both teams were working to identify the number of human genes. Here, Ewan Birney, a "numbers man" from the public genome project, explains how genes can be recognized and the data from the genome project used.
Jim Kent talks about the data structure of the human genome: poetry or prose?
Craig Venter, the leader of the private genome effort, talks about the "whole genome shotgun" technique that was used by Celera Genomics to sequence the human genome.
Robert Sinsheimer talks about the feasibility of sequencing the human genome.
Nobel Laureate John Sulston, a key figure in the UK sequencing effort, talks about breaking DNA apart so that the sequence can be reassembled.