Dealing with sequencing data, Jim Kent

Interviewee: Jim Kent. With so much information being generated, both the public and private consortia had to store and process the data in new ways. Here, Jim Kent, who wrote the assembly program for the public sequence, talks about dealing with that amount of data.

To start out with you had about four hundred thousand pieces and the size of the whole thing is pretty overwhelming, it's about four billion bases and, well, it was about four billion bases before the assembly. There was a lot of overlap between the pieces, so that when we finally put it together it was only, I think it was about 2.7 billion bases. And so just dealing with data on that scale, I mean if you want to copy that data from one place to another, you know, it can take an hour or two just to make the copy. We had to get it to run on actually a whole farm of computers. We had a hundred computers to do this. And this was actually, it was kind of an interesting farm. It was one we had borrowed. They were machines that had arrived a little bit early for use in the instructional labs at UCSC [The University of California, Santa Cruz] so we absconded them for, for about three months to work on the Human Genome Project instead, because they weren't needed till the next quarter.

Related Content

16121. A farm of computers, Jim Kent

Jim Kent talks about a farm of computers.

  • ID: 16121
  • Source: DNAi

15334. A database of genomes, Robert Sinsheimer

Robert Sinsheimer, then chancellor of the University of California, Santa Cruz, brought experts together in 1985 to discuss the possibility of a Human Genome Project. He talks about his idea.

  • ID: 15334
  • Source: DNAi

16102. Genetic and physical mapping

  • ID: 16102
  • Source: DNAi

15314. Genome assembly, repeats, and reading the genome, Jim Kent

Jim Kent, the author of the assembly program for the public sequence, talks about the difficulties of reassembling small pieces of the genome when there are so many repeat sequences.

  • ID: 15314
  • Source: DNAi

15305. Compiling the data from the Human Genome Project, Jim Kent

Jim Kent, the author of the assembly program for the public sequence, talks about the challenge of reassembling the genome.

  • ID: 15305
  • Source: DNAi

15291. Finding genes in the human genome, Ewan Birney

For the first draft of the genome sequence, both teams were working to identify the number of human genes. Here, Ewan Birney, a "numbers man" from the public genome project, explains how genes can be recognized and the data from the genome project used.

  • ID: 15291
  • Source: DNAi

15307. The data structure of the human genome: poetry or prose?, Jim Kent

Jim Kent talks about the data structure of the human genome: poetry or prose?

  • ID: 15307
  • Source: DNAi

15365. Whole genome shotgun, Craig Venter

Craig Venter, the leader of the private genome effort, talks about the "whole genome shotgun" technique that was used by Celera Genomics to sequence the human genome.

  • ID: 15365
  • Source: DNAi

15335. The feasibility of sequencing the human genome, Robert Sinsheimer

Robert Sinsheimer talks about the feasibility of sequencing the human genome.

  • ID: 15335
  • Source: DNAi

15169. The public sequencing process, John Sulston

Nobel Laureate John Sulston, a key figure in the UK sequencing effort, talks about breaking DNA apart so that the sequence can be reassembled.

  • ID: 15169
  • Source: DNAi