Apr 16
LocusTree - searching genomic loci
"Contigs should not know where they are." That's a phrase uttered by James Bonfield when presenting his work on gap5, the successor to gap4, a much-used assembly software suite. So you think: "Wait a second: you're talking about assembly, and the contigs should not store their position?"
This statement addresses a problem that we encounter often when working with genomic data: how to handle features. The approach often used is to give the feature a 'chromosome', 'start position' and 'stop position'. Seems reasonable, right? So if you want all features on chromosome 1 between positions 6,124,627 and 6,827,197 you just loop over all features and check if their range overlaps with this query range. Indeed: seems reasonable. Unless your collection of features goes into the millions.
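To make the loop-over-everything approach concrete, here is a minimal sketch in Python. The `Feature` class, the `overlaps` and `naive_query` functions and the example features are all made up for illustration, and the sketch assumes that start and stop positions are both inclusive. Every query touches every feature, so the cost grows linearly with the size of the collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    # Hypothetical feature record: each feature stores its own location.
    name: str
    chromosome: str
    start: int  # assumed inclusive
    stop: int   # assumed inclusive


def overlaps(feature, chromosome, start, stop):
    # Two closed ranges overlap when each one starts before the other ends.
    return (
        feature.chromosome == chromosome
        and feature.start <= stop
        and feature.stop >= start
    )


def naive_query(features, chromosome, start, stop):
    # Linear scan: checks every feature in the collection for every query.
    return [f for f in features if overlaps(f, chromosome, start, stop)]


features = [
    Feature("a", "1", 6_000_000, 6_200_000),  # overlaps the left edge of the query
    Feature("b", "1", 6_500_000, 6_600_000),  # lies entirely inside the query
    Feature("c", "1", 7_000_000, 7_100_000),  # lies to the right of the query
    Feature("d", "2", 6_500_000, 6_600_000),  # same positions, other chromosome
]

hits = naive_query(features, "1", 6_124_627, 6_827_197)
print([f.name for f in hits])  # ['a', 'b']
```

With four features this is instant. With millions of features and many queries, the same scan is repeated in full each time, which is exactly where this approach stops being reasonable.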