Sunday, November 4, 2012

Grab sequences using web query (DAS)

Instead of downloading reference sequences and then querying them with an API or other custom code, you can use the DAS (Distributed Annotation System) client-server system and fetch what you need with a plain web query.

Example:

http://www.ensembl.org/das/Homo_sapiens.GRCh37.transcript/features?segment=13:32889611,32973347
This requests all transcripts (exons, really) in the region [32889611,32973347] on human chromosome 13, which is where the gene BRCA2 is located in the GRCh37 assembly.
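The URL above follows a simple pattern: server, then /das/, then the data source, then the command, then the segment as chromosome:start,end. Here is a minimal sketch of a helper that assembles such a URL. The function name and its parameters are my own, for illustration only; they are not part of DAS.

```python
def build_das_url(server, source, command, chrom, start, end):
    """Assemble a DAS query URL from its parts (hypothetical helper)."""
    return "%s/das/%s/%s?segment=%s:%d,%d" % (
        server, source, command, chrom, start, end)

# Rebuild the transcript query from the example above
features_url = build_das_url(
    "http://www.ensembl.org",
    "Homo_sapiens.GRCh37.transcript",
    "features",
    "13", 32889611, 32973347)
```

Swapping the source for Homo_sapiens.GRCh37.reference and the command for dna gives a sequence query for the same region.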

IMPORTANT NOTE: The DAS URL query only returns the + strand. If you need the - strand, you have to reverse complement the sequence yourself (I would suggest using the reverse_complement function found in Biopython).
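If you would rather not install Biopython just for this, a reverse complement is easy to sketch in plain Python. The version below is a minimal sketch that assumes the sequence contains only A, C, G, T and N (upper or lower case); any other character is passed through unchanged.

```python
# Translation table for complementing bases; case is preserved
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

minus_strand = reverse_complement("ATGC")
```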

To use other species/assemblies, look up the available data sources (dsn) on each server:

I) Ensembl:

http://www.ensembl.org/das/dsn

II) UCSC:

http://genome.ucsc.edu/cgi-bin/das/dsn
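The dsn pages return XML, so the source names can be pulled out with a few lines of code. This is a rough sketch, assuming the response uses the DAS 1.x layout of DASDSN > DSN > SOURCE with an id attribute. The sample text is hand-written for illustration (using the two source names from this post) and is not real server output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed DSN layout; not real server output
SAMPLE_DSN = """<DASDSN>
  <DSN>
    <SOURCE id="Homo_sapiens.GRCh37.reference">Homo_sapiens.GRCh37.reference</SOURCE>
  </DSN>
  <DSN>
    <SOURCE id="Homo_sapiens.GRCh37.transcript">Homo_sapiens.GRCh37.transcript</SOURCE>
  </DSN>
</DASDSN>"""

def list_sources(dsn_xml):
    """Return the id of every SOURCE element in a DSN response."""
    root = ET.fromstring(dsn_xml)
    return [src.get("id") for src in root.iter("SOURCE")]

sources = list_sources(SAMPLE_DSN)
```

Each id is what goes between /das/ and the command in the query URL.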



Example Code:

#Function that takes a DAS URL, parses the XML output and returns the sequence.

import urllib.request  # Python 3; on Python 2 use urllib.urlopen instead

def py_url(url):
    seq = ""
    u = urllib.request.urlopen(url)
    for line in u:
        line = line.decode("utf-8").strip()
        # Skip blank lines and XML tags; everything else is sequence
        if not line or line.startswith("<"):
            continue
        seq = seq + line
    u.close()
    return seq

url = 'http://useast.ensembl.org/das/Homo_sapiens.GRCh37.reference/dna?segment=13:32889611,32973347'

seq = py_url(url)
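Skipping lines that start with "<" works as long as the sequence sits on its own lines. An alternative is to hand the response to an XML parser. The sketch below assumes the dna response has the DAS 1.x layout of DASDNA > SEQUENCE > DNA; the sample uses made-up placeholder bases, not the real chromosome 13 sequence, and the function name is my own.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed layout; the bases are placeholders
SAMPLE_DNA = """<DASDNA>
<SEQUENCE id="13" start="32889611" stop="32889620" version="1.0">
<DNA length="10">
acgtacgtac
</DNA>
</SEQUENCE>
</DASDNA>"""

def dna_from_das(xml_text):
    """Return the text of the first DNA element with all whitespace removed."""
    root = ET.fromstring(xml_text)
    node = root.find(".//DNA")
    if node is None or node.text is None:
        return ""
    return "".join(node.text.split())

parsed = dna_from_das(SAMPLE_DNA)
```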

