Instead of downloading reference sequences and then interrogating them with an API or other custom code, you can use the DAS (Distributed Annotation System) client-server system.
Example Code:
# Fetch a DAS URL and extract the sequence from the XML output (Python 3).
import urllib.request

def py_url(url):
    seq = ""
    with urllib.request.urlopen(url) as u:
        for raw in u:
            line = raw.decode("utf-8")
            # Lines starting with "<" are XML markup, not sequence.
            if line.startswith("<"):
                continue
            # Drop the line break so the result is one continuous string.
            seq += line.strip()
    return seq

url = "http://useast.ensembl.org/das/Homo_sapiens.GRCh37.reference/dna?segment=13:32889611,32973347"
seq = py_url(url)
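If skipping lines that start with "<" feels fragile, a rough alternative is to parse the XML properly. The sketch below assumes the response wraps the sequence in a <DNA> element nested inside <DASDNA>/<SEQUENCE>. The function name extract_dna and the short sample document are made up for illustration and are not real server output.

```python
import xml.etree.ElementTree as ET

def extract_dna(xml_text):
    """Return the text of the first <DNA> element with all whitespace removed."""
    root = ET.fromstring(xml_text)
    dna = root.find(".//DNA")
    if dna is None or dna.text is None:
        return ""
    # The sequence is usually wrapped over many lines; join them together.
    return "".join(dna.text.split())

# Made-up sample in the assumed layout (not real server output).
sample = """<DASDNA>
<SEQUENCE id="13" start="1" stop="12" version="1.0">
<DNA length="12">
gggctt
gtggcg
</DNA>
</SEQUENCE>
</DASDNA>"""

print(extract_dna(sample))
```

With a live server you would pass the decoded response body to extract_dna instead of the sample string.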
Example:
http://www.ensembl.org/das/Homo_sapiens.GRCh37.transcript/features?segment=13:32889611,32973347
This requests all transcripts (really, their exons) in the region 32889611-32973347 on human chromosome 13, which is where the BRCA2 gene is located in the GRCh37 assembly.
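The query URLs all follow the same pattern (base, data source, command, then segment=chromosome:start,end), so a small helper can build them. This is only a sketch: the function name das_url and its parameters are my own and not part of any DAS library.

```python
def das_url(source, command, chrom, start, end,
            base="http://www.ensembl.org/das"):
    """Build a DAS query URL for a region (hypothetical helper)."""
    if start > end:
        raise ValueError("start must not be greater than end")
    return "{}/{}/{}?segment={}:{},{}".format(
        base, source, command, chrom, start, end)

# Rebuild the features query for the BRCA2 region.
print(das_url("Homo_sapiens.GRCh37.transcript", "features",
              13, 32889611, 32973347))
```

Swapping the source for Homo_sapiens.GRCh37.reference and the command for dna gives the sequence query instead.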
IMPORTANT NOTE: The DAS URL query only returns the + strand. If you need the - strand, you have to reverse complement the sequence (I would suggest using the function found in Biopython).
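If you do not want to install Biopython just for this, a minimal pure-Python version is sketched below. It only handles A, C, G, T and N in either case and leaves any other character unchanged, so treat it as an illustration. With Biopython, Seq(seq).reverse_complement() does the same job and also covers ambiguity codes.

```python
# Translation table for the plain DNA alphabet, preserving case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

print(reverse_complement("ATGCCn"))  # nGGCAT
```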
To use other species/assemblies, look up the available data sources in:
I) Ensembl:
http://www.ensembl.org/das/dsn
II) UCSC:
http://genome.ucsc.edu/cgi-bin/das/dsn
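The dsn pages return XML, so you can pull out the source names programmatically. The sketch below assumes the DAS 1 layout of <DSN> entries, each with a <SOURCE id="..."> and a <DESCRIPTION>. The function name list_sources and the sample entries are invented for illustration and are not real server output.

```python
import xml.etree.ElementTree as ET

def list_sources(dsn_xml):
    """Return (source id, description) pairs from a DAS dsn document."""
    root = ET.fromstring(dsn_xml)
    sources = []
    for dsn in root.iter("DSN"):
        source = dsn.find("SOURCE")
        if source is None:
            continue
        description = dsn.findtext("DESCRIPTION", default="") or ""
        sources.append((source.get("id"), description.strip()))
    return sources

# Invented sample in the assumed layout (not real server output).
sample = """<DASDSN>
  <DSN>
    <SOURCE id="example_source_1">example_source_1</SOURCE>
    <MAPMASTER>http://example.org/das/example_source_1</MAPMASTER>
    <DESCRIPTION>First made-up data source</DESCRIPTION>
  </DSN>
  <DSN>
    <SOURCE id="example_source_2">example_source_2</SOURCE>
    <MAPMASTER>http://example.org/das/example_source_2</MAPMASTER>
    <DESCRIPTION>Second made-up data source</DESCRIPTION>
  </DSN>
</DASDSN>"""

for source_id, description in list_sources(sample):
    print(source_id, "-", description)
```

The source id is the part that goes into the query URL between /das/ and the command.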