“I am quite concerned about the API [NCBI’s EUtils] that you’re planning to use! I think we should give it a try before we start working on the database and the interface! Give me the URL! ‘PyPI’s EUtils’?” Those were my supervisor’s words at ITI, as you might have guessed!
We installed EUtils with its Python setup script and tried the example in the README file. A very sticky error stopped us: ImportError: No module named DTDs.pubmed_020114, raised from mod = __import__(name) in _load_module (line 25 of /usr/local/lib/python2.6/dist-packages/EUtils/parse.py).
```python
>>> from EUtils import DBIdsClient
>>> import EUtils
>>> from EUtils import DBIdsClient  # repeated line
>>> import EUtils  # repeated line
>>> dbids = EUtils.DBIds("pubmed", ["9390282"])
>>> pom = DBIdsClient.from_dbids(dbids).fetch()  # efetch() is the correction
>>> print pom  # print pom.read() is the correction
```
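My supervisor refused to give up and gave it another shot: he used efetch() and printed pom, and instead of data we got the object itself: <addinfourl at 142981388 whose fp = <socket._fileobject object at 0x8827f2c>>! That is the file-like response handle returned by urllib; its contents only appear after calling pom.read().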
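The handle-versus-contents pitfall is easy to reproduce without touching the network: any file-like object prints as its own repr, and the data only comes out through read(). In this minimal sketch, io.StringIO stands in for the urllib handle (an assumption, not what EUtils actually returns), and the payload text is a hypothetical placeholder.

```python
import io

# Assumption: io.StringIO stands in for the network handle returned by
# fetch()/efetch(); the payload text is a hypothetical placeholder.
handle = io.StringIO("hypothetical record text\n")

# Printing the handle shows the object's repr, not the data it carries.
print(handle)

# Calling read() drains the stream and returns the actual contents.
contents = handle.read()
print(contents)

# The stream is now consumed, so a second read() returns an empty string.
leftover = handle.read()
```

The last line is worth remembering: a network handle can usually be read only once, so keep the string you got from read() instead of reading the handle again.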
After that, I tried to summarize the “central dogma” for him and a colleague: “Imagine a class rhodopsin: you can say new rhodopsin() in the eye, or just not create an instance in the heart!” Then we decided to install Biopython, the happiest moment of my day! Biopython is not just an API for NCBI databases; it is much more than that, with modules for sequences, file formats such as FASTA and GenBank, BLAST, protein structures and more.
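The rhodopsin analogy can be made literal in a few lines of Python. This is only a toy sketch: the Rhodopsin class, the express() helper and the tissue-to-gene map are hypothetical illustrations, and real gene regulation is far richer than a lookup table.

```python
class Rhodopsin:
    """Hypothetical class standing in for the rhodopsin gene product."""

    def __init__(self, tissue):
        self.tissue = tissue


# Every cell carries the same "source code" (the genome, Rhodopsin included),
# but only some tissues instantiate it. Hypothetical, simplified expression map.
EXPRESSED_IN = {"eye": [Rhodopsin], "heart": []}


def express(tissue):
    """Create one instance of every class 'expressed' in the given tissue."""
    return [gene(tissue) for gene in EXPRESSED_IN.get(tissue, [])]


eye_proteins = express("eye")      # one Rhodopsin instance
heart_proteins = express("heart")  # no instances: the class exists but is never called
```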
We downloaded the Biopython 1.57 source code and ran our first test before even reading the documentation! It’s amazing! You can query NCBI databases at any time, with any query, fetching records by ID or by keyword and inserting them into your own database. You may also want to install BioSQL (a schema for storing biological data in several DBMSs), and you will need to install NumPy.
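Under the hood, fetching “by keyword” and “by ID” map to two different E-utilities: ESearch turns a query term into a list of IDs, and EFetch turns IDs into full records. The sketch below only builds the request URLs (no network call), roughly like the requests Bio.Entrez sends; the helper names, default values and the placeholder email are assumptions for illustration.

```python
from urllib.parse import urlencode

# Public base URL of NCBI's E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, email, retmax=20):
    """Hypothetical helper: keyword query -> URL that returns matching IDs."""
    params = {"db": db, "term": term, "retmax": retmax, "email": email}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(db, ids, email, rettype="abstract", retmode="text"):
    """Hypothetical helper: known IDs -> URL that returns the full records."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype,
              "retmode": retmode, "email": email}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder email; NCBI asks clients to identify themselves.
    print(esearch_url("pubmed", "biopython", "you@example.org"))
    print(efetch_url("pubmed", ["19304878", "9390282"], "you@example.org"))
```

In Biopython the same two steps are Entrez.esearch(db=..., term=...) followed by Entrez.efetch(db=..., id=...), with Entrez.email filling in the email parameter.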
My script (compiled from the tutorial):
```python
#!/usr/bin/python
# magic line suggested by my supervisor (it must stay alone on the first line)
from Bio import Entrez

# One reason to write a script was to avoid typing my email each time I run something
Entrez.email = 'mariam.rizkallah@gmail.com'

# handle = Entrez.esearch(db="pubmed", term="biopython")
# record = Entrez.read(handle)
# print(record["IdList"])

handle = Entrez.einfo(db="pubmed")
record = Entrez.read(handle)
handle.close()
# print(record["DbInfo"].keys())
# Returns: [u'Count', u'LastUpdate', u'MenuName', u'Description', u'LinkList', u'FieldList', u'DbName']

# Print the name, full name and description of every searchable PubMed field
for field in record["DbInfo"]["FieldList"]:
    print("%(Name)s, %(FullName)s, %(Description)s" % field)
```
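Entrez.read turns NCBI’s XML reply into nested dictionaries, which is why the loop can use %(Name)s-style keys on each field. To see that formatting step offline, this sketch swaps in the standard library’s xml.etree.ElementTree (not Biopython) and parses a hypothetical, heavily trimmed document shaped like an EInfo reply; the field names and descriptions in it are placeholders, not real PubMed fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed document shaped like an EInfo reply; a real
# PubMed reply lists many more fields, each with extra child elements.
SAMPLE_EINFO = """<eInfoResult>
  <DbInfo>
    <DbName>pubmed</DbName>
    <FieldList>
      <Field>
        <Name>FIELD1</Name>
        <FullName>Example Field One</FullName>
        <Description>Hypothetical description one</Description>
      </Field>
      <Field>
        <Name>FIELD2</Name>
        <FullName>Example Field Two</FullName>
        <Description>Hypothetical description two</Description>
      </Field>
    </FieldList>
  </DbInfo>
</eInfoResult>"""


def field_list(xml_text):
    """Return one dict per <Field>, mapping child tag -> text."""
    root = ET.fromstring(xml_text)
    return [{child.tag: (child.text or "") for child in field}
            for field in root.iter("Field")]


# Same formatting as the script: %(key)s pulls values out of each dict.
lines = ["%(Name)s, %(FullName)s, %(Description)s" % field
         for field in field_list(SAMPLE_EINFO)]
for line in lines:
    print(line)
```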
Further readings:
- Bio.Entrez module documentation: http://www.biopython.org/DIST/docs/api/Bio.Entrez-module.html
- Biopython: freely available Python tools for computational molecular biology and bioinformatics [PMID: 19304878]: http://bioinformatics.oxfordjournals.org/content/25/11/1422.full