Step-by-step description of the mBioSQL design

Step 1 - RDBMS install


  • To use mBioSQL you need an installed RDBMS (Relational Database Management System), such as PostgreSQL or MySQL;
  • You need RDBMS administrator privileges for the next step;
  • See docs#tuning for RDBMS and kernel tuning for increased performance.
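
Before moving on to Step 2 it can help to confirm that an RDBMS client is reachable at all. The following is a minimal sketch, not part of mBioSQL: it assumes the standard client program names psql (PostgreSQL) and mysql (MySQL), and the helper name find_rdbms_clients is made up for this illustration. Finding a client on PATH does not prove that a server is running or that you hold administrator privileges.

```python
import shutil

# Standard client programs for the two RDBMSs named above (assumption:
# they are installed under their default names).
CLIENTS = {"PostgreSQL": "psql", "MySQL": "mysql"}


def find_rdbms_clients(which=shutil.which):
    """Return {rdbms_name: client_path} for every client found on PATH."""
    found = {}
    for name, exe in CLIENTS.items():
        path = which(exe)
        if path is not None:
            found[name] = path
    return found


if __name__ == "__main__":
    clients = find_rdbms_clients()
    if clients:
        for name, path in clients.items():
            print(f"{name} client found at {path}")
    else:
        print("No PostgreSQL or MySQL client found on PATH")
```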

Step 2 - mBioSQL install


  • Creation of databases (no data are loaded at this stage);
  • Optional creation of the bioroot and/or biouser database users;
  • Scripts are created and copied to $MBIOSQL/bin and $MBIOSQL/lib;
  • Configuration file containing options for the scripts is created ($MBIOSQL/etc/modbiosql.conf);
  • See docs#install for details.
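
The locations listed above can be verified after installation with a short check. This is only a sketch: it assumes that $MBIOSQL is exported as an environment variable, and the function names mbiosql_layout and missing_paths are hypothetical helpers, not mBioSQL scripts.

```python
import os
from pathlib import Path


def mbiosql_layout(environ=os.environ):
    """Build the paths named in Step 2 from the MBIOSQL variable."""
    root = environ.get("MBIOSQL")
    if not root:
        raise RuntimeError("MBIOSQL environment variable is not set")
    root = Path(root)
    return {
        "bin": root / "bin",
        "lib": root / "lib",
        "conf": root / "etc" / "modbiosql.conf",
    }


def missing_paths(layout):
    """Return the names of layout entries that do not exist on disk."""
    return [name for name, path in layout.items() if not path.exists()]
```

Calling missing_paths(mbiosql_layout()) returns an empty list when the bin and lib directories and the configuration file are all present.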

Step 3 - Populating the database


  • To populate your databases, run mbs_init.py with the appropriate options (this creates the necessary tables);
  • To run mbs_init.py for the UniProt database, you first have to obtain the *.dat, reldate.txt, keywlist.txt and dbxref.txt files from EBI;
  • Use bl_load.py to load data into BioLocal;
  • Loading of data (results) into BioRes is discussed below (Step 5).
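
Since mbs_init.py needs several UniProt files, a pre-flight check of the download directory can catch a missing file early. The sketch below only tests for the file names listed above. The function name check_uniprot_inputs is hypothetical, and the assumption that all files sit in one directory is mine.

```python
from pathlib import Path

# File names taken from the list above; the *.dat file is matched by pattern.
REQUIRED_FILES = ("reldate.txt", "keywlist.txt", "dbxref.txt")


def check_uniprot_inputs(directory):
    """Return a list of problems; an empty list means all inputs were found."""
    directory = Path(directory)
    problems = [
        f"missing {name}"
        for name in REQUIRED_FILES
        if not (directory / name).is_file()
    ]
    if not any(directory.glob("*.dat")):
        problems.append("no *.dat file found")
    return problems
```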

Step 4 - Analysis I


  • Sequences for analysis can be retrieved from the RDBMS itself (arrows b) or from flat files (arrows c);
  • The connectorom script is used as a bridge between EMBOSS and the database system; see EMBOSS configuration in this section;
  • To query the RDBMS by ID or by SQL, use mbs_query.py; it returns FASTA-formatted sequence(s).
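
Because mbs_query.py returns FASTA-formatted text, its output is easy to process in your own scripts. The following is a minimal FASTA reader given as an illustration. It is not taken from mBioSQL, and it assumes the text has already been captured (for example from a file or from the standard output of the script).

```python
def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence line of the current record
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    sample = ">seq1 example entry\nMKTAYIAK\nQRQISFVK\n>seq2\nGGSS\n"
    for name, seq in parse_fasta(sample):
        print(name, len(seq))
```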

Step 5 - Loading the results back into the RDBMS


  • To demonstrate the power of SQL for the analysis of results, loading was implemented for two types of analysis: (1) pattern search by fuzzpro (EMBOSS); (2) restriction mapping analysis by remap & restrict (EMBOSS);
  • Use br_load.py for the processes marked as arrows d and e;
  • See examples directory for examples.

Step 6 - Analysis II from BioRes


  • Use br_anal2.py or your own script for analysis;
  • If you do not want to merge your results with sequence data, store them in BioRes (arrow e) and perform the analysis II step from there (arrow g);
  • Call mbs_info.py -i -T=res symbolic_db to get a list of result tables in your database;
  • The br_anal2.py script determines the type of analysis from the result table name (using the metadata stored in the db_info table for that table name), but you still have to provide the appropriate options for the given analysis.
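
The lookup described in the last point can be pictured as a simple metadata query. The sketch below is an assumption-laden illustration, not the br_anal2.py implementation: only the table name db_info comes from the text above. The column names (table_name, analysis_type), the result table names, and the use of an in-memory SQLite database in place of PostgreSQL or MySQL are all hypothetical.

```python
import sqlite3


def analysis_type_for(conn, result_table):
    """Look up the analysis type recorded for a result table, or None."""
    row = conn.execute(
        "SELECT analysis_type FROM db_info WHERE table_name = ?",
        (result_table,),
    ).fetchone()
    return row[0] if row else None


# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE db_info (table_name TEXT PRIMARY KEY, analysis_type TEXT)"
)
conn.executemany(
    "INSERT INTO db_info VALUES (?, ?)",
    [("res_fuzzpro_1", "fuzzpro"), ("res_remap_1", "remap")],
)

if __name__ == "__main__":
    print(analysis_type_for(conn, "res_fuzzpro_1"))
```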

Step 7 - Analysis II from a sequence database


  • See step 6;
  • Storing result sets in the sequence database itself allows easy merging of sequence data and results by SQL joins;
  • At present, this type of analysis is implemented for fuzzpro results (run br_anal2.py on a fuzzpro result table loaded into UniProt).
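
The benefit of keeping results next to the sequence data can be shown with a single join. The example below is a sketch under stated assumptions: the table names (seq_entry, fuzzpro_hit), their columns, and the rows are invented for this illustration and do not reflect the actual mBioSQL schema. An in-memory SQLite database stands in for PostgreSQL or MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: one for sequence entries, one for pattern hits.
conn.executescript("""
CREATE TABLE seq_entry (accession TEXT PRIMARY KEY, description TEXT);
CREATE TABLE fuzzpro_hit (
    accession TEXT, pattern TEXT, start_pos INTEGER, end_pos INTEGER
);
INSERT INTO seq_entry VALUES
    ('ACC001', 'hypothetical protein A'),
    ('ACC002', 'hypothetical protein B');
INSERT INTO fuzzpro_hit VALUES
    ('ACC001', 'C-x(2)-C', 10, 13),
    ('ACC001', 'C-x(2)-C', 40, 43),
    ('ACC002', 'C-x(2)-C', 5, 8);
""")


def hits_per_entry(conn):
    """Join sequence entries with their hits and count hits per entry."""
    return conn.execute("""
        SELECT s.accession, s.description, COUNT(*) AS n_hits
        FROM seq_entry AS s
        JOIN fuzzpro_hit AS h ON h.accession = s.accession
        GROUP BY s.accession, s.description
        ORDER BY s.accession
    """).fetchall()


if __name__ == "__main__":
    for row in hits_per_entry(conn):
        print(row)
```

With results stored in a separate database such as BioRes, the same question would require exporting one side or querying across databases, which is why the join-friendly layout is highlighted here.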