TFBS

Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden
Boris.Lenhard@cgb.ki.se
TFBS Perl OO modules implement classes for the representation of objects encountered in the analysis of protein-binding sites in DNA sequences. The objects defined by TFBS classes include:
-
pattern definition objects, currently position specific score matrices (raw frequency, information content and position weight matrices) with methods for interconversion between matrix types, sequence searching with a matrix profile, sequence 'logo' drawing and matrix manipulation;
-
a composite object representing a set of position specific score matrices, with methods for the identification of motifs within DNA sequences with the set of profiles from its member matrices;
-
methods for searching pairwise alignments for patterns conserved in both sequences (phylogenetic footprinting) defined for both matrix profile and composite (matrix set) objects;
-
an object representing DNA binding site sequence, and an object representing sets of DNA binding sequences, with methods and helper classes to facilitate scanning, filtering and statistical analyses;
-
an object representing a pair of DNA binding site sequences, and an object representing a set of such pairs, for storage, manipulation and analysis of phylogenetic footprinting searches;
-
database interfaces to relational, flat file and WWW databases of position-specific score matrices, with methods for searching existing databases, as well as creating new ones containing user-defined matrices;
-
interfaces to matrix pattern generating programs.
The modules within the TFBS set are fully integrated and compatible with Bioperl.
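To make the matrix types above concrete, here is a minimal, self-contained sketch of converting a raw frequency matrix to a position weight matrix and scanning a sequence with it. It is plain Perl and does not use the TFBS API: the function names (pfm_to_pwm, scan_sequence and so on), the example matrix and the example sequence are all made up for illustration. The sqrt(N) pseudocount and the relative threshold (a fraction of the range between the lowest and highest possible score) are assumptions chosen for the sketch, not a statement of what TFBS does internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical raw frequency matrix (consensus TACG); not taken from any database.
my %pfm = (
    A => [ 0, 8, 0, 0 ],
    C => [ 0, 0, 8, 0 ],
    G => [ 0, 0, 0, 8 ],
    T => [ 8, 0, 0, 0 ],
);
my %bg = ( A => 0.25, C => 0.25, G => 0.25, T => 0.25 );   # background frequencies

sub log2 { return log( $_[0] ) / log(2) }

# Convert counts to log-odds scores. ASSUMPTION: pseudocount of sqrt(N)
# per column, distributed according to the background frequencies.
sub pfm_to_pwm {
    my ( $pfm, $bg ) = @_;
    my $width = scalar @{ $pfm->{A} };
    my %pwm;
    for my $i ( 0 .. $width - 1 ) {
        my $n = 0;
        $n += $pfm->{$_}[$i] for qw(A C G T);
        my $pseudo = sqrt($n);
        for my $nt (qw(A C G T)) {
            my $p = ( $pfm->{$nt}[$i] + $pseudo * $bg->{$nt} ) / ( $n + $pseudo );
            $pwm{$nt}[$i] = log2( $p / $bg->{$nt} );
        }
    }
    return \%pwm;
}

sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Sum of per-position scores; undef if the word contains a non-ACGT character.
sub score_window {
    my ( $pwm, $word ) = @_;
    my @nts   = split //, uc $word;
    my $score = 0;
    for my $i ( 0 .. $#nts ) {
        return unless exists $pwm->{ $nts[$i] };
        $score += $pwm->{ $nts[$i] }[$i];
    }
    return $score;
}

# Report every window, on either strand, whose score reaches the cutoff.
# ASSUMPTION: $rel_threshold is a fraction (0..1) of the min..max score range.
sub scan_sequence {
    my ( $pwm, $seq, $rel_threshold ) = @_;
    my $width = scalar @{ $pwm->{A} };
    my ( $min, $max ) = ( 0, 0 );
    for my $i ( 0 .. $width - 1 ) {
        my @col = sort { $a <=> $b } map { $pwm->{$_}[$i] } qw(A C G T);
        $min += $col[0];
        $max += $col[-1];
    }
    my $cutoff = $min + $rel_threshold * ( $max - $min );
    my @hits;
    for my $start ( 0 .. length($seq) - $width ) {
        my $word = substr( $seq, $start, $width );
        for my $strand ( '+', '-' ) {
            my $s = score_window( $pwm, $strand eq '+' ? $word : revcomp($word) );
            next unless defined $s && $s >= $cutoff;
            push @hits, { start => $start + 1, end => $start + $width,
                          strand => $strand, score => $s };
        }
    }
    return @hits;
}

my $pwm = pfm_to_pwm( \%pfm, \%bg );
for my $hit ( scan_sequence( $pwm, 'GGTACGTTCGTAGG', 0.8 ) ) {
    printf "%d\t%d\t%s\t%.3f\n", @{$hit}{qw(start end strand score)};
}
```

Positions are reported 1-based and inclusive. With this toy matrix and an 80% threshold only perfect matches to TACG (or its reverse complement) pass, so the example sequence yields one hit on each strand.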
Download
The current release of TFBS is 0.4.1 (July 10, 2003). It has been tested on Linux 2.4 (i686 and alpha) with perl 5.6.1 and 5.8.0, and on Sun Solaris with perl 5.8. The tarballs are here:
NEW: CVS repository
To check out the latest development snapshot of TFBS, do
cvs -d :pserver:anonymous@eriador.cgb.ki.se:/opt/cvs/tfbs login
enter the password "tfbs" and check out the code with
cvs -d :pserver:anonymous@eriador.cgb.ki.se:/opt/cvs/tfbs checkout TFBS
NEW: Browse TFBS CVS repository
You can browse the TFBS CVS tree at http://eriador.cgb.ki.se/tfbs_cvs/read/cvswebread.cgi/TFBS/.
Recent changes
Changes in 0.4.1:
-
TFBS::DB::LocalTRANSFAC : Added support for the most recent format of TRANSFAC's matrix.dat file (contributed by Leonardo Marino-Ramirez).
-
Fixed the regression tests for TFBS::PatternGen::Gibbs so they do not fail miserably if the Gibbs binary cannot be found.
-
New TFBS::PatternGen::AnnSpec module.
Changes in 0.4.0:
-
Support for arbitrary nucleotide backgrounds and small sample correction for conversion of PFMs to PWMs
-
Enhanced TFBS::PatternGen::Gibbs wrapper - stores many more output results than the previous version
-
New functionality for logo drawing (error bars)
Changes in 0.3.3:
-
Fixed TFBS/DB/TRANSFAC.pm - it got broken because of the change of page format in TESS.
-
Fixed GD font errors in draw_logo method (the problem was specific to perl 5.6.1 and newer)
-
This version will not work with older versions of bioperl (0.7 or older). Please upgrade to bioperl 1.0 or newer.
Changes in 0.3.2:
-
Bugfixes and enhancements to database modules.
-
This version will not work with older versions of bioperl (0.7 or older). Please upgrade to bioperl 1.0 or newer.
Changes in 0.3.1:
-
Available as two distributions
-
0.3.1s - for use with previous stable (0.7.*) release of bioperl
-
0.3.1d - for use with the latest 1.0 release of bioperl; produced using patches kindly provided by Jason Stajich
-
Added POD documentation for
-
Iterator method in TFBS::MatrixSet, TFBS::SiteSet and TFBS::SitePairSet
-
search_seq and search_aln methods in TFBS::MatrixSet
-
TFBS::Matrix::PWM : fixed a bug in handling -seqstring parameter passed to the search_seq method
-
TFBS::Matrix::* : fixed a bug in handling -matrixstring parameter passed to the constructor
Changes in 0.3.0:
-
All aggregate classes (TFBS::MatrixSet, TFBS::SiteSet and TFBS::SitePairSet) have iterators with uniform interface.
-
Added search_aln method to TFBS::MatrixSet, making phylogenetic footprinting scans with sets of matrices possible
-
Removed the absolute requirement for the GD.pm module: its import is deferred until the first call of the draw_logo method of TFBS::Matrix subclasses. The package test suite no longer requires it either.
-
Changes in Makefile.PL: it now very clearly notifies the user about missing prerequisite modules.
-
Improved documentation: added README and CHANGES files, and data model information for JASPAR2 database in TFBS::DB::JASPAR2 POD
-
More example scripts included in the distribution (see below)
-
Fixed quite a few bugs, mainly in TFBS::DB::FlatFileDir and aggregate classes
Installation
The installation procedure is fairly standard:
$ tar xvfz TFBS-0.4.1.tar.gz
$ cd TFBS-0.4.1
$ perl Makefile.PL
At this point you will be asked for MySQL server access information, which is needed for testing the TFBS::DB::JASPAR2 module. If you do not have write access to a MySQL server, just answer 'no' to the first question.
$ make
TFBS contains a perlxs extension, an (at present quick and dirty) adaptation of pwm_search, a short C program by James Fickett and Wyeth Wasserman for searching a DNA sequence against a position weight matrix. It is included for performance reasons. (For developers: there is also a currently undocumented way to make the search methods of TFBS::Matrix::PWM work without the extension. For details, contact the author, or wait for more extensive documentation of TFBS guts to appear. The latter is not recommended :) )
$ make test
The test suite is not omnipotent. For access to TRANSFAC, the TFBS::DB::TRANSFAC test assumes that an Internet connection is available and that no proxy is required. The test of TFBS::PatternGen::Gibbs is skipped if the Gibbs executable is not found in the PATH.
$ su
# make install
Any questions?
Requirements
Absolutely required
-
Perl 5.005_03 or later
-
bioperl 1.0 or newer
-
PDL 1.1 or later (Note for Linux users: PDL is available as an RPM package for most major Linux distributions. Since some TFBS testers were severely frustrated by problems they encountered compiling PDL, I recommend the use of binary RPMs where possible. Solaris users should upgrade to perl 5.8 and compile it without thread support for PDL or database connectivity to work. These issues are unrelated to TFBS code.)
Note for RedHat 9 users: RedHat 9 is badly broken in several important respects. (1) The PDL installed from the RPM package shipped with RedHat 9 issues "Possible precedence problem" warnings (probably harmless). (2) Some users have had trouble compiling PDL from CPAN. If you try to install PDL from the CPAN shell and get the warning "I could not locate your pod2man program..." and the error "Makefile:93: *** missing separator.", you should unset your $LANG environment variable before starting the CPAN shell:
# unset LANG
The above is strictly a RedHat issue, and is unrelated to TFBS code.
-
File::Temp - if you run perl 5.6.1 or newer, you already have it
Optional
-
GD 1.3 or later (only required by TFBS::Matrix::ICM for drawing sequence logos)
-
DBI and DBD::mysql modules, as well as access to a MySQL server (only required for storage and retrieval of matrix objects in a MySQL database by TFBS::DB::JASPAR2)
-
Gibbs, a program by the group of C.L. Lawrence for matrix pattern generation from a set of nucleotide sequences (only required by TFBS::PatternGen::Gibbs module); write to Dr. Lawrence to obtain a copy
All of the above except Gibbs are also available from CPAN.
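Before running perl Makefile.PL it can be handy to see which of the prerequisites listed above are already installed. The following is a small stand-alone sketch, not part of the TFBS distribution. Its assumptions: Bio::Seq is used as a stand-in for "bioperl is installed", the MySQL driver is looked up under its CPAN name DBD::mysql, and the Gibbs executable is assumed to be called Gibbs. Adjust these names to your setup.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Returns the version of an installed module, '' if it loads but declares
# no version, or undef if it cannot be loaded at all.
sub module_version {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    my $version = eval { $module->VERSION };
    return defined $version ? $version : '';
}

# Returns the full path of the first executable file called $program
# found in the PATH, or undef if there is none.
sub find_in_path {
    my ($program) = @_;
    for my $dir ( File::Spec->path ) {
        my $candidate = File::Spec->catfile( $dir, $program );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Module names below are assumptions based on the requirements list.
my @required = qw(Bio::Seq PDL File::Temp);
my @optional = qw(GD DBI DBD::mysql);

for my $module ( @required, @optional ) {
    my $version = module_version($module);
    printf "%-12s %s\n", $module,
        defined $version ? "found $version" : 'NOT FOUND';
}

my $gibbs = find_in_path('Gibbs');    # executable name is an assumption
printf "%-12s %s\n", 'Gibbs', defined $gibbs ? $gibbs : 'not found in PATH';
```

The script only reports what it finds; it does not compare versions against the minimum requirements, so check the printed version numbers against the list above.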
Example scripts
Here are two very simple code snippets that demonstrate some of the TFBS functionality.
-
Script1: a script that retrieves a sequence from GenBank using BioPerl, a C/EBP position weight profile from TRANSFAC, scans the sequence with the matrix and outputs the detected sites in GFF format.
-
Script2: a script that identifies new patterns from a set of DNA sequences stored in the file sequences.fa and stores them in a simple flat-file database.
The following two somewhat longer scripts have a fully functional command-line interface and annotated source code. Those who want to learn how to use TFBS are advised to study their code:
-
list_matrices.pl: a script that displays information about matrix patterns stored in a flat file directory-type database in several different formats.
-
phylofoot.pl: a script that scans conserved regions of a pairwise DNA sequence alignment with a set of matrices from a flat file database and produces GFF output.
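For readers new to phylogenetic footprinting, the idea of restricting a scan to conserved regions of a pairwise alignment can be sketched as follows. This is an independent illustration in plain Perl, not the code of phylofoot.pl or of the TFBS search_aln method. The function name conserved_windows, the window size, the identity cutoff and the toy alignment are all assumptions made for the example. Each alignment column counts as conserved when both sequences carry the same base, and overlapping windows that pass the cutoff are merged into regions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns a list of [start, end] alignment column ranges (0-based, inclusive)
# covered by windows of $window columns whose fraction of identical,
# non-gap columns is at least $min_identity.
sub conserved_windows {
    my ( $aln1, $aln2, $window, $min_identity ) = @_;
    die "aligned sequences differ in length\n"
        unless length($aln1) == length($aln2);

    # 1 for an identical non-gap column, 0 otherwise
    my @match = map {
        my ( $x, $y ) = ( substr( $aln1, $_, 1 ), substr( $aln2, $_, 1 ) );
        ( $x ne '-' && uc($x) eq uc($y) ) ? 1 : 0;
    } 0 .. length($aln1) - 1;

    my @regions;
    my $sum = 0;
    for my $i ( 0 .. $#match ) {
        $sum += $match[$i];
        $sum -= $match[ $i - $window ] if $i >= $window;   # slide the window
        next if $i < $window - 1;                           # window not yet full
        next if $sum / $window < $min_identity;
        my $start = $i - $window + 1;
        if ( @regions && $start <= $regions[-1][1] + 1 ) {
            $regions[-1][1] = $i;                           # extend previous region
        }
        else {
            push @regions, [ $start, $i ];
        }
    }
    return @regions;
}

# Toy alignment, made up for this example
my $seq1 = 'ACGTTGCAGT--ACGTACGTACTTGACCA';
my $seq2 = 'ACGTTGCAGTTTACCTAGGAACTTGACCA';
for my $region ( conserved_windows( $seq1, $seq2, 8, 0.8 ) ) {
    printf "conserved columns %d-%d\n", @$region;
}
```

Pattern hits would then be kept only when they fall inside one of the returned regions in both sequences.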
And finally, a simple CGI script:
-
viewpfm.cgi: a CGI script that outputs two kinds of pages: a list of matrices from a FlatFileDir database, and a detailed info page for individual matrices. The latter includes a graphical representation ("sequence logo") of matrix specificity.
Documentation (POD)
From here you can access POD documentation for the modules. It is still far from perfect, but I think it is enough for a start. (Internal modules and internal methods are not yet documented.)
TFBS::Matrix
TFBS::Matrix::PFM
TFBS::Matrix::ICM
TFBS::Matrix::PWM
TFBS::Site
TFBS::SiteSet
TFBS::SitePair
TFBS::SitePairSet
TFBS::DB::FlatFileDir
TFBS::DB::TRANSFAC
TFBS::DB::JASPAR2
TFBS::PatternGen
TFBS::PatternGen::SimplePFM
TFBS::PatternGen::Gibbs
