Swissknife

An object-oriented Perl library to handle Swiss-Prot entries

Swissknife has been developed in the Swiss-Prot groups at the European Bioinformatics Institute and the Swiss Institute of Bioinformatics.

The latest release is always available from https://sourceforge.net/projects/swissknife/files/latest/download.
The current and development versions are hosted at SourceForge (see the Swissknife project page and the CVS tree browser).

General information on installation and related topics is contained in the README file. This document is the starting point for the usage documentation of the Swissknife modules.

Usage

To use Swissknife, include the line
use SWISS::Entry;
in your program.

A small program using Swissknife is example.pl.
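
As a complement to example.pl, the following is a minimal sketch of a read loop. It assumes the fromText constructor and the AC accessor as described in the SWISS::Entry documentation; the subroutine name primary_accessions and the script name list_acs.pl are made up for this illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use SWISS::Entry;

# Hypothetical helper: return the primary accession number of every
# entry that can be read from the filehandle $fh.
sub primary_accessions {
    my ($fh) = @_;

    # Each flat-file entry ends with a line containing only "//",
    # so set the record separator to read one whole entry at a time.
    local $/ = "\n//\n";

    my @acs;
    while (my $text = <$fh>) {
        my $entry = SWISS::Entry->fromText($text);
        push @acs, $entry->AC;    # primary accession number
    }
    return @acs;
}

# Only run when a file name is given on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_\n" for primary_accessions($fh);
    close $fh;
}
```

Called as `perl list_acs.pl SWISS100.dat`, this should print one accession number per entry.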

The program benchmark.pl can be used to test the Swissknife components and to give a rough idea of the system performance. The program can be called with the Swiss-Prot example file SWISS100.dat.
Usage example:
cd SWISS/examples
perl benchmark.pl -file SWISS100.dat -repeats 10

The output should look similar to:

*** Swissknife Benchmark and Test suite *** 
Read only:             :  0 wallclock secs ( 0.03 usr +  0.01 sys =  0.04 CPU) @ 250.00/s (n=10)
            (warning: too few iterations for a reliable count)
Read/Write NULL:       :  0 wallclock secs ( 0.03 usr +  0.01 sys =  0.04 CPU) @ 250.00/s (n=10)
            (warning: too few iterations for a reliable count)
Read/Write:            :  0 wallclock secs ( 0.03 usr +  0.03 sys =  0.06 CPU) @ 166.67/s (n=10)
            (warning: too few iterations for a reliable count)
Read/Write/addAC:      :  0 wallclock secs ( 0.11 usr +  0.03 sys =  0.14 CPU) @ 71.43/s (n=10)
            (warning: too few iterations for a reliable count)
Read/Write/Fullparse:  : 10 wallclock secs ( 9.29 usr +  0.05 sys =  9.34 CPU) @  1.07/s (n=10)
Read/Write/Fp/Update:  : 14 wallclock secs (14.22 usr +  0.04 sys = 14.26 CPU) @  0.70/s (n=10)
Read/equals:           :  0 wallclock secs ( 0.26 usr +  0.01 sys =  0.27 CPU) @ 37.04/s (n=10)
            (warning: too few iterations for a reliable count)
Read/Write/Modify:     :  4 wallclock secs ( 3.71 usr +  0.04 sys =  3.75 CPU) @  2.67/s (n=10)

A more comprehensive test set is provided in the t/ directory:

cd SWISS/t
perl test.pl *.t

This should produce output similar to:

*** Swissknife test suite ***

DEs.t ............. ok   
FTId.t ............ ok   
GNs.t ............. ok   
annot.t ........... ok   
crc64.t ........... ok   
evidence.t ........ ok   
fasta.t ........... ok   
formatProblems.t .. ok   
identity.t ........ ok   
util.t ............ ok   
All tests successful.
Files=10, Tests=20,  2 wallclock secs ( 0.05 usr  0.02 sys +  1.39 cusr  0.09 csys =  1.55 CPU)
Result: PASS

Modules

Module                 Documentation            Comment

The main module
Entry.pm               Entry.html               The main module to handle Swiss-Prot entries. One Entry object represents one Swiss-Prot entry and provides an API for its modification.

Line objects: Each line object class handles one line type of an entry or (e.g. Ref.pm) a group of related lines.
ACs.pm                 ACs.html                 Representation of the AC line.
DTs.pm                 DTs.html                 The date lines.
DEs.pm                 DEs.html                 The description lines.
DE.pm                  DE.html                  A single name for the protein.
DRs.pm                 DRs.html                 The DR lines, cross-references to other databases.
CCs.pm                 CCs.html                 Comment lines.
CCcopyright.pm         CCcopyright.html         The copyright statement (part of the comment lines).
CCalt_prod.pm          CCalt_prod.html          One comment object of the topic ALTERNATIVE PRODUCTS.
CCrna_editing.pm       CCrna_editing.html       One comment object of the topic RNA EDITING.
CCbpc_properties.pm    CCbpc_properties.html    One comment object of the topic BIOPHYSICOCHEMICAL PROPERTIES.
CCinteraction.pm       CCinteraction.html       One comment object of the topic INTERACTION.
CCdisease.pm           CCdisease.html           One comment object of the topic DISEASE.
CCsubcell_location.pm  CCsubcell_location.html  One comment object of the topic SUBCELLULAR LOCATION.
CC.pm                  CC.html                  One comment object of any other topic.
FTs.pm                 FTs.html                 The feature lines.
GNs.pm                 GNs.html                 The gene lines.
GeneGroup.pm           GeneGroup.html           The different synonyms for a single gene name.
GN.pm                  GN.html                  One single gene name.
IDs.pm                 IDs.html                 The ID line.
KWs.pm                 KWs.html                 The keyword lines. KWs is a container object; it holds an array of KW objects.
KW.pm                  KW.html                  One keyword object.
OCs.pm                 OCs.html                 The OC line encoding the taxonomy of the source organism.
OGs.pm                 OGs.html                 The OG lines. OGs is a container object; it holds an array of OG objects.
OG.pm                  OG.html                  One organism name.
OSs.pm                 OSs.html                 The OS lines. OSs is a container object; it holds an array of OS objects.
OS.pm                  OS.html                  One organism name.
OXs.pm                 OXs.html                 The OX lines. OXs is a container object; for each valid taxonomic resource it contains a ListBase object which holds a list of OX objects.
OX.pm                  OX.html                  One tax id object.
Ref.pm                 Ref.html                 Represents one literature reference.
Refs.pm                Refs.html                Represents the list of literature references.
Stars.pm               Stars.html               The "annotator's section" (see the note on the ** lines).
Stars/aa.pm            Stars/aa.html            Unstructured notes in the internal "annotator's section". See Stars.html.
Stars/DR.pm            Stars/DR.html            DR in the internal "annotator's section". See Stars.html.
Stars/EV.pm            Stars/EV.html            Evidence section. See Stars.html.
Stars/default.pm       Stars/default.html       Default class for structured information in the internal "annotator's section". See Stars.html.
SQs.pm                 SQs.html                 The sequence lines.

Base objects: These modules implement the base classes from which all line object classes are derived.
BaseClass.pm           BaseClass.html           The base class, implementing common methods, e.g. equals.
ListBase.pm            ListBase.html            Provides methods to manipulate list-based objects like KWs.pm.

Auxiliary modules
TextFunc.pm            TextFunc.html            Auxiliary functions, mainly for text formatting.
CRC64.pm               CRC64.html               Provides a method to calculate the CRC64 checksum.
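
The container pattern described for KWs.pm (a container holding KW objects) can be sketched as follows. The sketch assumes the add and elements methods of the container and the text accessor of SWISS::KW, as described in the module documentation; the subroutine name keywords_with_extra is made up for this illustration.

```perl
use strict;
use warnings;
use SWISS::Entry;
use SWISS::KW;

# Hypothetical helper: parse one entry, append one keyword, and
# return the text of all keywords the entry now holds.
sub keywords_with_extra {
    my ($entry_text, $extra) = @_;
    my $entry = SWISS::Entry->fromText($entry_text);

    # Create a new KW object and add it to the KWs container.
    my $kw = SWISS::KW->new;
    $kw->text($extra);
    $entry->KWs->add($kw);

    # elements returns the KW objects held by the container.
    return map { $_->text } $entry->KWs->elements;
}
```

The same pattern should apply to the other container modules (OGs, OSs), each with its own element class.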

Bugs, Feedback

The Swissknife modules were developed for internal use and are provided as is. However, if external users find them useful, we will happily try to incorporate any suggestions for improvement, especially to the documentation.

Please report any bugs and suggestions for improvement to sk at ebi dot ac dot uk.

Notes

The ** lines (Stars.pm)

The Swissknife modules are used in the production of the TrEMBL protein database in the Swiss-Prot group at the EBI. The internal version of the entries may contain additional information for the database curators, stored in lines with the line tag '**'. Swissknife therefore provides methods to handle these lines, although they are not visible in the public version of the entries. Because external users may also use the ** lines to store additional information of their own, the corresponding methods have not been removed from the public release.

Evidence tags

From June 2000 onwards, we are introducing evidence tags into UniProtKB/TrEMBL. Initially, these will be deleted from the public version; Swissknife nevertheless provides functions to handle them (see ListBase.html and BaseClass.html).

Making Swissknife "evidence tag compatible" also required a major change to the interface of the KWs, OGs and OSs modules. Originally they were simple ListBase classes, in which each keyword or organism name was one element of the ListBase array. Now each keyword or organism name is held in its own object (see KW.html, OG.html and OS.html). evTest.pl is a sample program that manipulates evidence tags.

Evidences in the UniProtKB flat file format

(Not public in UniProtKB before October 1, 2014.) The evidence for annotations in UniProtKB entries has been available for several years in the XML and RDF representations of the data, and we now intend to add this information to the text format (also known as the flat file format). Swissknife handles both these new evidences (in the form {ECO:...[, ECO:...]}) and the old ones...
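
To illustrate the {ECO:...[, ECO:...]} form, the following stand-alone sketch pulls the evidence strings out of a piece of flat-file text with a regular expression. It does not use Swissknife's own parser; the subroutine name extract_evidences and the sample text (including the PubMed identifier) are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every evidence string found in
# {ECO:...} blocks within $text. Not part of Swissknife.
sub extract_evidences {
    my ($text) = @_;
    my @evidences;

    # A block starts with "{ECO:" and runs to the next closing brace.
    while ($text =~ /\{(ECO:[^}]*)\}/g) {
        # Several evidences in one block are separated by a comma.
        push @evidences, split /,\s*/, $1;
    }
    return @evidences;
}

# Made-up sample text carrying two evidences in one block.
my $sample = 'Binds DNA. {ECO:0000269|PubMed:12345678, ECO:0000305}.';
print "$_\n" for extract_evidences($sample);
```

For the sample text this prints the two evidence strings, ECO:0000269|PubMed:12345678 and ECO:0000305, one per line.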

Authors

Swissknife has been developed by:
Wolfgang Fleischmann (European Bioinformatics Institute)
Alexandre Gattiker (Swiss Institute of Bioinformatics)
Henning Hermjakob (European Bioinformatics Institute)
Eric Jain (Swiss Institute of Bioinformatics)
Paul Kersey (European Bioinformatics Institute)
Edouard de Castro (Swiss Institute of Bioinformatics)