NAME
pfscan - scan a protein or DNA sequence with a profile
library
SYNOPSIS
pfscan [ -abflLrsuxyz ] [ seq-file | - ]
[ profile-library-file | - ] [L=#] [W=#]
DESCRIPTION
pfscan compares a protein or nucleic acid sequence against a
profile library. The result is an unsorted list of profile-
sequence matches written to the standard output. A variety
of output formats containing different information can be
specified via the options -a, -l, -L, -r, -s, -u, -x, -y, and -z.
seq-file contains a sequence in EMBL/SWISS-PROT format
(assumed by default) or in Pearson/Fasta format (indicated
by option -f). profile-library-file contains a library of
profiles in PROSITE format. pfscan can be used as a filter
if - is used instead of one of the input filenames.
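As a minimal sketch of filter usage, the following shell fragment
supplies the sequence on standard input and names the profile library
as a file. The file names seq.dat and profiles.prf and the function
name scan_from_stdin are hypothetical; the guard only skips the call
where pfscan or the input files are not available.

```shell
# Sketch only: seq.dat and profiles.prf are hypothetical file names.
# scan_from_stdin reads the sequence from standard input ("-") and
# takes the profile library file as its only argument.
scan_from_stdin() {
    pfscan - "$1"
}

# Run only where pfscan and both input files are available.
if command -v pfscan >/dev/null 2>&1 && [ -r seq.dat ] && [ -r profiles.prf ]; then
    scan_from_stdin profiles.prf < seq.dat
fi
```

Since the match list goes to the standard output, it can be piped
into a further command or redirected to a file in the usual way.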
OPTIONS
-a Report optimal alignment scores for all profiles
regardless of the cut-off value. This option
simultaneously forces DISJOINT=UNIQUE.
-b Search the complementary strand of the DNA sequence as
well.
-f Input sequence is in Pearson/Fasta format.
-l Indicate highest cut-off level exceeded by the match
score in the output list.
-L Indicate by character string the highest cut-off level
exceeded by the match score in the output list. Note
that the generalized profile format includes a text
string field to specify a name for a cut-off level. The
-L option causes the program to display the first two
characters of this text string (usually something like
"!", "?", "??", etc.) at the beginning of each match
description.
-r Use raw scores rather than normalized scores for match
selection. Normalized scores will not be listed in the
output.
-s List the sequences of the matched regions as well. The
output will be a Pearson/Fasta-formatted sequence
library.
-u Force DISJOINT=UNIQUE.
-x List profile-sequence alignments in pftools PSA format.
-y Display alignments between the profile and the matched
sequence regions in a human-friendly format.
-z Indicate starting and ending position of the matched
profile range. The latter position will be given as a
negative offset from the end of the profile. Thus the
range [1, -1] means the entire profile.
PARAMETERS
L=# Cut-off level to be used for match selection. If level
L is not specified in the profile, the next higher (if
L is negative) or next lower (if L is positive) level
specified is used instead.
W=# Output width. Output lines will be truncated after W
characters. Default: W=132.
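Parameters are given after the file arguments as NAME=value words, as
in the SYNOPSIS. The following is a minimal sketch; the helper name
scan_at_level, the file names seq.dat and profiles.prf, and the
values 1 and 80 are assumptions chosen for illustration.

```shell
# scan_at_level LEVEL WIDTH SEQ-FILE PROFILE-LIBRARY-FILE
# Hypothetical helper: uses cut-off level LEVEL for match selection,
# reports the highest cut-off level exceeded by each match (-l), and
# truncates output lines after WIDTH characters.
scan_at_level() {
    pfscan -l "$3" "$4" "L=$1" "W=$2"
}

# Run only where pfscan and both (hypothetical) input files exist.
if command -v pfscan >/dev/null 2>&1 && [ -r seq.dat ] && [ -r profiles.prf ]; then
    scan_at_level 1 80 seq.dat profiles.prf
fi
```

If level 1 is not defined in a given profile, the next lower level
specified is used instead, as described under L=# above.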
EXAMPLES
(1) pfscan -s GTPA_HUMAN prosite13.prf
Scans the human GAP protein for matches to profiles in
PROSITE release 13. GTPA_HUMAN contains the SWISS-PROT
entry P20936|GTPA_HUMAN. prosite13.prf contains all
profile entries of PROSITE release 13. The output is a
Pearson/Fasta-formatted sequence library containing all
sequence regions of the input sequence matching a
profile in the profile library.
(2) pfscan -by CVPBR322 ecp.prf L=2
Scans both strands of plasmid pBR322 for high-scoring
(level 2) E. coli promoter matches. CVPBR322 contains
EMBL entry J01749|CVPBR322. ecp.prf contains a profile
for E. coli promoters. The output includes profile-
sequence alignments in a human-friendly format.
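Because the output of example (1) is a Pearson/Fasta-formatted
sequence library, each matched region starts with a header line
beginning with ">", so the matches can be counted with standard
tools. A minimal sketch; count_fasta_records is a hypothetical helper
and not part of pfscan.

```shell
# Hypothetical helper: count Fasta records on standard input by
# counting the header lines, i.e. those that begin with ">".
count_fasta_records() {
    grep -c '^>'
}

# Run only where pfscan and the input files of example (1) exist.
if command -v pfscan >/dev/null 2>&1 && [ -r GTPA_HUMAN ] && [ -r prosite13.prf ]; then
    pfscan -s GTPA_HUMAN prosite13.prf | count_fasta_records
fi
```

The count is the number of matched regions listed, since -s writes
one Fasta record per matched region.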
AUTHOR
Philipp Bucher
Philipp.Bucher@isrec.unil.ch