
Naming Conventions

By naming each reading according to a rigid convention, it is possible to encode much information about the reading in its name. Indeed, because each reading must have a unique name, this will generally be the case. By rigidly adhering to a convention, it is possible to configure pregap to obtain the answers to many of its questions simply by looking at the name. In some cases this may completely avoid the need for (and the work involved in) creating a database file from which to extract this information.

The key is in choosing a naming convention that is easy to process. As we cannot require everyone to use the same naming convention, pregap cannot magically know how to extract information from a reading name. However, we provide here an example naming convention and a corresponding `.pregaprc' file. It is hoped that this will serve as a template for those wishing to tailor pregap to their own local conventions.

We start by defining a reading name to consist of the template name followed by a full stop and an extension encoding the chemistry, the reading direction, and any additional information required to generate unique names. As an example, we may have sequences named xb54b12.f1 and xb54b12.r1. These have been sequenced from opposite ends of the same insert.

The extension will consist of several characters. The first is f to indicate a forward reading or r to indicate a reverse reading. We shall use lower case letters (f, r) to indicate use of the universal primer and capital letters (F, R) to indicate use of a custom primer. The next few characters are optional and may contain a t to indicate a terminator reaction or an L to indicate a long reading. Finally, we add a number so that repeats of the sequencing reaction from the same template, with the same direction and chemistry, can be given unique names.
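The convention just described can be written down as a pattern and used to check names before they are given to pregap. The following is only a sketch, assuming a POSIX shell with grep -E. The function name is_valid_name is invented for this illustration, and the pattern assumes that template names contain no full stop and that the optional characters are limited to t and L.

```shell
#!/bin/sh
# Hypothetical check (not part of pregap): does a reading name follow
# the example convention  template.<f|r|F|R>[t|L...]<number> ?
is_valid_name() {
    # Assumption: the template name itself contains no full stop.
    printf '%s\n' "$1" | grep -Eq '^[^.]+\.[frFR][tL]*[0-9]+$'
}

for name in xb54b12.f1 xb54b12.Rt2 xb54b12.rL1 xb54b12.x1 xb54b12.f; do
    if is_valid_name "$name"; then
        echo "$name ok"
    else
        echo "$name does not match"
    fi
done
```

Here the first three names are accepted; xb54b12.x1 is rejected because x is not a known direction letter, and xb54b12.f is rejected because it lacks the final number.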

From this information it should be clear that we can create the TN_com (template name), PR_com (primer type) and CH_com (chemistry) pregap configurations. These could be coded as follows.

TN_com='echo $ID | sed "s/\..*//"'
PR_com='echo $ID | sed "s/.*\.\(.\).*/\1/;s/f/1/;s/r/2/;s/F/3/;s/R/4/"'
CH_com='echo $ID | sed "s/.*\..*[tT].*/1/;s/.*\..*/0/"'

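The above file makes extensive use of the Unix sed command. This is just an illustration, so you should feel free to use whichever commands you are familiar with. All the above configurations echo the reading name into sed, which deletes or replaces segments as needed. The TN_com command replaces ("s/") a full stop ("\.") followed by all other characters (".*") with nothing ("//"). The PR_com command reduces the name to the first character of the extension and then maps that letter to a primer type number (f to 1, r to 2, F to 3, and so on). The CH_com command outputs 1 when the extension contains a t (or T) and 0 otherwise.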
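To see what these commands produce, they can be tried by hand outside pregap. The sketch below assumes a POSIX shell and sets ID itself, whereas pregap would normally supply it. The decode function and the sample names are invented for this illustration, and the mapping of R to 4 is an assumption made by extending the f, r, F pattern.

```shell
#!/bin/sh
# Hypothetical helper (not part of pregap): apply the three example
# sed commands to one reading name and print the results.
decode() {
    ID=$1
    # Template name: delete the first full stop and everything after it.
    tn=$(echo "$ID" | sed 's/\..*//')
    # Primer type: keep the first character of the extension, then map it.
    # f=1, r=2, F=3 follow the text above; R=4 is an assumption.
    pr=$(echo "$ID" | sed 's/.*\.\(.\).*/\1/;s/f/1/;s/r/2/;s/F/3/;s/R/4/')
    # Chemistry: 1 if the extension contains t or T, otherwise 0.
    ch=$(echo "$ID" | sed 's/.*\..*[tT].*/1/;s/.*\..*/0/')
    echo "$ID TN=$tn PR=$pr CH=$ch"
}

for name in xb54b12.f1 xb54b12.r1 xb54b12.Ft2 xb54b12.rL1; do
    decode "$name"
done
```

For xb54b12.Ft2, for instance, this prints TN=xb54b12, PR=3 and CH=1. Note that the chemistry test relies on the first substitution leaving no full stop behind, so the second substitution only fires when no t was found.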

This page is maintained by James Bonfield. Last generated on 2 February 1999.