first previous next last contents


SCF format files are used to store data from DNA sequencing instruments. Each file contains the data for a single reading and includes: its trace sample points, its called sequence, the positions of the bases relative to the trace sample points, and numerical estimates of the accuracy of each base. Comments and "private data" can also be stored. The format is machine independent and the first version was described in Dear, S and Staden, R. "A standard file format for data from DNA sequencing instruments", DNA Sequence 3, 107-110, (1992).

Since then it has undergone several important changes. The first allowed for different sample point resolutions. The second, in response to the need to reduce file sizes for large projects, involved a major reorganisation of the ordering of the data items in the file and also in the way they are represented. Note that despite these changes we have retained the original data structures into which the data is read. Also this reorganisation in itself has not made the files smaller but it has produced files that are more effectively compressed using standard programs such as gzip. The io library included in the package contains routines that can read and write all the different versions of the format (including reading of compressed files). The header record was not affected by this change. This documentation covers both the format of scf files and the data structures that are used by the io library. Prior to version 3.00 these two things corresponded much more closely.

first previous next last contents
This page is maintained by James Bonfield. Last generated on 2 Febuary 1999.