
Sample Points.

The trace information is stored at byte offset Header.samples_offset from the start of the file. For each sample point there is a value for each of the four bases. Header.sample_size holds the precision of the sample values, which must be either 1 (unsigned byte) or 2 (unsigned short). The sample points need not be normalised to any particular value, but they are assumed to represent non-negative values; that is, they are of unsigned type.
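
As an illustration of how the precision affects decoding, the sketch below widens a raw buffer of sample values to 16-bit integers. The function name decode_samples is hypothetical, and the big-endian byte order for 2-byte values is an assumption not stated in this section. A caller would seek to Header.samples_offset and read 4 × (number of sample points) × Header.sample_size bytes before calling it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode num_values sample values from the raw
 * bytes in buf into out[], widening each to uint16_t.
 * precision is the value of Header.sample_size:
 *   1 -> one unsigned byte per value
 *   2 -> one unsigned short per value, ASSUMED big-endian here
 * Returns 0 on success, or -1 if the precision is not 1 or 2. */
int decode_samples(const uint8_t *buf, size_t num_values,
                   int precision, uint16_t *out)
{
    size_t i;

    if (precision == 1) {
        for (i = 0; i < num_values; i++)
            out[i] = buf[i];
    } else if (precision == 2) {
        for (i = 0; i < num_values; i++)
            /* high byte first, then low byte */
            out[i] = (uint16_t)((buf[2 * i] << 8) | buf[2 * i + 1]);
    } else {
        return -1; /* unsupported precision */
    }
    return 0;
}
```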

With the introduction of SCF version 3.00, and in an attempt to produce files that compress efficiently, the sample points are stored in A, C, G, T order; i.e. all the values for base A, followed by all those for C, and so on. In addition, they are stored not as their original magnitudes but as second-order differences: the difference between each value and the previous one is taken, and then the same is done to those differences. The C code used to transform precision 2 samples is shown below.

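#include <stdint.h>

/* Definitions normally supplied by the io library headers; the values
 * used here are assumed, for illustration only. */
typedef uint16_t uint_2;
#define DELTA_IT 1

void delta_samples2(uint_2 samples[], int num_samples, int job)
{
    /* If job == DELTA_IT:
     *  change a series of sample points to a series of delta delta values,
     *  i.e. change them in two steps:
     *  first: delta = current_value - previous_value
     *  then:  delta_delta = delta - previous_delta
     * else
     *  do the reverse.
     * All arithmetic wraps modulo 65536, so the reverse is exact.
     */

    int i;
    uint_2 p_delta, p_sample;

    if (DELTA_IT == job) {
        /* Pass 1: samples -> deltas (p_delta holds the previous sample) */
        p_delta = 0;
        for (i = 0; i < num_samples; i++) {
            p_sample = samples[i];
            samples[i] = (uint_2)(samples[i] - p_delta);
            p_delta = p_sample;
        }
        /* Pass 2: deltas -> delta deltas (p_delta holds the previous delta) */
        p_delta = 0;
        for (i = 0; i < num_samples; i++) {
            p_sample = samples[i];
            samples[i] = (uint_2)(samples[i] - p_delta);
            p_delta = p_sample;
        }
    }
    else {
        /* Pass 1: delta deltas -> deltas (running sum) */
        p_sample = 0;
        for (i = 0; i < num_samples; i++) {
            samples[i] = (uint_2)(samples[i] + p_sample);
            p_sample = samples[i];
        }
        /* Pass 2: deltas -> samples (running sum) */
        p_sample = 0;
        for (i = 0; i < num_samples; i++) {
            samples[i] = (uint_2)(samples[i] + p_sample);
            p_sample = samples[i];
        }
    }
}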
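
The routine above handles precision 2 samples only. The sketch below applies the same two-pass transformation to precision 1 (8-bit) samples, assuming that such samples are delta coded in the same way; the function names delta_encode_u8 and delta_decode_u8 are hypothetical, not io library calls. Arithmetic wraps modulo 256 instead of 65536, so decoding still restores the original values exactly. For example, the samples 10, 12, 15, 15 become the deltas 10, 2, 3, 0 and then the delta deltas 10, 248, 1, 253 (248 and 253 being -8 and -3 modulo 256).

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: second-order delta coding of 8-bit samples, in place.
 * Each pass replaces every value by its difference from the value
 * that preceded it before the pass; subtraction wraps modulo 256. */
void delta_encode_u8(uint8_t samples[], size_t n)
{
    size_t i;
    int pass;

    for (pass = 0; pass < 2; pass++) {
        uint8_t prev = 0;            /* previous value, before this pass */
        for (i = 0; i < n; i++) {
            uint8_t cur = samples[i];
            samples[i] = (uint8_t)(cur - prev);
            prev = cur;
        }
    }
}

/* Inverse: two passes of running sums, also wrapping modulo 256. */
void delta_decode_u8(uint8_t samples[], size_t n)
{
    size_t i;
    int pass;

    for (pass = 0; pass < 2; pass++) {
        uint8_t sum = 0;             /* running total for this pass */
        for (i = 0; i < n; i++) {
            sum = (uint8_t)(sum + samples[i]);
            samples[i] = sum;
        }
    }
}
```

Using two in-place passes, as the precision 2 routine does, avoids any temporary array.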

In memory, the io library holds the samples in the following data structures, one structure per sample point:

/*
 * Type definition for the Sample data
 */
typedef struct {
        uint_1 sample_A;           /* Sample for A trace */
        uint_1 sample_C;           /* Sample for C trace */
        uint_1 sample_G;           /* Sample for G trace */
        uint_1 sample_T;           /* Sample for T trace */
} Samples1;

typedef struct {
        uint_2 sample_A;           /* Sample for A trace */
        uint_2 sample_C;           /* Sample for C trace */
        uint_2 sample_G;           /* Sample for G trace */
        uint_2 sample_T;           /* Sample for T trace */
} Samples2;
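
Because version 3.00 files store all the A values, then all the C values, and so on, while Samples2 groups the four values of one sample point together, a reader has to regroup the data. The sketch below shows that step for precision 2. The helper name planar_to_samples2 is hypothetical, the uint_2 typedef is assumed, and the input is assumed to be already in host byte order with the delta coding undone (each base's block separately).

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t uint_2;   /* assumed: 16-bit unsigned, as in the io library */

typedef struct {           /* local copy of Samples2, for self-containment */
    uint_2 sample_A;
    uint_2 sample_C;
    uint_2 sample_G;
    uint_2 sample_T;
} Samples2;

/* Hypothetical helper: planar holds 4 * n values laid out as
 * n A values, then n C values, then n G values, then n T values.
 * Writes one Samples2 per sample point into out[0..n-1]. */
void planar_to_samples2(const uint_2 *planar, size_t n, Samples2 *out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        out[i].sample_A = planar[i];
        out[i].sample_C = planar[n + i];
        out[i].sample_G = planar[2 * n + i];
        out[i].sample_T = planar[3 * n + i];
    }
}
```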

NOTE: Features new to version 2

The samples are no longer restricted to 8-bit values.


This page is maintained by James Bonfield. Last generated on 2 February 1999.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/formats_4.html