Many years ago we separated methods for searching for genes and their control regions into two classes: "gene search by signal" and "gene search by content" (Staden R. (1984) Graphic methods to determine the function of nucleic acid sequences. Nucl. Acids Res. 12, 521-538; Staden R. (1985) Computer methods to locate genes and signals in nucleic acid sequences. Genetic Engineering: Principles and Methods Vol. 7, edited by J. K. Setlow and A. Hollaender, Plenum Publishing Corp.). Signal searches look for short segments of sequence such as promoters, ribosome binding sites and splice junctions, whereas content searches look for the sequence patterns that are characteristic of protein coding regions or RNA genes. Protein coding sequences encode particular amino acid sequences, often using preferred codons, and this leaves patterns in the sequence that can be used to distinguish them from non-protein-coding DNA. tRNA genes must produce stable cloverleaf structures, and "standard" tRNAs must contain particular (conserved) bases at specific locations within the cloverleaf. These features can be used to locate tRNA genes, and other RNA genes could probably be sought in a similar way.
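The idea behind a content search based on codon preference can be illustrated with a minimal sketch. This is not the algorithm used in nip or nip4; it is a simplified illustration that assumes a uniform background of 1/64 per codon and a pseudocount of 1. The function names `codon_log_odds` and `frame_scores` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

BASES = "ACGT"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]


def codon_log_odds(coding_examples):
    """Estimate a log-odds weight for each codon from in-frame coding examples.

    Assumption: observed codon frequencies (pseudocount of 1) are compared
    against a uniform background in which every codon has probability 1/64.
    """
    counts = Counter()
    for seq in coding_examples:
        seq = seq.upper()
        # Read the example sequence in frame, three bases at a time.
        for i in range(0, len(seq) - 2, 3):
            counts[seq[i:i + 3]] += 1
    total = sum(counts[c] for c in CODONS) + len(CODONS)
    return {c: math.log(((counts[c] + 1) / total) * 64) for c in CODONS}


def frame_scores(seq, weights):
    """Return the mean codon log-odds score for each of the three forward frames."""
    seq = seq.upper()
    scores = []
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        # Codons containing ambiguity symbols contribute a neutral score of 0.
        values = [weights.get(c, 0.0) for c in codons]
        scores.append(sum(values) / len(values) if values else 0.0)
    return scores
```

A frame whose codons resemble the preferred codons of the training set scores higher than the other two frames. In practice such a score would be computed over a sliding window along the sequence and plotted for each frame, so that coding regions show up as stretches where one frame consistently outscores the others.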
The original nip program contained several content searches for protein coding regions, one for tRNA genes, and many ways of searching for signals. In nip4 we have implemented four of the content searches and a splice junction search. Since our early work in this area, others have invented new algorithms for these tasks, some of which combine content and signal searches, and we intend in the future either to include these new algorithms or to enable nip4 to display their results. However, we have implemented our own methods first, and they are described here.