This code is provided to illustrate an implementation of composition alignment. The code was tested under Microsoft Windows. It is written in ANSI C and should compile on any platform with an appropriate compiler, with little or no modification.

Files are as follows:

cau.c - the file you should build to create a console executable.
align.h - various data types and helper routines for regular alignment.
comp.alignment.h - various data types and helper routines for composition alignment.
complex.match.function.h - contains alignment routines for complex matching functions (4-6).
simple.match.function.h - contains alignment routines for simple matching functions (1-3).
kbest.local.composition.alignment.h - alternate versions of the local alignments that allow finding the next-best alignments by blocking new alignment paths from passing through previous alignments. Not used in this version of the console utility.
in.fa - sample input file (contains random sequences).
run.bat - an example of program invocation.

Program usage:

Assuming your executable file is called cau.exe, use the following syntax:

cau.exe File Match Mismatch Delta Limit isDiNuc MatchFunc IsLocal IsPaired [-m (minscore)] [-r (scoreratio)]

Where (all weights, penalties, and scores are positive):

File = multiple-sequence input file
Match = matching weight
Mismatch = mismatching penalty
Delta = indel penalty
Limit = how many characters can be scrambled at a time
isDiNuc = 1 for nucleotide, 2 for dinucleotide
MatchFunc = function number between 1 and 6
IsLocal = 0 for Global, 1 for Local, 2 (or anything else) for PatternGlobalTextLocal
IsPaired = 0 to align everything against everything, 1 (or anything else) to align pairs from the input file
-m (minscore) = use this switch to provide a minimum compositional score required to report an alignment
-r (scoreratio) = use this switch to provide a minimum composition/basic alignment score ratio required to report an alignment

Note: the sequence file
should be in FASTA format:

>Name of sequence1
aggaaacctg ccatggcctc ctggtgagct gtcctcatcc actgctcgct gcctctccag
atactctgac ccatggatcc cctgggtgca gccaagccac aatggccatg gcgccgctgt
actcccaccc gccccaccct cctgatcctg ctatggacat ggcctttcca catccctgtg...
>Name of sequence2
aggaaacctg ccatggcctc ctggtgagct gtcctcatcc actgctcgct gcctctccag
atactctgac ccatggatcc cctgggtgca gccaagccac aatggccatg gcgccgctgt
actcccaccc gccccaccct cctgatcctg ctatggacat ggcctttcca catccctgtg...

Note: for PatternGlobalTextLocal alignment, the pattern is the second sequence.

Note: match functions are as follows:

Function 1: The simplest function, a constant times the length of the match.
Function 2: Square root of the length of the extended match, times a constant.
Function 3: Log base 2 of (length + 1) of the extended match, times a constant.
Function 4: Relative entropy of the substring composition with respect to the background composition, times the length of the extended match, times a constant.
Function 5: Relative entropy of the substring composition with respect to the background composition, times the length of the extended match, times a constant. For length = 1, the normal match value is used instead; this prevents two identical sequences composed of only one letter from scoring zero if the background is the same letter.
Function 6: Shannon-Jensen entropy of the substring composition versus the background composition, times the length of the extended match, times a constant.