FASTA

FASTA is a DNA and protein sequence alignment software package first described by David J. Lipman and William R. Pearson in 1985. Its legacy is the FASTA format, which is now ubiquitous in bioinformatics.
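As an illustration of the format, a FASTA file is plain text in which each record begins with a ">" header line followed by one or more lines of sequence. The following is a minimal Python sketch of a reader, not code from the FASTA package; the function name read_fasta and the sample records are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical sample data, for illustration only.
sample = """>seq1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical DNA
ACGTACGT
"""
records = list(read_fasta(sample.splitlines()))
```

Each record's sequence may span several lines, which is why the reader joins lines until it sees the next header.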

Key facts

Software.name: FASTA
Software.latest_release_version: 36
Software.genre: Bioinformatics
Software.license: apache2.0

via Wikipedia infobox

Source code

This directory contains the source code for the FASTA package of programs (W. R. Pearson and D. J. Lipman (1988), "Improved Tools for Biological Sequence Analysis", PNAS 85:2444-2448). The current version of the program is fasta-36.3.8i. If you are reading this at fasta.bioch.virginia.edu/wrpearson/fasta/fasta36, links are available to executable binaries for Linux, MacOS, and Windows. The source code is also available from github.com/wrpearson/fasta36.

Changes listed in the release notes (numbered as in the per-release notes):

2. Restore functionality of the -A option, which forces Smith-Waterman final display alignments with DNA (normally banded Smith-Waterman is used).
4. Changes to annotation scripts for the Pfam shutdown; new ann_pfam_www.py, ann_pfam_sql.py.
1. Enable translation table -t 9 for Echinoderms. This bug has existed since alternate translation tables were first made available.
1. Add an option, -Xg, that preserves the gi|12345 string in the score summary and alignment output.
SIMDe macro definitions that allow the smith_waterman_sse2.c, global_sse2.c, and glocal_sse2.c code to be compiled on non-Intel architectures (currently tested on ARM/NEON). Many thanks to Michael R. Crusoe for the SIMDe code conversion, and to Evan Nemerson for creating SIMDe.
2. The code to read FASTA format sequence files now ignores lines with ' ' at the beginning, for compatibility with PSI Extended FASTA Format (PEFF) files.
1. Modifications to support makeblastdb format v5 databases. Currently, only simple database reads have been tested.
2. New script for extracting DNA sequences from genomes (scripts/get_genome_seq.py). Currently works with human (hg38), mouse (mm10), and rat (rn6).
2. New features: Both query and library/subject sequences can be generated by specifying a program script, either by putting a ! at the start of the query/subject file name, or by specifying library type 9. Thus,

    fasta36 !../scripts/get_protein.py+P09488+P30711 /seqlib/swissprot.fa

or

    fasta36 "../scripts/get_protein.py+P09488+P30711 9" /seqlib/swissprot.fa

will compare two query sequences, P09488 and P30711, to SwissProt, by downloading them from Uniprot using the get_protein.py script (which can download sequences using either Uniprot or RefSeq protein accessions). Often, the leading ! must be escaped from shell interpretation with \!.

New scripts that return FASTA sequences using accessions or genome coordinates are available in scripts/: get_protein.py, get_uniprot.py, get_up_prot_iso_sql.py and get_refseq.py. get_refseq.py can download either protein or mRNA RefSeq entries. get_up_prot_iso_sql.py retrieves a protein and its isoforms from a MySQL database. get_genome_seq.py extracts genome sequences using coordinates from local reference genomes (hg38 and mm10 included by default).

The scripts/ann_exons_up_www.pl and ann_exons_up_sql.pl scripts now include the option --gen_coord, which provides the associated genome coordinate (including chromosome) as a feature, indicated by ' ' (end of exon).

fasta-36.3.8h provides new scripts and modifications to the fasta programs that normalize the process of merging sub-alignment scores and region information into both FASTA and BLAST results. To move BLASTP towards FASTA with respect to alignment annotation and sub-alignment scoring:

1. annot_blast_btab2.pl --query query.file --ann_script annot_script.pl --q_ann_script annot_script.pl blast.btab_file blast.btab_file_ann (blast.btab_file_ann is a BLAST tabular file with one or two new fields: an annotation field and, optionally with --dom_info, a raw domain content field).
2. merge_blast_btab.pl --btab blast.btab_file_ann blast.html blast_ann.html (merges the annotations and domain content information in the blast.btab_file_ann file with the standard BLAST output file to produce annotated alignments).
3. In addition, rename_exons.py is available to rename exons (later other domains) in the subject sequences to match the exon labeling in the aligned query sequence.
4. relabel_domains.py can be used to adjust color sets for homologous domains.
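The script-generated query syntax described above (a leading ! followed by the script path and its "+"-joined arguments) can also be assembled programmatically. The following is a minimal Python sketch under stated assumptions: build_fasta36_argv and run_search are hypothetical helper names, the script name is written with underscores (an assumption about the repository's file naming), and the paths are the ones from the README example. Because the arguments are passed as a list and no shell is involved, the leading ! needs no escaping here.

```python
import subprocess
from typing import List, Sequence


def build_fasta36_argv(script: str, accessions: Sequence[str], library: str,
                       program: str = "fasta36") -> List[str]:
    """Build an argument list whose query is produced by a program script.

    The query argument has the form "!script+arg1+arg2", following the
    README's description of script-generated queries.
    """
    query = "!" + "+".join([script, *accessions])
    return [program, query, library]


def run_search(argv: List[str]) -> str:
    """Run the command and return its output (requires fasta36 to be installed)."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# Paths and accessions taken from the README example; adjust for a real setup.
argv = build_fasta36_argv(
    "../scripts/get_protein.py",
    ["P09488", "P30711"],
    "/seqlib/swissprot.fa",
)
```

Calling run_search(argv) would launch the search only on a system where fasta36 and the sequence library are actually present; the sketch itself only constructs the argument list.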

Article

Contents

History
Uses
Search method
Statistical significance
See also
References

== History ==
The original FASTP program was designed for protein sequence similarity searching. Because genetic information was expanding exponentially while the speed and memory of computers in the 1980s were limited, heuristic methods were introduced for aligning a query sequence against entire databases. FASTA, published in 1987, added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. The package contains several programs for aligning protein sequences and DNA sequences. Nowadays, increased computer performance makes it possible to search a database for local alignments using the Smith–Waterman algorithm.
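To make the contrast with heuristic searching concrete, the following is a minimal Python sketch of the Smith–Waterman local alignment score. It is not the FASTA package's implementation: the function name is hypothetical, and the scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions, whereas real searches typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Uses the standard dynamic-programming recurrence with a linear gap
    penalty, keeping only two rows of the score matrix in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment restart anywhere (local alignment).
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

The full recurrence visits every cell of the len(a) by len(b) matrix for each database sequence, which is the cost that heuristic methods were designed to avoid.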
