SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM (Sequence Alignment/Map), BAM (Binary Alignment/Map) and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA. Both simple and advanced tools are provided, supporting complex tasks like variant calling and alignment viewing as well as sorting, indexing, data extraction and format conversion. SAM files can be very large (tens of Gigabytes is common), so compression is used to save space. SAM files are human-readable text files, an
Discovered by embedding cosine similarity (sentence-transformers MiniLM, 384-dim).