restructuring data into a desired format; the process of transforming and mapping data from a "raw" form into another format to make it more appropriate and valuable for downstream purposes such as analytics
Discovered by embedding cosine similarity (sentence-transformers MiniLM, 384-dim).
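
As a rough illustration of the definition above, the sketch below maps a few "raw" records onto a consistently typed format that a downstream analytics step could consume. The sample rows, the field names, the two accepted date formats, and the helper names (`parse_date`, `wrangle`, `wrangle_all`) are all hypothetical and chosen only for this example; real wrangling pipelines vary widely.

```python
from datetime import datetime

# Hypothetical raw records: inconsistent casing, stray whitespace,
# mixed date formats, and numbers stored as strings.
RAW_ROWS = [
    {"Name": "  Alice ", "signup": "2023-01-05", "spend": "120.50"},
    {"Name": "BOB", "signup": "05/02/2023", "spend": "80"},
    {"Name": "carol", "signup": "2023-03-10", "spend": ""},
]

# Assumed input date formats (ISO and day/month/year).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Try each assumed format; return an ISO date string, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def wrangle(row):
    """Map one raw record onto a cleaned, consistently typed record."""
    spend = row.get("spend", "").strip()
    return {
        "name": row["Name"].strip().title(),
        "signup_date": parse_date(row.get("signup", "")),
        "spend": float(spend) if spend else None,  # empty string becomes None
    }


def wrangle_all(rows):
    """Apply the per-record mapping to every raw row."""
    return [wrangle(r) for r in rows]


clean_rows = wrangle_all(RAW_ROWS)
```

Each cleaned record has normalized names, ISO-formatted dates, and numeric spend values, with missing values represented as `None` rather than empty strings.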