type of machine learning where an agent learns how to behave in an environment by performing actions and receiving rewards or penalties in return, aiming to maximize the cumulative reward over time
Discovered by embedding cosine similarity (sentence-transformers MiniLM, 384-dim).