LLM Self-Recognition: Steering and Retrieving Activation Signatures

Recent advances in interpretability suggest that large language models (LLMs) implicitly encode signals in their generated text that enable self-recognition of their outputs. We demonstrate that this capability is reliable, even in low-entropy scenarios, and that it can be amplified through targeted intervention. By steering the internal residual stream during generation with a random sparse vector, we create a detectable fingerprint that enables attribution of a given text to a specific LLM. Th
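The steering idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it works on synthetic Gaussian vectors standing in for residual-stream activations rather than on a real LLM, and it skips the step of recovering the signature from generated text. The function names (`make_sparse_vector`, `steer`, `signature_score`), the dimensions, the steering strength `alpha`, and the projection-based score are all assumptions chosen for illustration.

```python
import math
import random


def make_sparse_vector(dim, k, seed, scale=1.0):
    """Return a length-`dim` list with exactly `k` nonzero entries of +/-scale.

    The seed acts as a hypothetical secret key identifying one model.
    """
    rng = random.Random(seed)
    vec = [0.0] * dim
    for idx in rng.sample(range(dim), k):
        vec[idx] = scale if rng.random() < 0.5 else -scale
    return vec


def steer(hidden, direction, alpha):
    """Additive steering: shift one hidden-state vector by alpha * direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def signature_score(hidden_states, direction):
    """Mean projection of the hidden states onto the unit-normalised direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    projections = [
        sum(h * d for h, d in zip(hidden, direction)) / norm
        for hidden in hidden_states
    ]
    return sum(projections) / len(projections)


def demo(dim=256, k=8, n_tokens=64, alpha=4.0, seed=0):
    """Compare scores for unsteered and steered synthetic activations."""
    rng = random.Random(seed + 1)
    direction = make_sparse_vector(dim, k, seed)
    # Synthetic stand-ins for per-token residual-stream activations (assumption).
    plain = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_tokens)]
    steered = [steer(h, direction, alpha) for h in plain]
    return signature_score(plain, direction), signature_score(steered, direction)


if __name__ == "__main__":
    plain_score, steered_score = demo()
    print(f"unsteered score: {plain_score:.3f}")
    print(f"steered score:   {steered_score:.3f}")
```

In this toy setting the steered score exceeds the unsteered one by `alpha` times the norm of the sparse vector, so a threshold between the two would separate them. Whether a comparable gap survives the round trip through sampled text in a real model is the empirical question the paper addresses, and this sketch does not test it.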

Paper · arXiv

cs.AI

Authors: Thibaud Ardoin, Jonas Schäfer, Gerhard Wunder
Published: 2026-06-04

via arXiv · 2606.06315