Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals