Ocrad is an optical character recognition program and part of the GNU Project. It is free software licensed under the GNU GPL.
Based on a feature extraction method, it reads images in the portable anymap (PNM) formats (PBM, PGM and PPM) and produces text in byte (8-bit) or UTF-8 encoding. It also includes a layout analyser that can separate the columns or blocks of text normally found on printed pages.
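To show what a portable anymap input looks like, here is a minimal sketch that serializes a small bitmap as a plain ("P1") PBM image, the bilevel member of the PNM family, where `1` marks a black pixel and `0` a white one. The function name `to_plain_pbm` and the sample bitmap are hypothetical illustrations, and the sketch assumes Ocrad accepts the plain (ASCII) PBM variant as well as the raw binary one.

```python
def to_plain_pbm(bitmap: list[list[int]]) -> str:
    """Serialize a bitmap (1 = black, 0 = white) as a plain "P1" PBM image.

    Hypothetical helper for illustration; not part of Ocrad.
    """
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    # Every row of a PBM image must have the same number of pixels.
    if any(len(row) != width for row in bitmap):
        raise ValueError("all rows must have the same width")
    # Header: magic number "P1", then width and height in pixels.
    lines = ["P1", f"{width} {height}"]
    # Raster: one text line per pixel row, pixels separated by spaces.
    for row in bitmap:
        lines.append(" ".join(str(px) for px in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical 3x3 bitmap of a vertical bar.
    bar = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
    print(to_plain_pbm(bar), end="")
```

Saved to a file with a `.pbm` extension, output like this could then be given to `ocrad` as an input image.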