300px|thumb|right|upright=1.75|Demonstration of the use of DreamBooth to fine-tuning (machine learning)|fine-tune the [[Stable Diffusion v1.5 diffusion model, using training data obtained from Category:Jimmy Wales on Wikimedia Commons. Depicted here are algorithmically generated images of Jimmy Wales, co-founder of Wikipedia, performing bench press exercises at a fitness gym.]] DreamBooth is a deep learning generation model used to personalize existing text-to-image models by fine-tuning. It was developed by researchers from Google Research and Boston University in 2022. Originally developed u
300px|thumb|right|upright=1.75|Demonstration of the use of DreamBooth to fine-tuning (machine learning)|fine-tune the [[Stable Diffusion v1.5 diffusion model, using training data obtained from Category:Jimmy Wales on Wikimedia Commons. Depicted here are algorithmically generated images of Jimmy Wales, co-founder of Wikipedia, performing bench press exercises at a fitness gym.]] DreamBooth is a deep learning generation model used to personalize existing text-to-image models by fine-tuning. It was developed by researchers from Google Research and Boston University in 2022. Originally developed using Google's own Imagen text-to-image model, DreamBooth implementations can be applied to other text-to-image models, where it can allow the model to generate more fine-tuned and personalized outputs after training on three to five images of a subject.
==Technology== Pretrained text-to-image diffusion models, while often capable of offering a diverse range of different image output types, lack the specificity required to generate images of lesser-known subjects, and are limited in their ability to render known subjects in different situations and contexts. The methodology used to run implementations of DreamBooth involves the fine-tuning the full UNet component of the diffusion model using a few images (usually 3--5) depicting a specific subject. Images are paired with text prompts that contain the name of the class the subject belongs to, plus a unique identifier. As an example, a photograph of a [Nissan R34 GTR] car, with car being the class); a class-specific prior preservation loss is applied to encourage the model to generate diverse instances of the subject based on what the model is already trained on for the original class. Pairs of low-resolution and high-resolution images taken from the set of input images are used to fine-tune the super-resolution components, allowing the minute details of the subject to be maintained.
Discovered by embedding cosine similarity (sentence-transformers MiniLM, 384-dim).