AI Can Now Copy Text Style in Images Using Just a Single Word

  • We’re introducing TextStyleBrush, an AI research project that can copy the style of text in a photo using just a single word. With this AI model, you can edit and replace text in images.
  • Unlike most AI systems that can do this for well-defined, specialized tasks, TextStyleBrush is the first self-supervised AI model that replaces text in images of both handwriting and scenes — in one shot — using a single example word.
  • Although this is a research project, it could one day unlock new potential for creative self-expression like personalized messaging and captions, and lays the groundwork for future innovations like photo-realistic translation of languages in augmented reality (AR).
  • By publishing the capabilities, methods, and results of this research, we hope to spur dialogue and research into detecting potential misuse of this type of technology, such as deepfake text attacks — a critical, emerging challenge in the AI field.

AI image generation has been advancing at breakneck speed: systems can synthetically reconstruct historical scenes or change a photo to resemble the style of Van Gogh or Renoir. Now, we’ve built a system that can replace text in both scenes and handwriting, using only a single example word as input.

While most AI systems can do this for well-defined, specialized tasks, building an AI system that’s flexible enough to understand the nuances of both text in real-world scenes and handwriting is a much harder AI challenge. It means understanding unlimited text styles, covering not just different typography and calligraphy but also different transformations, such as rotations, curved text, and the deformations that happen between pen and paper in handwriting, as well as background clutter and image noise. Because of these complexities, it’s not possible to neatly segment text from its background, nor is it reasonable to create annotated examples of every possible appearance of every letter of the alphabet and every digit.

Today, we’re introducing TextStyleBrush, the first self-supervised AI model that replaces text in existing images of both scenes and handwriting — in one shot — using just a single example word. The work will also be submitted to a peer-reviewed journal.

It works similarly to the style brush tools in word processors, but for text aesthetics in images. It surpasses state-of-the-art accuracy in both automated tests and user studies for any type of text. Unlike previous approaches, which define specific parameters such as typeface or target style supervision, we take a more holistic training approach and disentangle the content of a text image from all aspects of the appearance of the entire word box. The representation of the overall appearance can then be applied as a one-shot transfer, without retraining on the novel source style samples.

By openly publishing this research, we hope to spur additional research and dialogue aimed at preempting deepfake text attacks, in the same way that we do with deepfake faces. If AI researchers and practitioners can get ahead of adversaries in building this technology, we can learn to better detect this new style of deepfakes and build robust systems to combat them. While this technology is still research, it could power a variety of useful applications in the future, like translating text in images to different languages, creating personalized messaging and captions, and maybe one day facilitating real-world translation of street signs using AR.

Example of TextStyleBrush replacing text on handwritten signs at a fruit stand.

Lower Barriers to the Study of Deepfake Text

TextStyleBrush proves that it’s possible to build AI systems that can learn to transfer text aesthetics with more flexibility and accuracy than was possible before, using a one-word example. We’re continuing to improve our system by working through some limitations that we’ve run into, like text written on metallic objects or characters in different colors.

We hope this work will continue to lower barriers to photorealistic translation, creative self-expression, and the study of deepfake text attacks.

As the self-supervised revolution continues to progress, we see it as imperative that the AI field openly facilitate research into detecting misuse of technology. This includes moving beyond fake faces to text and sharing benchmark data sets, such as the Deepfake Detection Challenge data set. We hope that by openly publishing our work and methods for synthetically generated text styles, the broader AI field will be able to build on this work and make cumulative forward progress.

This is an abbreviated version of the original article that appeared on the Facebook AI blog, which includes technical details on how TextStyleBrush works.