Raphael and the de Brécy Tondo Madonna: When One AI Gets Ahead of the Evidence
In early 2023, the art world was jolted by what sounded like a once-in-a-generation discovery. The de Brécy Tondo Madonna, a small devotional painting long admired but cautiously classified, was suddenly proclaimed a lost masterpiece by Raphael. The basis for this dramatic reattribution was artificial intelligence. Researchers at the University of Bradford announced that their AI system had identified a 95 percent similarity between the Madonna’s face and that of Raphael’s Sistine Madonna, housed in the Gemäldegalerie Alte Meister in Dresden. Headlines followed swiftly. Certainty, it seemed, had arrived by algorithm.
At Art Recognition, the announcement immediately raised questions. Not about Raphael, nor about the painting’s quality, but about methodology. We therefore conducted an independent analysis of the de Brécy Tondo Madonna with our own AI system, and the result pointed in the opposite direction: the model classified the painting as not autograph by Raphael, with a probability of 85 percent. Rather than settling the debate, AI had now split it wide open.
The reason for this divergence lies in fundamentally different operational principles. The Bradford system was based on facial recognition. This kind of AI is trained to identify similarities between faces across variations in angle, lighting, age, or image quality. It is the technology used to unlock smartphones or sort photo libraries. Crucially, it is not designed for art authentication.
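To make the distinction concrete, here is a minimal, hypothetical sketch of how a face-recognition pipeline scores resemblance: each face is reduced to an embedding vector by a pretrained network, and the score is a geometric comparison of the two vectors. The random vectors below stand in for real embeddings; none of this is the Bradford system itself.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for the embeddings a face-recognition network would produce for
# the two Madonna faces; in practice they come from the model, not from a RNG.
rng = np.random.default_rng(0)
embedding_tondo = rng.normal(size=512)
embedding_sistine = embedding_tondo + rng.normal(scale=0.2, size=512)  # a very similar face

print(f"facial similarity: {cosine_similarity(embedding_tondo, embedding_sistine):.1%}")
# A high score means the two faces look alike. It says nothing about who
# painted them, which is the crux of the methodological objection.
```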
When applied to the de Brécy Tondo Madonna and the Sistine Madonna, the Bradford system did exactly what it was built to do. It identified that the two faces were very similar. This is hardly surprising. Renaissance artists routinely worked within shared ideals of beauty, especially when depicting the Virgin Mary. Similarity of facial type is a feature of the period, not proof of authorship. The leap from “these faces resemble each other” to “this painting is undoubtedly by Raphael” is not supported by the logic of the tool itself.
In effect, the AI output was overinterpreted. A facial resemblance was transformed into an attribution claim, and that claim was amplified by the press without a clear understanding of what the technology could, and could not, legitimately conclude. The result was a compelling story, but not a scientifically grounded one.
Art Recognition’s approach operates on a different level. Rather than isolating faces, our models analyze a constellation of artistic features: brushwork, chromatic behavior, compositional structure, object placement, and spatial relationships. The training datasets are assembled by teams of art historians and AI developers and include not only all known authentic works by an artist, but also a wide range of negative examples. These include forgeries, imitations, and stylistically adjacent works, allowing the system to learn not just what an artist does, but what others do when they attempt to imitate that artist.
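As a rough illustration of that framing, and not of Art Recognition’s actual model or data, authentication can be posed as a binary classifier trained on feature vectors from securely attributed works and from negative examples such as forgeries and imitations; for a disputed work it returns a probability, not a verdict. The features and numbers below are synthetic stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Pretend every painting has been summarized as a 64-dimensional feature vector
# capturing brushwork, color behavior, and composition (random stand-ins here).
authentic = rng.normal(loc=0.0, size=(100, 64))   # securely attributed works
negatives = rng.normal(loc=0.6, size=(150, 64))   # forgeries, imitations, followers

X = np.vstack([authentic, negatives])
y = np.concatenate([np.ones(len(authentic)), np.zeros(len(negatives))])  # 1 = autograph

clf = LogisticRegression(max_iter=1000).fit(X, y)

# For a disputed work the model returns a probability of being autograph,
# not a verdict; its reliability depends on what the negatives actually cover.
disputed = rng.normal(loc=0.4, size=(1, 64))
print(f"P(autograph): {clf.predict_proba(disputed)[0, 1]:.0%}")
```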
In response to criticism of their initial methodology, the Bradford group later developed a second AI model, more closely aligned with the principles used by Art Recognition. This system again returned an “authentic” result. But here, another critical difference emerged. The Bradford model was trained on just forty-nine images. Art Recognition’s Raphael dataset comprised more than one hundred. More importantly, Bradford’s negative examples did not include imitations of Madonna compositions, whereas our dataset deliberately incorporated fake Madonnas alongside imitations of other Raphael-related motifs, such as allegories and religious scenes.
This distinction matters. AI systems can only learn from what they are shown. Gaps in training data translate directly into gaps in understanding. When an AI has never been exposed to convincing imitations of a particular motif, it lacks the comparative framework necessary to evaluate them critically. The resulting conclusions may appear confident, but that confidence rests on incomplete knowledge.
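A toy experiment with synthetic data makes the point: a classifier whose negatives include close imitations of the Madonna type learns to be wary of them, while an otherwise identical classifier trained without them stays confidently wrong. Everything below is illustrative, not either team’s dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
dim = 32
authentic = rng.normal(loc=0.0, size=(100, dim))           # genuine works
other_imitations = rng.normal(loc=1.5, size=(80, dim))     # imitations of other motifs
madonna_imitations = rng.normal(loc=0.3, size=(80, dim))   # close imitations of the Madonna type

def train(negatives: np.ndarray) -> LogisticRegression:
    """Fit an autograph-vs-not classifier with the given negative examples."""
    X = np.vstack([authentic, negatives])
    y = np.concatenate([np.ones(len(authentic)), np.zeros(len(negatives))])
    return LogisticRegression(max_iter=1000).fit(X, y)

with_madonnas = train(np.vstack([other_imitations, madonna_imitations[:60]]))
without_madonnas = train(other_imitations)

# Held-out imitation Madonnas that neither model has seen at test time:
test_imitations = madonna_imitations[60:]
print("P(autograph), negatives include imitation Madonnas:",
      round(float(with_madonnas.predict_proba(test_imitations)[:, 1].mean()), 2))
print("P(autograph), negatives omit imitation Madonnas:   ",
      round(float(without_madonnas.predict_proba(test_imitations)[:, 1].mean()), 2))
```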
In light of the controversy, Art Recognition made a deliberate choice. We released our full Raphael training dataset publicly, including images and documentation, so that others could examine, critique, and build upon it. Transparency is not a public relations gesture. It is a prerequisite for credibility when AI is used in a field as consequential as art authentication. The dataset is freely available here.
The de Brécy Tondo Madonna debate illustrates a central truth about AI in the art world. The technology itself is neither savior nor saboteur. Its conclusions are only as rigorous as its design, its data, and the care with which its results are interpreted. As AI’s influence grows, so too does the need for accountability, peer scrutiny, and methodological clarity. Without those, algorithms risk becoming oracles. With them, they become what they should be: powerful tools in a much larger conversation about how we decide what is real.