Extract

Bredekamp was right to suggest that there is something especially difficult in identifying a painting of a chair. Computer scientists call this the cross-depiction problem. It was largely intractable before the early 2010s, because feature extractors designed to work on photographs did not generalize to other forms of picture making. Since then, neural networks have been getting better at cross-depiction image recognition. If ResNet, the winner of the 2015 ImageNet competition, is made to detect objects in drawings rather than photographs, its accuracy drops by more than half. By contrast, the accuracy of CLIP (Contrastive Language-Image Pretraining), a 2021 multimodal network from OpenAI, drops by just 16 percent from photographs to drawings, partly because its training dataset is more varied, less exclusively photographic, and twenty-five times larger.

Where does this leave art history? We are a decade into a completely new capacity to automate certain kinds of vision, and with it a fundamental reconfiguration of the relationship between digital images and information: between (some kinds of) seeing and knowing. But the small trickle of work which attempts to cross machine vision with art history has struggled to make any real impact on the rest of the discipline. …