The current consensus in the digital humanities is that AI models are opaque but useful. In the visual digital humanities, for instance, multimodal approaches like CLIP increasingly facilitate tasks, such as image retrieval, that previously depended on metadata. At the same time, the move toward ever larger, and often proprietary, pre-trained systems amplifies a long-standing lack of theoretical reflection in the digital humanities. If large visual models like CLIP are indeed models of visual culture, how is (visual) culture actually “modeled”, that is, represented functionally? What, in other words, are large visual models models of? In this talk, I will attempt to answer this question by discussing common strands in current (technical and critical) research. More specifically, I will present results from a recent study on the representation of history in large visual models, which exposes several significant limitations in the “speculative” abilities of contemporary generative models in particular.
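
To make the retrieval claim concrete, the following is a minimal sketch of metadata-free image retrieval with CLIP, assuming the Hugging Face transformers implementation and the openai/clip-vit-base-patch32 checkpoint; the image paths and query text are hypothetical placeholders, not part of the study discussed above.

```python
# Minimal sketch: rank a small image collection against a free-text query
# using CLIP's shared text-image embedding space (no metadata required).
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical digitized collection and query.
paths = ["scan_001.jpg", "scan_002.jpg", "scan_003.jpg"]
images = [Image.open(p) for p in paths]
query = "a medieval manuscript illumination"

with torch.no_grad():
    # Embed the text query and the images into the same vector space.
    text_inputs = processor(text=[query], return_tensors="pt", padding=True)
    image_inputs = processor(images=images, return_tensors="pt")
    text_emb = model.get_text_features(**text_inputs)
    image_embs = model.get_image_features(**image_inputs)

# Cosine similarity between query and images yields the ranking.
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
image_embs = image_embs / image_embs.norm(dim=-1, keepdim=True)
scores = (image_embs @ text_emb.T).squeeze(-1)
for path, score in sorted(zip(paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{path}: {score:.3f}")
```

Ranking here depends entirely on the model's learned representation of visual culture, which is precisely why the question of what such models actually model matters.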