Abstract

This PhD project develops a theoretical framework for analyzing machine translation as a technology of bordering rather than a neutral instrument of linguistic mediation. Against dominant interpretations that define it as the automated transfer of equivalent meanings between languages, the project reconceptualizes machine translation as an epistemic and political process that produces boundaries, hierarchies, and distinct regimes of value. Contemporary language automation, from earlier statistical machine translation to current large language models, is situated within longer histories of measurement, abstraction, and the organization of labor.

Methodologically, the study advances what it calls epistemologies of translation (ways of analyzing how translation organizes knowledge, measurement, and linguistic difference), drawing on media theory, philosophy of language, semiotics, political economy, and translation theory. In particular, Juri Lotman’s concept of the semiosphere and its boundaries, together with Naoki Sakai’s critique of homolingual translation, is mobilized to reconstruct how translation mechanisms operate by both separating and connecting heterogeneous domains. This framework is further developed through Marxist and post-operaist accounts of linguistic and cognitive labor, and through contemporary labor theories of automation that interpret artificial intelligence as the abstraction and operationalization of semiotic decisions historically distributed across social cooperation. From this intersection of translation theory and automation theory, the study analyzes how language operates within AI systems and how linguistic labor is automated through them.

The study opens by reconstructing a material history of language automation, one that foregrounds labor and energy expenditure in contrast to structuralist paradigms that locate meaning solely in internal linguistic regularities. The argument then unfolds across three central problem fields:

  1. The taxonomy of “high-resource” and “low-resource” languages in natural language processing is analyzed as a regime of bordering that converts historical and geopolitical asymmetries into technical indices of differential productive capacity.
  2. Tokenization is examined as a metric infrastructure that discretizes language into computational units, and is theorized as a contemporary measure of linguistic labor (see the first sketch after this list).
  3. The predictive paradigm of contemporary language models is interpreted as a form of epistemic bordering and situated within a longer history of linguistic modeling and inferential techniques, from philosophical languages to vector space models and zero-shot translation (see the second sketch after this list).
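
To make the second problem field concrete, the following is a minimal sketch of tokenization as measurement. It assumes the open tiktoken library and its cl100k_base vocabulary as a stand-in for any subword tokenizer; the sample sentences and their translations are illustrative choices for this sketch, not material from the dissertation. Counting tokens for the “same” sentence across languages shows how a tokenizer meters linguistic difference as differential cost.

```python
# Minimal sketch: token counts for rough translations of one sentence.
# Assumes the tiktoken library (pip install tiktoken); cl100k_base stands
# in for any subword vocabulary. The sentences are illustrative only.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English": "The workers translated the document overnight.",
    "German": "Die Arbeiter übersetzten das Dokument über Nacht.",
    "Tamil": "தொழிலாளர்கள் ஆவணத்தை இரவோடு இரவாக மொழிபெயர்த்தனர்.",
}

for language, sentence in samples.items():
    # Scripts underrepresented in the vocabulary's training data tend to
    # fragment into more, shorter tokens per unit of meaning, so the same
    # content is metered at a higher computational (and billed) cost.
    n_tokens = len(enc.encode(sentence))
    print(f"{language:8s} {n_tokens:3d} tokens / {len(sentence)} characters")
```

On vocabularies trained largely on English-dominant web text, the non-Latin sample typically fragments into several times more tokens per character, which is the asymmetry the first problem field names as differential productive capacity.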
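The predictive paradigm named in the third problem field can likewise be reduced to its core gesture: scoring every token in a fixed vocabulary and normalizing those scores into a probability distribution over possible continuations. The sketch below uses invented logits over a toy vocabulary; no actual model is involved.

```python
# Minimal sketch of next-token prediction: hypothetical logits over a toy
# vocabulary are normalized with a softmax. Real models do this over tens
# of thousands of tokens at every generation step; the numbers here are
# invented for illustration only.
import math

vocabulary = ["border", "bridge", "measure", "labor"]
logits = [2.1, 0.3, 1.4, -0.5]  # invented scores, not from any model

# The softmax places every candidate continuation on one commensurable
# scale: prediction ranks the possible, which is the sense in which the
# dissertation speaks of epistemic bordering.
exps = [math.exp(x) for x in logits]
total = sum(exps)
for token, e in zip(vocabulary, exps):
    print(f"p({token!r}) = {e / total:.3f}")
```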

The study concludes by examining digital metrics and platformization in higher education as an institutional site where translation, measurement, and bordering converge.

Across these domains, the dissertation argues that translation is not merely one application of AI to language but an operative form that reveals a deeper organizational logic of large-scale data processing. Translation marks the point at which cognitive and linguistic labor becomes legible to measurement and automation. Long treated as a site of opacity within communication, translation emerges as an epistemic scaffold through which cognitive effort is rendered comparable and evaluable.