Alessa Cross is on the founding team at Ventrilo AI.
Despite advances in multilingual modeling, most language AI systems remain anchored in English. Earlier generations, such as GPT-3, drew over 90% of their pretraining data from English sources, and even newer models inherit structural and linguistic biases shaped by English-dominant internet content. The result is a persistent gap between models that can speak many languages and those that can function across linguistic contexts.
Yet in enterprise settings, it remains common to treat these systems as “global-ready” out of the box. When we ask these models to serve users around the world, we’re regularly asking them to operate beyond their training context. The results may still be grammatically correct, but they’re often structurally or culturally misaligned.
If we want AI to be more than an English-speaking assistant with a multilingual dictionary, we must move past the illusion that translation equals localization. That begins with recognizing how deeply English-centric assumptions shape model behavior, and what it takes to build systems that scale across different languages and cultural domains.
The Cost Of Training In English First
Language models perform best in the languages they see most during training. As linguistic distance from English increases, so does performance degradation, especially in languages with different syntactic rules or limited training representation.
In lower-resource languages, models often misread tone or intent. A polite inquiry in Hindi or Japanese may be interpreted as vague or indecisive. Meanwhile, most benchmarks, evaluation datasets, annotation protocols and UX assumptions are designed around American English.
Voice assistants illustrate this well. Early versions of Siri struggled significantly with Arabic and regional dialects even as they performed strongly in English. The issue was that the model failed to reflect how speakers naturally structure requests.
In many cases, users were forced to adopt English-like phrasing just to be understood, which is a complete reversal of the human-computer relationship. In human-centered design, technology is meant to adapt to the user, not the other way around. Instead of augmenting communication, the system becomes another barrier to it.
Why Localization Is Not Translation
Translating prompts into other languages does not localize a product. Localization is a systems-level redesign that touches the underlying model assumptions, UX patterns and even architecture. Consider, for example:
• Design Accommodations: Sentence length and structure vary significantly by language. German UI text can be up to three times longer than its English equivalent, which often requires layout redesigns to prevent truncation or overlap.
• Tone And Register Calibration: What reads as efficient in American English might come across as curt or even disrespectful in Japanese. Nuanced tone calibration requires cultural familiarity.
• Contextual Expectations: A summarization tool for U.S. hospitals must parse SOAP notes and colloquial shorthand. The same tool in France will need to work with structured, coded documentation and adhere to GDPR constraints.
When these factors are ignored, linguistic gaps can grow into usability failures, and eventually into market rejection.
Why Domain Matters As Much As Language
Language is just one dimension. For enterprise users, especially in regulated sectors, functional alignment often matters more than linguistic fidelity.
In the U.S., enterprise AI may be expected to integrate with Salesforce and Slack. In China, workflows depend on WeChat Work or regional CRMs. In India, workflows might span WhatsApp and legacy accounting platforms. When AI ignores these distinctions, it becomes a tool that teams work around, rather than one that augments their productivity. In healthcare or finance, a mismatch can be far more costly.
Even low-level decisions, such as whether your retrieval system applies a single global index or language-specific pipelines, shape usability. How does it prioritize query routing when users include partial translations? These decisions impact latency and quality, even before any output is generated. Designing for global use means understanding not just language, but the lived context of work.
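To make the routing decision concrete, here is a minimal sketch of how a retrieval system might choose between language-specific indexes and a shared multilingual fallback. The index names, the confidence threshold and the pluggable `detect` helper are all assumptions for illustration, not a definitive implementation.

```python
# Hypothetical routing layer: send a query to a language-specific
# retrieval index when detection is confident, otherwise fall back
# to a shared multilingual index for mixed or low-resource inputs.

LANGUAGE_INDEXES = {"en": "index_en", "de": "index_de", "ja": "index_ja"}
GLOBAL_INDEX = "index_multilingual"
CONFIDENCE_THRESHOLD = 0.8  # assumption: tuned per deployment

def route_query(query: str, detect) -> str:
    """detect(query) -> (lang_code, confidence); the detector is pluggable
    so on-device and server-side implementations can be swapped freely."""
    lang, confidence = detect(query)
    if confidence < CONFIDENCE_THRESHOLD or lang not in LANGUAGE_INDEXES:
        # Uncertain, code-mixed or unsupported input: use the broad index
        # rather than forcing it through the wrong language pipeline.
        return GLOBAL_INDEX
    return LANGUAGE_INDEXES[lang]
```

The design choice worth noting is the fallback path: a wrong language-specific route is usually worse than a slightly slower multilingual one, so uncertainty defaults to breadth.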
Accepting The World’s Inputs
Most language models are trained on relatively clean, curated data. Real-world inputs are anything but. Especially in multilingual markets, users frequently mix languages and use non-standard grammar, local idioms, creative spelling and hybrid sentence structures. Systems need to account for:
• Robust normalization without flattening distinctions essential to meaning
• Fast, on-device language detection to route inputs to the right models
• Region-specific preprocessing, because what counts as “clean” input in one market might erase meaning in another
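The third point is easy to get wrong in code. As a sketch of region-specific preprocessing, the snippet below applies aggressive diacritic folding only where it is safe: stripping accents barely matters for English, but in German it collapses distinct words ("schon" vs. "schön"). The normalizer registry and the conservative default are assumptions for illustration.

```python
import unicodedata

def normalize_en(text: str) -> str:
    # Folding diacritics is usually safe for English input.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()

def normalize_de(text: str) -> str:
    # German umlauts are meaning-bearing, so only apply canonical
    # composition (NFC); never strip diacritics.
    return unicodedata.normalize("NFC", text).lower()

NORMALIZERS = {"en": normalize_en, "de": normalize_de}

def preprocess(text: str, lang: str) -> str:
    # Conservative default: when the language is unknown, prefer the
    # normalizer that preserves distinctions over the one that erases them.
    return NORMALIZERS.get(lang, normalize_de)(text)
```

The asymmetry is the point: "clean" input in one market is destroyed meaning in another, so the safe default preserves rather than flattens.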
For many users and global contexts, how well your AI handles edge cases is a prerequisite to usability.
The Feedback Loop As Core Architecture
No global AI product will be perfectly localized on day one, but systems that learn from localized usage improve quickly when the infrastructure supports it. Translation models built on user-generated content and corpora that closely mirror real-world usage consistently outperform models trained on broad, general-domain data. That argues for tuning your translations to what your users are actually saying, and for direct feedback loops that compound those gains over time.
Enabling that learning requires infrastructure that can:
• Capture user edits and corrections
• Cluster user abandonments and rephrased queries
• Feed localized usage patterns back into training
• Surface divergence between user expectations and model output
Often, what separates a brittle AI language product from a scalable platform is how effectively it can learn and improve from real usage.
Redefining ‘Global-Ready’
Global readiness is a question of whether your AI product feels intelligent and usable to users who don’t share your default assumptions. That requires investment across four distinct axes:
• Language: Beyond the translation layer, design systems that feel native in syntax, tone and usage.
• Domain: Align with local workflows, documentation styles, data structures and regulatory standards.
• Input: Engineer systems to understand how users actually speak and write.
• Feedback: Treat user interactions as training data. Build infrastructure that allows products to learn from localized usage, especially when they deviate from expected patterns.
AI products that ignore how global teams actually work tend to stall at the edge of familiar markets. Whether they scale into global workflows or stay locked in English-first regions often comes down to investments made early in design.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
