
Translation Needs Experts: Why “Language-to-Language Mapping” Isn’t Enough

Most people talk about translation as if it’s a simple lookup problem:
English in, French out. Swap the words, keep the meaning, done.


But anyone who’s worked on real translation—especially in government, defence, or regulated environments—knows the truth:

Translation is not word replacement. It’s structure, syntax, intent, tone, and constraints—all happening at once.

And if you’re translating inside an organization where language is operational (bilingual compliance, policy accuracy, format preservation, acronyms, controlled terminology), you don’t just need a model that knows words.

You need a system with experts.

That’s why we’ve been building a different kind of translation engine inside CastleGuard AI—one where the model learns to route parts of a sentence to the right specialist, not just generate the next token with a single monolithic brain.


Why translation is fundamentally an “expert problem”

A sentence isn’t a bag of words. It’s a hierarchy of decisions:

  • Where does the sentence begin and how is emphasis signaled?

  • What clause is being continued, and what does it attach to?

  • Which grammatical ending is required (tense, number, agreement)?

  • What boundaries or control tags define the translation context?

  • How do you preserve formatting and structure without damaging meaning?

In practice, translation requires coordinated competence across micro-skills:

  • Morphology (word endings, inflections, agreement)

  • Syntax (ordering, clause structure, dependencies)

  • Discourse (flow across sentences, connective tissue)

  • Boundaries and control (tags, formatting markers, segmentation)

  • Terminology discipline (acronyms, domain phrases, “don’t translate this” rules)

Human translators do this naturally by switching between modes—sometimes focusing on terminology, sometimes on grammar, sometimes on tone and structure.

So we asked: why shouldn’t the model work the same way?


Our approach: a model that learns who should handle what

In our latest architecture, we use a Mixture-of-Experts (MoE) design with:

  • 12 experts in the encoder

  • 12 experts in the decoder

Then we let training do what training does best: discover the division of labour.
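
To make “routing” concrete, here is a minimal sketch of a top-k MoE feed-forward layer of the kind described above. Our production architecture isn’t published in this post, so the dimensions, top-k value, and class names below are illustrative assumptions, not our shipped code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-k Mixture-of-Experts layer (illustrative sketch)."""

    def __init__(self, d_model=512, d_ff=2048, n_experts=12, top_k=2):
        super().__init__()
        # One small feed-forward "specialist" per expert.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                          # x: (batch, seq, d_model)
        scores = self.router(x)                    # (batch, seq, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = chosen[..., slot]                # which expert each token picked
            w = weights[..., slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = idx == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out
```

Note that nothing in the code says “expert 4 handles morphology”: the division of labour emerges entirely from the routing gradients during training.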

After roughly 3 hours of training on a single RTX 4090, we saw something that matters far more than a benchmark score at this stage:

The model began to specialize experts by sentence function.

Not “topic experts” yet (that comes later), but something arguably more foundational:

structural experts—the mechanics of producing correct language.

That’s exactly what you want early, because it’s the base layer of translation quality: grammar, boundaries, continuity, and control.


What specialization looks like in practice

Here’s what emerged in this run: experts that reliably dominate specific sentence roles—confirmed by where they activate and the n-gram patterns they “own”.
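
For readers curious how this kind of attribution works: if you log which expert the router selects for each token, you can tally the n-grams each expert “owns”. The function and sample data below are hypothetical, but the method mirrors the analysis described here:

```python
from collections import Counter, defaultdict

def attribute_ngrams(token_expert_pairs, n=2):
    """Tally which n-grams each expert 'owns', given (token, expert_id)
    pairs logged from the router during a forward pass. Sketch only."""
    tokens = [tok for tok, _ in token_expert_pairs]
    experts = [eid for _, eid in token_expert_pairs]
    tallies = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        ngram = " ".join(tokens[i : i + n])
        # Credit the n-gram to the expert that handled its first token.
        tallies[experts[i]][ngram] += 1
    return tallies

# Hypothetical router log for one sentence: (token, winning expert id).
pairs = [("Over", 3), ("time", 3), (",", 7), ("the", 0), ("model", 0), ("improves", 5)]
for expert_id, counts in sorted(attribute_ngrams(pairs).items()):
    print(expert_id, counts.most_common(3))
```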

Sentence-start & emphasis expert

Handles capitalized openings and emphasis phrasing (e.g., “The…”, “Over time…”)—critical for maintaining natural tone and rhythm.

Morphology experts

Specialize in endings like pluralization, gerunds, participles, and technical suffixes (e.g., -ing, -ents, -tion, -ment).
This matters because French agreement, verb forms, and noun gender aren’t optional—errors here destroy credibility fast.

Boundary & control-token experts

Specialize in transitions like language tags, separators, and prompt boundaries.
This is essential for production translation systems where you must preserve structure, formatting markers, and document segmentation.
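
One common way to get this behaviour in a production pipeline (a sketch of the general technique, not necessarily our exact implementation) is to stash markup behind opaque placeholders the model is trained to copy through unchanged, then restore them after translation:

```python
import re

# Matches HTML-style tags and {{template}} tokens; extend for your formats.
TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>|\{\{[^}]+\}\}")

def protect_tags(text):
    """Swap formatting markers for opaque placeholders. Sketch only."""
    tags = []
    def stash(match):
        tags.append(match.group(0))
        return f"⟦TAG{len(tags) - 1}⟧"
    return TAG_RE.sub(stash, text), tags

def restore_tags(text, tags):
    """Put the original markers back after translation."""
    for i, tag in enumerate(tags):
        text = text.replace(f"⟦TAG{i}⟧", tag)
    return text

masked, tags = protect_tags("Press <b>Start</b> to begin.")
# masked == "Press ⟦TAG0⟧Start⟦TAG1⟧ to begin."  ->  translate(masked)
print(restore_tags("Appuyez sur ⟦TAG0⟧Démarrer⟦TAG1⟧ pour commencer.", tags))
```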

Discourse glue experts

Handle connective phrasing like “In our…”, “For one…”, “at the same time…”.
This is how you avoid translations that feel like sentence fragments stitched together.

Primary continuation engine

As in most MoE models, a “main fluency” expert emerged that carries common function words and keeps generation coherent.

The key point: the system is learning to route—not just generate.

That’s the difference between a model that outputs text and a model that behaves like a translation workflow.


Why this matters for real-world translation (especially government)

In government and defence environments, translation isn’t a convenience feature—it’s a workload at scale:

  • bilingual documentation

  • policy and standards

  • procurement, HR, operational writing

  • strict terminology requirements

  • formatting preservation (Word/PowerPoint/PDF workflows)

CastleGuard was designed for exactly this reality: on-premise, air-gapped, and sovereign—so translation can happen where the data already lives, without sending sensitive content to external services.

And because our translation stack is built to learn organizational language (acronyms, phrases, naming conventions), the human translator’s job shifts from re-creating everything to reviewing and refining—the highest-value part of translation.
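
As a small illustration of what terminology discipline can look like in review tooling (the term list and helper below are hypothetical, not our shipped pipeline):

```python
# Hypothetical protected-term list; real deployments load this per organization.
DO_NOT_TRANSLATE = {"CastleGuard", "Evia", "MoE", "RTX 4090"}

def terminology_check(source: str, translation: str) -> list[str]:
    """Flag protected terms present in the source but missing from the
    output, so a reviewer sees exactly where to look. Sketch only."""
    return [term for term in DO_NOT_TRANSLATE
            if term in source and term not in translation]
```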

This is also why we built Evia, our French-Canadian model optimized for Government of Canada terminology and bilingual realities.


The bigger idea: controllable translation, not just “better translation”

A monolithic model is hard to govern. When it fails, it fails as a whole.

Experts give you something different:

  • Interpretability: you can observe which specialist handled what

  • Control: you can encourage or constrain certain behaviours

  • Efficiency: specialists can be smaller and faster than one huge generalist

  • Adaptation: you can teach one expert a new skill without retraining everything (see the sketch after this list)
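
As a concrete illustration of that last point, here is a minimal sketch (reusing the hypothetical MoELayer from earlier) of teaching one expert in isolation by freezing everything else:

```python
import torch.nn as nn

def finetune_single_expert(model: nn.Module, expert_id: int):
    """Freeze all weights, then unfreeze one expert so a new skill
    (say, one department's terminology) can be taught without touching
    the rest of the model. Assumes MoE layers expose an `experts` list."""
    for p in model.parameters():
        p.requires_grad = False
    for module in model.modules():
        if hasattr(module, "experts"):
            for p in module.experts[expert_id].parameters():
                p.requires_grad = True
    # Train as usual: only the chosen expert receives gradient updates.
```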

Over time, structural experts can evolve into domain experts (policy, legal, technical), but the foundation has to come first.

And in this run, we’re already seeing the foundation being built.


Why we’re excited about what happened on a single 4090

Because it demonstrates something important about the direction we’re taking:

You don’t need a massive cloud training run to begin building specialization.

With the right architecture, the model learns to split language into functional responsibilities quickly—and that specialization is exactly what translation requires.

It’s a practical path to:

  • better grammatical correctness

  • stronger formatting/control fidelity

  • more consistent tone and flow

  • safer, more governable deployment in closed networks

Which is the entire point of CastleGuard AI.