Translation File Formats

Every File Format, Handled

From Word documents and InDesign layouts to XLIFF strings files and SRT subtitles, BeTranslated translates every format with the original structure preserved and ready to use.

18
Formats Supported
100+
Languages
20+
Years Experience
By Mike Bastin · Founder and CEO
Last updated May 19, 2026
Why File Formats Matter

A File Is Not Text. It Is Structure With Text Inside.

The most expensive translation mistake is treating every source as plain text. A Microsoft Word DOCX contains paragraph styles, section breaks, footnotes, embedded objects, and revision metadata. An Adobe InDesign IDML carries master pages, anchored frames, character-style overrides, and language-tagged paragraphs that drive hyphenation. An XLIFF 2.1 file carries source segments, translator notes, locked variables, and inline placeholders that, if broken, can crash the build pipeline of the application that consumes it.

Format-preserving translation is the discipline of working inside that structure rather than around it. The same approach underpins our technical translation work. A professional CAT tool (Trados Studio, memoQ, Phrase TMS, Smartcat) parses the source format with a dedicated filter, exposes only the translatable strings to the translator, locks every tag and placeholder, and reassembles a target file that opens cleanly in the originating application. The difference between this and a copy-paste workflow is the difference between a file you ship and a file you rebuild.
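To make the idea of locked tags and placeholders concrete, here is a minimal Python sketch of the kind of check a QA step might run on a translated segment. It is an illustration only: the placeholder syntaxes covered by `PLACEHOLDER_RE` and the function names are assumptions, not the behavior of any particular CAT tool.

```python
import re
from collections import Counter

# Assumed placeholder syntaxes for this sketch: {name}, %s / %d / %1$s,
# and XML-like inline tags such as <b> or </b>.
PLACEHOLDER_RE = re.compile(r"\{[A-Za-z0-9_]*\}|%(?:\d+\$)?[sd]|</?[A-Za-z][^>]*>")


def placeholders(text: str) -> Counter:
    """Count every placeholder or inline tag found in a segment."""
    return Counter(PLACEHOLDER_RE.findall(text))


def check_segment(source: str, target: str) -> list[str]:
    """Report placeholders that were dropped from or added to the target."""
    src, tgt = placeholders(source), placeholders(target)
    problems = []
    for token in sorted(src - tgt):
        problems.append(f"missing in target: {token}")
    for token in sorted(tgt - src):
        problems.append(f"unexpected in target: {token}")
    return problems


if __name__ == "__main__":
    print(check_segment("Hello {name}", "Bonjour {nom}"))
```

A segment whose placeholders all survive returns an empty list; a renamed or deleted placeholder is reported before the file reaches a build pipeline.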

The same discipline applies across every format we handle: Word, Excel, PowerPoint, and PDF on the office side; Adobe InDesign, FrameMaker, Illustrator, and AutoCAD on the print and engineering side; XLIFF, JSON, YAML, ARB, .strings, .resx, and PO on the localization-engineering side for website translation; and SRT, VTT, TTML, and SBV on the audiovisual side. Each one has a translation-aware filter, a quality-control protocol, and a delivery format that is ready for the application that will read it.
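As a small illustration of what a translation-aware filter does on the audiovisual side, the Python sketch below rewrites only the text lines of an SRT file and leaves cue numbers and timecodes untouched. The `translate` callback and the sample cues are hypothetical stand-ins; a production filter handles many more edge cases.

```python
import re

# An SRT timing line looks like: 00:00:01,000 --> 00:00:03,500
TIMECODE_RE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

SAMPLE_SRT = (
    "1\n00:00:01,000 --> 00:00:03,500\nHello there.\n\n"
    "2\n00:00:04,000 --> 00:00:06,000\nSee you soon.\n"
)


def translate_srt(srt_text: str, translate) -> str:
    """Apply `translate` to subtitle text only; keep indexes and timings as-is."""
    out_blocks = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or not TIMECODE_RE.match(lines[1]):
            out_blocks.append(block)  # unrecognized block: pass through untouched
            continue
        index, timing, text_lines = lines[0], lines[1], lines[2:]
        translated = [translate(line) for line in text_lines]
        out_blocks.append("\n".join([index, timing, *translated]))
    return "\n\n".join(out_blocks) + "\n"


if __name__ == "__main__":
    # str.upper stands in for a real translation step.
    print(translate_srt(SAMPLE_SRT, str.upper))
```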

The Professional Toolkit

CAT Tools and Industry Standards

The software and standards behind every BeTranslated delivery. Translation memory, terminology, and interchange formats are portable: every TMX and TBX we generate belongs to your account.

Computer-Assisted Translation (CAT) Tools

Trados Studio (formerly SDL Trados Studio)

The industry-standard CAT tool. Native handling of SDLXLIFF, IDML, FrameMaker MIF, and 50+ filters.

memoQ

Server-based translation environment used in regulated sectors (medical, legal, finance) with full XLIFF 2.1 support.

Phrase TMS

Cloud TMS (formerly Memsource) with API and connector ecosystem for Drupal, WordPress, Contentful, Figma, GitHub.

Smartcat

Cloud platform with built-in translator marketplace. Strong on JSON, YAML, ARB and modern web/app strings files.

Wordfast

Lightweight TMX-first toolkit popular for freelance and DTP-heavy projects with Word and InDesign sources.

OmegaT

Open-source CAT tool. Supported as an interchange path when client tooling demands plain XLIFF or TMX outputs.

Open Standards and Interchange Formats

XLIFF 2.1 (OASIS)

The OASIS XML interchange format. The lingua franca for handing translatable strings between CMS, CAT tools, and TMS.

TMX 1.4b (LISA)

Translation Memory eXchange: lets translation memories move between Trados, memoQ, Phrase, and Wordfast without loss.

TBX (ISO 30042)

TermBase eXchange. The ISO 30042 standard for sharing client glossaries and terminology databases across vendors.

ITS 2.0 (W3C)

W3C Internationalization Tag Set. Marks translatability, locale rules, and confidentiality directly inside HTML and XML.

Unicode UTF-8 / UTF-16

Character encodings that are non-negotiable for CJK, RTL Arabic and Hebrew, Cyrillic, Greek, and mixed-script source files.

Workflow Aligned with ISO 17100

International standard for translation services workflow. BeTranslated works in alignment with the ISO 17100 process.

Country and Locale Variants

Where File Formats Meet Local Conventions

The same DOCX or IDML behaves differently in different countries: paper size, date and number formats, official languages per territory, and locale-specific spelling all change the deliverable. We localize for the country and the format together.

United Kingdom

en-GB

British English spelling (organisation, colour, recognise) and DD/MM/YYYY dates. A4 paper. GBP currency. Layout templates routinely expect the £ sign before the amount and Royal Mail address blocks. Default Word and InDesign installations in the UK ship with A4 page setup; US source files at Letter need a re-flow, not just a translation.

Belgium

nl-BE / fr-BE / de-BE

Officially trilingual: Dutch (Flemish), French (Walloon), and German. Same DOCX or IDML source, three target locales. nl-BE differs from nl-NL in vocabulary and tone (e.g. Flemish tends to keep the formal u where Netherlands Dutch uses je). fr-BE and fr-FR differ on numerals (septante and nonante in Belgium) and on Belgian legal and administrative terminology. Source files are typically delivered as one master IDML translated into three locales.

Netherlands

nl-NL

Netherlands Dutch, DD-MM-YYYY dates, comma decimal separator (1.234,56), euro sign after the amount in informal use. A4 paper. Strong preference for sentence-case headings in marketing copy. Many NL clients ship XLIFF exports from a headless CMS (Contentful, Storyblok) for product strings.

Germany

de-DE

German routinely expands 25-35 percent versus English. Compound nouns (Datenschutz-Grundverordnung) need DTP reflow in IDML and FrameMaker MIF. Comma decimal separator, period thousand separator, DD.MM.YYYY dates. GDPR is referenced under its German name (Datenschutz-Grundverordnung, DSGVO) in legal and privacy copy for de-DE output. Capitalization of all nouns is non-negotiable. ß is used for the German market; Swiss German (de-CH) replaces it with ss.

Australia

en-AU

British-derived spelling with local divergences (program not programme; -ise endings). A4 paper. AUD currency. en-GB copy is often substituted for en-AU on assumption, then fails on idiom and product naming (e.g. capsicum not bell pepper). en-AU should be quoted and translated as a distinct locale, not as a side-effect of British English delivery.

United States

en-US

US English spelling, MM/DD/YYYY dates, US Letter paper (8.5 x 11 inches), which breaks pagination of A4-designed InDesign files. USD currency. ZIP codes in address blocks. Word and InDesign installs default to Letter page setup. Metric source content often needs conversion to customary units (inches, pounds, Fahrenheit) for marketing localization.

South Africa

en-ZA / af-ZA / zu-ZA + 8 more

Twelve official languages since South African Sign Language was recognized in 2023; the eleven written ones include English, Afrikaans, isiZulu, isiXhosa, Sesotho, and Setswana. en-ZA follows British conventions but with local vocabulary (robot for traffic light). A4 paper. ZAR currency. Government and NGO content is frequently produced in 4 to 6 of the official languages from one DOCX source. isiZulu and isiXhosa write their click consonants with plain Latin letters (c, q, x), while Tshivenda and Sepedi need careful Unicode handling for diacritics (ḓ, ḽ, ṋ, ṱ, š).

Morocco

ar-MA / fr-MA / ber

Three working languages: Modern Standard Arabic (ar-MA), French (fr-MA, the language of business and higher education), and Tamazight (ber, official since 2011, written in Tifinagh script). Source documents commonly need to ship in two or three locales. Arabic requires RTL handling in IDML (World-Ready Composer or Adobe InDesign ME). Tifinagh requires Unicode-compliant fonts (e.g. Noto Sans Tifinagh), which rarely ship with default InDesign installations.
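The date and number conventions listed in the country notes above can be captured in a small lookup table. The Python sketch below is a hypothetical illustration covering four of the locales; the `CONVENTIONS` table and function names are assumptions for this example, and the en-GB and en-US number separators are added from general knowledge rather than from the notes. A production workflow would normally rely on a full locale database such as CLDR.

```python
from datetime import date

# Hypothetical convention table built from the country notes above.
# Not a complete locale database.
CONVENTIONS = {
    "en-GB": {"date": "%d/%m/%Y", "decimal": ".", "thousands": ","},
    "en-US": {"date": "%m/%d/%Y", "decimal": ".", "thousands": ","},
    "nl-NL": {"date": "%d-%m-%Y", "decimal": ",", "thousands": "."},
    "de-DE": {"date": "%d.%m.%Y", "decimal": ",", "thousands": "."},
}


def format_date(d: date, locale_tag: str) -> str:
    """Format a date in the numeric pattern used by the given locale."""
    return d.strftime(CONVENTIONS[locale_tag]["date"])


def format_number(value: float, locale_tag: str) -> str:
    """Format a number with two decimals and locale-specific separators."""
    conv = CONVENTIONS[locale_tag]
    us_style = f"{value:,.2f}"  # e.g. 1,234.56
    # Swap separators through a temporary marker so they do not collide.
    return (
        us_style.replace(",", "\0")
        .replace(".", conv["decimal"])
        .replace("\0", conv["thousands"])
    )


if __name__ == "__main__":
    for tag in CONVENTIONS:
        print(tag, format_date(date(2026, 5, 19), tag), format_number(1234.56, tag))
```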

Common Questions

File-Format Translation FAQ

Why does a Word document need special handling for translation?

A Microsoft Word DOCX file is not text. It is text plus invisible structure: paragraph styles, section breaks, header and footer content, embedded tables, footnotes, comments, tracked changes, and revision history. A translator who delivers plain text leaves you to rebuild the document by hand. We translate inside the DOCX format using Trados Studio or memoQ filters as part of our document translation workflow, so the file you receive opens in Word with structure intact and ready to ship.
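For readers who want to see that structure, a DOCX is a ZIP package whose main body lives in `word/document.xml`. The Python sketch below builds a tiny in-memory stand-in for such a package (not a complete file that Word would open) and reads back each paragraph together with its style. The helper names and the `(default)` label are assumptions for this illustration; it is not how Trados Studio or memoQ filters are implemented.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_demo_docx() -> bytes:
    """Create a minimal stand-in package containing only word/document.xml."""
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>'
        "<w:r><w:t>Safety instructions</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Read before use.</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


def extract_paragraphs(docx_bytes: bytes) -> list[tuple[str, str]]:
    """Return (style, text) pairs so the text never loses its structure."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        style = p.find(f"{{{W_NS}}}pPr/{{{W_NS}}}pStyle")
        style_name = style.get(f"{{{W_NS}}}val") if style is not None else "(default)"
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        paragraphs.append((style_name, text))
    return paragraphs


if __name__ == "__main__":
    print(extract_paragraphs(build_demo_docx()))
```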

Can you translate scanned PDFs and image-based PDFs?

Yes. Image-based PDFs are run through OCR (ABBYY FineReader for production work) and rebuilt as either editable Word or InDesign source. Vector PDFs exported from InDesign or Illustrator are typically converted back to the source format (IDML or AI) so layout, fonts, and image links survive translation. The deliverable is a press-ready PDF that matches the original visual hierarchy.

What is XLIFF and why does the industry use it?

XLIFF (XML Localization Interchange File Format, OASIS standard, current version 2.1) is the universal interchange format for translatable strings. It carries source text, target text, segment status, translator notes, and locked variables in one structured XML file. Every modern CAT tool and TMS reads XLIFF natively, which means a project starting in Phrase TMS can finish in Trados Studio without manual re-import. For developers, XLIFF is the format of choice for mobile apps, web localization, and CMS exports.
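As a rough illustration of what that structure looks like to a developer, the Python sketch below reads a small hand-written XLIFF 2.x sample with the standard library. The sample content, unit ids, and the `read_units` helper are hypothetical; real exports usually carry inline codes and metadata that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# XLIFF 2.0 and 2.1 share this core namespace.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:2.0"}

# Hypothetical sample file for illustration.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.1" srcLang="en" trgLang="fr">
  <file id="f1">
    <unit id="greeting">
      <notes><note>Shown on the login screen.</note></notes>
      <segment state="translated">
        <source>Welcome back</source>
        <target>Bon retour</target>
      </segment>
    </unit>
    <unit id="logout">
      <segment state="initial">
        <source>Sign out</source>
      </segment>
    </unit>
  </file>
</xliff>"""


def read_units(xliff_text: str) -> list[dict]:
    """Return one row per segment: unit id, state, source and target text."""
    root = ET.fromstring(xliff_text)
    rows = []
    for unit in root.iterfind(".//x:unit", XLIFF_NS):
        for seg in unit.iterfind("x:segment", XLIFF_NS):
            target = seg.find("x:target", XLIFF_NS)
            rows.append({
                "id": unit.get("id"),
                "state": seg.get("state", "initial"),
                "source": seg.findtext("x:source", default="", namespaces=XLIFF_NS),
                "target": target.text if target is not None else None,
            })
    return rows


if __name__ == "__main__":
    for row in read_units(SAMPLE):
        print(row)
```

Segments without a target element come back with `target` set to `None`, which is a simple way to list what still needs translating.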

Will my Adobe InDesign layout still work after translation?

Yes. We translate from the IDML interchange format (Adobe InDesign Markup Language), which carries paragraph and character styles, master pages, anchored objects, and text frame links. The translated IDML opens in InDesign with layout preserved. Where target languages need expansion (German routinely runs 25-35 percent longer than English) or contraction (Chinese Simplified can shrink 40 percent), our DTP team reflows in InDesign so the final PDF matches the source design.
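The expansion and contraction figures above translate into simple arithmetic when estimating whether a text frame will overflow. The Python sketch below is a back-of-the-envelope illustration; the 400-character source length and 450-character frame capacity are made-up numbers, and real reflow decisions depend on fonts, hyphenation, and layout.

```python
def projected_length(source_chars: int, change: float) -> int:
    """Estimate target length from a relative change (0.30 = 30 percent longer)."""
    return round(source_chars * (1 + change))


def overflows(source_chars: int, change: float, frame_capacity: int) -> bool:
    """True when the projected target text no longer fits the frame."""
    return projected_length(source_chars, change) > frame_capacity


if __name__ == "__main__":
    # Hypothetical frame: 400 source characters, room for 450.
    print("German:", projected_length(400, 0.30), overflows(400, 0.30, 450))
    print("Chinese:", projected_length(400, -0.40), overflows(400, -0.40, 450))
```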

Do you support right-to-left and CJK source files?

Yes. Arabic, Hebrew, Persian, and Urdu are handled with the original RTL paragraph direction preserved (in IDML this requires the World-Ready Composer or Adobe InDesign ME). Chinese, Japanese, and Korean files are translated with full Unicode handling (UTF-8 or UTF-16 depending on source), correct font fallback for CJK glyphs, and vertical typesetting where the source uses it. We deliver a file that opens cleanly in your target locale's native software.
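For a sense of how paragraph direction can be detected from the text itself, the Python sketch below applies a simplified first-strong-character heuristic using the Unicode bidirectional classes in the standard library. It is an assumption-laden illustration, not the full Unicode Bidirectional Algorithm and not how InDesign determines direction.

```python
import unicodedata

# Strong right-to-left classes: R (Hebrew-type) and AL (Arabic-type letters).
RTL_CLASSES = {"R", "AL"}


def base_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            return "rtl"
        if bidi == "L":
            return "ltr"
    return "ltr"  # no strong character found: fall back to left-to-right


if __name__ == "__main__":
    for sample in ["مرحبا", "שלום", "Hello", "123 مرحبا", "你好"]:
        print(repr(sample), base_direction(sample))
```

Digits and spaces are directionally weak or neutral, so a string that starts with a number still resolves to the direction of its first real letter.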

Which file formats support translation memory and glossary reuse?

Every format we support feeds a TMX-format translation memory and a TBX-format termbase. That means every DOCX, IDML, XLIFF, JSON, or SRT file you send us trains a memory specific to your account. Repeat content on the next project is reused automatically (often 20-40 percent of a typical brand's content), reducing both turnaround time and cost. The TMX and TBX files belong to you and are portable to any other vendor.
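To show what reuse from a TMX file means in practice, the Python sketch below loads a small hand-written TMX sample and measures how many new segments already have an exact match. The sample memory, the segment list, and the helper names are hypothetical, and only exact matches are counted; CAT tools also score fuzzy matches, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-entry translation memory for illustration.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="demo" creationtoolversion="1.0" segtype="sentence"
          o-tmf="demo" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Add to cart</seg></tuv>
      <tuv xml:lang="fr"><seg>Ajouter au panier</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Checkout</seg></tuv>
      <tuv xml:lang="fr"><seg>Paiement</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_memory(tmx_text: str, src: str, tgt: str) -> dict[str, str]:
    """Map source segments to target segments for one language pair."""
    memory = {}
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg", default="") for tuv in tu.iter("tuv")}
        if src in segs and tgt in segs:
            memory[segs[src]] = segs[tgt]
    return memory


def leverage(memory: dict[str, str], new_segments: list[str]) -> float:
    """Share of new segments that already have an exact match in the memory."""
    if not new_segments:
        return 0.0
    hits = sum(1 for segment in new_segments if segment in memory)
    return hits / len(new_segments)


if __name__ == "__main__":
    tm = load_memory(SAMPLE_TMX, "en", "fr")
    batch = ["Add to cart", "Checkout", "Free shipping", "Track my order"]
    print(f"Exact-match reuse: {leverage(tm, batch):.0%}")
```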

Looking for certified document translation?

USCIS filings, contracts, medical records, and official certificates translated with a signed Certificate of Accuracy.

Document Translation
Get Started

Need Your Files Translated?

Send us your file and language pair. We will quote as quickly as possible, with formatting and layout preserved throughout.
