I ran the same test on both: one real App Store screenshot, target language Japanese, with the prompt "translate the text in the image and return the same screenshot in Japanese."
What both got right
The translation itself is good. ChatGPT and Gemini both produce natural Japanese, with appropriate kanji density. For the copy alone, either one is enough.
Where they break
The image output is the problem.
- Resolution. ChatGPT returned 1024×1024. Gemini returned 1024×1536. Neither matches App Store specs (1320×2868 for iPhone 16 Pro Max).
- Phone shape. Both lost the rounded corners. Both shifted the safe area. The status bar drifted.
- Fonts. Both substituted a generic sans for San Francisco. Apple specifically watches for this in screenshots that show iOS UI.
- Text fit. Long Japanese phrases broke at the wrong character. CJK text does not break on spaces, so naive line-breaking produces unreadable output; see the sketch after this list.
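The right behavior for Japanese is to break at phrase boundaries, not wherever the pixel budget runs out. Here is a minimal sketch using BudouX, Google's open-source phrase segmenter for Japanese. The character budget is a stand-in for a real pixel measurement, and `wrap_japanese` plus the sample caption are illustrative, not part of any model API:

```python
# pip install budoux
import budoux

def wrap_japanese(text: str, max_chars: int) -> list[str]:
    """Wrap Japanese text at phrase boundaries, never mid-word.

    BudouX segments the string into semantic phrases; we then pack
    phrases greedily into lines that stay under max_chars.
    """
    parser = budoux.load_default_japanese_parser()
    lines: list[str] = []
    current = ""
    for phrase in parser.parse(text):
        if current and len(current) + len(phrase) > max_chars:
            lines.append(current)
            current = phrase
        else:
            current += phrase
    if current:
        lines.append(current)
    return lines

# Example: a caption a naive breaker would split mid-phrase.
print(wrap_japanese("写真を自動で整理して共有できます", max_chars=8))
```

A production pipeline would measure each candidate line against actual font metrics rather than counting characters, but the breaking rule is the same: segment first, then fit.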
Why this happens
Foundation image models are tuned for general image generation. They are not constrained to a specific output resolution or layout. You can ask for exact dimensions in the prompt; the model will not enforce them.
The fix is to wrap the model in a pipeline that sets the exact output dimensions, picks a font that matches the target locale, and applies layout rules per language family. That is what lokal does, on top of the same models.
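To make the idea concrete, here is a stripped-down sketch of such a wrapper using Pillow. This is not lokal's implementation; the function name, file paths, font choices, and layout constants are all assumptions standing in for a real template system:

```python
# pip install Pillow
from PIL import Image, ImageDraw, ImageFont

TARGET_SIZE = (1320, 2868)  # App Store spec for iPhone 16 Pro Max

# Assumption: licensed, locale-appropriate font files on disk.
FONT_BY_LOCALE = {
    "ja": "fonts/HiraginoSans-W4.otf",
    "en": "fonts/SF-Pro-Display.otf",
}

def render_localized_shot(background_path: str, caption_lines: list[str],
                          locale: str, out_path: str) -> None:
    # Enforce the output resolution instead of hoping the model respects it.
    canvas = Image.open(background_path).convert("RGB").resize(TARGET_SIZE)
    draw = ImageDraw.Draw(canvas)
    font = ImageFont.truetype(FONT_BY_LOCALE[locale], size=96)

    y = 220  # caption block origin; tuned per template
    for line in caption_lines:
        # Center each line using real font metrics, not character counts.
        width = draw.textlength(line, font=font)
        draw.text(((TARGET_SIZE[0] - width) / 2, y), line,
                  font=font, fill="#111111")
        y += 128  # fixed leading; CJK needs looser leading than Latin

    canvas.save(out_path, format="PNG")
```

The `caption_lines` argument is where the phrase-aware wrapper from the earlier sketch plugs in: segment the translated copy first, then render it at the exact dimensions with the right font.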
Bottom line
Use ChatGPT or Gemini for the translation step alone; they are excellent at it. Do not use them as one-shot screenshot generators: the output will be rejected in App Store review.