> [gettext] also doesn’t solve the 2 hard problems. I would still have to write code to extract strings from source code and build a way to allow users to translate them easily.
The first problem is solved by the xgettext tool.
The second problem is solved by services like Transifex. Or by telling people to use Poedit and having them submit GitHub pull requests.
At the same time, gettext does solve the issue of plural forms. This solution does not.
Poedit is a cross platform offline GUI that makes it easy for translators to incrementally perform their work. This was a whole lot of unnecessary NIH. Even if LGPL isn't viable, you could use all the tooling and reimplement the hashing in your own code.
https://poedit.net
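For a sense of scale on the extraction problem, here is a minimal sketch of a home-grown extractor. The `_TR("...")` macro name is an assumption for illustration, not taken from the app under discussion. xgettext does the same job more robustly (for example via its `--keyword` option), including comments and concatenated literals that this regex ignores.

```python
import re

# Matches _TR("...") calls with a single string literal argument.
# _TR is a hypothetical marker macro; escaped quotes inside the literal are allowed.
TR_CALL = re.compile(r'_TR\(\s*"((?:[^"\\]|\\.)*)"\s*\)')


def extract_strings(source: str) -> list[str]:
    """Return the unique translatable literals in first-seen order.

    The literals are returned as written in the source (escape
    sequences are not decoded).
    """
    seen: list[str] = []
    for match in TR_CALL.finditer(source):
        literal = match.group(1)
        if literal not in seen:
            seen.append(literal)
    return seen
```

Running this over every source file and merging the results gives the string table that translators work from; strings that disappear from the output are the ones no longer used in the code.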
> The second problem is solved by services like Transifex.
Just wanted to mention another of those services, that I discovered by contributing translation work for the jellyfin project:
https://weblate.org/
The jellyfin status can be seen here:
https://translate.jellyfin.org/projects/jellyfin/
Using English strings both "as is" and as keys for translation to other languages is a terrible idea. This is because the same English string may be translated differently depending on context - this includes different declensions/cases and synonyms. One should use abstract string keys instead and treat English as a translation like any other language.
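A small sketch of the abstract-key approach described above. The key names and the tiny tables are made up for illustration: English "Open" is both a verb (menu command) and an adjective (status), and German needs a different word for each.

```python
# English is just another table; keys are abstract and carry the context.
# All keys and entries here are hypothetical examples.
ENGLISH = {
    "menu.file.open": "Open",
    "ticket.status.open": "Open",
}

GERMAN = {
    "menu.file.open": "Öffnen",      # verb: open a file
    "ticket.status.open": "Offen",   # adjective: the ticket is open
}


def tr(key: str, table: dict[str, str]) -> str:
    """Look up a key, falling back to the English table when untranslated."""
    return table.get(key, ENGLISH[key])
```

With English text as the key, both entries would collapse into a single "Open" and the translator could only pick one of the two German words. gettext addresses the same problem with message contexts (msgctxt).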
The solution to the "interface/tooling to translate" problem, at least for open source applications, is https://translatewiki.net/ , with the additional benefit that it comes with a team of experts that can help you understand how to deal with stuff you might be unfamiliar with, such as RTL languages and plural forms.
Internationalization and localization are extremely hard problems. I know because I worked as a technical translator for some years.
But, in C++ land, I had very good success with Qt and its translation system in one of my open source projects.
That was 2010’ish, there are probably better ways now but I don’t know.
I'm also trying to keep the gettext monstrosity from being introduced into our C++ codebase, after having equally great experiences with a self-built solution in previous teams. It's ok to solve easy problems by yourself. Good for the author for thinking for himself.
It is okay to solve easy problems by yourself. It is not okay to treat hard problems as easy problems because you lack domain knowledge.
Compared to gettext, OP's solution doesn't have positional formatting or support for additional plural forms.
Yup, it's a matter of understanding your scope. Positional formatting is pretty easy to add. But plural forms are where the complexity explodes.
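To make the difference concrete, here is a rough sketch of both features. The Polish rule mirrors the one commonly given in gettext's Plural-Forms header; the message strings and the `format_pages` helper are illustrative assumptions, not anyone's actual implementation.

```python
def plural_index_en(n: int) -> int:
    # English: singular for exactly 1, plural otherwise.
    return 0 if n == 1 else 1


def plural_index_pl(n: int) -> int:
    # Polish has three forms: 1 / 2-4 (except 12-14) / everything else.
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and not (10 <= n % 100 < 20):
        return 1
    return 2


# Positional placeholders ({0}, {1}) let a translation reorder arguments.
# The strings below are example translations only.
FORMS = {
    "en": (plural_index_en, ["{0} page in {1}", "{0} pages in {1}"]),
    "pl": (plural_index_pl, ["{1}: {0} strona", "{1}: {0} strony", "{1}: {0} stron"]),
}


def format_pages(lang: str, n: int, filename: str) -> str:
    """Pick the plural form for n, then fill the positional placeholders."""
    rule, forms = FORMS[lang]
    return forms[rule(n)].format(n, filename)
```

Positional formatting is a one-line `format` call; the plural side needs a rule per language, and some languages have more categories than Polish.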
Have you noticed context issues in the translated values?
Just translating literally does not always work.
I am building an Android application and facing this issue. I've been thinking about going down the rabbit hole and linking each text to be translated to a screenshot or similar thing, so that the user sees their proposed translation in context
Since this is a Windows C++ app, why not use MUI? It solves this exact problem, and since it's a standard part of the platform, there's broad tooling support for it.
Looking at strings in the linked apptranslator.org website, a lot of them have ampersands (&) e.g. "&Back" or "&Book View"
What do those mean?
Translations seem to put them in the middle sometimes
---
on the other hand, there are languages even with "0 untranslated strings" that have untranslated "unused strings" - what are those about, and is it okay to not have those be tracked?
Windows convention for access keys (as opposed to keyboard shortcuts). E.g. you can choose &File > &Open by pressing Alt-F then O (as an alternative to Ctrl-O), with the requisite keys underlined when you press Alt; you can choose the dialog option labelled &Abort by pressing Alt-A, with the requisite key always underlined in the traditional Win32 toolkit. (By convention, OK and Cancel don’t get access keys, as you can always use Enter and Escape respectively instead.) Not every menu item gets a shortcut, but in a competently designed Windows UI every one gets an access key.
(This idea is nice but not without its problems. For those of us who regularly use more than one keyboard layout or even system locale, it doesn’t work all that well.)
Looks like the ampersands appear in menu items to indicate the keyboard shortcut key to navigate to that item.
https://willus.com/mingw/colinp/win32/resources/menu.html
It's been a while, but that might mean that (in the example above) B is the shortcut key for that menu item.
When you press Alt on Windows you will see that some characters on menu items are underlined. When you press Alt together with one of these letters, you open that menu. To mark which character acts as the access key, you put & before it.
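This is also why translations sometimes put the & in the middle of a word: the translator picks whichever letter makes a sensible access key in their language. A minimal sketch of how a label could be split into display text and access key, assuming the usual Win32 convention that "&&" stands for a literal ampersand (the function name is made up):

```python
from typing import Optional, Tuple


def parse_access_key(label: str) -> Tuple[str, Optional[str]]:
    """Split a Win32-style label into (display text, access key or None)."""
    text = []
    key = None
    i = 0
    while i < len(label):
        ch = label[i]
        if ch == "&" and i + 1 < len(label):
            nxt = label[i + 1]
            # "&&" is an escaped ampersand; otherwise the first marked
            # character becomes the access key.
            if nxt != "&" and key is None:
                key = nxt.upper()
            text.append(nxt)
            i += 2
        else:
            text.append(ch)
            i += 1
    return "".join(text), key
```

A translation tool could use something like this to warn when two items in the same menu end up with the same access key.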
Not related to UI translation, but I've been using Sumatra PDF on Linux through Wine and it's worked really well. I use it to read and search through 50k-page TRMs and it's very fast.
I had to implement a multilingual UI for my app recently, and the best method I found was calling an LLM API from a script that runs on build. Any time a new text field gets added, the script fetches the translations and stores them in a JSON file, and the right strings are picked at runtime based on system language or user settings. The annoying part was adjusting the entire UI to the new 'dynamic' string selection functionality, but other than that, the tokens I generated only cost maybe a dollar. I suggest the author look into that, because waiting for user submissions could take some time.
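The runtime half of that setup could look roughly like this. The JSON layout (language code, then key, then text) and the fallback order are assumptions for the sketch; the comment above doesn't specify them.

```python
import json

# Hypothetical output of the build script: one table per language code.
SAMPLE_JSON = '{"en": {"back": "Back", "open": "Open"}, "de": {"back": "Zurück"}}'


def load_tables(raw: str) -> dict:
    """Parse the generated JSON into {language: {key: text}}."""
    return json.loads(raw)


def lookup(tables: dict, key: str, lang: str, default_lang: str = "en") -> str:
    """Try the exact locale, then its base language, then the default."""
    for candidate in (lang, lang.split("-")[0], default_lang):
        value = tables.get(candidate, {}).get(key)
        if value is not None:
            return value
    # Nothing found anywhere: show the key so the gap is visible.
    return key
```

The fallback chain matters here because a build-time script can easily leave some languages partially translated.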
An LLM isn’t going to provide a high quality translation because it doesn’t understand the context of what the string means in the UI, cultural conventions/writing styles, etc. You might as well run every string through Google Translate. Localization isn’t just a 1-to-1 translation of the text - it requires tailoring for each language that a human translator can do best.
Did you just step out of a time machine from 10 years ago? Translation is considered a fully solved problem due to LLMs. If you're bilingual you can find out for yourself, LLMs are able to parse meaning and transfer writing styles perfectly well. You're able to prompt it with the context of what it's translating and it understands that it has to return short meaningful text. I know that it might make mistakes, that's obvious, but considering the post we're replying to, an open source developer with a 0 dollar budget won't be able to hire human translators.
translation is absolutely not “100% solved” because LLMs exist. such a take is underplaying so much cultural and emotional connection it’s laughable. contemptible even. if you are an OSS dev and have no money, you can find another OSS dev to help you translate instead of using offensive AI slop.
How much "cultural and emotional connection" do you think it takes to translate a "back" button into all the languages that your users speak? Do you understand that translating software UI is not a creative endeavor, it's a simple logistical problem that you solve by using the best tools at hand? Why would doing this be "contemptible"? This isn't like translating Dostoevsky into English, and I certainly would prefer a human do that task instead of an LLM. Also, how would you be able to tell that the text fields you're reading in your software were written using "offensive AI slop"? Your ideological priors are obviously blinding you to the value of the tool at hand, and I recommend you try to engage with it in an open-minded way. I have the same recommendation for you as in my previous reply: if you're bilingual, look at how well LLMs write translations for any text you provide them. It speaks for itself.
# Translate this string table into <language>. I've included screenshots of the application in your context. When appropriate, search the web for translation guides / screenshots of similar applications and see what words they used for analogous UI elements.
Good idea, translating a whole table of related UI fields in one request would definitely give the LLM more context to work with and I assume would lead to better results.