I'm trying to translate my bash scripts using the gettext tools, but the encoding of the translated output seems to be wrong.
Let's say I have the following file called fr.po:
# French translations for my-package package
# Traductions françaises du paquet my-package.
# Copyright (C) 2025 THE my-package'S COPYRIGHT HOLDER
# This file is distributed under the same license as the my-package package.
# Automatically generated, 2025.
#
msgid ""
msgstr ""
"Project-Id-Version: my-package v0.0.1\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2025-11-25 20:36-0500\n"
"PO-Revision-Date: 2025-11-25 17:58-0500\n"
"Last-Translator: Automatically generated\n"
"Language-Team: none\n"
"Language: fr\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n > 1);\n"
msgid "test-message"
msgstr "a à e é è ê ë i î ï o ô ö u ù û ü c ç n ñ"
Then I execute the following:
file --mime ./fr.po # output: ./fr.po: text/x-po; charset=utf-8
msgfmt --output-file='/usr/share/locale/fr/LC_MESSAGES/my-test.mo' ./fr.po
export TEXTDOMAINDIR=/usr/share/locale
export TEXTDOMAIN=my-test
export LANG=fr_CA.UTF-8
export LC_ALL=fr_CA.UTF-8
# The following command works as intended and prints this:
# a à e é è ê ë i î ï o ô ö u ù û ü c ç n ñ
gettext test-message
# However, if I use the same command inside a command substitution or a pipeline, I get this:
# a □ e □ □ □ □ i □ □ o □ □ u □ □ □ c □ n
printf '%s' "$(gettext test-message)"
echo "$(gettext test-message)"
gettext test-message | cat
cat <(gettext test-message)
####
gettext test-message > out.txt
cat out.txt # output: a □ e □ □ □ □ i □ □ o □ □ u □ □ □ c □ n
file --mime out.txt # output: out.txt: text/plain; charset=iso-8859-1
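Since `file` only guesses the encoding, it can help to look at the raw bytes directly. A minimal sketch using `od`: the first two lines print the reference bytes for "é" in each encoding, and the last part dumps what gettext actually writes into a pipe (it assumes the TEXTDOMAIN/TEXTDOMAINDIR exports above are in effect).

```shell
#!/usr/bin/env bash
# Reference byte sequences for "é" in the two candidate encodings.
printf '\303\251' | od -An -tx1   # UTF-8      -> c3 a9
printf '\351'     | od -An -tx1   # ISO-8859-1 -> e9

# Dump the bytes gettext writes into a pipe and compare with the references.
if command -v gettext >/dev/null 2>&1; then
  gettext test-message | od -An -tx1
fi
```

If "é" shows up as a single `e9` byte, the output really is ISO-8859-1 and not just mislabeled by `file`.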
As the last three lines above show, gettext seems to output my message in ISO-8859-1, which is not what I want.
How can I force gettext to give me my message in UTF-8? Or how can I work around this issue?
I tried changing the terminal encoding with chcp.com 65001, but it didn't change anything. I also tried placing the .mo file in /usr/share/locale/fr.utf-8/..., but to no avail.
I also saw this question, which seems awfully close to my problem, but I couldn't find any equivalent of bind_textdomain_codeset that I could call from a bash script.
Here's the content of my-test.mo in UTF-8:
[binary .mo header bytes] test-message Project-Id-Version: my-package v0.0.1
Report-Msgid-Bugs-To:
PO-Revision-Date: 2025-11-25 17:58-0500
Last-Translator: Automatically generated
Language-Team: none
Language: fr
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Plural-Forms: nplurals=2; plural=(n > 1);
a à e é è ê ë i î ï o ô ö u ù û ü c ç n ñ
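Opening a .mo file as text shows its binary header as stray characters. As an alternative, `msgunfmt` from gettext-tools decompiles a catalog back into PO syntax. A minimal sketch, assuming gettext-tools is installed; it builds a throwaway catalog in a temporary directory (hypothetical paths) so it can run on its own.

```shell
#!/usr/bin/env bash
# Build a throwaway catalog in a temp dir (hypothetical paths).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Minimal PO file with a UTF-8 header and one non-ASCII translation.
cat > "$tmp/fr.po" <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "test-message"
msgstr "a à e é"
EOF

msgfmt --output-file="$tmp/my-test.mo" "$tmp/fr.po"

# Print the compiled catalog back in PO syntax (header plus entries).
msgunfmt "$tmp/my-test.mo"

# For the real catalog from the question:
# msgunfmt /usr/share/locale/fr/LC_MESSAGES/my-test.mo
```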
Update
I found a workaround using iconv. For example:
printf '%s' "$(gettext test-message | iconv -f iso-8859-1 -t utf-8)"
However, I suspect this workaround should only be used when the script runs under git-bash (Windows), since the problem doesn't occur on Linux. On a Linux system (and probably WSL), gettext's output is already UTF-8, so converting it again would produce garbled output such as "Ã©" instead of "é" (iconv won't report an error, because every byte is valid ISO-8859-1).
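The double-conversion problem can be reproduced directly. A small sketch: it takes the UTF-8 bytes for "é" (what gettext already prints on Linux), runs them through the same `iconv -f ISO-8859-1 -t UTF-8` step, and dumps the result.

```shell
#!/usr/bin/env bash
# "é" encoded as UTF-8: bytes c3 a9.
utf8_e=$'\xc3\xa9'

# Treating those bytes as ISO-8859-1 maps c3 -> "Ã" (c3 83) and a9 -> "©" (c2 a9).
printf '%s' "$utf8_e" | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1   # -> c3 83 c2 a9
echo "iconv exit status: ${PIPESTATUS[1]}"   # 0: no error, just wrong text
```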
So I'm still looking for a better alternative that doesn't require wrapping gettext in a function that checks which OS the script is being executed on.
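One way to avoid an OS check is to inspect the bytes instead: if gettext's output is already valid UTF-8, pass it through, otherwise assume ISO-8859-1 and convert it. A minimal sketch, assuming GNU iconv (which exits non-zero on invalid input). The names `to_utf8` and `t` are hypothetical. This is still a workaround, not a fix for gettext's output codeset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read stdin; if it is valid UTF-8, pass it through
# unchanged, otherwise assume ISO-8859-1 and convert it to UTF-8.
to_utf8() {
  local input
  input=$(cat)   # note: command substitution drops trailing newlines
  if printf '%s' "$input" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    printf '%s' "$input"
  else
    printf '%s' "$input" | iconv -f ISO-8859-1 -t UTF-8
  fi
}

# Hypothetical drop-in for gettext; the name "t" is an assumption.
t() {
  gettext "$@" | to_utf8
}

# Usage:
# printf '%s\n' "$(t test-message)"
```

Pure ASCII is valid in both encodings and passes through unchanged. The check can only misfire on ISO-8859-1 text whose bytes happen to form valid UTF-8, which is unlikely for ordinary French strings.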
sudo before msgfmt). As far as I'm aware, I don't have any other environment variables set that affect my locale, and all entries printed by locale show the value fr_CA.UTF-8.
Using Bash 5.3.3 and gettext-tools 0.26
sudo, which doesn't work in git-bash, or at least not out of the box. Also, I have git's latest version, which comes with bash 5.2.37.
windows and git-bash tags, sorry.
my-test.mo?
msgfmt. I added it to the question.