I have the following Python 3.7 code in an imported package, which I cannot modify, that reads and decodes the name of the system's C compiler as provided by distutils:
subproc.py
import os
import sys
import locale
import subprocess
from distutils.ccompiler import new_compiler
ccompiler = new_compiler()
ccompiler.initialize()
cc = subprocess.check_output(f"{ccompiler.cc}", stderr=subprocess.STDOUT, shell=True)
encoding = os.device_encoding(sys.stdout.fileno()) or locale.getpreferredencoding()
print("Encoding:", encoding)
compiler_name = cc.decode(encoding).partition("\n")[0].strip()
print("Compiler name:", compiler_name)
When I run subproc.py directly, everything works fine. The compiler name is correctly identified as 'Microsoft (R) C/C++-Optimierungscompiler Version 19.42.34433 für x64' (I guess the ü is the origin of the issue).
However, when I call it with subprocess.Popen(), os.device_encoding() returns None instead of cp850, causing the program to fall back to the Windows encoding cp1252, which then causes cc.decode(encoding) to raise "UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 62: character maps to <undefined>".
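The failure can be reproduced without the compiler. This is a minimal sketch, assuming the compiler banner is emitted in the console code page cp850; the sample string is only the tail of the banner quoted above. In cp850 the byte 0x81 is ü, while Python's cp1252 codec leaves 0x81 unmapped.

```python
# Tail of the compiler banner, encoded the way a cp850 console would emit it.
raw = "für x64".encode("cp850")

# In cp850 the letter ü is the single byte 0x81.
has_0x81 = b"\x81" in raw

# Decoding with the matching code page round-trips cleanly.
decoded = raw.decode("cp850")

# cp1252 has no character at 0x81, so the same bytes cannot be decoded.
try:
    raw.decode("cp1252")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

print("contains 0x81:", has_0x81)
print("cp1252 error:", error)
```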
Here is how I start the subprocess:
call_subprocess.py
import subprocess
subprocess.Popen(
[
"python",
"C:/path/to/subproc.py",
],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
)
My understanding is that os.device_encoding(sys.stdout.fileno()) cannot find an encoding because the subprocess runs in the background, without a terminal. Furthermore, on this machine Windows always returns cp1252 (the ANSI code page) when queried with locale.getpreferredencoding().
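This can be checked in isolation. The following sketch starts a child interpreter with stdout redirected to a pipe and has it report os.device_encoding() for its own stdout. The inline child script is a stand-in, not the external package. With a pipe the call returns None.

```python
import subprocess
import sys

# Stand-in child: prints what os.device_encoding() reports for its own stdout.
child_code = "import os, sys; print(os.device_encoding(sys.stdout.fileno()))"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,    # stdout is a pipe, not a console
    stderr=subprocess.STDOUT,
)

reported = result.stdout.decode("ascii").strip()
print("device_encoding in child:", reported)
```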
Since I cannot edit the code within the external package, is there a way to call the subprocess to force either one of these commands to return cp850?
Variants I have tried to solve the problem
- Explicitly set encoding in Popen:
    subprocess.Popen(
        ...
        text=True,
        encoding="cp850",
    )
- Explicitly set PYTHONIOENCODING in the subprocess environment:
    environ = os.environ.copy()
    environ['PYTHONIOENCODING'] = 'utf-8'
    ...
    subprocess.Popen(
        ...
        env=environ,
        encoding='utf-8',
    )
- Use subprocess.run() instead of subprocess.Popen()
- Various combinations of the solutions above.
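A likely reason these variants have no effect: the encoding argument of Popen only controls how the parent decodes the child's output, and PYTHONIOENCODING only changes the encoding of the child's sys.stdin, sys.stdout and sys.stderr. Neither is consulted by os.device_encoding() or locale.getpreferredencoding(). The sketch below illustrates this with a stand-in child script (not the external package) that reports all three values.

```python
import os
import subprocess
import sys

# Stand-in child: report the three encodings involved, one per line.
child_code = (
    "import locale, os, sys\n"
    "print(sys.stdout.encoding)\n"
    "print(os.device_encoding(sys.stdout.fileno()))\n"
    "print(locale.getpreferredencoding())\n"
)

environ = os.environ.copy()
environ["PYTHONIOENCODING"] = "utf-8"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    env=environ,
    encoding="utf-8",  # only affects how the parent decodes the pipe
)

stdout_encoding, device_encoding, preferred_encoding = result.stdout.splitlines()
print("sys.stdout.encoding:", stdout_encoding)      # follows PYTHONIOENCODING
print("os.device_encoding:", device_encoding)       # still None for a pipe
print("getpreferredencoding:", preferred_encoding)  # unchanged by either setting
```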
Resources I have already looked at
- Subprocess uses wrong encoding on Windows
- Encoding error running in subprocess with captured output
- Changing the locale preferred encoding for the computer itself: Control Panel > Clock and Region > Region > Administrative > Change system locale > check "Beta: Use Unicode UTF-8" > reboot. This works, but is undesirable because the code must be executable on different machines without individual setup every time.
- Since in my case the subprocess was fairly isolated from other code and had its own startup function, I used the following lines before the first locale import to override the return value of locale.getdefaultlocale() (Source):
# Override locale.getdefaultlocale() for the subprocess.
# This is necessary to avoid issues with the default locale on Windows.
# It might cause issues on computers not in western countries, that do not use cp850.
import _locale
_locale._getdefaultlocale = lambda *args: ["en_US", "cp850"]
- The other option I have found is not using stdout=subprocess.PIPE at all, thanks to the comment of Barmar: when the script is run directly, sys.stdout is the terminal window. When you call it with subprocess.Popen(), sys.stdout is a pipe. Pipes don't have a predefined encoding.
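A minimal sketch of that last option, assuming the parent itself runs in a console window: leaving out stdout=subprocess.PIPE lets the child inherit the parent's stdout, so os.device_encoding() in the child can see the console. The helper name launch_without_pipe is hypothetical, and sys.executable is used here in place of the literal "python" from the question. The trade-off is that the parent can no longer capture the child's output.

```python
import subprocess
import sys


def launch_without_pipe(args):
    """Start a child Python process that inherits the parent's stdout/stderr."""
    # No stdout= or stderr= arguments: the child writes to whatever the parent
    # is attached to (a console window when started from a terminal).
    return subprocess.Popen([sys.executable, *args])
```

It would be called as launch_without_pipe(["C:/path/to/subproc.py"]).wait(), with the path taken from the question.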