os.device_encoding changes when called from a subprocess, causing decoding error on Windows. How to force encoding in subprocess?

Question

I have the following Python 3.7 code in an imported package, which I cannot modify. It reads and decodes the name of the system's C compiler as provided by distutils:

subproc.py

import os
import sys
import locale
import subprocess
from distutils.ccompiler import new_compiler

ccompiler = new_compiler()
ccompiler.initialize()
cc = subprocess.check_output(f"{ccompiler.cc}", stderr=subprocess.STDOUT, shell=True)
encoding = os.device_encoding(sys.stdout.fileno()) or locale.getpreferredencoding()
print("Encoding:", encoding)
compiler_name = cc.decode(encoding).partition("\n")[0].strip()
print("Compiler name:", compiler_name)

When I run subproc.py directly, everything works fine. The compiler name is correctly identified as 'Microsoft (R) C/C++-Optimierungscompiler Version 19.42.34433 für x64'. I suspect the ü is the origin of the issue.

However, when I start it with subprocess.Popen(), os.device_encoding returns None instead of cp850. The program then falls back to the Windows ANSI encoding, cp1252, and cc.decode(encoding) raises "UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 62: character maps to <undefined>".
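The small sketch below (standard library only, not part of the package) shows why the ü breaks the decode. It encodes the compiler banner as cp850, which the question treats as the console's code page, and then decodes the bytes with both candidate encodings. In cp850 the ü becomes byte 0x81, and cp1252 has no mapping for that byte. In this exact banner it sits at index 62, which matches the traceback.

```python
# The banner exactly as reported in the question.
banner = "Microsoft (R) C/C++-Optimierungscompiler Version 19.42.34433 für x64"

# Assumption: the compiler writes its output in the console's code page, cp850 here.
raw = banner.encode("cp850")

# Decoding with the matching code page round-trips cleanly.
print(raw.decode("cp850"))

# Decoding with the ANSI code page fails, because 0x81 is undefined in cp1252.
try:
    raw.decode("cp1252")
except UnicodeDecodeError as exc:
    # character maps to <undefined> 0x81 at position 62
    print(exc.reason, hex(raw[exc.start]), "at position", exc.start)
```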

Here is how I start the subprocess:

call_subprocess.py

import subprocess

subprocess.Popen(
    [
        "python",
        "C:/path/to/subproc.py",
    ],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
)

My understanding is that os.device_encoding(sys.stdout.fileno()) cannot find an encoding because the subprocess runs in the background without a terminal. Furthermore, on this machine locale.getpreferredencoding() always returns cp1252 (the Windows ANSI code page).
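The pipe theory can be checked in isolation. The sketch below (standard library only, not part of the package) asks os.device_encoding() about an anonymous pipe from os.pipe(). That is the same kind of handle the child's stdout becomes under stdout=subprocess.PIPE. The sketch then compares the result with fd 1 of the current process.

```python
import os

# An anonymous pipe: neither end is connected to a terminal.
read_fd, write_fd = os.pipe()
try:
    pipe_is_tty = os.isatty(write_fd)
    # Documented behaviour: None unless the fd is connected to a terminal.
    pipe_encoding = os.device_encoding(write_fd)
finally:
    os.close(read_fd)
    os.close(write_fd)

print("pipe:      isatty =", pipe_is_tty, "device_encoding =", pipe_encoding)
# fd 1 is this process's stdout; it reports an encoding only when attached to a console.
print("stdout fd: isatty =", os.isatty(1), "device_encoding =", os.device_encoding(1))
```

If you start it from a console window, the second line should show the console code page (cp850 on the machine in the question). If you start it with stdout=subprocess.PIPE, the second line shows None as well.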

Since I cannot edit the code within the external package, is there a way to call the subprocess to force either one of these commands to return cp850?

Variants I have tried to solve the problem

Explicitly set encoding in Popen:

subprocess.Popen(
  ...
  text=True,
  encoding="cp850",
)

Explicitly set PYTHONIOENCODING in subprocess environment:

environ = os.environ.copy()
environ['PYTHONIOENCODING'] = 'utf-8'
...
subprocess.Popen(
  ...
  env=environ,
  encoding='utf-8',
)
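PYTHONIOENCODING only reconfigures the text wrapper around the child's sys.stdout. It does not make the pipe look like a terminal. The minimal check below (standard library only) starts a tiny child with the same environment change and prints both values the child sees.

```python
import os
import subprocess
import sys

# Child: report what the package would see, plus the encoding of sys.stdout itself.
child_code = (
    "import os, sys\n"
    "print(os.device_encoding(sys.stdout.fileno()), sys.stdout.encoding)\n"
)

env = os.environ.copy()
env["PYTHONIOENCODING"] = "cp850"  # the same variable tried in the question

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    env=env,
    check=True,
)
device_enc, stdout_enc = result.stdout.decode("ascii").split()
print("os.device_encoding:", device_enc)   # None: the pipe still has no encoding
print("sys.stdout.encoding:", stdout_enc)  # cp850: only the text layer changed
```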

Use subprocess.run() instead of subprocess.Popen()
Various combinations of the solutions above.

Resources I have already looked at

Subprocess uses wrong encoding on Windows
Encoding error running in subprocess with captured output
Changing the preferred encoding for the computer itself: Control Panel > Clock and Region > Region > Administrative > Change system locale > check "Beta: Use Unicode UTF-8" > reboot. This works, but it is undesirable because the code must run on different machines without individual setup every time.
Since in my case the subprocess was fairly isolated from other code and had its own startup function, I added the following lines before the first locale import to override the return value of locale.getdefaultlocale() (Source):

# Override locale.getdefaultlocale() for the subprocess.
# This is necessary to avoid issues with the default locale on Windows.
# On Python 3.7 for Windows, locale.getpreferredencoding() also reads this value.
# It might cause issues on computers outside Western Europe that do not use cp850.
import _locale

_locale._getdefaultlocale = lambda *args: ["en_US", "cp850"]
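A variant of the same idea may be worth considering when only the caller can be changed. It is a sketch under assumptions, not something from the package. A small launcher script replaces the public locale.getpreferredencoding attribute and then runs the package's script with runpy. The file name run_with_encoding.py and both function names are hypothetical. I believe open() and the standard streams compute their default encoding elsewhere, so the patch should stay narrow. That is worth verifying on the Python version in use.

```python
"""run_with_encoding.py: hypothetical launcher that wraps the package's script."""
import locale
import runpy
import sys


def run_with_encoding(script_path, encoding):
    """Run script_path as __main__ while locale.getpreferredencoding() returns encoding."""
    original = locale.getpreferredencoding
    # The package calls locale.getpreferredencoding() through the module attribute,
    # so replacing that attribute is enough. Unlike the _locale patch, this does
    # not depend on private, version-specific internals.
    locale.getpreferredencoding = lambda do_setlocale=True: encoding
    try:
        return runpy.run_path(script_path, run_name="__main__")
    finally:
        locale.getpreferredencoding = original  # undo the patch afterwards


def main(argv):
    """Usage: python run_with_encoding.py ENCODING SCRIPT [ARGS...]"""
    if len(argv) < 3:
        print(main.__doc__)
        return
    encoding, script = argv[1], argv[2]
    sys.argv = argv[2:]  # the wrapped script sees its own path as argv[0]
    run_with_encoding(script, encoding)


if __name__ == "__main__":
    main(sys.argv)
```

The caller would then start ["python", "run_with_encoding.py", "cp850", "C:/path/to/subproc.py"] instead of the script itself. This only helps if the package calls locale.getpreferredencoding() after the launcher has patched it, as subproc.py does at module level.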

The other option I found, thanks to Barmar's comment, is to not use stdout=subprocess.PIPE at all:

When you call it directly, sys.stdout is the terminal window. When you call it with subprocess.Popen(), sys.stdout is a pipe. Pipes don't have a predefined encoding. — Barmar, Commented Jan 20 at 16:06
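Barmar's explanation also shows why dropping stdout=subprocess.PIPE helps: the child then inherits the parent's console. The hedged sketch below compares both setups. It routes the child's report through stderr so the parent can still read it. Using stderr is an arbitrary choice for this demo.

```python
import subprocess
import sys

# Child reports what the package would compute, on stderr.
child_code = (
    "import os, sys\n"
    "sys.stderr.write(str(os.device_encoding(sys.stdout.fileno())))\n"
)

# stdout not redirected: the child inherits the parent's stdout handle.
inherited = subprocess.run(
    [sys.executable, "-c", child_code], stderr=subprocess.PIPE, check=True
).stderr.decode("ascii")

# stdout=PIPE, as in the question: the child's stdout is a pipe.
piped = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    check=True,
).stderr.decode("ascii")

print("inherited stdout:", inherited)  # the console code page when run from a console
print("piped stdout:    ", piped)      # None
```

The trade-off is that the parent can no longer capture the child's stdout. If that output is needed, it has to travel another way, for example through stderr or a file.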

Xerus Lord · Accepted Answer · 2025-01-27 21:39:12Z

I am not sure if there is a solution to your problem.

Documentation says:

os.device_encoding(fd)

Return a string describing the encoding of the device associated with fd if it is connected to a terminal; else return None.

Given Barmar's comment about pipes not having an encoding, I don't think we can stop this from returning None. Not even UTF-8 mode will help you here, because your compiler's output is cp850-encoded and is not valid UTF-8.
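The sketch below (standard library only) illustrates the UTF-8 mode remark. UTF-8 mode does change what locale.getpreferredencoding() returns in the child. However, that only swaps one wrong guess for another, because the bytes are still cp850.

```python
import subprocess
import sys

# The package's fallback in a child started in UTF-8 mode (-X utf8, like PYTHONUTF8=1).
# stdout is piped, so os.device_encoding() would be None and this value is used.
child = subprocess.run(
    [sys.executable, "-X", "utf8", "-c",
     "import locale; print(locale.getpreferredencoding())"],
    stdout=subprocess.PIPE,
    check=True,
)
fallback = child.stdout.decode("ascii").strip()
print("fallback encoding:", fallback)  # UTF-8 (spelled utf-8 on newer Pythons)

# The compiler still writes cp850 bytes, and a lone 0x81 is not valid UTF-8.
raw = "für".encode("cp850")
try:
    raw.decode(fallback)
except UnicodeDecodeError as exc:
    print(exc.reason)  # invalid start byte
```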

Likewise,

locale.getpreferredencoding(do_setlocale=True)

Return the locale encoding used for text data, according to user preferences. User preferences are expressed differently on different systems, and might not be available programmatically on some systems, so this function only returns a guess.

seems to look only at user preferences (on Windows, the ANSI code page), but you said you want to avoid modifying those preferences.

In your provided unmodifiable code, you have

encoding = os.device_encoding(sys.stdout.fileno()) or locale.getpreferredencoding()

Logically, there does not seem to be a way to change the encoding in the code above. At this point I would consider re-implementing the function in the package. Then again, I don't know how big your package is, so do what you think is best. I wish you the best of luck in finding a solution.
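If re-implementing the decoding step is an option, one possible shape is sketched below. The function name and the candidate order are assumptions for illustration, not the package's API. Note that cp850 maps all 256 byte values, so it never raises. Any candidate listed after it is never reached, and a wrong guess produces mojibake rather than an error.

```python
import os


def decode_first_line(raw, candidates):
    """Hypothetical stand-in for the package's decoding step.

    Tries each candidate encoding in order, skipping None, and returns the
    first line of the decoded output.
    """
    for enc in candidates:
        if not enc:  # e.g. os.device_encoding() returned None on a pipe
            continue
        try:
            return raw.decode(enc).partition("\n")[0].strip()
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    # No candidate fit: replace undecodable bytes instead of raising.
    return raw.decode("ascii", errors="replace").partition("\n")[0].strip()


# Usage with bytes shaped like the compiler banner from the question.
raw = "Microsoft (R) C/C++-Optimierungscompiler Version 19.42.34433 für x64\r\n".encode("cp850")
# Terminal encoding first, if there is one (fd 1 is stdout), then the assumed OEM code page.
print(decode_first_line(raw, [os.device_encoding(1), "cp850"]))
```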

That said, while experimenting with the information you provided, I created a demo that reproduces the error you're seeing without your compiler package. (You were right about the ü causing problems.) I'll post the demo here in case it helps anyone else answer the question.

fake_compiler.py

import sys
output = 'Microsoft (R) C/C++-Optimierungscompiler Version 19.42.34433 für x64'
sys.stdout.buffer.write(output.encode('cp850'))

encoding_test.py

import os, sys, locale, subprocess

ppython = [r'C:\Program Files\Python312\python.exe', r'fake_compiler.py']
cc = subprocess.check_output(ppython, stderr=subprocess.STDOUT, shell=True)
encoding = os.device_encoding(sys.stdout.fileno()) or locale.getpreferredencoding()
#encoding = 'cp1252' # neither cp1252 or utf-8 can parse the string
#encoding = 'utf-8'
print("Encoding:", encoding)
compiler_name = cc.decode(encoding).partition("\n")[0].strip()
print("Compiler name:", compiler_name)

encoding_test_parent.py

import subprocess, os
import _locale

environ = os.environ.copy()
environ['PYTHONIOENCODING'] = 'cp850'

# This patch only affects the parent process, not the child started below.
_locale._getdefaultlocale = lambda *args: ("en_US", "cp850")

p = subprocess.Popen(
    [
        r"C:\Program Files\Python312\python.exe",
        r".\encoding_test.py",
    ],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    shell=True,
    env=environ,
    #encoding="cp850"
)

out, err = p.communicate()  # waits for the child to finish, so no sleep() is needed
print(out.decode('cp850'))  # PYTHONIOENCODING makes the child write cp850

Good luck!

Thank you for your detailed response. The solutions I have found so far seem to be: 1. Overwriting getdefaultlocale() and 2. working without setting stdout for the process at all, i.e. not using stdout=subprocess.PIPE — Energeneer, Commented Jan 28 at 10:49

Mark Tolonen · Accepted Answer · 2025-01-20 23:26:12Z


The Win32 function GetOEMCP() returns the OEM code page number. It should work for the countries/regions with cp### aliases defined in the standard encodings listed in the Python codecs module:

import ctypes as ct
import ctypes.wintypes as w

kernel32 = ct.WinDLL('kernel32')
GetOEMCP = kernel32.GetOEMCP
GetOEMCP.argtypes = ()
GetOEMCP.restype = w.UINT

print(f'cp{GetOEMCP()}')

Output for US-localized Windows:

cp437
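Since the package cannot be changed, this call is most useful on the caller's side. For example, it could pick the value for the _locale override from the question instead of hard-coding cp850. Below is a hedged helper that wraps the same Win32 call. The function name oem_encoding and the fallback to locale.getpreferredencoding(False) are my own assumptions.

```python
import codecs
import locale
import sys


def oem_encoding():
    """Return the Windows OEM code page as a Python codec name, e.g. 'cp850'.

    Falls back to locale.getpreferredencoding(False) off Windows, or when
    Python has no codec for the reported code page.
    """
    if sys.platform == "win32":
        import ctypes as ct
        import ctypes.wintypes as w

        kernel32 = ct.WinDLL("kernel32")
        kernel32.GetOEMCP.argtypes = ()
        kernel32.GetOEMCP.restype = w.UINT
        name = f"cp{kernel32.GetOEMCP()}"
        try:
            return codecs.lookup(name).name
        except LookupError:
            pass  # a code page without a cp### codec in Python
    return locale.getpreferredencoding(False)


print(oem_encoding())
```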


answered Jan 20 at 23:21

Mark Tolonen


Thank you for your answer; however, I cannot edit the package where the issue occurs, so I can only change how the subprocess is called.
– Energeneer
Commented Jan 28 at 11:08
