
I have the following Python 3.7 code in an imported package, which I cannot modify, that reads and decodes the system's C compiler name provided by distutils:

subproc.py

import os
import sys
import locale
import subprocess
from distutils.ccompiler import new_compiler

ccompiler = new_compiler()
ccompiler.initialize()
cc = subprocess.check_output(f"{ccompiler.cc}", stderr=subprocess.STDOUT, shell=True)
encoding = os.device_encoding(sys.stdout.fileno()) or locale.getpreferredencoding()
print("Encoding:", encoding)
compiler_name = cc.decode(encoding).partition("\n")[0].strip()
print("Compiler name:", compiler_name)

When I call it directly, i.e. run subproc.py itself, everything works fine. The compiler name is correctly identified as 'Microsoft (R) C/C++-Optimierungscompiler Version 19.42.34433 für x64' (I guess the ü is the origin of the issue).

However, when I call it with subprocess.Popen(), os.device_encoding() returns None instead of cp850, causing the program to fall back to the Windows encoding cp1252, which then causes cc.decode(encoding) to raise "UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 62: character maps to <undefined>".
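The decoding failure itself can be reproduced without a compiler. The following is only a minimal sketch, assuming the compiler banner is emitted in cp850 as described above; the variable names are made up for illustration.

```python
# The banner text is taken from the question; the variable names are illustrative.
banner = "Microsoft (R) C/C++-Optimierungscompiler Version 19.42.34433 für x64"

# Assumption: the compiler writes its banner in the OEM code page cp850.
raw = banner.encode("cp850")
print("offset of byte 0x81:", raw.index(0x81))

error = None
try:
    # cp1252 has no character assigned to byte 0x81, so this raises.
    raw.decode("cp1252")
except UnicodeDecodeError as exc:
    error = exc
    print("cp1252 failed:", exc)

# Decoding with the code page the bytes were written in round-trips cleanly.
decoded = raw.decode("cp850")
print("cp850 result:", decoded)
```

In cp850 the ü is the single byte 0x81, which is one of the few bytes Python's cp1252 codec leaves undefined, hence the 'charmap' error.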

Here is how I start the subprocess:

call_subprocess.py

import subprocess

subprocess.Popen(
    [
        "python",
        "C:/path/to/subproc.py",
    ],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
)

My understanding is that os.device_encoding(sys.stdout.fileno()) cannot find an encoding because the subprocess runs in the background, with its stdout attached to a pipe instead of a terminal. Furthermore, on my Windows system locale.getpreferredencoding() always returns cp1252.
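This behaviour can be checked in isolation. The sketch below is not the package code: it starts a throwaway child interpreter (via sys.executable and an inline one-liner, both chosen here purely for illustration) with its stdout redirected to a pipe and reports what os.device_encoding() sees there.

```python
import subprocess
import sys

# Illustrative child program: print what os.device_encoding() reports for stdout.
child_code = "import os, sys; print(os.device_encoding(sys.stdout.fileno()))"

# stdout=PIPE means the child's stdout is a pipe, not a console.
result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
)

piped_encoding = result.stdout.decode("ascii").strip()
print("device_encoding with piped stdout:", piped_encoding)
```

Because the file descriptor is not connected to a terminal, the child reports None, so the package code falls through to locale.getpreferredencoding().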

Since I cannot edit the code within the external package, is there a way to call the subprocess to force either one of these commands to return cp850?

Variants I have tried to solve the problem

  1. Explicitly set encoding in Popen:
    subprocess.Popen(
      ...
      text=True,
      encoding="cp850",
    )
    
  2. Explicitly set PYTHONIOENCODING in subprocess environment:
    environ = os.environ.copy()
    environ['PYTHONIOENCODING'] = 'utf-8'
    ...
    subprocess.Popen(
      ...
      env=environ,
      encoding='utf-8',
    )
    
  3. Use subprocess.run() instead of subprocess.Popen()
  4. Various combinations of the solutions above.

Resources I have already looked at

 # Override locale.getdefaultlocale() for the subprocess.
 # This is necessary to avoid issues with the default locale on Windows.
 # It might cause issues on computers not in western countries, that do not use cp850.
 import _locale

 _locale._getdefaultlocale = lambda *args: ["en_US", "cp850"]
  • The other option I have found is not to use stdout=subprocess.PIPE at all, thanks to Barmar's comment.
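A rough sketch of that second option, assuming the parent itself is started from a console: leaving out stdout=subprocess.PIPE lets the child inherit the parent's stdout handle, so os.device_encoding() inside the child can see the terminal again. The inline child_code is a hypothetical stand-in for subproc.py.

```python
import subprocess
import sys

# Hypothetical stand-in for "C:/path/to/subproc.py".
child_code = "import os, sys; print(os.device_encoding(sys.stdout.fileno()))"

# No stdout/stderr arguments: the child inherits the parent's handles.
proc = subprocess.Popen([sys.executable, "-c", child_code])
returncode = proc.wait()
print("child exited with", returncode)
```

The trade-off is that the parent can no longer capture the child's output; it goes straight to wherever the parent's stdout points. If the parent's own stdout is redirected, the child still sees no terminal.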
  • When you call it directly, sys.stdout is the terminal window. When you call it with subprocess.Popen(), sys.stdout is a pipe. Pipes don't have a predefined encoding.
    – Barmar
    Commented Jan 20 at 16:06
  • Thank you, this seems to be the base of the problem.
    – Energeneer
    Commented Jan 28 at 10:46

2 Answers


I am not sure if there is a solution to your problem.

Documentation says:

os.device_encoding(fd)

Return a string describing the encoding of the device associated with fd if it is connected to a terminal; else return None.

Given Barmar's comment about pipes not having an encoding, I don't think we can stop this from returning None. Not even UTF-8 mode will help you here, as your compiler's output is not valid UTF-8.

Likewise,

locale.getpreferredencoding(do_setlocale=True)

Return the locale encoding used for text data, according to user preferences. User preferences are expressed differently on different systems, and might not be available programmatically on some systems, so this function only returns a guess.

seems to look only at User Preferences, but you specified you wanted to avoid modifying preferences.

In your provided unmodifiable code, you have

encoding = os.device_encoding(sys.stdout.fileno()) or locale.getpreferredencoding()

Logically, there does not seem to be a way to change the encoding in the code above. This is where I would reconsider re-implementing the function in the package. But then again, I don't know how big your package is, so do what you think is best. I wish you the best of luck in finding your solution.

That said, while experimenting with the information you provided, I created a "demo" that reproduces the error you're seeing without your compiler package. (You were right about the ü causing problems.) I will post the demo here in case it helps anyone else answer the question.

fake_compiler.py

import sys
output = 'Microsoft (R) C/C++-Optimierungscompiler Version 19.42.34433 für x64'
sys.stdout.buffer.write(output.encode('cp850'))

encoding_test.py

import os, sys, locale, subprocess

ppython = [r'C:\Program Files\Python312\python.exe', r'fake_compiler.py']
cc = subprocess.check_output(ppython, stderr=subprocess.STDOUT, shell=True)
encoding = os.device_encoding(sys.stdout.fileno()) or locale.getpreferredencoding()
#encoding = 'cp1252' # neither cp1252 nor utf-8 can decode the bytes
#encoding = 'utf-8'
print("Encoding:", encoding)
compiler_name = cc.decode(encoding).partition("\n")[0].strip()
print("Compiler name:", compiler_name)

encoding_test_parent.py

import subprocess, os
from time import sleep
import _locale

environ = os.environ.copy()
environ['PYTHONIOENCODING'] = 'cp850'

# Note: this patches the parent process only; the child interpreter started below is unaffected.
_locale._getdefaultlocale = lambda *args: ["cp850"]

p = subprocess.Popen(
    [
        r"C:\Program Files\Python312\python.exe",
        r".\encoding_test.py",
    ],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    shell=True,
    env=environ,
    #encoding="cp850"
)

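sleep(0.4)  # not strictly needed: communicate() waits for the child to exit
out, err = p.communicate()  # err is None because stderr is merged into stdout
print(out.decode('cp850'))  # the child writes cp850 because of PYTHONIOENCODING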

Good luck!

  • Thank you for your detailed response. The solutions I have found so far seem to be: 1. Overwriting getdefaultlocale() and 2. working without setting stdout for the process at all, i.e. not using stdout=subprocess.PIPE
    – Energeneer
    Commented Jan 28 at 10:49

The Win32 function GetOEMCP() returns the OEM code page number. It should work for the countries/regions with cp### aliases defined in the standard encodings listed in the Python codecs module:

import ctypes as ct
import ctypes.wintypes as w

kernel32 = ct.WinDLL('kernel32')
GetOEMCP = kernel32.GetOEMCP
GetOEMCP.argtypes = ()
GetOEMCP.restype = w.UINT

print(f'cp{GetOEMCP()}')

Output for US-localized Windows:

cp437
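In code that you do control, the OEM code page could be used to decode console-tool output along these lines. This is only a sketch: oem_encoding() and decode_console_bytes() are hypothetical helper names, and the non-Windows fallback to locale.getpreferredencoding(False) is an assumption added so the snippet also runs elsewhere.

```python
import ctypes
import locale
import sys


def oem_encoding():
    """Return the OEM code page as a codec name on Windows (hypothetical helper)."""
    if sys.platform == "win32":
        # GetOEMCP() returns the code page number, e.g. 850 or 437.
        return f"cp{ctypes.WinDLL('kernel32').GetOEMCP()}"
    # Assumption: outside Windows, fall back to the locale's preferred encoding.
    return locale.getpreferredencoding(False)


def decode_console_bytes(raw):
    """Decode bytes captured from a console tool (hypothetical helper)."""
    # errors="replace" avoids a hard failure on bytes the codec cannot map.
    return raw.decode(oem_encoding(), errors="replace")


print(oem_encoding())
print(decode_console_bytes(b"Version 19.42.34433"))
```

This does not change what the unmodifiable package computes; it only shows how the value from GetOEMCP() would be applied on the decoding side.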
  • Thank you for your answer, however, I cannot edit the package where the issue occurs. Thus I can only change the call of the subprocess.
    – Energeneer
    Commented Jan 28 at 11:08
