I am working on a test where I provide a CSV file with some rows, run the code using that CSV, generate another CSV, and then compare the results. In one of the input files, I was using the £ sign, while in the comparison file I was using . It was working fine before, but after adding more rows, it started failing, and now it’s reporting ´┐¢.

The issue may be related to the default encoding used by the code editor, but I’m not sure what the exact problem is.

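from io import BytesIO

def write_to_s3(result_dataframe, s3):
    # Convert the Spark DataFrame to pandas and serialize it as CSV
    # into an in-memory buffer, encoding the text as CP850.
    data_buffer = BytesIO()
    result_dataframe.toPandas().to_csv(
        data_buffer,
        encoding='cp850',
    )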
  • Don't share all the code; reduce it to a minimal reproducible example. Commented Nov 26 at 18:27
  • It's an AWS Glue script where I am using CP850 to encode the output file. Commented Nov 26 at 18:27
  • Click that edit link below your question and give us details to reproduce the problem. Commented Nov 26 at 18:29
  • FYI: From experimentation, ´┐¢ is UTF-8-encoded REPLACEMENT CHARACTER(�) incorrectly decoded as CP850 (The Western Europe encoding used by cmd.exe in Windows). If you are using CP850, why did you tag the question with cp866? Commented Nov 26 at 18:34
  • That's not a minimal reproducible example (read the link). Show a few lines of input and minimal, but complete code that reproduces the problem. We should be able to copy/paste/run the code with no changes to reproduce the problem. Commented Nov 26 at 20:56
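
The observation in the comments about ´┐¢ can be checked with a few lines of plain Python. This is only a sketch of one plausible failure path, not a diagnosis of the Glue script: it assumes the £ was written as CP850 and that some later step read those bytes as UTF-8 with replacement, which the question does not confirm. All variable names are illustrative.

```python
# U+FFFD REPLACEMENT CHARACTER: what a decoder emits for undecodable bytes.
replacement = "\ufffd"

# Its UTF-8 form is the three bytes EF BF BD ...
utf8_bytes = replacement.encode("utf-8")

# ... which, when displayed as CP850, are the three characters from the question.
mojibake = utf8_bytes.decode("cp850")
print(mojibake)

# The pound sign is a different byte sequence in each encoding.
pound_cp850 = "£".encode("cp850")   # single byte 0x9C
pound_utf8 = "£".encode("utf-8")    # two bytes 0xC2 0xA3

# Assumed failure path: CP850 bytes decoded as UTF-8 with errors="replace".
# 0x9C is not a valid UTF-8 start byte, so it becomes U+FFFD.
misread = pound_cp850.decode("utf-8", errors="replace")
print(misread == replacement)
```

If this matches what is happening, the original £ was already lost before the output file was written, so the fix would be to read and write both CSV files with the same explicitly stated encoding rather than relying on an editor or platform default.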
