Mass convert a project to UTF-8 using Notepad++

Recently, I had to convert the encoding of a multi-module Maven project from our default Cp-1252 encoding to UTF-8. Changing the project settings is rather easy and there are multiple guides available on the internet, so I won’t reinvent the wheel.

The most difficult task, however, was converting all our source files from Cp-1252 to UTF-8, preferably on Windows :) . I looked into applications that would auto-convert everything for me, but none of them actually converted the content, resulting in garbage files. I had almost started converting all the files by hand using Notepad++ when I discovered that this process could be automated!

First of all you’ll need to install the Python Script plugin using the Notepad++ Plugin Manager. Then, after installing and restarting Notepad++, you have to create a new script with the following code:

import os

# Root folder that holds the files to convert
filePathSrc = "C:\\Temp\\UTF8"
# Binary file types to skip (compared case-insensitively)
skipExtensions = ('.jar', '.ear', '.gif', '.jpg', '.jpeg', '.xls', '.png', '.cab', '.ico')

for root, dirs, files in os.walk(filePathSrc):
    for fn in files:
        if not fn.lower().endswith(skipExtensions):
            path = os.path.join(root, fn)
            notepad.open(path)
            console.write(path + "\r\n")
            # Menu item label as shown under Encoding; adjust it if your Notepad++ version uses a different label
            notepad.runMenuCommand("Encoding", "Convert to UTF-8 without BOM")
            notepad.save()
            notepad.close()
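
Before letting the script loose on a whole project, it can help to preview which files it would touch. The following is only a sketch in plain Python, run outside Notepad++, and it assumes the same skip list as the script with extensions compared case-insensitively. The names `files_to_convert` and `SKIP_EXTENSIONS` are made up for this example.

```python
import os

# Hypothetical names for illustration; the extensions mirror the script's skip list
SKIP_EXTENSIONS = ('.jar', '.ear', '.gif', '.jpg', '.jpeg', '.xls', '.png', '.cab', '.ico')

def files_to_convert(root_dir):
    """Return the paths under root_dir that do not end in a skipped extension."""
    selected = []
    for root, dirs, files in os.walk(root_dir):
        for fn in files:
            # Lower-case the name so '.PNG' and '.png' are treated alike
            if not fn.lower().endswith(SKIP_EXTENSIONS):
                selected.append(os.path.join(root, fn))
    return sorted(selected)

if __name__ == "__main__":
    # Same example folder as in the script; prints nothing if it does not exist
    for path in files_to_convert("C:\\Temp\\UTF8"):
        print(path)
```

If the list contains file types you don't want opened in Notepad++, add their extensions to the skip list before running the real conversion.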

I think the code speaks for itself. Just be 100% sure that you convert to UTF-8 without the byte order mark (BOM): javac does not strip the BOM and rejects it as an illegal character at the start of the file.
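
To double-check the result, you could scan the converted files for a leading BOM. This is a minimal sketch in plain Python, run outside Notepad++. `has_utf8_bom` is a hypothetical helper name, and the check relies only on the fact that a UTF-8 BOM is the three bytes EF BB BF at the very start of a file.

```python
import codecs

def has_utf8_bom(path):
    """Return True if the file at path starts with the UTF-8 byte order mark."""
    with open(path, "rb") as handle:
        # codecs.BOM_UTF8 is the byte sequence EF BB BF
        return handle.read(3) == codecs.BOM_UTF8
```

Any file for which this returns True was saved with a BOM and should be converted again with the "without BOM" option.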

If you have problems running the script, first open the console (Plugins > Python Script > Show Console) to see the error. Chances are that the indentation got messed up when copying the code (for those who don’t know Python: it doesn’t use curly brackets to delimit a code block, it uses consistent indentation instead, so tabs and spaces should not be mixed).