I'm sure that I've misconfigured something here, but I can't see what it is.

In Django, I've got a model field that says this:

short_url_slug = AutoSlugField(slugify=short_url_slugify, populate_from=id, blank=False, unique=True)

South creates a migration (seemingly) correctly:

'short_url_slug': ('autoslug.fields.AutoSlugField', [], {'unique_with': '()', 'max_length': '50', 'populate_from': 'None', 'blank': 'True'}),

My Postgresql DB is UTF8:

\l

(MyDBName)                      | (username) | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 

And I have a real life unicode character:

u'\xa4'

But when I write this to the DB, and try to read it out, I get:

In [3]: this_instance.short_url_slug
Out[3]: u'o'

Thoughts? My suspicion is that Postgresql needs to have a different character encoding, but I'm not sure what it should be (if so) or how to do it.

Edit With Additional Info

SELECT version(), current_setting('standard_conforming_strings') AS scs;

PostgreSQL 9.2.4 on x86_64-apple-darwin11.4.2, compiled by i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.9.00), 64-bit | on

Python Version:

Python 2.7.2 (default, Oct 11 2012, 20:14:37)

Django Version:

In [2]: django.VERSION
Out[2]: (1, 5, 1, 'final', 0)

psycopg2:

$ pip freeze | grep psycopg2
psycopg2==2.5

Raw log from postgresql:

LOG:  statement: UPDATE [...lots of stuff removed...] "short_url_slug" = 'o' [... rest of the stuff ...]

So it looks like the unicode value is not even reaching PostgreSQL. But when I break at the line that does the insertion, the variable definitely has the unicode value.

(Pdb) response.short_url_slug
u'\xd6'

(this is after assignment in Python, but before response.save())

More Output:

The way I am detecting that the unicode is getting munged is that the database uniqueness constraint is being violated. This can be checked by saving the content and reading it back out of the models (with the constraint off):

In [11]: all = Response.objects.all()

In [12]: all[0].short_url_slug
Out[12]: u'o'

In [13]: all[4].short_url_slug
Out[13]: u'o'

In [14]: all[4].short_url_slug == all[0].short_url_slug
Out[14]: True
  • Please show the following additional details: Output of the query SELECT version(), current_setting('standard_conforming_strings') AS scs;, and your Python, Django and psycopg2 (or whatever DB adapter you're using) versions. It would also be very helpful to turn log_statement = 'all' on in postgresql.conf, reload PostgreSQL, and examine the logs to identify the text of the suspect INSERT as PostgreSQL sees it. Commented Jul 22, 2013 at 8:18
  • U+00A4 (CURRENCY SIGN) is a completely different character from U+00D4 (LATIN CAPITAL LETTER O WITH CIRCUMFLEX), for which o is absolutely a valid slugification. Commented Jul 23, 2013 at 4:18
  • Do you have unidecode installed? What about pytils? Commented Jul 23, 2013 at 4:22
  • Interesting ... but I'm getting a unique constraint violation on the column when I try to add it. Commented Jul 23, 2013 at 5:29
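
The code-point comparison made in the comments above can be checked from a Python shell with the standard-library `unicodedata` module. This is a minimal sketch that is not part of the original thread; it only looks up the two characters shown in the question and the one named in the comment.

```python
import unicodedata

# Code points that appear in the question (A4, D6) and in the comment (D4).
chars = [u'\xa4', u'\xd4', u'\xd6']

names = {}
for ch in chars:
    # unicodedata.name() returns the official Unicode character name.
    names[ch] = unicodedata.name(ch)
    print('U+%04X %s' % (ord(ch), names[ch]))
```

Note that the value shown in the Pdb session, `u'\xd6'`, is a third character again, so it is worth confirming exactly which input was being saved.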

1 Answer

Django's slugify doesn't support Unicode; you should use unicode-slugify.

(As read in Two Scoops of Django, http://django.2scoops.org/)
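
To illustrate what "doesn't support Unicode" means in practice, here is a rough sketch of an ASCII-folding slugifier in the style Django's built-in `slugify` used at the time: normalize to NFKD, drop everything non-ASCII, then clean up. The function name `ascii_slugify` is hypothetical and the code is a reimplementation for illustration, not Django's or AutoSlug's actual source. Whether the asker's custom `short_url_slugify` behaves like this is an assumption, since the thread never shows it.

```python
import re
import unicodedata


def ascii_slugify(value):
    """Hypothetical stand-in for an ASCII-only slugify (not Django's code)."""
    # Decompose accented letters into base letter + combining mark,
    # then drop every character that is not ASCII.
    value = unicodedata.normalize('NFKD', value)
    value = value.encode('ascii', 'ignore').decode('ascii')
    # Remove anything that is not a word character, whitespace or hyphen.
    value = re.sub(r'[^\w\s-]', '', value).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r'[-\s]+', '-', value)


slug_diaeresis = ascii_slugify(u'\xd6')   # O with diaeresis
slug_circumflex = ascii_slugify(u'\xd4')  # O with circumflex
slug_currency = ascii_slugify(u'\xa4')    # currency sign, no ASCII base letter

print('%r %r %r' % (slug_diaeresis, slug_circumflex, slug_currency))
```

Under this kind of folding, distinct accented inputs collapse to the same slug, which on its own would be enough to trip a `unique=True` constraint like the one described in the question.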

4 Comments

Interesting - but I'm using AutoSlug and a custom slugifier - shouldn't that bypass Django's defaults?
What is your custom slugifier? By default it's Django's one if pytils or unidecode is not installed. It makes me think of the normalize method of the unicodedata module, but I wasn't able to reproduce your results with the four default normalization forms...
The documentation says that if unidecode or pytils is not installed, it uses the default Django engine. Could you try with "é" (u'\xe9') and see if it's translated to e?
I actually ended up using another solution, so I am unable to test this :(, but I DEFINITELY missed this part of the documentation, and it was almost certainly the cause!
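
For anyone who wants to run the check suggested in the comments, the sketch below applies each of the four forms accepted by `unicodedata.normalize` to `u'\xe9'` and then strips non-ASCII characters. It is an illustration added here, not something posted in the thread, and it says nothing about which form any particular slugifier uses.

```python
import unicodedata

e_acute = u'\xe9'  # LATIN SMALL LETTER E WITH ACUTE

results = {}
for form in ('NFC', 'NFD', 'NFKC', 'NFKD'):
    normalized = unicodedata.normalize(form, e_acute)
    # Drop anything that cannot be represented in ASCII.
    stripped = normalized.encode('ascii', 'ignore').decode('ascii')
    results[form] = (len(normalized), stripped)
    print('%s length=%d stripped=%r' % (form, len(normalized), stripped))
```

Only the decomposing forms (NFD and NFKD) split the letter into a base `e` plus a combining accent, so only those leave an `e` behind after the ASCII strip; the composing forms leave nothing.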
