A python utf gotcha

This one had me stumped for a while:

# -*- coding: utf-7 -*-
import datetime
from sqlalchemy import ForeignKey, Column
from sqlalchemy.types import Integer, Unicode, Boolean, DateTime

default_due_date = datetime.datetime.now() + datetime.timedelta(days=30)

Syntax error found on last line.

Hmm, bring up a python interpreter and type the last line in with the imports. Works fine.

It’s the first line that is the problem: I typoed it and made it utf-7 instead of utf-8. I suppose it means that it is case-insensitive. Still, it wasn’t too clear, to me at least, what was going on.
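
One way to check which encoding Python will take from that first line is tokenize.detect_encoding in the standard library. This is only a minimal sketch; the helper name declared_encoding is made up for illustration and is not part of the original code:

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the source encoding Python reads from a PEP 263 coding line.

    Hypothetical helper for illustration; the result falls back to 'utf-8'
    when the source carries no coding declaration.
    """
    encoding, _lines_read = tokenize.detect_encoding(
        io.BytesIO(source_bytes).readline
    )
    return encoding


typoed = b"# -*- coding: utf-7 -*-\nimport datetime\n"
intended = b"# -*- coding: utf-8 -*-\nimport datetime\n"

print(declared_encoding(typoed))    # utf-7
print(declared_encoding(intended))  # utf-8
```

The declaration is taken at face value here, so a one-character typo quietly selects a different codec for the whole file.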



11 responses to “A python utf gotcha”

  1. The problem isn’t that it is case-insensitive, but that u'+' encoded in UTF-7 is b'+-'.

    1. Ahh, is that it? I thought it was getting DateTime mixed with datetime but that was a furphy.
      So UTF-8 plus is UTF-7 plus,minus.

      That must do some odd things! Thanks for the note.

      1. @Craig: UTF-7 is simply another encoding for Unicode, one that uses only 7 bits per byte. It is not widely used, but it would be an alternative to UTF-8 or UTF-16 + Base64. Now, indeed, the character plus (just the character plus, or the Unicode character plus if you like, but not “UTF-8 plus”, as it is an abstract character, independent of the encoding) is encoded in UTF-7 as two bytes that would read as +- when incorrectly decoded as ASCII.

    2. himdel

      What you say is true, but his file is probably not actually saved as UTF-7, so the conversion goes the other way. So the problem is that b'+ ', when interpreted as UTF-7, yields just u' ': b'+' denotes the start of a base64 block, and the block ends at the first non-[A-Za-z0-9+/] character, which is the space immediately after the +, so just the + gets consumed.

  2. $ echo 'default_due_date = datetime.datetime.now() + datetime.timedelta(days=30)' | iconv -f utf7 -t utf8
    default_due_date = datetime.datetime.now() datetime.timedelta(days=30)

    The ' + ' sequence becomes just ' '.
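
Both directions discussed in the comments can also be checked from Python with the built-in utf-7 codec. This is a rough sketch: read_as_utf7 is a made-up helper name, and the exact outcome of the decode appears to depend on the Python version (as far as I can tell, some versions drop the '+' the way iconv does above, while others reject a '+' that is not followed by base64 data). The code therefore only reports what happened rather than assuming one result:

```python
LINE = b"default_due_date = datetime.datetime.now() + datetime.timedelta(days=30)"


def read_as_utf7(raw):
    """Decode ASCII-looking bytes as UTF-7 (hypothetical helper).

    Returns the decoded text, or None if the decoder rejects the input.
    """
    try:
        return raw.decode("utf-7")
    except UnicodeDecodeError:
        return None


# Encoding direction: a literal '+' has to be written as '+-' in UTF-7.
print("+".encode("utf-7"))  # b'+-'

# Decoding direction: what the utf-7 declaration asks for on this line.
decoded = read_as_utf7(LINE)
if decoded is None:
    print("decoder rejected the line")
else:
    print(repr(decoded))

# Either way, the '+' operator does not survive the trip.
print(decoded is None or "+" not in decoded)  # True
```

Lines without a '+' in them, such as the imports, decode unchanged, which would explain why the error only shows up on the last line.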
