A python utf gotcha


This one had me stumped for a while:

# -*- coding: utf-7 -*-
import datetime
from sqlalchemy import ForeignKey, Column
from sqlalchemy.types import Integer, Unicode, Boolean, DateTime

default_due_date = datetime.datetime.now() + datetime.timedelta(days=30)

Syntax error found on last line.

Hmm. Bring up a Python interpreter, type in the imports and then the last line, and it works fine.

It’s the first line that is the problem: I typoed it and made it utf-7, not utf-8. I suppose it means that it is case-insensitive. Still, it wasn’t too clear, to me at least, what was going on.

Comments

11 responses to “A python utf gotcha”

June 7, 2014

sepp

The problem isn’t that it is case-insensitive, but that u'+' encoded in UTF-7 is b'+-'.
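
A minimal sketch of that encoding direction, assuming Python 3 (the post itself dates from the Python 2 era). The variable names are made up for illustration:

```python
# Encoding direction only: how UTF-7 represents a literal plus sign.
plus_utf7 = "+".encode("utf-7")
print(plus_utf7)

# Decoding the escaped form gives the plus sign back.
round_trip = plus_utf7.decode("utf-7")
print(round_trip)

# The same escape appears inside a longer expression.
expr_utf7 = "1 + 2".encode("utf-7")
print(expr_utf7)
```

In other words, a file that really were saved as UTF-7 would need `+-` wherever the source wants a plus.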

1. June 7, 2014
  
  Craig
  
  Ahh, is that it? I thought it was getting DateTime mixed with datetime but that was a furphy.
  So UTF-8 plus is UTF-7 plus, minus.
  
  That must do some odd things! Thanks for the note.
  
  1. June 8, 2014
    
    Elessar
    
    @Craig: UTF-7 is simply another encoding for Unicode, which uses only 7 bits per byte. It is not widely used, but it would be an alternative to UTF-8 or UTF-16 + Base64. Now, indeed, the character plus (just character plus, or Unicode character plus if you like, but not UTF-8 plus, as it is an abstract character, independent of the encoding) is encoded in UTF-7 as two bytes that would be read as +- when incorrectly decoded as ASCII.
    
2. June 7, 2014
  
  himdel
  
  What you say is true, but his file is probably not actually saved as UTF-7, so the conversion goes the other way. So the problem is that b'+ ', when interpreted as UTF-7, yields just u' ': b'+' denotes the start of a base64 block, and the block ends on the first non-[A-Za-z0-9+/] character, which is the space immediately after the +, so just the + gets consumed.
  
June 7, 2014

Gunnar Wolf

$ echo 'default_due_date = datetime.datetime.now() + datetime.timedelta(days=30)' | iconv -f utf7 -t utf8
default_due_date = datetime.datetime.now() datetime.timedelta(days=30)

The ‘ + ’ sequence loses its plus sign, leaving only whitespace.
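
The rule described in this thread can also be sketched as a small hand-rolled decoder in Python 3. This is an illustration only, not what CPython or iconv actually implement: `lenient_utf7_decode` and `B64_ALPHABET` are names invented for the example, it assumes pure ASCII input, and it silently drops leftover bits in a base64 block. Python's built-in utf-7 codec may be stricter about a `+` that is not followed by base64 data, depending on the version.

```python
# Hand-rolled, lenient UTF-7 decoder: a sketch for illustration only.
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def lenient_utf7_decode(data: bytes) -> str:
    text = data.decode("ascii")  # assumption: the source file is plain ASCII
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "+":
            out.append(ch)  # ordinary character, copied through
            i += 1
            continue

        # '+' opens a base64 block; collect every base64 character after it.
        j = i + 1
        while j < len(text) and text[j] in B64_ALPHABET:
            j += 1
        block = text[i + 1:j]

        # Turn the 6-bit groups into 16-bit UTF-16 code units.
        bits = nbits = 0
        units = bytearray()
        for c in block:
            bits = (bits << 6) | B64_ALPHABET.index(c)
            nbits += 6
            if nbits >= 16:
                nbits -= 16
                units += ((bits >> nbits) & 0xFFFF).to_bytes(2, "big")
                bits &= (1 << nbits) - 1
        out.append(units.decode("utf-16-be", errors="replace"))

        if j < len(text) and text[j] == "-":
            if not block:
                out.append("+")  # '+-' is the escape for a literal plus
            j += 1  # a closing '-' is absorbed
        i = j  # any other terminator is kept and handled on the next pass
    return "".join(out)


line = b"default_due_date = datetime.datetime.now() + datetime.timedelta(days=30)"
decoded = lenient_utf7_decode(line)
print(decoded)
```

Run on the line from the post, the plus sign is consumed as an empty base64 block, which leaves two expressions side by side and would explain the syntax error.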
