Yesterday I gave a talk at the SLCPython Meetup Group about Python 3. When I agreed to give this talk I had already read through many of the pages of changes (e.g. What's new in Python 3) and felt like I had a pretty good understanding of them. What I didn't realize was how much I would be learning about Unicode.
Click here for Resources from my talk
Why Unicode is a Step Forward
Python 2 had several implicit uses of ASCII encoding/decoding. ASCII was the first widely accepted system for encoding human language (i.e. English) as computer bytes. This process is necessary because humans don't express themselves well in bytes and computers don't do well with language.
Many human languages exist. Each has its own ways to express ideas. As computers have evolved and come into use all around the globe, the need for computers to encode/decode more than just English has become paramount. The development of the Unicode character system is a major step toward solving this problem. The next step is for developers to implement Unicode as a foundation for computer systems. This step will make computers more accessible to people of all backgrounds and cultures.
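To make the limits of ASCII concrete, here is a small Python 3 sketch. The Japanese greeting is just an illustrative string I picked. It cannot be encoded as ASCII, but it round-trips cleanly through UTF-8, one of the standard Unicode encodings.

```python
greeting = "こんにちは"  # "hello" in Japanese, chosen only for illustration

# ASCII only covers 128 code points, so this text cannot be encoded with it.
try:
    greeting.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent every Unicode character, at the cost of using
# more than one byte for characters outside the ASCII range.
utf8_bytes = greeting.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

print(ascii_ok)
print(len(greeting), len(utf8_bytes))
print(round_trip == greeting)
```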
Python Adoption of Unicode
Unicode handling is the primary reason for the break in backwards compatibility. In the words of core developer Nick Coghlan, "Fixing [Unicode handling bugs] within the constraints of the Python 2 text model is considered too hard to be worth the effort."
The new text model in Python 3 treats all text as either byte arrays (machine language) or Unicode (human language(s)) with more explicit encoding/decoding. The diagram which helped me was this:
On the outside of the sandwich there's the computer and byte storage, on the inside is the Unicode which humans understand.
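In code, the sandwich might look roughly like the sketch below. The sample text and variable names are my own, and a real program would get its bytes from a file or socket rather than a literal. The idea is to decode bytes on the way in, work only with str in the middle, and encode on the way out.

```python
# Stand-in for bytes arriving from outside the program (disk, network, ...).
raw = "naïve café".encode("utf-8")

# Bottom slice of bread: decode bytes to str as early as possible.
text = raw.decode("utf-8")

# The filling: all processing happens on Unicode text.
shouted = text.upper()

# Top slice of bread: encode back to bytes as late as possible.
out = shouted.encode("utf-8")

# Python 3 refuses to mix the two types implicitly.
try:
    b"abc" + "def"
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(type(raw).__name__, type(text).__name__, type(out).__name__)
print(shouted)
print(mixed_ok)
```

The TypeError at the end is the "more explicit" part: Python 2 would have tried an implicit ASCII conversion here, while Python 3 makes you choose an encoding yourself.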
This new text model also allows people to write code using identifiers in their own language, since names can contain non-ASCII Unicode letters. For example, these scripts published on GitHub use Chinese characters (pretty fracking cool).
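Here is a tiny sketch of what that can look like. The names are ones I made up for illustration and are not taken from the scripts mentioned above. Python 3 accepts Unicode letters in identifiers, though not arbitrary symbols such as emoji.

```python
import math

# Non-ASCII letters are valid in Python 3 identifiers.
π = math.pi


def 面积(半径):
    """Return the area of a circle (面积 = area, 半径 = radius)."""
    return π * 半径 ** 2


print(面积(2))
print("面积".isidentifier())
```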
Fixing Up Loose Ends
Since the core developers broke backwards compatibility anyway, this was a great opportunity to fix many of the inconsistencies and gotchas of Python 2. These are the changes most of us will notice: things like making print a function, or the removal of many functions and methods (like raw_input or dict.iterkeys).
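A quick sketch of how those three changes look in Python 3. The dictionary contents are placeholder data, and the input() call is commented out so the snippet runs without waiting for the keyboard.

```python
import io

# print is a function, so it accepts keyword arguments like sep, end and file.
buffer = io.StringIO()
print("spam", "eggs", sep=", ", end="!\n", file=buffer)
line = buffer.getvalue()

# dict.iterkeys() is gone; dict.keys() now returns a lightweight view
# instead of building a list.
counts = {"spam": 1, "eggs": 2}  # placeholder data
names = sorted(counts.keys())
has_iterkeys = hasattr(counts, "iterkeys")

# raw_input() was renamed to input(), which always returns a str.
# name = input("Your name: ")

print(repr(line))
print(names)
print(has_iterkeys)
```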
I found it useful to look online at several of the "what's new guides". I think Brian Curtin's Porting Guide is a nice and succinct place to start.
Now is the Time!
Python is a powerful language mostly because of its libraries built by the community. The fear was that these packages would have trouble converting to Python 3 (or might never do so) and that this would be the end of Python. However, it seems to me that fear is no longer valid. By several measures, we're now at 75% support of Python 3 by major libraries. One of these measures is the Python 3 Readiness site, which tracks Python 3 adoption among the top 360 pip packages.
With almost all major libraries now supporting Python 3, it's up to the rest of us to adopt this new language. Good luck!
Other Online Materials
- What's New in Python 3 – for most of the Python 2.7 to Python 3 changes.
- What's New – for a VERY long list of everything new (though check out the dense but "short" summary)
- Pragmatic Unicode talk/essay – or "Why Python 3 Exists" - Coghlan
- Python 3 Porting Guide – nice quick reference for things which have changed from 2.x to 3.x
- Porting to Python 3: An in-depth guide – Definitely in depth