The PEPs of Python 3.9

By Jake Edge
May 20, 2020

With the release of Python 3.9.0b1, the first of four planned betas for the development cycle, Python 3.9 is now feature-complete. There is still plenty to do in terms of testing and stabilization before the October final release. The release announcement lists a half-dozen Python Enhancement Proposals (PEPs) that were accepted for 3.9. We have looked at some of those PEPs along the way; there are some updates on those. It seems like a good time to fill in some of the gaps on what will be coming in Python 3.9.

String manipulations

Sometimes the simplest (seeming) things are the hardest—or at least provoke an outsized discussion. Much of that was bikeshedding over—what else?—naming, but the idea of adding functions to the standard string objects to remove prefixes and suffixes was fairly uncontroversial. Whether those affixes (a word for both prefixes and suffixes) could be specified as sequences, so more than one affix could be handled in a single call, was less clear cut; ultimately, it was removed from the proposal, awaiting someone else to push that change through the process.

Toward the end of March, Dennis Sweeney asked on the python-dev mailing list for a core developer to sponsor PEP 616 ("String methods to remove prefixes and suffixes"). He pointed to a python-ideas discussion from March 2019 about the idea. Eric V. Smith agreed to sponsor the PEP, which led Sweeney to post it and kick off the discussion. In the original version, he used cutprefix() and cutsuffix() as the names of the string object methods to be added. Four types of Python objects would get the new methods: str (Unicode strings), bytes (binary sequences), bytearray (mutable binary sequences), and collections.UserString (a wrapper around string objects). It would work as follows:

 'abcdef'.cutprefix('abc') # returns 'def'
 'abcdef'.cutsuffix('ef') # returns 'abcd'

There were plenty of suggestions in the name department. Perhaps the most widespread agreement was that few liked "cut", so "strip", "trim", and "remove" were all suggested and garnered some support. stripprefix() (and stripsuffix(), of course) seemed to run into opposition due, at least in part, to one of the rationales specified in the PEP; the existing "strip" functions are confusing so reusing that name should be avoided. The str.lstrip() and str.rstrip() methods also remove leading and trailing characters, but they are a source of confusion to programmers actually looking for the cutprefix() functionality. The *strip() calls take a string argument, but treat it as a set of characters that should be eliminated from the front or end of the string:

 'abcdef'.lstrip('abc') # returns 'def' as "expected"
 'abcbadefed'.lstrip('abc') # returns 'defed' not at all as expected
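The difference between the two behaviors can be sketched in a few lines. The helper name cut_prefix() below is hypothetical and exists only for illustration; it is a minimal version of the whole-prefix semantics under discussion, not the PEP's implementation.

```python
def cut_prefix(text: str, prefix: str) -> str:
    """Hypothetical helper: drop `prefix` only if `text` starts with it as a whole."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


# lstrip() treats its argument as a set of characters, so the order of
# the characters in the argument makes no difference.
stripped = 'abcbadefed'.lstrip('abc')
stripped_reordered = 'abcbadefed'.lstrip('cba')

# The whole-prefix version removes exactly one leading 'abc'.
cut = cut_prefix('abcbadefed', 'abc')

print(stripped)            # defed
print(stripped_reordered)  # defed
print(cut)                 # badefed
```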

Eventually, removeprefix() and removesuffix() seemed to gain the upper hand, which is what Sweeney switched to. It probably did not hurt that Guido van Rossum supported those names as well. Eric Fahlgren amusingly summed up the name fight this way:

I think name choice is easier if you write the documentation first:

cutprefix - Removes the specified prefix.
trimprefix - Removes the specified prefix.
stripprefix - Removes the specified prefix.

removeprefix - Removes the specified prefix. Duh. :)

Sweeney announced an update to the PEP that addressed a number of comments, but also added the requested ability to take a tuple of strings as an affix (that version can be seen in the PEP GitHub repository). But Steven D'Aprano was not so sure it made sense to do that. He pointed out that the only string operations that take a tuple are str.startswith() and str.endswith(), which do not return a string (just a boolean value). He is leery of adding a method that returns a (potentially changed) version of the string while taking a tuple because whatever rules are chosen on how to process the tuple will be the "wrong" choice for some. For example:

The difficulty here is that the notion of "cut one of these prefixes" is ambiguous if two or more of the prefixes match. It doesn't matter for startswith:
 "extraordinary".startswith(('ex', 'extra'))
since it is True whether you match left-to-right, shortest-to-largest, or even in random order. But for cutprefix, which prefix should be deleted?

As he said, the rule as proposed is that the first matching string processing the tuple left-to-right is used, but some might want the longest match or the last match; it all depends on the context of the use. He suggested that the feature get more "soak time" before committing to adding that behavior: "We ought to get some real-life exposure to the simple case first, before adding support for multiple prefixes/suffixes."
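To make the ambiguity concrete, here is a minimal sketch of two of the candidate rules. Both function names are hypothetical, chosen for this illustration only; neither is code from the PEP.

```python
def cut_first_match(text, prefixes):
    """Remove the first prefix (scanning left to right) that matches."""
    for prefix in prefixes:
        if text.startswith(prefix):
            return text[len(prefix):]
    return text


def cut_longest_match(text, prefixes):
    """Remove the longest prefix that matches, regardless of tuple order."""
    matching = [p for p in prefixes if text.startswith(p)]
    if not matching:
        return text
    return text[len(max(matching, key=len)):]


prefixes = ('ex', 'extra')
first = cut_first_match('extraordinary', prefixes)
longest = cut_longest_match('extraordinary', prefixes)

print(first)    # traordinary
print(longest)  # ordinary
```

Both answers are defensible, which is the point D'Aprano was making: the result depends on a rule that the caller cannot see at the call site.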

Ethan Furman agreed with D'Aprano. But Victor Stinner was strongly in favor of the tuple-argument idea. He wondered about the proposed behavior, however, when the empty string is passed as part of the tuple. As proposed, encountering the empty string (which effectively matches anything) when processing the tuple would simply return the original string, which leads to surprising results:

cutsuffix("Hello World", ("", " World")) # returns "Hello World"
cutsuffix("Hello World", (" World", "")) # returns "Hello"

The problem is not likely to manifest so obviously; affixes will not necessarily be hard coded so empty strings might slip into unexpected places. Stinner suggested raising ValueError if an empty string is encountered, similar to str.split(). But Sweeney decided to remove the tuple-argument feature entirely to "allow someone else with a stronger opinion about it to propose and defend a set of semantics in a different PEP". He posted the last version of the PEP on March 28.
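The empty-string pitfall is easy to reproduce with a small sketch of the left-to-right rule. The function-style cutsuffix() below mirrors the calls in Stinner's example but is only an illustration, and cutsuffix_strict() is a hypothetical variant that takes one possible reading of his suggestion by rejecting empty strings up front.

```python
def cutsuffix(text, suffixes):
    """Sketch of the proposed rule: the first matching suffix wins."""
    for suffix in suffixes:
        if text.endswith(suffix):
            # An empty suffix always matches and removes nothing.
            return text[:len(text) - len(suffix)]
    return text


def cutsuffix_strict(text, suffixes):
    """Hypothetical variant: refuse empty suffixes before matching."""
    if any(suffix == "" for suffix in suffixes):
        raise ValueError("empty suffix")
    return cutsuffix(text, suffixes)


empty_first = cutsuffix("Hello World", ("", " World"))
empty_last = cutsuffix("Hello World", (" World", ""))

print(repr(empty_first))  # 'Hello World'
print(repr(empty_last))   # 'Hello'

try:
    cutsuffix_strict("Hello World", ("", " World"))
except ValueError as exc:
    print("rejected:", exc)  # rejected: empty suffix
```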

On April 9, Sweeney opened a steering council issue requesting a review of the PEP. On April 20, Stinner accepted it on behalf of the council. It is a pretty minimal change but worth the time to try to ensure that it has the right interface (and semantics) for the long haul. We will see removeprefix() and removesuffix() in Python 3.9.
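As a quick illustration of the accepted interface, the following sketch exercises the new methods on the four types named in the PEP. It assumes an interpreter that is Python 3.9 or later; the variable names are made up for this example.

```python
from collections import UserString

# Requires Python 3.9 or later.
text_result = 'abcdef'.removeprefix('abc')
no_match = 'abcdef'.removeprefix('xyz')        # unchanged when there is no match
only_once = 'abcbadefed'.removeprefix('abc')   # removes one whole prefix only
bytes_result = b'abcdef'.removesuffix(b'ef')
bytearray_result = bytearray(b'abcdef').removeprefix(b'abc')
user_result = UserString('abcdef').removesuffix('ef')

# The tuple form was dropped from the PEP, so a tuple argument is rejected.
try:
    'abcdef'.removeprefix(('abc', 'ab'))
    tuple_rejected = False
except TypeError:
    tuple_rejected = True

print(text_result)       # def
print(no_match)          # abcdef
print(only_once)         # badefed
print(bytes_result)      # b'abcd'
print(bytearray_result)  # bytearray(b'def')
print(user_result)       # abcd
print(tuple_rejected)    # True
```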

New parser

It should not really surprise anyone that the new parser for CPython, covered here in mid-April, has been accepted by the steering council. PEP 617 ("New PEG parser for CPython") was proposed by project founder and former benevolent dictator for life (BDFL) Guido van Rossum, along with Pablo Galindo Salgado and Lysandros Nikolaou; it is already working well and its performance is within 10% of the existing parser in terms of speed and memory use. It will also make the language specification simpler because the parser is based on a parsing expression grammar (PEG). The existing LL(1) parser for CPython suffers from a number of shortcomings and contains some hacks that the new parser will eliminate.

The change paves the way for Python to move beyond having an LL(1) grammar—though the existing language is not precisely LL(1)—down the road. That change will not come soon as the plans are to keep the existing parser available in Python 3.9 behind a command-line switch. But Python 3.10 will remove the existing parser, which could allow language changes. If those kinds of changes are made, however, alternative Python implementations (e.g. PyPy, MicroPython) may need to switch their parsers to something other than LL(1) in order to keep up with the language specification. That might give the core developers pause before making a change of that nature.

And more

We looked at PEP 615 ("Support for the IANA Time Zone Database in the Standard Library") back in early March. It would add a zoneinfo module to the standard library that would facilitate getting time-zone information from the IANA time zone database (also known as the "Olson database") to populate a time-zone object. It was looked on favorably at the time of the article and at the end of March Paul Ganssle asked for a decision on the PEP. He thought it might be amusing to have it accepted (assuming it was) during an interesting time window:

[...] I was hoping (for reasons of whimsy) to get this accepted on Sunday, April 5th either between 02:00-04:00 UTC or between 13:00 and 17:30 UTC, since those times represent ambiguous datetimes somewhere on earth (mostly in Australia). There is one other opportunity for this, which is that on Sunday April 19th, the hours between 01:00 and 03:00 UTC are ambiguous in Western Sahara.

He recognized that it might be difficult to pull off and it certainly was not a priority. The steering council did not miss the second window by much; Barry Warsaw announced the acceptance of the PEP on April 20. Python will now have a mechanism to access the system's time-zone database for creating and handling time zones. In addition, there is a tzdata module in the Python Package Index (PyPI) that contains the IANA data for systems that lack it; it will be maintained by the Python core developers as well.
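For a sense of what an "ambiguous datetime" looks like in practice, here is a small sketch using the new module. It assumes Python 3.9 or later and that time-zone data is available (from the system or from the tzdata package); the choice of Australia/Sydney and the helper name sydney_offsets() are ours, made for illustration. Daylight saving time ended in Sydney on April 5, 2020, so 02:30 local time occurred twice that morning, and the datetime fold attribute selects which occurrence is meant.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def sydney_offsets():
    """Return the UTC offsets of both occurrences of 02:30 on 2020-04-05.

    Returns None if no time-zone data can be found on this system.
    """
    try:
        tz = ZoneInfo("Australia/Sydney")
    except ZoneInfoNotFoundError:
        return None
    first = datetime(2020, 4, 5, 2, 30, tzinfo=tz)  # fold=0: before the change
    second = first.replace(fold=1)                  # fold=1: after the change
    return first.utcoffset(), second.utcoffset()


offsets = sydney_offsets()
print(offsets)
```

Where the data is available, the two offsets should differ by one hour, with the first occurrence still on daylight saving time.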

PEP 593 ("Flexible function and variable annotations") adds a way to associate context-specific metadata with functions and variables. Effectively, the type hint annotations have squeezed out other use cases that were envisioned in PEP 3107 ("Function Annotations") that was implemented in Python 3.0 many years ago. PEP 593 creates a new mechanism for those use cases using the Annotated type hint. Another kind of clean up comes in PEP 585 ("Type Hinting Generics In Standard Collections"). It will allow the removal of a parallel set of type aliases maintained in the typing module in order to support generic types. For example, the typing.List type will no longer be needed to support annotations like "dict[str, list[int]]" (i.e., a dictionary with string keys and values that are lists of integers).
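The two PEPs can be seen together in a short sketch, assuming Python 3.9 or later. The scale() function and its "must be positive" metadata string are invented for this example; what a consumer does with the metadata is entirely up to the consumer.

```python
from typing import Annotated, get_type_hints


def scale(values: dict[str, list[int]],
          factor: Annotated[int, "must be positive"]) -> dict[str, list[int]]:
    """Multiply every integer in every list by `factor`."""
    return {key: [n * factor for n in numbers] for key, numbers in values.items()}


# By default the extra metadata is stripped, leaving only the type.
plain = get_type_hints(scale)
# With include_extras=True the Annotated wrapper and its metadata survive.
with_extras = get_type_hints(scale, include_extras=True)

print(plain["factor"])                     # <class 'int'>
print(with_extras["factor"].__metadata__)  # ('must be positive',)
print(scale({"a": [1, 2]}, 3))             # {'a': [3, 6]}
```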

The dictionary union operation for "addition" will also be part of Python 3.9. It was a bit contentious at times, but PEP 584 ("Add Union Operators To dict") was recommended for acceptance by Van Rossum in mid-February. The steering council promptly agreed and the feature was merged on February 24.
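A brief sketch of the new operators, assuming Python 3.9 or later; the dictionary contents are arbitrary. When a key appears in both operands, the value from the right-hand operand wins.

```python
defaults = {"color": "blue", "size": 10}
overrides = {"size": 12, "shape": "round"}

# "|" builds a new dictionary and leaves both operands untouched.
merged = defaults | overrides
swapped = overrides | defaults

# "|=" updates the left-hand dictionary in place.
settings = dict(defaults)
settings |= overrides

print(merged)    # {'color': 'blue', 'size': 12, 'shape': 'round'}
print(swapped)   # {'size': 10, 'shape': 'round', 'color': 'blue'}
print(settings)  # {'color': 'blue', 'size': 12, 'shape': 'round'}
```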

The last PEP on the list is PEP 602 ("Annual Release Cycle for Python"). As it says on the tin, it changes the release cadence from every 18 months to once per year. The development and release cycles overlap, though, so that a full 12 months is available for feature development. Python 3.10 feature development begins when the first Python 3.9 beta has been released—which is now. Stay tuned for the next round of PEPs in the coming year.
