Reading through the `comp.lang.python` mailing list, I saw an interesting question concerning the behavior of the default sorting algorithm in python. That question led to this post.

Python uses timsort, a clever hybrid sorting algorithm with ideas borrowed from merge sort and (binary) insertion sort. A major idea in timsort is to exploit the structure of naturally occurring runs (consecutive elements in the list that are either monotone increasing or monotone decreasing) when sorting.
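As a quick illustration of such runs, here is a small sketch. Note that `find_runs` is a hypothetical helper written for this post, not part of timsort itself, and it is simplified: the real algorithm also recognizes strictly decreasing runs (reversing them in place) and enforces a minimum run length.

```python
def find_runs(lst):
    # Split `lst` into maximal non-decreasing runs.
    # (Simplified: real timsort also detects decreasing runs.)
    runs, start = [], 0
    for i in range(1, len(lst)):
        if lst[i] < lst[i - 1]:  # the run ends just before index i
            runs.append(lst[start:i])
            start = i
    runs.append(lst[start:])
    return runs

print(find_runs([3, 5, 9, 2, 8]))  # [[3, 5, 9], [2, 8]]
```

Here `3, 5, 9` and `2, 8` are the two natural runs that timsort would try to exploit.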

Let’s look at the following simple list.

`10, 20, 5`

A simple sorting algorithm is *insertion sort*, which just advances through the list and inserts each number into the correct spot. More explicitly, insertion sort would

- Start with the first element, `10`. As a list with one element, it is tautologically correctly sorted.
- Now consider the second element, `20`. We insert this into the correct position in the already-sorted list of previous elements. Here, that just means that we verify that `20 > 10`, and now we have the sorted sublist `10, 20`.
- Now consider the third element, `5`. We want to insert this into the correct position in the already-sorted list of previous elements. A naive, easy way to do this is to scan the list from the right or from the left, and insert into the correct place. For example, scanning from the right means that we compare `5` to the last element of the sublist, `20`. As `5 < 20`, we shift left and compare `5` to `10`. As `5 < 10`, we shift left again. As there is nothing left to compare against, we insert `5` at the beginning, yielding the sorted list `5, 10, 20`.

How many comparisons did this take? This took `20 > 10`, `5 < 20`, and `5 < 10`. This is three comparisons in total.

We can see this programmatically as well. Here is one implementation of `insertion_sort`, as described above.

```
def insertion_sort(lst):
    '''
    Sorts `lst` in-place. Note that this changes `lst`.
    '''
    for index in range(1, len(lst)):
        current_value = lst[index]
        position = index
        # Shift larger elements right until the insertion point is found.
        while position > 0 and lst[position - 1] > current_value:
            lst[position] = lst[position - 1]
            position = position - 1
        lst[position] = current_value
```

Let’s also create a simple `Number` class, which is just like a regular number, except that anytime a comparison is done it prints out the comparison. This will count the number of comparisons made for us.

```
class Number:
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return str(self.value)
    def __repr__(self):
        return self.__str__()
    def __lt__(self, other):
        if self.value < other.value:
            print("{} < {}".format(self, other))
            return True
        print("{} >= {}".format(self, other))
        return False
    def __eq__(self, other):
        if self.value == other.value:
            print("{} = {}".format(self, other))
            return True
        return False
    def __gt__(self, other):
        return not ((self == other) or (self < other))
    def __le__(self, other):
        return (self < other) or (self == other)
    def __ge__(self, other):
        return not (self < other)
    def __ne__(self, other):
        return not (self == other)
```

With this class and function, we can run

```
lst = [Number(10), Number(20), Number(5)]
insertion_sort(lst)
print(lst)
```

which will print

```
10 < 20
20 >= 5
10 >= 5
[5, 10, 20]
```

These are the three comparisons we were expecting to see.

Returning to python’s timsort — what happens if we call python’s default sorting method on this list? The code

```
lst = [Number(10), Number(20), Number(5)]
lst.sort()
print(lst)
```

prints

```
20 >= 10
5 < 20
5 < 20
5 < 10
[5, 10, 20]
```

There are *four* comparisons! And weirdly, the method checks that `5 < 20` twice in a row. What’s going on there?^{1}

This question was at the core of the thread on comp.lang.python: why are there extra comparisons in cases like this?

Poking around the implementation of timsort taught me a little bit more about timsort.^{2}

Timsort approaches this sorting task in the following way.

- First, timsort tries to identify how large the first run within the sequence is, so it keeps comparing terms until it finds one that is out of order. In this case, it compares `20` to `10` (finding that `20 > 10`, and thus that the run is increasing), and then compares `5` to `20` (finding that `5 < 20`, and thus that `5` is not part of the same run as `10, 20`). Now the run is identified, and there is one element left to incorporate.
- Next, timsort tries to insert `5` into the already-sorted run. It is more correct to say that timsort attempts a binary insertion, since one knows already that the run is sorted.^{3} In this binary insertion, timsort compares `5` with the middle of the already-sorted run `10, 20`. But this is a list of length 2, so what is its middle element? It turns out that timsort takes the latter element, `20`, in this case. As `5 < 20`, timsort concludes that `5` should be inserted somewhere in the first half of the run `10, 20`, and not in the second half.
- Of course, the first half consists entirely of `10`. Thus the remaining comparison is to check that `5 < 10`, and now the list is sorted.

We count^{4} all four of the comparisons. The doubled comparison is due to the two tasks of checking whether `5` is in the same run as `10, 20`, and then of deciding through binary insertion where to place `5` in the smaller sublist `10, 20`.
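The steps above can be sketched in pure Python. This is a toy sketch, not CPython's actual implementation (which is a large, heavily optimized piece of C code): `timsort_sketch` is a hypothetical name, the run detection only handles increasing runs, and a minimal version of the `Number` class is included so the snippet stands alone. Run on our example, it reproduces all four printed comparisons, including the doubled `5 < 20`.

```python
class Number:
    # Minimal version of the Number class above: only `<` is needed here.
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return str(self.value)

    def __lt__(self, other):
        result = self.value < other.value
        print("{} < {}".format(self, other) if result
              else "{} >= {}".format(self, other))
        return result


def timsort_sketch(lst):
    '''
    Toy sketch of timsort's first steps: detect the initial (increasing)
    run, then binary-insert each remaining element. Sorts `lst` in-place.
    '''
    # Phase 1: extend the run while elements keep increasing.
    run_len = 1
    while run_len < len(lst) and not (lst[run_len] < lst[run_len - 1]):
        run_len += 1
    # Phase 2: binary insertion of the leftover elements.
    for i in range(run_len, len(lst)):
        pivot = lst[i]
        lo, hi = 0, i
        while lo < hi:
            mid = (lo + hi) // 2  # for the run 10, 20 this probes 20 first
            if pivot < lst[mid]:
                hi = mid
            else:
                lo = mid + 1
        lst[lo + 1:i + 1] = lst[lo:i]
        lst[lo] = pivot


lst = [Number(10), Number(20), Number(5)]
timsort_sketch(lst)
print(lst)  # prints the same four comparisons as above, then [5, 10, 20]
```

The doubled comparison appears because run detection compares `5` against `20`, and then the first probe of the binary insertion (the midpoint of indices 0 and 2 rounds to index 1, the latter element) compares against `20` again.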

Now that we’ve identified a doubled comparison, we might ask *Why is it this way?* Is this something that should change?

The short answer is *it doesn’t really matter.* A longer answer is that reusing that comparison in general would cause additional comparisons to be made, since the two comparisons only coincide when the last element of the run agrees in value with the central element of the run (which may occur for longer lists if there are repeated values). Checking whether this happens would probably either involve comparing the last element of the run with the central element (one extra comparison, so nothing is really saved anyway), or perhaps adding another data structure like a skip list (which seems sufficiently more complicated to not be worth the effort). Or the optimization would only apply when sorting really short lists, in which case there isn’t much to worry about.

Learning a bit more about timsort made me realize that I could probably learn a lot by really understanding an implementation of timsort, or even a slightly simplified implementation. It’s a nice reminder that one can choose to optimize for certain situations or behaviors, and this might not cover all cases perfectly — and that’s ok.

#### Footnotes

- Perhaps this is a case where there are multiple cores, and the second core is like the younger brother. It looks up at its older, wiser, first-core brother and says things like *are you **sure** that 5 < 20, really?* That’s definitely how multiple cores work, right?
- Such as the fact that the implementation is far more nontrivial than I thought it would be. It’s a significant chunk of code that optimizes for many different setups. I understand now why the java and python implementations of timsort diverged.
- Although I should also admit that this isn’t a completely correct description either. Timsort does a sort of merge sort on consecutive runs as well, but I ignore this aspect by choosing a sufficiently small example.
- and account for