# Python Merge Sort vs Insertion Sort

Having a sorted list can make all sorts of operations easier. Creating or maintaining that structure can have a high time complexity. This post briefly explains and explores two approaches to sorting, with differing performance.

## The Data

In both of these algorithms, we’ll be dealing with a Python list of numbers. There are three types of data we’ll run through the algorithms: almost sorted (two items out of place), reverse sorted (descending, when we want ascending), and random. We’ll use an increasing size of list (**n**) to demonstrate the performance of both algorithms as the amount of data scales up.
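
The three test arrangements can be sketched with a small helper like this (the function name and the `kind` labels are illustrative, not taken from the original benchmark code):

```python
import random

def make_data(n, kind="random"):
    """Build a list of n numbers in one of the three test arrangements."""
    data = list(range(n))              # fully sorted, ascending
    if kind == "almost_sorted":
        # Swap the first and last elements, so two items are out of place.
        data[0], data[-1] = data[-1], data[0]
    elif kind == "reversed":
        data.reverse()                 # descending, when we want ascending
    elif kind == "random":
        random.shuffle(data)
    return data
```

For example, `make_data(8, "reversed")` gives `[7, 6, 5, 4, 3, 2, 1, 0]`.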

## Insertion Sort

Insertion sort seems to “grow” the sorted data from the beginning of the array. This algorithm starts at the beginning of an array, sorting elements as it encounters them; to begin with, **key = 0**.

1. Work with the element at position **key**. The current position of the element at **key** will always be stored in a separate variable, **i**.
2. If there is an element at **i - 1** and it is larger, then swap them. If the elements were swapped, decrement **i** to keep track of where the element now sits in the array. Repeat this step until the element at **i** is greater than or equal to that at **i - 1**, or **i = 0**.
3. Increment **key** and return to step 1. This ensures that the next unsorted element is now worked on.
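
The steps above can be sketched in Python as follows — a minimal, in-place version that swaps element by element, matching the description rather than optimising:

```python
def insertion_sort(data):
    """Sort a list of numbers in place, ascending."""
    for key in range(len(data)):          # step 3: advance key through the list
        i = key                           # i tracks the element's current position
        # Step 2: swap leftwards while the left neighbour is larger.
        while i > 0 and data[i - 1] > data[i]:
            data[i - 1], data[i] = data[i], data[i - 1]
            i -= 1
    return data
```

Note that on an already sorted list the `while` condition fails immediately at every step, so the function only ever compares and never swaps.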

Insertion sort uses a lot of comparisons, but it only swaps elements when required. In an already sorted list, the only work this algorithm does is comparisons.

## Merge Sort

Merge sorting involves recursively splitting a list of data into halves, until each element sits in its own one-element list; this is the base case. Each level of calls then compares two lists, using a “merge” to combine the two into a single sorted list.

The merge takes two sorted lists of numbers, **A** and **B**, which are combined into one sorted list **C**. To begin with, the algorithm would be looking at two one-element **A** and **B** lists (from the base case). It keeps track of progress through **A** and **B**, using counters **i** and **j** respectively.

1. Set **i** and **j** to 0.
2. Compare **A**[**i**] and **B**[**j**]; append the smaller of the two onto **C** (pick either if they’re equal).
3. Increment **i** if the element appended to **C** came from **A**; increment **j** if the element came from **B**.
4. Repeat steps 2 and 3 until all of either **A** or **B** has been appended onto **C**; this list is now “exhausted”.
5. Append all the remaining elements from the list that was not exhausted onto **C**. **C** is now a sorted list of the elements that are present in **A** and **B**. Return **C** to the calling function, which will also receive another sorted list and start from step 1.
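
The whole process — the recursive splitting plus the merge described above — can be sketched like this (a minimal version for lists of numbers, not necessarily how the original code example was written):

```python
def merge(a, b):
    """Combine two sorted lists into one sorted list."""
    c = []
    i = j = 0                          # step 1: counters into a and b
    while i < len(a) and j < len(b):   # steps 2-4: append the smaller element
        if a[i] <= b[j]:
            c.append(a[i])
            i += 1
        else:
            c.append(b[j])
            j += 1
    # Step 5: one list is exhausted; append the rest of the other.
    c.extend(a[i:])
    c.extend(b[j:])
    return c

def merge_sort(data):
    """Recursively split data into halves, then merge them back sorted."""
    if len(data) <= 1:                 # base case: a one-element list is sorted
        return data
    mid = len(data) // 2
    return merge(merge_sort(data[:mid]), merge_sort(data[mid:]))
```

For example, `merge_sort([3, 1, 2])` returns `[1, 2, 3]`.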

Merge sort goes through this process regardless of how sorted the data already is; if you passed it a fully sorted list, it would still make all the recursive calls, do all the comparisons between **A**s and **B**s, and build the **C**s. How much work merge sort does is therefore linked only to the size of the list it operates on.

## Differing Performance

The different approaches these algorithms take to sorting data produce some effects worth noting. Rough timings for the different kinds of data, quoted below, show where these comparisons are coming from.

With limited information, it would be easy to come down on the side of either of these algorithms being “better” than the other. The information I’ve given above, about when each does work, might suggest that insertion sort is better. Conversely, those familiar with big O notation may jump to the conclusion that merge sort is better (worst case, merge sort is *n log n* while insertion sort is *n*²). However, as with so many things, it’s not always that clear cut and will depend on the specific application.

For constant **n**, merge sort is more consistent in its performance than insertion sort. This is due to it taking the same route to sorting the data, regardless of how sorted the input is; for all three kinds of data tested, it sorted the largest **n** in around 10s.

Also for constant **n**, the performance of insertion sort varies wildly, because it *is* dependent on how sorted the input was. At a data length of 20,480, insertion sort finished in 0.007s (almost sorted), 27s (random) and 69s (reversed). If an application needs to regularly sort random or reversed data, then merge sort would be a better choice.

This is especially true if the application also has to deal with a large variance in **n**. Insertion sort performance deteriorates rapidly in the worst cases, which is where the big O complexity comes in. For reversed data, doubling **n** from 10,240 to 20,480 quadrupled processing time from 17s to 69s. Merge sort’s time doubled for the same increase in **n**, but only to 0.1s. If the data for an application is variable in both size and arrangement, merge sort is going to provide consistent and pretty scalable performance.

In brute force comparisons on random data, merge sort will always outperform insertion sort as **n** grows. However, it’s not difficult to picture a situation where an application has to maintain a sorted list, rather than create one from random data each time. In cases where the values in a sorted list have altered very little, insertion sort can outperform merge sort. On the largest “almost sorted” data (two elements out of place), insertion sort finished in 0.4s compared to merge sort’s consistent ~10s. In this case, an application could initially use a merge sort, followed by insertion sort as the list values are incrementally updated.
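
A rough way to see this effect for yourself — a compact restatement of insertion sort, timed on “almost sorted” versus random input. The size here is illustrative and much smaller than the **n** behind the figures quoted above:

```python
import random
import time

def insertion_sort(data):
    """Sort in place; cheap when the input is already nearly sorted."""
    for key in range(len(data)):
        i = key
        while i > 0 and data[i - 1] > data[i]:
            data[i - 1], data[i] = data[i], data[i - 1]
            i -= 1
    return data

n = 2000  # illustrative size, not the benchmark n quoted above
almost = list(range(n))
almost[0], almost[-1] = almost[-1], almost[0]  # two elements out of place
shuffled = list(range(n))
random.shuffle(shuffled)

for name, data in [("almost sorted", almost), ("random", shuffled)]:
    start = time.perf_counter()
    insertion_sort(data)
    print(f"{name}: {time.perf_counter() - start:.4f}s")
```

The almost-sorted run should come out dramatically faster: only the two displaced elements travel any distance, so the total number of swaps is roughly linear in **n** rather than quadratic.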

## Finally

There’s an awful lot more that could be said about both of these algorithms, including lots of different structures of data and small variations in the code itself. From this brief explanation though, it should already be clear that knowing your application/data well enough to pick the *right* solution, is wiser than declaring one algorithm the *best* in all cases.