Advent of Code: Day 2

This is the second notebook in my posts on the Advent of Code challenges. This notebook in its original format can be found on my github.

Day 2: Corruption Checksum, part I

You are given a table of integers. Find the difference between the maximum and minimum of each row, and add these differences together.

There is not a lot to say about this challenge. The plan is to read the file linewise, compute the difference on each line, and sum them up.

In [1]:
with open("input.txt", "r") as f:
    lines = f.readlines()
lines[0]
Out[1]:
'5048\t177\t5280\t5058\t4504\t3805\t5735\t220\t4362\t1809\t1521\t230\t772\t1088\t178\t1794\n'
In [2]:
l = lines[0]
l = l.split()
l
Out[2]:
['5048',
 '177',
 '5280',
 '5058',
 '4504',
 '3805',
 '5735',
 '220',
 '4362',
 '1809',
 '1521',
 '230',
 '772',
 '1088',
 '178',
 '1794']
In [3]:
def max_minus_min(line):
    '''Compute the difference between the largest and smallest integer in a line'''
    line = list(map(int, line.split()))
    return max(line) - min(line)

def sum_differences(lines):
    '''Sum the value of `max_minus_min` for each line in `lines`'''
    return sum(max_minus_min(line) for line in lines)
In [4]:
testcase = ['5 1 9 5','7 5 3', '2 4 6 8']
assert sum_differences(testcase) == 18
In [5]:
sum_differences(lines)
Out[5]:
58975

Mathematical Interlude

In line with the first day’s challenge, I’m inclined to ask what we should “expect.” But what we should expect is not well-defined in this case. Let us rephrase the problem in a randomized sense.

Suppose we are given a table, $n$ lines long, where each line consists of $m$ elements, that are each uniformly randomly chosen integers from $1$ to $10$. We might ask what is the expected value of this operation, of summing the differences between the maxima and minima of each row, on this table. What should we expect?

As each line is independent of the others, we are really asking what is the expected value across a single row. So given $m$ integers uniformly randomly chosen from $1$ to $10$, what is the expected value of the maximum, and what is the expected value of the minimum?

 

Expected Minimum

Let’s begin with the minimum. The minimum is $1$ unless all the integers are greater than $2$. This has probability
$$ 1 – \left( \frac{9}{10} \right)^m = \frac{10^m – 9^m}{10^m}$$
of occurring. We rewrite it as the version on the right for reasons that will soon be clear.
The minimum is $2$ if all the integers are at least $2$ (which can occur in $9$ different ways for each integer), but not all the integers are at least $3$ (each integer has $8$ different ways of being at least $3$). Thus this has probability
$$ \frac{9^m – 8^m}{10^m}.$$
Continuing to do one more for posterity, the minimum is $3$ if all the integers are at least $3$ (each integer has $8$ different ways of being at least $3$), but not all integers are at least $4$ (each integer has $7$ different ways of being at least $4$). Thus this has probability

$$ \frac{8^m – 7^m}{10^m}.$$

And so on.

Recall that the expected value of a random variable is

$$ E[X] = \sum x_i P(X = x_i),$$

so the expected value of the minimum is

$$ \frac{1}{10^m} \big( 1(10^m – 9^m) + 2(9^m – 8^m) + 3(8^m – 7^m) + \cdots + 9(2^m – 1^m) + 10(1^m – 0^m)\big).$$

This simplifies nicely to

$$ \sum_ {k = 1}^{10} \frac{k^m}{10^m}. $$

Expected Maximum

The same style of thinking shows that the expected value of the maximum is

$$ \frac{1}{10^m} \big( 10(10^m – 9^m) + 9(9^m – 8^m) + 8(8^m – 7^m) + \cdots + 2(2^m – 1^m) + 1(1^m – 0^m)\big).$$

This simplifies to

$$ \frac{1}{10^m} \big( 10 \cdot 10^m – 9^m – 8^m – \cdots – 2^m – 1^m \big) = 10 – \sum_ {k = 1}^{9} \frac{k^m}{10^m}.$$

Expected Difference

Subtracting, we find that the expected difference is

$$ 9 – 2\sum_ {k=1}^{9} \frac{k^m}{10^m}. $$

From this we can compute this for each list-length $m$. It is good to note that as $m \to \infty$, the expected value is $9$. Does this make sense? Yes, as when there are lots of values we should expect one to be a $10$ and one to be a $1$. It’s also pretty straightforward to see how to extend this to values of integers from $1$ to $N$.

Looking at the data, it does not appear that the integers were randomly chosen. Instead, there are very many relatively small integers and some relatively large integers. So we shouldn’t expect this toy analysis to accurately model this problem — the distribution is definitely not uniform random.
But we can try it out anyway.

In [6]:
# We see the table is 16 lines long
len(lines)
Out[6]:
16
In [7]:
# And a generic line is 16 numbers long
len(lines[0].split())
Out[7]:
16

The largest integer in the table is a bit less than $7000$, so I’ll pretend that the integers are chosen uniformly randomly from $1$ to $7000$. There are $16$ integers per row, and there are $16$ rows. We now compute the expected value of each row.

In [8]:
total = 6999
for k in range(7000):
    total = total - 2* (k/7000)**16
In [9]:
total
Out[9]:
6176.470207282953

The expected value of the table is $16$ times this.

In [10]:
16 * total
Out[10]:
98823.52331652725

That’s significantly larger than the actual problem. I attribute much of this difference to the fact that the large integers are much sparser than they would be in the random scenario. For instance, inspection reveals that the fourth row of the given data are all below $1000$.

Part 2

In the table, each row has exactly one pair of integers that evenly divides each other. Find the sum of the quotients.

My plan is straightforward. For each line, go through each element and determine if it is the dividend or divisor in a perfection fraction. Once we’ve found a pair, we compute the quotient, and add these quotients together.

In [11]:
def find_quotient_in_line(line):
    '''
    Finds a pair of integers which divides each other in line.
    Returns the quotient.
    '''
    line = list(map(int, line.split()))
    for i, elem in enumerate(line):
        for num in line[i+1:]:
            if elem%num == 0:
                return elem/num
            if num%elem == 0:
                return num/elem
    raise KeyError('No divisor relationship found in line.')

def sum_quotients(lines):
    '''Sum the value of `find_quotient_in_line` for each line in `lines`'''
    return sum(find_quotient_in_line(line) for line in lines)
In [12]:
testcase = ['5 9 2 8', '9 4 7 3', '3 8 6 5']
assert find_quotient_in_line(testcase[0]) == 4
assert sum_quotients(testcase) == 9
In [13]:
sum_quotients(lines)
Out[13]:
308.0
This entry was posted in Expository, Programming, Python and tagged , , . Bookmark the permalink.

Leave a Reply