Advent of Code: Day 1

I thoroughly enjoyed reading through Peter Norvig’s extraordinarily clean and nice solutions to the Advent of Code challenge last year. Inspired by his clean, literate programming style and the convenience of jupyter notebook demonstrations, I will look at several of these challenges in my own jupyter notebooks.

My background and intentions aren’t the same as Peter Norvig’s: his expertise dwarfs mine. And timezones are not kind to those of us in the UK, and thus I won’t be competing for a position on the leaderboards. These are to be fun. And sometimes there are tidbits of math that want to come out of the challenges.

Enough of that. Let’s dive into the first day.

Day 1: Inverse Captcha, Part 1

Given a sequence of digits, find the sum of those digits which match the following digit. The sequence is presumed circular, so the first digit may match the last digit.

This would probably be done the fastest by looping through the sequence.

In [1]:
with open('input.txt', 'r') as f:
    seq = f.read()
seq = seq.strip()
seq[:10]
Out[1]:
'1118313623'
In [2]:
def sum_matched_digits(s):
    "Sum of digits which match following digit, and first digit if it matches last digit"
    total = 0
    for a,b in zip(s, s[1:]+s[0]):
        if a == b:
            total += int(a)
    return total

They provide a few test cases which we use to test our method against.

In [3]:
assert sum_matched_digits('1122') == 3
assert sum_matched_digits('1111') == 4
assert sum_matched_digits('1234') == 0
assert sum_matched_digits('91212129') == 9

For fun, this is a oneline version.

In [4]:
def sum_matched_digits_oneliner(s):
    return sum(int(a) if a == b else 0 for a,b in zip(s, s[1:]+s[0]))
In [5]:
assert sum_matched_digits_oneliner('1122') == 3
assert sum_matched_digits_oneliner('1111') == 4
assert sum_matched_digits_oneliner('1235') == 0
assert sum_matched_digits_oneliner('91212129') == 9

For more fun, this is a regex version.

In [6]:
import regex

def sum_matched_digits_regex(s):
    matches = map(int, regex.findall(r'(\d)\1', s, overlapped=True))
    total = sum(matches)
    if s[0] == s[-1]:
        total += int(s[0])
    return total
In [7]:
assert sum_matched_digits_regex('1122') == 3
assert sum_matched_digits_regex('1111') == 4
assert sum_matched_digits_regex('1235') == 0
assert sum_matched_digits_regex('91212129') == 9

Regardless of which one we use, we find the answer.

In [8]:
print(sum_matched_digits(seq))
print(sum_matched_digits_oneliner(seq))
print(sum_matched_digits_regex(seq))
1044
1044
1044

I wonder: is there any sort of time difference between these?

In [9]:
%timeit sum_matched_digits(seq)
147 µs ± 9.24 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [10]:
%timeit sum_matched_digits_oneliner(seq)
229 µs ± 31.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [11]:
%timeit sum_matched_digits_regex(seq)
230 µs ± 33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

I don’t know why the oneliner appears to be must slower than the first version. Does this stay true for longer sequences?

In [12]:
import random
randseq = ''
for i in range(10**7):
    randseq += str(random.randint(0,9))
randseq[:10]
Out[12]:
'3443169366'
In [13]:
%timeit -n5 sum_matched_digits(randseq)
618 ms ± 8.37 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)
In [14]:
%timeit -n5 sum_matched_digits_oneliner(randseq)
1.03 s ± 42.7 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)
In [15]:
%timeit -n5 sum_matched_digits_regex(randseq)
974 ms ± 27.2 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)

Yes, the difference in speed is real between the oneline version of the loop and the longform version of the loop. And the regex version and oneline versions appear to converge. I wonder why?

In [16]:
sum_matched_digits(randseq)
Out[16]:
4500592

Mathematical Interlude

We can compute what we expect the value to be for a random string of digits $d$. Assuming that each digit is randomly selected, we should expect that it has probability $1/10$ of matching the subsequent digit. Thus the expected contribution from each digit is (its value) $\times \frac{1}{10}$. The digit itself is $0$ with probability $0.1$, and $1$ with probability $0.1$, and so on. This becomes
$$ \sum_{d = 0}^{10 – 1} \frac{d}{10} \times \frac{1}{10} = \frac{10(10-1)}{2 \cdot 10^2} = \frac{9}{20} = 0.45.$$
If there are $n$ (random) digits, then we expect the sum of the digits which match the subsequent digit to be $0.45 n$.

In this case, there are $10^7$ digits, and we should expect the sum to be $0.45 \cdot 10^7 = 4.5 \cdot 10^6$. How close are we?

In [17]:
abs(sum_matched_digits(randseq) - 4.5 * 10**6)
Out[17]:
592.0

That’s really, really close. How does this apply to the Advent of Code Day 1 problem?

In [18]:
0.45 * len(seq)
Out[18]:
953.1

Today’s problem is about 10 percent more than what we might expect to occur by chance. That’s still pretty close, though.

Part 2

For the second part of the problem, we are tasked with finding the sum of those digits which match the digits half-way away from the string. This only makes sense on even length strings.

It’s easy enough to modify the loop to do this.

In [19]:
def sum_matched_digits_with_sep(s, sep):
    "Sum of digits which match the digit sep digits later"
    total = 0
    for a,b in zip(s, s[sep:]+s[:sep]):
        if a == b:
            total += int(a)
    return total
In [20]:
assert sum_matched_digits_with_sep('1212', 2) == 6
assert sum_matched_digits_with_sep('1221', 2) == 0
assert sum_matched_digits_with_sep('123425', 3) == 4
assert sum_matched_digits_with_sep('123123', 3) == 12
assert sum_matched_digits_with_sep('12131415', 4) == 4
In [21]:
sum_matched_digits_with_sep(seq, len(seq)//2)
Out[21]:
1054

The one-liner can be similarly written. What about the regex?

We want to identify a digit, skip sep - 1 digits, and then check to see if the subsequent digit matches.
In principle, we need to worry about wrapping around the string. But we notice that not wrapping around misses exactly half of the matches, so we just double the non-wrapped answer. This leads to the following.

In [22]:
import regex

def sum_matched_digits_with_sep_regex(s, sep):
    matches = map(int, regex.findall(r'(\d)\d{}\1'.format("{"+str(sep-1)+"}"), s, overlapped=True))
    total = 2*sum(matches)
    return total

I don’t think I’ve ever defined a regex “function” in this way. I don’t particularly like how it interacts with python’s string formatting.

In [23]:
assert sum_matched_digits_with_sep_regex('1212', 2) == 6
assert sum_matched_digits_with_sep_regex('1221', 2) == 0
assert sum_matched_digits_with_sep_regex('123425', 3) == 4
assert sum_matched_digits_with_sep_regex('123123', 3) == 12
assert sum_matched_digits_with_sep_regex('12131415', 4) == 4
In [24]:
sum_matched_digits_with_sep_regex(seq, len(seq)//2)
Out[24]:
1054

Mathematical Interlude, Part 2

It is interesting to note that the expected value is the same as in the consecutive digit case. This is because the probability that two randomly chosen digits agree has nothing to do with the location of the digits. One random digit is as good as another.

I will instead note that a similar calculation as above shows that the expected value depends also on the base involved. We arrived at the value $n \times 9/20 = n \times (10 – 1)/2 \cdot 10$ for an $n$ digit number written in base $10$.
For an $n$ digit number written in base $b$, the expected value is
$$ n \cdot \frac{b-1}{2b}.$$
This increases as the base increases, and tends towards $n/2$.

This entry was posted in Expository, Programming, Python and tagged , , . Bookmark the permalink.

One Response to Advent of Code: Day 1

  1. Tinm says:

    This is a good introduction to coding in a computer language. It’s a little complex, so practice and patience are in order.

Leave a Reply