I thoroughly enjoyed reading through Peter Norvig’s extraordinarily clean and nice solutions to the Advent of Code challenge last year. Inspired by his clean, literate programming style and the convenience of jupyter notebook demonstrations, I will look at several of these challenges in my own jupyter notebooks.
My background and intentions aren’t the same as Peter Norvig’s: his expertise dwarfs mine. And timezones are not kind to those of us in the UK, and thus I won’t be competing for a position on the leaderboards. These are to be fun. And sometimes there are tidbits of math that want to come out of the challenges.
Enough of that. Let’s dive into the first day.
with open('input.txt', 'r') as f:
seq = f.read()
seq = seq.strip()
seq[:10]
def sum_matched_digits(s):
"Sum of digits which match following digit, and first digit if it matches last digit"
total = 0
for a,b in zip(s, s[1:]+s[0]):
if a == b:
total += int(a)
return total
They provide a few test cases which we use to test our method against.
assert sum_matched_digits('1122') == 3
assert sum_matched_digits('1111') == 4
assert sum_matched_digits('1234') == 0
assert sum_matched_digits('91212129') == 9
For fun, this is a oneline version.
def sum_matched_digits_oneliner(s):
return sum(int(a) if a == b else 0 for a,b in zip(s, s[1:]+s[0]))
assert sum_matched_digits_oneliner('1122') == 3
assert sum_matched_digits_oneliner('1111') == 4
assert sum_matched_digits_oneliner('1235') == 0
assert sum_matched_digits_oneliner('91212129') == 9
For more fun, this is a regex version.
import regex
def sum_matched_digits_regex(s):
matches = map(int, regex.findall(r'(\d)\1', s, overlapped=True))
total = sum(matches)
if s[0] == s[-1]:
total += int(s[0])
return total
assert sum_matched_digits_regex('1122') == 3
assert sum_matched_digits_regex('1111') == 4
assert sum_matched_digits_regex('1235') == 0
assert sum_matched_digits_regex('91212129') == 9
Regardless of which one we use, we find the answer.
print(sum_matched_digits(seq))
print(sum_matched_digits_oneliner(seq))
print(sum_matched_digits_regex(seq))
I wonder: is there any sort of time difference between these?
%timeit sum_matched_digits(seq)
%timeit sum_matched_digits_oneliner(seq)
%timeit sum_matched_digits_regex(seq)
I don’t know why the oneliner appears to be must slower than the first version. Does this stay true for longer sequences?
import random
randseq = ''
for i in range(10**7):
randseq += str(random.randint(0,9))
randseq[:10]
%timeit -n5 sum_matched_digits(randseq)
%timeit -n5 sum_matched_digits_oneliner(randseq)
%timeit -n5 sum_matched_digits_regex(randseq)
Yes, the difference in speed is real between the oneline version of the loop and the longform version of the loop. And the regex version and oneline versions appear to converge. I wonder why?
sum_matched_digits(randseq)
Mathematical Interlude¶
We can compute what we expect the value to be for a random string of digits $d$. Assuming that each digit is randomly selected, we should expect that it has probability $1/10$ of matching the subsequent digit. Thus the expected contribution from each digit is (its value) $\times \frac{1}{10}$. The digit itself is $0$ with probability $0.1$, and $1$ with probability $0.1$, and so on. This becomes
$$ \sum_{d = 0}^{10 – 1} \frac{d}{10} \times \frac{1}{10} = \frac{10(10-1)}{2 \cdot 10^2} = \frac{9}{20} = 0.45.$$
If there are $n$ (random) digits, then we expect the sum of the digits which match the subsequent digit to be $0.45 n$.
In this case, there are $10^7$ digits, and we should expect the sum to be $0.45 \cdot 10^7 = 4.5 \cdot 10^6$. How close are we?
abs(sum_matched_digits(randseq) - 4.5 * 10**6)
That’s really, really close. How does this apply to the Advent of Code Day 1 problem?
0.45 * len(seq)
Today’s problem is about 10 percent more than what we might expect to occur by chance. That’s still pretty close, though.
Part 2¶
For the second part of the problem, we are tasked with finding the sum of those digits which match the digits half-way away from the string. This only makes sense on even length strings.
It’s easy enough to modify the loop to do this.
def sum_matched_digits_with_sep(s, sep):
"Sum of digits which match the digit sep digits later"
total = 0
for a,b in zip(s, s[sep:]+s[:sep]):
if a == b:
total += int(a)
return total
assert sum_matched_digits_with_sep('1212', 2) == 6
assert sum_matched_digits_with_sep('1221', 2) == 0
assert sum_matched_digits_with_sep('123425', 3) == 4
assert sum_matched_digits_with_sep('123123', 3) == 12
assert sum_matched_digits_with_sep('12131415', 4) == 4
sum_matched_digits_with_sep(seq, len(seq)//2)
The one-liner can be similarly written. What about the regex?
We want to identify a digit, skip sep - 1
digits, and then check to see if the subsequent digit matches.
In principle, we need to worry about wrapping around the string. But we notice that not wrapping around misses exactly half of the matches, so we just double the non-wrapped answer. This leads to the following.
import regex
def sum_matched_digits_with_sep_regex(s, sep):
matches = map(int, regex.findall(r'(\d)\d{}\1'.format("{"+str(sep-1)+"}"), s, overlapped=True))
total = 2*sum(matches)
return total
I don’t think I’ve ever defined a regex “function” in this way. I don’t particularly like how it interacts with python’s string formatting.
assert sum_matched_digits_with_sep_regex('1212', 2) == 6
assert sum_matched_digits_with_sep_regex('1221', 2) == 0
assert sum_matched_digits_with_sep_regex('123425', 3) == 4
assert sum_matched_digits_with_sep_regex('123123', 3) == 12
assert sum_matched_digits_with_sep_regex('12131415', 4) == 4
sum_matched_digits_with_sep_regex(seq, len(seq)//2)
Mathematical Interlude, Part 2¶
It is interesting to note that the expected value is the same as in the consecutive digit case. This is because the probability that two randomly chosen digits agree has nothing to do with the location of the digits. One random digit is as good as another.
I will instead note that a similar calculation as above shows that the expected value depends also on the base involved. We arrived at the value $n \times 9/20 = n \times (10 – 1)/2 \cdot 10$ for an $n$ digit number written in base $10$.
For an $n$ digit number written in base $b$, the expected value is
$$ n \cdot \frac{b-1}{2b}.$$
This increases as the base increases, and tends towards $n/2$.
This is a good introduction to coding in a computer language. It’s a little complex, so practice and patience are in order.