Thursday, 12 June 2008

Google Treasure Hunt Question 4

Question 4 has been out for a while now, and it's taken me a while to blog the solution for a couple of reasons: I've not had much time to work on it recently, and this question seems a lot more difficult than the other three (for me at least).

Here's my question this time around:
Find the smallest number that can be expressed as
the sum of 25 consecutive prime numbers,
the sum of 99 consecutive prime numbers,
the sum of 189 consecutive prime numbers,
the sum of 467 consecutive prime numbers,
the sum of 535 consecutive prime numbers,
and is itself a prime number.

For example, 41 is the smallest prime number that can be expressed as
the sum of 3 consecutive primes (11 + 13 + 17 = 41) and
the sum of 6 consecutive primes (2 + 3 + 5 + 7 + 11 + 13 = 41).
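
As a quick sanity check, the worked example in the question can be reproduced by brute force. This is only a sketch: `window_sums` and `smallest_common_prime_sum` are names made up for illustration, and the short hard-coded list of primes is an assumption that happens to be long enough for this small case.

```perl
use strict;
use warnings;

# Hypothetical helper: the sum of every run of $len consecutive values in @list.
sub window_sums {
    my ($len, @list) = @_;
    my @sums;
    for my $start (0 .. @list - $len) {
        my $sum = 0;
        $sum += $list[$_] for $start .. $start + $len - 1;
        push @sums, $sum;
    }
    return @sums;
}

# Hypothetical helper: the smallest prime that is a sum of consecutive primes
# for every window length given. A sum only counts as prime if it appears in
# the supplied list, so the list must reach at least as far as the answer.
sub smallest_common_prime_sum {
    my ($primes, @lengths) = @_;
    my %is_prime = map { $_ => 1 } @$primes;
    my %count;
    for my $len (@lengths) {
        $count{$_}++ for window_sums($len, @$primes);
    }
    # Sums for one window length are strictly increasing, so a count equal to
    # the number of lengths means the value turned up once for each length.
    my @hits = sort { $a <=> $b }
               grep { $count{$_} == @lengths && $is_prime{$_} } keys %count;
    return $hits[0];
}

my @small_primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47);
my $answer = smallest_common_prime_sum(\@small_primes, 3, 6);
print "$answer\n";    # prints 41
```

Counting the sums in a hash avoids sorting the full list of sums; only the handful of matching values is sorted at the end.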


First of all I thought I'd write a routine (once again in Perl) to generate prime numbers. I'm not entering a competition to find the world's largest primes, so I chose to write a simple solution rather than a super efficient one. The trade-off here is computational complexity versus coding complexity. I chose simpler but less efficient code over the more complicated but more efficient approaches offered by algorithms such as the Sieve of Eratosthenes.

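sub primes {
    my $max = shift || 10;              # how many primes to return
    my @primes = ( 2, 3, 5, 7 );
    return @primes[ 0 .. $max - 1 ] if ($max <= 4);
    my $loop = 9;                       # next odd candidate to test
    while (scalar(@primes) < $max) {
        my $is_prime = 1;
        # trial division by every number below half the candidate
        for (my $div = 3; $div < ($loop-1)/2; $div++) {
            if ($loop % $div == 0) {
                $is_prime = 0;
                last;                   # one divisor is enough
            }
        }
        push (@primes,$loop) if ($is_prime);
        $loop += 2;
    }
    return @primes;
}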
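
For comparison, here is a minimal sketch of the Sieve of Eratosthenes mentioned above. It is not the routine I used: `sieve_primes` is a name made up for illustration, and it takes an upper limit on the value of the primes rather than a count of how many to return.

```perl
use strict;
use warnings;

# Sieve of Eratosthenes: returns every prime <= $limit.
# Note that $limit is an upper bound on the value, not a number of primes.
sub sieve_primes {
    my $limit = shift;
    return () if $limit < 2;
    my @composite = (0) x ($limit + 1);
    my @primes;
    for my $n (2 .. $limit) {
        next if $composite[$n];
        push @primes, $n;
        # Cross off multiples of $n, starting at $n * $n because smaller
        # multiples were already crossed off by smaller primes.
        for (my $m = $n * $n; $m <= $limit; $m += $n) {
            $composite[$m] = 1;
        }
    }
    return @primes;
}

my @found = sieve_primes(50);
print "@found\n";    # prints 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47
```

The millionth prime is 15,485,863, so a limit of that size would cover the first million primes, though a plain Perl array that large uses a lot of memory.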


Now that I had a way of populating an array with prime numbers, I thought about the solution a bit more carefully and decided it wasn't likely to be simple to calculate, however much code I ended up writing. So I searched around for lists of prime numbers and decided to use a list of the first million primes. On reflection, I could have just used my routine to generate a million primes once and written them to a file, instead of generating them each time.

The solution I came up with is shown below. It starts by populating an array with the first million primes from the downloaded file. Then, for each of the required window lengths, the sums of that many consecutive primes are generated and stored in another array. At this stage the answer is already contained in this array (assuming, of course, that it can be built from the first million primes), so it's just a case of searching the array to find it. To do that, I sort the list numerically, which puts equal sums next to each other. Now it's a simple case of finding the first (and therefore lowest) prime in the sorted list that appears 5 times.

use strict;
use FileHandle;

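sub sum_primes {
    my $amount = shift;     # window length; doubles as the exclusive end index
    my $start = 0;
    my @sums;

    # <= so that the final window, ending on the last prime, is included
    while ($amount <= scalar(@_)) {
        my $sum = 0;
        for (my $i = $start; $i < $amount; $i++) {
            $sum += $_[$i];
        }
        push(@sums,$sum);
        $start++;
        $amount++;
    }
    return @sums;
}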

sub read_primes {
    my $filename = shift;
    my $fh = FileHandle->new;
    $fh->open($filename) || die "$filename: $!\n";
    my @primes;
    push(@primes,split) while (<$fh>);
    $fh->close();
    return @primes;
}

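# A number counts as prime only if it appears in the list passed in,
# so this cannot confirm primes larger than the last one in the file.
sub is_prime {
    my $num = shift;
    foreach my $prime (@_) {
        return 1 if ($num == $prime);
    }
    return 0;
}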

my @primes = read_primes("1000000.txt");
my @sum_list;
push(@sum_list, sum_primes(25,@primes));
push(@sum_list, sum_primes(99,@primes));
push(@sum_list, sum_primes(189,@primes));
push(@sum_list, sum_primes(467,@primes));
push(@sum_list, sum_primes(535,@primes));
# numeric sort; a plain sort() would compare the sums as strings
@sum_list = sort { $a <=> $b } @sum_list;
my $prev = 0;
my $same = 0;   # how many times in a row $num has matched $prev
foreach my $num (@sum_list) {
    if ($num == $prev) {
        $same++;
        # 4 repeats means 5 equal sums, one for each window length
        if ($same == 4) {
            print "Found $num, checking... ";
            if (is_prime($num,@primes)) {
                print "PRIME! :-)\n";
                last;
            } else {
                print "not prime :-(\n";
                $same = 0;
            }
        }
    } else {
        $same = 0;
    }
    $prev = $num;
}


This code takes a few minutes to run. I'm sure it's not the smartest solution to the problem; there must be some maths I could use to calculate the answer directly. Instead, this approach turns the problem into a search problem, but it works pretty well and identified the correct answer of 6990493 for my question.
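
Much of the run time probably goes on adding up every window from scratch. A possible speed-up, sketched here under that assumption, is to slide the window along instead: add the prime that enters and subtract the one that leaves. The name `sum_primes_sliding` is made up for illustration and is not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical variant of sum_primes: returns the sum of every window of
# $amount consecutive values, deriving each sum from the previous one.
sub sum_primes_sliding {
    my ($amount, @list) = @_;
    return () if $amount < 1 || $amount > @list;

    # Sum the first window the slow way.
    my $sum = 0;
    $sum += $list[$_] for 0 .. $amount - 1;
    my @sums = ($sum);

    # Each later window gains one value on the right and loses one on the left.
    for my $end ($amount .. $#list) {
        $sum += $list[$end] - $list[$end - $amount];
        push @sums, $sum;
    }
    return @sums;
}

my @sums = sum_primes_sliding(3, 2, 3, 5, 7, 11, 13, 17);
print "@sums\n";    # prints 10 15 23 31 41
```

With this, the work per window no longer depends on the window length, which should matter most for the 467 and 535 cases.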
