Mandy's Tech Blog

Since the code snippets in my previous post consisted mostly of vague hand-waving, I thought it would be good to spend some time today showing how Iterate() can be useful in common algorithms. Consider a fairly standard prime number generator:

public static class Primes 
{
	public static IEnumerable<ulong> GetPrimes(ulong max)
	{
		var list = new List<ulong> { 2 };
		for (ulong candidate = 3; candidate < max; candidate += 2)
		{
			var sqrt = (ulong)Math.Sqrt(candidate);
			bool isPrime = true;
			foreach (var prime in list)
			{
				if (prime > sqrt)
					break;

				if (candidate % prime == 0)
				{
					isPrime = false;
					break;
				}
			}

			if (isPrime)
				list.Add(candidate);
		}

		return list;
	}
}

Surely, one could provide additional optimizations to this routine, but it will serve our purposes as it is. So where do we start? The easiest simplification is to get rid of the inner loop because it is already characterized as an operation on a collection. Let's begin by moving the first breaking condition out of the loop:

var smallPrimes = list.TakeWhile(n => n <= sqrt);
foreach (var prime in smallPrimes)
{
	if (candidate % prime == 0)
	{
		isPrime = false;
		break;
	}
}

Having done this, it now becomes clear that we are verifying a condition against all elements of a collection. We can rewrite this as:

bool isPrime = list.TakeWhile(n => n <= sqrt).All(n => candidate % n != 0);

Of course, now that we're only using isPrime in one place after its definition, we may as well not have it as a temporary variable:

for (ulong candidate = 3; candidate < max; candidate += 2)
{
	var sqrt = (ulong)Math.Sqrt(candidate);

	if (list.TakeWhile(n => n <= sqrt).All(n => candidate % n != 0))
		list.Add(candidate);
}

These are fine modifications, but they may have left you wondering when Iterate() becomes useful. To demonstrate this, let's begin by applying it in the most obvious place:

public static IEnumerable<ulong> GetPrimes(ulong max)
{
	var list = new List<ulong> { 2 };

	var candidates = EnumerableUtility.Iterate<ulong>(3, n => n <= max, n => n + 2);
	foreach (var candidate in candidates)
	{
		var sqrt = (ulong)Math.Sqrt(candidate);

		if (list.TakeWhile(n => n <= sqrt).All(n => candidate % n != 0))
			list.Add(candidate);
	}

	return list;
}

Well, that doesn't make our code clearer, shorter, or more efficient! However, having a foreach loop does make it clearer that we are performing a filter operation on the collection before actually doing anything with the values:

public static IEnumerable<ulong> GetPrimes(ulong max)
{
	var list = new List<ulong> { 2 };

	var candidates = EnumerableUtility.Iterate<ulong>(3, n => n <= max, n => n + 2);

	var primes = candidates.Where(
		candidate =>
		{
			var sqrt = (ulong)Math.Sqrt(candidate);
			return list.TakeWhile(n => n <= sqrt).All(n => candidate % n != 0);
		});

	foreach (var prime in primes)
		list.Add(prime);

	return list;
}

As an aside, if you're not yet comfortable with closures, you may be a bit unsettled by the ordering of statements in this snippet: it appears as if we are checking the list of primes before we have actually populated it! Rest assured that this is not the case: the Where() function, as it is implemented in System.Core.Enumerable, does not calculate the next value in its result until that value is actually requested. Because we never request a new value until we've added all previous values to the list, we have nothing to fear (note, on the other hand, that other implementations of Where(), such as the one included the upcoming ParallelFX library, may not share this implementation detail, so be sure you're using the right one!).

It would seem that we are near the end of possible refactorings for this routine. However, to believe this is to forget that we are not required to calculate all the values in the list ourselves! Getting rid of list, we finally arrive at this:

public static IEnumerable<ulong> GetPrimes(ulong max)
{
  var knownPrimes = new ulong[] { 2, 3 };

  var candidates = EnumerableUtility.Iterate<ulong>(5, n => n <= max, n => n + 2);

  IEnumerable<ulong> primes = null;
  var computedPrimes = candidates.Where(
    candidate =>
    {
      var sqrt = (ulong)Math.Sqrt(candidate);
      return primes.TakeWhile(n => n <= sqrt).All(n => candidate % n != 0);
    });

  primes = knownPrimes.Concat(computedPrimes);

  return primes;
}

Once again, we have a rather clever use of closures, but I hope this one isn't quite as jarring as the last one might have been. While it may at first appear that we are using primes before it has been assigned a real value, recall that the evaluation of Where() does not occur until values are requested from it, which no longer even happens in this function.

It is also perhaps worth noting that our list of known primes has expanded to include 3. The reason for this is that failure to do so would cause infinite recursion when attempting to access the first value in computedPrimes. If it doesn't seem obvious to you, try it and see for yourself :).

One significant benefit to this latest refactoring is that we now only calculate each prime upon demand. Because of this, we may as well compute all primes from 1 to ulong.MaxValue, and forget about the max parameter:

public static IEnumerable<ulong> GetPrimes()
{
	var knownPrimes = new ulong[] { 2, 3 };

	var candidates = EnumerableUtility.Iterate<ulong>(3, n => n != ulong.MaxValue, n => n + 2).Select(n => n + 2);

	IEnumerable<ulong> primes = null;
	var computedPrimes = candidates.Where(
		candidate =>
		{
			var sqrt = (ulong)Math.Sqrt(candidate);
			return primes.TakeWhile(n => n <= sqrt).All(n => candidate % n != 0);
		});

	primes = knownPrimes.Concat(computedPrimes);

	return primes;
}

Aside from removing max, our only significant change has been to candidates, which is now defined as all odd numbers from 5 to ulong.MaxValue (the trickiness with the call to Select() is to keep ourselves from trying to go past MaxValue).

While we now have some beautiful code, you may have noticed that the last two snippets run significantly more slowly than their fully imperative counterparts. The reason for this is that we are no longer caching our prime numbers, so whenever we have to iterate through the list, all primes must be recalculated. Does this mean that we are doomed? Certainly not! In fact, a very minor change to this code is all that is necessary, but the details will have to wait until the next post.

Technorati Tags: functional programming,C# 3.0,.net

Beyond Loops

Lately, I've been reading quite a bit about the functional programming concepts that are finally coming to C#. The benefits of this new hybrid (imperative/functional) style are numerous, from greater readability (for those who grasp the new idioms, anyway) to more convenient parallelization. However, I've seen very little about the advantage that excites me most: eliminating loops.

What?!

Okay, I'm not really advocating that we actually stop writing code that contains loops—after all, the most reasonable alternative to loops is recursion, and recursion hurts my brain. I am instead suggesting that the time has come for most of our loops to be abstracted away, and here is my first attempt at hiding them:

public static class EnumerableUtility
{
	public static IEnumerable<T> Iterate<T>(T initial, Func<T, bool> fnContinue, Func<T, T> fnNext)
	{
		for (T current = initial; fnContinue(current); current = fnNext(current))
			yield return current;
	}

	public static void ApplyToAll<T>(this IEnumerable<T> list, Action<T> fnAction)
	{
		foreach (T item in list)
			fnAction(item);
	}
}

(Note: Did It With .NET defines a similar function, Sequence(), that operates similarly but specifies the arguments in a different order. I prefer my function, but mine could be replaced by his by calling Sequence(initial, fnContinue, fnNext). Either way, the effect is the same.)

Why is this so great? All I really did was put a couple loops into a function, right? Well…sort of, but if you'll stick with me, I think that you'll see that these functions have some significant advantages.

The Problem

In order to compute very much that is of interest to anyone, a language must support sequence, choice, and repetition. When taken in isolation, each of these concepts is easy to grasp, but when combined, they can very quickly exceed the limits of human comprehension. Personally, I find that a few big loops will tax my abilities faster than either of the other two structures, but I also find that repetition tends to be the least frequently abstracted of the three basic constructs.

New Paradigm

Let's look at some code:

var somethingOrOther = initialValue;
while (SomeCondition(somethingOrOther))
{
	if (SomeOtherCondition(somethingOrOther))
		DoStuff(somethingOrOther);
	somethingOrOther = GetNext(somethingOrOther);
}

Or its common counterpart:

for (var somethingOrOther = initialValue; SomeCondition(somethingOrOther); somethingOrOther = GetNext(somethingOrOther)))
{
  if (SomeOtherCondition(somethingOrOther))
    DoStuff(somethingOrOther);
}

The details and complexity vary, but I've seen code like this many times in my (admittedly short) career. We've been trained to accept such things as normal, but is this really as good as it gets? Here are some of the issues I see:

If the code is sufficiently complex, it becomes difficult to determine with certainty where somethingOrOther is assigned—especially if the original developer was undisciplined.
In the case of the first example, we have an iterator left outside the scope of the loop, which is just asking for trouble.
We're mixing repetition and choice, which will lead to confusion as the complexity of the loop grows.

However, the operations defined in EnumerableUtility allow us to think of this as a series of operations on a list, rather than as a loop:

var items = EnumerableUtility.Iterate(initialValue, SomeCondition, GetNext);
items = items.Where(SomeOtherCondition);
items.ApplyToAll(DoStuff);

How is this better? First of all, the second and third operations now appear as a single operation or a list, rather than as a series of operations or individual items—and I find this to be easier on my mental model. Second, we no longer have to concern ourselves about strange assignments to our iterator because the iterator itself has been abstracted away! Finally, having all of the primary operations (enumerating the values, filtering them, and performing an operation on the remaining values) separated from each other allows us to consider each in isolation, which is significantly easier on the brain.

Next time, I intend to introduce some common algorithms and show how using lists instead of loops can simplify their implementation.

And now for something completely different!

I just had to point out this bit of feedback for Visual Studio 2008 Beta 2. Did you notice the issue when you clicked through the license?

Subversion on the Macintosh

Prerequisites

Move Apache 1.3 to Apache 2. This is actually much less painful that it sounds, but it's quite necessary because Subversion doesn't support anything less than Apache 2. You can try running multiple versions of Apache from one box if you like, but in my case, I found that it was easier to just make the switch. If we're lucky, Apple will put Apache 2 on Leopard, and we won't ever have to worry about this part again. Here's how you get Apache 2 on your machine:
1. Run FinkCommander (you got that back when you were reading my first article on .NET development, right?)
2. Install the apache2 package. That wasn't so hard, was it?
If you're also serving up ASP.NET on this machine, rebuild mod_mono (discussed in part 2 of my .NET development series). When you run the configuration script, add --with-apxs=/sw/bin/apxs2 to the command line arguments.
1. If you've installed a CruiseControl.NET dashboard (as discussed in part 3 of the .NET development series), then at the end of /sw/etc/apache2/httpd.conf, add:

Alias /ccnet "/web/ccnet"
AddMonoApplications default "/ccnet:/web/ccnet"
<Location /ccnet>
	SetHandler mono
</Location>

Installing Subversion

Run FinkCommander and install the libapache2-mod-svn package.
Run sudo mkdir /svn, where /svn is the path where you want to store your repository.
Run sudo svnadmin create /svn.
Run sudo chown -R www /svn, where www is the name of the user the Apache uses.
Edit /sw/etc/apache2/mods-enabled/dav_svn.conf; the comments should explain what you need to do here.
Restart Apache (/sw/etc/sbin/apache2ctl -k restart).

At this point, you should have a pretty basic Subversion server set up at http://your-machine/svn. Of course, there are all sorts of other things that you can do to your repository, like set it up for secure access, put it on a different port, and what-have-you, but I haven't had need for any of these features (I'm running on a small, personal network). However, I would suspect that setting up these features would not vary as much from platform to platform as the initial setup procedure does, so try Google if you need that stuff.

Caveats

Sadly, I lost my notes on how to get Apache 2 to start up at boot. However, I did find this forum posting that suggests that it can be accomplished by replacing the Apache 1.3 startup scripts with links to the Apache 2 scripts.

I've also found that sometimes, even though I have Apache 2 set up to start up at boot, it fails to do so, or CruiseControl.NET fails to start (or eventually locks up). However, this happens infrequently enough that I haven't bothered to track down the root of the problem, and a reboot (or two) generally fixes it. If anyone reading this happens to stumble upon a solution to my problem, I'd love to hear about it.

Developing for .NET on the Mac, Part 3: Continuous Integration

If you're not already into automated testing and continuous integration, I highly recommend taking a serious look at it, although making a comprehensive argument for the practice is rather beyond the scope of this article. In any case, if you are into continuous integration, you'll definitely want to set up an instance of CruiseControl.NET (or whatever continuous integration server you use) on a Macintosh--the behavior of Mono varies slightly from platform to platform, so you can't really be sure that your code works properly on the Mac unless you test on a Mac.

Let's get down to business, then, shall we?

Prerequisite:

I expect that you've already followed through the two opening posts in this series. At a minimum, you need to have an Apache server with mod_mono installed on it.

Grab CruiseControl.NET from their SourceForge page. Get the file named CruiseControl.NET-[Version].zip where [Version] is the latest version number.
Unzip the file and copy the contents of webdashboard into a directory that Apache can access. For the purpose of these instructions, I'll be calling it /web/ccnet.
Open /etc/http/httpd.conf in your favorite editor, and add this to the end:

Alias /ccnet "/web/ccnet"
AddMonoApplications default "/ccnet:/web/ccnet"
<Location /ccnet>
	SetHandler mono
</Location>

If you have your permissions set up correctly, you should now be able to see your CruiseControl.NET dashboard by going to http://your-server/ccnet/. But wait: we don't have the actual CruiseControl.NET service running! To get this set up, we need a few more steps:

Copy the contents of CruiseControl.NET's server directory into some convenient location, such as /usr/local/ccnet.
Edit ccservice.exe.config to your heart's content; make sure that you also create a valid ccnet.config. Neither of these tasks varies from the standard CruiseControl.NET installation procedures, so refer to the CruiseControl.NET site for details on this part.
To get the service running when you first boot your Mac, create a new directory in /Library/StartupItems called ccnet.
In this directory, create two files. One should be called StartupParameters.plist, and the other should be called ccnet. Here's what needs to be in StartupParameters.plist:

{
	Description = "CruiseControl.NET Server";
	Provides = ("ccnet")
	OrderPreference = "Late";
	Messages =
	{
		start = "Starting CruiseControl.NET";
		top = "Stopping CruiseControl.NET";
		restart = "Restarting CruiseControl.NET"
	};
}

And here's what you put in the file called ccnet:

#!/bin/sh
# startup script for service CruiseControl.NET

. /etc/rc.common

case "$1" in
	start)
		ConsoleMessage "Starting CruiseControl.NET"d
		if [ -x /usr/local/ccnet/ccservice.exe ]; then
			/Library/Frameworks/Mono.framework/Versions/Current/bin/mono-service2 -d:/src/ccnet /usr/local/ccnet/ccservice.exe
		fi
		;;
esac
exit 0

Now, if you restart your Mac and navigate to your CruiseControl.NET page, you should have a running instance of ccnet running whatever tests you have set it up to perform.

A couple of notes about the StartupItem I've provided:

StartupItems have been replaced with some fancy new technology called launchd in OS X 10.4. However, I could never get that thing to work, and StartupItems work fine, so this is what I've got.
Obviously, I provide no warranty for this. YMMV, but this is what worked for me.
You'll note that StartupItems are also supposed to have the ability to start and restart themselves. I'm lazy—if I need to restart the service, I reboot.
On that note, I've noticed that occasionally, when rebooting, ccservice.exe doesn't initialize as it should. I've never figured out why this happens, but usually another reboot or two fixes it. It doesn't bother me because I rarely reboot my continuous integration server, but if anyone learns a workaround, let me know.

Well, that's it for this post. Next time, I'll probably talk about installing Subversion on a Mac. It's not strictly a .NET topic, but it is a quite handy thing to have around, and there are a couple of gotchas to installing it in OS X.

Applications of Iterate()