Build. Optimize. Make Awesome.

Software development by David Boike.

Since Generics were first introduced with .NET 2.0 I have had a utility method I used to batch a list into smaller lists of a fixed size, usually to batch up a bunch of database inserts into manageable groups of 10 or 25 or so.

Today I needed to do the same thing except I will be dealing with a potentially VERY large data set with some fairly complex computations built in. Forcing it all into a list and batching the list means I will have to hold all of that garbage in memory.

So I started looking for a LINQ implementation that would deal with only one batch at a time and keep the memory footprint low. I found very little – everything under a Google search for “LINQ batch” seemed to be about operations with LINQ-to-SQL. Don’t care.

I found “Split a collection into n parts with LINQ?” on Stack Overflow but was horrified by some of the algorithms there. They all seemed to commit at least one unforgivable sin:

  • Extensive use of division or modulus, although I can let this slide because the question was how to divide an unknown sized collection into X equal parts, as opposed to my desire to return an unknown number of identically-sized parts.
  • Accessed the Count() of the collection
  • SUPER long and/or complex
  • Created tons of intermediate objects.
  • Rendered the collection to a list or something that would cause massive computation

All that is needed is to take a few items and keep track of your place. So here’s my solution:

public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> collection, int batchSize)
{
	IEnumerable<T> remaining = collection;
	while(remaining.Any())
	{
		yield return remaining.Take(batchSize);
		remaining = remaining.Skip(batchSize);
	}
}

Sometimes the simplest, shortest, most to-the-point solutions are the best ones.
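One caveat worth checking against your own data: each pass chains another Skip() onto the original sequence, and Any() starts a fresh enumeration, so a lazily computed source may be re-evaluated from the beginning for every batch. The following is a minimal sketch of a single-pass variant, not a replacement for the method above. The names BatchSketch, BatchSinglePass, and BatchIterator are made up for this example, and it assumes that buffering one batch at a time in a List<T> is acceptable.

```csharp
using System;
using System.Collections.Generic;

public static class BatchSketch
{
    // Hypothetical single-pass variant: walks the source exactly once and
    // buffers only the current batch.
    public static IEnumerable<List<T>> BatchSinglePass<T>(this IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");
        return BatchIterator(source, batchSize);
    }

    private static IEnumerable<List<T>> BatchIterator<T>(IEnumerable<T> source, int batchSize)
    {
        List<T> batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize); // start a fresh buffer for the next batch
            }
        }
        if (batch.Count > 0)
            yield return batch; // final, possibly short, batch
    }
}
```

Usage is the same as before, for example foreach (List<int> group in numbers.BatchSinglePass(25)) { ... }. The trade-off is that each batch is materialized as a list instead of being handed back as a lazy Take().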

Find out how to preprocess a WordPress export file before import to avoid multi-megapixel images in the post body slowing down page loads.

My wife is an incredibly talented woman. While she’s not working her day job at a magazine publisher, she makes and sells fondant-covered cakes, cupcakes, cookies, and other goodies. Let me tell you how difficult it is to try losing weight when there are constantly cake scraps lying around!

If you live in the Twin Cities area or are just plain curious, check out her website, Sweets by Natalie Kay. Some of my favorites: a Chocolate Cherry Chip Transformer Cake, and this Mario-Kart inspired birthday cake.


Her site is a WordPress blog that started out its life hosted on wordpress.com, which is nice but doesn’t give a lot of flexibility over themes and layout. When she wanted more flexibility, the task of converting the content to a different hosting provider fell to the family IT director.

WordPress contains export and import functionality, but a problem quickly emerged. WordPress.com adds width and height parameters to the querystring of images that are embedded within post text, which are intercepted by a handler that resizes the image to those dimensions before serving it to the client. However, the export file contains the URLs of the full size image.

My wife captured these images with her 10-megapixel D-SLR camera. These are not small files. The images (2-4 MB each) would load at a crawl, slowing down the entire page.

Programmer husband to the rescue! It’s nice to be needed.

The first hurdle was getting the XML export file to load at all, as WordPress exports invalid XML, a fact that nearly made me gag!

XmlException was unhandled
‘atom’ is an undeclared namespace. Line 149, position 3.

Seriously. Apparently WordPress exports by outputting text and not with any sort of compliant XML library, or blindly outputs some content elements without worrying about what XML namespaces that content might be using. Since I didn’t intend to do this dozens of times, I decided this would be pretty easy to fix manually by adding the atom declaration to the rss element:

<rss version="2.0"
	xmlns:excerpt="http://wordpress.org/export/1.0/excerpt/"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:wp="http://wordpress.org/export/1.0/"
	xmlns:atom="http://whocares.com/it-seriously-doesnt-matter"
>

The .NET XmlDocument will not care what the URL is or whether it’s “correct”; it only cares that the atom namespace is declared.
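If hand-editing the export file is not an option, the prefix can instead be supplied at load time through an XmlParserContext. This is only a sketch, assuming the undeclared atom prefix is the sole problem with the file. The class name LenientLoader and the method name LoadWithAtomPrefix are invented for illustration, and the URI used here is the standard Atom namespace, though any URI would satisfy the parser.

```csharp
using System;
using System.IO;
using System.Xml;

public static class LenientLoader
{
    // Hypothetical helper: pre-declares the "atom" prefix so the reader
    // can resolve it even though the document itself never declares it.
    public static XmlDocument LoadWithAtomPrefix(TextReader input)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.NameTable = new NameTable();

        // The namespace manager must share the reader's name table.
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(settings.NameTable);
        nsmgr.AddNamespace("atom", "http://www.w3.org/2005/Atom");

        XmlParserContext context = new XmlParserContext(null, nsmgr, null, XmlSpace.None);

        XmlDocument doc = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(input, settings, context))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```

It could be called with a StreamReader opened on the export file in place of a plain doc.Load(path) call.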

After that, my conversion app does the following:

  1. Load the XML Document.
  2. Select each blog entry with an XPath expression.
  3. Use very simple regular expressions to identify the start of each image tag, and its corresponding closing bracket, outputting everything outside the image tag(s) as-is.
  4. Within each image tag, identify each HTML attribute, again by regular expression. If the width/height attributes are specified, save the values. If the src attribute contains a URL that includes w=? or h=? in the querystring, save those values.
  5. With desired width and height values in hand, use the same attribute-finding regular expression to locate the src attribute and output a new URL that contains the width and height attributes that will tap into WordPress.com’s image resizing feature.

Using this modified export file, the WordPress import process downloads the downsized images from WordPress.com for the version embedded in the post text, but you can still click through to the full version of the image in all its megapixel glory.

So, here is the source:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Text.RegularExpressions;

namespace WordPressConverter
{
	class Program
	{
		static void Main(string[] args)
		{
			string inpath = @"C:\Users\Dave\Desktop\wordpress.input.xml";
			string outpath = @"C:\Users\Dave\Desktop\wordpress.output.xml";
			ConvertWordpressExport(inpath, outpath);
		}

		private static void ConvertWordpressExport(string inpath, string outpath)
		{
			XmlDocument doc = new XmlDocument();
			doc.Load(inpath);

			XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
			nsmgr.AddNamespace("content", "http://purl.org/rss/1.0/modules/content/");

			XmlNodeList nodes = doc.SelectNodes("/rss/channel/item/content:encoded", nsmgr);

			foreach (XmlNode n in nodes)
			{
				string newText = ProcessBlogPost(n.InnerText);
				n.InnerText = null;
				n.AppendChild(doc.CreateCDataSection(newText));
			}

			doc.Save(outpath);

			Console.WriteLine("Done");
			Console.ReadLine();
		}

		private static Regex findImgTag = new Regex("<img", RegexOptions.Compiled | RegexOptions.IgnoreCase);
		private static Regex findEndImg = new Regex("/>", RegexOptions.Compiled | RegexOptions.IgnoreCase);

		private static string ProcessBlogPost(string blogPost)
		{
			StringBuilder output = new StringBuilder();
			int pos = 0;
			while (true)
			{
				Match startImg = findImgTag.Match(blogPost, pos);
				if (!startImg.Success)
				{
					output.Append(blogPost.Substring(pos));
					break;
				}
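				else
				{
					output.Append(blogPost.Substring(pos, startImg.Index - pos));
					Match endImg = findEndImg.Match(blogPost, startImg.Index);
					if (!endImg.Success)
					{
						// No closing "/>" after this <img (e.g. an unclosed tag):
						// copy the rest through untouched instead of slicing with a bad index.
						output.Append(blogPost.Substring(startImg.Index));
						break;
					}
					pos = endImg.Index + endImg.Length;
					string imgTag = blogPost.Substring(startImg.Index, pos - startImg.Index);

					ImgTagProcessor p = new ImgTagProcessor(imgTag);
					output.Append(p.Process());
				}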
			}
			return output.ToString();
		}

		class ImgTagProcessor
		{
			static Regex findAtts = new Regex(@"(?<Att>\w+)=""(?<Value>[^""]*)""", RegexOptions.Compiled | RegexOptions.IgnoreCase);
			static Regex queryW = new Regex(@"w=(\d+)", RegexOptions.Compiled | RegexOptions.IgnoreCase);
			static Regex queryH = new Regex(@"h=(\d+)", RegexOptions.Compiled | RegexOptions.IgnoreCase);

			string imgTag;
			string width;
			string height;

			internal ImgTagProcessor(string imgTag)
			{
				this.imgTag = imgTag;
			}

			internal string Process()
			{
				// Extract width and height info
				foreach (Match m in findAtts.Matches(imgTag))
				{
					switch (m.Groups["Att"].Value)
					{
						case "width":
							this.width = m.Groups["Value"].Value;
							break;
						case "height":
							this.height = m.Groups["Value"].Value;
							break;
						case "src":
							Uri uri = new Uri(m.Groups["Value"].Value);
							string query = uri.Query;
							if (!String.IsNullOrEmpty(query))
							{
								Match matchW = queryW.Match(query);
								Match matchH = queryH.Match(query);
								if (matchW.Success)
									width = matchW.Groups[1].Value;
								if (matchH.Success)
									height = matchH.Groups[1].Value;
							}
							break;
					}
				}
				return findAtts.Replace(imgTag, new MatchEvaluator(EvaluateAttributeMatch));
			}

			string EvaluateAttributeMatch(Match m)
			{
				switch (m.Groups["Att"].Value)
				{
					case "src":
						UriBuilder uri = new UriBuilder(m.Groups["Value"].Value);
						List<string> queryItems = new List<string>();
						if (width != null)
							queryItems.Add("w=" + width);
						if (height != null)
							queryItems.Add("h=" + height);
						uri.Query = String.Join("&", queryItems.ToArray());
						return "src=\"" + uri.ToString() + "\"";
					default:
						return m.Value;
				}
			}
		}
	}
}

I hope someone else can find it useful!

Yesterday Scott Guthrie wrote about the new “code-first” data access paradigm that Microsoft has released as an update to the Entity Framework, in his blog post with the same name as this one. (So I’m lazy!) I read it and was blown away. The speed, power, and elegance that this solution provides now (and will provide in the future after it matures out of CTP) look like a big win for developers all over, but of course I had to download the bits and put them through their paces.

Walk before you run

I decided to start off with a very basic test, a boring book and author example model:

using System.Collections.Generic;
using System.Data.Entity;

namespace EFTest.Model
{
	public class Book
	{
		public int BookID { get; set; }
		public string Title { get; set; }

		public Author Author { get; set; }
	}

	public class Author
	{
		public int AuthorID { get; set; }
		public string FirstName { get; set; }
		public string LastName { get; set; }

		public ICollection<Book> Books { get; set; }
	}

	public class BookDB : DbContext
	{
		public DbSet<Book> Books { get; set; }
		public DbSet<Author> Authors { get; set; }
	}
}

Scott Guthrie had used an MVC web application with SQL CE 4 as the database back end. I wanted to try different things (and more importantly, didn’t feel like installing SQL CE) so I created an ASP.NET Web Application Project and added a simple DataGrid to display the data with AutoGenerateColumns set to true.

Here are some initial observations:

  • You may want to define your database connection in Web.config before you get started. I started adding a book on every request, thinking that with no database backend, all the data would only be stored in memory. Wrong. My data was persisting even between recompiles, so obviously it was being stored somewhere! But where? Turns out my laptop has versions of Visual Studio 2005, 2008, and 2010 installed, and SQL Server 2005 and 2008. I’m not sure how, but the Entity Framework decided to find a SQL Server Express 2005 instance and created a database named “EFTest.Model.BookDB” (the namespace and class name of my DbContext class) on that instance even though I had not provided any connection string, although the default Web Application Project came with a connection string named ApplicationServices which did point to that instance. I’m not sure if that is how Entity Framework selected that database or not.
  • It’s a little confusing and disconcerting to have these things happen to a database that’s not a file-based database included in your Visual Studio project. I think it would be much more straightforward to be destroying and recreating included-in-project SQL Express or SQL CE databases, both of which can be easily upscaled to real SQL Server databases for QA and Production. (Later I’ll show that not using a file-based database probably won’t work in practice anyway)
  • The Entity Framework translated an undecorated string property into a nullable nvarchar(4000) in the database. Obviously you’re going to want to decorate these with StringLength and Required attributes to fit your business requirements. These attributes are from the System.ComponentModel.DataAnnotations namespace, in the System.ComponentModel.DataAnnotations assembly.
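As a minimal sketch of the decoration mentioned in the last point: the class name AnnotatedBook and the 200-character limit are illustrative assumptions, not values from the model above. The Validator call is only there to show the attributes doing something without a database. How Code First translates them into column definitions depends on the Entity Framework bits you have installed.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical entity: Title must be present and at most 200 characters.
public class AnnotatedBook
{
    public int AnnotatedBookID { get; set; }

    [Required]
    [StringLength(200)]
    public string Title { get; set; }
}

public static class AnnotationDemo
{
    // Runs the data annotation validators against every property of the object.
    public static bool IsValid(AnnotatedBook book)
    {
        List<ValidationResult> results = new List<ValidationResult>();
        ValidationContext context = new ValidationContext(book, null, null);
        return Validator.TryValidateObject(book, context, results, true);
    }
}
```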

Time to kick the tires

So now that I’ve seen the basics in play, it’s time to kick the tires and see what I can get it to do.

So first I added this line to the Application_Start() method of Global.asax so that the database would be recreated whenever I change my model:

void Application_Start(object sender, EventArgs e)
{
	Database.SetInitializer(new RecreateDatabaseIfModelChanges<BookDB>());
}

I added a bunch of types to see how they would translate to database types, recompiled, and then ran, only to get the following exception: Cannot drop database “BookLibrary” because it is currently in use.

OK. I guess this reinforces that it would be best to use a file-based SQL Express or SQL CE database contained in the solution. I tried changing the connection string to use a SQL Express Books.mdf database in my App_Data folder. This worked great the first time, but then when I changed my model and tried to let it regenerate, I got the following exception: Cannot open database “BookDB” requested by the login. The login failed. Login failed for user ‘(my login)’.

I’m not sure if it’s something I’m doing wrong, but at this point I was a little frustrated, so I decided to download SQL CE and use Scott’s NerdDinner example as a starting point from here on out.

I also needed to download and install the first preview beta of WebMatrix, because Microsoft has not yet shipped the update for Visual Studio that will allow us to manage SQL CE 4 .sdf databases in the Server Explorer tab. WebMatrix is installed through the Web Platform Installer, and was a 20 MB download.

I have to say, WebMatrix may be great for beginners, but for an experienced developer used to Visual Studio, it’s just weird. I’ll be very glad when Visual Studio integrates the SQL CE support.

Fun with Types

Now that I’m using Scott’s NerdDinners as a base, it’s important to point out that for the SetInitializer call in Global.asax, Scott is defining a custom type NerdDinnersInitializer that inherits from the RecreateDatabaseIfModelChanges that I was using. This enables him to override the Seed() method to create default data when the database is recreated following a model change. You may want to refer back to his article.

Now my goal is to create a new model class and throw a bunch of different types in it to see how they get mapped to SQL types. Let’s knock out most of the intrinsic value types and see what happens!

public class TypeTest
{
	public bool TestBool { get; set; }
	public byte TestByte { get; set; }
	public short TestInt16 { get; set; }
	public int TestInt32 { get; set; }
	public long TestInt64 { get; set; }
	public Single TestSingle { get; set; }
	public double TestDouble { get; set; }
	public float TestFloat { get; set; }
	public decimal TestDecimal { get; set; }
	public DateTime TestDateTime { get; set; }
	public Guid TestGuid { get; set; }
}

Oops! Unable to infer a key for entity type ‘NerdDinnerReloaded.Models.TypeTest’.

The Entity Framework is able to infer a primary key for Dinner and RSVP because (following convention over configuration) the classes have DinnerID and RsvpID properties. Fixing this is as easy as adding a TypeTestID property.

The mappings to database types are as you’d probably expect:

.NET Type    SQL Type
bool         bit
byte         tinyint
short        smallint
int          int
long         bigint
Single       real
double       float
float        real
decimal      numeric
DateTime     datetime
Guid         uniqueidentifier

I know, pretty boring. You could pretty much look that up on MSDN. All these value types emerged on the SQL end as their not-nullable counterparts.

I attempted to change every one of the primitive datatypes to their nullable counterparts by adding a ? to each type in the model. This worked as expected, switching each column to be nullable, but with one caveat: when I switched TypeTestID to int?, Entity Framework was again unable to infer a primary key. Lesson: Entity Framework does not appreciate nullable primary keys.

Next I tried replacing int, long, and byte with uint, ulong, and sbyte. The results were odd. For uint and ulong, no exception was thrown, but the properties were essentially dropped – they did not get translated into the database table. For sbyte, I received an exception about not being able to map the type. I tested all these in their non-null configurations. I didn’t bother with uint? or ulong? or sbyte? because I really don’t have many uses for these types in the first place. My development life is constrained by what you can put in a database, and you really can’t put these types in a SQL Server database, so they have no usefulness to me.

Now for some more interesting types.

  • DateTimeOffset – throws exception!
  • DayOfWeek (simple enumeration) – Success! Maps to int
  • DayOfWeek? (nullable enum) – Success! Maps to nullable int
  • Enum based on byte – Success! Maps to tinyint. At this point, I’m going to assume that any enumeration that maps to a supported type will also be supported. Not so fast, see update below.
  • XmlDocument – ignored, no exception thrown. I was so hoping this would map to an xml column.
  • SqlXml – also ignored.
  • XDocument – also ignored. Not feeling good about any XML support at this point.
  • XElement – also ignored. OK I give up on XML.
  • byte[] – Maps to image type. This is weird to me because Transact-SQL reference says that image will be removed in a future version of SQL Server and that we should be using varbinary(MAX) instead. I wonder why the Entity Framework team chose to map to image?
  • char – ignored. I don’t know why I didn’t test this with primitives, so when I did I was shocked it didn’t map to nchar(1). But really, who uses char columns anyway?
  • SqlGeography – ignored
  • SqlGeometry – ignored
  • SqlHierarchyId – ignored

UPDATE: A commenter alerted me that although enums appear to map correctly to the correct column type, if you attempt to execute any code with them, you will get a nasty exception that “The entity type TheEnumType is not part of the model for the current context.” Hopefully this is a CTP-only issue and Microsoft plans to implement enums correctly in the near future.

That’s all the types I can think of to test. I’m impressed that enumerations are taken care of so well. Although ideally I would like all of these types to map correctly out of the box, I’m most upset about the lack of any sort of support for xml column types.

Many to Many Relationships

I had no idea if the Entity Framework could easily support Many to Many relationships but decided to throw out a simple idea and see what happened:

	public class Left
	{
		public int LeftID { get; set; }
		public string Name { get; set; }

		public virtual ICollection<Right> Rights { get; set; }
	}

	public class Right
	{
		public int RightID { get; set; }
		public string Name { get; set; }

		public virtual ICollection<Left> Lefts { get; set; }
	}

	public class NerdDinners : DbContext
	{
		public DbSet<Left> Lefts { get; set; }
		public DbSet<Right> Rights { get; set; }

		// other items
	}

Lo and behold, it worked! Here’s the database structure that was created:

Table Left
* LeftID
* Name

Table Right
* Name
* RightID

Table Lefts_Rights
* Lefts_LeftID
* Rights_RightID

Very cool! I’m sure there’s probably a way to customize the cross-reference table, but if you’re in a hurry and don’t really care too much, this is a really quick and painless way to get a Many to Many relationship.

Creating a Hierarchy

It’s also pretty simple to create a hierarchical object that has parent-child relationships.

	public class TreeNode
	{
		[Key]
		public int NodeID { get; set; }

		public TreeNode ParentNode { get; set; }

		public virtual ICollection<TreeNode> ChildNodes { get; set; }

		public string NodeName { get; set; }
	}

Notice the [Key] attribute that declares NodeID to be the primary key, since it doesn’t follow the conventions that would normally expect the name to be TreeNodeID.

What I can’t figure out is how to define the column that stores the ParentNodeID. By default, the column generated is named ParentNode_NodeID, which is pretty ugly.

The post “Data Annotations in the Entity Framework and Code First” mentions a RelatedToAttribute that should address this problem, but the version in this post (dated March 30, 2010, so clearly preceding this newer release of Entity Framework) has different properties than the bits I downloaded, and I don’t know how to bridge that gap.

Going Forward

This is a pretty long post already, but there are still some things I’d like to explore at a later date.

  • I didn’t really get the chance to actually use the code much, as I was primarily concerned with building the database schema from the model.
  • Scott says this version integrates better with stored procedures, although it is not immediately obvious to me how this would be done.
  • It would be interesting to test how well the Entity Framework cooperates with WCF RIA Services for Silverlight applications.

Conclusion

For a Community Technology Preview, these Entity Framework bits are really impressive and I’m excited to get to try them out.

Here’s what it needs before the RTM:

  • Provide mappings to and from SQL xml column types for the XmlDocument, XElement, XDocument, and SqlXml types.
  • The promised Visual Studio update to allow easier management of SQL CE databases within Visual Studio, and/or some documentation about how to get around the gotchas involved with using other SQL options.
  • A cookbook of how to achieve various design patterns by the application of attributes or fluent configuration.
  • XML documentation for IntelliSense for all the Entity Framework and Data Annotations attributes.
  • Update: fully support mapping enumeration values. Right now the correct schema is generated, but the model does not support actually committing values.

A big thanks to Scott Guthrie and his entire team. I’m looking forward to the next release!

Even though I’m a Windows-based developer, I grew up the son of a teacher and an Apple fanboy. While I like to think I’ve become more balanced since my youth, I still remain a fan of their products. My wife has a MacBook Pro which makes my Dell laptop jealous, and I love my iPhone to death.

When Apple announced the iPad, like many my reaction was along the lines of “OK that’s cool, but what do you use it for?”

I still haven’t bought one, but little things I hear keep inching me closer to the inevitable point where I’m sure I will break down and buy one.

My wife and I have friends who own one, and say it makes a great living room computer. The iPhone, while handy and always nearby, still isn’t big enough for some things, and the iPad gives you that real estate for casual web surfing and blog reading on the couch.

And now Scott Adams, creator of Dilbert, has bought an iPad, and I think he sums it up perfectly:

A regular laptop is like your boss: always making you wait before giving you busy-work assignments. The iPad is more like a punctual lover. It’s always ready for fun.

Check out Scott’s full post: The Amazingness of Instant.

I recently saw links to these two videos (one pro-HTC EVO, the other pro-iPhone) from several members of my Facebook network, and had a curious sense of déjà vu.

Warning: the videos contain language you may not necessarily want your children to hear.

While these vids are kind of funny to watch, I was instantly teleported back across time and space to about a decade ago to the height (at least, in my experience) of the Mac/PC flame war. Mac users were stupid because you couldn’t do anything for business or run Word. Windows users were stupid because their computers couldn’t do graphics and were soulless idiot boxes that couldn’t handle file names longer than 8.3.

This was of course before Apple had life infused back into it by Steve Jobs’ return, along with the successes of the iMac and iPod, and before Windows had some of its shine taken off by the embarrassment that was Vista. However, by now, most of those differences have largely evaporated. You can use Word on both. You can do graphics on both and you can do business applications on both, and Get a Mac advertisements (which have now been canceled) and their Windows counterparts aside, it really doesn’t matter – people use what they like and it’s OK.

So now the big flame war has been miniaturized to fit in our pockets in the form of smartphones, and it’s every bit as stupid.

Here’s a crazy idea:

Buy whatever phone will make you happy. It really doesn’t matter.

What a crazy notion. It really doesn’t. In the end, I can buy whatever phone I want (full disclosure, I own an iPhone 3G and an upgrade to an iPhone 4 is most likely in the not-too-distant future) but the important part is just because I prefer the iPhone doesn’t mean you have to as well, and that choice does not make you a moron.

Some people want their megapixels, want to customize the crap out of the interface, and don’t want to be on AT&T. Fine. Go do that. Some people think the iPhone platform provides more polish and ease of use. Fine, that’s awesome too!

As a software developer, all this fragmentation in browsers and mobile devices does make my job harder, but at the end of the day I’m glad there is competition so that no platform can rest on its laurels and fail to bring me cool new features to geek out over.

So flame on if you must, but ask yourself: Does it really matter?

I know a lot of us are using the LINQ OrderBy() method to get our data shuffled in the right order, but on occasion I still do like implementing IComparable, especially when defining the default, intrinsic sort scheme for a particular class.

What I don’t like is implementing IComparable when I want to compare on more than one thing.

I don’t like this kind of code because the engineer in me balks at essentially doing each comparison more than once, first for equality, and then for direction:

public int CompareTo(SomeObject other)
{
	if (this.Prop1 != other.Prop1)
		return this.Prop1.CompareTo(other.Prop1);
	else if (this.Prop2 != other.Prop2)
		return this.Prop2.CompareTo(other.Prop2);
	// .... etc.
	return 0;
}

However, expanding it to get rid of this double evaluation yields this nastiness:

public int CompareTo(SomeObject other)
{
	int cmp = this.Prop1.CompareTo(other.Prop1);
	if (cmp == 0)
	{
		// Only fall through to the next property when the previous one tied
		cmp = this.Prop2.CompareTo(other.Prop2);
		if (cmp == 0)
		{
			// .... etc.
		}
	}
	return cmp;
}

Yuck. I decided I needed a way to have some of the elegance of LINQ in an IComparable shell, and here it is:

using System.Collections.Generic; // Comparer<T>, IComparer<T>

public class ComplexCompare
{
	private int value;

	private ComplexCompare()
	{
	}

	public static ComplexCompare By<T>(T a, T b)
	{
		return By(a, b, true);
	}

	public static ComplexCompare By<T>(T a, T b, bool ascending)
	{
		ComplexCompare cc = new ComplexCompare();
		if (ascending)
			cc.value = Comparer<T>.Default.Compare(a, b);
		else
			cc.value = Comparer<T>.Default.Compare(b, a);
		return cc;
	}

	public static ComplexCompare By<T>(T a, T b, IComparer<T> comparer)
	{
		ComplexCompare cc = new ComplexCompare();
		cc.value = comparer.Compare(a, b);
		return cc;
	}

	public ComplexCompare ThenBy<T>(T a, T b)
	{
		return ThenBy(a, b, true);
	}

	public ComplexCompare ThenBy<T>(T a, T b, bool ascending)
	{
		// Only compare more specific items if the preceding items have been equal
		if (value == 0)
		{
			if (ascending)
				this.value = Comparer<T>.Default.Compare(a, b);
			else
				this.value = Comparer<T>.Default.Compare(b, a);
		}
		return this;
	}

	public int End()
	{
		return value;
	}
}

ComplexCompare can be used like this:

public int CompareTo(SomeObject other)
{
	return ComplexCompare.By(this.Prop1, other.Prop1) // ascending by default
		.ThenBy(this.Prop2, other.Prop2, false) // but easy to change to descending
		.ThenBy(this.Prop3, other.Prop3) // each call can compare a completely different type
		.End(); // stops the fun and returns the int value
}

Now a hardcore computer science person would tell me that all this extra abstraction adds overhead, and in a really tight loop with millions of rows, using this scheme to compare items would surely be catastrophic! However, I’m not usually one to give in to Micro-Optimization Theater; I’m usually comparing 30-40 items, not millions. My primary concern is that code I write can be instantly understood when viewed by one of my peer developers, and I think ComplexCompare (although I’m not in love with the name) will help me to do that.
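To see the chaining in action on a concrete type, here is a small self-contained sketch. MiniCompare is a condensed stand-in for ComplexCompare so that the example compiles on its own, and the Person class, its fields, and the sample data are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Condensed stand-in for ComplexCompare (illustration only).
public class MiniCompare
{
    private int value;
    private MiniCompare() { }

    public static MiniCompare By<T>(T a, T b)
    {
        MiniCompare cc = new MiniCompare();
        cc.value = Comparer<T>.Default.Compare(a, b);
        return cc;
    }

    public MiniCompare ThenBy<T>(T a, T b, bool ascending)
    {
        // Only consulted when every earlier comparison was a tie
        if (value == 0)
            value = ascending ? Comparer<T>.Default.Compare(a, b)
                              : Comparer<T>.Default.Compare(b, a);
        return this;
    }

    public int End() { return value; }
}

// Hypothetical class: sorts by LastName ascending, then Age descending.
public class Person : IComparable<Person>
{
    public string LastName;
    public int Age;

    public Person(string lastName, int age) { LastName = lastName; Age = age; }

    public int CompareTo(Person other)
    {
        return MiniCompare.By(this.LastName, other.LastName)
            .ThenBy(this.Age, other.Age, false)
            .End();
    }
}

public static class CompareDemo
{
    public static List<Person> SortedSample()
    {
        List<Person> people = new List<Person>
        {
            new Person("Smith", 30),
            new Person("Jones", 25),
            new Person("Smith", 45)
        };
        people.Sort(); // uses Person.CompareTo
        return people;
    }
}
```

Here the two Smiths tie on LastName, so the descending Age comparison decides their order and the 45-year-old sorts ahead of the 30-year-old.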

On my team, we use FogBugz for case/defect tracking and project management, and we absolutely love it.

So when the folks over at Fog Creek created Kiln for version control with integrated code review, we had to try it out, and once I tried it, I knew we had to have it.

Of course, Kiln is based on Mercurial, so in order to switch, we have a fairly large Subversion repository that we need to convert. At least, at over 4000 revisions and still growing, it feels pretty big to us.

(If you have no idea what Kiln or Mercurial are all about, I encourage you to check out Hg Init, a tutorial on Mercurial written by Joel Spolsky, which is extremely informative and well written.)

Fog Creek provides an import tool that is pretty impressive, since it is able to import from half a dozen different existing source control systems (including just a pile of files on disk) and does it pretty well. However, it didn’t meet our needs in a few areas:

  • It doesn’t take advantage of author mapping offered by the hg convert utility. Our current Subversion server (VisualSVN Server) uses our Windows credentials, which are based on an archaic corporate naming scheme, so we really would like to map these to our real names.
  • The conversion is imperfect, resulting in some directories and files that aren’t in the head of the Subversion trunk turning up in the converted Mercurial repository. These are directories that existed long ago, before a massive repository reorganization moved them elsewhere. In any case, we don’t want them, and we don’t necessarily want to hunt them down manually.
  • The conversion doesn’t convert Subversion’s svn:ignore properties into an .hgignore file. When you’re developing with Visual Studio 2008 and a large solution, that results in a bunch of untracked bin and obj directories.
  • Although this turned out not to be important for us, the importer does not allow carving a large Subversion repository into a bunch of smaller subrepositories.

I developed a Windows Forms application in C# to handle our conversion, and learned a lot about the command-line svn and hg commands along the way. I offer it to you in the hopes that you might find it useful, because there is no greater tragedy than code that is written and only executed once.

I warn you, the application is NOT what I would call polished source code or UI. It’s REALLY rough around the edges. It was designed experimentally to get a job done that only had to be done once. Niceties were not observed. Sorry! And of course I make no warranties that it will work for you, AT ALL!

That said, here’s the link:

  • HgImport, Version 1.0 - A Visual Studio 2008 project for an SVN to Mercurial import application.
  • Requires the pre-.NET 4.0 Task Parallel Library extensions

Here’s how to use it:

  • You must enable the hg convert extension:
    • Edit your .hgrc file in your user folder … which if you’re using Kiln on Windows, may be called Mercurial.ini instead.
    • Find the [extensions] section of the file, and add the line “hgext.convert=” (without the quotes) underneath.
    • All this does is enable the convert command on the command line. See the Mercurial ConvertExtension documentation for further details.
  • Build and run the application.
  • Switch to the Author Map tab and enter your username mappings. Each line should take this format:
    • oldusername = Firstname Lastname
  • I don’t save that text anywhere. Copy and paste it into a text file for safekeeping. Now switch tabs back.
  • Enter your Subversion repository trunk url in the text box. (Sorry, we are just converting our trunk. All our branches are dead and we want to be rid of them. You’re welcome to change the code if you like!)
  • Click the Load SVN button. Sorry if your repository requires login. My app doesn’t support it. I warned you it was rough around the edges! You can either modify the source or do what I did: use the svnsync command to sync your central SVN repository server onto a mirror repository on your local computer. This has the added benefit of speeding the conversion process by taking the network out of the equation.
  • Check any directories that you want to convert into subrepositories. (Since I abandoned this feature, this is NOT well tested! In particular, the .hgignore generation doesn’t take subrepositories into account.)
  • Click Convert and then go get a snack. I estimate approximately one snack per thousand SVN revisions, or one full meal per 5000 revisions. Just ballpark figures.

Here’s what the conversion process does:

  • When you load the SVN repository, it loads the SVN head file structure into an XML document in memory.
  • When you begin conversion, first it clears out the working directory. (“working” under the execution directory)
  • Next, the app performs a full convert of the SVN repository using “hg convert”, similar to what the Kiln import tool does, although the authors are mapped at this point.
  • Next, the app updates the Mercurial repository with “hg up” to essentially establish the working copy.
  • Next, the Mercurial working directory is compared recursively with the SVN-exported XML file structure. This creates a filtering list that eliminates the phantom directories and files that no longer exist in the SVN working copy.
  • Next, the intermediate Mercurial repository is converted to the final Mercurial repository using “hg convert”, utilizing the previously created filter, and then the final repository is updated with “hg up”.
  • If you selected subrepositories, the filtering file will also exclude those directories, and then the app will attempt to repeat the process of converting each subrepository out of the transitional repository. Like I said, I gave up on this feature and this will definitely not work with .hgignore, if it even works at all!
  • Lastly, the app will query the SVN repository for svn:ignore properties, and translate these into an .hgignore file in the root of your new Mercurial repository. Then this file will be added with “hg add” and committed with “hg commit”.
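The filtering step above can be sketched roughly as follows. This is a minimal illustration, not the code from HgImport: the class and method names are made up, paths are assumed to be repository-relative with forward slashes and no spaces, and the only thing it relies on from Mercurial is that a filemap passed to “hg convert --filemap” accepts lines of the form “exclude path”.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not taken from HgImport.
public static class FilemapSketch
{
    // hgPaths: every file and directory found in the intermediate Mercurial working copy.
    // svnHeadPaths: every file and directory present in the SVN head.
    // Returns one "exclude" line per phantom path, skipping anything that
    // already lives under an excluded directory.
    public static List<string> BuildExcludeLines(
        IEnumerable<string> hgPaths, IEnumerable<string> svnHeadPaths)
    {
        var head = new HashSet<string>(svnHeadPaths, StringComparer.Ordinal);
        var excluded = new List<string>();

        foreach (string path in hgPaths.OrderBy(p => p, StringComparer.Ordinal))
        {
            if (head.Contains(path))
                continue; // still exists in SVN head, keep it

            bool coveredByParent = excluded.Any(
                e => path.StartsWith(e + "/", StringComparison.Ordinal));
            if (!coveredByParent)
                excluded.Add(path);
        }

        return excluded.Select(p => "exclude " + p).ToList();
    }
}

// The resulting lines would be written to a text file and used roughly like this
// (the file and FullRepo names are placeholders):
//   hg convert --authors authors.txt <svn trunk url> working\FullRepo
//   hg convert --filemap filemap.txt working\FullRepo working\FilteredRepo
```

Excluding only the top-most phantom directory keeps the filter file short, since excluding a directory in a filemap also drops everything beneath it.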

At this point, {AppWorkingDirectory}\working\FilteredRepo will be a fully functioning Mercurial repository! If you want to stop there, great, or it’s ready to push up to Kiln.

That’s it. It’s not pretty, but it works for us and hopefully it will work for you, or at least serve as a starting point. Enjoy!

My parents always told me to say what you mean and mean what you say.  It’s good advice for life in general but even more so in software engineering.

I’ve been updating our online store back end jobs, which were still using legacy code, to use updated code to prepare for a database move that would have been impossible before.  At the same time, I’m trying to make the code more efficient and maintainable.

The new code threw an exception today from the e-commerce provider.  We were attempting to capture an amount that was more than what was authorized on the card.

In reality, the order total was $21.625 after tax, which is stupid on its face to have fractional pennies, but that code is even nastier and I have very limited control over it.  When the store ran the credit authorization, it rounded down to $21.62.  When my new code ran, it was rounded up (as most 5th graders would expect, you round 0.5 up) to $21.63, hence the failure over a penny.

I dug up old code and found that the store authorization was using:


decimal.Round(amount, 2)

Well, what does that mean exactly? I powered up Reflector and took a look at the guts of the method.  This was beyond interesting:


// Original code
decimal rounded1 = decimal.Round(amount, 2);
// is equivalent to:
decimal rounded2 = decimal.Round(amount, 2, MidpointRounding.ToEven);
// and is completely different from
decimal rounded3 = decimal.Round(amount, 2, MidpointRounding.AwayFromZero);

MidpointRounding.AwayFromZero is exactly what I would expect from grade school. Using positive numbers, once you get halfway, you round up. With negative numbers, you would round down (to the larger negative).

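MidpointRounding.ToEven is beyond weird to me. When a value sits exactly at the midpoint, it rounds to the neighbor whose last digit is even (often called banker’s rounding). Values that are not exactly at the midpoint round to the nearest value as usual.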

This means that, with MidpointRounding.ToEven and rounding to 2 decimal places, 0.675 rounds to 0.68, as I would expect, but 0.685 ALSO rounds to 0.68!
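To make the difference concrete, here is a small sketch using the numbers from this post. The wrapper class and method names are just for illustration; the calls themselves are the standard decimal.Round overloads.

```csharp
using System;

// Illustrative wrappers around the two rounding modes discussed above.
public static class RoundingModes
{
    // What decimal.Round(amount, 2) does implicitly:
    // exact midpoints go to the neighbor with an even last digit.
    public static decimal ToEven(decimal amount)
    {
        return decimal.Round(amount, 2, MidpointRounding.ToEven);
    }

    // Grade-school rounding: exact midpoints go away from zero.
    public static decimal AwayFromZero(decimal amount)
    {
        return decimal.Round(amount, 2, MidpointRounding.AwayFromZero);
    }
}

// RoundingModes.ToEven(21.625m)        == 21.62m   (what the store authorized)
// RoundingModes.AwayFromZero(21.625m)  == 21.63m   (what my new code tried to capture)
// RoundingModes.ToEven(0.675m)         == 0.68m
// RoundingModes.ToEven(0.685m)         == 0.68m
// RoundingModes.AwayFromZero(0.685m)   == 0.69m
// RoundingModes.AwayFromZero(-0.685m)  == -0.69m
```

Naming the mode in the call means the authorization and capture code cannot silently disagree about a midpoint.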

I don’t know who came up with this or why, but it looks like internally, .NET uses a native extern method to accomplish even rounding, and a fairly complex but managed-code algorithm to accomplish away from zero rounding. Was the method that makes no sense to me or most 5th graders selected as the default because it is more efficient? I’m not sure I’ll ever know.

The point is that as software developers we need to be cognizant of the framework code we utilize and what it’s doing. We need to resist the urge to be lazy: use the more explicit method overloads, even though they are more verbose, and throw in a comment to explain why.

Of course there are numerous types and methods in the .NET Framework that have similar problems. String.Compare() methods without the benefit of a string comparison type come to mind.
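The same idea applied to strings might look like the sketch below. The helper names are hypothetical; the point is only that each call states its StringComparison explicitly instead of relying on the default.

```csharp
using System;

// Hypothetical helpers that spell out which kind of comparison is intended.
public static class ComparisonSketch
{
    // For identifiers, keys, and file names: compare the characters themselves,
    // ignoring case, with no dependence on the current culture.
    public static bool SameKey(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    // For text that will be sorted and shown to a user: use the user's culture rules.
    public static int CompareForDisplay(string a, string b)
    {
        return string.Compare(a, b, StringComparison.CurrentCulture);
    }
}
```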

In any case, say (or rather, code) what you mean and mean what you code, and then you can avoid little gotchas like this.

Last week in our development team meeting we had a discussion about QR Codes and whether or not there was a good reason to include QR reader functionality in future products.

For those unfamiliar, the short version (stolen from the Wikipedia linked above):

A QR Code is a matrix code (or two-dimensional bar code) created by Japanese corporation Denso-Wave in 1994. The “QR” is derived from “Quick Response”, as the creator intended the code to allow its contents to be decoded at high speed.

My personal opinion, which I espoused during that meeting, was that QR codes are, for lack of a better word, stupid, that they would never appear in anything mainstream, had no potential return on investment, and that generally they were a complete and utter waste of time and we should go out of our way to avoid allocating any development time toward them.

And so what do I see in the Sunday paper just a few days later?  Best Buy included a QR code on the front page of their Sunday ad.  Thanks Best Buy.  Big help.

My typical admiration for Best Buy and all the things they have to sell me aside, I’m sticking with my opinion.

Here’s an image of Best Buy’s Sunday ad from May 23, 2010:

Best Buy 5/23/2010 Ad with QR Code


In order to use this thing, here’s what you have to do:

  1. Text BBYAPP to an SMS shortcode.
  2. Receive a text message in return.
  3. Download the app teased in the message.
  4. Run app.
  5. Take a picture of the QR Code.
  6. Wait for it to process.
  7. Be sent to a website to watch a video trailer for Super Mario Galaxy 2.

By the way, the website forced you to click a link to say whether you had an iPhone or an Android device.  No browser detection.  How very low-tech.

If that sounds hopelessly complicated, that’s because it is.  This could have been accomplished by asking the user to visit bestbuy.com/mario (not a real link).

This fails a pretty simple metric I have for evaluating new software and technology.  If it feels cumbersome and overcomplicated to me, a software developer and self-proclaimed uber-geek, then it can never find general acceptance among the masses.

Of course, this isn’t a perfect use of QR Code technology.  QR codes should follow the Three Rules of QR Codes.  Briefly, they must 1) have a good mobile-device-appropriate landing page, 2) have a tiny url, and 3) lead to something valuable.  This Sunday ad doesn’t outright fail, but doesn’t exactly achieve stellar marks on #1 or #3.  The landing page is somewhat mobile ready, but should be able to tell iPhone and Android apart, and have some option for other devices.  More at issue is Rule 3 – a video is not all that valuable.  Offer me 10% off Super Mario Galaxy 2 and then maybe we have something to discuss.

To be fair, Best Buy is improving; previously they hung a QR code in a storefront window in New York City and this was an even bigger affront to Rule 3: it was only a link to the Best Buy mobile website.

So I maintain that QR codes will not catch on, because they can never reach the mainstream.  They are a digital chicken-and-egg paradox.

  1. In order for end users to accept QR codes, they must be commonplace.  They must permeate our entire existence.  The term “QR” must be as well understood as “URL” is today.  My grandma (who uses the Internet and is a pretty hip lady, in my opinion) must know and understand what a QR code is and what it does.  “QR” must be a verb and be added to the Oxford dictionary.
  2. In order for publishers to make widespread use of QR codes, end users must already have accepted them.
  3. See #1.

So it won’t happen.  Maybe UPS or FedEx will use them in their package tracking (actually I think one or both might already) but I won’t know what it means and I won’t care to as long as my latest order from Amazon makes it to my door.  But it will never reach critical mass.  And knowing this, I’d sure like to avoid wasting my time writing software to support it.

In my previous post I attempted to build an NServiceBus Timeout Manager that used timers and events to send back timeout messages when they came due. The timeout manager included with the NServiceBus 2.0 RTM instead loops through the message queue with thread sleeps until each message is ready to send back.

That implementation stored timeouts that were set to expire “soon” in memory, and stored everything in a secondary MSMQ queue so that if the application failed, it could recover when it started back up.

I realized that a better implementation would be to separate out the storage using a provider implementation.  This way, the entire implementation could be switched out completely by providing a second class that implements ITimeoutStorageProvider.

Also, I refactored the MsmqTimeoutStorage class into a base class that’s concerned with the in-memory timeout handling, and a subclass that adds in the actual storage implementation with MSMQ.

This way, switching to database storage would be as easy as creating a class that inherits TimeoutStorageProvider<T> (more on the generic parameter in a bit) and implements the following methods:


public abstract void Init();
protected abstract T StoreTimeout(IMessageContext context, TimeoutMessage msg);
protected abstract void RemoveBySagaId(Guid sagaId);
protected abstract void RemoveTimeout(T id);
protected abstract List<TimeoutEntry<T>> GetTimeouts(DateTime cutoff);

The generic parameter T is the ID unique to the storage provider, used later to remove it from storage after the timeout is complete. For the MSMQ implementation, this is a string because a MSMQ message’s Id parameter is a string. For a database implementation with Microsoft SQL it would probably be advisable to use a Guid.
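As a rough illustration of what a Guid-keyed provider could look like, here is a self-contained sketch. Everything other than the five abstract method signatures is an assumption: the stand-in types, their fields, the public helper methods, and the dictionary that plays the role of a database table are all invented so the example compiles on its own. The real base class also handles the in-memory timers, which this stub leaves out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins; the real types come from NServiceBus and the project download.
public interface IMessageContext { }
public class TimeoutMessage { public Guid SagaId; public DateTime Expires; }
public class TimeoutEntry<T> { public T Id; public TimeoutMessage Message; }

// Stub of the base class: only the abstract contract listed above, plus
// made-up public entry points so the sketch can be exercised.
public abstract class TimeoutStorageProvider<T>
{
    public abstract void Init();
    protected abstract T StoreTimeout(IMessageContext context, TimeoutMessage msg);
    protected abstract void RemoveBySagaId(Guid sagaId);
    protected abstract void RemoveTimeout(T id);
    protected abstract List<TimeoutEntry<T>> GetTimeouts(DateTime cutoff);

    public T Add(TimeoutMessage msg) { return StoreTimeout(null, msg); }
    public int CountDue(DateTime cutoff) { return GetTimeouts(cutoff).Count; }
    public void ClearSaga(Guid sagaId) { RemoveBySagaId(sagaId); }
}

// Toy provider keyed by Guid; a real one would read and write a table instead.
public class DictionaryTimeoutStorage : TimeoutStorageProvider<Guid>
{
    private readonly Dictionary<Guid, TimeoutEntry<Guid>> rows =
        new Dictionary<Guid, TimeoutEntry<Guid>>();

    public override void Init() { rows.Clear(); }

    protected override Guid StoreTimeout(IMessageContext context, TimeoutMessage msg)
    {
        Guid id = Guid.NewGuid(); // the storage-specific ID handed back to the caller
        rows[id] = new TimeoutEntry<Guid> { Id = id, Message = msg };
        return id;
    }

    protected override void RemoveBySagaId(Guid sagaId)
    {
        // Copy the matching IDs first so the dictionary is not modified while enumerating.
        List<Guid> ids = rows.Values
            .Where(e => e.Message.SagaId == sagaId)
            .Select(e => e.Id)
            .ToList();
        foreach (Guid id in ids) rows.Remove(id);
    }

    protected override void RemoveTimeout(Guid id) { rows.Remove(id); }

    protected override List<TimeoutEntry<Guid>> GetTimeouts(DateTime cutoff)
    {
        // Everything that expires at or before the cutoff.
        return rows.Values.Where(e => e.Message.Expires <= cutoff).ToList();
    }
}
```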

Here is the source:

To build, simply unzip the project, include it in a solution, fix the references to the NServiceBus components and Log4Net, build, and go.

Feel free to take the code and use it however you like under the terms of the Apache License, Version 2.0.  I offer it without warranty, express or implied, and as always your mileage may vary.  I only ask that you share any improvements you make to it so that everyone can benefit.
