A Short, Friendly GUID/UUID in .NET

I recently committed to returning a unique ID that was 18 characters or less as part of the response from a new version of an existing web method I support. This value would be stored and used for subsequent calls to other web methods and would potentially be manually entered into a website query page.

Easy, you say. Just use an IDENTITY (or SEQUENCE in Oracle) column (which is already in place as a surrogate key, BTW). The problem with IDENTITY columns is that you have to wait for the call to the database to complete before you know what the value will be. In this scenario, we are using MSMQ to enhance the performance and reliability of the operation — this allows us to return from the web method call without waiting on the DB (which *usually* is quite fast but slows down for short periods of time infrequently). This web method gets called very frequently and the calls can be bursty, so the queuing maintains a very quick response time under all circumstances.

I had planned all along to use a GUID/UNIQUEIDENTIFIER column in the underlying table with the GUID generated during the web method call and written to the write queue with the rest of the data. Unfortunately, I confused two numbers in my head when I made this commitment. GUIDs are 16 *bytes* long but their default display format is actually 32 hex digits with 4 dashes thrown in for good measure.

I didn’t want to go back to the client and ask for them to increase the size of this field. Especially since this was going to be a human queriable field on a web site. I considered base 36 encoding the GUID, but that would still take 25 digits. Next I considered Base64 Encoding a GUID which is actually quite clever but it still came up “short” (or should I say long) by 4 digits compressing the GUID down to 22 digits. But still a nice improvement over 36.

Rick Strahl (a very prolific, well-written .net blogger) contemplated using GetHashCode().ToString(“x”) on a variety of targets, but I couldn’t get comfortable with statistical validity of this approach.

I considered using a combination of a time-based value (such as DateTime.Ticks) but I could not get past the bursty usage problem with those. I also felt that I must be re-inventing the wheel. The GUID was the “best practice” and I couldn’t help but feel I should utilize that approach or at least a similar one.

The “G” in GUID stands for global. I really didn’t need “global” uniqueness. In fact, I just needed to guarantee statistical uniqueness for all these specific types of calls made by a particular client. I could use the “client id” as part of the lookup to further reduce the “uniqueness” required. I then did some research on UUIDs (GUID is just MS’s brand name for UUID) and saw that later versions of the algorithm use random number generators. I decided that if 128 bits gets me “global” uniqueness, I could get by with a few less for my scenario. So now the question was how to encode those bits

I actually used base 36 over a decade ago in an Xbase application to avoid having to increase the width of a field but there are a couple of problems with it. First, because 36 is not a power of 2, you cannot easily (as in bit shifting) convert bits to digits. Second, 0/O (zero, and oh) and 1/I (one and eye) can easily be confused, particularly in some fonts.

Removing those four digits leads to my choice, base 32 with 2-9 and the capital letters minus O and I as the digits. You can easily convert to base 32 because it’s a power of 2 and you get a very friendly set of digits with good bit density.

Now the final piece of the puzzle was how to generate “good” random numbers. Good for me meant that there was a very, very low probability of collisions, even if I was generating an ID on the same machine every 10 milliseconds or so. As I mentioned above, usage of the service can be bursty. Some clients are psuedo-batching their work and hit the service hard for a relatively short period of time.

While doing some research on .net random number generation, I learned that it is much better to instantiate the generator once than it is to repeatedly instantiate it for each number. This sounds counter-intuitive at first, but makes sense when you think about the time-base default seeding these generators use. So a singleton was in order.

Finally, I had to choose between venerable RAND and RNGCryptoServiceProvider. Although I think using the singleton approach with a single instance of RAND would have worked, I felt more comfortable with the more robust seeding used by RNGCryptoServiceProvider. Particularly since we are running on clusters and our web service calls are sessionless (no affinity).

There’s one final wrinkle. In my scenario, I have a unique (most of the time) 32 bit integer value available. I wanted to incorporate that into my scheme to further reduce the odds of a collision. I do this by taking a byte array as an input that is “mixed” with the random bytes to produce the ID.

Without further adieu, here’s the code for the class itself and some unit tests:

using System;
using System.Security.Cryptography;

public class UniqueIdGenerator
{
	private static readonly UniqueIdGenerator _instance = new UniqueIdGenerator();
	private static char[] _charMap = { // 0, 1, O, and I omitted intentionally giving 32 (2^5) symbols
		'2', '3', '4', '5', '6', '7', '8', '9', 
		'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'
	};

	public static UniqueIdGenerator GetInstance()
	{
		return _instance;
	}

	private RNGCryptoServiceProvider _provider = new RNGCryptoServiceProvider();

	private UniqueIdGenerator()
	{
	}

	public void GetNext(byte[] bytes)
	{
		_provider.GetBytes(bytes);
	}

	public string GetBase32UniqueId(int numDigits)
	{
		return GetBase32UniqueId(new byte[0], numDigits);
	}
    
	public string GetBase32UniqueId(byte[] basis, int numDigits)
	{
		int byteCount = 16;
		var randBytes = new byte[byteCount - basis.Length];
		GetNext(randBytes);
		var bytes = new byte[byteCount];
		Array.Copy(basis, 0, bytes, byteCount - basis.Length, basis.Length);
		Array.Copy(randBytes, 0, bytes, 0, randBytes.Length);

		ulong lo = (((ulong)BitConverter.ToUInt32(bytes, 8)) << 32) | BitConverter.ToUInt32(bytes, 12); // BitConverter.ToUInt64(bytes, 8);
		ulong hi = (((ulong)BitConverter.ToUInt32(bytes, 0)) << 32) | BitConverter.ToUInt32(bytes, 4);  // BitConverter.ToUInt64(bytes, 0);
		ulong mask = 0x1F;

		var chars = new char[26];
		int charIdx = 25;

		ulong work = lo;
		for (int i = 0; i < 26; i++)
		{
			if (i == 12)
			{
				work = ((hi & 0x01) << 4) & lo;
			}
			else if (i == 13)
			{
				work = hi >> 1;
			}
			byte digit = (byte)(work & mask);
			chars[charIdx] = _charMap[digit];
			charIdx--;
			work = work >> 5;
		}

		var ret = new string(chars, 26 - numDigits, numDigits);
		return ret;
	}
}
using System;
using System.Collections.Generic;
using NUnit.Framework;

[TestFixture]
public class UniqueIdGeneratorTest
{
	[Test]
	public void GetInstanceTest()
	{
		var instance = UniqueIdGenerator.GetInstance();
		Assert.AreSame(instance, UniqueIdGenerator.GetInstance());
	}

	[Test]
	public void GetNextTest()
	{
		var b1 = new byte[16];
		for (int i = 0; i < 16; i++ ) b1[i] = 0;
		UniqueIdGenerator.GetInstance().GetNext(b1);
		Assert.That(Array.Exists(b1, b => b != 0)); // This could be false every billion years or so
	}

	[Test]
	public void GetBase32UniqueIdTest()
	{
		var b1 = new byte[16];	
		for (int i = 0; i < 16; i++) b1[i] = 0;
		string id = UniqueIdGenerator.GetInstance().GetBase32UniqueId(b1, 26);
		Assert.AreEqual(26, id.Length);
		Assert.AreEqual(new string('2', 26), id);

		b1 = new byte[] { 0xFF, 0xFF, 0xFF, 0xFF };
		id = UniqueIdGenerator.GetInstance().GetBase32UniqueId(b1, 26);
		System.Diagnostics.Trace.WriteLine(id);
		Assert.AreEqual(26, id.Length);
		Assert.AreEqual("ZZZZZZ", id.Substring(20, 6));

		id = UniqueIdGenerator.GetInstance().GetBase32UniqueId(b1, 6);
		System.Diagnostics.Trace.WriteLine(id);
		Assert.AreEqual(6, id.Length);
		Assert.AreEqual("ZZZZZZ", id);

		id = UniqueIdGenerator.GetInstance().GetBase32UniqueId(18);
		System.Diagnostics.Trace.WriteLine(id);
		Assert.AreEqual(18, id.Length);

		var id2 = UniqueIdGenerator.GetInstance().GetBase32UniqueId(18);
		System.Diagnostics.Trace.WriteLine(id2);
		Assert.AreEqual(18, id2.Length);

		Assert.AreNotEqual(id, id2);
	}

	[Test, Ignore]
	public void GetBase32UniqueIdDupeTest()
	{
		var alreadySeen = new Dictionary<string, string>(1000000);
		System.Diagnostics.Trace.WriteLine("Allocated");
		for (int i = 0; i < 100000000; i++)
		{
			var id = UniqueIdGenerator.GetInstance().GetBase32UniqueId(12);
			Assert.That(!alreadySeen.ContainsKey(id));
			alreadySeen.Add(id, id);
		}
	}
}
About these ads

9 Responses to “A Short, Friendly GUID/UUID in .NET”

  1. Dr_Barnowl Says:

    “GUID is just MS’s brand name for UUID”

    Alas, this isn’t entirely true. The MS implementation also differs in an important aspect ; several of the fields are treated with the endianness of the native platform (little endian). This means it differs from the RFC 4122 standard which specifies big-endian.

    MS claims that their implementation “is a Microsoft implementation of the UUID standard.”, which is a shame because it isn’t. The same stream of bytes will be represented as a different string, and the same string will result in different bytes.

  2. dementic Says:

    i got to this well written article while trying to figure out almost the same problem.
    but while reading your article, especially the ‘DateTime.Ticks’ part.

    i wonder, why didnt you follow that road ?

    well, not the actual ticks that is, but, using a datetime ?

    i.e
    2011/12/07/01/57/25/10/60
    Year/Month/Day/Hour/Minutes/Seconds/ms/ns
    this gives you a globally unique identifier, which will never repeat itself and has 18 digits.

    or, am i missing something here?

    • jopincar Says:

      You are missing something. The actual resolution of ticks is not good enough to prevent duplicates in a multi-threaded or multi-machine situation. You will get duplicates.

      • dementic Says:

        well, as i said, the Ticks only lead me to the idea building the Guid from the datetime value as i showed.

        but, something interests me….
        “A single tick represents one hundred nanoseconds or one ten-millionth of a second. There are 10,000 ticks in a millisecond.”

        you want to tell me you expect more then that? are you powering the internet? :P

      • jopincar Says:

        Instead of talking from theory, why don’t you see if you really get that many ticks. A mildly multithreaded program will get duplicates very quickly using the approach you seem fond of.

  3. Billy Stack Says:

    Great post!

    Is this completely your own algorithm or are you basing this on some mathematical algorithm described elsewhere?

    Is there more of an explanation of this algorithm used to generate the identifier above documented somewhere. We have a requirement to generate a unique identifier 15 in length and we feel your code above would be a perfect fit once modified slightly.

    • jopincar Says:

      Not sure what you are asking. It’s completely my own algorithm in the sense that the code was written by me completely from scratch. All the heavy lifting is being done by the RngCryptoServiceProvider. So the mathematical foundation behind this approach is based on the RngCryptoServiceProvider’s implementation of GetBytes (which is supposedly quite good). My innovation is encoding those bytes in a human-friendly way. I’m using a simple map from 4 bits to a set of 32 characters, so I there’s nothing mathematically challenging there I would hope.

  4. .Net Short Unique Identifier | PHP Developer Resource Says:

    […] .Net (cannot use GUID as it is to long for this case). Do people think that the algorithm used here http://jopinblog.wordpress.com/2009/02/04/a-shorter-friendlier-guiduuid-in-net/ is a good candidate or have you any other […]

  5. Alex Leblois Says:

    This is awesome!! really helpful for me. Thanks for sharing with us. Following links also helped me to complete my task.

    http://msdn.microsoft.com/en-IN/library/system.guid(v=vs.71).aspx

    http://www.mindstick.com/Articles/93446478-8ec4-4f1d-b87f-8248e0f7d6ad/?GUID%20in%20NET


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: