Easily convert file sizes to human-readable form

If you write an application that has anything to do with file management, you will probably need to display the size of the files. But if a file has a size of 123456789 bytes, it doesn’t mean that you should just display this value to the user, because it’s hard to read, and the user usually doesn’t need 1-byte precision. Instead, you will write something like 118 MB.

This should be a no-brainer, but there are actually a number of different ways to display byte sizes… For instance, there are several co-existing conventions for units and prefixes:

  • The SI (International System of Units) convention uses decimal multiples, based on powers of 10: 1 kilobyte is 1000 bytes, 1 megabyte is 1000 kilobytes, etc. The prefixes are the ones from the metric system (k, M, G, etc.).
  • The IEC convention uses binary multiples, based on powers of 2: 1 kibibyte is 1024 bytes, 1 mebibyte is 1024 kibibytes, etc. The prefixes are Ki, Mi, Gi etc., to avoid confusion with the metric system.
  • But neither of these conventions is commonly used: the customary convention is to use binary multiples (1024), but decimal prefixes (K, M, G, etc.).

Depending on the context, you might want to use any of these conventions. I’ve never seen the SI convention used anywhere; some apps (VirtualBox, for instance) use the IEC convention, and most apps and operating systems use the customary convention. For more details, see the Wikipedia article “Binary prefix”.
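To make the difference concrete, here is a small illustrative sketch (not part of any library; the class and method names are made up) that renders the same byte count at the “mega” scale under each of the three conventions described above:

```csharp
using System.Globalization;

// Illustrative sketch: formats a byte count at the "mega" scale under each
// convention. Real code would pick the scale automatically.
public static class ConventionDemo
{
    const double Mega = 1000.0 * 1000.0;   // SI: 10^6 bytes
    const double Mebi = 1024.0 * 1024.0;   // IEC and customary: 2^20 bytes

    // SI convention: decimal multiple, metric prefix.
    public static string Si(long bytes) =>
        (bytes / Mega).ToString("0.0", CultureInfo.InvariantCulture) + " MB";

    // IEC convention: binary multiple, binary prefix.
    public static string Iec(long bytes) =>
        (bytes / Mebi).ToString("0.0", CultureInfo.InvariantCulture) + " MiB";

    // Customary convention: binary multiple, but a metric-looking prefix.
    public static string Customary(long bytes) =>
        (bytes / Mebi).ToString("0.0", CultureInfo.InvariantCulture) + " MB";
}
```

With 123456789 bytes, the SI version gives 123.5 MB while the customary one gives 117.7 MB: the same label, but values that differ by about 5%.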

OK, so let’s choose the customary convention for now. Now you have to decide which scale to use: do you want to write 0.11 GB, 118 MB, 120564 KB, or 123456789 B? Typically, the scale is chosen so that the displayed value is between 1 and 1024.
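The usual scale selection can be sketched as a loop that keeps dividing by 1024 while the value is at least 1024. This is a minimal sketch assuming the customary convention and the symbols B, KB, MB, GB and TB; ScaleDemo is a hypothetical name:

```csharp
using System.Globalization;

// Sketch of scale selection for the customary convention (1024-based,
// K/M/G/T symbols): keep dividing by 1024 while the value is at least 1024,
// so that, before rounding, the value lies in [1, 1024) for any count >= 1.
public static class ScaleDemo
{
    static readonly string[] Units = { "B", "KB", "MB", "GB", "TB" };

    public static string Format(long bytes)
    {
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < Units.Length - 1)
        {
            value /= 1024;
            unit++;
        }
        // At most two decimal places, trailing zeros dropped.
        return value.ToString("0.##", CultureInfo.InvariantCulture) + " " + Units[unit];
    }
}
```

For 123456789 bytes this yields 117.74 MB. Note that the range check happens before rounding, so 1048575 bytes (1023.999 KB) would still be displayed as 1024 KB; a more careful implementation would re-check the scale after rounding.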

A few more things you might have to consider:

  • Do you want to display integer values, or include a few decimal places?
  • Is there a minimum unit to use (for instance, Windows never uses bytes: a 1 byte file is displayed as 1 KB)?
  • How should the value be rounded?
  • How do you want to format the value?
  • For values less than 1 KB, do you want to use the word “bytes”, or just the symbol “B”?
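Rounding and the minimum unit interact in ways that are easy to overlook. Here is a sketch, assuming kilobytes as the minimum unit and a hypothetical SizeRounding enum, that shows how three obvious rounding rules treat the same values:

```csharp
using System;
using System.Globalization;

// Hypothetical rounding rules, for illustration only.
public enum SizeRounding { Floor, Ceiling, Closest }

public static class KilobyteDemo
{
    // Formats a byte count as whole kilobytes (1 KB = 1024 bytes), with KB as
    // the minimum unit: values below 1 KB are never shown in bytes.
    public static string FormatKB(long bytes, SizeRounding rounding)
    {
        double kb = bytes / 1024.0;
        double rounded;
        switch (rounding)
        {
            case SizeRounding.Floor: rounded = Math.Floor(kb); break;
            case SizeRounding.Ceiling: rounded = Math.Ceiling(kb); break;
            default: rounded = Math.Round(kb, MidpointRounding.AwayFromZero); break;
        }
        // Thousands separator, no decimals.
        return rounded.ToString("#,##0", CultureInfo.InvariantCulture) + " KB";
    }
}
```

Rounding up reproduces the Windows behavior mentioned above, where a 1-byte file shows as 1 KB, and gives 120564 KB for 123456789 bytes (exactly 120563.27 KB). Rounding to the closest value gives 120563 KB instead and, for a 1-byte file, 0 KB.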

OK, enough of this! What’s your point?

So as you can see, displaying a byte size in human-readable form isn’t as straightforward as you might have expected… I’ve had to write code to do it in a number of apps, and I eventually got tired of doing it over and over, so I wrote a library that attempts to cover all use cases. I called it HumanBytes, for reasons that should be obvious… It is also available as a NuGet package.

Its usage is quite simple. It’s based on a class named ByteSizeFormatter, which has a few properties to control how the value is rendered:

var formatter = new ByteSizeFormatter
{
    Convention = ByteSizeConvention.Binary,
    DecimalPlaces = 1,
    NumberFormat = "#,##0.###",
    MinUnit = ByteSizeUnit.Kilobyte,
    MaxUnit = ByteSizeUnit.Gigabyte,
    RoundingRule = ByteSizeRounding.Closest,
    UseFullWordForBytes = true,
};

var f = new FileInfo("TheFile.jpg");
Console.WriteLine("The size of '{0}' is {1}", f, formatter.Format(f.Length));

In most cases, though, you will just want to use the default settings. You can do that easily with the Bytes extension method:

var f = new FileInfo("TheFile.jpg");
Console.WriteLine("The size of '{0}' is {1}", f, f.Length.Bytes());

This method returns an instance of the ByteSize structure, whose ToString method formats the value using the default formatter. You can change the default formatter settings globally through the ByteSizeFormatter.Default static property.
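The extension-method-plus-struct design described above can be sketched as follows. The names SizeFormatter, ByteCount and AsBytes are hypothetical stand-ins chosen to avoid clashing with the real library; this is not HumanBytes’ actual implementation, just the general pattern:

```csharp
using System.Globalization;

// Hypothetical, simplified formatter with a globally replaceable default.
public sealed class SizeFormatter
{
    static readonly string[] Units = { "B", "KB", "MB", "GB", "TB" };

    // Used by ByteCount.ToString(); can be replaced once at startup.
    public static SizeFormatter Default { get; set; } = new SizeFormatter();

    public int DecimalPlaces { get; set; } = 1;

    public string Format(long bytes)
    {
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < Units.Length - 1) { value /= 1024; unit++; }
        string format = DecimalPlaces > 0 ? "0." + new string('#', DecimalPlaces) : "0";
        return value.ToString(format, CultureInfo.InvariantCulture) + " " + Units[unit];
    }
}

// Lightweight wrapper whose ToString delegates to the current default formatter.
public readonly struct ByteCount
{
    public ByteCount(long value) { Value = value; }
    public long Value { get; }
    public override string ToString() => SizeFormatter.Default.Format(Value);
}

public static class ByteCountExtensions
{
    public static ByteCount AsBytes(this long bytes) => new ByteCount(bytes);
}
```

The trade-off of a mutable static default is that changing it affects every caller in the process, which is usually what you want for application-wide settings but can surprise library code or tests running concurrently.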

A note on localization

Not all languages use the same symbol for “byte”, and obviously the word “byte” itself is different across languages. Currently the library only supports English and French; if you want your language to be supported as well, please fork the repository, add your translation, and submit a pull request. There are only 3 terms to translate, so it shouldn’t take long.
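One way to handle the byte symbol per language is a lookup keyed on the two-letter language code. This is a hypothetical sketch, not how HumanBytes stores its translations: it only covers the symbol (French uses “o”, for octet) and falls back to English for unsupported languages:

```csharp
using System.Collections.Generic;

// Hypothetical sketch: byte symbol looked up by two-letter language code,
// then combined with the customary K/M/G prefix.
public static class LocalizedUnits
{
    static readonly Dictionary<string, string> ByteSymbols = new Dictionary<string, string>
    {
        { "en", "B" },
        { "fr", "o" },   // octet
    };

    public static string Symbol(string prefix, string languageCode)
    {
        string byteSymbol;
        if (!ByteSymbols.TryGetValue(languageCode, out byteSymbol))
            byteSymbol = ByteSymbols["en"];   // unsupported language: fall back to English
        return prefix + byteSymbol;
    }
}
```

In an application you would pass something like CultureInfo.CurrentUICulture.TwoLetterISOLanguageName as the language code; a real implementation would more likely use .resx resources than a hard-coded dictionary.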

StringTemplate: another approach to string interpolation

With the upcoming version 6 of C#, there’s a lot of talk on CodePlex and elsewhere about string interpolation. Not very surprising, since it’s one of the major features of that release… In case you were living under a rock during the last few months and you haven’t heard about it, string interpolation is a way to insert C# expressions inside a string, so that they’re evaluated at runtime and replaced with their values. Basically, you write something like this:

string text = $"{p.Name} was born on {p.DateOfBirth:D}";

And the compiler transforms it to this:

string text = String.Format("{0} was born on {1:D}", p.Name, p.DateOfBirth);

Note: the syntax shown above is the one from the latest design notes about this feature. It might still change before the final release, and the current preview build of VS2015 uses a different syntax: “\{p.Name} was born on \{p.DateOfBirth:D}”.
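The equivalence is easy to check with the $"..." syntax that eventually shipped in C# 6. In this small sketch, two hypothetical helpers produce the same text for the same inputs. Newer compilers (C# 10 and later) may lower interpolated strings to DefaultInterpolatedStringHandler rather than String.Format, but for a case like this the resulting string is the same:

```csharp
using System;

public static class InterpolationDemo
{
    // Interpolated string: expressions are embedded directly in the literal.
    public static string Interpolated(string name, DateTime dateOfBirth) =>
        $"{name} was born on {dateOfBirth:D}";

    // The equivalent composite-format call with numeric placeholders.
    public static string Formatted(string name, DateTime dateOfBirth) =>
        string.Format("{0} was born on {1:D}", name, dateOfBirth);
}
```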

I really love this feature. It’s going to be extremely convenient for things like logging, generating URLs or queries, etc. I will probably use it a lot, especially since Microsoft has listened to community feedback and included a way to customize how the embedded expressions are evaluated (see the part about IFormattable in the design notes).
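In C# 6 as released, that customization point takes the form of FormattableString: when an interpolated string is assigned to a FormattableString (or IFormattable) variable, the compiler keeps the composite format and the arguments separate instead of producing a string immediately, so the caller can choose the culture. A short sketch (the class and method names are made up):

```csharp
using System;
using System.Globalization;

public static class FormattableDemo
{
    // Target-typing to FormattableString defers formatting to the caller.
    public static string Describe(double price)
    {
        FormattableString fs = $"Price: {price:N2}";
        return fs.ToString(CultureInfo.InvariantCulture);
    }

    // The compiler-generated composite format is still accessible.
    public static string Template(double price)
    {
        FormattableString fs = $"Price: {price:N2}";
        return fs.Format;   // "Price: {0:N2}"
    }
}
```

FormattableString.Invariant(...) is a shortcut for the common invariant-culture case.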

But there’s one thing that bothers me: since interpolated strings are interpreted by the compiler, they have to be hard-coded; you can’t extract them to resources. This means the feature can’t be used for localization, and we’re stuck with old-fashioned numeric placeholders in localized strings.

Or are we really?

For a few years now, I’ve been using a custom string interpolation engine that can be used like String.Format, but with named placeholders instead of numeric ones. It takes a format string, and an object with properties that match the placeholder names:

string text = StringTemplate.Format("{Name} was born on {DateOfBirth:D}", new { p.Name, p.DateOfBirth });

Obviously, if you already have an object with the properties you want to include in the string, you can just pass that object directly:

string text = StringTemplate.Format("{Name} was born on {DateOfBirth:D}", p);

The result is exactly what you would expect: the placeholders are replaced with the values of the corresponding properties.

In which ways is it better than String.Format?

  • It’s much more readable: a named placeholder tells you immediately which value will go there
  • It’s less error-prone: you don’t need to pay attention to the order of the values to be formatted
  • When you extract the format strings to resources for localization, the translator sees a name in the placeholder, not a number. This gives more context to the string, and makes it easier to understand what the final string will look like.

Note that you can use the same format specifiers as in String.Format. The StringTemplate class parses your format string into one compatible with String.Format, extracts the property values into an array, and calls String.Format.
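The translation step can be sketched like this. This is a simplified illustration, not NString’s actual code: it turns each {Name[,alignment][:format]} placeholder into a numeric one, keeps doubled braces as escapes, and then looks up the property values with plain reflection (no caching yet):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TemplateParser
{
    // Rewrites "{Name:fmt}" as "{0:fmt}" and records property names in order.
    // Braces inside a placeholder's format are not supported in this sketch.
    public static string ToCompositeFormat(string template, List<string> names)
    {
        var sb = new StringBuilder();
        int i = 0;
        while (i < template.Length)
        {
            char c = template[i];
            if ((c == '{' || c == '}') && i + 1 < template.Length && template[i + 1] == c)
            {
                sb.Append(c, 2);   // escaped brace: keep it doubled for String.Format
                i += 2;
            }
            else if (c == '{')
            {
                int end = template.IndexOf('}', i);
                if (end < 0) throw new FormatException("Unclosed placeholder at index " + i);
                string body = template.Substring(i + 1, end - i - 1);   // e.g. "DateOfBirth:D"
                int sep = body.IndexOfAny(new[] { ':', ',' });
                string name = sep < 0 ? body : body.Substring(0, sep);
                string suffix = sep < 0 ? "" : body.Substring(sep);     // ":D" or ",10"
                int index = names.IndexOf(name);
                if (index < 0) { index = names.Count; names.Add(name); }
                sb.Append('{').Append(index).Append(suffix).Append('}');
                i = end + 1;
            }
            else
            {
                sb.Append(c);   // a lone '}' is passed through; String.Format will reject it
                i++;
            }
        }
        return sb.ToString();
    }

    public static string Format(string template, object values)
    {
        var names = new List<string>();
        string composite = ToCompositeFormat(template, names);
        var type = values.GetType();
        var args = new object[names.Count];
        for (int k = 0; k < names.Count; k++)
        {
            var prop = type.GetProperty(names[k])
                ?? throw new FormatException("No property named '" + names[k] + "'");
            args[k] = prop.GetValue(values);
        }
        return string.Format(composite, args);
    }
}
```

For example, "{Name} was born on {DateOfBirth:D}" becomes "{0} was born on {1:D}", with Name and DateOfBirth as the property names to extract.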

Of course, parsing the string and extracting the property values with reflection every time would be very inefficient, so there are some optimizations:

  • each distinct format string is only parsed once, and the result of the parsing is added to a cache, to be reused every time.
  • for each property used in a format string, a getter delegate is generated and cached, to avoid using reflection every time.

This means that the first time you use a given format string, there will be the overhead of parsing and generating the delegates, but subsequent usages of the same format string will be much faster.
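The getter-delegate cache can be sketched with an expression tree compiled once per (type, property name) pair and stored in a ConcurrentDictionary. This illustrates the technique with hypothetical names (GetterCache); it is not the library’s code. A parsed-format cache would work the same way, keyed on the format string:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;

public static class GetterCache
{
    static readonly ConcurrentDictionary<Tuple<Type, string>, Func<object, object>> Cache =
        new ConcurrentDictionary<Tuple<Type, string>, Func<object, object>>();

    // Number of delegates compiled so far (to show the cache at work).
    public static int CompiledCount;

    public static Func<object, object> Get(Type type, string propertyName)
    {
        return Cache.GetOrAdd(Tuple.Create(type, propertyName), key =>
        {
            System.Threading.Interlocked.Increment(ref CompiledCount);
            // Builds: obj => (object)((T)obj).Property
            var obj = Expression.Parameter(typeof(object), "obj");
            var body = Expression.Convert(
                Expression.Property(Expression.Convert(obj, key.Item1), key.Item2),
                typeof(object));
            return Expression.Lambda<Func<object, object>>(body, obj).Compile();
        });
    }
}
```

One caveat of ConcurrentDictionary.GetOrAdd is that, under contention, the factory can run more than once for the same key; only one result is stored, so the worst case is a little wasted compilation.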

The StringTemplate class is part of a library called NString, which also contains a few extension methods to make string manipulations easier. The library is a PCL that can be used with all .NET flavors except Silverlight 5, and it is available as a NuGet package.

Passing parameters by reference to an asynchronous method

Asynchrony in C# 5 is awesome, and I’ve been using it a lot since it was introduced. But there are a few annoying limitations; for instance, you cannot pass parameters by reference (ref or out) to an asynchronous method. There are good reasons for that; the most obvious is that a local variable passed by reference lives on the caller’s stack, but an async method only runs synchronously until its first await. At that point it returns to its caller, whose stack frame might no longer exist when the method resumes, so the referenced location could be gone.

However, it’s pretty easy to work around that limitation: you only need to create a Ref<T> class to hold the value, and pass an instance of this class by value to the async method:

async void btnFilesStats_Click(object sender, EventArgs e)
{
    var count = new Ref<int>();
    var size = new Ref<ulong>();
    await GetFileStats(tbPath.Text, count, size);
    txtFileStats.Text = string.Format("{0} files ({1} bytes)", count, size);
}

async Task GetFileStats(string path, Ref<int> totalCount, Ref<ulong> totalSize)
{
    var folder = await StorageFolder.GetFolderFromPathAsync(path);
    foreach (var f in await folder.GetFilesAsync())
    {
        totalCount.Value += 1;
        var props = await f.GetBasicPropertiesAsync();
        totalSize.Value += props.Size;
    }
    foreach (var f in await folder.GetFoldersAsync())
    {
        await GetFileStats(f.Path, totalCount, totalSize);
    }
}

The Ref<T> class looks like this:

public class Ref<T>
{
    public Ref() { }
    public Ref(T value) { Value = value; }
    public T Value { get; set; }
    public override string ToString()
    {
        T value = Value;
        return value == null ? "" : value.ToString();
    }
    public static implicit operator T(Ref<T> r) { return r.Value; }
    public static implicit operator Ref<T>(T value) { return new Ref<T>(value); }
}

As you can see, it’s pretty straightforward. This approach can also be used in iterator blocks (i.e. yield return), which don’t allow ref and out parameters either. It also has an advantage over standard ref and out parameters: you can make the parameter optional, for instance if you’re not interested in the result (obviously, the callee must handle that case appropriately).
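Here is a small sketch of both points, using a hypothetical EvenNumbers iterator. The Ref<T> class is reduced to its Value property so that the block compiles on its own:

```csharp
using System.Collections.Generic;

// Minimal copy of the Ref<T> class above, so this block compiles on its own.
public class Ref<T>
{
    public T Value { get; set; }
}

public static class RefIteratorDemo
{
    // Iterator blocks can't have ref/out parameters, but they can take a Ref<T>.
    // The parameter is optional: callers that don't need the count pass nothing.
    public static IEnumerable<int> EvenNumbers(IEnumerable<int> source, Ref<int> skipped = null)
    {
        foreach (var n in source)
        {
            if (n % 2 == 0)
                yield return n;
            else if (skipped != null)   // callee handles the "not interested" case
                skipped.Value++;
        }
    }
}
```

Because iterators are lazy, skipped.Value is only complete once the sequence has been fully enumerated; reading it earlier gives a partial count.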

Easy unit testing of null argument validation

When unit testing a method, one of the things to test is argument validation: for instance, ensuring that the method throws an ArgumentNullException when a null argument is passed for a parameter that isn’t allowed to be null. Writing this kind of test is very easy, but it’s also a tedious and repetitive task, especially if the method has many parameters… So I wrote a method that automates part of this task: it tries to pass null for each of the specified arguments, and asserts that the method throws an ArgumentNullException. Here’s an example that tests a FullOuterJoin extension method:

[Test]
public void FullOuterJoin_Throws_If_Argument_Null()
{
    var left = Enumerable.Empty<int>();
    var right = Enumerable.Empty<int>();
    TestHelper.AssertThrowsWhenArgumentNull(
        () => left.FullOuterJoin(right, x => x, y => y, (k, x, y) => 0, 0, 0, null),
        "left", "right", "leftKeySelector", "rightKeySelector", "resultSelector");
}

The first parameter is a lambda expression that represents how to call the method. In this lambda, you should only pass valid arguments. The following parameters are the names of the parameters that are not allowed to be null. For each of the specified names, AssertThrowsWhenArgumentNull will replace the corresponding argument with null in the provided lambda, compile and invoke the lambda, and assert that the method throws an ArgumentNullException whose ParamName matches that name.

Using this method, instead of writing a test for each of the arguments that are not allowed to be null, you only need one test.

Here’s the code for the TestHelper.AssertThrowsWhenArgumentNull method (you can also find it on Gist):

using System;
using System.Linq;
using System.Linq.Expressions;
using NUnit.Framework;

namespace MyLibrary.Tests
{
    static class TestHelper
    {
        public static void AssertThrowsWhenArgumentNull(Expression<TestDelegate> expr, params string[] paramNames)
        {
            var realCall = expr.Body as MethodCallExpression;
            if (realCall == null)
                throw new ArgumentException("Expression body is not a method call", "expr");

            var realArgs = realCall.Arguments;
            var parameters = realCall.Method.GetParameters();

            foreach (var paramName in paramNames)
            {
                var param = parameters.FirstOrDefault(p => p.Name == paramName);
                if (param == null)
                    throw new ArgumentException(string.Format("Method has no parameter named '{0}'", paramName), "paramNames");

                // Copy the original arguments and replace only the one under test with a typed null
                var args = realArgs.ToArray();
                args[param.Position] = Expression.Constant(null, param.ParameterType);

                // realCall.Object is null for static (and extension) methods, and the target instance otherwise
                var call = Expression.Call(realCall.Object, realCall.Method, args);
                var lambda = Expression.Lambda<TestDelegate>(call);
                var action = lambda.Compile();
                var ex = Assert.Throws<ArgumentNullException>(action, "Expected ArgumentNullException for parameter '{0}', but none was thrown.", paramName);
                Assert.AreEqual(paramName, ex.ParamName);
            }
        }
    }
}

Note that it is written for NUnit, but can easily be adapted to other unit test frameworks.
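For instance, a framework-agnostic variant could take an Expression<Action> and throw a plain exception on failure instead of calling NUnit’s Assert. This is a sketch with a hypothetical class name (NullArgumentChecker) that uses the same approach as the NUnit helper, passing realCall.Object so that instance methods work as well as static and extension methods:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class NullArgumentChecker
{
    public static void AssertThrowsWhenArgumentNull(Expression<Action> expr, params string[] paramNames)
    {
        var realCall = expr.Body as MethodCallExpression;
        if (realCall == null)
            throw new ArgumentException("Expression body is not a method call", nameof(expr));

        var parameters = realCall.Method.GetParameters();
        foreach (var paramName in paramNames)
        {
            var param = parameters.FirstOrDefault(p => p.Name == paramName);
            if (param == null)
                throw new ArgumentException("No parameter named '" + paramName + "'", nameof(paramNames));

            // Replace only the argument for this parameter with a typed null constant.
            var args = realCall.Arguments.ToArray();
            args[param.Position] = Expression.Constant(null, param.ParameterType);

            // realCall.Object is null for static (including extension) methods.
            var call = Expression.Call(realCall.Object, realCall.Method, args);
            var action = Expression.Lambda<Action>(call).Compile();

            ArgumentNullException caught = null;
            try { action(); }
            catch (ArgumentNullException ex) { caught = ex; }

            if (caught == null)
                throw new Exception("Expected ArgumentNullException for parameter '" + paramName + "', but none was thrown.");
            if (caught.ParamName != paramName)
                throw new Exception("Expected ParamName '" + paramName + "', got '" + caught.ParamName + "'.");
        }
    }
}
```

For example, calling it with () => "abc".Contains("b") and the parameter name "value" passes, because String.Contains throws an ArgumentNullException with ParamName "value" when given null; the instance call works because the target string is passed through realCall.Object.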

I used this method in my Linq.Extras library, which provides many additional extension methods for working with sequences and collections (including the FullOuterJoin method mentioned above).