Optimize ToArray and ToList by providing the number of elements


The ToArray and ToList extension methods are convenient ways to eagerly materialize an enumerable sequence (e.g. a Linq query) into an array or a list. However, there’s something that bothers me: both of these methods are very inefficient if they don’t know the number of elements in the sequence (which is almost always the case when you use them on a Linq query). Let’s focus on ToArray for now (ToList has a few differences, but the principle is mostly the same).

Basically, ToArray takes a sequence and returns an array that contains all the elements from the sequence. If the sequence implements ICollection<T>, it uses the Count property to allocate an array of the right size, and copies the elements into it; here's an example:

List<User> users = GetUsers();
User[] array = users.ToArray();

In this scenario, ToArray is fairly efficient. Now, let’s change that code to extract just the names from the users:

List<User> users = GetUsers();
string[] array = users.Select(u => u.Name).ToArray();

Now, the argument of ToArray is an IEnumerable<string> returned by Select. It doesn't implement ICollection<string>, so ToArray doesn't know the number of elements and cannot allocate an array of the appropriate size. Here's what it does instead:

  1. start by allocating a small array (4 elements in the current implementation)
  2. copy elements from the source into the array until the array is full
  3. if there are no more elements in the source, go to 7
  4. otherwise, allocate a new array, twice as large as the previous one
  5. copy the items from the old array to the new array
  6. repeat from step 2
  7. if the array is longer than the number of elements, trim it: allocate a new array with exactly the right size, and copy the elements from the previous array
  8. return the array

If there are few elements, this is quite painless; but for a very long sequence, it’s very inefficient, because of the many allocations and copies.
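To put numbers on that cost, here's a simplified model of steps 1 to 8 in C#. It's a sketch written for illustration, not the actual framework source. It ignores the ICollection<T> shortcut and any other special cases the real implementation may have. The allocation and copy counters exist only to make the cost visible.

```csharp
using System;
using System.Collections.Generic;

static class BufferSketch
{
    // Simplified model of the "grow by doubling, then trim" strategy described above.
    // Counts array allocations and element copies (not the writes of source items).
    public static T[] ToArrayByDoubling<T>(IEnumerable<T> source, out int allocations, out long copies)
    {
        if (source == null) throw new ArgumentNullException("source");
        allocations = 0;
        copies = 0;

        T[] items = new T[4];              // step 1: small initial array
        allocations++;
        int count = 0;
        foreach (T item in source)
        {
            if (count == items.Length)
            {
                // steps 4-5: allocate an array twice as large and copy the old items into it
                T[] bigger = new T[items.Length * 2];
                allocations++;
                Array.Copy(items, bigger, count);
                copies += count;
                items = bigger;
            }
            items[count++] = item;         // step 2: fill the current array
        }

        if (count == items.Length)
            return items;                  // exact fit: no trimming needed

        // step 7: trim to the exact number of elements
        T[] result = new T[count];
        allocations++;
        Array.Copy(items, result, count);
        copies += count;
        return result;
    }
}
```

In this model, 1,000,000 elements cost 20 allocations. Those are the initial 4-element array, 18 doublings up to 1,048,576 slots, and the final trim. The model also copies 2,048,572 elements: 1,048,572 while growing and 1,000,000 for the trim. All of this comes on top of reading the 1,000,000 elements from the source.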

What is annoying is that, in many cases, we know the number of elements in the source! In the example above, we only use Select, which doesn't change the number of elements, so we know that it's the same as in the original list; but ToArray doesn't know, because the information was lost along the way. If only we had a way to help it by providing this information ourselves…

Well, it’s actually very easy to do: all we have to do is create a new extension method that accepts the count as a parameter. Here’s what it might look like:

public static TSource[] ToArray<TSource>(this IEnumerable<TSource> source, int count)
{
    if (source == null) throw new ArgumentNullException("source");
    if (count < 0) throw new ArgumentOutOfRangeException("count");
    var array = new TSource[count];
    int i = 0;
    foreach (var item in source)
    {
        array[i++] = item;
    }
    return array;
}

Now we can optimize our previous example like this:

List<User> users = GetUsers();
string[] array = users.Select(u => u.Name).ToArray(users.Count);

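Note that if you specify a count that is less than the actual number of elements in the sequence, you will get an IndexOutOfRangeException. If you specify a count that is greater, no exception is thrown, and the extra elements at the end of the array keep their default value. Either way, it's your responsibility to provide the correct count to the method.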

So, what do we actually gain by doing that? From my benchmarks, this improved ToArray is about twice as fast as the built-in one, for a long sequence (tested with 1,000,000 elements). This is pretty good!
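With ToArray(count), a count that is too large goes unnoticed: the trailing slots just hold default values. If you'd rather have a wrong count reported in both directions, a stricter variant could check the final index. The following is only a sketch, and ToArrayExact is a hypothetical name chosen for it.

```csharp
using System;
using System.Collections.Generic;

static class StrictToArrayExtensions
{
    // Hypothetical stricter variant of ToArray(count): throws InvalidOperationException
    // if the sequence has more OR fewer elements than the specified count.
    public static TSource[] ToArrayExact<TSource>(this IEnumerable<TSource> source, int count)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (count < 0) throw new ArgumentOutOfRangeException("count");

        var array = new TSource[count];
        int i = 0;
        foreach (var item in source)
        {
            // Too many elements: fail before writing past the end of the array
            if (i == count)
                throw new InvalidOperationException("The sequence contains more than " + count + " elements.");
            array[i++] = item;
        }

        // Too few elements: the tail of the array would otherwise keep default values
        if (i != count)
            throw new InvalidOperationException("The sequence contains " + i + " elements, but " + count + " were expected.");
        return array;
    }
}
```

The extra check costs one comparison per element, plus one after the loop.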

Note that we can improve ToList in the same way, by using the List<T> constructor that lets us specify the initial capacity:

public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source, int count)
{
    if (source == null) throw new ArgumentNullException("source");
    if (count < 0) throw new ArgumentOutOfRangeException("count");
    var list = new List<TSource>(count);
    foreach (var item in source)
    {
        list.Add(item);
    }
    return list;
}

In this case, the performance gain is not as big as for ToArray (about 25% instead of 50%), probably because the list doesn't need to be trimmed, but it's not negligible.

Obviously, a similar optimization could be made to ToDictionary as well, since the Dictionary<TKey, TValue> class also has a constructor that lets us specify the initial capacity.
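Here's a sketch of what such a ToDictionary overload could look like, following the same pattern as ToArray and ToList above. The exact signature, including the optional comparer, is an assumption made for this example. Unlike the array version, a wrong count can't cause an error here: the capacity is only a hint, and the dictionary still grows if needed.

```csharp
using System;
using System.Collections.Generic;

static class ToDictionaryWithCountExtensions
{
    // Sketch: like Enumerable.ToDictionary, but the caller provides the expected number
    // of elements, which is used as the initial capacity of the dictionary.
    public static Dictionary<TKey, TElement> ToDictionary<TSource, TKey, TElement>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        Func<TSource, TElement> elementSelector,
        int count,
        IEqualityComparer<TKey> comparer = null)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (keySelector == null) throw new ArgumentNullException("keySelector");
        if (elementSelector == null) throw new ArgumentNullException("elementSelector");
        if (count < 0) throw new ArgumentOutOfRangeException("count");

        // A null comparer makes Dictionary use EqualityComparer<TKey>.Default
        var dictionary = new Dictionary<TKey, TElement>(count, comparer);
        foreach (var item in source)
        {
            // Add throws on duplicate keys, as the built-in ToDictionary does
            dictionary.Add(keySelector(item), elementSelector(item));
        }
        return dictionary;
    }
}
```

Usage follows the same idea as before, for instance users.ToDictionary(u => u.Id, u => u.Name, users.Count). Here, Id stands for a hypothetical key property.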

The improved ToArray and ToList methods are available in my Linq.Extras library, which also provides many useful extension methods for working on sequences and collections.

Easy unit testing of null argument validation


When unit testing a method, one of the things to test is argument validation: for instance, ensuring that the method throws an ArgumentNullException when null is passed for a parameter that isn't allowed to be null. Writing this kind of test is very easy, but it's also a tedious and repetitive task, especially if the method has many parameters. So I wrote a method that automates part of this task: it passes null for each of the specified arguments in turn, and asserts that the method throws an ArgumentNullException. Here's an example that tests a FullOuterJoin extension method:

[Test]
public void FullOuterJoin_Throws_If_Argument_Null()
{
    var left = Enumerable.Empty<int>();
    var right = Enumerable.Empty<int>();
    TestHelper.AssertThrowsWhenArgumentNull(
        () => left.FullOuterJoin(right, x => x, y => y, (k, x, y) => 0, 0, 0, null),
        "left", "right", "leftKeySelector", "rightKeySelector", "resultSelector");
}

The first parameter is a lambda expression that represents how to call the method. In this lambda, you should only pass valid arguments. The following parameters are the names of the parameters that are not allowed to be null. For each of the specified names, AssertThrowsWhenArgumentNull replaces the corresponding argument with null in the provided lambda, compiles and invokes the lambda, and asserts that the method throws an ArgumentNullException whose ParamName matches the parameter name.

Using this method, instead of writing a test for each of the arguments that are not allowed to be null, you only need one test.

Here’s the code for the TestHelper.AssertThrowsWhenArgumentNull method (you can also find it on Gist):

using System;
using System.Linq;
using System.Linq.Expressions;
using NUnit.Framework;

namespace MyLibrary.Tests
{
    static class TestHelper
    {
        public static void AssertThrowsWhenArgumentNull(Expression<TestDelegate> expr, params string[] paramNames)
        {
            // The lambda body must be a direct call to the method under test
            var realCall = expr.Body as MethodCallExpression;
            if (realCall == null)
                throw new ArgumentException("Expression body is not a method call", "expr");

            // Map each parameter name to its position and type
            var realArgs = realCall.Arguments;
            var parameters = realCall.Method.GetParameters();
            var paramIndexes = parameters
                .Select((p, i) => new { p, i })
                .ToDictionary(x => x.p.Name, x => x.i);
            var paramTypes = parameters
                .ToDictionary(p => p.Name, p => p.ParameterType);

            foreach (var paramName in paramNames)
            {
                // Copy the valid arguments and replace the one under test with a typed null
                var args = realArgs.ToArray();
                args[paramIndexes[paramName]] = Expression.Constant(null, paramTypes[paramName]);
                // realCall.Object is the target instance, or null for static (including extension) methods
                var call = Expression.Call(realCall.Object, realCall.Method, args);
                var lambda = Expression.Lambda<TestDelegate>(call);
                var action = lambda.Compile();
                // The call must throw ArgumentNullException, with ParamName set to the parameter name
                var ex = Assert.Throws<ArgumentNullException>(action, "Expected ArgumentNullException for parameter '{0}', but none was thrown.", paramName);
                Assert.AreEqual(paramName, ex.ParamName);
            }
        }
    }
}

Note that it is written for NUnit, but can easily be adapted to other unit test frameworks.

I used this method in my Linq.Extras library, which provides many additional extension methods for working with sequences and collections (including the FullOuterJoin method mentioned above).
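Since the helper above is tied to NUnit, here's a rough, framework-agnostic sketch of the same idea. It takes an Expression<Action> instead of NUnit's TestDelegate, and reports failures by throwing InvalidOperationException instead of calling Assert. NullArgumentChecker is a hypothetical name. In a real test project, you'd probably map the failures to your framework's own assertion API.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

static class NullArgumentChecker
{
    // Sketch: for each named parameter, re-invokes the call with null for that argument
    // and checks that an ArgumentNullException with the matching ParamName is thrown.
    public static void AssertThrowsWhenArgumentNull(Expression<Action> expr, params string[] paramNames)
    {
        var realCall = expr.Body as MethodCallExpression;
        if (realCall == null)
            throw new ArgumentException("Expression body is not a method call", "expr");

        var parameters = realCall.Method.GetParameters();
        foreach (var paramName in paramNames)
        {
            int index = Array.FindIndex(parameters, p => p.Name == paramName);
            if (index < 0)
                throw new ArgumentException("No parameter named '" + paramName + "'", "paramNames");

            // Copy the valid arguments, replacing the one under test with a typed null
            var args = realCall.Arguments.ToArray();
            args[index] = Expression.Constant(null, parameters[index].ParameterType);

            // Object is null for static (including extension) methods, the target instance otherwise
            var call = Expression.Call(realCall.Object, realCall.Method, args);
            var action = Expression.Lambda<Action>(call).Compile();

            ArgumentNullException caught = null;
            try { action(); }
            catch (ArgumentNullException ex) { caught = ex; }

            if (caught == null)
                throw new InvalidOperationException("Expected ArgumentNullException for parameter '" + paramName + "', but none was thrown.");
            if (caught.ParamName != paramName)
                throw new InvalidOperationException("Expected ParamName '" + paramName + "', but got '" + caught.ParamName + "'.");
        }
    }
}
```

Take NullArgumentChecker.AssertThrowsWhenArgumentNull(() => Path.Combine("a", "b"), "path1", "path2") as an example. This check should pass, because Path.Combine validates both of its arguments.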

A review of NDepend


I’ve been hearing quite a lot about NDepend over the last few years, but I had never tried it until recently, when its creator Patrick Smacchia was kind enough to offer me a license.

NDepend is a static analysis tool for .NET that checks your code base against a large set of rules that fall in various categories, such as code quality, object-oriented design, architecture, naming conventions, etc. All of these rules are completely customizable. It can be used as a standalone tool, or as a Visual Studio extension; there is also a command-line tool that can be integrated into the build process.

I should note that this is the first time I've written a software review, so this exercise is completely new to me. Although I was offered a free license, I'm not affiliated with NDepend in any way, and I'll do my best to be as fair and unbiased as possible.

Setup experience

NDepend doesn’t have an installer: it’s just a zip file that you extract into a folder. From there you can run the standalone tool (VisualNDepend.exe), and install the VS plugin (NDepend.Install.VisualStudioAddin.exe).

There is no UI to enter the license key either; you just drop the NDependProLicense.xml file into the NDepend folder.

Admittedly, this tool is intended for professional developers who shouldn’t have any problem with those steps, so it’s not that big a deal, but a more streamlined setup experience would have been nicer.

UI

Perhaps it’s just me, but I found the UI a little confusing; there are just too many windows and tooltips that pop open all the time (I used the tool mostly as a VS extension). NDepend needs a lot of screen space to work comfortably, and at home I only have one screen with a lower-than-average resolution, which made it a bit awkward to use for me.

To be fair, the Dashboard gives a pretty good overview of the project. In the VS extension, there is also an icon in the status bar that lets you see the code queries and rule violations at a glance.

[Screenshots: the NDepend Dashboard, and the status bar indicator in the VS extension]

You can also view a full report that is rendered as webpage and contains a lot of relevant information about your project.

[Screenshot: the NDepend HTML report]

This report can be customized to your specific needs in the NDepend project properties.

Code queries and rules

This is, in my opinion, the best thing about NDepend: the code inspection engine is extremely powerful and customizable. NDepend comes with a lot of default rules:

[Screenshot: the list of default rules]

(in this screenshot I have already fixed all warnings, so all rules show a count of 0)

These rules are defined using a domain-specific language called CQLinq, which allows you to write complex queries about your code using the familiar Linq syntax. For instance, here's a simple one that checks for namespaces with few types:

[Screenshot: a CQLinq query that checks for namespaces with few types]

The default rules often come with comments that give more information about the rule and explain why it's relevant. As you can see, the code is mostly standard Linq, and the editor has syntax highlighting and IntelliSense. NDepend's code model includes just about everything you could expect (classes, methods, etc.), but also a lot of extra information like cyclomatic complexity, number of IL instructions, dependencies between classes or namespaces, etc. The result presentation is quite smart: depending on the output of the query, it shows namespaces, types or members organized by assembly. Result columns that contain lists can be clicked to view the elements of the list, and clicking a code item jumps to its location in the code.

Each rule can be enabled or disabled, or set as critical or not. You can modify the default rules, or create your own. Note that rules don’t have to be warnings: you can create a code query that just reports information about your code:

[Screenshot: a CQLinq code query that reports information about the code]

So as you can see, CQLinq is a powerful way to check just about any design rule you care to enforce about your code.

Of course, the feature is not perfect… Here are a few downsides:

  • There are a lot of default rules. Arguably, that could also be counted as a quality, but the first time you run NDepend on your project, the sheer number of reported rule violations is quite overwhelming, and usually you don't really care about most of them. So you have to spend quite a long time reviewing the results to decide which rules you really care about, which ones need to be adjusted to your needs, etc. When I did it on a rather small project, it took me 2 hours to fix all warnings; not because I had a lot of things to fix in my code, but because I had a lot of things to fix in the rules! I'm not saying my code was perfect, obviously, and NDepend did help me find and fix a few issues, but many of the rules weren't really relevant to my specific project. So, if you use NDepend, expect to spend a lot of time adjusting the rules to your needs; once you do that, the tool will really shine, and the analysis results will be a lot more useful to you.
  • There is no easy way to “suppress” a specific occurrence of a rule violation. For instance, in ReSharper you can suppress a warning with a special comment (and the quick fix menu lets you add that comment automatically); in FxCop, you can apply the [SuppressMessage] attribute to a type or member. There is nothing like that in NDepend; if you want to exclude a code item from a rule, you have to modify the code of the CQLinq query itself. Given the flexibility of the query language, it’s understandable that there is no generic way to suppress warnings, but still, it’s annoying; it also means that you can’t just reuse the exact same queries in other projects. There is however a nice feature that partly counterbalances the lack of a generic suppression mechanism: the JustMyCode context. It defines a “view” of the code that only includes your own code, not the code generated by designers or by the compiler. So you can query against the JustMyCode context to ignore rule violations in code that you didn’t write, and you can customize what is considered “not your code” using the same CQLinq syntax.
  • Queries that take IL statistics (number of IL instructions, IL cyclomatic complexity, etc.) into account are often biased by complex code constructs such as iterator blocks, anonymous methods or async methods, which results in false positives: some methods are complex at the IL level, and reported as such, even though the original C# code is rather straightforward.

Dependency management

I guess that’s the feature that gave the tool its name, even though now it does much more than that… NDepend can give you very detailed information about dependencies between assemblies and namespaces (your own, as well as framework or third party assemblies). The dependencies can be viewed as a directed graph:

[Screenshot: dependency graph]

Or as a matrix:

[Screenshot: dependency matrix]

Both views are interactive; the matrix view can even be “drilled down” to view dependencies at a lower level.

I didn’t really take advantage of the dependency-related features, because I only tested NDepend on simple projects, but they can certainly be very useful in large solutions to eliminate unwanted coupling between different parts of the code.

Code evolution analysis

NDepend also lets you compare analysis results between builds. Basically, you set a baseline for the comparison, and it gives you trends to measure the progress of various code metrics over time. I didn't use this feature myself, so I can't really talk about it in detail, but its usefulness is quite obvious for large projects, as it lets you see which aspects are improving or worsening, allowing you to refocus the team's efforts as necessary.

Conclusion

I have to say that I’m very impressed by NDepend’s analysis engine; it’s incredibly powerful, and the fact that the rules are completely customizable opens a world of possibilities. I love the fact that I can just write a simple Linq query to find all classes or methods that match certain criteria. Regarding the other features, like dependency management, I’m sure they can be very useful, but most of the projects I work on are rather small, so dependencies are usually not a major issue for me.

The way I see it, NDepend is a great tool to keep close tabs on the architecture of large projects, but is probably overkill for small projects. It’s also very useful if you need to enforce strict design guidelines across a large code base; obviously, it won’t completely replace code review, but it can certainly be a big help in the review process.

In any case, NDepend has a lot of obvious qualities, but it’s probably not the right tool for everyone. The only way to decide if you need it or not is to try it for yourself, and see how it works out for you!

Uploading data with HttpClient using a "push" model


If you have used the HttpWebRequest class to upload data, you know that it uses a “push” model. What I mean is that you call the GetRequestStream method, which opens the connection if necessary, sends the headers, and returns a stream on which you can write directly.

.NET 4.5 introduced the HttpClient class as a new way to communicate over HTTP. It actually relies on HttpWebRequest under the hood, but offers a more convenient and fully asynchronous API. HttpClient uses a different approach when it comes to uploading data: instead of writing manually to the request stream, you set the Content property of the HttpRequestMessage to an instance of a class derived from HttpContent. You can also pass the content directly to the PostAsync or PutAsync methods.

The .NET Framework provides a few built-in implementations of HttpContent; here are some of the most commonly used:

  • ByteArrayContent: represents in-memory raw binary content
  • StringContent: represents text in a specific encoding (this is a specialization of ByteArrayContent)
  • StreamContent: represents raw binary content in the form of a Stream

For instance, here’s how you would upload the content of a file:

async Task UploadFileAsync(Uri uri, string filename)
{
    using (var stream = File.OpenRead(filename))
    {
        var client = new HttpClient();
        var response = await client.PostAsync(uri, new StreamContent(stream));
        response.EnsureSuccessStatusCode();
    }
}

As you may have noticed, nowhere in this code do we write to the request stream explicitly: the content is pulled from the source stream.

This “pull” model is fine most of the time, but it has a drawback: it requires that the data to upload already exists in a form that can be sent directly to the server. This is not always practical, because sometimes you want to generate the request content “on the fly”. For instance, if you want to send an object serialized as JSON, with the “pull” approach you first need to serialize it in memory as a string or MemoryStream, then assign that to the request’s content:

async Task UploadJsonObjectAsync<T>(Uri uri, T data)
{
    var client = new HttpClient();
    string json = JsonConvert.SerializeObject(data);
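    // Without an explicit media type, StringContent is sent as text/plain
    var response = await client.PostAsync(uri, new StringContent(json, Encoding.UTF8, "application/json"));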
    response.EnsureSuccessStatusCode();
}

This is fine for small objects, but obviously not optimal for large object graphs…

So, how could we reverse this pull model to a push model? Well, it's actually pretty simple: all you have to do is create a class that inherits from HttpContent, and override the SerializeToStreamAsync method to write to the request stream directly. Actually, I intended to blog about my own implementation, but then I did some research, and it turns out that Microsoft has already done the work: the Web API 2 Client library provides a PushStreamContent class that does exactly that. Basically, you just pass a delegate that defines what to do with the request stream. Here's how it works:

async Task UploadJsonObjectAsync<T>(Uri uri, T data)
{
    var client = new HttpClient();
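    var content = new PushStreamContent((stream, httpContent, transportContext) =>
    {
        var serializer = new JsonSerializer();
        // Disposing the writer closes the stream, which tells PushStreamContent the content is complete
        using (var writer = new StreamWriter(stream))
        {
            serializer.Serialize(writer, data);
        }
    });
    // Declare the payload as JSON, since PushStreamContent can't infer it from the delegate
    content.Headers.ContentType = new MediaTypeHeaderValue("application/json");
    var response = await client.PostAsync(uri, content);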
    response.EnsureSuccessStatusCode();
}

Note that the PushStreamContent class also provides a constructor overload that accepts an asynchronous delegate, if you want to write to the stream asynchronously.
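For reference, the subclassing approach mentioned earlier could look roughly like the following sketch. DelegatingPushContent is a hypothetical class. It overrides SerializeToStreamAsync to hand the request stream to a user-supplied asynchronous delegate. This is a simplified illustration, not the actual PushStreamContent implementation.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical minimal "push" content: instead of exposing data to be pulled,
// it lets a delegate write directly to the stream the content is serialized to.
class DelegatingPushContent : HttpContent
{
    private readonly Func<Stream, Task> _writeAsync;

    public DelegatingPushContent(Func<Stream, Task> writeAsync)
    {
        if (writeAsync == null) throw new ArgumentNullException("writeAsync");
        _writeAsync = writeAsync;
    }

    // Called when the content is sent (or buffered): write straight to the target stream.
    // The stream belongs to the caller, so the delegate shouldn't dispose it.
    protected override Task SerializeToStreamAsync(Stream stream, TransportContext context)
    {
        return _writeAsync(stream);
    }

    // The length isn't known in advance. Depending on the handler, the request may then be
    // sent with chunked transfer encoding, or buffered to compute the length; on .NET Framework,
    // setting request.Headers.TransferEncodingChunked = true may be needed to avoid buffering.
    protected override bool TryComputeLength(out long length)
    {
        length = 0;
        return false;
    }
}
```

If the delegate uses a StreamWriter, it's probably safer to create the writer with leaveOpen: true and flush it before returning, rather than disposing it.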

Actually, for this specific use case, the Web API 2 Client library provides a less convoluted approach: the ObjectContent class. You just pass it the object to send and a MediaTypeFormatter, and it takes care of serializing the object to the request stream:

async Task UploadJsonObjectAsync<T>(Uri uri, T data)
{
    var client = new HttpClient();
    var content = new ObjectContent<T>(data, new JsonMediaTypeFormatter());
    var response = await client.PostAsync(uri, content);
    response.EnsureSuccessStatusCode();
}

By default, the JsonMediaTypeFormatter class uses Json.NET as its JSON serializer, but there is an option to use DataContractJsonSerializer instead.

Note that if you need to read an object from the response content, this is even easier: just use the ReadAsAsync<T> extension method (also in the Web API 2 Client library). So as you can see, HttpClient makes it very easy to consume REST APIs.

Little known new features in Visual Studio 2012


Visual Studio 2012 RC has been out since last week, and even though I haven't had much time to play with it yet, I think I like it so far. Lots of things have already been said about the design and the most important new features, but there are also many smaller, less remarkable improvements that make life easier for us. Since I have seen little or nothing written about those, I thought I would make a list of what I've noticed so far.

Better Edit and Continue: improved debug experience with anonymous methods

Edit and Continue is a very useful feature that has been present in Visual Studio for a long time. It's great when you need to fix some code in the middle of a step-by-step debugging session without restarting the application completely. This feature has always had a limitation: you couldn't use it in a method that contained an anonymous method:

Modifying a ’method’ which contains a lambda expression will prevent the debug session from continuing while Edit and Continue is enabled.

Before .NET 3.5, it wasn’t really a problem since anonymous methods were not very common, but since Linq was introduced and lambda expressions became more widely used, this limitation began to grow more and more annoying.

Well, good news: Visual Studio 2012 fixed that! You can now use Edit and Continue in a method that contains lambda expressions or Linq queries. Note that the limitation hasn’t completely disappeared: you still can’t modify a statement that contains an anonymous method. Note the slightly different error message:

Modifying a statement which contains a lambda expression will prevent the debug session from continuing while Edit and Continue is enabled.

But you can modify everything else in the method body, which is a great improvement. Note that editing a method that contains anonymous types is also supported, as long as you don’t modify the anonymous type itself.

Optimized event handler autocompletion

Visual Studio has a nice autocompletion feature that allows you to generate an event handler automatically when you type += and Tab after an event name:

[Screenshot: the handler generated with new EventHandler(…)]

But the new EventHandler(…) part is redundant, because since C# 2, method groups are implicitly convertible to compatible delegate types. Visual Studio 2012 fixes this and generates the shorter form:

[Screenshot: the handler generated with the shorter method group form]

OK, this is a very small change, but the previous behavior was really annoying me… I had actually suggested this improvement on Connect, so I’m glad to see that it was implemented.
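To make the difference concrete, here are the two forms side by side. DemoButton and button1_Click are hypothetical names standing in for a typical control and its generated handler. Both lines create equivalent EventHandler delegates.

```csharp
using System;

// Hypothetical stand-in for a control that exposes a Click event
class DemoButton
{
    public event EventHandler Click;

    public void PerformClick()
    {
        EventHandler handler = Click;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

static class EventHandlerForms
{
    public static int CallCount;

    // Hypothetical handler, named the way the designer usually names them
    static void button1_Click(object sender, EventArgs e)
    {
        CallCount++;
    }

    public static void Subscribe(DemoButton button1)
    {
        // Form generated by earlier versions: explicit delegate creation
        button1.Click += new EventHandler(button1_Click);

        // Form generated by Visual Studio 2012: method group conversion (C# 2 and later)
        button1.Click += button1_Click;
    }
}
```

Since the resulting delegates are equal, button1.Click -= button1_Click removes a handler regardless of which form was used to add it.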

Improved Find/Replace

The Find dialog in Visual Studio had not seen any improvement for as long as I can remember (probably since VS2003 or VS2005), so it was time for a change… In VS2012, it has been replaced with the Quick Find feature from the Productivity Power Tools extension. It appears as a small panel in the top right corner of the text editor, much less obtrusive than the old dialog:

[Screenshot: the Quick Find panel]

It provides the following features:

  • incremental search (matches are highlighted as you type in the search box)
  • quick access to options such as match case, match whole word, or regex through the search field dropdown
  • support for .NET regular expressions (the old Find dialog was using a different kind of regex, not fully compatible with the .NET regex syntax)

If for some reason you need the full Find dialog, it’s still there: there’s a menu item to access it in the search field dropdown.
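Because the search field now uses standard .NET regular expressions, a pattern can be tried out in code with Regex before you use it in the editor. The sketch below uses a hypothetical pattern that swaps the two sides of a simple assignment. It assumes that the Replace field also follows .NET substitution syntax, so captured groups would be referenced as $1 and $2 there too.

```csharp
using System.Text.RegularExpressions;

static class FindReplaceDemo
{
    // Hypothetical example: swap both sides of a simple assignment such as "a = b;".
    // Each (\w+) captures an identifier; $1 and $2 refer to those groups in the replacement.
    public static string SwapAssignment(string line)
    {
        return Regex.Replace(line, @"(\w+) = (\w+);", "$2 = $1;");
    }
}
```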

Quick launch

Where’s that command again? In the Debug menu or the Project menu? I can’t remember, and I don’t know the keyboard shortcut…

Sounds familiar? For me it does… Visual Studio has so many features that sometimes it can be hard to find what you're looking for. That's why the new Quick Launch feature is a very welcome addition; it appears as a search box in the top right corner of the IDE, and lets you type what you want to do:

[Screenshot: the search box in the top right corner of the IDE]

Again, this feature comes from the Productivity Power Tools extension, and has been included in Visual Studio itself. It has also been given a more prominent place in the IDE (in VS2010 it was only accessible through the Ctrl+Q shortcut).

Smarter Solution Explorer

The Solution Explorer used to show only the files in your project, but nothing about what was in the files; if you wanted to see the classes and members, you had to switch to the Class View. Visual Studio 2012 changes that: now you can see what is declared in a file just by expanding its node in the Solution Explorer:

[Screenshot: the members of a file, expanded in the Solution Explorer]

This feature had been in Eclipse for a long time, so it was time for Visual Studio to catch up ;)

A few other interesting features of the new Solution Explorer:

  • Scope to This: lets you re-root the tree to a specific node (e.g. if you want to see only the content of a folder). The re-rooted tree can be viewed in a separate view.
  • Pending Changes Filter: show only pending changes (if source control is active for the current project)
  • Open Files Filter: show only open files
  • Sync With Active Document: quickly locate the current document in the Solution Explorer (very useful for large solutions)

And yes, that’s yet another feature that comes from Productivity Power Tools (it was known as the Solution Navigator), with a few improvements.

ALL CAPS MENUS

Menus in VS2012 are in ALL CAPS, and it's not exactly the most popular change… there has been tons of negative feedback about this! Frankly, I'm a bit surprised that people attach so much importance to such an insignificant detail, when there are so many more important things to discuss. Personally, I found it a bit unsettling when I saw the first screenshots, but when I got my hands on the RC I found that it didn't bother me at all. I guess most people will just get used to it in the end…

Anyway, the reason I mention this is to say that if you really don’t like the ALL CAPS menus, you can get the normal casing back with a simple registry tweak. Just open regedit, go to the HKEY_CURRENT_USER\Software\Microsoft\VisualStudio\11.0\General key, create a DWORD value named SuppressUppercaseConversion and set it to 1.
