Sorting DataTable on multiple columns with Linq

  • Thread starter Thread starter Gabriel.B
  • Start date Start date
G

Gabriel.B

Guest
I know there are a lot of threads related to sorting a DataTable, but I was not able to find anything that gives a clear solution to my problem. I currently have something that works, but I am curious if there is a better way to do the sorting.

The problem seems simple enough: I have a DataTable that I want to sort on multiple columns that can change (the user wants to sort on selected columns) and the sorting needs to be custom because of certain business rules. So the DataView Sort cannot be used (unless I would create some types that have a strange implementations of ToString). The next best thing is Linq with OrderBy and ThenBy. In the end, the solution I currently have is based on this, although the code is somewhat fragile.

I will simplify the problem and eliminate the custom sorting in order to illustrate the issue with the Linq code. A table with people that have first, last and middle names is sorted by First, Last, then Middle.

static void TestSortTable()
{
DataTable people = new DataTable();
people.Columns.Add("First", typeof(string));
people.Columns.Add("Last", typeof(string));
people.Columns.Add("Middle", typeof(string));

people.Rows.Add("A", "M", "D");
people.Rows.Add("A", "B", "E");
people.Rows.Add("B", "B", "E");
people.Rows.Add("B", "B", "A");
people.Rows.Add("B", "B", "D");
people.Rows.Add("C", "M", "E");
people.Rows.Add("C", "D", "A");
PrintPeople(people, "All people:");

string[] sortedColumns = { "First", "Last", "Middle" };
DataTable sortedPeople;

sortedPeople = SortWithLinqGood(people, sortedColumns);
PrintPeople(sortedPeople, "LINQ good:");

sortedPeople = SortWithLinqBad(people, sortedColumns);
PrintPeople(sortedPeople, "LINQ bad:");

Console.ReadLine();
}

static DataTable SortWithLinqGood(DataTable data, string[] sortedColumns)
{
string sortedColumn = sortedColumns[0];
OrderedEnumerableRowCollection<DataRow> sorted = data.AsEnumerable().OrderBy(x => x.Field<string>(sortedColumn));
for(int i = 1; i < sortedColumns.Count(); i++)
{
string additionalColumn = sortedColumns;
sorted = sorted.ThenBy(x => x.Field<string>(additionalColumn));
}

return sorted.CopyToDataTable();
}

static DataTable SortWithLinqBad(DataTable data, string[] sortedColumns)
{
string sortedColumn = sortedColumns[0];
OrderedEnumerableRowCollection<DataRow> sorted = data.AsEnumerable().OrderBy(x => x.Field<string>(sortedColumn));
for(int i = 1; i < sortedColumns.Count(); i++)
{
sortedColumn = sortedColumns;
sorted = sorted.ThenBy(x => x.Field<string>(sortedColumn));
}

return sorted.CopyToDataTable();
}



The output from executing TestSortTable() is:

All people:
First, Last, Middle
A M D
A B E
B B E
B B A
B B D
C M E
C D A

LINQ good:
First, Last, Middle
A B E
A M D
B B A
B B D
B B E
C D A
C M E

LINQ bad:
First, Last, Middle
B B A
C D A
A M D
B B D
A B E
B B E
C M E

The first Linq function where a new variable additionalColumn is declared in the loop is good. Trying to reuse the sortedColumn variable results in a bad sort.

Trying to change the loop and not use any variable, but sortedColumns will result in an exception during the execution with index out of range. This is actually the clue to what happens: the expression is evaluated only when the result is used, so at that point i is out of range.

The documentation for OrderBy hints at that:

This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.

This does not make it clear that there is an issue with the code.

The method that creates a new variable inside the loop works, but somebody looking at the code could think that it can be "optimized" and make a change that might seem OK although the result will be affected.

Does anyone have a better idea for sorting a DataTable on multiple columns using a method that also allows for custom sorting?

Thanks!

Continue reading...
 
Back
Top