Process Csv Data with LINQ(C#)
1. Introduction
In C# there is a technology called LINQ(Language-Integrated Query). It is a set of language level keywords and functions that create query for data manipulation. It is designed highly like SQL syntax but still there are difference.
With the help of these functionalities, we can use LINQ as a uniform platform to query different datasets like SQL database, XML document, or almost any kind of in memory data collections.
2. Install
Install C# from Microsoft offical page. LINQ is already shipped with C# installation so we don't need to do anything special.
What we need to do is to write few lines at the beginning of C# code file (*.cs) to include the functionalities of LINQ.
Firstly, we create a console app with "dotnet new console".
$ mkdir processCsvWithLinq $ cd processCsvWithLinq $ dotnet new console
Secondly, we can use our editor to open Program.cs file and add few lines at the beginning. After that we are ready to use LINQ.
(By the way, many articles on Microsoft official documents introducing LINQ have not mention you should add these following few lines so you may meet some trobule at that place.)
using System; using System.Collections.Generic; using System.Linq; using System.IO;
3. Read in data
In this article we will use classic iris.csv dataset as example. We assume that it is in the same folder as this project.
(If you want to read in a Csv file with Chinese Character like encoding in GBK, read the last section.)
C# as an high level object oriented language, we can easily rely on it's built-in read function.
using System; using System.Collections.Generic; using System.Linq; using System.IO; namespace processCsvWithLinq { class Program { static void Main(string[] args) { string[] lines = File.ReadAllLines("iris.csv"); foreach(string line in lines) { Console.WriteLine(line); } } } }
Save your code and use "dotnet run" at console to see the program running well and print out data as we expect.

Now we have csv dataset in computer memory. We are ready to use LINQ to write some query.
4. Do some data transforming
Now that we already have variable "lines" and it is an collection of strings. Like SQL but not all like SQL, we can write LINQ query as below.
Firstly we use "from" to select data source. LINQ always starts with a from sentence.
Secondly, we use a few lines of "let" sentences to create intermediate variables. It is not always necessary, but I use them here and it will make our process very clear.
Finally, I finish my query with a "select" sentence. It can be any variables inside the query or collection of them.
The query will not be run right away(lazy evaluate), but until it meets a function. Normally, a "foreach" loop.
using System; using System.Collections.Generic; using System.Linq; using System.IO; namespace processCsvWithLinq { class Program { static void Main(string[] args) { string[] lines = File.ReadAllLines("iris.csv"); var query = from line in lines.Skip(1) // escape first row of column names let elements = line.Split(',') let species = elements[4] let symbol = species == "virginica"? "Target": "Untarget" select line + "," + symbol; foreach(var result in query) { Console.WriteLine(result); } } } }
Use "dotnet run" at console to see our program's result.

We can also save the adjusted data into disk from our query if we want.
using (StreamWriter outputFile = new StreamWriter("iris_adjusted.csv")) { foreach (string result in query) outputFile.WriteLine(result); } Console.WriteLine("Data Saved.");
5. Data summarise
We can calculate each column summarise value after we convert it into numeric data.
For example,
using System; using System.Collections.Generic; using System.Linq; using System.IO; namespace processCsvWithLinq { class Program { static void Main(string[] args) { string[] lines = File.ReadAllLines("iris.csv"); var query = from line in lines.Skip(1) // escape first row of column names let elements = line.Split(',') let petalLength = elements[2] select Convert.ToDouble(petalLength); var results = query.ToList(); double petalLengthAvg = results.Average(); Console.WriteLine(petalLengthAvg); } } } // output // 3.758000...
If we want to do some group summarise, it will be little more complicated than above.
We will use group syntax like below:
using System; using System.Collections.Generic; using System.Linq; using System.IO; namespace processCsvWithLinq { class Program { static void Main(string[] args) { string[] lines = File.ReadAllLines("iris.csv"); var query = from line in lines.Skip(1) // escape first row of column names let elements = line.Split(',') let species = elements[4] let petalLength = Convert.ToDouble(elements[2]) group petalLength by species; foreach(var speciesGroup in query) { Console.WriteLine(speciesGroup.Key); var result = speciesGroup.ToList(); Console.WriteLine(result.Average()); } Console.WriteLine("End."); } } } // output: // setosa // 1.462000... // versicolor // 4.26 // virginica // 5.552
6. About Csv with Chinese Character
As we read in data above, we still can use File.ReadAllLine() function.
This function has a second argument called encoding.
It is a special class of variable defined in C# System.Text. We can do things like below:
using System.Text Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); Encoding myGbk = Encoding.GetRncoding("GB2312"); string[] lines = File.ReadAllLines(path, myGbk);
浙公网安备 33010602011771号