Parse CSV Data with LINQ

Tuesday, July 10, 2018

By: Chris Dunn

In the world of big data we deal with more than just JSON, XML and SQL. A lot of data arrives as raw Comma Separated Values (CSV). In the past I've often imported CSV data into SQL Server and queried it there. With LINQ, we get much of that same concise query power directly in our code. That opens up more options, especially if we don't want the weight of a database (or server) in a client application.

Also, if you're getting involved in machine learning with ML.NET or a similar framework, reading CSV files will become the norm.

With that in mind, I thought I would share some simple, small code that imports CSV file data into a POCO in only a few lines. From there you can query away or do as you like.

The first method reads all lines from a text file and converts them into a dictionary (key-value pairs), so there is no need for pre-defined classes. I skip the first line of the file, which contains the column names.

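        // Requires: using System.Collections.Generic; using System.Linq;
        static Dictionary<string, string> CSVToDictionary(string path)
        {
            // Read every line of the file into memory.
            var data = System.IO.File.ReadAllLines(path);
            // Skip the header row, split each line on commas, and map abbreviation (column 0) to name (column 1).
            return data.Skip(1).Select(m => m.Split(',')).ToDictionary(m => m[0], m => m[1]);
        }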
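
The payoff of the dictionary form is fast lookup by key. Here is a minimal sketch of that, assuming the abbreviation is in the first column and the name in the second. The class and method names (DictionaryLookupSketch, Parse, NameOrDefault) are my own placeholders, and the lines are passed in directly rather than read from a file so the block stands alone. The blank-line filter, the trimming and the case-insensitive comparer are optional extras, not part of the method above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DictionaryLookupSketch
{
    // Same pipeline as CSVToDictionary, but fed from lines that are already in memory.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        return lines
            .Skip(1)                                           // skip the header row
            .Where(line => !string.IsNullOrWhiteSpace(line))   // ignore blank lines
            .Select(line => line.Split(','))
            .ToDictionary(
                cols => cols[0].Trim(),                        // key: abbreviation
                cols => cols[1].Trim(),                        // value: name
                StringComparer.OrdinalIgnoreCase);             // "tx" and "TX" find the same entry
    }

    // Looks up a name by abbreviation without throwing when the key is missing.
    public static string NameOrDefault(Dictionary<string, string> states, string abbrev)
    {
        return states.TryGetValue(abbrev, out var name) ? name : "(unknown)";
    }
}
```

Calling DictionaryLookupSketch.Parse on the lines of a states file and then NameOrDefault(states, "tx") would return the matching name, or "(unknown)" for an abbreviation that isn't in the file. Note that ToDictionary throws if the same key appears twice, so this only suits files with unique keys.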

The second option is to load the data into a list of a given type. My example uses a states text file containing each state's abbreviation and name. It works the same way as the first method, but returns a List<State> instead of a dictionary.

        static List<State> CSVToList(string path)
        {
            var data = System.IO.File.ReadAllLines(path);
            // Skip the header row, split each line on commas, and build one State per row.
            return data.Skip(1).Select(m => m.Split(',')).Select(m => new State() { Abbrev = m[0], Name = m[1] }).ToList();
        }

    // Simple POCO holding one row of the states file.
    class State
    {
        public string Abbrev { get; set; }
        public string Name { get; set; }
    }
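
Once the rows are in a typed list, the "query away" part is ordinary LINQ to Objects. The sketch below is one possible query, not the only way to do it. StateQueries and NamesStartingWith are hypothetical names of mine, and the State class is repeated only so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Repeated from the article so this block is self-contained.
class State
{
    public string Abbrev { get; set; }
    public string Name { get; set; }
}

static class StateQueries
{
    // Returns the names of all states whose name starts with the given prefix,
    // sorted alphabetically.
    public static List<string> NamesStartingWith(IEnumerable<State> states, string prefix)
    {
        return states
            .Where(s => s.Name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            .OrderBy(s => s.Name)
            .Select(s => s.Name)
            .ToList();
    }
}
```

With the list returned by CSVToList, a call such as StateQueries.NamesStartingWith(statesList, "New") filters and sorts in a single expression, much like a WHERE and ORDER BY would in SQL.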

Once that's set up, running and reusing the functionality is a breeze. The two examples below call the dictionary method and the typed-list method. It's another helpful method to add to your data toolbox.

        // Requires: using System; using System.IO;
        static void Main(string[] args)
        {
            // Both calls read states.csv from the application's base directory.
            var statesDictionary = CSVToDictionary(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "states.csv"));

            foreach (var state in statesDictionary)
                Console.WriteLine("Abbrev={0} Name={1}", state.Key, state.Value);

            var statesList = CSVToList(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "states.csv"));

            foreach (var state in statesList)
                Console.WriteLine("Abbrev={0} Name={1}", state.Abbrev, state.Name);

            // Keep the console window open until a key is pressed.
            Console.ReadKey();
        }
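
Both methods above load the whole file with File.ReadAllLines, which is fine for a small states file. For larger files, a possible variation is File.ReadLines, which hands lines to the LINQ pipeline as they are enumerated. The sketch below assumes the same two-column layout. StreamingCsvSketch and ReadPairs are placeholder names of mine, and the blank-line filter is an optional extra.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class StreamingCsvSketch
{
    // Yields one (abbreviation, name) pair per data row as the caller enumerates the result.
    public static IEnumerable<KeyValuePair<string, string>> ReadPairs(string path)
    {
        return File.ReadLines(path)                            // reads lines on demand
            .Skip(1)                                           // skip the header row
            .Where(line => !string.IsNullOrWhiteSpace(line))   // ignore blank lines
            .Select(line => line.Split(','))
            .Select(cols => new KeyValuePair<string, string>(cols[0], cols[1]));
    }
}
```

Because the result is lazy, operators such as Take or First only read as much of the file as they need. Like the methods above, this plain Split on commas does not handle quoted fields that contain commas.
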
Tags: c# linq csv data

Copyright 2019 Cidean, LLC. All rights reserved.
