Reading CSV Files

The Python standard library csv module relies on a C extension (_csv), which means that it isn't available to IronPython.

Reading CSV (Comma Separated Values) files is one of those tasks that always sounds easy, but as soon as you have to handle different delimiters and all the quoting rules it becomes a headache to roll your own reader.
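To make the quoting problem concrete, here is a small sketch of what goes wrong with the obvious approach of splitting on commas. The sample line is made up for illustration; it is plain Python and does not need any .NET assemblies.

```python
# A made-up record whose second field contains a comma inside quotes.
line = 'widget,"Smith, John",42'

# Naive approach: split on every comma, paying no attention to quotes.
naive_fields = line.split(',')

# The quoted field is cut in two, so we get four pieces instead of three.
print(naive_fields)
```

A proper reader has to track whether it is inside a quoted field, which is exactly the bookkeeping a library handles for you.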

This entry briefly shows how to use CsvReader by Sebastien Lorion, which is available under a generous MIT Open Source License.

First download the project, which comes with a prebuilt dll, LumenWorks.Framework.IO.dll, so there is no need to build it yourself.

To use it with IronPython you first need to add a reference to the dll:

    import clr
    from System.Reflection import Assembly

    assemblyPath = "PATH_TO_DLL\\LumenWorks.Framework.IO.dll"
    assembly = Assembly.LoadFile(assemblyPath)
    clr.AddReference(assembly)

Having added a reference to the dll, you are then able to import CsvReader from the LumenWorks.Framework.IO.Csv namespace it contains.

Assuming you already have the CSV as a string, the basic usage is as follows:

    from System.IO import StringReader
    from LumenWorks.Framework.IO.Csv import CsvReader

    text = '1, 2, 3, 4\n5, 6, 7,8\n'

    # The second argument is hasHeaders: this data has no header row.
    reader = CsvReader(StringReader(text), False)
    reader.SkipEmptyLines = False

    rows = []
    # ReadNextRecord() advances to the next record and returns False at the end.
    while reader.ReadNextRecord():
        row = []
        colCount = reader.FieldCount
        for i in range(colCount):
            row.append(reader[i])
        rows.append(row)

    print rows

Which prints:

    [['1', '2', '3', '4'], ['5', '6', '7', '8']]

This example uses a StringReader for cases where you already have the CSV as text. If this isn't the case, you can use any of the myriad ways of creating a TextReader to read from a file.
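As one possible way of reading from a file, the sketch below wraps the same loop around a System.IO.StreamReader. The helper name read_csv_file and its has_headers parameter are made up for this example, and it assumes you are running under IronPython with the reference to LumenWorks.Framework.IO.dll already added.

```python
def read_csv_file(path, has_headers=False):
    """Read every record of the CSV file at `path` into a list of lists.

    Hypothetical helper: assumes IronPython, and that clr.AddReference has
    already been called for LumenWorks.Framework.IO.dll.
    """
    # The .NET imports live inside the function so that merely defining it
    # does not require the assemblies to be loaded yet.
    from System.IO import StreamReader
    from LumenWorks.Framework.IO.Csv import CsvReader

    reader = CsvReader(StreamReader(path), has_headers)
    rows = []
    try:
        while reader.ReadNextRecord():
            rows.append([reader[i] for i in range(reader.FieldCount)])
    finally:
        reader.Dispose()  # release the reader once we are done with it
    return rows
```

You would call it as read_csv_file("PATH_TO_FILE"), where the path is whatever file you want to load.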

The full signature for creating a CsvReader is:

CsvReader(TextReader reader, bool hasHeaders, char delimiter, char quote, char escape, char comment, bool trimSpaces)

Most of these arguments are optional (uhm... not the first one though). See the 'Constructors' in the source.


 * reader: A TextReader pointing to the CSV file.
 * hasHeaders: if field names are located on the first non commented line
 * delimiter: The delimiter character separating each field (default is ',')
 * quote: The quotation character wrapping every field (default is '"')
 * escape: The escape character that lets you insert quotation characters inside a quoted field (default is '\'). If there is no escape character, set it to '\0' to gain some performance
 * comment: The comment character indicating that a line is commented out (default is '#')
 * trimSpaces: if spaces at the start and end of a field are trimmed. Default is True
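As an illustration of the optional arguments, the sketch below passes a delimiter as the third argument to parse semicolon-separated text. The helper name read_delimited is made up for this example; it assumes IronPython with the LumenWorks reference already added, and relies on IronPython converting a one-character string to a .NET char.

```python
def read_delimited(text, delimiter=';'):
    """Parse `text` using a custom single-character delimiter.

    Hypothetical helper: assumes IronPython, and that clr.AddReference has
    already been called for LumenWorks.Framework.IO.dll.
    """
    from System.IO import StringReader
    from LumenWorks.Framework.IO.Csv import CsvReader

    # Arguments are (reader, hasHeaders, delimiter); the remaining
    # constructor arguments are left at their defaults.
    reader = CsvReader(StringReader(text), False, delimiter)
    rows = []
    while reader.ReadNextRecord():
        rows.append([reader[i] for i in range(reader.FieldCount)])
    return rows
```

For tab-separated data you would pass '\t' as the delimiter instead.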

The reader implements indexers and IEnumerable, so it can both be indexed and iterated over.
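Because the reader can be iterated over, the explicit ReadNextRecord loop can be collapsed into a comprehension. The sketch below assumes that each item produced by iterating a CsvReader is a sequence of field strings (a .NET string array); the helper name rows_by_iteration is made up, and a plain list of tuples stands in for the reader so the example runs without .NET.

```python
def rows_by_iteration(reader):
    """Collect every record from an iterable CSV reader as a list of lists."""
    # Each record is assumed to be a sequence of field strings; list() copies
    # it into an ordinary Python list.
    return [list(record) for record in reader]


# Stand-in for a CsvReader so the sketch runs anywhere. Under IronPython you
# would pass CsvReader(StringReader(text), False) instead.
sample = [('1', '2', '3', '4'), ('5', '6', '7', '8')]
print(rows_by_iteration(sample))
```

The function only relies on iteration, so it does not care whether it is handed a real reader or a test double.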
