Interacting with Excel

Excel (which exposes a COM automation interface) is accessible to .NET languages like IronPython through the .NET COM interop facilities.

Here is a simple example that should get you off the ground: import clr clr.AddReferenceByName('Microsoft.Office.Interop.Excel, Version=11.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c') from Microsoft.Office.Interop import Excel

ex = Excel.ApplicationClass ex.Visible = True ex.DisplayAlerts = False

workbook = ex.Workbooks.Open('foo.xls') ws = workbook.Worksheets[1]

print ws.Rows[1].Value2[0,0] From here you can explore the various worksheet objects that you get back.

Python itself has excellent libraries available for dealing with Excel files:


 * xlrd
 * xlwt
 * xlutils

These can be faster than COM, but have limitations like not being (currently) able to access the formulae in spreadsheet. If you have problems or are interested in these libraries, you can discuss them on the following Google group: http://groups.google.com/group/python-excel

Installing the Interop Assemblies
To use COM components from .NET, you need .NET class libraries that wrap the COM interfaces - for most COM libraries these can be automatically generated by the framework, but for Office the automatic generation doesn't work correctly, so Microsoft have provided pre-built ones called the Office Primary Interop Assemblies (or PIAs). For Office 2003 they can be found here: Office 2003 PIAs (I'm not sure, but I think Office 2007 apps will install the PIAs automatically.)

Getting Started
If the PIAs are installed, you should be able to add a reference to the Excel assembly:

import clr clr.AddReference("Microsoft.Office.Interop.Excel")

Getting Excel Started
import Microsoft.Office.Interop.Excel as Excel excel = Excel.ApplicationClass excel.Visible = True # makes the Excel application visible to the user

Getting hold of an already-running instance of Excel: from System.Runtime.InteropServices import Marshal excel = Marshal.GetActiveObject("Excel.Application")

Workbooks
workbook = excel.Workbooks.Add
 * 1) creating a new one

workbook = excel.Workbooks.Open(r"C:\My Excel Files\Book1.xls")
 * 1) opening a workbook

filename = r"C:\My Excel Files\Book1.xls" workbooks = [wb for wb in excel.Workbooks if wb.FullName == filename] if workbooks: workbook = workbooks[0]
 * 1) finding a workbook that's already open

excel.ActiveWorkbook
 * 1) just grabbing the current workbook

EnvironmentError: Old format or invalid type library Traceback (most recent call last): File C:\myscript.py, line 13, in Initialize File Microsoft.Office.Interop.Excel, line unknown, in Open EnvironmentError: Old format or invalid type library. (Exception from HRESULT: 0x80028018 (TYPE_E_INVDATAREAD))

This is known bug (http://support.microsoft.com/kb/320369). You probably run an English version of Excel. However, the regional settings for the computer are configured for a non-English language. To fix it, set the english locale to the Excel thread.

System.Threading.Thread.CurrentThread.CurrentCulture = System.Globalization.CultureInfo("en-US") workbook = excel.Workbooks.Open(r"C:\My Excel Files\Book1.xls")
 * 1) opening a workbook

Worksheets
worksheet = workbook.Worksheets.Add
 * 1) adding a worksheet

workbook.ActiveSheet
 * 1) getting the active one

Cell ranges
cell = worksheet.Range["A3"] print cell.Value2 # prints None cell.Value2 = 42 # if the sheet is visible, you'll see this appear.
 * 1) a range can be one cell

from System.Reflection import Missing print cell.Address[Missing.Value] # prints $A$3 print cell.Address[False, False] # prints A3 (the first two parameters are whether row and column should be absolute)
 * 1) for optional parameters, like the arguments to the Address indexer, you can use Missing.Value to use the default.

xlrange = worksheet.Range["A3", "D4"] print xlRange.Address[False, False] # prints A3:D4 print xlrange.Value2
 * 1) or a number of cells (in which case .Value2 can be got/set as a 2D array).
 * 1) prints:
 * 2) System.Object[,](
 * 3) None, None
 * 4) None, None)

from System import Array a = Array.CreateInstance(object, 2, 2) a[0, 1] = 3 a[1, 1] = "hi there!"

xlrange.Value2 = a print xlrange.Value2
 * 1) prints:
 * 2) System.Object[,](
 * 3) None, 3
 * 4) None, "hi there!")

Note the use of the Value2 property of Ranges - this is a convenience property that ignores some data types. To use the Value property (which handles dates and currencies correctly), we pass an enum indicating that we want the default type (there are also XML serialisation versions selected by different values of the enum). from System import DateTime default = Excel.XlRangeValueDataType.xlRangeValueDefault cell.Value[default] = DateTime.Now print cell.Value[default] # prints 01/06/2007 15:38:10 (for me)

Events
Excel exposes a number of events that we might be interested in - for example, we can ask to be notified when a workbook is changed. To do this we need to add an event handler to the event we're interested in.

This is a little more fiddly in the case of COM objects, because they expose events on a separate interface to their main one (this is referred to as the 'source' interface - COM Interop Exposed Part 2 has more information on this). If, for example, you have a workbook for which you'd like to receive the SheetChanged event, you need to cast it to the corresponding Excel.WorkbookEvents_Event interface first. In C# this is handled for you, but in IronPython we need to use the Python idiom for calling superclass methods.

Say we have a simple handler for the SheetChange event: def onSheetChange(sheet, changedRange): print "%s!%s changed" % (sheet.Name, changedRange.Address[False, False])
 * 1) the SheetChange event handler should accept the sheet and the range that changed

What we want to express is this: workbook.SheetChange += onSheetChange # doesn't work

But the Worksheet interface doesn't have the SheetChange event. Instead it's on the WorkbookEvents_Event interface. So we need to add the handler 'through' the interface, but with the workbook instance we have.

The trick is to realise that the line: obj.Event += handler ... is actually implemented behind the scenes in .NET as: obj.add_Event(handler) ... and in Python that method call is really: .add_Event(obj, handler)

So we can add the handler like this: Excel.WorkbookEvents_Event.add_SheetChange(workbook, onSheetChange)

Now if you change the value of a cell in the workbook, you should see a line which indicates which cell was changed.

We remove the handler in the same way: Excel.WorkbookEvents_Event.remove_SheetChange(workbook, onSheetChange)

As another event example, you might want to prevent the user from closing a workbook. Excel.Application provides a WorkbookBeforeClose event that can be used to prevent the workbook from being closed. The reference says that the event handler will be called with the workbook being closed, and a boolean passed by reference that can be set to True to cancel closing the workbook.

def onWorkbookBeforeClose(workbook, cancel): print "you can't close this workbook!" # IronPython ensures that arguments that are declared as passed by reference # arrive as Reference objects. These are updated by setting their .Value property. cancel.Value = True

Excel.AppEvents_Event.add_WorkbookBeforeClose(excel, onWorkbookBeforeClose)
 * 1) Application events are exposed by the AppEvents_Event interface


 * 1) Any attempts to close a workbook in the Excel instance will now be thwarted...

Excel.AppEvents_Event.remove_WorkbookBeforeClose(excel, onWorkbookBeforeClose)
 * 1) ...which could definitely be annoying.

Incidentally, if you need to pass ByRef arguments into a method, you can construct one using the clr.Reference generic type: ref = clr.Reference[int](42)

Finding Stuff
The VBA Language reference provides all the information you need to drive Excel, but it's in VBA, unfortunately. Once you're familiar with the differences between how things are done in VBA and how they work in IronPython, you should be able to interpret the documentation.

An important extra source of information is the interactive interpreter - if in doubt, import the Excel namespace at the command line, and use dir on things to find out what methods and properties are available. This was especially useful for me when trying to understand how events worked - finding the *Events_Event interfaces with all of the events on them was my "Aha!" moment. (Well, that and some extremely useful pointers from Dino Viehland.)

Excel 2003 VBA Language Reference:
 * Top-level
 * Application
 * Workbook
 * Worksheet
 * Range - a collection of one or more cells. This is probably the most useful object.

Useful articles about .NET COM Interop:
 * COM Interop Exposed
 * COM Interop Exposed Part 2

Back to Contents.