Interacting with Excel

From IronPython Cookbook

Excel (which exposes a COM automation interface) is accessible to .NET languages like IronPython through the .NET COM interop facilities.

Here is a simple example that should get you off the ground:

import clr
clr.AddReferenceByName('Microsoft.Office.Interop.Excel, Version=11.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c')
from Microsoft.Office.Interop import Excel

ex = Excel.ApplicationClass()   
ex.Visible = True
ex.DisplayAlerts = False   

workbook = ex.Workbooks.Open('foo.xls')
ws = workbook.Worksheets[1]

print ws.Rows[1].Value2[0,0]

From here you can explore the various worksheet objects that you get back.

Python itself has excellent libraries available for dealing with Excel files:

These can be faster than COM, but have limitations like not being (currently) able to access the formulae in spreadsheet. If you have problems or are interested in these libraries, you can discuss them on the following Google group: http://groups.google.com/group/python-excel

Contents

Installing the Interop Assemblies

To use COM components from .NET, you need .NET class libraries that wrap the COM interfaces - for most COM libraries these can be automatically generated by the framework, but for Office the automatic generation doesn't work correctly, so Microsoft have provided pre-built ones called the Office Primary Interop Assemblies (or PIAs). For Office 2003 they can be found here: Office 2003 PIAs (I'm not sure, but I think Office 2007 apps will install the PIAs automatically.)

Getting Started

If the PIAs are installed, you should be able to add a reference to the Excel assembly:

import clr
clr.AddReference("Microsoft.Office.Interop.Excel")

Getting Excel Started

import Microsoft.Office.Interop.Excel as Excel
excel = Excel.ApplicationClass()
excel.Visible = True # makes the Excel application visible to the user

Getting hold of an already-running instance of Excel:

from System.Runtime.InteropServices import Marshal
excel = Marshal.GetActiveObject("Excel.Application")

Workbooks

# creating a new one
workbook = excel.Workbooks.Add()

# opening a workbook
workbook = excel.Workbooks.Open(r"C:\My Excel Files\Book1.xls")

# finding a workbook that's already open
filename = r"C:\My Excel Files\Book1.xls"
workbooks = [wb for wb in excel.Workbooks if wb.FullName == filename]
if workbooks:
    workbook = workbooks[0]

# just grabbing the current workbook
excel.ActiveWorkbook

EnvironmentError: Old format or invalid type library

Traceback (most recent call last):
  File C:\myscript.py, line 13, in Initialize
  File Microsoft.Office.Interop.Excel, line unknown, in Open
EnvironmentError: Old format or invalid type library. (Exception from HRESULT: 0x80028018 (TYPE_E_INVDATAREAD))

This is known bug (http://support.microsoft.com/kb/320369). You probably run an English version of Excel. However, the regional settings for the computer are configured for a non-English language. To fix it, set the english locale to the Excel thread.

# opening a workbook
System.Threading.Thread.CurrentThread.CurrentCulture = System.Globalization.CultureInfo("en-US")
workbook = excel.Workbooks.Open(r"C:\My Excel Files\Book1.xls")

Worksheets

# adding a worksheet
worksheet = workbook.Worksheets.Add()

# getting the active one
workbook.ActiveSheet

Cell ranges

# a range can be one cell
cell = worksheet.Range["A3"]
print cell.Value2 # prints None
cell.Value2 = 42 # if the sheet is visible, you'll see this appear.

from System.Reflection import Missing
# for optional parameters, like the arguments to the Address indexer, you can use Missing.Value to use the default.
print cell.Address[Missing.Value] # prints $A$3
print cell.Address[False, False] # prints A3 (the first two parameters are whether row and column should be absolute)

# or a number of cells (in which case .Value2 can be got/set as a 2D array).
xlrange = worksheet.Range["A3", "D4"]
print xlRange.Address[False, False] # prints A3:D4
print xlrange.Value2
# prints:
# System.Object[,](
# None, None
# None, None)

from System import Array
a = Array.CreateInstance(object, 2, 2)
a[0, 1] = 3
a[1, 1] = "hi there!"

xlrange.Value2 = a
print xlrange.Value2
# prints:
# System.Object[,](
# None, 3
# None, "hi there!")

Note the use of the Value2 property of Ranges - this is a convenience property that ignores some data types. To use the Value property (which handles dates and currencies correctly), we pass an enum indicating that we want the default type (there are also XML serialisation versions selected by different values of the enum).

from System import DateTime
default = Excel.XlRangeValueDataType.xlRangeValueDefault
cell.Value[default] = DateTime.Now
print cell.Value[default] # prints 01/06/2007 15:38:10 (for me)

Events

Excel exposes a number of events that we might be interested in - for example, we can ask to be notified when a workbook is changed. To do this we need to add an event handler to the event we're interested in.

This is a little more fiddly in the case of COM objects, because they expose events on a separate interface to their main one (this is referred to as the 'source' interface - COM Interop Exposed Part 2 has more information on this). If, for example, you have a workbook for which you'd like to receive the SheetChanged event, you need to cast it to the corresponding Excel.WorkbookEvents_Event interface first. In C# this is handled for you, but in IronPython we need to use the Python idiom for calling superclass methods.

Say we have a simple handler for the SheetChange event:

# the SheetChange event handler should accept the sheet and the range that changed
def onSheetChange(sheet, changedRange):
   print "%s!%s changed" % (sheet.Name, changedRange.Address[False, False])

What we want to express is this:

workbook.SheetChange += onSheetChange # doesn't work

But the Worksheet interface doesn't have the SheetChange event. Instead it's on the WorkbookEvents_Event interface. So we need to add the handler 'through' the interface, but with the workbook instance we have.

The trick is to realise that the line:

obj.Event += handler

... is actually implemented behind the scenes in .NET as:

obj.add_Event(handler)

... and in Python that method call is really:

<obj's class>.add_Event(obj, handler)

So we can add the handler like this:

Excel.WorkbookEvents_Event.add_SheetChange(workbook, onSheetChange)

Now if you change the value of a cell in the workbook, you should see a line which indicates which cell was changed.

We remove the handler in the same way:

Excel.WorkbookEvents_Event.remove_SheetChange(workbook, onSheetChange)

As another event example, you might want to prevent the user from closing a workbook. Excel.Application provides a WorkbookBeforeClose event that can be used to prevent the workbook from being closed. The reference says that the event handler will be called with the workbook being closed, and a boolean passed by reference that can be set to True to cancel closing the workbook.

def onWorkbookBeforeClose(workbook, cancel):
    print "you can't close this workbook!"
    # IronPython ensures that arguments that are declared as passed by reference
    # arrive as Reference objects. These are updated by setting their .Value property.
    cancel.Value = True

# Application events are exposed by the AppEvents_Event interface
Excel.AppEvents_Event.add_WorkbookBeforeClose(excel, onWorkbookBeforeClose)

# Any attempts to close a workbook in the Excel instance will now be thwarted...

# ...which could definitely be annoying.
Excel.AppEvents_Event.remove_WorkbookBeforeClose(excel, onWorkbookBeforeClose)

Incidentally, if you need to pass ByRef arguments into a method, you can construct one using the clr.Reference generic type:

ref = clr.Reference[int](42)


Finding Stuff

The VBA Language reference provides all the information you need to drive Excel, but it's in VBA, unfortunately. Once you're familiar with the differences between how things are done in VBA and how they work in IronPython, you should be able to interpret the documentation.

An important extra source of information is the interactive interpreter - if in doubt, import the Excel namespace at the command line, and use dir() on things to find out what methods and properties are available. This was especially useful for me when trying to understand how events worked - finding the *Events_Event interfaces with all of the events on them was my "Aha!" moment. (Well, that and some extremely useful pointers from Dino Viehland.)

Excel 2003 VBA Language Reference:

Useful articles about .NET COM Interop:

Back to Contents.

TOOLBOX
LANGUAGES