Read and Edit OpenXml Docs (Word, Excel, etc.) in Windows 10 with c#

It is often useful to be able examine and edit Microsoft Office Documents in your own applications. Office Documents have being using the OpenXml format for some time now, in this post I’ll demonstrate how you can read and manipulate the data.

NB For more information, and to download some great tools including the Open , visit http://openxmldeveloper.org/

 

Reading data from Excel

NB For some reason you CANNOT use any other code to read a document if the document is already open in an Office Product, in this case Excel. If you get AccessDenied then check the document isn’t already open.

Consider the following spread-sheet;

image

I’ve saved this document as Book1.xlsx. If you haven’t downloaded the Open XML SDK 2.5 for Microsoft Office tools then rename the document to Book1.zip and open the Zip. I am not going to go into great depth about how the document packaging works, see the open xml developer site for more information. But basically each document is a Zip file containing a content catalogue and folders with the data represented in the document. So to examine the data in the first sheet you will find that located at; xl/worksheets/sheet1.xml

The first step is to open the document;

            FileOpenPicker opener = new FileOpenPicker();
            opener.ViewMode = PickerViewMode.Thumbnail;            
            opener.FileTypeFilter.Add(".xlsx");

            
            StorageFile file = await opener.PickSingleFileAsync();
            if (file != null)
            {
                XmlDocument worksheet = null;
                using (var fileStream = await file.OpenReadAsync())
                {
                    using (ZipArchive archive = new ZipArchive(fileStream.AsStream(), ZipArchiveMode.Read))
                    {
                        worksheet = this.GetSheet(archive, "sheet1");
                    }
                }
            }

You can see that we’ve opened the document as a Zip Archive, now we can read the specific sheet file held within, in this case I’m storing the result in an XmlDocument, this is just my choice of Xml parser;

        private XmlDocument GetSheet(ZipArchive archive, string sheetName)
        {
            XmlDocument sheet = new XmlDocument();
            ZipArchiveEntry archiveEntry = archive.GetEntry("xl/worksheets/" + sheetName + ".xml");

            using (var archiveEntryStream = archiveEntry.Open())
            {
                using (StreamReader reader = new StreamReader(archiveEntryStream))
                {
                    string xml = reader.ReadToEnd();
                    sheet.LoadXml(xml);
                }
            }

            return sheet;
        }

Now we have the document, we can grab a value, here via XPath

       private string ReadCell(XmlDocument worksheet, string cellAddress)
        {
            string value = string.Empty;           
            XmlElement row = worksheet.SelectSingleNodeNS("//x:c[@r='" + cellAddress + "']", "xmlns:x=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"") as XmlElement;
            if (row != null)
            {
                value = row.InnerText;
            }
            
            return value;
        }

There you have it, you’ve read the data in cell B2.

Writing data in Excel

Let’s now change that value from 2 to 99.

using (var fileStream = await file.OpenStreamForWriteAsync())
                {
                    using (ZipArchive archive = new ZipArchive(fileStream, ZipArchiveMode.Update))
                    {
                        await WriteCell(archive, worksheet, "sheet1", "B2", "99");
                    }
                }

In this example I’ve chosen to simply remove the sheet from the archive and put a new version back.

        private async Task WriteCell(ZipArchive archive, XmlDocument worksheet, string sheetName, string cellAddress, string value)
        {            
            XmlElement row = worksheet.SelectSingleNodeNS("//x:c[@r='" + cellAddress + "']/x:v", "xmlns:x=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"") as XmlElement;
            if (row != null)
            {
                row.InnerText = value;
            }

            string sheetFilename = "xl/worksheets/" + sheetName + ".xml";
            ZipArchiveEntry archiveEntry = archive.GetEntry(sheetFilename);
            archiveEntry.Delete();

            archiveEntry = archive.CreateEntry(sheetFilename);
            using (var archiveEntryStream = archiveEntry.Open())
            {
                using (DataWriter writer = new DataWriter(archiveEntryStream.AsOutputStream()))
                {
                    writer.WriteString(worksheet.GetXml());

                    await writer.StoreAsync();
                    await writer.FlushAsync();
                    writer.DetachStream();
                }               
            }
        }

Let’s open the file in Excel and check our result;

 

image

Looks good.

 

What else can you do?

We’ve seen a simple example of reading and writing to Excel. But these basic techniques allow you to do pretty much anything with any Office document. The underlying schema was developed to aid the use of tools, so anything is possible. Again I would point to Open XML SDK 2.5 for Microsoft Office and http://openxmldeveloper.org/ for information about how to interpret the various document types. My recommendation is to write an example document and use the SDK tools to learn how to reproduce it.

Advertisements

My .net performance counters were missing on Windows 7

I had a strange problem today. I wanted to take a peek at the CLR memory performance counters but I couldn’t find them. After some searching it turns out I needed to set the following registery key to 0;

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\.NETFramework\Performance\Disable Performance Counters