Read and Edit OpenXml Docs (Word, Excel, etc.) in Windows 10 with c#

It is often useful to be able examine and edit Microsoft Office Documents in your own applications. Office Documents have being using the OpenXml format for some time now, in this post I’ll demonstrate how you can read and manipulate the data.

NB For more information, and to download some great tools including the Open , visit http://openxmldeveloper.org/

 

Reading data from Excel

NB For some reason you CANNOT use any other code to read a document if the document is already open in an Office Product, in this case Excel. If you get AccessDenied then check the document isn’t already open.

Consider the following spread-sheet;

image

I’ve saved this document as Book1.xlsx. If you haven’t downloaded the Open XML SDK 2.5 for Microsoft Office tools then rename the document to Book1.zip and open the Zip. I am not going to go into great depth about how the document packaging works, see the open xml developer site for more information. But basically each document is a Zip file containing a content catalogue and folders with the data represented in the document. So to examine the data in the first sheet you will find that located at; xl/worksheets/sheet1.xml

The first step is to open the document;

            FileOpenPicker opener = new FileOpenPicker();
            opener.ViewMode = PickerViewMode.Thumbnail;            
            opener.FileTypeFilter.Add(".xlsx");

            
            StorageFile file = await opener.PickSingleFileAsync();
            if (file != null)
            {
                XmlDocument worksheet = null;
                using (var fileStream = await file.OpenReadAsync())
                {
                    using (ZipArchive archive = new ZipArchive(fileStream.AsStream(), ZipArchiveMode.Read))
                    {
                        worksheet = this.GetSheet(archive, "sheet1");
                    }
                }
            }

You can see that we’ve opened the document as a Zip Archive, now we can read the specific sheet file held within, in this case I’m storing the result in an XmlDocument, this is just my choice of Xml parser;

        private XmlDocument GetSheet(ZipArchive archive, string sheetName)
        {
            XmlDocument sheet = new XmlDocument();
            ZipArchiveEntry archiveEntry = archive.GetEntry("xl/worksheets/" + sheetName + ".xml");

            using (var archiveEntryStream = archiveEntry.Open())
            {
                using (StreamReader reader = new StreamReader(archiveEntryStream))
                {
                    string xml = reader.ReadToEnd();
                    sheet.LoadXml(xml);
                }
            }

            return sheet;
        }

Now we have the document, we can grab a value, here via XPath

       private string ReadCell(XmlDocument worksheet, string cellAddress)
        {
            string value = string.Empty;           
            XmlElement row = worksheet.SelectSingleNodeNS("//x:c[@r='" + cellAddress + "']", "xmlns:x=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"") as XmlElement;
            if (row != null)
            {
                value = row.InnerText;
            }
            
            return value;
        }

There you have it, you’ve read the data in cell B2.

Writing data in Excel

Let’s now change that value from 2 to 99.

using (var fileStream = await file.OpenStreamForWriteAsync())
                {
                    using (ZipArchive archive = new ZipArchive(fileStream, ZipArchiveMode.Update))
                    {
                        await WriteCell(archive, worksheet, "sheet1", "B2", "99");
                    }
                }

In this example I’ve chosen to simply remove the sheet from the archive and put a new version back.

        private async Task WriteCell(ZipArchive archive, XmlDocument worksheet, string sheetName, string cellAddress, string value)
        {            
            XmlElement row = worksheet.SelectSingleNodeNS("//x:c[@r='" + cellAddress + "']/x:v", "xmlns:x=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"") as XmlElement;
            if (row != null)
            {
                row.InnerText = value;
            }

            string sheetFilename = "xl/worksheets/" + sheetName + ".xml";
            ZipArchiveEntry archiveEntry = archive.GetEntry(sheetFilename);
            archiveEntry.Delete();

            archiveEntry = archive.CreateEntry(sheetFilename);
            using (var archiveEntryStream = archiveEntry.Open())
            {
                using (DataWriter writer = new DataWriter(archiveEntryStream.AsOutputStream()))
                {
                    writer.WriteString(worksheet.GetXml());

                    await writer.StoreAsync();
                    await writer.FlushAsync();
                    writer.DetachStream();
                }               
            }
        }

Let’s open the file in Excel and check our result;

 

image

Looks good.

 

What else can you do?

We’ve seen a simple example of reading and writing to Excel. But these basic techniques allow you to do pretty much anything with any Office document. The underlying schema was developed to aid the use of tools, so anything is possible. Again I would point to Open XML SDK 2.5 for Microsoft Office and http://openxmldeveloper.org/ for information about how to interpret the various document types. My recommendation is to write an example document and use the SDK tools to learn how to reproduce it.

15 thoughts on “Read and Edit OpenXml Docs (Word, Excel, etc.) in Windows 10 with c#

  1. reddvid March 22, 2016 / 8:00 pm

    This is unclear to me. How can I put the value of my cell address E9 to a textblock?

    • pauliom March 23, 2016 / 9:44 pm

      ReadCell returns a string, so providing you have selected the correct worksheet and cell address then you can just set the text of the textblock.

  2. Arjan April 3, 2016 / 8:04 pm

    Writing numbers with WriteCell works OK, but writing “Hello” into a cell will cause an error message when I open the file with Excel.

    • pauliom April 4, 2016 / 5:41 am

      As I mention in the post, first use Excel to create the document you want and open it with the explorer (or just open the zip) and see what you need to write. Excel uses a string library style mechanism.

      • Arjan April 4, 2016 / 4:40 pm

        I think I don’t understand you fully but this is what I did: I created an excel file with ‘Hi’ in cell B2 with the excel application. With my UWP app I stored “Hello” in cell B2. No error messages. When I open the excel file again I got an error on open (dutch message something like: problems in content of excel file, do you want to restore content) If I choose yes then the file is restored (with Hello in B2) correctly but I don’t want that error message.

  3. pauliom April 4, 2016 / 5:59 pm

    Excel doesn’t store strings directly in the cell. That’s why I suggest you create the Excel document with some strings in it and take a look at how it stores them. It stores the strings in a separate sharedstrings.xml file and records the address of the string in the cell via a string type attribute and the position in the sharedstrings file.

  4. Arjan April 4, 2016 / 7:35 pm

    Ok, thanks for clarification!. I also want to store timestamps in cells’s. Maybe they are stored in again another way.. I am serious thinking to create comma separated files: very simple and excel version independant. What’s your advise?

  5. pauliom April 4, 2016 / 8:28 pm

    It really depends on what you are trying to do. Once you have the .net to write to the correct file xlsx isn’t too bad, but if you don’t need the complexity don’t do it 🙂

  6. Muhammad Ibrahim Khan March 19, 2017 / 10:33 am

    I tried the exace same code, but instead of returning cell value, it returns 1 for cell A1, although in cell A1, i have written string “Hello”. So I was expecting “Hello”, but ReadCell() function return me 1. Same is the case for other cells, I receive 2, 3 or 0 or any other integer. I would be thankful for your help. Thanks

    • pauliom March 21, 2017 / 9:35 pm

      Depends how you write the string. Take a look in the separate strings file in the zip file. What you’re prob’ seeing is a reference into the string files. I.e. you have to do a little extra work to extract the correct string. You can examine the cell type to know what mechanism you should use.

  7. Agha A Khan August 10, 2017 / 4:21 am

    Unable to compile. The SelectSingleNodeNS is not defined under XmlDocument

  8. Shivshanker Cheral July 5, 2018 / 8:47 am

    When we create new empty Excel file. This code does not write any value in the provided cell address! And also there is no error message!.

    • pauliom August 4, 2018 / 10:06 am

      The code is replacing a current value. I assume that if you follow the code you’ll see that it skips the setting of the value because there isn’t one there already? In that case you’ll have to create the cell before you can update it.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s