Read and Edit OpenXml Docs (Word, Excel, etc.) in Windows 10 with c#

It is often useful to be able examine and edit Microsoft Office Documents in your own applications. Office Documents have being using the OpenXml format for some time now, in this post I’ll demonstrate how you can read and manipulate the data.

NB For more information, and to download some great tools including the Open , visit http://openxmldeveloper.org/

 

Reading data from Excel

NB For some reason you CANNOT use any other code to read a document if the document is already open in an Office Product, in this case Excel. If you get AccessDenied then check the document isn’t already open.

Consider the following spread-sheet;

image

I’ve saved this document as Book1.xlsx. If you haven’t downloaded the Open XML SDK 2.5 for Microsoft Office tools then rename the document to Book1.zip and open the Zip. I am not going to go into great depth about how the document packaging works, see the open xml developer site for more information. But basically each document is a Zip file containing a content catalogue and folders with the data represented in the document. So to examine the data in the first sheet you will find that located at; xl/worksheets/sheet1.xml

The first step is to open the document;

            FileOpenPicker opener = new FileOpenPicker();
            opener.ViewMode = PickerViewMode.Thumbnail;            
            opener.FileTypeFilter.Add(".xlsx");

            
            StorageFile file = await opener.PickSingleFileAsync();
            if (file != null)
            {
                XmlDocument worksheet = null;
                using (var fileStream = await file.OpenReadAsync())
                {
                    using (ZipArchive archive = new ZipArchive(fileStream.AsStream(), ZipArchiveMode.Read))
                    {
                        worksheet = this.GetSheet(archive, "sheet1");
                    }
                }
            }

You can see that we’ve opened the document as a Zip Archive, now we can read the specific sheet file held within, in this case I’m storing the result in an XmlDocument, this is just my choice of Xml parser;

        private XmlDocument GetSheet(ZipArchive archive, string sheetName)
        {
            XmlDocument sheet = new XmlDocument();
            ZipArchiveEntry archiveEntry = archive.GetEntry("xl/worksheets/" + sheetName + ".xml");

            using (var archiveEntryStream = archiveEntry.Open())
            {
                using (StreamReader reader = new StreamReader(archiveEntryStream))
                {
                    string xml = reader.ReadToEnd();
                    sheet.LoadXml(xml);
                }
            }

            return sheet;
        }

Now we have the document, we can grab a value, here via XPath

       private string ReadCell(XmlDocument worksheet, string cellAddress)
        {
            string value = string.Empty;           
            XmlElement row = worksheet.SelectSingleNodeNS("//x:c[@r='" + cellAddress + "']", "xmlns:x=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"") as XmlElement;
            if (row != null)
            {
                value = row.InnerText;
            }
            
            return value;
        }

There you have it, you’ve read the data in cell B2.

Writing data in Excel

Let’s now change that value from 2 to 99.

using (var fileStream = await file.OpenStreamForWriteAsync())
                {
                    using (ZipArchive archive = new ZipArchive(fileStream, ZipArchiveMode.Update))
                    {
                        await WriteCell(archive, worksheet, "sheet1", "B2", "99");
                    }
                }

In this example I’ve chosen to simply remove the sheet from the archive and put a new version back.

        private async Task WriteCell(ZipArchive archive, XmlDocument worksheet, string sheetName, string cellAddress, string value)
        {            
            XmlElement row = worksheet.SelectSingleNodeNS("//x:c[@r='" + cellAddress + "']/x:v", "xmlns:x=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"") as XmlElement;
            if (row != null)
            {
                row.InnerText = value;
            }

            string sheetFilename = "xl/worksheets/" + sheetName + ".xml";
            ZipArchiveEntry archiveEntry = archive.GetEntry(sheetFilename);
            archiveEntry.Delete();

            archiveEntry = archive.CreateEntry(sheetFilename);
            using (var archiveEntryStream = archiveEntry.Open())
            {
                using (DataWriter writer = new DataWriter(archiveEntryStream.AsOutputStream()))
                {
                    writer.WriteString(worksheet.GetXml());

                    await writer.StoreAsync();
                    await writer.FlushAsync();
                    writer.DetachStream();
                }               
            }
        }

Let’s open the file in Excel and check our result;

 

image

Looks good.

 

What else can you do?

We’ve seen a simple example of reading and writing to Excel. But these basic techniques allow you to do pretty much anything with any Office document. The underlying schema was developed to aid the use of tools, so anything is possible. Again I would point to Open XML SDK 2.5 for Microsoft Office and http://openxmldeveloper.org/ for information about how to interpret the various document types. My recommendation is to write an example document and use the SDK tools to learn how to reproduce it.

Advertisements
This entry was posted in Universal Apps, Windows 10 and tagged , . Bookmark the permalink.

11 Responses to Read and Edit OpenXml Docs (Word, Excel, etc.) in Windows 10 with c#

  1. reddvid says:

    This is unclear to me. How can I put the value of my cell address E9 to a textblock?

    • pauliom says:

      ReadCell returns a string, so providing you have selected the correct worksheet and cell address then you can just set the text of the textblock.

  2. Arjan says:

    Writing numbers with WriteCell works OK, but writing “Hello” into a cell will cause an error message when I open the file with Excel.

    • pauliom says:

      As I mention in the post, first use Excel to create the document you want and open it with the explorer (or just open the zip) and see what you need to write. Excel uses a string library style mechanism.

      • Arjan says:

        I think I don’t understand you fully but this is what I did: I created an excel file with ‘Hi’ in cell B2 with the excel application. With my UWP app I stored “Hello” in cell B2. No error messages. When I open the excel file again I got an error on open (dutch message something like: problems in content of excel file, do you want to restore content) If I choose yes then the file is restored (with Hello in B2) correctly but I don’t want that error message.

  3. pauliom says:

    Excel doesn’t store strings directly in the cell. That’s why I suggest you create the Excel document with some strings in it and take a look at how it stores them. It stores the strings in a separate sharedstrings.xml file and records the address of the string in the cell via a string type attribute and the position in the sharedstrings file.

  4. Arjan says:

    Ok, thanks for clarification!. I also want to store timestamps in cells’s. Maybe they are stored in again another way.. I am serious thinking to create comma separated files: very simple and excel version independant. What’s your advise?

  5. pauliom says:

    It really depends on what you are trying to do. Once you have the .net to write to the correct file xlsx isn’t too bad, but if you don’t need the complexity don’t do it 🙂

  6. I tried the exace same code, but instead of returning cell value, it returns 1 for cell A1, although in cell A1, i have written string “Hello”. So I was expecting “Hello”, but ReadCell() function return me 1. Same is the case for other cells, I receive 2, 3 or 0 or any other integer. I would be thankful for your help. Thanks

    • pauliom says:

      Depends how you write the string. Take a look in the separate strings file in the zip file. What you’re prob’ seeing is a reference into the string files. I.e. you have to do a little extra work to extract the correct string. You can examine the cell type to know what mechanism you should use.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s