Adding an interactive living pivot table to Excel wtih Python

 

1. Why do I care?

Building a summary or pivot table table is very common in daily data analysis. We can use pandas.pivot_table or pandas.dataframe.groupby to get the result. After that we can save it into a sheet in excel.

However, this result is a static table in excel sheet, which we cannot interactive with. So how can I create a real excel's pivot table?

 

2. How pywin32 can help us?

pywin32 is a python library which can help contorl applications in windows. Excel, of course, can be controlled under the library. Surprisingly, WPS if you are using, can also be controlled by this way.

After we build a connection to excel or wps, we will use some code highly similar with VBA. But don't worry if you never knew VBA, we are not really using VBA and we will explain step by step.

2.1 Normal workflow in pandas

First, let's have a look at the classic data "tips.csv" as an example. It can be found (here).

import pandas as pd

tips = pd.read_csv("tips.csv", engine="python")
tips.head()

As a common work flow, we will like to do some change to the data.

def judge(x, y):
    """x:total_bill, y:tip
    """
    if x * 0.1 < y:
        return "under paid"
    else:
        return "well paid"

tips["est"] = tips[["total_bill", "tip"]].apply(lambda x: judge(*x), axis="columns")

tips.head()

  

Now we can save it into an excel. You may notice we haven't do pivot table yet. This is what we are going to do inside excel with pywin32.

Again, we can use pandas.pivot_table to do summary, but it's an static table. In some situations we may need a living pivot table as below.

import os
file_path = os.getcwd() + "\\result.xlsx"

tips.to_excel(file_path, index=False, sheet_name="source")

Normally, "to_excel()" function in pandas doesn't need os library to bulid a full path.

But because pywin32 with excel application requires the full path, so we build it here in advance.

Also, we specified a sheet name, this can guarantee the name of sheet, which we will cite it later.

2.2 pywin32 comes to help

Now, we are opening excel with the help of pywin32. And it can also open wps application, with a little different at application name.

import win32com.client as win32
win32c = win32.constants

excel = win32.gencache.EnsureDispatch("Excel.Application")

# -----------------------------------------------
# write below if you are using WPS
# excel = win32.gencache.EnsureDispatch("ket.Application")

Every code later will be all the same, no matter excel or wps we are using.

We will open excel file with excel.WorkBook.Open function.

Also, the above function can open csv file as well. Which is very helpful if we have any pandas part, we can jump to csv file directly.

excel.Visible = False # use True will pump up an excel windows, if that is what you need
excel.DisplayAlerts = False

wb = excel.Workbooks.Open(file_path)

We build a new sheet for the living pivot table, this is not always needed, but pivot in a new sheet is more clear to readers and ourself.

# building a new sheet
exist_sheet = wb.Worksheets("source")
new_sheet = wb.Sheets.Add(Before=exist_sheet)
new_sheet.Name = 'my_pivot'

For building a pivot table in excel, we first need to build a pivot cache. Using this pivot cache object, we can build a real pivot table.

There are just two lines actually:

pivot_cache = wb.PivotCaches().Create(SourceType=win32c.xlDatabase, SourceData="source!A:H", Version=win32c.xlPivotTableVersion12)

pivot_table = pivot_cache.CreatePivotTable(TableDestination='my_pivot!R4C1',TableName='my_pivot_name')

(Update: Originally, I tried to use PivotCaches().Add() function to build pivot table, but that function has no Version argument. So after a little research, I found what I show above. Using argument "Version" in PivotCaches(), can help us avoid some strange bug from old excel pivot table.)

Finally, we can add data into pivot table. And don't forget to save & quite our excel application.

pivot_table.PivotFields("sex").Orientation = win32c.xlRowField

pivot_table.AddDataField(pivot_table.PivotFields("total_bill"), "求和项:total_bill", win32c.xlSum)
pivot_table.AddDataField(pivot_table.PivotFields("tip"), "求和项:tip", win32c.xlSum)

wb.Save()
excel.Application.Quit()

This is basically what we get at the end.

  

3. Summary

The difficult part of using pywin32 is it doesn't list all function to us, and there are few article online we can refer to. I manually try funcitions one by one from VBA to get above process working.

Use our imagination and combination of pywin32, we can build a really powerful excel output, including pivot table like above, image, and even runing xlam Macro in python.

Talking about Macro, here also comes situations that we already have old VBA function(disigned by other people), but we want to use it in a new workflow with Python.

What can we do? It is too hard to read & rebuild in Python again. We can direcely use it from pywin32, by simlply wb.Run(xlam_file_path) + "!functionname".

Finally, here is microsoft reference for excel VBA, check it out if you want to explore more.

 

  

 

posted @ 2020-05-02 16:20  DrVonGoosewing  阅读(646)  评论(0)    收藏  举报