Thursday, March 12, 2020

multi-index (column) report using pandas

multi-index-09-03-2020 <!-- Custom stylesheet, it must be in the same directory as the html file
In [1]:
import pandas as pd
import numpy as np


#1 input file
input_file = r"D:\Python Scripts\multi_index_3.xlsx"

df1=pd.read_excel(input_file)
df1['Focus_area']=df1['Focus_area'].astype('category')
table0=df1
table0
#table0 is shown below.
Out[1]:
Focus_area City As_on Spend
0 Education Mumbai 2019-01-01 1000000
1 Healthcare Pune 2019-03-31 20000
2 Environment Chennai 2018-03-31 100000
3 Disaster Relief Bengaluru 2017-12-01 220000
4 Vocational Skills Varanasi 2018-09-30 540000
5 Education Pune 2019-06-30 3000000
6 Environment Pune 2019-03-31 666000
In [67]:
df1['As_on']=pd.to_datetime(df1.As_on)
In [68]:
df1.dtypes
Out[68]:
Focus_area          category
City                  object
As_on         datetime64[ns]
Spend                  int64
dtype: object
In [69]:
df1 = df1.set_index(['Focus_area','City'])
In [70]:
df1.dtypes
Out[70]:
As_on    datetime64[ns]
Spend             int64
dtype: object
In [4]:
df1.groupby(df1['As_on'])['Spend'].sum()
Out[4]:
As_on
2017-12-01     220000
2018-03-31     100000
2018-09-30     540000
2019-01-01    1000000
2019-03-31     686000
2019-06-30    3000000
Name: Spend, dtype: int64
In [5]:
df1['Quarter'] = pd.PeriodIndex(df1['As_on'], freq='Q-MAR').strftime('Q%q')#Adding quarter to the dataframe.

Q-DEC specifies quarterly periods whose last quarter ends on the last day in December.

Q-MAR specifies quarterly periods whose last quarter ends on the last day in March.

In [ ]:
df1
In [7]:
df1['Year']=df1.As_on.dt.year#Adding Year to the dataframe.
df1
Out[7]:
Focus_area City As_on Spend Quarter Year
0 Education Mumbai 2019-01-01 1000000 Q4 2019
1 Healthcare Pune 2019-03-31 20000 Q4 2019
2 Environment Chennai 2018-03-31 100000 Q4 2018
3 Disaster Relief Bengaluru 2017-12-01 220000 Q3 2017
4 Vocational Skills Varanasi 2018-09-30 540000 Q2 2018
5 Education Pune 2019-06-30 3000000 Q1 2019
6 Environment Pune 2019-03-31 666000 Q4 2019
In [12]:
df2 = pd.pivot_table(df1,index=['City','Focus_area'],columns=['Year','Quarter'],values=['Spend'],fill_value=0)#works
#df2.query('City==["Pune","Mumbai"]')#works
In [13]:
df2
Out[13]:
Spend
Year 2017 2018 2019
Quarter Q3 Q2 Q4 Q1 Q4
City Focus_area
Bengaluru Disaster Relief 220000 0 0 0 0
Chennai Environment 0 0 100000 0 0
Mumbai Education 0 0 0 0 1000000
Pune Education 0 0 0 3000000 0
Environment 0 0 0 0 666000
Healthcare 0 0 0 0 20000
Varanasi Vocational Skills 0 540000 0 0 0
In [14]:
df2 = pd.pivot_table(df1,index=['Year','Quarter'],columns=['City','Focus_area'],values=['Spend'],fill_value=0)#works
df2
Out[14]:
Spend
City Bengaluru Chennai Mumbai Pune Varanasi
Focus_area Disaster Relief Environment Education Education Environment Healthcare Vocational Skills
Year Quarter
2017 Q3 220000 0 0 0 0 0 0
2018 Q2 0 0 0 0 0 0 540000
Q4 0 100000 0 0 0 0 0
2019 Q1 0 0 0 3000000 0 0 0
Q4 0 0 1000000 0 666000 20000 0
In [ ]:
df2.stack(level=0)#works
#level=0 is the outermost level i.e. spend
#level=1 is the next level i.e. city
#level=2 is the innermost level i.e. Focus_area
In [22]:
df2.unstack(level=1)
#level=0 is the outermost level i.e. Year
#level=1 is the next level i.e. quarter
Out[22]:
Spend
City Bengaluru Chennai Mumbai ... Pune Varanasi
Focus_area Disaster Relief Environment Education ... Environment Healthcare Vocational Skills
Quarter Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 Q1 Q2 ... Q3 Q4 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4
Year
2017 NaN NaN 220000.0 NaN NaN NaN 0.0 NaN NaN NaN ... 0.0 NaN NaN NaN 0.0 NaN NaN NaN 0.0 NaN
2018 NaN 0.0 NaN 0.0 NaN 0.0 NaN 100000.0 NaN 0.0 ... NaN 0.0 NaN 0.0 NaN 0.0 NaN 540000.0 NaN 0.0
2019 0.0 NaN NaN 0.0 0.0 NaN NaN 0.0 0.0 NaN ... NaN 666000.0 0.0 NaN NaN 20000.0 0.0 NaN NaN 0.0
3 rows × 28 columns
In [ ]:
 
-->

Tuesday, March 10, 2020

CSR Multi-index report in python using pandas

I wanted to make a report with multi-index columns and rows using python and pandas. 

The starting point was data in Excel file. The data was brought in pandas data frame.

I wanted an output report that looked like the report shown below.

But there was one condition. The condition was the quarters of the year has to be as per Indian financial year that starts on 1st April and end on 31st March the following year. 

What helped me was the below syntax.

Q-DEC specifies quarterly periods whose last quarter ends on the last day in December. 

Q-MAR specifies quarterly periods whose last quarter ends on the last day in March.

So, I chose Q-MAR. 

The data shown is about spend on Social and Environmental efforts of a company in India. A similar report has to be published by the company, as applicable, as per Indian laws. 

The report format can also be extended to report on financial numbers. The data itself is imaginary. 

Monday, March 2, 2020

CSR Reporting using Power BI - multi-index or nested columns and rows report

I started learning Power BI a few days ago. I downloaded Power BI desktop.

What I realized is I can bring together data from different Excel files or Tables using Data Model, analyze it and report it. The totals are done by Power BI.

The time I had spent on Data Models in MS Access came in handy.   

I used Power Query to bring the files together from folder.

Below is my first multi-index or nested columns and rows report. Power BI makes it easy to report in this way. In Power BI this is a matrix visual and is interactive.

The data is not real. Defining Calendar Table helped. The Calendar represents Indian financial year and quarters.

The next step for me is to expand it to add multiple years, other locations and other focus areas.

Once I have these building blocks in place, then the next challenge is to develop a dashboard.