Thursday, March 12, 2020

multi-index (column) report using pandas

multi-index-09-03-2020 <!-- Custom stylesheet, it must be in the same directory as the html file
In [1]:
import pandas as pd
import numpy as np


#1 input file
input_file = r"D:\Python Scripts\multi_index_3.xlsx"

df1=pd.read_excel(input_file)
df1['Focus_area']=df1['Focus_area'].astype('category')
table0=df1
table0
#table0 is shown below.
Out[1]:
Focus_area City As_on Spend
0 Education Mumbai 2019-01-01 1000000
1 Healthcare Pune 2019-03-31 20000
2 Environment Chennai 2018-03-31 100000
3 Disaster Relief Bengaluru 2017-12-01 220000
4 Vocational Skills Varanasi 2018-09-30 540000
5 Education Pune 2019-06-30 3000000
6 Environment Pune 2019-03-31 666000
In [67]:
df1['As_on']=pd.to_datetime(df1.As_on)
In [68]:
df1.dtypes
Out[68]:
Focus_area          category
City                  object
As_on         datetime64[ns]
Spend                  int64
dtype: object
In [69]:
df1 = df1.set_index(['Focus_area','City'])
In [70]:
df1.dtypes
Out[70]:
As_on    datetime64[ns]
Spend             int64
dtype: object
In [4]:
df1.groupby(df1['As_on'])['Spend'].sum()
Out[4]:
As_on
2017-12-01     220000
2018-03-31     100000
2018-09-30     540000
2019-01-01    1000000
2019-03-31     686000
2019-06-30    3000000
Name: Spend, dtype: int64
In [5]:
df1['Quarter'] = pd.PeriodIndex(df1['As_on'], freq='Q-MAR').strftime('Q%q')#Adding quarter to the dataframe.

Q-DEC specifies quarterly periods whose last quarter ends on the last day in December.

Q-MAR specifies quarterly periods whose last quarter ends on the last day in March.

In [ ]:
df1
In [7]:
df1['Year']=df1.As_on.dt.year#Adding Year to the dataframe.
df1
Out[7]:
Focus_area City As_on Spend Quarter Year
0 Education Mumbai 2019-01-01 1000000 Q4 2019
1 Healthcare Pune 2019-03-31 20000 Q4 2019
2 Environment Chennai 2018-03-31 100000 Q4 2018
3 Disaster Relief Bengaluru 2017-12-01 220000 Q3 2017
4 Vocational Skills Varanasi 2018-09-30 540000 Q2 2018
5 Education Pune 2019-06-30 3000000 Q1 2019
6 Environment Pune 2019-03-31 666000 Q4 2019
In [12]:
df2 = pd.pivot_table(df1,index=['City','Focus_area'],columns=['Year','Quarter'],values=['Spend'],fill_value=0)#works
#df2.query('City==["Pune","Mumbai"]')#works
In [13]:
df2
Out[13]:
Spend
Year 2017 2018 2019
Quarter Q3 Q2 Q4 Q1 Q4
City Focus_area
Bengaluru Disaster Relief 220000 0 0 0 0
Chennai Environment 0 0 100000 0 0
Mumbai Education 0 0 0 0 1000000
Pune Education 0 0 0 3000000 0
Environment 0 0 0 0 666000
Healthcare 0 0 0 0 20000
Varanasi Vocational Skills 0 540000 0 0 0
In [14]:
df2 = pd.pivot_table(df1,index=['Year','Quarter'],columns=['City','Focus_area'],values=['Spend'],fill_value=0)#works
df2
Out[14]:
Spend
City Bengaluru Chennai Mumbai Pune Varanasi
Focus_area Disaster Relief Environment Education Education Environment Healthcare Vocational Skills
Year Quarter
2017 Q3 220000 0 0 0 0 0 0
2018 Q2 0 0 0 0 0 0 540000
Q4 0 100000 0 0 0 0 0
2019 Q1 0 0 0 3000000 0 0 0
Q4 0 0 1000000 0 666000 20000 0
In [ ]:
df2.stack(level=0)#works
#level=0 is the outermost level i.e. spend
#level=1 is the next level i.e. city
#level=2 is the innermost level i.e. Focus_area
In [22]:
df2.unstack(level=1)
#level=0 is the outermost level i.e. Year
#level=1 is the next level i.e. quarter
Out[22]:
Spend
City Bengaluru Chennai Mumbai ... Pune Varanasi
Focus_area Disaster Relief Environment Education ... Environment Healthcare Vocational Skills
Quarter Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 Q1 Q2 ... Q3 Q4 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4
Year
2017 NaN NaN 220000.0 NaN NaN NaN 0.0 NaN NaN NaN ... 0.0 NaN NaN NaN 0.0 NaN NaN NaN 0.0 NaN
2018 NaN 0.0 NaN 0.0 NaN 0.0 NaN 100000.0 NaN 0.0 ... NaN 0.0 NaN 0.0 NaN 0.0 NaN 540000.0 NaN 0.0
2019 0.0 NaN NaN 0.0 0.0 NaN NaN 0.0 0.0 NaN ... NaN 666000.0 0.0 NaN NaN 20000.0 0.0 NaN NaN 0.0
3 rows × 28 columns
In [ ]:
 
-->

No comments: