Wednesday, January 29, 2020

Excel Power Query - One-to-many and the other way round


I had a list of unicorns downloaded from the internet, as shown below. The Cat1 column holds the categories.

Figure 1

So every unicorn (Company) has one or more select investors.

Step 1:

Using Power Query in Excel 2016, I filtered the list to show only the Indian unicorns (16 of them). This was a fairly simple operation.

Step 2:

Then, again using Power Query, I split the column containing the select investors (many in one cell) into multiple rows. This gave me a list pairing every Indian unicorn with each of its select investors. The total number of rows grew to 46.

Step 3:

Next I grouped the unicorn companies by select investor to find all the Indian unicorns each investor has backed. For that I had to write a small custom function to combine all the unicorns into one list against each investor in the select investor column. The list was placed in a column named Custom. Now we know each investor and the Indian unicorns it is investing in, as shown below.

Figure 2

To conclude: we started with each unicorn company and its multiple investing firms, and we now have each investing firm and the multiple unicorn companies it is investing in.

This kind of requirement comes up often in different contexts. I am new to Power Query and am finding it useful, as I could do the above in minutes. Next I want to produce similar output using Python.
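As a rough idea of how the three steps might look in pandas, here is a minimal sketch. It is not the Power Query solution itself. The column names ("Company", "Country", "Select Investors") and all company and investor names are placeholders I made up for illustration, and it assumes the investors in a cell are separated by commas.

```python
import pandas as pd

# Placeholder data: names and column headings are invented for illustration.
unicorns = pd.DataFrame({
    "Company": ["Alpha", "Beta", "Gamma"],
    "Country": ["India", "India", "United States"],
    "Select Investors": ["Fund X, Fund Y", "Fund Y, Fund Z", "Fund X"],
})

# Step 1: keep only the Indian companies.
indian = unicorns[unicorns["Country"] == "India"]

# Step 2: split the comma-separated investors (assumed separator) and
# give each investor its own row.
long_df = indian.assign(
    Investor=indian["Select Investors"].str.split(",")
).explode("Investor")
long_df["Investor"] = long_df["Investor"].str.strip()

# Step 3: group by investor and join the company names into one cell.
by_investor = (
    long_df.groupby("Investor")["Company"]
    .agg(", ".join)
    .reset_index(name="Companies")
)

print(by_investor)
```

Here `explode` plays the role of the split-into-rows step, and `groupby` with a join stands in for the custom combining function.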


Monday, January 20, 2020

ET top 25 - further analysis

Further to the blog post below, I have added a few more plots.

A) I added a column to the dataframe to indicate whether the company belongs to the public (pub) sector or the private (pvt) sector. The plot below (similar to the one in the earlier blog post) shows the companies by type.

B) I also wanted to study the distribution, so I added a histogram. It shows that most of the companies have revenue below Rs 2,00,000 Cr (USD 28.5 billion).

C) I added one more calculated column to the dataframe to show profit as a percentage of revenue, plotted in the graph below. The companies with high profit percentages are in the private sector: two are IT companies, namely TCS (the only company with a profit ratio of 20% or more) and Infosys, and the other two are banking firms, namely HDFC Bank and HDFC. The top five companies by revenue are not that profitable by the ratio of profit to revenue, and most of them are public sector companies.

The data was analyzed using Python and pandas and plotted using the seaborn library.
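The two added columns can be sketched roughly as follows. This is a minimal illustration, not my exact code: the column names (company, revenue, pat, sector, profit_pct), the lookup dictionary and all the figures are placeholders, not the ET500 numbers.

```python
import pandas as pd

# Placeholder figures in Rs Cr; these are NOT the actual ET500 values.
df = pd.DataFrame({
    "company": ["Company A", "Company B", "Company C"],
    "revenue": [400000.0, 150000.0, 50000.0],
    "pat": [20000.0, 30000.0, 5000.0],
})

# Hypothetical lookup from company name to sector label (pub / pvt).
sector_by_company = {
    "Company A": "pub",
    "Company B": "pvt",
    "Company C": "pvt",
}
df["sector"] = df["company"].map(sector_by_company)

# Calculated column: profit (PAT) as a percentage of revenue.
df["profit_pct"] = (df["pat"] / df["revenue"] * 100).round(1)

print(df)
```

A column such as sector can then be passed as the hue argument of a seaborn plot to colour the points by company type.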

Sunday, January 19, 2020

ET500-Raw data to pandas dataframe to charts

Recently the Economic Times published the ET500, its list of the top 500 companies in India.

I copied the data for the top 25 companies from the website. It came out as continuous text and looked like this.

There were no delimiters. Using Python, I imported the file and, using the re module, cleaned and separated the elements in each line. I then loaded these into a pandas dataframe. It looked like this.
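To give an idea of the kind of re-based splitting involved, here is a minimal sketch. The line layout is an assumption made for illustration only: a rank, a company name, then revenue and PAT each written with two decimals, all run together. The real copied text may well differ, and the sample lines and the parse_line helper are hypothetical.

```python
import re

# Assumed (hypothetical) layout: rank, name, revenue, PAT with no delimiters,
# e.g. "1Alpha Industries123456.7812345.67".
LINE_PATTERN = re.compile(
    r"^(\d+)"                 # rank
    r"([A-Za-z][A-Za-z .&]*)" # company name (letters, spaces, '.', '&')
    r"(\d+\.\d{2})"           # revenue, assumed to carry two decimals
    r"(\d+\.\d{2})$"          # PAT, assumed to carry two decimals
)


def parse_line(line):
    """Split one undelimited line into a dict of typed fields."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    rank, company, revenue, pat = match.groups()
    return {
        "rank": int(rank),
        "company": company.strip(),
        "revenue": float(revenue),
        "pat": float(pat),
    }


# Made-up sample lines, not the ET500 data.
raw_lines = [
    "1Alpha Industries123456.7812345.67",
    "2Beta Bank98765.439876.54",
]
rows = [parse_line(line) for line in raw_lines]
print(rows)
```

A list of dicts like rows can be passed straight to pandas.DataFrame(rows) to get one column per field.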


The next step was to plot it. The plot below uses the seaborn library.


In this plot, the top profit-making companies (Rs 30,000 Cr and above) are Reliance, ONGC and TCS. The next bracket, Rs 20,000 Cr and above but below Rs 30,000 Cr, has HDFC Bank. The bracket of Rs 15,000 Cr and above but below Rs 20,000 Cr has Indian Oil, HDFC and Infosys.

Up to a PAT of Rs 10,000 Cr, revenue and PAT appear to move together. Above that PAT level, revenue levels vary widely.

In my next blog post I will do more analysis and visualization.