Sunday, December 26, 2021

How a chart can be made interactive using Altair (Vega-Lite), Python and Pandas.

Interactive Altair Vega-Lite Plot using Python and Pandas

Milk Production over years

Click a country name to see its chart.

MMT stands for million metric tonnes.

India is #1, followed by the USA.


The interactive Chart


The code is available on my GitHub account.

Wednesday, December 15, 2021

Indian_Unicorns_Plots

Indian Unicorns


The data was downloaded from the web and cleaned using Excel Power Query.

The total number of unicorns is 51.


The total valuation of the unicorns is USD 148.45 billion.


The total valuation of these unicorns is more than the combined market capitalisation of Infosys and Wipro.


1) Valuation plotted by city and, within each city, by industry


[Chart: valuation ($B, axis 0.0 to 25.4) by city (Bengaluru, Gurugram, Mumbai, New Delhi, Pune, Noida, Faridabad, Thane, Chennai, Jaipur), with each city broken down by industry]

2) Count of Unicorns per industry per city


The size of the circle is proportional to valuation.

[Chart: number of unicorns per industry in each city, with circle size showing summed valuation ($B)]

Credit: app.rawgraphs.io for graphics and CB Insights for data

Saturday, December 11, 2021

KeralaDay3

Kerala Day 3

Periyar Tiger Reserve

We started from Munnar. We came down one mountain range and climbed up another. I did not note down the geolocations on the way down, so those are missing. While we did not see tigers or elephants, we did see the following: Great cormorant, Woolly-necked stork, Sambar deer, Bengal monitor, White-throated kingfisher, Brahminy Kite and Gaur.
Please see the photos on my Facebook page.

The path taken



The above interactive visualization was made using Python and Folium. I had wanted to make a visualization with Folium for the last two years!
Folium in turn uses the Leaflet library to render the web-page visualisation. With Folium I do not have to worry about the HTML, CSS and JavaScript code! Python also lets me hold my data in pandas, which makes manipulating the data convenient. Customizing markers and icons is easy with the pandas and Folium libraries together.

Folium was not part of the standard Anaconda distribution, so it had to be installed separately. First I created a separate environment for Folium (folium_project) by cloning the base environment. The command was
$ conda create --name folium_project --clone base
Then, with that environment activated ($ conda activate folium_project), I installed Folium at the Anaconda prompt with the following command.
$ conda install folium -c conda-forge

Folium lets one leverage the power of Leaflet by writing just a few lines of code. The web-page visualization, including its code, is on my GitHub account.

Friday, December 3, 2021

Indian Unicorns 2021

The Indian unicorns are a success story.
I have captured its essence in the dashboard below.

Bangalore City

and

Fintech Industry

lead the show.

The Unicorn Dashboard made using Tableau

Sunday, November 28, 2021

Milk_Production

Worldwide Milk Production 1970 - 2018


The bar race chart shows the milk production of the top 15 countries from 1970 to 2018. Press the play button at the bottom right to start the animation.

World Milk Day

National Milk Day - India

Background

National Milk Day was celebrated in India on 26th November 2021 in memory of the father of the White Revolution, Dr. Verghese Kurien. In 2014, the Indian Dairy Association (IDA) took the initiative to celebrate National Milk Day in India. In 1965, Prime Minister Lal Bahadur Shastri gave Dr. Kurien the task of creating the National Dairy Development Board (NDDB).

Today

India has come a long way to become the world's largest milk producer. In the 70s, the then chief minister of Maharashtra State predicted that milk would one day be available through taps, like tap water. India has achieved what he meant: making milk available to all, whenever needed. I have also plotted the "10 Largest Milk Producing Countries 2020". India was at the top, with 196.18 million tonnes of milk produced in 2019.

Sunday, November 14, 2021

Kerala_Tour_Day2

Kerala Tour

Munnar

Munnar rises at the confluence of three mountain streams: Mudrapuzha, Nallathanni and Kundala. At 1,600 m above sea level, this hill station was once the summer resort of the erstwhile British Government in South India. Sprawling tea plantations, picturesque towns, winding lanes and holiday facilities make it a popular resort town. Among the exotic flora found in the forests and grasslands here is the Neelakurinji. This flower, which bathes the hills in blue once every twelve years, will next bloom in 2030. Munnar also has the highest peak in South India, Anamudi, which rises to 2,695 m. The stay in Munnar was blissful.

Mattupetty

Another prime draw for visitors, located about 13 km from Munnar Town, is Mattupetty. Situated at a height of 1700 m above sea level, Mattupetty is known for its storage masonry dam and its beautiful lake, which offers pleasant boat rides with views of the surrounding hills and landscape. Mattupetty's fame is also attributed to the dairy farm run by the Indo-Swiss Livestock Project, where one can see different high-yielding breeds of cows. We took a speedboat ride, which was exhilarating.

Echo Point

About 15 km from Munnar lies the famous Echo Point. Popular for its natural echo phenomenon, the place is full of eager visitors throughout the year. Set at an altitude of about 600 ft, it is surrounded by lush greenery where people love to take a walk. It is an ideal picnic spot. People usually come across Echo Point while making the trip to Top Station, the highest point (1700 m) in Munnar, on the Munnar-Kodaikanal road. Here the clouds seem just an arm’s length away, and one gets a brilliant view of the valley below. This is also a prime viewing spot for the Neelakurinji (Strobilanthes kunthianus), flowers which bloom once every twelve years. We bought handmade soaps and chocolates.

Eravikulam National Park

One of the main attractions near Munnar is the Eravikulam National Park. This park is famous for its endangered inhabitant: the Nilgiri Tahr. Spread over an area of 97 sq. km, the park is also home to several species of rare butterflies, animals and birds. A great place for trekking, the park offers a magnificent view of the tea plantations caressed by blankets of mist. The park becomes a hot destination when its hill slopes are covered in a carpet of blue from the flowering of the Neelakurinji, a plant endemic to this part of the Western Ghats which blooms once in twelve years. We walked to the highest point possible. We did see the Nilgiri Tahr, and the view of the tea plantations under blankets of mist was as magnificent as promised. The mighty waterfall was breathtaking.
Please visit the link for more information

The Interactive Map

Saturday, November 13, 2021

Kerala_Tour_Day1

Our Kerala Tour Day1

During the Diwali holidays we traveled from Mumbai to Kerala. We visited the following places on Day 1, and all of them were worth visiting.

Marine Drive

Marine Drive is a picturesque promenade in Kochi, India. It is built facing the backwaters, and is a popular hangout for the local populace. Despite its name, no vehicles are allowed on the walkway. From Marine Drive we took a Harbour Cruise Ride to visit the island on the other side.

Chinese fishing nets

Chinese fishing nets (Cheena vala) are a type of stationary lift net in India. They are fixed, land-based installations for fishing.

Dutch Palace

The Mattancherry Palace, popularly known as the Dutch Palace, is a Portuguese palace in Mattancherry, Kochi, in the Indian state of Kerala. It features Kerala murals depicting portraits and exhibits of the Rajas of Kochi. The palace was included in the "tentative list" of UNESCO World Heritage Sites. More information on it is available at the Dutch Palace

Jewish Synagogue

The Paradesi Synagogue, in a corner of Jew Town, is more than four hundred years old and houses many rare antiques. The synagogue, which draws many visitors, adds to the quaint charm of Mattancherry. It was built in 1568, almost 1500 years after the beginning of the Jewish connection with Kerala, on land adjacent to the Mattancherry Palace given by the erstwhile king of Cochin.

St Francis Church

The St. Francis Church, well-known for its beautiful architecture and ambience, is believed to be one of the oldest churches built by the Europeans in India. The church’s history dates back to 1503. More information on it is available at St Francis Church

Below is the interactive map of the above locations.

The interactive map was made using Leaflet, the leading open-source JavaScript library for mobile-friendly interactive maps.

The code is available on my GitHub account. The map is available on my GitHub Page.

Monday, November 1, 2021

Data Organisation in spreadsheets : Part-III

Data Organisation in spreadsheets part III

Further to my earlier post, here is the last post on the topic.

-Make it a Rectangle

The best layout for your data within a spreadsheet is as a single big rectangle with rows corresponding to subjects and columns corresponding to variables.


The first row should contain the variable names, and please do not use more than one row for them.
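If the sheet is exported to CSV, you can quickly check that it really is one rectangle with a single header row: every row should have exactly as many fields as the header. Below is a minimal sketch using only Python's standard csv module. The function name is_rectangular and the sample data are hypothetical and for illustration only.

```python
import csv
import io


def is_rectangular(text: str) -> bool:
    """Return True if every row has the same number of fields as the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return False
    width = len(rows[0])  # the single header row defines the width of the rectangle
    return all(len(row) == width for row in rows[1:])


# Hypothetical sample data: the second table has a short row.
good = "employee_id,city,mode\nE1,Mumbai,public\nE2,Pune,private\n"
bad = "employee_id,city,mode\nE1,Mumbai\n"
print(is_rectangular(good))  # True
print(is_rectangular(bad))   # False
```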


-Create a Data Dictionary 

It is helpful to have a separate file that explains what each of the variables is. It also helps if this file is itself laid out in rectangular form, so that the data analyst can make use of it in analyses.


Such a “data dictionary” might contain:

  • The exact variable name as in the data file
  • A version of the variable name that might be used in data visualizations
  • A longer explanation of what the variable means
  • The measurement units
  • Expected minimum and maximum values
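A data dictionary kept as a rectangle can be read by code as easily as the data itself. The sketch below assumes a hypothetical dictionary with the columns listed above (variable, display_name, description, units, min, max). It uses the expected minimum and maximum to flag suspicious values. The variable names and ranges are made up.

```python
import csv
import io

# Hypothetical data dictionary, itself stored as a rectangle: one row per variable.
DICTIONARY_CSV = """variable,display_name,description,units,min,max
commute_km,Commute distance,One-way distance from home to office,km,0,100
commute_min,Commute time,One-way travel time,minutes,0,240
"""


def load_ranges(text):
    """Map each variable name to its expected (min, max) range."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["variable"]: (float(row["min"]), float(row["max"])) for row in reader}


def out_of_range(record, ranges):
    """Return the names of variables whose values fall outside the expected range."""
    flagged = []
    for name, (low, high) in ranges.items():
        if not low <= float(record[name]) <= high:
            flagged.append(name)
    return flagged


ranges = load_ranges(DICTIONARY_CSV)
print(out_of_range({"commute_km": "12.5", "commute_min": "45"}, ranges))  # []
print(out_of_range({"commute_km": "1250", "commute_min": "45"}, ranges))  # ['commute_km']
```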

-No Calculations in the Raw Data Files

Your primary data file should contain just the data and nothing else: no calculations, no graphs.


-Do Not Use Font Color or Highlighting as Data

The logic behind the colours will not be clear to the person analyzing the data. Instead, add a column with a comment on the value.


-Make Backups

Make regular backups in different locations.


-Use Data Validation to Avoid Errors

It might seem cumbersome, but it will help you avoid data-entry mistakes, so it is worth it.


-Save the data in Plain Text Files

Keep a copy of your data files in plain text format, with comma or tab delimiters.
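In Python, the standard csv module writes both forms, and only the delimiter changes. Here is a minimal sketch with hypothetical rows. It writes to an in-memory buffer so the result is easy to inspect. The helper name to_delimited is illustrative.

```python
import csv
import io

# Hypothetical rows: a single header row followed by data rows.
rows = [
    ["employee_id", "city", "mode"],
    ["E1", "Mumbai", "public"],
    ["E2", "Pune", "private"],
]


def to_delimited(rows, delimiter=","):
    """Serialise rows as plain text: ',' gives CSV, a tab character gives TSV."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


csv_text = to_delimited(rows)        # comma-delimited
tsv_text = to_delimited(rows, "\t")  # tab-delimited
print(csv_text)
```

To write to disk instead, open the file with open(path, "w", newline="") and pass it to csv.writer. The csv documentation recommends newline="" for files used with this module.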


- Courtesy: Data Organization in Spreadsheets, by Karl W. Broman & Kara H. Woo


Saturday, October 30, 2021

Organising data in Spreadsheets : Part-II

Data Organization in Spreadsheets (Part-II)

Continuing from my earlier post (link below), I am adding a few more points in this post.

- Choose Good Names for Things.


It is important to pick good names for things. This can be hard, and so it is worth putting some time and thought into it. 

As a general rule, do not use spaces, either in variable names or file names. They make programming harder: the analyst will need to surround everything in double quotes, like “Health Care”, rather than just writing Health_Care. Where you might use spaces, use underscores or perhaps hyphens; pick one and be consistent. 

Avoid special characters, except for underscores and hyphens. Other symbols ($, @, %, #, &, *, (, ), !, /, etc.) often have special meaning in programming languages, and so they can be harder to handle. They are also a bit harder to type. 
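The two rules above (no spaces, no special characters) are easy to check, or enforce, in code. Here is a minimal sketch in Python. is_good_name and to_safe_name are hypothetical helper names. The allowed set (letters, digits, underscore, hyphen) follows the rule stated above.

```python
import re

# Letters, digits, underscores and hyphens only: no spaces, no $ @ % # & * ( ) ! / etc.
ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def is_good_name(name: str) -> bool:
    """Return True if the whole name uses only the allowed characters."""
    return ALLOWED.fullmatch(name) is not None


def to_safe_name(name: str) -> str:
    """Turn spaces into underscores, drop other special characters, trim stray underscores."""
    name = re.sub(r"\s+", "_", name.strip())
    name = re.sub(r"[^A-Za-z0-9_-]", "", name)
    return name.strip("_")


print(is_good_name("Health Care"))  # False
print(to_safe_name("Health Care"))  # Health_Care
print(to_safe_name("Cost ($)"))     # Cost
```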

The main principle in choosing names, whether for variables or for file names, is to keep them short but meaningful, so not too short. Finally, never include “final” in a file name. You will invariably end up with “final_ver2.” 

- Write Dates as YYYY-MM-DD.


When entering dates, please consider using the global “ISO 8601” standard, YYYY-MM-DD, such as 2013-02-27. Otherwise, at least be consistent with the date format for your region.
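One practical advantage of YYYY-MM-DD is that such dates sort correctly even as plain text. Python can also validate them with the standard datetime module. Here is a minimal sketch. is_iso_date is a hypothetical helper name.

```python
from datetime import date


def is_iso_date(text: str) -> bool:
    """Return True only for a real calendar date written exactly as YYYY-MM-DD."""
    try:
        parsed = date.fromisoformat(text)
    except ValueError:
        return False
    # Newer Python versions also accept forms such as "20130227"; the round trip
    # insists on the canonical YYYY-MM-DD spelling.
    return parsed.isoformat() == text


print(is_iso_date("2013-02-27"))  # True
print(is_iso_date("27/02/2013"))  # False
print(is_iso_date("2013-02-30"))  # False (no such day)

# ISO dates sort chronologically even as plain strings.
print(sorted(["2013-02-27", "2012-12-31", "2013-01-05"]))
# ['2012-12-31', '2013-01-05', '2013-02-27']
```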

- No Empty Cells.


Fill in all cells. Use some common code to fill missing data.
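For data that is already in Python, a blank-cell pass might look like the sketch below. It assumes "NA" as the common code. That is a hypothetical choice: any fixed code works as long as it is used consistently. The sketch also treats whitespace-only cells as blank.

```python
MISSING = "NA"  # hypothetical missing-value code; pick one and use it everywhere


def fill_missing(rows, code=MISSING):
    """Replace empty or whitespace-only cells with a fixed missing-value code."""
    return [[cell if cell.strip() else code for cell in row] for row in rows]


# Hypothetical rows with one empty cell and one cell holding only a space.
rows = [["E1", "Mumbai", ""], ["E2", " ", "private"]]
print(fill_missing(rows))  # [['E1', 'Mumbai', 'NA'], ['E2', 'NA', 'private']]
```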

- Put Just One Thing in a Cell.


The cells in your spreadsheet should each contain one piece of data. Do not put more than one thing in a cell. For example, do not write an employee's name and ID in the same cell. 
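If a sheet already mixes two things in one cell, you can often split them in code. The sketch below assumes a hypothetical combined format "Name (ID)", such as "Asha Rao (E1042)". The pattern would need adjusting for whatever format the sheet actually uses. The name split_name_id is illustrative.

```python
import re

# Hypothetical combined format: "Asha Rao (E1042)" -> name and ID in separate columns.
COMBINED = re.compile(r"\s*(?P<name>.+?)\s*\((?P<emp_id>[^()]+)\)\s*")


def split_name_id(cell: str) -> tuple[str, str]:
    """Split a 'Name (ID)' cell into (name, id); raise if the cell has another format."""
    match = COMBINED.fullmatch(cell)
    if match is None:
        raise ValueError(f"unexpected format: {cell!r}")
    return match.group("name"), match.group("emp_id")


print(split_name_id("Asha Rao (E1042)"))  # ('Asha Rao', 'E1042')
```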

Finally, do not merge cells.

It might look pretty, but you end up breaking the rule of no empty cells. Also, it is not clear how to divide the number in a merged cell into its constituent cells.

Further reading: for the third and final part, refer to the link given below.


Tuesday, October 26, 2021

Organizing data in Spreadsheets : Part-I

 Data Organisation in Spreadsheets (Part-I)

1. About Spreadsheets in general 

  • Congratulations on joining this journey of making better spreadsheets.
  • Spreadsheets are easy to use, and that creates its own challenges.
  • Is the Spreadsheet formatted for Human Eyes?
  • Is the Spreadsheet formatted for a Computer? 
  • Is the Spreadsheet formatted for Both?

2. The answer is: Be Consistent…

-The first rule of data organization is: be consistent. 

  • Whatever you do, do it consistently. Entering and organizing your data in a consistent way from the start will prevent you and your collaborators from having to spend time harmonizing the data later.
  • Use consistent codes for categorical variables. For a categorical variable like the mode of transport in a study of daily office commuters, use a single common value for private transport (e.g., “private”), and a single common value for public transport (e.g., “public”).
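When inconsistent variants already exist, a small lookup table can map them to the single common codes. Here is a minimal sketch. The variant spellings in CANONICAL are hypothetical examples. Unknown values raise an error rather than being guessed.

```python
# Hypothetical mapping from the variants people actually type to one code per category.
CANONICAL = {
    "private": "private", "personal car": "private", "own car": "private",
    "public": "public", "bus": "public", "local train": "public",
}


def recode(value: str) -> str:
    """Map a typed transport mode to 'private' or 'public', ignoring case and outer spaces."""
    key = value.strip().lower()
    if key not in CANONICAL:
        raise ValueError(f"unknown transport mode: {value!r}")
    return CANONICAL[key]


entries = ["Private", "Bus", " Local Train", "public"]
print([recode(v) for v in entries])  # ['private', 'public', 'public', 'public']
```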

-Use a consistent fixed code for any missing values. 

  • Please have every cell filled in, so that one can distinguish between truly missing values and unintentionally missing values.
  • You could, for example, use a hyphen (-) as that code.

-Use consistent variable names. 

  • If in one file (e.g., the first batch of subjects), you have a variable called “Public_Travel,” then call it exactly that in other files.

-Use a consistent data layout in multiple files. 

  • If your data are in multiple files and you use different layouts in different files, it will be extra work for the analyst to combine the files into one dataset for analysis.
  • With a consistent structure, it will be easy to automate this process.
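With a consistent header, you can automate combining batches in a few lines. Here is a minimal sketch using the standard csv module. The combine function and the two sample batches are hypothetical. If a file's layout differs, the sketch stops with an error instead of silently mixing columns.

```python
import csv
import io


def combine(texts):
    """Stack CSV texts into one table; each must start with the same header row."""
    header, body = None, []
    for text in texts:
        rows = list(csv.reader(io.StringIO(text)))  # each text is assumed to have a header
        if header is None:
            header = rows[0]
        elif rows[0] != header:
            raise ValueError(f"layout differs: {rows[0]} != {header}")
        body.extend(rows[1:])
    return [header, *body] if header is not None else []


# Hypothetical batches sharing one layout.
batch1 = "employee_id,mode\nE1,public\n"
batch2 = "employee_id,mode\nE2,private\n"
print(combine([batch1, batch2]))
# [['employee_id', 'mode'], ['E1', 'public'], ['E2', 'private']]
```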

-Use consistent file names. 

  • Have some system for naming files. If one file is called “Travel_batch1_2021-01-31.csv,” then do not call the file for the next batch “Travel2.csv”; rather, use “Travel_batch2_2021-02-28.csv.”
  • Keeping a consistent file naming scheme will help ensure that your files remain well organized, and it will make it easier to batch process the files if you need to.
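You can check a consistent naming scheme and batch process the files with a regular expression. Here is a minimal sketch based on the example names above. batch_files is a hypothetical helper. In practice the names would come from something like pathlib.Path("data").glob("*.csv"), where "data" is a hypothetical folder.

```python
import re

# Naming scheme from the example above: Travel_batch<N>_<YYYY-MM-DD>.csv
NAME_PATTERN = re.compile(r"Travel_batch(\d+)_(\d{4}-\d{2}-\d{2})\.csv")


def batch_files(names):
    """Keep only names that follow the scheme, ordered by batch number."""
    matched = []
    for name in names:
        m = NAME_PATTERN.fullmatch(name)
        if m:
            matched.append((int(m.group(1)), name))
        # Names such as "Travel2.csv" do not match and are left out.
    return [name for _, name in sorted(matched)]


names = ["Travel_batch2_2021-02-28.csv", "Travel2.csv", "Travel_batch1_2021-01-31.csv"]
print(batch_files(names))
# ['Travel_batch1_2021-01-31.csv', 'Travel_batch2_2021-02-28.csv']
```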

-Use a consistent format for all dates.

  • A format could be YYYY-MM-DD, for example, 2015-08-01. Or follow your local way of formatting dates.
  • If sometimes you write 8/1/2015 and sometimes 8-1-15, it will be more difficult to use the dates in analyses or data visualizations.
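You can convert mixed formats to YYYY-MM-DD, but only after deciding how to read ambiguous dates. The sketch below assumes, hypothetically, that the sheet uses month/day order, so 8/1/2015 and 8-1-15 both mean 1 August 2015. That matches the 2015-08-01 example above. KNOWN_FORMATS and to_iso are illustrative names.

```python
from datetime import datetime

# Formats assumed (hypothetically) to appear in the sheet, in month/day order.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%m-%d-%y"]


def to_iso(text: str) -> str:
    """Convert a date written in any of KNOWN_FORMATS to YYYY-MM-DD."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognised date: {text!r}")


print([to_iso(d) for d in ["2015-08-01", "8/1/2015", "8-1-15"]])
# ['2015-08-01', '2015-08-01', '2015-08-01']
```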

-Use consistent phrases in your notes. 

  • If you have a separate column of notes (e.g., “Personal Car” or “Bus”), be consistent in what you write. Do not sometimes write “Personal Car” and sometimes “Personal car,” or sometimes “Local Train” and sometimes “Local” or “Train”.

-Be careful about extra spaces within cells. 

  • A blank cell is different from a cell that contains a single space. Likewise, “Train” is different from “ Train ” (with spaces at the beginning and end), and from “ Train” or “Train ” (with a space at the beginning or at the end, respectively).
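Stray spaces are invisible in a spreadsheet but easy to find in code. Here is a minimal sketch in Python with hypothetical rows. It flags any cell whose value changes when the surrounding whitespace is stripped, and then strips those cells.

```python
def stray_space_cells(rows):
    """Yield (row, column, value) for cells with leading/trailing whitespace."""
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            if cell != cell.strip():
                yield r, c, cell


# Hypothetical single-column sheet: header, a clean value, and three problem cells.
rows = [["mode"], ["Train"], [" Train "], ["Train "], [" "]]
print(list(stray_space_cells(rows)))
# [(2, 0, ' Train '), (3, 0, 'Train '), (4, 0, ' ')]

cleaned = [[cell.strip() for cell in row] for row in rows]
```

Stripping turns a cell containing a single space into a truly empty cell, which should then get the missing-value code.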

3. Further reading:

The links for the other two parts of this post.


Part-II


Part II post.


Part-III


Part III post.

Saturday, October 23, 2021

Indian Unicorns

Indian Unicorns grouped with select top industries and total valuations

EduTech can become a leader in the world.
Data downloaded from the web, cleaned and analysed using Google Sheets.