Showing posts with label pandas. Show all posts
Showing posts with label pandas. Show all posts

Monday, August 15, 2022

Ramsar Sites in India

India, on its 75th year of Independence (15th August), achieved a milestone of 75 Ramsar sites. 


It recently added 11 more wetlands to the list of Ramsar sites to make total 75 Ramsar sites covering an area of 13 lakh 26 thousand 677 Hectare in the country. 


I have created an interactive map. Kindly visit the map at https://andybandra.github.io/ramsar_india/ You can see and click the map shown below.


On the map you may click an icon to get the name and state of a site, and its area in hectares.  


India is one of the Contracting Parties to Ramsar Convention, signed in Ramsar, Iran in 1971. India signed it on 1st February 1982. These are wetlands deemed to be of "international importance" under the Ramsar Convention.


What is RAMSAR Wetland Sites

The Ramsar Convention is an international treaty for the conservation and sustainable utilization of wetlands, recognizing the fundamental ecological functions of wetlands and their economic, cultural, scientific, and recreational value.


Tamil Nadu has the highest number of Ramsar Sites in India with 14 Ramsar Sites. The largest site is marked in blue icon. It is the Sundarbans National Park a national parktiger reserve and biosphere reserve in West Bengal,


The list is available at https://en.wikipedia.org/wiki/List_of_Ramsar_sites_in_India.



75 Ramsar India Sites

Saturday, December 11, 2021

KeralaDay3

Kerala Day 3

Periyar Tiger Reserve

We started from Munnar. We came down a moutain range and climed up another. I did not note down the geo locations while coming down and hence those are missing. While we did not see tigers and elephants we did see the following. Great cormorant, Woolly-necked stork, Sambar deer, Bengal monitor, White-throated kingfisher, Brahminy Kite and Gaur.
Please see the photos my facebook.

The path taken



The above interactive visualization was made using Python and Folium. I wanted to make a visualization with Folium for last 2 years!
The folium library in turn uses the leaflet library to render the web-page visualisation. Using Folium I do not have to worry about the HTML, CSS, Javascript code! Plus Python allows me to hold my data in pandas and manupulation of the data becomes convenient. Customizing markers and icons becomes easy using both the pandas and folium libraries.

Folium was not part of the standard Anaconda distribution. It had to be downloaded by using the following command.
$ conda install folium -c conda-forge
at the Anaconda promo. But before that I created a separate environment for folium (folium_project) by closing the base environment. The command was
$ conda create --name folium_project --clone base

Folium allows one to leverage the power of leaflet by just writing a few lines of code. The web-page visualization including its code is on my github account.

Monday, October 26, 2020

Economic Times Top 10 companies

ET_Top_10_WS1

To scrape the web,

to get the table,

of Economic Times

2019 Top 10 Companies

  • Let us download,
  • the necessary libraries.
In [43]:
from bs4 import BeautifulSoup
from urllib.request import urlopen, Request
import lxml
import pandas as pd
from tabulate import tabulate
  • The link address is:link

Using URL open and
reading the URL using BeautifulSoup failed.
The error was
HTTP Error: Forbidden

  • So I tried the following.
In [ ]:
site= "https://economictimes.indiatimes.com/et500"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site,headers=hdr)
page = urlopen(req)
soup = BeautifulSoup(page)
  • Let us test

soup.title

The output was:

<title>ET 500 Company List - List of Top Companies in India 2019 | Economic Times ET500</title>

  • It worked.

*Find table rows

In [ ]:
rows = soup.find_all('tr')
print(rows[:3]) 

[<tr class="table_head"><th colspan="2">Rank</th><th rowspan="2">Company Name</th><th rowspan="2">Market Cap <br/> (Rs. Cr)</th><th rowspan="2">Revenue <br/> Change%</th><th rowspan="2">PAT <br/> Change%</th></tr>, <tr class="table_head"><th>2019</th><th>2018</th></tr>, <tr class="data light"><td class="Rnk1 textC">1</td><td class="Rnk2 textC">2</td><td class="comp"><a class="etcmpnylist" href="/stocks/companyid-13215.cms" target="_blank" title="Reliance Industries Ltd.">Reliance Industries Ltd.</a></td><td class="textR">949,280.41</td><td class="textR">43.36</td><td class="textR">9.74</td></tr>]

  • From Table rows let us look at table data items
In [ ]:
list_rows = []
for row in rows:
    row_td = row.find_all('td')
    str_cells = str(row_td)
    cleantext = BeautifulSoup(str_cells,'lxml').getText()
    list_rows.append(cleantext) # WORKS

list_rows

['[]',
 '[]',
 '[1, 2, Reliance Industries Ltd., 949,280.41, 43.36, 9.74]',
 '[2, 1, Indian Oil Corporation Ltd., 125,768.47, 24.99, -21.69]',
 '[3, 3, Oil And Natural Gas Corporation Ltd., 172,135.97, 29.97, 37.95]',
 '[4, 4, State Bank of India, 290,140.98, 7.88, LP]',
 '[5, 5, Tata Motors Ltd., 48,588.33, 2.06, PL]',
 '[6, 6, Bharat Petroleum Corporation Ltd., 111,550.97, 26.10, -13.39]',
 '[7, 7, Hindustan Petroleum Corporation Ltd., 45,237.26, 24.94, -7.31]',
 '[8, 8, Rajesh Exports Ltd., 20,200.00, -6.35, 2.08]',
 '[9, 9, Tata Steel Ltd., 45,758.52, 15.42, -23.94]',
 '[10, 10, Coal India Ltd., 126,223.50, 14.04, 148.09]']
  • Now let us prpare the dataframe
In [ ]:
df = pd.DataFrame(list_rows)
In [ ]:
df
In [ ]:
df1 = df.loc[2:]
In [ ]:
df2 = df1[0].str.split(',',expand=True)
In [ ]:
df2[0] = df2[0].str.strip('[')
df2[6] = df2[6].str.strip(']')
print(df2.head(3))
In [ ]:
df2[7]=df2[3]+df2[4] # Join 
df3 = df2.drop([3,4],axis=1)
df3 = df3[[0,1,2,7,5,6]] 

The column containing market capitalisation had a ',' in its numbers.

While splitting the data frame df1, the column with market capitalization also split inadvertently.

The number was split in to two columns 3 & 4.

So the two columns were joined in col no 7.

Later dropped columns 3 and 4. And rearranged columns sequence in df3.

In [ ]:
df3.head(3)
In [ ]:
df3_md = df3.to_markdown()
In [42]:
print(df3_md)
|    |   0 |   1 | 2                                    |        7 |     5 | 6      |
|---:|----:|----:|:-------------------------------------|---------:|------:|:-------|
|  2 |   1 |   2 | Reliance Industries Ltd.             | 949280   | 43.36 | 9.74   |
|  3 |   2 |   1 | Indian Oil Corporation Ltd.          | 125768   | 24.99 | -21.69 |
|  4 |   3 |   3 | Oil And Natural Gas Corporation Ltd. | 172136   | 29.97 | 37.95  |
|  5 |   4 |   4 | State Bank of India                  | 290141   |  7.88 | LP     |
|  6 |   5 |   5 | Tata Motors Ltd.                     |  48588.3 |  2.06 | PL     |
|  7 |   6 |   6 | Bharat Petroleum Corporation Ltd.    | 111551   | 26.1  | -13.39 |
|  8 |   7 |   7 | Hindustan Petroleum Corporation Ltd. |  45237.3 | 24.94 | -7.31  |
|  9 |   8 |   8 | Rajesh Exports Ltd.                  |  20200   | -6.35 | 2.08   |
| 10 |   9 |   9 | Tata Steel Ltd.                      |  45758.5 | 15.42 | -23.94 |
| 11 |  10 |  10 | Coal India Ltd.                      | 126224   | 14.04 | 148.09 |

Columns to be Renamed

In [ ]:
df3.rename(columns = {0:'Rank2019',
           1:'Rank2018',
           2:' Company Name',
           7:' Market Cap  (Rs. Cr)',
           5:' Revenue  Change%',
           6:' PAT  Change%'},
           inplace = True)
In [40]:
df3.head(3)
Out[40]:
Rank2019 Rank2018 Company Name Market Cap (Rs. Cr) Revenue Change% PAT Change%
2 1 2 Reliance Industries Ltd. 949280.41 43.36 9.74
3 2 1 Indian Oil Corporation Ltd. 125768.47 24.99 -21.69
4 3 3 Oil And Natural Gas Corporation Ltd. 172135.97 29.97 37.95

Now let us get the table header.

In [ ]:
col_labels = soup.find_all('th')
all_header = []
col_str = str(col_labels)
cleantext2 = BeautifulSoup(col_str,'lxml').get_text()
all_header.append(cleantext2)
In [ ]:
dfh1 = pd.DataFrame(all_header)    
dfh2 = dfh1[0].str.split(',',expand=True)
In [ ]:
dfh2[0] = dfh2[0].str.strip('[')
dfh2[6] = dfh2[6].str.strip(']')
In [ ]:
dfh_a = dfh2.drop([5,6],axis=1) 
dfh_a.columns = dfh_a.iloc[0] # Add column names. 
dfh_a.insert(1,'Rank2018','Rank2018',True) # Add a column. 
dfh_a.rename(columns = {'Rank' : 'Rank2019'}, inplace = True)
In [ ]:
frames = [dfh_a,df3]
df4 = pd.concat(frames)
In [39]:
df5 = df4.drop(0)
df5
Out[39]:
Rank2019 Rank2018 Company Name Market Cap (Rs. Cr) Revenue Change% PAT Change%
2 1 2 Reliance Industries Ltd. 949280.41 43.36 9.74
3 2 1 Indian Oil Corporation Ltd. 125768.47 24.99 -21.69
4 3 3 Oil And Natural Gas Corporation Ltd. 172135.97 29.97 37.95
5 4 4 State Bank of India 290140.98 7.88 LP
6 5 5 Tata Motors Ltd. 48588.33 2.06 PL
7 6 6 Bharat Petroleum Corporation Ltd. 111550.97 26.10 -13.39
8 7 7 Hindustan Petroleum Corporation Ltd. 45237.26 24.94 -7.31
9 8 8 Rajesh Exports Ltd. 20200.00 -6.35 2.08
10 9 9 Tata Steel Ltd. 45758.52 15.42 -23.94
11 10 10 Coal India Ltd. 126223.50 14.04 148.09
In [ ]:
df5_md = df5.to_markdown()
In [41]:
print(df5_md)
|    |   Rank2019 |   Rank2018 |  Company Name                        |    Market Cap  (Rs. Cr) |    Revenue  Change% |  PAT  Change%   |
|---:|-----------:|-----------:|:-------------------------------------|------------------------:|--------------------:|:----------------|
|  2 |          1 |          2 | Reliance Industries Ltd.             |                949280   |               43.36 | 9.74            |
|  3 |          2 |          1 | Indian Oil Corporation Ltd.          |                125768   |               24.99 | -21.69          |
|  4 |          3 |          3 | Oil And Natural Gas Corporation Ltd. |                172136   |               29.97 | 37.95           |
|  5 |          4 |          4 | State Bank of India                  |                290141   |                7.88 | LP              |
|  6 |          5 |          5 | Tata Motors Ltd.                     |                 48588.3 |                2.06 | PL              |
|  7 |          6 |          6 | Bharat Petroleum Corporation Ltd.    |                111551   |               26.1  | -13.39          |
|  8 |          7 |          7 | Hindustan Petroleum Corporation Ltd. |                 45237.3 |               24.94 | -7.31           |
|  9 |          8 |          8 | Rajesh Exports Ltd.                  |                 20200   |               -6.35 | 2.08            |
| 10 |          9 |          9 | Tata Steel Ltd.                      |                 45758.5 |               15.42 | -23.94          |
| 11 |         10 |         10 | Coal India Ltd.                      |                126224   |               14.04 | 148.09          |

So we got the table we wanted

It may be done in more efficient ways, but this is my first such attempt on my own.