Saturday, October 30, 2021

Organising data in Spreadsheets : Part-II

Data Organization in Spreadsheets (Part-II)

This post continues my earlier post (see the link below) and adds a few more points.

- Choose Good Names for Things.


It is important to pick good names for things. This can be hard, and so it is worth putting some time and thought into it. 

As a general rule, do not use spaces in either variable names or file names. They make programming harder: the analyst will need to surround everything in double quotes, like “Health Care”, rather than just writing Health_Care. Where you might use spaces, use underscores or perhaps hyphens; pick one and be consistent.

Avoid special characters, except for underscores and hyphens. Other symbols ($, @, %, #, &, *, (, ), !, /, etc.) often have special meaning in programming languages, and so they can be harder to handle. They are also a bit harder to type. 

The main principle in choosing names, whether for variables or for files, is to keep them short but meaningful, so not too short either. Finally, never include “final” in a file name. You will invariably end up with “final_ver2.”
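As a rough illustration of these naming rules, here is a minimal Python sketch that turns a free-form label into a name without spaces or special characters. The function name clean_name and the underscore as the default separator are my own assumptions for this example.

```python
import re

def clean_name(name: str, sep: str = "_") -> str:
    """Replace spaces and special characters with a single separator."""
    # Collapse every run of characters that is not a letter, digit,
    # underscore or hyphen into one separator.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", sep, name.strip())
    # Drop separators left over at either end.
    return cleaned.strip(sep)

print(clean_name("Health Care"))       # Health_Care
print(clean_name("Cost ($) / month"))  # Cost_month
```

The same helper can be applied to column headers and file names alike, which makes it easier to stay consistent.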

- Write Dates as YYYY-MM-DD.


When entering dates, please consider using the global “ISO 8601” standard, YYYY-MM-DD, such as 2013-02-27. Otherwise, at least use the date format for your region consistently.
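If dates already exist in some other known format, they can be converted rather than retyped. The sketch below uses Python's standard datetime module; the helper name to_iso and the two input formats are assumptions chosen for illustration.

```python
from datetime import datetime

def to_iso(text: str, source_format: str) -> str:
    """Parse a date written in a known format and return it as YYYY-MM-DD."""
    return datetime.strptime(text.strip(), source_format).date().isoformat()

print(to_iso("27/02/2013", "%d/%m/%Y"))  # 2013-02-27
print(to_iso("02-27-13", "%m-%d-%y"))    # 2013-02-27
```

The source format has to be stated explicitly, because a value like 02/03/2013 cannot be interpreted safely without knowing whether the day or the month comes first.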

- No Empty Cells.


Fill in all cells. Use one common code for missing data.
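For data that has already been exported as rows of text, filling the gaps can be sketched as below. The code “NA” is only an assumed choice here; any code works as long as it is used everywhere.

```python
MISSING_CODE = "NA"  # assumed code; pick one and use it consistently

def fill_empty_cells(rows, code=MISSING_CODE):
    """Return a copy of the rows with blank or whitespace-only cells replaced by the code."""
    return [[cell if cell.strip() else code for cell in row] for row in rows]

# Hypothetical example table, as it might come out of a CSV export.
table = [
    ["name", "mode", "distance_km"],
    ["Asha", "public", ""],
    ["Ravi", " ", "12"],
]
for row in fill_empty_cells(table):
    print(row)
```

Cells that contain only spaces are treated as empty too, since they look blank in a spreadsheet.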

- Put Just One Thing in a Cell.


The cells in your spreadsheet should each contain one piece of data. Do not put more than one thing in a cell. For example, do not write an employee's name and ID in one cell.
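When a sheet already has combined cells, they can often be split mechanically. The sketch below assumes a hypothetical layout of the form “Name (ID)”; the pattern and the function name are made up for this example and would need adjusting to the real data.

```python
import re

# Assumed layout: a name followed by an ID in parentheses, e.g. "Asha Rao (E1042)".
NAME_AND_ID = re.compile(r"^\s*(?P<name>.+?)\s*\((?P<emp_id>[^)]+)\)\s*$")

def split_name_and_id(cell: str):
    """Split one combined cell into a (name, employee_id) pair."""
    match = NAME_AND_ID.match(cell)
    if match is None:
        raise ValueError(f"cell does not look like 'Name (ID)': {cell!r}")
    return match["name"], match["emp_id"].strip()

print(split_name_and_id("Asha Rao (E1042)"))  # ('Asha Rao', 'E1042')
```

Raising an error for cells that do not fit the pattern is deliberate: it is safer to fix those by hand than to guess.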

Finally, do not merge cells.

It might look pretty, but you end up breaking the rule of no empty cells. Also, it is not clear how to divide the number in the merged cell among its constituent cells.

Further reading: For the third and final part, refer to the link given below.


Tuesday, October 26, 2021

Organizing data in Spreadsheets : Part-I

 Data Organisation in Spreadsheets (Part-I)

1. About Spreadsheets in general 

  • Congratulations on joining this journey of making better spreadsheets.
  • Spreadsheets are easy to use, and that creates its own challenges.
  • Is the Spreadsheet formatted for Human Eyes?
  • Is the Spreadsheet formatted for a Computer? 
  • Is the Spreadsheet formatted for Both?

2. The answer is - Be Consistent…

-The first rule of data organization is be consistent. 

  • Whatever you do, do it consistently. Entering and organizing your data in a consistent way from the start will prevent you and your collaborators from having to spend time harmonizing the data later.
  • Use consistent codes for categorical variables. For a categorical variable like the mode of transport in a study of daily office commuters, use a single common value for private transport (e.g., “private”), and a single common value for public transport (e.g., “public”).
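To show what consistent codes can look like in practice, here is a small Python sketch that maps the different spellings found in a sheet to one agreed code. The mapping MODE_CODES and the function standard_mode are hypothetical; the variants listed are examples only.

```python
# Hypothetical mapping from spellings seen in the data to the agreed codes.
MODE_CODES = {
    "private": "private",
    "personal car": "private",
    "public": "public",
    "bus": "public",
    "local train": "public",
}

def standard_mode(raw: str) -> str:
    """Return the agreed code for a mode of transport, ignoring case and extra spaces."""
    key = " ".join(raw.split()).lower()
    if key not in MODE_CODES:
        raise ValueError(f"unknown mode of transport: {raw!r}")
    return MODE_CODES[key]

print([standard_mode(v) for v in ["Private", "personal car", " Bus ", "Local  Train"]])
# ['private', 'private', 'public', 'public']
```

Unknown values raise an error instead of being passed through, so a new spelling is noticed as soon as it appears.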

-Use a consistent fixed code for any missing values. 

  • Please have every cell filled in, so that one can distinguish between truly missing values and cells that were left blank by accident.
  • Pick one code for missing values and use it everywhere; a hyphen is one option.

-Use consistent variable names. 

  • If in one file (e.g., the first batch of subjects), you have a variable called “Public_Travel,” then call it exactly that in other files.

-Use a consistent data layout in multiple files. 

  • If your data are in multiple files and you use different layouts in different files, it will be extra work for the analyst to combine the files into one dataset for analysis.
  • With a consistent structure, it will be easy to automate this process.
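A minimal sketch of such automation, using only Python's standard csv module, is given below. The function combine and the two in-memory example files are assumptions for illustration; with real data the streams would come from open().

```python
import csv
import io

def combine(sources):
    """Stack rows from several CSV streams that share the same header.

    `sources` maps a label (e.g. a file name) to an open text stream.
    """
    combined = []
    expected = None
    for label, stream in sources.items():
        reader = csv.DictReader(stream)
        if expected is None:
            expected = reader.fieldnames
        elif reader.fieldnames != expected:
            raise ValueError(f"{label}: header {reader.fieldnames} differs from {expected}")
        for row in reader:
            row["source"] = label  # remember which file each row came from
            combined.append(row)
    return combined

# Two hypothetical batches with an identical layout.
batch1 = io.StringIO("subject,Public_Travel\n1,yes\n2,no\n")
batch2 = io.StringIO("subject,Public_Travel\n3,yes\n")
rows = combine({"batch1": batch1, "batch2": batch2})
print(len(rows), rows[2])
```

The header check is what the consistent layout buys you: if one file deviates, the script stops with a clear message instead of silently misaligning columns.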

-Use consistent file names. 

  • Have some system for naming files. If one file is called “Travel_batch1_2021-01-31.csv,” then do not call the file for the next batch “Travel2.csv” but rather use “Travel_batch2_2021-02-28.csv.”
  • Keeping a consistent file naming scheme will help ensure that your files remain well organized, and it will make it easier to batch process the files if you need to.
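With a naming scheme like the one above, a script can read the batch number and date straight from the file name. The pattern below is an assumed encoding of that scheme, and parse_file_name is a hypothetical helper.

```python
import re
from datetime import date

# Assumed scheme: <study>_batch<number>_<YYYY-MM-DD>.csv
NAME_PATTERN = re.compile(
    r"^(?P<study>[A-Za-z]+)_batch(?P<batch>\d+)_(?P<date>\d{4}-\d{2}-\d{2})\.csv$"
)

def parse_file_name(name: str):
    """Return (study, batch, date) for a well-formed name, or None otherwise."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    return match["study"], int(match["batch"]), date.fromisoformat(match["date"])

print(parse_file_name("Travel_batch1_2021-01-31.csv"))
print(parse_file_name("Travel2.csv"))  # None: does not follow the scheme
```

Files that return None can be listed and renamed before any batch processing starts.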

-Use a consistent format for all dates.

  • A format could be YYYY-MM-DD, for example, 2015-08-01. Or follow your local way of formatting dates.
  • If sometimes you write 8/1/2015 and sometimes 8-1-15, it will be more difficult to use the dates in analyses or data visualizations.
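One way to catch mixed date formats early is to check a column against the chosen format. The sketch below assumes the chosen format is YYYY-MM-DD; the helper name is_iso_date and the sample entries are made up for illustration.

```python
import re
from datetime import date

ISO_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")

def is_iso_date(text: str) -> bool:
    """True only for a real calendar date written exactly as YYYY-MM-DD."""
    if not ISO_SHAPE.fullmatch(text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:  # right shape, but not a real date (e.g. 30 February)
        return False
    return True

entries = ["2015-08-01", "8/1/2015", "8-1-15", "2015-02-30"]
print([e for e in entries if not is_iso_date(e)])
# ['8/1/2015', '8-1-15', '2015-02-30']
```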

-Use consistent phrases in your notes. 

  • If you have a separate column of notes (e.g., “Personal Car” or “Bus”), be consistent in what you write. Do not sometimes write “Personal Car” and sometimes “Personal car,” or sometimes “Local Train” and sometimes “Local” or “Train”.
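Notes that differ only in capitalisation are easy to find automatically. The following sketch groups values by their lower-cased form and reports any group with more than one spelling; the function name case_variants is my own.

```python
from collections import defaultdict

def case_variants(values):
    """Return groups of values that differ only in upper/lower case."""
    groups = defaultdict(set)
    for value in values:
        groups[value.casefold()].add(value)
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}

notes = ["Personal Car", "Personal car", "Bus", "Local Train", "Bus"]
print(case_variants(notes))
# {'personal car': ['Personal Car', 'Personal car']}
```

A check like this will not notice that “Local” and “Train” were meant as “Local Train”; shortened phrases still need a manual look.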

-Be careful about extra spaces within cells. 

  • A blank cell is different from a cell that contains a single space. Likewise, “Train” is different from “ Train ” (with spaces at the beginning and end), from “ Train” (with a space at the beginning), and from “Train ” (with a space at the end).
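To a computer these values really are different strings, as the short Python sketch below shows. The helper strip_extra_spaces is a hypothetical name; it removes spaces at both ends and also collapses repeated spaces inside the text.

```python
def strip_extra_spaces(cell: str) -> str:
    """Remove leading/trailing whitespace and collapse inner runs to one space."""
    return " ".join(cell.split())

print("Train" == " Train ")                      # False
print(strip_extra_spaces(" Train ") == "Train")  # True
print(repr(strip_extra_spaces(" ")))             # '' (a lone space becomes an empty cell)
```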

3. Further reading:

The links to the other two parts of this post are given below.


Part-II


Part II post.


Part-III


Part III post.

Saturday, October 23, 2021

Indian Unicorns

Indian Unicorns grouped with select top industries and total valuations

EduTech can become a leader in the world.
Data downloaded from the web, then cleaned and analysed using Google Sheets.