Monday, November 1, 2021

Data Organisation in Spreadsheets: Part III

Further to my earlier post, here is the final instalment on this topic.

-Make it a Rectangle

The best layout for your data within a spreadsheet is as a single big rectangle with rows corresponding to subjects and columns corresponding to variables.


The first row should contain the variable names. Please do not use more than one row for them.
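To make the rectangle idea concrete, here is a small Python sketch that reads a table with a single header row and checks that every row has the same number of cells. The column names and values are made up for illustration; they do not come from the original article.

```python
import csv
import io

# Hypothetical example data: one header row, then one row per subject.
RAW = """subject_id,weight_kg,glucose_mg_dl
101,72.5,98
102,68.1,105
103,80.3,91
"""

def is_rectangular(rows):
    """Return True if every row has as many cells as the first (header) row."""
    if not rows:
        return False
    width = len(rows[0])
    return all(len(row) == width for row in rows)

rows = list(csv.reader(io.StringIO(RAW)))
header, records = rows[0], rows[1:]

print(header)                # the variable names from the single header row
print(is_rectangular(rows))  # whether the table is one clean rectangle
```

A table with a second header row, merged cells, or a stray note in the margin would typically fail this kind of check or produce misleading variable names.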


-Create a Data Dictionary 

It is helpful to have a separate file that explains what each variable is. Lay this file out in rectangular form too, so that the data analyst can make use of it in analyses.


Such a “data dictionary” might contain:

  • The exact variable name as in the data file
  • A version of the variable name that might be used in data visualizations
  • A longer explanation of what the variable means
  • The measurement units
  • Expected minimum and maximum values
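As a rough sketch of how an analyst might use such a dictionary, the Python below stores it as a small rectangular table and uses it to look up a plot label and to check a value against the expected range. The column names (name, plot_name, description, units, min, max) and the entries are my own assumptions, chosen to mirror the list above.

```python
import csv
import io

# Hypothetical data dictionary, itself laid out as a rectangle:
# one row per variable, one column per piece of information about it.
DICTIONARY_CSV = """name,plot_name,description,units,min,max
weight_kg,Body weight,Body weight at enrolment,kg,30,200
glucose_mg_dl,Fasting glucose,Fasting plasma glucose,mg/dL,40,400
"""

def load_dictionary(text):
    """Map each variable name to its row in the data dictionary."""
    return {row["name"]: row for row in csv.DictReader(io.StringIO(text))}

def out_of_range(dictionary, name, value):
    """Return True if value falls outside the expected min/max for the variable."""
    entry = dictionary[name]
    return not (float(entry["min"]) <= value <= float(entry["max"]))

data_dictionary = load_dictionary(DICTIONARY_CSV)

print(data_dictionary["weight_kg"]["plot_name"])            # label for a chart axis
print(out_of_range(data_dictionary, "glucose_mg_dl", 950))  # flag a suspicious value
```

Because the dictionary is a plain table, the same file can drive axis labels, unit checks, and range checks without anyone retyping the information.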

-No Calculations in the Raw Data Files

Your primary data file should contain just the data and nothing else: no calculations, no graphs.
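One way to follow this rule is to do the calculations in a separate script that only reads the raw data. The sketch below is a minimal illustration with made-up values; in practice the raw text would come from your data file rather than a string.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data; the raw file itself is never modified.
RAW = """subject_id,weight_kg
101,72.5
102,68.1
103,80.3
"""

def summarise(raw_text, column):
    """Compute the mean of one numeric column without touching the raw data."""
    rows = csv.DictReader(io.StringIO(raw_text))
    return mean(float(row[column]) for row in rows)

mean_weight = summarise(RAW, "weight_kg")
print(round(mean_weight, 2))
```

The result lives in the script's output (or in a separate results file), so the raw data file stays exactly as it was entered.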


-Do Not Use Font Color or Highlighting as Data

The logic behind a font color or highlight will not be clear to the person analyzing the data, and the formatting disappears when the file is saved as plain text. Instead, add a column to comment on the value.
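Here is a small sketch of the alternative: the information that a highlight would have carried goes into an explicit column. The column name "note" and the sample values are assumptions made for this example.

```python
import csv
import io

# Hypothetical rows, as they might be read from a data file.
rows = [
    {"subject_id": "101", "glucose_mg_dl": "98"},
    {"subject_id": "102", "glucose_mg_dl": "950"},
]

# What would otherwise have been shown with a highlighted cell.
notes = {"102": "suspected typo, check lab sheet"}

def add_note_column(rows, notes, key="subject_id", column="note"):
    """Return new rows with an extra column holding the comment for each subject."""
    return [{**row, column: notes.get(row[key], "")} for row in rows]

annotated = add_note_column(rows, notes)

# Write the annotated table as comma-separated text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(annotated[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(annotated)
print(buffer.getvalue())
```

Unlike a color, the note survives export and can be filtered or counted like any other variable.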


-Make Backups

Make regular backups and keep them in more than one location, so that a single disk failure or accidental overwrite does not cost you the data.
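If you prefer to script this, a minimal Python sketch could copy the data file into several folders under a timestamped name. The function name back_up, the file name study.csv, and the folder names are hypothetical; the demonstration runs in a temporary directory so it does not touch real files.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path

def back_up(source, destinations, now=None):
    """Copy source into each destination folder under a timestamped name."""
    source = Path(source)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / f"{source.stem}_{stamp}{source.suffix}"
        shutil.copy2(source, target)  # copy2 also tries to preserve file metadata
        copies.append(target)
    return copies

# Demonstration with a throwaway file and two stand-in backup locations.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "study.csv"
    data_file.write_text("subject_id,weight_kg\n101,72.5\n")
    copies = back_up(data_file, [Path(tmp) / "disk_a", Path(tmp) / "disk_b"])
    print([copy.name for copy in copies])
```

In real use the destinations would be genuinely different places, for example an external drive and a network or cloud folder.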


-Use Data Validation to Avoid Errors

Setting up data validation, such as restricting a column to a given data type or range of values, might seem cumbersome, but it helps you avoid data entry mistakes and is worth the effort.
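The spreadsheet's own validation feature is the natural tool here. As a complementary sketch, the same kind of rules can also be checked in Python after the data have been entered. The column names, allowed values, and ranges below are assumptions made for this example.

```python
from datetime import date

def validate_row(row):
    """Return a list of problems found in one hypothetical data row."""
    problems = []

    # Categorical column: only a fixed set of spellings is allowed.
    if row.get("sex") not in {"female", "male"}:
        problems.append("sex must be 'female' or 'male'")

    # Numeric column: must parse as a number and fall in a plausible range.
    try:
        weight = float(row.get("weight_kg", ""))
        if not 30 <= weight <= 200:
            problems.append("weight_kg outside 30-200")
    except ValueError:
        problems.append("weight_kg is not a number")

    # Date column: must be a valid ISO-style date such as 2021-11-01.
    try:
        date.fromisoformat(row.get("visit_date", ""))
    except ValueError:
        problems.append("visit_date is not a valid ISO date")

    return problems

good = {"sex": "female", "weight_kg": "72.5", "visit_date": "2021-11-01"}
bad = {"sex": "F", "weight_kg": "heavy", "visit_date": "01/11/2021"}

print(validate_row(good))
print(validate_row(bad))
```

Running a check like this over every row gives a list of entries to correct at the source, which is the same goal the spreadsheet's validation rules serve at entry time.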


-Save the Data in Plain Text Files

Keep a copy of your data files in a plain text format, with comma or tab delimiters (CSV or TSV).
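For illustration, the sketch below writes the same small table as comma-delimited and as tab-delimited text using Python's csv module. The rows are made up; note how a value that itself contains a comma is quoted in the comma-delimited version.

```python
import csv
import io

# Hypothetical table; the second row has a comma inside one value.
rows = [
    ["subject_id", "site", "weight_kg"],
    ["101", "Pune, India", "72.5"],
    ["102", "Delhi", "68.1"],
]

def to_delimited(rows, delimiter):
    """Serialise rows as plain text using the given delimiter."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

csv_text = to_delimited(rows, ",")
tsv_text = to_delimited(rows, "\t")

print(csv_text)
print(tsv_text)
```

Either text version can be opened by almost any software, which is the point of keeping such a copy alongside the spreadsheet file.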


- Courtesy: Data Organization in Spreadsheets, by Karl W. Broman & Kara H. Woo

