Data Organisation in Spreadsheets, Part III
Further to my earlier posts, here is the last one on the topic.
-Make it a Rectangle
The best layout for your data within a spreadsheet is as a single big rectangle with rows corresponding to subjects and columns corresponding to variables.
The first row should contain the variable names. Please do not use more than one row for them.
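A rectangle is easy to check by machine: one header row, unique non-blank names, and every row with the same number of fields. A minimal Python sketch, assuming the sheet has been saved as CSV (the function name `check_rectangle` is hypothetical):

```python
import csv
import io

def check_rectangle(csv_text):
    """Return a list of layout problems; an empty list means the data look rectangular."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    if any(name.strip() == "" for name in header):
        problems.append("header has a blank variable name")
    if len(set(header)) != len(header):
        problems.append("header has duplicate variable names")
    # start=2 because row 1 is the single header row
    # (assumes no field contains an embedded line break)
    for i, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(f"row {i} has {len(row)} fields, expected {len(header)}")
    return problems
```

For example, `check_rectangle("id,weight_kg\n1,70.2,extra\n")` reports that row 2 has 3 fields where 2 were expected. A blank line inside the data is also reported, because it breaks the rectangle.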
-Create a Data Dictionary
It is helpful to have a separate file that explains what each of the variables is. Lay this file out in rectangular form too, so that the data analyst can make use of it in analyses.
Such a “data dictionary” might contain:
- The exact variable name as in the data file
- A version of the variable name that might be used in data visualizations
- A longer explanation of what the variable means
- The measurement units
- Expected minimum and maximum values
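Because the dictionary is itself a rectangle, it can be read with the same tools as the data. A minimal Python sketch, assuming a hypothetical dictionary with the columns `name`, `plot_label`, `description`, `units`, `min` and `max`, uses the expected ranges to flag suspicious values:

```python
import csv
import io

# A hypothetical data dictionary: one row per variable,
# one column per item in the list above.
DICTIONARY_CSV = """name,plot_label,description,units,min,max
weight_kg,Weight (kg),Body weight at enrolment,kg,30,200
height_cm,Height (cm),Standing height at enrolment,cm,100,220
"""

def load_dictionary(text):
    """Map each variable name to its row in the data dictionary."""
    return {row["name"]: row for row in csv.DictReader(io.StringIO(text))}

def out_of_range(data_text, dictionary):
    """Return (row, variable, value) for every value outside its expected min/max."""
    problems = []
    # start=2 because line 1 of the data file is the header row
    for i, row in enumerate(csv.DictReader(io.StringIO(data_text)), start=2):
        for var, value in row.items():
            entry = dictionary.get(var)
            if entry is None or value in ("", None):
                continue  # variable not described in the dictionary, or value missing
            if not float(entry["min"]) <= float(value) <= float(entry["max"]):
                problems.append((i, var, value))
    return problems
```

With the data `id,weight_kg,height_cm` / `1,70.2,175` / `2,702,176`, the check returns `[(3, "weight_kg", "702")]`, pointing straight at a likely misplaced decimal point.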
-No Calculations in the Raw Data Files
Your primary data file should contain just the data and nothing else: no calculations, no graphs.
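One way to follow this is to compute derived quantities in a separate analysis script that reads the raw file and never writes back to it. A minimal Python sketch, using a hypothetical body mass index (BMI) column as the derived quantity:

```python
import csv
import io

# Hypothetical raw data: measured values only, no formula columns.
RAW_CSV = "id,weight_kg,height_cm\n1,70.0,175\n2,55.0,160\n"

def with_bmi(raw_text):
    """Derive BMI in the analysis step; the raw data itself is left unchanged."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    for row in rows:
        height_m = float(row["height_cm"]) / 100
        row["bmi"] = round(float(row["weight_kg"]) / height_m ** 2, 1)
    return rows
```

For the two example rows this gives BMI values of 22.9 and 21.5. If the formula turns out to be wrong, you fix the script and rerun it; the raw data never need to be touched.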
-Do Not Use Font Color or Highlighting as Data
The meaning of the formatting will not be clear to the person analyzing the data, and it is lost when the data are saved as plain text. Instead, add a column with a comment explaining the value.
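A comment column is ordinary data, so the flagged rows can be selected in code, which is not practical with highlighting. A minimal Python sketch with a hypothetical `note` column:

```python
import csv
import io

# Hypothetical layout: the questionable value is explained in a "note"
# column instead of being highlighted in the spreadsheet.
DATA_CSV = """id,glucose_mg_dl,note
101,95,
102,310,possible data entry error; recheck paper form
103,88,
"""

def flagged_rows(text, note_column="note"):
    """Return the rows whose note column is not empty."""
    return [row for row in csv.DictReader(io.StringIO(text))
            if row[note_column].strip()]
```

Here only the row with `id` 102 is returned, together with the reason it was flagged.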
-Make Backups
Make regular backups in different locations.
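Backups are easy to script. A minimal Python sketch, assuming hypothetical destination folders that sit on different devices (for example an external drive and a network share), copies the data file to each one with a timestamp in its name, so that older backups are not overwritten:

```python
import shutil
from datetime import datetime
from pathlib import Path

def back_up(data_file, destinations):
    """Copy data_file into each destination folder under a timestamped name."""
    src = Path(data_file)
    # Second-level resolution: two runs within the same second would collide.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    copies = []
    for dest in destinations:
        dest_dir = Path(dest)
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / f"{src.stem}_{stamp}{src.suffix}"
        shutil.copy2(src, target)  # copy2 also keeps the file's modification time
        copies.append(target)
    return copies
```

Run on a schedule, a call such as `back_up("data.csv", ["/media/external/backups", "/mnt/share/backups"])` (both paths hypothetical) produces files named like `data_20240115-093000.csv` in each location.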
-Use Data Validation to Avoid Errors
It might seem cumbersome to set up, but it will help you avoid data entry mistakes, and it is worth the effort.
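Spreadsheet programs such as Excel offer data validation rules, for example restricting a cell to a list of allowed values or to dates. The same idea, written as a minimal Python sketch with a hypothetical `group` variable and `visit_date` column, checks an entry at the moment it is typed in rather than during analysis:

```python
from datetime import date

# Hypothetical rules, similar in spirit to a spreadsheet's
# "list of items" and "date" validation settings.
ALLOWED_GROUPS = {"control", "treatment"}

def validate_entry(group, visit_date):
    """Return a list of error messages; an empty list means the entry is accepted."""
    errors = []
    if group not in ALLOWED_GROUPS:
        errors.append(f"group must be one of {sorted(ALLOWED_GROUPS)}, got {group!r}")
    try:
        date.fromisoformat(visit_date)  # accepts "2024-01-15", rejects "15/01/2024"
    except ValueError:
        errors.append(f"visit_date must be an ISO 8601 date such as 2024-01-15, got {visit_date!r}")
    return errors
```

An entry of `"Control"` is rejected because of the capital letter, which is exactly the kind of inconsistency that is tedious to clean up later.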
-Save the data in Plain Text Files
Keep a copy of your data files in a plain text format, such as comma- or tab-delimited text.
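Most spreadsheet programs can export a sheet as CSV directly. If the data are produced by a script instead, Python's standard `csv` module takes care of quoting; a minimal sketch (the function names are hypothetical):

```python
import csv
import io

def to_delimited_text(rows, delimiter=","):
    """Render rows as comma-separated (default) or tab-separated (delimiter="\\t") text."""
    buffer = io.StringIO()
    # The default quoting (QUOTE_MINIMAL) only quotes fields that contain
    # the delimiter, a quote character or a line break.
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

def save_plain_text(path, rows, delimiter=","):
    """Write the rows to a UTF-8 plain text file."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(to_delimited_text(rows, delimiter))
```

A value such as `a, b` is written as `"a, b"`, so the comma inside it is not mistaken for a column separator. Passing `delimiter="\t"` produces a tab-delimited file instead.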
- courtesy: Data Organization in Spreadsheets, by Karl W. Broman and Kara H. Woo