help:data_structure

Differences

This shows you the differences between two versions of the page.

help:data_structure [2015/03/29 00:06] admin
help:data_structure [2015/03/29 01:54] (current) admin [Getting your Data Structure Right for SOFA]
Line 1: Line 1:
 [[http://www.sofastatistics.com/userguide.php | Contents]]
  
-====== Getting your Data Structure Right for SOFA ======
+====== Getting your data structure right for SOFA ======
  
 ===== Data Format =====
  
-Most often, you enter some data into a statistical program and everything Just Works. You don't have to think about your data structure particularly. But sometimes you want to analyse one variable by another e.g. height by gender and SOFA doesn't seem to allow you to. Or you want to see if there is a difference between, for example, different years, and there is no way of doing it. Or perhaps you want to do a paired t-test and you can't get the correct results.
-
-
-
-
+Most often, you enter some data into SOFA and everything Just Works. You don't have to think about your data structure particularly. But sometimes you want to analyse one variable by another e.g. height by gender and SOFA doesn't seem to allow you to. Or you want to see if there is a difference between, for example, different years, and there is no way of doing it. Or perhaps you want to do a paired t-test and you can't get the correct results.
  
 If you have trouble analysing your variables in SOFA Statistics, check that:
  
-  - Your data is structured the right way for the analysis you want. For example, if SOFA needs a column for gender and a column for height, there will be a problem if your data has a column for male height and a column for female height.
+  - Your data is structured the right way for the analysis you want. For example, if SOFA needs a column for year and a column for score, there will be a problem if your data has a column for 2015 score and a column for 2016 score.
   - Any variables you need to analyse as numbers e.g. for correlation analyses or histograms, have actually been entered/imported as numeric data not as text.
  
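The second check in the list above (numbers entered as text) can be screened for before importing. A minimal sketch in Python, assuming each column is available as a list of cell strings, for example read from a CSV export. `non_numeric_cells` is a hypothetical helper, not part of SOFA Statistics, and the sample values other than 186 and 179 are made up for illustration:

```python
def non_numeric_cells(values):
    """Return (row_index, raw_value) pairs for cells that cannot be read as numbers.

    Hypothetical helper for illustration. Blank cells are treated as missing,
    not as text. Note that float() also accepts strings such as "nan" and "inf".
    """
    problems = []
    for index, raw in enumerate(values):
        cell = raw.strip()
        if cell == "":
            continue  # an empty cell is a missing value, not a text entry
        try:
            float(cell)
        except ValueError:
            problems.append((index, raw))
    return problems


# Made-up example column: "1 70" (stray space) and "N/A" cannot be read as numbers.
heights = ["186", "179", "1 70", "N/A", ""]
print(non_numeric_cells(heights))  # [(2, '1 70'), (3, 'N/A')]
```

A column containing cells like the flagged ones may be imported as text, so fixing or blanking them in the source file first is one way to keep the variable numeric.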
Line 23: Line 18:
 ==== Types of SOFA Statistics analysis ====
  
-=== Differences between groups ===
+=== Analysing One Variable "By" Another ===
  
-Instead of one column per condition or group there needs to be a group column and a measures column.
+{{:help:group_diffs_data_example.jpg|}}
  
-Example of bad format (for SOFA):
+The By variable must be a single variable with different values in it (long format), not one column per option (wide format). See [[http://www.theanalysisfactor.com/wide-and-long-data/|]].
  
-  Male Female
-  186  167
-  179  170
-  ...
+E.g.
  
-Example of a good format (for SOFA):
+== By Gender ==
  
-  Gender  Height
-  Male    186
-  Female  167
-  Male    179
-  Female  170
-  ...
+The long format is good and the wide format is bad for this purpose.
  
-In this case, the ranked or averaged variable would be Height, the Group By variable would be Gender, and groups a and b would be Male and Female respectively.
+{{:help:gender_long_format.png?nolink |}}
  
-Or if we were looking at the fictitious weight data in the demonstration data and we wanted to know if it differed between two countries:
+{{:help:gender_wide_format.png?nolink |}}
  
-{{:help:group_diffs_data_example.jpg|}}
+== By Year ==
+
+Once again, the long format is good and the wide format is bad.
+
+{{:help:years_long_form.png?nolink |}}
+
+{{:help:years_wide_form.png?nolink |}}
  
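Converting wide data (one column per option) into the long format described above can also be scripted when the data is in a CSV file. A minimal sketch using only Python's standard `csv` module, assuming a wide CSV export with one column per group (here Male and Female, with the heights from the example on this page). `wide_to_long` is a hypothetical helper, not a SOFA Statistics feature:

```python
import csv
import io


def wide_to_long(rows, group_names, group_col, measure_col):
    """Stack wide columns (one per group) into long rows of (group, measure).

    Hypothetical helper for illustration. rows is a list of dicts keyed by
    column name, as produced by csv.DictReader.
    """
    long_rows = []
    for row in rows:
        for group in group_names:
            value = row.get(group)
            # Wide columns often have unequal lengths, so skip blank or absent cells.
            if value is None or value.strip() == "":
                continue
            long_rows.append({group_col: group, measure_col: value})
    return long_rows


# Wide format: one column per gender (not usable as a "By" variable).
wide_csv = "Male,Female\n186,167\n179,170\n"
rows = list(csv.DictReader(io.StringIO(wide_csv)))
long_rows = wide_to_long(rows, ["Male", "Female"], "Gender", "Height")

# Long format: one Gender column and one Height column.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Gender", "Height"], lineterminator="\n")
writer.writeheader()
writer.writerows(long_rows)
print(out.getvalue())
```

The same call covers the year example by passing, for instance, `["2015", "2016"]` as `group_names` with `"Year"` and `"Score"` as the new column names; the resulting file can then be imported in the restructured form.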
 === Relationships between two different variables ===
Line 88: Line 81:
 ==== Restructuring your data ====
  
-The most common problem is when your data has the data for different groups in different variables.
-
-E.g. height data for two genders:
-
-  Male Female
-  186  167
-  179  170
-  ...
-
-The easiest way to handle this might be to change the data in a spreadsheet and import it in the restructured form
-
-  - Insert group by column\\ {{:help:insert_group_by_column.jpg|}}
-  - Transfer first variable (Male) by renaming it to the measure (Height) and populating the group by column (Gender) for that variable\\ {{:help:first_var_into_group_by_col.jpg|}}
-  - Transfer second variable by pasting height values below and completing the Gender column with the variable (Female)\\ {{:help:second_var_into_group_by_col.jpg|}}
-  - Delete the variable not needed (Female in this case)
-
-NB You could have used 1 for Male and 2 for Female if you preferred and added value labels to Gender once the data was imported into SOFA Statistics.  See [[help:variable_details | Setting variable details e.g. labels]]
-
-The same process can be used if there are multiple groups e.g. countries instead of genders.
+The most common problem is when your data has the data for different groups in different variables. The easiest way to handle this might be to change the data in a spreadsheet and import it in the restructured form.
  
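The note about coding Male as 1 and Female as 2 can be sketched the same way: replace each group name with an integer code and keep the code-to-name mapping to enter as value labels once the data is imported. A minimal sketch in Python, assuming long-format rows held as dicts and using the Gender and Height values from this page. `code_groups` is hypothetical, and assigning codes in order of first appearance is an assumption, not something SOFA Statistics requires:

```python
def code_groups(long_rows, group_col):
    """Replace group names with integer codes 1, 2, 3, ... in order of first appearance.

    Hypothetical helper for illustration. Returns (coded_rows, labels), where
    labels maps each code back to its original name for use as value labels.
    The input rows are not modified.
    """
    codes = {}
    coded_rows = []
    for row in long_rows:
        name = row[group_col]
        if name not in codes:
            codes[name] = len(codes) + 1  # next unused code
        coded = dict(row)  # copy so the caller's rows keep their names
        coded[group_col] = codes[name]
        coded_rows.append(coded)
    labels = {code: name for name, code in codes.items()}
    return coded_rows, labels


long_rows = [
    {"Gender": "Male", "Height": "186"},
    {"Gender": "Female", "Height": "167"},
    {"Gender": "Male", "Height": "179"},
    {"Gender": "Female", "Height": "170"},
]
coded_rows, labels = code_groups(long_rows, "Gender")
print(coded_rows[0])  # {'Gender': 1, 'Height': '186'}
print(labels)         # {1: 'Male', 2: 'Female'}
```

Because a new code is issued whenever an unseen name appears, the same function works unchanged for three or more groups, such as countries.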
 ===== Numbers stored in a text variable ===== ===== Numbers stored in a text variable =====
help/data_structure.1427601970.txt.gz · Last modified: 2015/03/29 00:06 by admin