Getting your data structure right for SOFA

Data Format

Most often, you enter some data into SOFA and everything Just Works. You don't have to think much about your data structure. But sometimes you want to analyse one variable by another, e.g. height by gender, and SOFA doesn't seem to allow it. Or you want to see whether there is a difference between, for example, different years, and there seems to be no way of doing it. Or perhaps you want to run a paired t-test and can't get the correct results.

If you have trouble analysing your variables in SOFA Statistics, check that:

  1. Your data is structured the right way for the analysis you want. For example, if SOFA needs a column for year and a column for score, there will be a problem if your data has one column for the 2015 score and another column for the 2016 score.
  2. Any variables you need to analyse as numbers, e.g. for correlation analyses or histograms, have actually been entered or imported as numeric data, not as text.

Structuring data for analysis

The first step is to think about what you want to find out about the data. Here are some examples.

Types of SOFA Statistics analysis

Analysing One Variable "By" Another

The By variable must be a single variable with different values in it (long format), not one column per option (wide format). See http://www.theanalysisfactor.com/wide-and-long-data/.

E.g.

By Gender

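For this purpose the long format (one column for gender and one column for the value being analysed, such as height) is good, and the wide format (a separate column of values for each gender) is bad.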

By Year

Once again, the long format (one column for year and one column for score) is good, and the wide format (a separate score column for each year) is bad.
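
As a rough illustration of the two layouts, the Python sketch below holds the same data both ways. The heights, the column names, and the helper mean_by are made up for this example and are not part of SOFA. Only the long form has a single variable that can act as the By variable.

```python
from collections import defaultdict

# Wide format: one column per gender (hypothetical values).
# There is no single "gender" variable to analyse height by.
wide_rows = [
    {"male_height": 180, "female_height": 165},
    {"male_height": 175, "female_height": 170},
]

# Long format: one column for the By variable, one for the measurement.
long_rows = [
    {"gender": "male", "height": 180},
    {"gender": "male", "height": 175},
    {"gender": "female", "height": 165},
    {"gender": "female", "height": 170},
]


def mean_by(rows, by, value):
    """Average the `value` column separately for each level of the `by` column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[value])
    return {level: sum(vals) / len(vals) for level, vals in groups.items()}


mean_height_by_gender = mean_by(long_rows, by="gender", value="height")
print(mean_height_by_gender)
```

The same grouping cannot be written against wide_rows without first naming the groups by hand, which is the reason the long layout is needed for a "by" analysis.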

Relationships between two different variables

E.g. looking at linear correlation:

Age  Weight
56   86
22   55
...

In the appropriate SOFA dialog you would select one variable as A and the other as B.

Difference between two "paired" variables

E.g. looking to see if there is a difference between fuel consumption before a fuel gadget was added and afterwards:

NB each row would be the data for one vehicle (or one type of vehicle etc depending on what was being studied).

Consumption (before)    Consumption (after)
12.5                    11.7
16.1                    16.0
...

Or a difference in weight before and after a diet:

NB each row would be the data for one person.

Weight  Post-diet Weight
87      90
59      59
...

In the appropriate SOFA dialog you would select one variable as A and the other as B.
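
To see why each row has to hold both measurements for the same vehicle or person, note that a paired comparison is built from the within-row differences. The sketch below uses only the two fuel consumption rows shown above (a real data set would have many more rows); the variable names are chosen for this example.

```python
# One (before, after) pair per vehicle, taken from the two rows shown above.
pairs = [(12.5, 11.7), (16.1, 16.0)]

# A paired analysis looks at the change within each row,
# so the two values in a row must belong to the same vehicle.
differences = [round(before - after, 2) for before, after in pairs]
mean_difference = sum(differences) / len(differences)

print(differences)
print(mean_difference)
```

If the before and after values were stored in separate rows or in a different order, the differences would pair up unrelated vehicles and the result would be meaningless.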

Restructuring your data

The most common problem is that the values for different groups are stored in different variables (columns) instead of in a single variable. The easiest fix may be to restructure the data in a spreadsheet and import it in the restructured form.
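
If the data is too large to restructure by hand, a short script can do the same job before import. The following is a minimal sketch using only the Python standard library. The function wide_to_long, the column names (student, score_2015, score_2016), and the scores are all hypothetical; it assumes the data has been saved as CSV text.

```python
import csv
import io


def wide_to_long(rows, id_col, value_cols, by_name, value_name):
    """Turn one column per group into one row per (subject, group) pair.

    value_cols maps each wide column name to the label it should get
    in the new By variable.
    """
    long_rows = []
    for row in rows:
        for col, label in value_cols.items():
            if row[col] == "":
                continue  # skip missing values rather than inventing rows
            long_rows.append(
                {id_col: row[id_col], by_name: label, value_name: row[col]}
            )
    return long_rows


# Hypothetical wide data: one score column per year.
wide_csv = """student,score_2015,score_2016
A,61,70
B,55,58
"""

rows = list(csv.DictReader(io.StringIO(wide_csv)))
long_rows = wide_to_long(
    rows,
    id_col="student",
    value_cols={"score_2015": "2015", "score_2016": "2016"},
    by_name="year",
    value_name="score",
)

# Write the restructured data back out as CSV text, ready to save and import.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["student", "year", "score"], lineterminator="\n")
writer.writeheader()
writer.writerows(long_rows)
long_csv = out.getvalue()
print(long_csv)
```

The result has a single year column and a single score column, which is the layout needed to analyse score by year.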

Numbers stored in a text variable

If you imported your data into SOFA from a spreadsheet, the solution is probably to change the appropriate column data types to numeric and reimport the data. SOFA tries to warn you if it doesn't detect enough numeric variables for the analysis you are conducting, e.g. you need at least two numeric variables to conduct a Pearson's R linear correlation analysis.
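
Before reimporting, it can help to find out which entries are stopping a column from being read as numeric. The sketch below is one possible check in plain Python, not something SOFA provides; the function non_numeric_values and the sample values are made up for this example, and blank cells are assumed to mean missing data.

```python
def non_numeric_values(values):
    """Return the entries that cannot be read as numbers.

    Blank entries are treated as missing data and ignored.
    """
    problems = []
    for value in values:
        if value.strip() == "":
            continue
        try:
            float(value)  # float() tolerates surrounding whitespace
        except ValueError:
            problems.append(value)
    return problems


# Hypothetical column as it might come out of a spreadsheet export.
# "3O" ends in a capital letter O, not a zero.
ages = ["56", "22", " 41 ", "", "n/a", "3O"]
problems = non_numeric_values(ages)
print(problems)
```

A single stray entry such as a note or a mistyped digit is often enough for a whole column to be treated as text, so fixing the listed values in the spreadsheet is usually the first step.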

help/data_structure.txt · Last modified: 2015/03/29 01:54 by admin