So far we have looked at collecting data using surveys and asking people ourselves. This is known as primary data, since we have collected the information and found the answers ourselves. Another form of data is secondary data. Secondary data has been collected by other people who have gone out and done the research, and we then analyse their findings. Usually, secondary data comes from tables and figures that other people have published.
The internet is a very good resource for finding secondary data but, since we have not collected the information ourselves, we should be a little more sceptical about what we find. Just because one piece of data says one thing does not make it true: the person who collected it may have been careless and not paid attention to certain factors. This would make the data unreliable.
Grouped data
When looking at data we may want to split the findings into groups to make the data easier to manage. When doing this we need to make sure that the groups are not ambiguous, as seen earlier (no value should be able to go into two groups). In addition, the group size should be chosen with great care. If the group sizes are too small then the data will be spread thinly and we could end up with just one observation in each group: this would make grouping the data pointless!
Another mistake would be for the intervals to be too large. This could put every value into the same group, which would again make grouping the data pointless. If a doctor measured people’s body temperatures and decided to group the results in classes of 10°C, then every person’s temperature would fall into the same class and the other classes would have no data in them. Clearly, we wish to avoid this, as we would simply be wasting time and would not learn anything new about the data that we have.
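The effect of the class width can be seen in a short Python sketch. The temperatures below are made up purely for illustration, and the helper name `group_counts` is hypothetical. Values are stored in tenths of a degree so that the class boundaries are exact, and each class includes its lower boundary but not its upper one, so no value can fall into two groups.

```python
from collections import Counter

def group_counts(values, width):
    """Count how many values fall into each class [lower, lower + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Made-up body temperatures in tenths of a degree Celsius (364 means 36.4°C),
# stored as whole numbers so that the class boundaries are exact.
temperatures = [364, 366, 368, 369, 370, 371, 373, 382]

too_wide = group_counts(temperatures, 100)   # classes of 10°C
sensible = group_counts(temperatures, 5)     # classes of 0.5°C
too_narrow = group_counts(temperatures, 1)   # classes of 0.1°C

print(too_wide)    # every value lands in the class starting at 30.0°C
print(sensible)    # a handful of classes, each with a few values
print(too_narrow)  # one observation per class
```

With classes of 10°C all eight readings share a single class, and with classes of 0.1°C every reading sits alone, so neither grouping tells us anything. The 0.5°C classes are just one reasonable choice for this particular set of numbers.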
Discrete and continuous data
There are two types of quantitative data: discrete and continuous. Discrete data means that an answer must be one of a set of separate values. There could be a large number of possible values that the data might take, but there must be a finite number of them. An example of this would be if you were to ask people their age in years. Clearly, each person must be a certain age, which would usually be a whole number from 0 to about 120 (and even if someone is older than that, their age in years must still be a specific number).
Continuous data can be any number and is not limited by having to fall into a certain category. This basically means that the data collected could have an infinite number of digits after the decimal point. So, if we were to measure the temperature outside we could get the value 15.9°C, but this value has been rounded. By measuring the value more accurately we could get the temperature to any number of decimal places, such as 15.9354127283…, which could go on forever. Continuous data can still be limited above and below. Saying that continuous data is not limited to certain values does not mean that it cannot have a minimum and a maximum; it simply means that if we were to measure to a higher accuracy, the number would keep getting closer to the true value by gaining digits after the decimal point. Since continuous data does not naturally fall into set categories, the data must be grouped.
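The idea that a more accurate measurement gets closer to the true value can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the digits quoted above (15.9354127283) are the true temperature, and the function name `readings_at_precision` is hypothetical.

```python
def readings_at_precision(true_value, max_places):
    """Round true_value to 1, 2, ..., max_places decimal places.

    Returns a list of (places, reading, error) tuples, where error is the
    distance between the rounded reading and the value it was rounded from.
    """
    results = []
    for places in range(1, max_places + 1):
        reading = round(true_value, places)
        error = abs(true_value - reading)
        results.append((places, reading, error))
    return results

# Assumption: treat the digits quoted in the text as the 'true' temperature.
true_temperature = 15.9354127283

results = readings_at_precision(true_temperature, 5)
for places, reading, error in results:
    print(f"{places} decimal places: {reading}°C (off by about {error:.7f})")
```

Each extra decimal place gives a reading that is closer to the assumed true value, which is exactly what we would not see with discrete data such as age in whole years.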
Things to be wary of
One of the main things that you should be wary of when looking at data is bias. Biased information has been collected in an unfair or misleading way and will therefore not show a true picture of reality.
The questions that are used need to be fair, and no assumptions should be made about the participant. This is much easier to control when conducting your own surveys but, when using secondary data, a source with a good reputation should be used to make sure that the information has been collected correctly.