Every data set under analysis has a distribution, and that distribution has a shape: when you look at a histogram of univariate numerical data, you can see this shape. Shapes include:
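(a) Approximately symmetric
(b) Symmetric (note: reserve this label for a histogram that is clearly bell-shaped; if the two sides only roughly mirror each other, say "approximately symmetric")
(c) Positively skewed (the long tail stretches to the right)
(d) Negatively skewed (the long tail stretches to the left)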
You may also like to comment on any possible outliers. Outliers are data values that lie a significant distance away from the main body of the data.
So, when you come to state the shape of the distribution, you would write it like this:
The distribution of head circumferences of newborn boys is negatively skewed, with no apparent outliers.
The centre of a distribution is measured by the median: the middle value of the data. So 50% of your data lies below this *centre*, while 50% lies above it.
To describe the centre, you therefore need to locate the median.
The median lies at the ((n+1)/2)th position, where n is the total number of data values. You find n by adding up the frequencies of all the bars. E.g. if the bar from 10 to 15 has a frequency of 3 on the y-axis, then 3 data values out of your total data set lie in that interval.
Once you have the median position, count through the bars, adding up their frequencies as you go, until you reach the interval (for continuous numerical data) or the bar (for discrete data) that contains the median position. You then set it out like this:
The centre of the distribution lies somewhere in the interval 15-16cm.
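The counting step above can be sketched in a few lines of Python. This is a minimal sketch, assuming the histogram is stored as a hypothetical list of (lower edge, upper edge, frequency) tuples; the bar values are made up for illustration and happen to put the median in the 15-16 cm interval.

```python
# Hypothetical histogram: each bar is (lower_edge_cm, upper_edge_cm, frequency).
# These values are made up for illustration only.
bars = [
    (12, 13, 2),
    (13, 14, 5),
    (14, 15, 9),
    (15, 16, 14),
    (16, 17, 6),
]

def median_interval(bars):
    """Return the median position (n + 1) / 2 and the bar that contains it."""
    n = sum(freq for _, _, freq in bars)  # total number of data values
    position = (n + 1) / 2
    running_total = 0
    for lower, upper, freq in bars:
        running_total += freq  # cumulative frequency up to the end of this bar
        # Note: if position is a half-value sitting exactly on a bar boundary,
        # the median falls between two bars; this returns the later one.
        if running_total >= position:
            return position, (lower, upper)
    raise ValueError("histogram has no data")

position, interval = median_interval(bars)
print(f"median position = {position}")
print(f"The centre of the distribution lies somewhere in the interval {interval[0]}-{interval[1]}cm.")
```

Here n = 36, so the median is at position 18.5; the cumulative frequencies run 2, 7, 16, 30, so the 18th and 19th values both fall in the 15-16 cm bar.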
Spread is a good indication of the variability of the data set under analysis. So how can you measure it? For univariate data, there are three commonly used measures of spread:
(a) Range (Max - Min)
(b) IQR [Interquartile Range] (Q3 - Q1)
(c) Standard Deviation
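When you have the raw (ungrouped) data values, all three measures can be computed directly. The sketch below uses Python's `statistics` module; it assumes quartiles are taken as the medians of the lower and upper halves of the sorted data (leaving out the middle value when n is odd), which is one common textbook convention, and it uses the sample standard deviation. The measurements are made up for illustration.

```python
import statistics

def quartiles(data):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted data.
    Assumed convention: when n is odd, the middle value is left out of both halves.
    Needs at least 2 values."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    upper = values[half + len(values) % 2:]
    return statistics.median(lower), statistics.median(upper)

def spread_summary(data):
    """Range, IQR and (sample) standard deviation of raw data values."""
    q1, q3 = quartiles(data)
    return {
        "range": max(data) - min(data),
        "IQR": q3 - q1,
        "standard deviation": statistics.stdev(data),  # divides by n - 1
    }

# Hypothetical raw measurements in cm, made up for illustration.
heights = [7, 9, 10, 11, 12, 12, 13, 14, 15, 17]
print(spread_summary(heights))
```

For these values the range is 10 cm, Q1 = 10 and Q3 = 14 (so the IQR is 4 cm), and the standard deviation is about 2.94 cm.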
However, when you're describing a histogram, the only appropriate summary statistic to use is the range. Calculate it as the difference between the maximum and the minimum data values in the data set, and state it in your report like this:
The spread of the distribution, as measured by the range, is 10cm (17cm-7cm).
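A minimal Python sketch of reading the range off a histogram. It assumes (since the exact data values are not visible on a histogram) that the maximum and minimum are approximated by the upper edge of the highest non-empty bar and the lower edge of the lowest non-empty bar; the bar list is hypothetical and chosen to reproduce the 17 cm - 7 cm example.

```python
# Hypothetical bars: (lower_edge_cm, upper_edge_cm, frequency); made up for illustration.
bars = [
    (5, 7, 0),
    (7, 9, 3),
    (9, 11, 8),
    (11, 13, 12),
    (13, 15, 7),
    (15, 17, 2),
    (17, 19, 0),
]

def histogram_range(bars):
    """Estimate the range as (upper edge of the highest non-empty bar)
    minus (lower edge of the lowest non-empty bar)."""
    occupied = [(lower, upper) for lower, upper, freq in bars if freq > 0]
    if not occupied:
        raise ValueError("histogram has no data")
    minimum = min(lower for lower, _ in occupied)
    maximum = max(upper for _, upper in occupied)
    return maximum - minimum, maximum, minimum

spread, maximum, minimum = histogram_range(bars)
print(f"The spread of the distribution, as measured by the range, is {spread}cm ({maximum}cm-{minimum}cm).")
```

Empty bars at either end are ignored, because no data values lie in them.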
So there you have it. Just remember the four main things you have to talk about when analysing a histogram:
(a) Shape
(b) Outliers (if any)
(c) Centre
(d) Spread