Statistics: How Do We Describe Our Data?

[Image: histogram]

There are at least two ways to present our data:

1. We can describe the characteristics of our data exactly, mostly using quantitative data (i.e. numbers) –> “DESCRIPTIVE STATISTICS”

e.g.: I walk approximately 2 km each day.
2. We can also take measurements from a sample to infer the probable characteristics of a larger group (the population) –> “INFERENTIAL STATISTICS”

e.g.: We don’t expect much rain at this time of the year.

Let’s talk about DESCRIPTIVE STATISTICS first:

To describe the characteristics of our data in descriptive statistics, we use “VARIABLES”.

There are two kinds of variables:

1. Scale Variables: variables that can be scaled (mostly ‘numerical’)

e.g.: height (123 cm, 133.5 cm, 170 cm, …); number of houses (2, 4, 6, …); speed (34 km/h, 50 km/h, 70.5 km/h, …)

Looking at these examples, we can further divide Scale Variables into:

a. “Discrete” = counted, e.g.: number of houses (2, 4, 6, …)

b. “Continuous” = measured; any value between two data points is possible, so we can measure as precisely as our instrument allows

e.g.: speed (34 km/h, 50 km/h, 70.5 km/h, 90.56754 km/h, …)

2. Categorical Variables: cannot be scaled; they are usually ‘boxes to tick’

These variables are divided into:

a. “Ordinal” = can be ordered

e.g.: level of education (Ph.D., Master, Bachelor, High School, …)

b. “Nominal” = cannot be ordered

e.g.: gender (male, female); marital status (married, single, divorced)
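The ordinal/nominal difference can be illustrated in code. Below is a minimal Python sketch (the class names `Education` and `MaritalStatus` are our own, chosen for illustration): an `IntEnum` carries an order, so ordinal categories can be compared and sorted, while a plain `Enum` only supports equality, which matches nominal categories.

```python
from enum import Enum, IntEnum

# Ordinal: categories have a natural order, so IntEnum lets us compare and sort them.
class Education(IntEnum):
    HIGH_SCHOOL = 1
    BACHELOR = 2
    MASTER = 3
    PHD = 4

# Nominal: categories are just labels, so a plain Enum supports == but not < or >.
class MaritalStatus(Enum):
    MARRIED = "married"
    SINGLE = "single"
    DIVORCED = "divorced"

responses = [Education.MASTER, Education.HIGH_SCHOOL, Education.PHD, Education.BACHELOR]
print(sorted(responses))                      # ordered from HIGH_SCHOOL up to PHD
print(Education.PHD > Education.BACHELOR)     # True

try:
    MaritalStatus.MARRIED < MaritalStatus.SINGLE
except TypeError:
    print("nominal categories cannot be ordered")
```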

What should we focus on when we present descriptive statistics?

Usually we present (talk about):

– the CENTRAL TENDENCY (the typical value of the group, i.e. a value that can represent the whole data set)

– its VARIABILITY (the range? the average distance from the mean?)

1. What is the CENTRAL TENDENCY?

CENTRAL TENDENCY is needed because we often want one value that best represents an entire group of scores, a number that is “typical” of the group as a whole. There are 3 options: the MEAN, the MEDIAN, or the MODE. All three are commonly called an ‘average’.

Tips!!!

Use MEAN: for numerical data WITHOUT EXTREMES.

…in Excel: =AVERAGE(range)

Use MEDIAN: for numerical data WITH EXTREME scores, to avoid a distorted average.

…in Excel: =MEDIAN(range)

Use MODE (the number or value that occurs MOST FREQUENTLY): for categorical data.

…in Excel: =MODE(range)
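If you work in Python rather than Excel, the standard-library `statistics` module has direct counterparts to these three formulas. A minimal sketch; the walking distances and ballots below are made-up illustration data, not from this post:

```python
import statistics

# Made-up data for illustration only.
daily_km = [2.1, 1.8, 2.4, 2.0, 1.9]                # distance walked per day
votes = ["PKS", "Golkar", "PKS", "PDIP", "PKS"]     # individual ballots

print(statistics.mean(daily_km))    # counterpart of =AVERAGE(range)
print(statistics.median(daily_km))  # counterpart of =MEDIAN(range)
print(statistics.mode(votes))       # counterpart of =MODE(range); also accepts text labels
```

Note that `statistics.mode` works directly on text labels, which makes it convenient for categorical data.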

For example, suppose we have the following data:

Student ID    Mark
------------------
1             10
2             12
3             6
4             14
5             5
------------------
TOTAL         47
Mean          9.4
Median        10

Because there are no extreme values, the mean represents the whole set of students’ marks well: “The average of the students’ marks is 9.4.”

But if there is an extreme value, e.g.:

Student ID    Mark
------------------
1             10
2             12
3             44
4             14
5             5
------------------
TOTAL         85
Mean          17
Median        12

Then we cannot use the mean to represent the whole data set: the single extreme mark (44) pulls it up to 17, higher than four of the five marks. We should use the median instead: “The average of the students’ marks is 12.”
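A quick Python check of the two tables (reading student 2’s mark as 12) shows how one extreme value moves the mean much more than the median:

```python
import statistics

# Marks from the two tables; only student 3 changes (6 -> 44).
without_extreme = [10, 12, 6, 14, 5]
with_extreme = [10, 12, 44, 14, 5]

for label, marks in [("no extreme", without_extreme), ("with extreme", with_extreme)]:
    print(label,
          "total:", sum(marks),
          "mean:", statistics.mean(marks),
          "median:", statistics.median(marks))
```

For the first list this gives total 47, mean 9.4, median 10; for the second, total 85, mean 17, median 12. The mean jumps by 7.6 while the median moves by only 2.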

In the case of categorical data, e.g.:

Party       Votes
-----------------
PKS         30
PDIP        10
Golkar      20

or ‘a group of children’ with ages 11, 12, 10, 14, 13, 14, 12,

we describe the data by choosing the value that occurs MOST FREQUENTLY. For the party votes, the mode is PKS (30 votes). For the group of children, the ages 12 and 14 each occur twice, so this group has two modes: 12 and 14.
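Ties are easy to miss when finding a mode by eye. A minimal Python sketch: `statistics.multimode` (Python 3.8+) lists every tied value, whereas `statistics.mode` returns only the first mode it encounters. For already-counted votes, the mode is simply the category with the largest count.

```python
import statistics

ages = [11, 12, 10, 14, 13, 14, 12]
print(statistics.multimode(ages))  # [12, 14]: both ages occur twice
print(statistics.mode(ages))       # 12: mode() reports only the first mode it meets

# Vote totals that are already counted: the mode is the category with the largest count.
vote_totals = {"PKS": 30, "PDIP": 10, "Golkar": 20}
print(max(vote_totals, key=vote_totals.get))  # PKS
```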

———————————————

2. What is the VARIABILITY?

Besides the central tendency of the data, it is often important to present how much the values vary from one another.

For this we can measure:

– its RANGE: the difference between the largest and the smallest score

ex.: 90, 80, 106, 24, 33 –> Range = 106 - 24 = 82

– its STANDARD DEVIATION: roughly, the typical (average) distance of the values from the mean

ex.: 90, 80, 106, 24, 35

mean = 67

value    mean    ‘distance’ (value - mean)
-----------------------------------------
90       67       23
80       67       13
106      67       39
24       67      -43
35       67      -32

We cannot get the average distance simply by adding up the distances and dividing by the number of data, because the positive and negative distances cancel out: their total is always 0 (here 23 + 13 + 39 - 43 - 32 = 0).

So in the standard deviation formula we square each ‘distance’, add up the squares, divide by the number of data (or by the number of data minus 1 for a sample), and then take the square root:

Standard Deviation for Population = Square Root { [sum (square of ‘distance’)] / number of data }

Standard Deviation for Sample = Square Root { [sum (square of ‘distance’)] / (number of data - 1) }

The population SD is used when we work with the whole population, e.g. when we compute the SD of all 15,000 students’ marks at Flinders Uni.

The sample SD (the more common case) is used when we compute the SD of a sample, e.g. the SD of 500 students’ marks chosen to represent the whole population.
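The steps above can be followed one by one in a short Python sketch using the same five values. It shows that the plain distances sum to 0 and then applies both formulas (divide by the number of data for the population SD, by the number of data minus 1 for the sample SD):

```python
import math

# Data from the standard deviation example.
values = [90, 80, 106, 24, 35]
n = len(values)
mean = sum(values) / n                          # 67.0

distances = [v - mean for v in values]          # [23.0, 13.0, 39.0, -43.0, -32.0]
print(sum(distances))                           # 0.0: positive and negative distances cancel

squared = [d ** 2 for d in distances]           # squaring removes the signs
population_sd = math.sqrt(sum(squared) / n)     # divide by number of data
sample_sd = math.sqrt(sum(squared) / (n - 1))   # divide by (number of data - 1)
print(round(population_sd, 2), round(sample_sd, 2))  # 31.91 35.68
```

The squared distances sum to 5092, so the population SD is the square root of 5092 / 5 and the sample SD is the square root of 5092 / 4. Dividing by n - 1 always gives the slightly larger value.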

In Excel:

=STDEVP(range) ………for the population SD

=STDEV(range) ………….for the sample SD
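For comparison, Python’s standard-library `statistics` module offers `pstdev` and `stdev`, which play the same roles as STDEVP and STDEV (newer Excel versions also name these STDEV.P and STDEV.S). A minimal sketch that also computes the range from the earlier example:

```python
import statistics

# Range example: largest score minus smallest score.
range_scores = [90, 80, 106, 24, 33]
print(max(range_scores) - min(range_scores))   # 82

# Standard deviation example.
values = [90, 80, 106, 24, 35]
print(statistics.pstdev(values))  # population SD, like =STDEVP(range)
print(statistics.stdev(values))   # sample SD, like =STDEV(range)
```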
