There are at least 2 ways to present our data:

1. We can describe the characteristics of our data directly, mostly using Quantitative Data (i.e. numbers) –> “**DESCRIPTIVE STATISTICS**”

e.g. : I walk approximately 2 km each day

2. Further, we can also measure a Sample to infer something about the probable characteristics of the larger group (the Population) –> “**INFERENTIAL STATISTICS**”

e.g.: We don’t expect much rain at this time of the year

Let’s talk about DESCRIPTIVE STATISTICS first:

To describe the characteristics of our data in Descriptive Statistics we use “VARIABLES”.

There are 2 kinds of Variable:

1. **Scale Variables**: variables that can be measured on a numeric scale (mostly ‘numerical’)

e.g.: height (123 cm, 133.5 cm, 170 cm, …); number of houses (2, 4, 6, …); speed (34 km/h, 50 km/h, 70.5 km/h, …)

Looking at these examples, we can further divide the Scale Variables into:

a. “**Discrete**” = counted, e.g.: number of houses (2, 4, 6, …)

b. “**Continuous**” = measured; between any two values we can always measure more precisely

e.g.: speed (34 km/h, 50 km/h, 70.5 km/h, 90.56754 km/h, …)

2. **Categorical Variables**: cannot be measured on a numeric scale; they usually appear as ‘boxes to tick’

These variables are divided into:

a. “**Ordinal**” = can be ordered,

e.g.: level of education (Ph.D, Master, Bachelor, High School, …)

b. “**Nominal**” = cannot be ordered

e.g.: gender (male, female); marital status (married, single, divorced)
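
One way to keep the four kinds of variable apart when handling data in code is sketched below in Python. The record and the names `person`, `EDUCATION_ORDER` and `education_rank` are made up for illustration; only the example values come from the lists above.

```python
# A hypothetical survey record with one variable of each kind
person = {
    "houses_owned": 2,           # scale, discrete: counted in whole numbers
    "speed_kmh": 70.5,           # scale, continuous: measured
    "education": "Master",       # categorical, ordinal: has an order
    "marital_status": "single",  # categorical, nominal: no order
}

# Ordinal categories have a meaningful order, so they can be ranked and sorted
EDUCATION_ORDER = ["High School", "Bachelor", "Master", "Ph.D"]


def education_rank(level):
    """Position of an education level in the assumed ordering."""
    return EDUCATION_ORDER.index(level)


levels = ["Master", "High School", "Ph.D", "Bachelor"]
sorted_levels = sorted(levels, key=education_rank)
print(sorted_levels)
```

Nominal values such as marital status have no comparable ranking, so sorting them would only give alphabetical order, which carries no meaning.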

**What are the points of interest when we present “Descriptive Statistics” data?**

We usually present (talk about):

– the **CENTRAL TENDENCY** (a value that is typical of the group, i.e. one that can represent the whole data set)

– its **VARIABILITY** (the range? the average distance from the mean?)

**1. What is the CENTRAL TENDENCY?**

CENTRAL TENDENCY is needed because we often want one value that best represents an entire group of scores, a number that is “typical” of the group as a whole. There are 3 options: the MEAN, the MEDIAN, or the MODE. All three are loosely called an ‘Average’.

**Tips!!!**

Use **MEAN**: for numerical data *WITHOUT EXTREMES*.

… in Excel: *=AVERAGE(range)*

Use **MEDIAN**: for numerical data *WITH EXTREME* scores, to avoid a distorted average

… in Excel: *=MEDIAN(range)*

Use **MODE** (the number or value that occurs MOST FREQUENTLY): for *categorical data*

… in Excel: *=MODE(range)* (Excel’s MODE only accepts numbers, so categories have to be coded as numbers first)

For example, suppose we have this data:

| Student ID | Mark |
|---|---|
| 1 | 10 |
| 2 | 12 |
| 3 | 6 |
| 4 | 14 |
| 5 | 5 |

TOTAL = 47

Mean = 47 / 5 = 9.4

Median = 10 (the middle value of 5, 6, 10, 12, 14)

Because there is no extreme data, we can use the Mean to represent the whole set of Students’ Marks: “The average of the Students’ Marks is 9.4”
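
The figures can be checked directly from the five listed marks with a few lines of Python. This is a minimal sketch using the standard `statistics` module; the variable names are illustrative and not part of the original notes.

```python
import statistics

# Marks of the five students listed above
marks = [10, 12, 6, 14, 5]

total = sum(marks)                      # 10 + 12 + 6 + 14 + 5
mean_mark = statistics.mean(marks)      # total divided by the number of marks
median_mark = statistics.median(marks)  # middle value of the sorted marks

print(total, mean_mark, median_mark)    # 47 9.4 10
```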

But if there is extreme data, e.g.:

| Student ID | Mark |
|---|---|
| 1 | 10 |
| 2 | 12 |
| 3 | 44 |
| 4 | 14 |
| 5 | 5 |

TOTAL = 85

Mean = 85 / 5 = 17

Median = 12 (the middle value of 5, 10, 12, 14, 44)

So we should not use the Mean to represent the whole data set: the single extreme mark (44) pulls it up to 17, which is higher than four of the five marks. We should use the Median = 12: “the average of the Students’ Marks is 12”
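
To see how strongly one extreme score moves the mean while barely touching the median, the sketch below compares the two data sets side by side. It assumes Python's standard `statistics` module; the helper name `summarize` is hypothetical.

```python
import statistics


def summarize(marks):
    """Return (mean, median) for a list of marks."""
    return statistics.mean(marks), statistics.median(marks)


without_extreme = [10, 12, 6, 14, 5]
with_extreme = [10, 12, 44, 14, 5]  # student 3 now scores 44

mean_a, median_a = summarize(without_extreme)
mean_b, median_b = summarize(with_extreme)

print(f"without extreme: mean={mean_a}, median={median_a}")
print(f"with extreme:    mean={mean_b}, median={median_b}")
```

Changing a single mark from 6 to 44 lifts the mean from 9.4 to 17, while the median only moves from 10 to 12.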

In the case of Categorical Data, e.g.:

| Party | Number of Votes |
|---|---|
| PKS | 30 |
| PDIP | 10 |
| Golkar | 20 |

or ‘a Group of Children’ with ages: 11, 12, 10, 14, 13, 14, 12

we can describe the data by choosing the value that occurs MOST FREQUENTLY:

– for the parties, the mode is PKS (30 votes)

– for the group of children, 12 and 14 each occur twice, so this group has two modes: 12 and 14
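
Both cases can be sketched in Python. The snippet below is an illustration only: it assumes Python 3.8 or later (for `statistics.multimode`), and the variable names are made up.

```python
import statistics

# Vote counts per party, as listed above
votes = {"PKS": 30, "PDIP": 10, "Golkar": 20}

# For data that is already tallied, the mode is the category with the highest count
mode_party = max(votes, key=votes.get)

ages = [11, 12, 10, 14, 13, 14, 12]

# multimode returns every value that shares the highest frequency
age_modes = statistics.multimode(ages)

print(mode_party)  # PKS
print(age_modes)   # [12, 14]
```

Using `multimode` rather than `mode` makes a tie visible instead of silently reporting only one of the tied values.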

———————————————

**2. What is the VARIABILITY?**

Besides the Central Tendency of the data, it is often important to present how the values vary from one another.

In this case we can measure:

– its **RANGE**: the difference between the largest and smallest score

e.g.: 90, 80, 106, 24, 33 –> Range = 106 - 24 = 82

– its **STANDARD DEVIATION**: roughly, the average distance from the mean

e.g.: 90, 80, 106, 24, 35

mean = 67

| value | mean | ‘distance’ (value - mean) |
|---|---|---|
| 90 | 67 | 23 |
| 80 | 67 | 13 |
| 106 | 67 | 39 |
| 24 | 67 | -43 |
| 35 | 67 | -32 |

We cannot get the average distance by simply adding the ‘distances’ and dividing by the number of data,

because the ‘distances’ always add up to 0 (here 23 + 13 + 39 - 43 - 32 = 0).

So in the Standard Deviation formula we *square* each ‘distance’, *sum* the squares, *divide* by the number of data, and finally take the *square root*.

**Standard Deviation for Population = Square Root { [sum (square of ‘distance’)] / number of data }**

**Standard Deviation for Sample = Square Root { [sum (square of ‘distance’)] / (number of data – 1) }**
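
The two measures of variability can be followed step by step in Python. This is a sketch under the assumption that the data are the five values 90, 80, 106, 24, 35 from the standard deviation example; all variable names are illustrative.

```python
import math

scores = [90, 80, 106, 24, 35]

# Range: largest value minus smallest value
score_range = max(scores) - min(scores)

n = len(scores)
mean = sum(scores) / n

# 'Distance' of each value from the mean; these always add up to zero
distances = [x - mean for x in scores]

# Square each distance and sum the squares
sum_of_squares = sum(d ** 2 for d in distances)

# Population SD divides by n, sample SD divides by n - 1
sd_population = math.sqrt(sum_of_squares / n)
sd_sample = math.sqrt(sum_of_squares / (n - 1))

print(score_range, mean, sum(distances), sum_of_squares)
print(round(sd_population, 2), round(sd_sample, 2))
```

For this data the sum of squared distances is 5092, which gives a population SD of about 31.91 and a sample SD of about 35.68. The sample SD is always the larger of the two because it divides by a smaller number.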

The SD for a population is used when we work with the whole population, e.g. when we compute the SD of the marks of all 15,000 students at Flinders Uni,

whereas the SD for a sample (the more common case) is used when we compute the SD of a sample of the data, e.g. the SD of the marks of 500 students chosen to represent the whole population.

In Excel:

= STDEV**P** (range) ………for SD Population

= STDEV (range) ………….for SD sample
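
Outside Excel, the same two calculations are available in Python's standard `statistics` module. The sketch below assumes the example values 90, 80, 106, 24, 35 and pairs each function with the Excel formula that uses the same divisor.

```python
import math
import statistics

scores = [90, 80, 106, 24, 35]

# pstdev divides by n, the same divisor as Excel's STDEVP (population SD)
sd_population = statistics.pstdev(scores)

# stdev divides by n - 1, the same divisor as Excel's STDEV (sample SD)
sd_sample = statistics.stdev(scores)

print(round(sd_population, 2))  # 31.91
print(round(sd_sample, 2))      # 35.68
```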
