Salary Distributions_Part 3

The client is interested in the salary distributions of jobs in the State of Minnesota. The salaries range from about $40,000 to $120,000 per year. For this purpose, data from the Bureau of Labor Statistics are analysed. The data contain the following information:

•  The listing of the jobs by title

•  The salary amount (in dollars) for each job

The variables in the data set are:

1.  Salary – This is a quantitative, continuous variable. The values cover a range from $40,000 to $120,000.

2.  Title of the job – This is a qualitative (non-numeric), categorical variable. It lists the name of each job.

The quantitative variable “Salary” is chosen from the data set to construct a frequency distribution.

The data set is summarized below:

Smallest number = 40,170

Largest number = 119,850

⇒  Range = Largest Number – Smallest Number = 119,850 – 40,170 = 79,680

Using 6 classes, the class width is 79,680 / 6 = 13,280.

Rounding up to a convenient value, we use 13,500 as the class width. Our classes will be:

40,001 – 53,500

53,501 – 67,000

67,001 – 80,500

80,501 – 94,000

94,001 – 107,500

107,501 – 121,000
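
The class construction above can be reproduced with a short sketch. The rule of rounding the width up to the next multiple of 500 and the starting limit of 40,001 are assumptions chosen to match the numbers in the text; they are not stated as a general rule in the source.

```python
import math

smallest, largest = 40_170, 119_850   # values reported above
num_classes = 6                        # number of classes chosen above

data_range = largest - smallest        # 79,680
raw_width = data_range / num_classes   # 13,280.0

# Assumption: round the raw width up to the next multiple of 500,
# which reproduces the 13,500 used in the text.
class_width = math.ceil(raw_width / 500) * 500

# Assumption: the first class starts at 40,001, as in the list above.
first_lower = 40_001
classes = [
    (first_lower + i * class_width, first_lower + (i + 1) * class_width - 1)
    for i in range(num_classes)
]

for lower, upper in classes:
    print(f"{lower:,} - {upper:,}")
```

Any other convenient width of at least 13,280 would also cover the full range; 13,500 simply gives round class limits.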

Then, counting the number of jobs in each class, we get the frequencies as below:

Class                 Frequency
40,001 – 53,500       154
53,501 – 67,000       93
67,001 – 80,500       48
80,501 – 94,000       39
94,001 – 107,500      18
107,501 – 121,000     12
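
The counting step can be sketched as follows. The actual salary list is not reproduced in this report, so `sample_salaries` is a hypothetical handful of values used only to show how each salary is assigned to a class; the function name and constants are likewise illustrative.

```python
from bisect import bisect_left

# Upper class limits taken from the six classes listed above.
UPPER_LIMITS = [53_500, 67_000, 80_500, 94_000, 107_500, 121_000]
LOWEST_LIMIT = 40_001  # lower limit of the first class

def count_frequencies(values, uppers=UPPER_LIMITS, lowest=LOWEST_LIMIT):
    """Return the number of values that fall in each class."""
    counts = [0] * len(uppers)
    for value in values:
        if value < lowest or value > uppers[-1]:
            raise ValueError(f"{value} lies outside all classes")
        # Index of the first upper limit that is >= value.
        counts[bisect_left(uppers, value)] += 1
    return counts

# Hypothetical salaries for illustration only; not the actual data set.
sample_salaries = [40_170, 45_300, 53_500, 53_501, 72_940, 88_000, 119_850]
frequencies = count_frequencies(sample_salaries)
print(frequencies)
```

A value equal to an upper limit (for example 53,500) is counted in the class that ends at that limit, which matches the inclusive class limits used above.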

The midpoint of each class can be calculated as:

Midpoint = (Lower class limit + Upper class limit)/2

The relative frequency of each class can be calculated for a data set of size n by:

Relative frequency = Class frequency / Sample size = f / n

The cumulative frequency of each class can be calculated as:

Cumulative frequency = Sum of the frequencies of that class and all previous classes

Class                 Frequency   Midpoint     Relative frequency   Cumulative frequency
40,001 – 53,500       154         46,750.5     0.42                 154
53,501 – 67,000       93          60,250.5     0.26                 247
67,001 – 80,500       48          73,750.5     0.13                 295
80,501 – 94,000       39          87,250.5     0.11                 334
94,001 – 107,500      18          100,750.5    0.05                 352
107,501 – 121,000     12          114,250.5    0.03                 364
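
As a check on the table, the three formulas can be applied to the class limits defined earlier (40,001 – 53,500 and so on) and the frequencies counted above. This is a minimal sketch; the variable names are illustrative.

```python
from itertools import accumulate

classes = [(40_001, 53_500), (53_501, 67_000), (67_001, 80_500),
           (80_501, 94_000), (94_001, 107_500), (107_501, 121_000)]
frequencies = [154, 93, 48, 39, 18, 12]

n = sum(frequencies)  # sample size, 364

# Midpoint = (lower class limit + upper class limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Relative frequency = class frequency / sample size, rounded to 2 places
relative = [round(f / n, 2) for f in frequencies]

# Cumulative frequency = running total of the class frequencies
cumulative = list(accumulate(frequencies))

for row in zip(classes, frequencies, midpoints, relative, cumulative):
    print(row)
```

The last cumulative frequency equals the sample size, which is a quick way to confirm that no job was missed or counted twice.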

From the frequency distribution table, we can see that the first (lowest) class has the highest frequency, 154 jobs, and that the frequency decreases with each successive class. This indicates that the data are concentrated in the salary range of $40,001 – $53,500.

Histogram: A frequency histogram is a graphical way to summarize a frequency distribution. It is a bar graph with the following:

1.  Horizontal axis measuring the data values – Salary

2.  Vertical axis measuring the frequencies – Class frequencies

The height of each bar indicates the frequency (number of jobs) in that class. The distribution peaks on the left side and has a long tail to the right, so it is skewed to the right, also called positively skewed. This distribution has a large number of occurrences in the lower classes (left side of the graph) and few in the upper classes (right side of the graph).
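
Since the plot itself is not reproduced here, the bar heights can be sketched as a text histogram built from the class frequencies. The function name and the scale of one `#` per 4 jobs are arbitrary choices for display, not part of the original analysis.

```python
labels = ["40,001 - 53,500", "53,501 - 67,000", "67,001 - 80,500",
          "80,501 - 94,000", "94,001 - 107,500", "107,501 - 121,000"]
frequencies = [154, 93, 48, 39, 18, 12]

def text_histogram(labels, counts, jobs_per_mark=4):
    """Build one line per class; bar length is proportional to frequency."""
    lines = []
    for label, count in zip(labels, counts):
        bar = "#" * (count // jobs_per_mark)  # whole marks only
        lines.append(f"{label:>17} | {bar} ({count})")
    return lines

histogram = text_histogram(labels, frequencies)
print("\n".join(histogram))
```

The bars shrink steadily from the first class to the last, so the bulk of the jobs sits in the lowest salary classes.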

Conclusion: The frequency distribution and histogram indicate that around 42% of the jobs (the relative frequency of the first class) in the State of Minnesota have a salary in the range of $40,001 – $53,500. It can also be inferred that the number of jobs decreases as the salary increases: high-salary jobs are few, while lower-salary jobs are many.

Part 2

Measures of Center: A measure of center is a value that summarizes a set of data by identifying the central position within that set of data.

The measures of center are:

(i)  Mean – the arithmetic average value

Mean = (x1 + x2 + ... + xn) / n

(ii)  Median – the point which divides the data set into equal halves (middle value of the data set)

(iii)  Mode – The value that occurs most often in a data set (most repeated value)
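
The three measures of center can be sketched with Python's standard `statistics` module. The list `sample` is a hypothetical set of eight salaries made up for illustration; it is not the actual data set.

```python
import statistics

# Hypothetical salaries for illustration only; not the actual data set.
sample = [40_170, 40_170, 48_900, 56_440, 56_600, 73_200, 91_500, 119_850]

mean = statistics.mean(sample)      # arithmetic average
median = statistics.median(sample)  # average of the two middle values (even n)
mode = statistics.mode(sample)      # most frequently occurring value

print(mean, median, mode)
```

With an even number of values, `statistics.median` averages the two middle values of the sorted data, which is the same rule used for the median in this report.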

Measures of Variation: These measure how much the data vary from the center. They help to quantify the extent to which the data are dispersed (or spread out).

The measures of variation are:

(i)  Range – Measures the difference between the largest and smallest values of the data. It tells us the width of our data set.

o  R = max – min

(ii)  Midrange – Measures the average of the highest and lowest values of the data.

(iii)  Variance – Measures the average squared deviation of the observations from the mean of the data set.

(iv)  Standard Deviation – The square root of the variance
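
A minimal sketch of these four measures follows, again on a hypothetical list of salaries rather than the actual data. The report does not say whether the variance divides by n or by n – 1; the sketch uses the sample form (n – 1) via `statistics.variance` as an assumption and shows the population form alongside it.

```python
import statistics

# Hypothetical salaries for illustration only; not the actual data set.
sample = [40_170, 40_170, 48_900, 56_440, 56_600, 73_200, 91_500, 119_850]

data_range = max(sample) - min(sample)      # R = max - min
midrange = (max(sample) + min(sample)) / 2  # average of the two extremes

# Assumption: sample variance (divides by n - 1).
variance = statistics.variance(sample)
std_dev = statistics.stdev(sample)          # square root of the sample variance

# Population variance (divides by n), shown for comparison.
population_variance = statistics.pvariance(sample)

print(data_range, midrange, variance, std_dev, population_variance)
```

For a data set of 364 values the two forms of the variance differ by a factor of 364/363, so the choice changes the result only slightly.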

Calculations: Measures of center and Measures of variation

a.  Mean = Sum of the Data / Total Number of Values = 22,679,430 / 364 ≈ 62,306

b.  Median = Middle Value of the Given Data Set = (56,440 + 56,600)/2 = 56,520

The salary amounts are arranged from lowest to highest and the value in the middle is picked. Since we have an even number of data values (364), the median is the mean of the two middle values.

c.  Mode = Most Repeated Value in the Given Data = 40,170

d.  Midrange = Average of the highest and lowest values = (40,170 + 119,850)/2 = 80,010

e.  Range = Maximum Value – Minimum Value = 119,850 – 40,170 = 79,680

f.  Variance = 365,684,995

g.  Standard Deviation = √Variance = √365,684,995 ≈ 19,122.89 ≈ 19,123
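
The arithmetic in the calculations above can be re-derived from the totals reported in this section. The sketch below uses only the reported sum, count, extremes, middle values, and variance; it does not recompute the variance itself, since the full data set is not reproduced here.

```python
import math

total, n = 22_679_430, 364          # reported sum of salaries and count
smallest, largest = 40_170, 119_850
reported_variance = 365_684_995

mean = total / n                     # about 62,306.13
median = (56_440 + 56_600) / 2       # mean of the two middle values
midrange = (smallest + largest) / 2
data_range = largest - smallest
std_dev = math.sqrt(reported_variance)

print(round(mean))         # 62306
print(median)              # 56520.0
print(midrange)            # 80010.0
print(data_range)          # 79680
print(round(std_dev, 2))   # 19122.89
```

Rounded to the nearest dollar, the standard deviation is 19,123.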

Five-Number Summary:

•  Minimum = 40,170

•  Lower Quartile (Q1) = median of the lower half of the data = 46,757.5

•  Median = middle value of the data set = (56,440 + 56,600)/2 = 56,520

•  Upper Quartile (Q3) = median of the upper half of the data = 73,357.5

•  Maximum = 119,850
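
The quartile rule described above (Q1 and Q3 as the medians of the lower and upper halves) can be sketched as below. The function name and the sample values are hypothetical; other quartile conventions exist and can give slightly different results.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]       # values below the middle
    upper_half = ordered[n - half:]   # values above the middle
    return (
        ordered[0],
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
        ordered[-1],
    )

# Hypothetical salaries for illustration only; not the actual data set.
sample = [40_170, 40_170, 48_900, 56_440, 56_600, 73_200, 91_500, 119_850]
summary = five_number_summary(sample)
print(summary)
```

When the number of values is odd, this version leaves the middle value out of both halves; with an even count such as 364, each half simply contains 182 values.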

Conclusion: The data set lists 364 jobs with their salaries. The analysis shows that the most frequently repeated salary is $40,170, the median salary in the State of Minnesota is $56,520, and the mean salary is about $62,306.