The client is interested in the salary distributions of jobs in the State of Minnesota. The salaries range from $40,000 to $120,000 per year. For this purpose, the data from the Bureau of Labor Statistics will be analysed. The data contains the following information.
• The listing of the jobs by title
• The salary amount (in dollars) for each job
The variables in the data set are:
1. Salary – This is a quantitative, continuous variable. The values range from $40,000 to $120,000.
2. Title of the job – This is a qualitative (non-numeric), categorical variable. It lists the name of each job.
The quantitative variable “Salary” is chosen from the data set to construct a frequency distribution.
The data set can be summarized as follows:
Smallest number = 40,170
Largest number = 119,850
Range = Largest number – Smallest number = 119,850 – 40,170 = 79,680
With 6 classes, the class width is 79,680 / 6 = 13,280.
Rounding up to a convenient value, we use 13,500 as the class width. Our classes will be:
40,001 – 53,500
53,501 – 67,000
67,001 – 80,500
80,501 – 94,000
94,001 – 107,500
107,501 – 121,000
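The class construction above can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce the report: the function name `class_limits`, the choice to round the width up to the next multiple of 500, and the starting lower limit of 40,001 are assumptions made to reproduce the classes listed above.

```python
import math

def class_limits(minimum, maximum, num_classes, round_to, first_lower):
    """Return (width, [(lower, upper), ...]) for equal-width classes."""
    raw_width = (maximum - minimum) / num_classes  # 79,680 / 6 = 13,280
    # Assumption: round the raw width UP to the next multiple of round_to.
    width = math.ceil(raw_width / round_to) * round_to
    classes = []
    lower = first_lower
    for _ in range(num_classes):
        upper = lower + width - 1  # limits are inclusive whole dollars
        classes.append((lower, upper))
        lower = upper + 1
    return width, classes

width, classes = class_limits(40_170, 119_850, 6, round_to=500, first_lower=40_001)
print(width)
for lower, upper in classes:
    print(f"{lower:,} - {upper:,}")
```

The last class ends above the largest salary (119,850), so every value in the data set falls into exactly one class.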
Then, counting the number of jobs in each class, we get the frequencies as below:
Class                 Frequency
40,001 – 53,500       154
53,501 – 67,000       93
67,001 – 80,500       48
80,501 – 94,000       39
94,001 – 107,500      18
107,501 – 121,000     12
The midpoint of each class can be calculated as:
Midpoint = (Lower class limit + Upper class limit)/2
The relative frequency of each class can be calculated for a data set of size n by:
Relative frequency = Class frequency / Sample size = f / n
The cumulative frequency of each class can be calculated as:
Cumulative frequency = Sum of the frequencies of that class and all previous classes
Class                 Frequency   Midpoint    Relative frequency   Cumulative frequency
40,001 – 53,500       154         46,750.5    0.42                 154
53,501 – 67,000       93          60,250.5    0.26                 247
67,001 – 80,500       48          73,750.5    0.13                 295
80,501 – 94,000       39          87,250.5    0.11                 334
94,001 – 107,500      18          100,750.5   0.05                 352
107,501 – 121,000     12          114,250.5   0.03                 364
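The three formulas can be checked against the table with a short Python sketch. It takes the class limits defined earlier (40,001 – 53,500 and so on) and the counted frequencies as given; the variable names are illustrative only.

```python
from itertools import accumulate

# Class limits and frequencies as listed in the report.
classes = [
    (40_001, 53_500), (53_501, 67_000), (67_001, 80_500),
    (80_501, 94_000), (94_001, 107_500), (107_501, 121_000),
]
frequencies = [154, 93, 48, 39, 18, 12]

n = sum(frequencies)  # sample size
midpoints = [(lower + upper) / 2 for lower, upper in classes]
relative = [round(f / n, 2) for f in frequencies]  # f / n, to 2 decimals
cumulative = list(accumulate(frequencies))         # running total

for (lower, upper), f, m, r, c in zip(classes, frequencies, midpoints, relative, cumulative):
    print(f"{lower:>7,} - {upper:>7,}  {f:>4}  {m:>10,.1f}  {r:.2f}  {c:>4}")
```

The final cumulative frequency equals the sample size, which is a quick check that no job was counted twice or left out.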
From the frequency distribution table, we can see that the first (lowest) class has the highest frequency, 154 jobs, and that the frequency decreases as we move to the higher salary classes. This indicates that the data are concentrated in the salary range of $40,001 – $53,500.
Histogram: A frequency histogram is a graphical way to summarize a frequency distribution. It is a bar graph with the following:
1. Horizontal axis measuring the data values – Salary
2. Vertical axis measuring the frequencies – Class frequencies
The height of each bar indicates the frequency (number of jobs) in that class. The distribution peaks on the left side and has a long tail to the right, so it is skewed to the right, also called positively skewed. This distribution has a large number of occurrences in the lower classes (left side of the graph) and few in the upper classes (right side of the graph).
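The shape of the histogram can be previewed without plotting software by printing a text bar chart. The sketch below is only an illustration: the function name `text_histogram` and the `scale` parameter are hypothetical, and the bars run horizontally rather than vertically, so the class with the most jobs appears at the top rather than at the left.

```python
labels = [
    "40,001 - 53,500", "53,501 - 67,000", "67,001 - 80,500",
    "80,501 - 94,000", "94,001 - 107,500", "107,501 - 121,000",
]
frequencies = [154, 93, 48, 39, 18, 12]

def text_histogram(labels, frequencies, scale=4):
    """Return one line per class with one '#' per `scale` jobs (rounded down)."""
    lines = []
    for label, freq in zip(labels, frequencies):
        bar = "#" * (freq // scale)
        lines.append(f"{label:<19} {bar} {freq}")
    return lines

lines = text_histogram(labels, frequencies)
for line in lines:
    print(line)
```

The bars shorten steadily from the first class to the last, which is the pattern of a distribution whose tail extends toward the higher salaries.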
Conclusion: The frequency distribution and histogram indicate that around 42% of the jobs in the data set (the relative frequency of the first class) in the State of Minnesota have a salary in the range of $40,001 – $53,500. It can also be inferred that the number of jobs decreases as the salary increases: high-salary jobs are few and low-salary jobs are many.
Part 2
Measures of Center: A measure of center is a value that summarizes a set of data by identifying the central position within that set.
The measures of center are:
(i) Mean – the arithmetic average value
(ii) Median – the point which divides the data set into equal halves (middle value of the data set)
(iii) Mode – The value that occurs most often in a data set (most repeated value)
Measures of Variation: A measure of variation describes how much the data vary from the center. It helps to measure the extent to which the data are dispersed (or spread out).
The measures of variation are:
(i) Range – the difference between the largest and smallest values of the data. It tells us the width of our data set.
• R = max – min
(ii) Midrange – the average of the highest and lowest values of the data. Strictly speaking, the midrange locates the center of the data rather than its spread; it is listed here because it uses the same two extreme values as the range.
(iii) Variance – the average of the squared deviations of the observations from the mean of the data set.
(iv) Standard Deviation – the square root of the variance.
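For reference, the four quantities above can be written as formulas. The report does not say whether the sample or the population form of the variance was used, so the sample form (dividing by n – 1) shown here is an assumption; the population form divides by n instead. The symbols x_i, n and x-bar are notation introduced for this sketch.

```latex
\begin{align*}
R &= x_{\max} - x_{\min} \\
\text{Midrange} &= \frac{x_{\max} + x_{\min}}{2} \\
s^{2} &= \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^{2},
  \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \\
s &= \sqrt{s^{2}}
\end{align*}
```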
Calculations: Measures of center and Measures of variation
a. Mean = Sum of the Data / Total Number of Values = (22,679,430) / 364 = 62,306
b. Median = Middle Value of the Given Data Set = (56,440 + 56,600)/2 = 56,520
The median is found by arranging the salary amounts from lowest to highest and picking the value in the middle. Since we have an even number of data values, the median is the mean of the two values in the middle.
c. Mode = Most Repeated Value in the Given Data = 40,170
d. Midrange = Average of the highest and lowest number = (40,170 + 119,850)/2 = 80,010
e. Range = Maximum Value – Minimum Value = 119,850 – 40,170 = 79,680
f. Variance = 365,684,995
g. Standard Deviation = √Variance = √365,684,995 = 19,122.89 ≈ 19,123
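The same measures can be computed with Python's standard `statistics` module. The full salary list is not reproduced in this report, so the `salaries` list below is a small hypothetical sample used only to show the calls; the last lines recheck the arithmetic of the reported mean and standard deviation from the totals quoted above. The sample variance (n – 1 divisor) is an assumption, since the report does not state which form was used.

```python
import math
import statistics

# Hypothetical salaries for illustration only; NOT the actual data set.
salaries = [40_170, 40_170, 45_000, 52_300, 56_440, 56_600, 61_000, 119_850]

mean = statistics.mean(salaries)
median = statistics.median(salaries)       # mean of the two middle values (even n)
mode = statistics.mode(salaries)           # most repeated value
midrange = (min(salaries) + max(salaries)) / 2
data_range = max(salaries) - min(salaries)
sample_variance = statistics.variance(salaries)  # assumption: n - 1 divisor
sample_sd = statistics.stdev(salaries)

print(mean, median, mode, midrange, data_range, sample_variance, sample_sd)

# Recheck the figures reported above from the quoted totals.
reported_mean = 22_679_430 / 364
reported_sd = math.sqrt(365_684_995)
print(round(reported_mean), round(reported_sd, 2))
```

Replacing `salaries` with the actual column of 364 values would give the figures for the real data set.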
Five-Number Summary:
• Minimum = 40,170
• Lower Quartile (Q1) = the median of the lower half of the data = 46,757.5
• Median = Middle Value of the Given Data Set = (56,440 + 56,600)/2 = 56,520
• Upper Quartile (Q3) = the median of the upper half of the data = 73,357.5
• Maximum = 119,850
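The quartile definition used here (the median of each half of the sorted data) can be sketched as follows. The function name `five_number_summary` and the sample values are hypothetical, and for an odd number of values this sketch leaves the middle value out of both halves, which is one common convention among several.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper_half = data[half:] if n % 2 == 0 else data[half + 1:]
    return (
        data[0],
        statistics.median(lower_half),
        statistics.median(data),
        statistics.median(upper_half),
        data[-1],
    )

# Hypothetical salaries for illustration only; NOT the actual data set.
salaries = [40_170, 40_170, 45_000, 52_300, 56_440, 56_600, 61_000, 119_850]
summary = five_number_summary(salaries)
print(summary)
```

Other quartile conventions (for example, interpolation methods used by spreadsheet software) can give slightly different values for Q1 and Q3 on the same data.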
Conclusion: There are 364 unique jobs with their salaries listed in the given data set. On analysing the data, we find that the most frequently repeated salary value is $40,170 and the median salary in the State of Minnesota is $56,520.