1 Migami

## Frequency Histograms Algebra 1 Homework 4

Generally, statisticians (and any sane person) will use some kind of statistical program like R or Minitab to make their statistical graphs. However, it is still surprisingly common to see textbooks do everything by hand. In the end, learning how to make a histogram by hand is a great way to get better at reading them and at figuring out what went wrong when a computer or calculator gives you something you don’t expect. In this lesson, we will look at the step-by-step process of making a frequency distribution and a histogram.

To show you how to do this, we will be using the data set below. I went ahead and put the numbers in order, which will make everything much easier.

 12 14 14 14 16 18 20 20 21 23 27 27 27 29 31 31 32 32 34 36 40 40 40 40 40 42 51 56 60 65

To make a histogram by hand, we must first find the frequency distribution. The idea behind a frequency distribution is to break the data into groups (called classes or bins) so that we can better see patterns. It is sort of like the difference between asking you your age and asking you if you are between 20 and 25. In the second question, I am grouping up the ages. This way, if I have a HUGE data set (as many are), I can see the patterns (such as whether most people are older or younger) much more easily than if I just tried to decipher a large list of numbers.

## Steps to Making Your Frequency Distribution

### Step 1: Calculate the range of the data set

The range is the difference between the largest value and the smallest value. We need this to figure out how much “space” we need to divide into groups. In this example:

$$\text{Range}=65-12=53$$
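As a quick check of Step 1, here is a minimal Python sketch. The variable names `data` and `data_range` are just illustrative choices, not part of the lesson.

```python
# The sorted data set from this lesson
data = [12, 14, 14, 14, 16, 18, 20, 20, 21, 23, 27, 27, 27, 29, 31,
        31, 32, 32, 34, 36, 40, 40, 40, 40, 40, 42, 51, 56, 60, 65]

# Range = largest value minus smallest value
data_range = max(data) - min(data)
print(data_range)  # 53
```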

### Step 2: Divide the range by the number of groups you want and then round up

Doing this allows us to figure out how large each group is. It’s as if we are going to cut a board into equal pieces. In step 1, we measured how long the board is and now we are deciding how big each piece will be.

Hmmm… but how many groups to have? Too many, and our graphs and tables won’t be much better than a list of numbers. Too few, and the pattern will be hidden with too little detail. Often, a good number of groups is 5 or 6 although there are some rules that people use to decide this. MORE OFTEN, people will let the computer decide and then adjust if they want to while textbooks will tell you how many groups to use. But if you are working with the dataset yourself, you will have to see what the graph looks like before you can be sure you chose a good number.

Let’s say that we choose to have 6 groups. If we do this then:

$$\dfrac{53}{6}\approx 8.8$$

The number we just found is commonly called the class width. We will round this up to 9 just because it is easier to work with that way. A computer would probably keep the 8.8 so be aware that sometimes you will see this number as a decimal. NOTE: In general, people who are doing this by hand always round up even if it was 8.1!
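The rounding rule in Step 2 can be sketched in Python with `math.ceil`, which always rounds up. The names `num_classes` and `class_width` are illustrative only.

```python
import math

data_range = 53    # from Step 1: 65 - 12
num_classes = 6    # the number of groups chosen in this lesson

raw_width = data_range / num_classes   # 8.8333...
class_width = math.ceil(raw_width)     # always round up, so 9

print(round(raw_width, 2), class_width)  # 8.83 9
print(math.ceil(8.1))                    # 9, even 8.1 rounds up
```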

### Step 3: Use the class width to create your groups

I’m going to start at the smallest number we have, which is 12, and count by 9 until I have my 6 groups. For example, my first group will be 12 – 21 since 12+9=21. My next group will be 21 – 30 since 21+9=30, and so on. I’ll put these in a table and label them “classes”. I will also add “frequency” to the table:

| Classes | Frequency |
|---|---|
| 12 – 21 | |
| 21 – 30 | |
| 30 – 39 | |
| 39 – 48 | |
| 48 – 57 | |
| 57 – 66 | |
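The counting-by-9 in Step 3 can be sketched as a short Python loop. This is only an illustration; the name `classes` and the tuple layout are assumptions made for the example.

```python
start = 12         # smallest value in the data set
class_width = 9    # from Step 2
num_classes = 6

# Each class is (lower, upper); the next class starts where the previous one ends
classes = [(start + i * class_width, start + (i + 1) * class_width)
           for i in range(num_classes)]

for lower, upper in classes:
    print(f"{lower} - {upper}")
```

The last class ends at 66, which is just past the largest value (65), so every data point has a group to land in.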

### Step 4: Find the frequency for each group

This part is probably the most tedious and the main reason why it is unrealistic to make a frequency distribution or histogram by hand for a very large data set. We are going to count how many points are in each group. Let’s start with our first group: 12 – 21. We want to count how many points are between 12 and 21 NOT INCLUDING 21. You see the overlap between the groups, right? That’s to account for decimals, and we keep it even when we don’t have any. The right-hand endpoint of any group isn’t included in that group. It goes in the next group. That means 21 would be in the second group and any 30 we have would be counted in the third group.
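One way to see the endpoint rule in action is a tiny Python helper that uses integer division. The function `class_index` is a hypothetical name for this sketch, and it assumes the classes start at 12 with a width of 9, as in this lesson.

```python
start = 12        # lower edge of the first class
class_width = 9   # from Step 2

def class_index(value):
    """Return the 0-based class a value falls in (right endpoints excluded)."""
    return (value - start) // class_width

print(class_index(20))  # 0 -> first class, 12 - 21
print(class_index(21))  # 1 -> second class, 21 - 30
print(class_index(30))  # 2 -> third class, 30 - 39
```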

Back to the first group: 12 – 21. The points included in this group are 12, 14, 14, 14, 16, 18, 20, and 20, so its frequency is 8.

Alright – now I update the table with this information!

| Classes | Frequency |
|---|---|
| 12 – 21 | 8 |
| 21 – 30 | |
| 30 – 39 | |
| 39 – 48 | |
| 48 – 57 | |
| 57 – 66 | |

Continuing with this pattern for the remaining groups:

| Classes | Frequency |
|---|---|
| 12 – 21 | 8 |
| 21 – 30 | 6 |
| 30 – 39 | 6 |
| 39 – 48 | 6 |
| 48 – 57 | 2 |
| 57 – 66 | 2 |

That last table is our frequency distribution! To make a histogram from this, we will use the groups on the horizontal axis and the frequency on the vertical axis. Finally, we will use bars to represent the frequency of each individual group. With this data, the finished histogram has six bars with heights 8, 6, 6, 6, 2, and 2.
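To double-check the counts and get a rough picture of the histogram, here is a minimal Python sketch. It is an illustration, not the method a statistics package would use: the bars are drawn sideways as text (one `#` per data point), and the names `frequencies` and `rows` are assumptions for the example.

```python
data = [12, 14, 14, 14, 16, 18, 20, 20, 21, 23, 27, 27, 27, 29, 31,
        31, 32, 32, 34, 36, 40, 40, 40, 40, 40, 42, 51, 56, 60, 65]
classes = [(12, 21), (21, 30), (30, 39), (39, 48), (48, 57), (57, 66)]

# Count values with lower <= x < upper (the right endpoint is excluded)
frequencies = [sum(1 for x in data if lower <= x < upper)
               for lower, upper in classes]

# Sideways text histogram: one row per class, one '#' per data point
rows = [f"{lower:>2} - {upper:<2} | {'#' * count} ({count})"
        for (lower, upper), count in zip(classes, frequencies)]
print("\n".join(rows))
```

The frequencies should add up to 30, the number of values in the data set, which is a handy check when counting by hand.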


## What to study next

Once you know how to sketch a histogram, you should study how to read them and how to interpret the common shapes and patterns. Finally, you can also see how to create histograms on the TI-83 calculator.

### Related

Bar graphs and histograms are used to compare the sizes of different groups or categories.

Bar Graph (Chart), General Characteristics:

- Column label is a categorical variable (colors).
- Column height is the size of the group.
- Columns are separated by space.
- Since this data is categorical, the only possible calculation is finding the mode.

Frequency Histogram, General Characteristics:

- Column label is a quantitative variable (ages).
- Column label is a range of values (or a single value).
- Column height is the size of the group.
- Columns are NOT separated by space.
- Calculate mean, median, quartiles, standard deviation, and so on.

Note: The overall shape of a histogram will vary when the width of the intervals (also referred to as "bins") is changed. When working with your graphing calculator, it is important to be aware of the width of your intervals and how to control it. See the link at the bottom of this page for help with graphing histograms on your graphing calculator.
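To see how the interval width changes the picture, here is a small Python sketch that regroups the 30-value data set from the lesson above with two different widths. The helper `counts_for_width` is hypothetical, and it assumes classes that start at 12 and exclude their right endpoints.

```python
data = [12, 14, 14, 14, 16, 18, 20, 20, 21, 23, 27, 27, 27, 29, 31,
        31, 32, 32, 34, 36, 40, 40, 40, 40, 40, 42, 51, 56, 60, 65]

def counts_for_width(values, width, start=12):
    """Count values in classes of the given width (right endpoints excluded)."""
    num_classes = (max(values) - start) // width + 1
    counts = [0] * num_classes
    for x in values:
        counts[(x - start) // width] += 1
    return counts

print(counts_for_width(data, 9))   # [8, 6, 6, 6, 2, 2]
print(counts_for_width(data, 18))  # [14, 12, 4]
```

With the wider intervals, the flat stretch in the middle of the first result disappears and the histogram simply steps downward.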

Histograms versus Bar Graphs:

Generally speaking ...

1. Histograms are used to show distributions of variables, while bar graphs are used to compare variables.
2. Histograms plot quantitative (numerical) data with ranges of the data grouped into intervals, while bar graphs plot categorical data.
3. The bars in a bar graph can be rearranged, but it does not make sense to rearrange the bars in a histogram.
4. It is possible to speak of the skewness of a histogram, but not of a bar graph.
5. Bar graphs have space between the columns, while histograms do not.

 Frequency Histograms:

Histograms are generally constructed from frequency tables, thus the name "frequency histogram." The intervals from the table generally appear on the x-axis and the frequency values appear on the y-axis. The frequencies are represented by the height of each column located directly over the corresponding interval. There is no space between the columns. If an interval's count is zero, however, a space (or gap) will appear since the column has a height of zero.

Data set: {9, 25, 30, 31, 34, 36, 37, 42, 45, 47, 49, 43, 55, 58, 61, 63, 67}

Frequency Table

| Interval | Count (frequency) |
|---|---|
| 0-10 | 1 |
| 11-20 | 0 |
| 21-30 | 2 |
| 31-40 | 4 |
| 41-50 | 5 |
| 51-60 | 2 |
| 61-70 | 3 |

The intervals appear on the horizontal axis. The count (frequency) appears on the vertical axis.
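The frequency table can be rebuilt from the data set with a short Python sketch. It assumes each interval includes both of its endpoints, as the labels 0-10, 11-20, and so on suggest, so the value 30 is counted in 21-30. The names `intervals` and `counts` are illustrative.

```python
data = [9, 25, 30, 31, 34, 36, 37, 42, 45, 47, 49, 43, 55, 58, 61, 63, 67]
intervals = [(0, 10), (11, 20), (21, 30), (31, 40),
             (41, 50), (51, 60), (61, 70)]

# Both endpoints of each interval are included
counts = [sum(1 for x in data if low <= x <= high)
          for low, high in intervals]

for (low, high), count in zip(intervals, counts):
    print(f"{low}-{high}: {count}")
```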

 Cumulative Frequency Histograms:

The term "cumulative frequency" refers to the running total of the frequencies.  Each interval now contains a frequency number which represents that specific interval's count added to the sum of all the previous intervals' counts.  A cumulative frequency histogram will always contain column bars that get increasingly taller (or stay the same height) as you move to the right.

Data set: {9, 25, 30, 31, 34, 36, 37, 42, 45, 47, 49, 43, 55, 58, 61, 63, 67}

Cumulative Frequency Table

| Interval | Count (frequency) | Cumulative frequency |
|---|---|---|
| 0-10 | 1 | 1 |
| 11-20 | 0 | 1 + 0 = 1 |
| 21-30 | 2 | 1 + 2 = 3 |
| 31-40 | 4 | 3 + 4 = 7 |
| 41-50 | 5 | 7 + 5 = 12 |
| 51-60 | 2 | 12 + 2 = 14 |
| 61-70 | 3 | 14 + 3 = 17 |

Notice that the columns increase in height, or stay the same, as you move to the right.
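The running total can be sketched in Python with `itertools.accumulate`. As an assumption for this example, the counts are computed from the data set with both interval endpoints included (so 30 falls in 21-30).

```python
from itertools import accumulate

data = [9, 25, 30, 31, 34, 36, 37, 42, 45, 47, 49, 43, 55, 58, 61, 63, 67]
intervals = [(0, 10), (11, 20), (21, 30), (31, 40),
             (41, 50), (51, 60), (61, 70)]

counts = [sum(1 for x in data if low <= x <= high)
          for low, high in intervals]

# Each entry is that interval's count plus all the counts before it
cumulative = list(accumulate(counts))
print(cumulative)  # [1, 1, 3, 7, 12, 14, 17]
```

The last cumulative value always equals the total number of data points, here 17.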

 Relative Frequency Histograms:

While the term "frequency" refers to the number of observations (or counts) of a given piece of data, the term "relative frequency" refers to the number of observations (or counts) expressed as a "part of the whole" or percentage. Each count is divided by the total number of observations in the entire data set. In the data set we have been examining, there are 2 pieces of data in the interval "51-60", but there are 17 total pieces of data. The relative frequency of the 2 pieces of data in the interval "51-60" is 2/17 or 0.12 (to the nearest hundredth) or 12%. Notice that when you add all of the relative frequencies in the chart below, you get a value of one (or 100%), apart from small differences caused by rounding the decimals. Relative frequency may be expressed as a fraction (ratio), a decimal, or a percentage.

Data set: {9, 25, 30, 31, 34, 36, 37, 42, 45, 47, 49, 43, 55, 58, 61, 63, 67}

Relative Frequency Table

| Interval | Count (frequency) | Relative frequency |
|---|---|---|
| 0-10 | 1 | 1/17 ≈ 0.06 |
| 11-20 | 0 | 0/17 = 0 |
| 21-30 | 2 | 2/17 ≈ 0.12 |
| 31-40 | 4 | 4/17 ≈ 0.24 |
| 41-50 | 5 | 5/17 ≈ 0.29 |
| 51-60 | 2 | 2/17 ≈ 0.12 |
| 61-70 | 3 | 3/17 ≈ 0.18 |
| Total | 17 | 17/17 = 1 |

Relative Frequency Histogram

The intervals appear on the horizontal axis. The relative frequency (as a percentage in decimal form) appears on the vertical axis.
The graph is the same shape as the "frequency" histogram, but with a different y-scale.
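Relative frequencies can be sketched in Python using `fractions.Fraction`, which keeps the values exact so the total comes out to exactly 1. As an assumption for this example, the counts are computed from the data set with both interval endpoints included.

```python
from fractions import Fraction

data = [9, 25, 30, 31, 34, 36, 37, 42, 45, 47, 49, 43, 55, 58, 61, 63, 67]
intervals = [(0, 10), (11, 20), (21, 30), (31, 40),
             (41, 50), (51, 60), (61, 70)]

counts = [sum(1 for x in data if low <= x <= high)
          for low, high in intervals]
total = len(data)

# Exact relative frequencies: each count divided by the total
relative = [Fraction(count, total) for count in counts]

# Decimal form, rounded to the nearest hundredth
rounded = [round(float(r), 2) for r in relative]

print(rounded)        # [0.06, 0.0, 0.12, 0.24, 0.29, 0.12, 0.18]
print(sum(relative))  # 1
```

The exact fractions add up to 1, while the rounded decimals may be off by a hundredth.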

 Scaling a Histogram: When graphing histograms, if the first x-scale interval does not start at zero, a "break" symbol may be drawn to show a gap in the scale. Or a histogram may display a separated x-scale to avoid confusion with the vertical axis.

Histograms: Pros and Cons

While a box plot emphasizes the center and spread of a distribution, a histogram emphasizes the distribution of values. Histograms clearly show the shape of the data. In addition, values occurring with high frequency are easier to identify in a histogram than in a box plot. With histograms, however, the center and spread are harder to identify, exact values cannot be read due to the grouping of the data, and it is difficult to compare two data sets.

Working with Histograms on Calculator:

Working with Frequency Tables on Calculator:

Working with Cumulative Frequency Histograms: