SciTech-Mathematics-Probability+Statistics- Pandas DataFrame Histogram/BarChart/Boxplot/Scatterplot + Relative Frequency Histogram: Definition + Example

Histogram from Pandas DataFrame

By Zach Bobbitt, posted on August 5, 2021
You can use the following basic syntax to create a histogram from a pandas DataFrame:

df.hist(column='col_name')

The following examples show how to use this syntax in practice.

import pandas as pd

#create DataFrame
df1 = pd.DataFrame({
    'points': [25, 12, 15, 14, 19, 23, 25, 29, 29, 31, 31, 33],
    'assists': [5, 7, 7, 9, 12, 9, 9, 4, 7, 7, 8, 9],
    'rebounds': [11, 8, 10, 6, 6, 5, 9, 12, 10, 7, 7, 9]})

#view first five rows of DataFrame
df1.head()
	points	assists	rebounds
0	25	5	11
1	12	7	8
2	15	7	10
3	14	9	6
4	19	12	6

#create histogram for 'points' column
df1.hist(column='points')

#customize the histogram: number of bins, no grid lines, relative bar width, and color
df1.hist(column='points', bins=5, grid=False, rwidth=.9, color='purple')

#create DataFrame
df2 = pd.DataFrame({
    'team': ['A', 'A', 'A', 'A', 'A', 'A',
             'B', 'B', 'B', 'B', 'B', 'B'],
    'points': [25, 12, 15, 14, 19, 23, 25, 29, 29, 31, 31, 33]})

#create histogram for each team
df2.hist(column='points', by='team', bins=3, grid=False, rwidth=.9,
         color='purple', sharex=True)

[Figures: default histogram of points; customized histogram of points with bins=5; histograms of points by team with bins=3]

In each histogram, the x-axis shows the points scored, divided into bins, and the y-axis shows the number of players whose points fall in each bin.
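To see the numbers behind the bars, the bin counts can be computed directly. This is a minimal sketch, assuming the histogram uses equal-width bins over the range of the data in the same way as numpy.histogram; the variable names counts and edges are illustrative and not part of the original article.

```python
import numpy as np
import pandas as pd

# Same 'points' data as df1 above
df1 = pd.DataFrame({'points': [25, 12, 15, 14, 19, 23, 25, 29, 29, 31, 31, 33]})

# Split the range of 'points' into 5 equal-width bins and count the players in each
counts, edges = np.histogram(df1['points'], bins=5)

# Print one line per bin: its left edge, right edge, and number of players
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {n} players")
```

Each printed count corresponds to the height of one bar in the histogram with bins=5.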

Note that in the last example, the sharex argument specifies that the two team histograms should share the same x-axis.
This makes it easier to compare the distribution of values between the two histograms.
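One way to confirm the effect of sharex is to inspect the axes that hist returns. The sketch below assumes matplotlib is installed and selects the non-interactive "Agg" backend so that it runs without a display; the names axes and xlims are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is opened
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'team': ['A'] * 6 + ['B'] * 6,
    'points': [25, 12, 15, 14, 19, 23, 25, 29, 29, 31, 31, 33]})

# With by=..., hist returns one matplotlib Axes per group
axes = np.ravel(df2.hist(column='points', by='team', bins=3, sharex=True))

# Collect the x-axis limits of each panel; with a shared axis they should match
xlims = [ax.get_xlim() for ax in axes]
print(xlims)
```

Because team A and team B score in different ranges, matching x-axis limits put both distributions on the same scale.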

Additional Resources
The following tutorials explain how to create other common plots in Python:

How to Create Boxplot from Pandas DataFrame
How to Plot Multiple Pandas Columns on Bar Chart

Boxplot from Pandas DataFrame

By Zach Bobbitt, posted on July 20, 2021
You can use the following syntax to create boxplots from a pandas DataFrame:

#create boxplot of one column
df.boxplot(column=['col1'])

#create boxplot of multiple columns
df.boxplot(column=['col1', 'col2'])

#create boxplot grouped by one column
df.boxplot(column=['col1'], by='col2') 

The examples below use this syntax with the following DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'conference': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'points': [5, 7, 7, 9, 12, 9],
                   'assists': [11, 8, 10, 6, 6, 5],
                   'rebounds': [4, 2, 5, 8, 6, 11],})

#view DataFrame
df
# Example 1: Boxplot of One Column
df.boxplot(column=['points'], grid=False, color='black')

#Example 2: Boxplot of Multiple Columns
df.boxplot(column=['points', 'assists'], grid=False, color='black')

#Example 3: Boxplot Grouped by One Column
df.boxplot(column=['points'], by='conference', grid=False, color='black')
[Figures: Example 1, boxplot of points; Example 2, boxplots of points and assists; Example 3, boxplots of points grouped by conference]
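The boxplots above summarize each column by its quartiles. As a rough sketch of what a box represents, the quartiles can be computed with pandas. This assumes linear interpolation between data points (the pandas default) and the conventional 1.5 * IQR rule for flagging outliers; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'conference': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'points': [5, 7, 7, 9, 12, 9]})

# Quartiles of 'points': the bottom, middle line, and top of the box
q1, median, q3 = df['points'].quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Conventional 1.5 * IQR fences; values outside them would be flagged as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = df.loc[(df['points'] < lower_fence) | (df['points'] > upper_fence), 'points']

# Median per conference: the middle line of each box in the grouped plot
group_medians = df.groupby('conference')['points'].median()

print(q1, median, q3)
print(group_medians)
```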

BarChart from Pandas DataFrame

Multiple Columns
Pandas: How to Plot Multiple Columns on Bar Chart
By Zach Bobbitt, posted on April 8, 2021
You can use the following syntax to plot multiple columns of a pandas DataFrame on a single bar chart:

df[['x', 'var1', 'var2', 'var3']].plot(x='x', kind='bar')

The x column will be used as the x-axis variable and var1, var2, and var3 will be used as the y-axis variables.

The following examples show how to use this syntax in practice.

Plot Columns on a Bar Chart

The following code shows how to plot three columns on a bar chart, specifying that the column named period should be used as the x-axis variable:

import pandas as pd
import matplotlib.pyplot as plt

#create fake data
df = pd.DataFrame({'period': [1, 2, 3, 4, 5, 6, 7, 8],
                   'A': [9, 12, 15, 14, 19, 23, 25, 29],
                   'B': [5, 7, 7, 9, 12, 9, 9, 14],
                   'C': [5, 4, 7, 13, 15, 15, 18, 31]})

#plot columns on bar chart
df[['period', 'A', 'B', 'C']].plot(x='period', kind='bar')

# We could also choose to plot only certain columns, such as A and B:
df[['period', 'A', 'B']].plot(x='period', kind='bar')

#create fake data
df = pd.DataFrame({'period': [1, 2, 3, 4, 5, 6, 7, 8],
                   'A': [9, 12, 15, 14, 19, 23, 25, 29],
                   'B': [5, 7, 7, 9, 12, 9, 9, 14],
                   'C': [5, 4, 7, 13, 15, 15, 18, 31]})

#create stacked bar chart
df[['period', 'A', 'B', 'C']].plot(x='period', kind='bar', stacked=True)

# To change the colors of the bars, simply use the color argument as follows:
df[['period', 'A', 'B', 'C']].plot(x='period', kind='bar', stacked=True,
                                   color=['red', 'pink', 'gold'])
[Figures: Example 1, grouped bar chart of A, B, and C by period; Example 2, stacked bar chart of A, B, and C by period]
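In the stacked bar chart, the segments for A, B, and C sit on top of each other, so the total height of each bar is the row sum for that period. A minimal sketch of that calculation, where the name totals is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'period': [1, 2, 3, 4, 5, 6, 7, 8],
                   'A': [9, 12, 15, 14, 19, 23, 25, 29],
                   'B': [5, 7, 7, 9, 12, 9, 9, 14],
                   'C': [5, 4, 7, 13, 15, 15, 18, 31]})

# Total height of each stacked bar: A + B + C for that period
totals = df.set_index('period')[['A', 'B', 'C']].sum(axis=1)
print(totals)
```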

Relative Frequency Histogram: Definition + Example

By Zach Bobbitt, posted on February 19, 2020

Often in statistics you will encounter tables that display information about frequencies.
Frequencies simply tell us how many times a certain event has occurred.

For example, the following table shows how many items a particular shop sold in a week based on the price of the item:

Item Price (USD)    Frequency
1 – 10              20
11 – 20             21
21 – 30             13
31 – 40             8
41 – 50             4

This type of table is known as a frequency table.
In one column we have the "class" and in the other column we have the frequency of the class.

Often we use frequency histograms to visualize the values in a frequency table,
since it's typically easier to gain an understanding of data when we can visualize the numbers.

A histogram lists the classes along the x-axis of a graph,
and uses bars to represent the frequency of each class along the y-axis.
The following frequency histogram provides a visual representation of the frequency table above:

Frequency histogram example

A close cousin of a frequency table is a relative frequency table, which lists the frequency of each class as a proportion of the total.

The following table shows the relative frequencies of the same dataset we saw earlier:

Item Price (USD)    Frequency    Relative Frequency
1 – 10              20           0.303
11 – 20             21           0.318
21 – 30             13           0.197
31 – 40             8            0.121
41 – 50             4            0.061

In total, there were 66 items sold. Thus, we found the relative frequency of each class by taking the frequency of each class and dividing by the total items sold.

For example, there were 20 items sold in the price range of $1 – $10. Thus, the relative frequency of the class $1 – $10 is 20 / 66 = 0.303.

Next, there were 21 items sold in the price range of $11 – $20. Thus, the relative frequency of the class $11 – $20 is 21 / 66 = 0.318.

We perform the same calculation for each class to get the relative frequencies.
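The calculation described above can be reproduced in pandas. This is a small sketch using the frequencies from the table; the column names price_range, frequency, and relative_frequency are chosen here for illustration.

```python
import pandas as pd

# Frequency table from the example above
freq_table = pd.DataFrame({
    'price_range': ['1-10', '11-20', '21-30', '31-40', '41-50'],
    'frequency': [20, 21, 13, 8, 4]})

# Total number of items sold
total = freq_table['frequency'].sum()

# Relative frequency = class frequency / total, rounded to 3 decimals
freq_table['relative_frequency'] = (freq_table['frequency'] / total).round(3)
print(freq_table)
```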

Once we have the relative frequency of each class, we can then create a relative frequency histogram to visualize these relative frequencies.

Similar to a frequency histogram, this type of histogram displays the classes along the x-axis of the graph and uses bars to represent the relative frequencies of each class along the y-axis.

The only difference is the labels used on the y-axis. Instead of displaying raw frequencies, a relative frequency histogram displays proportions (or, equivalently, percentages) of the total.

Example of a relative frequency histogram
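When working from raw data rather than a frequency table, one common way to get relative frequencies is to give every observation a weight of 1/n. The sketch below is not from the original article; it reuses the points data from the pandas histogram example and relies on the weights argument of numpy.histogram.

```python
import numpy as np

points = np.array([25, 12, 15, 14, 19, 23, 25, 29, 29, 31, 31, 33])

# Raw frequencies per bin
counts, edges = np.histogram(points, bins=5)

# Weight each observation by 1/n so that the bin totals become proportions
weights = np.full(len(points), 1 / len(points))
rel_freq, _ = np.histogram(points, bins=5, weights=weights)

print(counts)
print(rel_freq)
```

The two arrays differ only by the constant factor 1/n, which is why the two kinds of histogram have the same shape.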

When to Use a Relative Frequency Histogram

A frequency histogram is useful when you're interested in raw counts.
For example, a shop might have a goal to sell at least 10 items each week in the $41 – $50 range.

By creating a frequency histogram of their data, they can easily see that they're not meeting this goal, since only 4 items were sold in this price range:

Frequency histogram example

Conversely, a relative frequency histogram is useful when you're interested in percentage values.
For example, a shop might have a goal of selling 5% of their total items in the $41 – $50 price range.

By creating a relative frequency histogram of their data, they can see that they are meeting this goal, since about 6.1% of the items sold fall in this range:

Example of a relative frequency histogram
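Both goals can also be checked numerically from the frequency table. A minimal sketch, where the names meets_count_goal and meets_share_goal are illustrative:

```python
import pandas as pd

# Items sold per price range, from the frequency table
freq = pd.Series([20, 21, 13, 8, 4],
                 index=['1-10', '11-20', '21-30', '31-40', '41-50'])
rel_freq = freq / freq.sum()

# Goal 1: at least 10 items sold in the 41-50 range (raw count)
meets_count_goal = freq['41-50'] >= 10

# Goal 2: 5% of all items sold in the 41-50 range (relative frequency)
meets_share_goal = rel_freq['41-50'] >= 0.05

print(meets_count_goal, meets_share_goal)
```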

Note that a frequency histogram and a relative frequency histogram of the same data have exactly the same shape. The only difference is the scale of the y-axis.

Additional Resources
The following tutorials explain how to create relative frequency histograms in different statistical software:

How to Create a Relative Frequency Histogram in Python

posted @ 2024-08-12 13:30  abaelhe