Written by
Reviewed by

If you've been curious about the world of data analysis, today is your lucky day! We're granting you exclusive access to a portion of our best-selling Data Analysis Level 1 course.

Normally reserved for registered students, we're leaking this module to give you a taste of the powerful skills you can gain. Get ready to have a first-hand look at the core principles of descriptive statistics and the magic of pivot tables.

First, some theory

⌨️ The goal of this module is to learn about descriptive statistics and using pivot tables to answer the questions we set out when framing the problem.

Exploratory data analysis

Before you begin conducting in-depth analysis, it's essential to first become familiar with your dataset—to get a "feel" for it, if you will. This is where exploratory data analysis comes in.

You can think of it as looking at a large puzzle and trying to figure out what it represents.

The goal is to get a better understanding of the data and what it tells you, so you can make informed decisions and ask more focused questions.

A key part of the exploratory analysis is using Descriptive Statistics.

Descriptive statistics

Descriptive statistics are a set of techniques used to summarize and describe the main features of a dataset.

Imagine you have a dataset with hundreds or even thousands of values. It would be impossible to make sense of it just by looking at the raw data.

For example, if you had customer feedback scores from 500 customers, you wouldn't be able to determine the overall rating just by looking at a spreadsheet with all the raw data. However, you could calculate the average (or mean) score to get an idea of your company’s performance. This is an example of descriptive statistics.

It provides a quick and simple way to get a general idea of what the data looks like and also helps with spotting errors.

Let’s look at another example, when calculating the minimum and maximum values for a certain variable, you might notice one of the values falls outside a reasonable range. For instance, if you've collected product prices for a restaurant menu and calculated a minimum value of -$20, this is not realistic. Investigating further can help you identify and fix the dataset.

The three main types of descriptive statistics are:

  • Frequency Distribution: A table or graph that illustrates the number of occurrences of each value or group of values in a dataset.
  • Measures of Central Tendency: Estimates of the middle or average values in the dataset. These include the mean, median, and mode.
  • Measures of Variability: Describes how spread out the data is. Range, standard deviation, and variance are all measures of variability.

Pivot Tables

Pivot tables are a data tool to summarize and organize large amounts of data in a meaningful way. It allows you to quickly look up the information you need.

They are especially useful when you have a lot of data, because they allow you to quickly see the patterns and relationships in the data that might not be immediately obvious.

💡 Pivot Tables Example

Let’s say you have a worksheet containing monthly sales data for three products. To find out which has the highest revenue. You can manually add the corresponding sales figure to a running total every time the product appears. However, if the worksheet has thousands of rows, it could literally take a lifetime for three products. Pivot tables can quickly aggregate sales figures and calculate their sums in less than a minute.
Image source

This was a brief explanation of statistics and pivot tables. Now let’s actually practice with our data set in the next mission and understand it better.

Calculating Descriptive Statistics

💼 Now that you're familiar with the theory behind descriptive statistics and pivot tables, let's explore how they can be used to answer some of the key research questions we posed at the start of the course.

  1. What are the total numbers of subscribers in each subject?
  2. How does the average content duration/price/number of students vary across different subjects?
  3. How many courses are free and paid for each subject?
  4. What is the average price of web development courses at different levels?
  5. What are the 20 most popular courses? Also, include the following information:
  • Their level
  • Whether they are free or paid
  • Whether any are free beginner courses
  • Duration of the course

Our analysis will consist of three parts. In this mission we will:

  • Calculate some of the descriptive statistics such as the mean, median, minimum, and maximum values for content duration/price of the course/and number of subscribers.

Let’s get started!

Q: Calculate descriptive statistics for “Content Duration”, “Price”, and “Subscribers”

For these variables, we’ll calculate the mean and median values (measures of central tendency), as well as the minimum and maximum values.

We’ll be using the following formulas:

=AVERAGE

=MEDIAN

=MIN

=MAX

📹 Video: Descriptive Statistics

Reviewing our descriptive statistics

As we mentioned earlier, descriptive statistics can help you to spot if something’s “off” with your dataset, or with a particular value.

In the video, we noticed that the minimum value was zero for the content duration and subscribers and we deleted those entries as they were outliers and can give us misleading results. Generally, an analyst would decide what values to keep and remove. For example, it could be that they will perform analysis on all courses with more than 10 subscribers. In that case, we would need to delete the lower value.

Now, looking at our calculations, try answering the following questions:

  1. What are the total numbers of subscribers in each subject?
  2. How does the average content duration/price/number of students vary across different subjects?
  3. How many courses are free and paid for each subject?
  4. What is the average cost of web development courses at different levels?

We can’t, right? Our insights have given us a good starting point, but they are not enough to answer the questions we asked at the beginning of the course.

We have established the basics; now let's use another powerful tool to answer those key questions: the pivot table.

Conclusion

This module you've explored is just the tip of the iceberg. There's a whole world of data insights waiting for you in our full course, with in-depth tutorials, practical exercises, and real-world applications.

If you're ready to unlock the full potential of your data analysis capabilities, the next step is clear.

Enroll now and start a transformational journey to become a data analysis expert: EntryLevel: Data Analysis using Excel and Tableau

Date originally published:
Date last updated:
Enjoyed this article? Share with a friend!

Course Preview: Data Analysis using Excel and Tableau

Related articles
What is EntryLevel?
EntryLevel helps you learn and get experience so you can get hired in tech. Beginner-friendly 6 week programs guide you to create a portfolio you can show off to employers.
Browse tech programs