Mini-guide on Visualizing Data

Why did the penguin cross the road twice?

To prove he wasn’t chicken.


Today I am going to show you some cool methods to visualize quantitative and qualitative data in unique ways. Without further ado, let's get into it!


For our example today, we are going to be using the Palmer Penguins dataset created by Dr. Kristen Gorman. So first, we must import the modules we are going to use and the dataset itself.

import pandas as pd
from plotly import express as px
from plotly.io import write_html

url = ""
penguins = pd.read_csv(url)


Now that we have the modules and data properly imported, let's take a look at our data! For clarity, we will be using the plotly module for all visualizations.

fig = px.histogram(data_frame = penguins, # data that needs to be plotted
                   x = "Sex", # must be a column name from data_frame
                   color = "Species", # separates individuals by species using color
                   width = 800,
                   height = 200)

# change margins


The histogram shows that the penguins data is not clean and contains invalid values. To properly visualize the valid data, let's clean up the dataframe.

penguins = penguins.dropna(subset = ["Body Mass (g)", "Sex"]) #gets rid of NA vals
penguins["Species"] = penguins["Species"].str.split().str.get(0) #shortens species names
penguins = penguins[penguins["Sex"] != "."] #takes out the pesky "."

cols = ["Species", "Island", "Sex", "Culmen Length (mm)", "Culmen Depth (mm)", "Flipper Length (mm)", "Body Mass (g)"] 
penguins = penguins[cols] #setting dataframe cols we want to look at
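
To see what each cleaning step does, here is a minimal sketch that runs the same three operations on a handful of made-up rows. The rows and the names raw and clean are hypothetical; they only mimic the kinds of problems described above (a missing body mass, a missing sex, and a "." placeholder).

```python
import pandas as pd

# Hypothetical rows that mimic the problems in the raw data.
raw = pd.DataFrame({
    "Species": [
        "Adelie Penguin (Pygoscelis adeliae)",
        "Gentoo penguin (Pygoscelis papua)",
        "Chinstrap penguin (Pygoscelis antarctica)",
        "Gentoo penguin (Pygoscelis papua)",
        "Chinstrap penguin (Pygoscelis antarctica)",
    ],
    "Sex": ["MALE", ".", None, "FEMALE", "FEMALE"],
    "Body Mass (g)": [3750.0, 4875.0, 3800.0, None, 3525.0],
})

# Step 1: drop rows where body mass or sex is missing.
# .copy() makes it explicit that we are working on a new dataframe.
clean = raw.dropna(subset=["Body Mass (g)", "Sex"]).copy()

# Step 2: keep only the first word of each species name.
clean["Species"] = clean["Species"].str.split().str.get(0)

# Step 3: remove rows where sex was recorded as ".".
clean = clean[clean["Sex"] != "."]

print(clean)
```

Only the first and last rows survive: the other three are removed for a "." sex, a missing sex, and a missing body mass, respectively.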

Time to Graph!


Let's first look at some qualitative data about our penguins using an alluvial diagram.

colors = {"Adelie"    : "#EDE65E",
          "Chinstrap" : "#42FF8A",
          "Gentoo"    : "#E8A1F8"}
# These are custom colors; you can change the hex codes as desired in your code

color_hex = penguins["Species"].map(colors)
# .map() creates a Series that mirrors the input, so color_hex lines up with
# penguins["Species"] but replaces each species name with its color code

fig = px.parallel_categories(penguins,
                             dimensions = ["Species", "Island", "Sex"], # order of columns on graph
                             color = color_hex, #usage of color_hex
                             height = 300)
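
If the .map() step feels abstract, the following small sketch shows it on a short made-up Series. The species and unknown Series are hypothetical examples; the color dictionary matches the one used in this guide.

```python
import pandas as pd

colors = {"Adelie": "#EDE65E", "Chinstrap": "#42FF8A", "Gentoo": "#E8A1F8"}

# Hypothetical species column with four penguins.
species = pd.Series(["Gentoo", "Adelie", "Adelie", "Chinstrap"])

# Each species name is looked up in the dictionary, keeping the row order.
color_hex = species.map(colors)
print(color_hex.tolist())

# Names that are not keys in the dictionary become NaN,
# which is worth checking for before plotting.
unknown = pd.Series(["Emperor"]).map(colors)
print(unknown.isna().tolist())
```

Because the result has one color per row, it can be passed directly as the color argument of px.parallel_categories.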


From the diagram, we can get some useful insights into how the categorical variables in the penguin sample relate to one another. The center bars of the diagram show the islands in the dataset, so following the ribbons from each species gives us a clear picture of which islands each species was recorded on.
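
One way to double-check what the ribbons suggest is to count the species and island combinations directly. The sketch below uses pd.crosstab on a few made-up rows; the toy dataframe is hypothetical, and with the real data you would pass penguins["Species"] and penguins["Island"] instead.

```python
import pandas as pd

# Hypothetical rows standing in for the cleaned penguins dataframe.
toy = pd.DataFrame({
    "Species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "Island": ["Torgersen", "Biscoe", "Dream", "Biscoe", "Biscoe", "Dream"],
})

# Rows are species, columns are islands, cells are counts of penguins.
counts = pd.crosstab(toy["Species"], toy["Island"])
print(counts)
```

A zero in the table corresponds to a species and island pair with no ribbon between them in the diagram.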


If we want to compare the quantitative data in our dataset, the plotly module offers a variety of diagrams and graphs. Let's compare the body mass, culmen length, and culmen depth of our penguins using a 3D scatter plot.

colorz = ["#EDE65E","#42FF8A","#E8A1F8"]

fig = px.scatter_3d(penguins,
                    x = "Body Mass (g)",
                    y = "Culmen Length (mm)",
                    z = "Culmen Depth (mm)",
                    color = "Species", # needed to separate the species in the graph
                    color_discrete_sequence = colorz, # required to get custom colors
                    opacity = 0.5) # changes transparency of datapoints



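Finally, let's take a look at a grouped bar graph comparing the body mass of male vs. female penguins by species. Note that when px.histogram is given a y column, it sums that column within each bar by default.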

fig = px.histogram(penguins,
                   x = "Sex",
                   y = "Body Mass (g)",
                   color = "Species",
                   color_discrete_sequence = colorz,
                   barmode = 'group', # bars for each species sit side by side
                   opacity = 0.75)

Hopefully these examples give you some insight into how you can create informative visualizations from data!

Written on April 7, 2022