There are a number of visualization packages in Python, the best known of which are Matplotlib (and seaborn), Plotly, and Hvplot. Each of these three packages has its advantages, but learning to use them comes at a cost, sometimes a steep one.
The idea for this article came to me when I found a mind map of Pandas methods in the Daily Dose of Data Science newsletter (which I highly recommend). Around the same time, I discovered the Hvplot visualization package. I thought it was great that Hvplot lets you easily switch from one visualization backend to another (here’s an example of switching from Hvplot to Plotly to Hvplot). When I saw that the same could be done with Pandas, I couldn’t help but share the idea because it was so interesting.
Pandas is the heart of data science in Python, and we all know how to use it. However, Matplotlib, which is integrated into Pandas, is outdated and is being overshadowed by other packages in terms of ease of use and expressiveness. By leveraging the power of the Pandas visualization backend, you can take advantage of modern visualization packages for data exploration and results rendering without investing time in learning these packages. These packages are very powerful nonetheless!
Pandas is built on NumPy and has relied on Matplotlib for plotting from the start (Matplotlib is an optional dependency that Pandas calls whenever you plot). This explains why the graphs it generates by default are Matplotlib graphs.
Pandas has evolved steadily since its release, and it now lets users change the visualization backend it uses.
Here are the backends I found during my research (six alternatives, plus the default):
- Plotnine (ggplot2)
- Plotly
- Altair
- Holoviews
- Hvplot
- pandas_bokeh
- Matplotlib (default backend)
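To make the idea of a backend more concrete, here is a minimal sketch of what Pandas seems to expect from one. As far as I understand the mechanism, Pandas accepts any importable module that exposes a top-level plot function, and forwards the DataFrame and the plotting arguments to it. The toy_backend module and the toy_plot function below are hypothetical names invented for this illustration; they draw nothing and only report what Pandas passed along.

```python
import sys
import types

import pandas as pd


def toy_plot(data, kind="line", x=None, y=None, **kwargs):
    """Hypothetical backend function: describe the request instead of drawing."""
    return {"kind": kind, "x": x, "y": y, "rows": len(data)}


# Build a throwaway module and make it importable under the name "toy_backend"
toy_backend = types.ModuleType("toy_backend")
toy_backend.plot = toy_plot
sys.modules["toy_backend"] = toy_backend

df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]})

# Pandas hands the call over to toy_backend.plot for this plot only
description = df.plot(backend="toy_backend", kind="scatter", x="x", y="y")
print(description)
```

Real backends such as Plotly or Hvplot follow the same principle, except that their plot function returns an actual figure object.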
There are several methods you can use to change the backend.
# Globally, for the whole session
pd.set_option("plotting.backend", '<name of backend>')
# OR
pd.options.plotting.backend = '<name of backend>'
# OR locally, for a single plot
df.plot(backend='<name of backend>', x='...')
Note: Changing backends requires Pandas >= 0.25, and some backends need additional dependencies to be installed and imported, as in the Hvplot example below.
Here are two examples:
import pandas as pd # Basic packages
pd.options.plotting.backend = "plotly"
df = pd.DataFrame(dict(a=[1,3,2], b=[3,2,1]))
fig = df.plot()
fig.show()
import numpy as np
import pandas as pd # Basic packages
import hvplot
import hvplot.pandas # ! Specific dependency to install
pd.options.plotting.backend = 'hvplot' # Backend modification
data = np.random.normal(size=[50, 2])
df = pd.DataFrame(data, columns=['x', 'y'])
df.plot(kind='scatter', x='x', y='y') # Plotting
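Because each backend depends on its own package being installed, a notebook shared with colleagues can fail on the very first plot. Below is a small defensive sketch, assuming that Pandas raises a ValueError (or an ImportError) when a backend cannot be loaded. The helper name use_first_available_backend is my own and is not part of Pandas.

```python
import pandas as pd


def use_first_available_backend(candidates):
    """Activate the first backend in `candidates` that Pandas can load."""
    for name in candidates:
        try:
            pd.set_option("plotting.backend", name)
        except (ValueError, ImportError):
            # Backend not installed or not usable: try the next one
            continue
        return name
    raise RuntimeError("None of the requested plotting backends is available")


# Matplotlib is kept last as the fallback, since it is the Pandas default
active = use_first_available_backend(["plotly", "hvplot", "matplotlib"])
print(active)
```

The returned name tells you which backend is active, which is handy when the same notebook runs in environments with different packages installed.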
2.1. Matplotlib
Matplotlib is the default visualization backend for Pandas, meaning that if you don’t specify a backend, Matplotlib is used. It’s an efficient package for quickly visualizing data to explore or extract results, but it’s outdated and lags behind other packages in terms of ease of use and rendering performance.
The advantage of Matplotlib is that Pandas has relied on it from the start, so it is seamlessly integrated: plots created through Pandas are regular Matplotlib objects, which you can customize further with the Matplotlib API.
As a reminder, here are the 11 plot kinds available in Pandas with Matplotlib:
- “area” for area plots
- “bar” for vertical bar charts
- “barh” for horizontal bar charts
- “box” for box plots
- “hexbin” for hexbin plots
- “hist” for histograms
- “kde” for kernel density estimation charts
- “density”, an alias for “kde”
- “line” for line graphs
- “pie” for pie charts
- “scatter” for scatter plots
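Each of these kinds can be requested in two ways: with the kind argument, as in df.plot(kind="hist"), or with the method form, as in df.plot.hist(). The short sketch below checks, without drawing anything, that every kind in the list has a matching method on the Pandas plot accessor. The KINDS list is simply my transcription of the list above.

```python
import pandas as pd

# The 11 kinds listed above
KINDS = ["area", "bar", "barh", "box", "hexbin", "hist",
         "kde", "density", "line", "pie", "scatter"]

# For each kind, check that the .plot accessor exposes a method of the same name
has_method = {
    kind: callable(getattr(pd.plotting.PlotAccessor, kind, None))
    for kind in KINDS
}
print(has_method)
```

Other backends reuse these same kind names where they support them, which is what makes switching backends so cheap.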
2.2. Plotly
Plotly is a visualization package developed by a company of the same name. The company developed the Plotly.js framework, on which the Python package is built to produce interactive visualizations. Plotly also provides a Python dashboard package called Dash.
To use Plotly in Pandas, simply import Plotly Express and change the backend:
import pandas as pd
import plotly.express as px # Import packages
df = pd.read_csv("iris.csv")
# Modifying locally Pandas backend
df.plot.scatter(backend = "plotly", x = "sepal.length", y = "sepal.width")
Pandas returns the same type of object as Plotly.
type(df.plot.scatter(backend = "plotly", x = "sepal.length", y = "sepal.width"))
# → <class 'plotly.graph_objs._figure.Figure'>
type(px.scatter(x=df["sepal.length"], y = df["sepal.width"]))
# → <class 'plotly.graph_objs._figure.Figure'>
The advantage is that you can integrate graphics created in Pandas right into the Plotly world, especially Dash!
One limitation is that the integration between Plotly and Pandas is not yet complete, as detailed on the Plotly website.
2.3. Hvplot
Hvplot is an interactive visualization package based on Bokeh.
It is an interesting package that I discovered a while ago, and it still appeals to me today because Hvplot works as a Pandas backend and can also be used to create dynamic client-side websites with the Holoviz suite and related packages like Panel.
Even without the concept of a Pandas backend, Hvplot doesn’t require much learning to get started: you simply replace .plot() with .hvplot() on a Pandas object:
import pandas as pd
import hvplot.pandas # Needed for the hvplot backend and the .hvplot accessor
df = pd.read_csv("iris.csv")
# Plot with Pandas
df.plot.scatter(backend = "hvplot", x = "sepal.length", y = "sepal.width")
# Same plot with hvplot
df.hvplot.scatter(x = "sepal.length", y = "sepal.width")
The way to use the Hvplot backend is the same as for the Plotly backend. Just install Hvplot and import hvplot.pandas.
import numpy as np
import pandas as pd # Basic packages
import hvplot
import hvplot.pandas # Specific dependency to install
pd.options.plotting.backend = 'hvplot' # Backend modification
data = np.random.normal(size=[50, 2])
df = pd.DataFrame(data, columns=['x', 'y'])
df.plot(kind='scatter', x='x', y='y') # Plotting
As with Plotly, charts generated in Pandas using the hvplot backend are the same type of object as charts created with Hvplot directly (HoloViews elements).
type(df.plot.scatter(backend = "hvplot", x = "sepal.length", y = "sepal.width"))
# → <class 'holoviews.element.chart.Scatter'>
type(df.hvplot.scatter(x = "sepal.length", y = "sepal.width"))
# → <class 'holoviews.element.chart.Scatter'>
Hvplot is part of the very powerful Holoviz suite, along with many other related tools that can push your data analysis very far, namely Panel, GeoViews, Datashader, and others. This compatibility allows you to create graphs in Pandas and still benefit from the Holoviz suite.
The Pandas backend is a very efficient solution to discover and leverage the latest Python visualization packages without investing time. You can locally convert standard matplotlib graphs into interactive Plotly graphs in 18 characters including spaces, and thus enjoy all the benefits of this type of visualization.
However, this solution has some limitations. It is not suitable for highly advanced visualization goals that require a lot of customization, such as advanced visualizations in data journalism, because the package integration in Pandas is not yet perfect. Also, this solution only covers visualization packages that provide a Pandas plotting backend, and excludes other visualization solutions such as D3.js.
Hvplot is currently my favorite visualization package. It is very easy to get started with, compatible with all the major data manipulation packages (Polars, Dask, Xarray, etc.), and covers a continuum of applications, from simple graphs to full dynamic client-side websites.
While I was researching, I didn’t find as much documentation as I had hoped. I expected a lot of articles because I thought the concept was great. So please let me know in the comments if you think this solution is really useful or if it’s just a really cool thing that doesn’t really work.
Thanks for reading!