Data visualization is an essential aspect of AI. Data visualizations let you derive insights from data and enable you to communicate about the data with others. There are many software packages that we can use to visualize data and build meaningful dashboards. In Python, we have libraries such as Matplotlib, Seaborn, and Altair; in R, we have ggplot2 and lattice.
Interactivity in data visualizations takes them to the next level. When a user can interact with the charts with different actions like zoom, hover, and filter plot with a chosen variable, it adds great flexibility to the chart. Such interactivity allows the users to dig deeper into the visualizations to find additional information in the dataset. Providing interactive visualizations in dashboards using different tools is a common practice today.
Both Python and R have packages to build interactive plots. One of the popular Python packages for interactive visualization is Bokeh. The Bokeh library allows us to create different plots with very few lines of code, with added flexibility for advanced customization. Bokeh now has an R interface along with the existing interfaces for Python, Scala, and Julia.
What is rbokeh?
rbokeh is an open-source R package that makes use of the Bokeh visualization tool. It provides a flexible, declarative interface for dynamic web-based visualizations. Ryan Hafen created and currently maintains the rbokeh package. You can find more details about the rbokeh package here. Before installing rbokeh, you must have R and RStudio installed. You can also build rbokeh visualizations with Kaggle or Google Colab.
To begin, we’ll use the R function ‘install.packages()’ to get the rbokeh package from CRAN.
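A minimal sketch of the installation step, assuming rbokeh is available from your configured CRAN mirror. Package availability can change over time, so the GitHub fallback mentioned in the comment is an assumption and not part of the original tutorial:

```r
# Install rbokeh only if it is not already present in the library
if (!requireNamespace("rbokeh", quietly = TRUE)) {
  install.packages("rbokeh")
  # Assumption: if CRAN does not offer the package for your R version,
  # the development version may be installable with
  # remotes::install_github("bokeh/rbokeh")
}
```

Wrapping the call in requireNamespace() avoids reinstalling the package every time the script is run.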
We will use the following command to import the rbokeh package:
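The import step can be sketched as a single call, assuming rbokeh has already been installed:

```r
# Attach rbokeh so that figure(), the ly_*() layer functions,
# and the %>% pipe are available
library(rbokeh)
```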
Along with the rbokeh package, we will also import a few other R libraries.
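The exact list of additional libraries is not shown here. Judging from the code used later in this tutorial (the Cars93 dataset and the filter() function), a plausible set is MASS and dplyr; treat this as an inference, not as the author's exact setup:

```r
# MASS is attached first so that dplyr::select() is not masked by MASS::select()
library(MASS)   # provides the Cars93 dataset
library(dplyr)  # provides filter() and the %>% pipe
```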
Before moving on to graph visualizations, it’s important to note that rbokeh plots are generated by calling the ‘figure()’ function. This function is equivalent to a blank canvas: you set it up first and then add layers to it with the pipe operator. Each layer is added with a function of the form ‘ly_geom()’, where x and y are the data inputs and the geom part names the type of glyph drawn, such as ly_points, ly_lines, ly_hist, ly_boxplot, etc.
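The canvas-plus-layers pattern can be sketched as follows. The example uses the ‘cars’ data frame that ships with base R (columns speed and dist) purely for illustration; it is not the dataset used in the rest of this tutorial:

```r
library(rbokeh)

# Start from an empty canvas, then pipe it into a layer function.
# ly_points() draws one marker per row of 'cars'.
p <- figure(title = "Speed vs. stopping distance") %>%
  ly_points(x = speed, y = dist, data = cars)

# Typing the object name renders the interactive plot
p
```

Further layers, such as ly_lines(), can be appended to the same chain with another %>%.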
For this tutorial, we will load the built-in Cars93 dataset from the MASS package of R.
Cars93 is a dataset of 93 rows and 27 columns. Additional details about the package and documentation can be found here. The cars in the dataset were chosen at random from a list of 1993 passenger car models published in both Consumer Reports and the PACE Buying Guide.
Using the following command, we can print the first few rows of the ‘Cars93’ dataset:
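A minimal version of this step, assuming the MASS package is installed, uses base R's head() function. The dim() call is an extra check that the dataset has the 93 rows and 27 columns described above:

```r
library(MASS)

# Load the dataset explicitly into the workspace
data(Cars93)

# Confirm the dimensions: 93 rows and 27 columns
dim(Cars93)

# Print the first six rows
head(Cars93)
```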
We will first build a simple scatter plot to see the relation between horsepower and price for different cars. A scatter plot visualizes the relationship between two variables while showing every data point. First, we’ll call the ‘figure()’ function. Then we’ll add a layer with ‘ly_points’ and pass the following arguments: horsepower on the x-axis, price on the y-axis, and the dataset, i.e., Cars93, to the data argument. Note that we can assign the resulting figure to ‘scatter_plot’ and then display it by writing ‘scatter_plot’. We need to use the hover argument to add tooltips, which appear when the pointer is over a point. Panning and zooming are also accessible as interactive elements.
#Simple Scatter Plot
scatter_plot <- figure(title = "Scatter Plot") %>%
  ly_points(x = Horsepower, y = Price, data = Cars93, hover = c(Horsepower, Price))
To build a scatter plot that also distinguishes the origin of each car, in addition to the relation between horsepower and price, we will group the points by color, as shown below.
#Scatter Plot with grouping
scatter_plot1 <- figure(title = "Scatter Plot") %>%
  ly_points(x = Horsepower, y = Price, color = Origin, data = Cars93, hover = c(Origin, Horsepower, Price))
Next, we will plot a basic line chart using the Cars93 dataset. Suppose we want to see the variation in the prices of small and compact-sized cars. To do that, we will first filter the data using the Type variable.
#Filtering
df <- Cars93 %>%
  filter(Type %in% c("Small", "Compact"))
Then for the line plot, we will use the price variable and specify it with the data argument in the ‘ly_lines’ layer, as shown below:
#Line Plot
line_plot <- figure(title = "Line Plot") %>%
  ly_lines(Price, color = Type, data = df)
We can change the thickness as well as the color of the lines. By default, the colors in all of these plots come from the default theme. To get the colors we want, we can call the set_palette() function with either a color vector or a pre-defined palette. Here we use the discrete color property because color is mapped to a categorical variable, and we set the width parameter to get thicker lines, as shown in the code below. Colors can be given as hex codes or as any of the named CSS colors.
#Changing the color and width of lines
line_plot2 <- figure(title = "Line Plot") %>%
  ly_lines(Price, color = Type, width = 2, data = df) %>%
  set_palette(discrete_color = pal_color(c("#6200b3", "#ff0831", "#226f54")))
Now we’ll explore how to visualize a multi-layer plot by combining point and line layers in one figure. For the multi-layer plot, we will plot the lines using ly_lines() and add markers using ly_points(). Note that in both layers, ‘Type’ is mapped to color. It is possible to use different glyphs for the markers. We can use the following command to explore the available values for the glyphs.
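The command is not shown here; in rbokeh, the function that serves this purpose is point_types(), so a minimal sketch (assuming rbokeh is installed) would be:

```r
library(rbokeh)

# Draw a reference chart of the marker glyphs that ly_points() accepts,
# including the numbered glyphs
point_types()
```

The numbers shown in the chart can be passed to the glyph argument of ly_points().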
When a numbered glyph is specified in the code, the fill and line properties are managed effectively to get the desired result.
#Multi-layer plot with glyph
multi_layer <- figure(df, legend_location = "top_right", title = "Multi-layer Plot") %>%
  ly_points(Price, color = Type, hover = c(Price, Type), glyph = 16) %>%
  ly_lines(Price, color = Type, width = 1.5, data = df) %>%
  set_palette(discrete_color = pal_color(c("#6200b3", "#ff0831", "#226f54")))
multi_layer
Next, we will plot a histogram to see the frequency distribution of RPM in the Cars93 dataset.
#Simple Histogram
histogram <- figure(width = 700, height = 400, title = "Histogram") %>%
  ly_hist(RPM, data = Cars93, breaks = 5, freq = TRUE, color = "green")
Now we’ll create a box plot with the following code to show the relationship between horsepower and cylinders. We can also spot a few outliers in the dataset with the Box plot.
#Box-Plot
box_plot <- figure(width = 600, title = "Box Plot") %>%
  ly_boxplot(Cylinders, Horsepower, data = Cars93)
In our next plot, a bar chart, we will plot the number of cars of each type, i.e., categorical data.
#Bar Chart
bar_chart <- figure(title = "Bar Chart") %>%
  ly_bar(Type, data = Cars93) %>%
  theme_axis("x", major_label_orientation = 90)
We can use a grid plot to display different figures in one layout. We can combine figures from different categories, such as bar charts, line plots, scatter plots, etc. in a grid layout.
#Grid Plot
tools <- c("wheel_zoom", "box_zoom", "box_select")
p1 <- figure(tools = tools, width = 500, height = 500) %>%
  ly_points(Horsepower, RPM, data = df, color = Type, hover = c(Horsepower, RPM, Type))
p2 <- figure(tools = tools, width = 500, height = 500) %>%
  ly_points(Length, Wheelbase, data = df, color = Type, hover = c(Length, Wheelbase, Type))
grid_plot(list(p1, p2), link_data = TRUE)
Finally, we plot a Hexbin chart, which is a 2D density chart to visualize the relationship between two numeric variables, i.e., Price and Engine Size. A Hexbin plot is often used for visualization when the data includes a large number of points. The plot window is split with several hex bins to avoid overlapping.
hexbin <- figure() %>% ly_hexbin(x = EngineSize, y = Price, data = Cars93)
And that’s it! We created and customized various charts in rbokeh from scratch.
In this short tutorial, we explored how to plot a few interactive and pleasant charts in rbokeh. The entire code for this rbokeh tutorial is available on my GitHub repository. If you enjoy working with ggplot, then you might have previously used plotly or ggiraph to make your ggplot charts interactive. The rbokeh package can be a good alternative for your next visualization project.
Robin H. Lock (1993) 1993 New Car Data, Journal of Statistics Education, 1:1, DOI: 10.1080/10691898.1993.11910459