Superset by Airbnb: pros and cons of the open-source data visualization tool


In economies where big data is becoming increasingly important, companies use BI data visualization and reporting tools to analyze their data and present it in a clear, comprehensible form.

As a data scientist, I constantly work with businesses searching for meaningful insights. Today, both large corporations and startups are willing to invest in BI data visualization tools so that they can understand and interpret their data without taking on the technical work of building visualizations themselves.

In this article, I look at Superset, an open-source data visualization tool created at Airbnb. We used Superset in one of our recent projects, and it worked well in most circumstances. I’ll explain why we chose Superset over alternative BI tools and describe the platform’s main advantages and limitations.

The selection procedure

We used Superset in a project for a fitness mobile app with a large and rapidly growing user base. On the one hand, business stakeholders needed a BI tool because they required numerous specialized reports to track trends in application usage and better understand customer behavior. On the other hand, our data science team could use a BI tool for exploratory data analysis of different user cohorts before building machine learning models.

We needed a tool that met the following criteria:

  • Interactivity. The BI users in the marketing department asked for interactive filters on a variety of fields, including text, date, and integer fields.
  • No coding. Because our users were mostly marketing specialists, all of the functionality had to be available through buttons and other controls.
  • Free of charge. We needed a free, open-source tool.

After exploring the available solutions, we shortlisted Superset and Pentaho for further evaluation.

We found Superset more appealing for the following reasons:

  • It looked better. Both the customer and our team liked the graphics immediately.
  • Superset is written in Python, whereas Pentaho is written in Java. Because we were mostly a Python team, this was a big plus for us. Looking ahead, I should say that it turned out to be very useful.
  • Superset is a relatively new technology, and we were eager to put it to the test. We were familiar with Pentaho from past projects, but it hadn’t quite met our expectations.

A brief overview of the tool

Superset is an open-source data visualization tool designed to be visually appealing, user-friendly, and interactive. Its fundamental purpose is to make it simple to slice, dice, and visualize data. According to its creator, Superset can do analytics at the speed of thought. As mentioned above, the tool is written in Python; it is built on the Flask web framework.

The project was originally named Panoramix, was renamed Caravel in March 2016, and has been known as Superset since November 2016.

Characteristics

  • Superset lets you create 30 different types of visualizations.
  • Airbnb uses Superset with Druid, but it supports any SQLAlchemy-compatible data source.
  • Configurable caching to speed up dashboard loading.
  • An easy-to-use visualization constructor.

Step-by-step instructions for creating a dashboard with Superset

  1. To begin creating visualizations, you must first add a data source. As mentioned above, Superset supports not only Druid but also SQLAlchemy-compatible databases such as PostgreSQL, MySQL, and SQLite.

Although Superset can work with numerous data sources, in our project we set up a single BI data warehouse that served as Superset’s sole data source.

  2. After you’ve added a data source, add tables from that database and set the field properties. You can choose whether a field is groupable, filterable, temporal, and so forth. You can also define your own metrics in addition to the default ones (COUNT, COUNT DISTINCT, SUM, etc.).
  3. After that, you create slices. A slice is a single plot based on your data. Note that a slice can be built on only one table at a time. This isn’t a serious limitation, though, because you can always create a database view that combines as many tables as you need. Your future BI report can then include one or more slices.
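The view workaround from the last step can be sketched as follows. This is a minimal illustration using Python’s built-in sqlite3 module; the `users` and `workouts` tables and the `workouts_with_user` view are hypothetical names, not the schema from our project.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, os TEXT);
    CREATE TABLE workouts (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
    INSERT INTO users VALUES (1, 'iOS'), (2, 'Android');
    INSERT INTO workouts VALUES
        (1, 1, '2017-01-02'), (2, 1, '2017-01-05'), (3, 2, '2017-01-03');

    -- The view flattens both tables so a single-table slice can use it.
    CREATE VIEW workouts_with_user AS
    SELECT w.id AS workout_id, w.created_at, u.id AS user_id, u.os
    FROM workouts AS w
    JOIN users AS u ON u.id = w.user_id;
""")

# A slice-style aggregate over the view: workouts per operating system.
rows = conn.execute(
    "SELECT os, COUNT(*) FROM workouts_with_user GROUP BY os ORDER BY os"
).fetchall()
print(rows)
```

Once such a view exists in the warehouse, it can be registered in Superset like any other table.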

Challenges we overcame

Superset worked well when we used it to plot aggregated analytics such as user distribution by OS (iOS, Android), age group, gender, and so forth. When we tried to plot aggregations of aggregations, however, the tool struggled.

For example, we wanted to count the number of workouts completed by each app user and then group users by that count. We also needed to apply custom data filters to the dashboard, and implementing this in Superset proved difficult. Because the result of the outer aggregation depended on the filtering conditions applied in the inner grouping, a view couldn’t meet our needs in that scenario.
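To make the problem concrete, here is a small sketch of such a two-level aggregation, again with sqlite3. The `workouts` table, its columns, and the function name are assumptions made for the example. The date filter sits in the inner query, so changing it changes the per-user counts and therefore the outer groups; a pre-aggregated view cannot express that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE workouts (user_id INTEGER, created_at TEXT);
    INSERT INTO workouts VALUES
        (1, '2017-01-02'), (1, '2017-01-20'), (1, '2017-02-03'),
        (2, '2017-01-10'), (3, '2017-02-11');
""")

def users_by_workout_count(conn, start, end):
    """Return (workout_count, number_of_users) pairs for the given period."""
    query = """
        SELECT workout_count, COUNT(*) AS users
        FROM (
            -- Inner grouping: workouts per user, restricted by the filter.
            SELECT user_id, COUNT(*) AS workout_count
            FROM workouts
            WHERE created_at >= ? AND created_at < ?
            GROUP BY user_id
        )
        GROUP BY workout_count
        ORDER BY workout_count
    """
    return conn.execute(query, (start, end)).fetchall()

january = users_by_workout_count(conn, "2017-01-01", "2017-02-01")
whole_year = users_by_workout_count(conn, "2017-01-01", "2018-01-01")
print(january)     # [(1, 1), (2, 1)]
print(whole_year)  # [(1, 2), (3, 1)]
```

The same user lands in a different bucket depending on the period, which is why the filter has to reach the inner query.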

Here’s another task that turned out to be a challenge for Superset: we wanted to plot the weight loss progress of fitness app users over time. The task appears to be straightforward: for each user, take the latest weight log in a period and subtract the first weight log for that period. To do that, we join the results of two queries that return user ids and weight values, one ordering the weight logs by creation time ascending and the other descending. However, because Superset does not let us use a JOIN clause when building a slice, we had to write a custom query inside the visualization type class.
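The calculation itself can be sketched in plain SQL. The example below is an assumption-laden illustration rather than the query from our project: the `weight_logs` table is hypothetical, and it picks the first and last logs with MIN/MAX subqueries instead of the ascending and descending orderings described above, which gives the same result when creation times are unique per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE weight_logs (user_id INTEGER, weight REAL, created_at TEXT);
    INSERT INTO weight_logs VALUES
        (1, 80.0, '2017-01-01'), (1, 78.5, '2017-01-15'), (1, 77.0, '2017-01-30'),
        (1, 76.0, '2017-02-10'),
        (2, 65.0, '2017-01-05'), (2, 66.0, '2017-01-25');
""")

# Self-join: one alias holds each user's first log in the period,
# the other holds the last log in the same period.
query = """
    SELECT first_log.user_id,
           last_log.weight - first_log.weight AS weight_change
    FROM weight_logs AS first_log
    JOIN weight_logs AS last_log ON last_log.user_id = first_log.user_id
    WHERE first_log.created_at = (
              SELECT MIN(w.created_at) FROM weight_logs AS w
              WHERE w.user_id = first_log.user_id
                AND w.created_at BETWEEN :period_start AND :period_end)
      AND last_log.created_at = (
              SELECT MAX(w.created_at) FROM weight_logs AS w
              WHERE w.user_id = last_log.user_id
                AND w.created_at BETWEEN :period_start AND :period_end)
    ORDER BY first_log.user_id
"""

changes = conn.execute(
    query, {"period_start": "2017-01-01", "period_end": "2017-01-31"}
).fetchall()
print(changes)  # [(1, -3.0), (2, 1.0)]
```

A negative value means the user lost weight during the period; the February log for user 1 falls outside the period and is ignored.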

It was not possible to write a custom SQL query and apply interactive filters at the same time. We couldn’t go much further without developing custom filters. At that point we understood that, for us, the main advantage of Superset was that it is built in Python.

After a day or so of examining Superset’s source code, we discovered that there was no magic in it and that it worked in a straightforward, logical manner. It parses the values of the controls defined for the slice, then adds the FilterBox data to the WHERE part of the SQL query. When we chained aggregations, we ended up with a complex query containing subselects, and the default query builder didn’t know where to apply the filters.
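The behaviour described above can be pictured with a toy query builder. This is not Superset’s actual code; `build_query` and its parameters are hypothetical and only illustrate why a builder that appends filters to a single WHERE clause has no way to target a subselect.

```python
def build_query(table, group_by, metric, filters):
    """Build a single-level aggregate query.

    `filters` is a list of (column, operator, value) tuples, standing in
    for the values a filter widget would supply. Values are passed as
    bound parameters rather than interpolated into the SQL text.
    """
    conditions = []
    params = []
    for column, operator, value in filters:
        conditions.append(f"{column} {operator} ?")
        params.append(value)

    where = ""
    if conditions:
        # All filters end up in the one and only WHERE clause.
        where = " WHERE " + " AND ".join(conditions)

    sql = f"SELECT {group_by}, {metric} FROM {table}{where} GROUP BY {group_by}"
    return sql, params

sql, params = build_query(
    "users", "os", "COUNT(*)", [("age", ">=", 18), ("gender", "=", "f")]
)
print(sql)
print(params)
```

With a flat query this is all that is needed. With nested aggregations there are several places a WHERE clause could go, and a builder of this shape cannot choose between them.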

Solution

We used the standard visualization types as a starting point and added a new child class that defines the SQL query itself. In that query, we mark the place where filters should be applied with a __CUSTOM WHERE CLAUSE element, which is then replaced with the FilterBox input. It worked nicely for us and didn’t require any major code changes.
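The placeholder idea can be sketched independently of Superset. Everything in the example is an assumption made for illustration: the marker spelling `__CUSTOM_WHERE_CLAUSE__`, the `render_query` helper, the whitelists, and the `workouts` table. In our project the same substitution lived inside a child class of a visualization type.

```python
import sqlite3

PLACEHOLDER = "__CUSTOM_WHERE_CLAUSE__"  # assumed spelling of the marker

# The marker sits in the inner query, where the filters must take effect.
QUERY_TEMPLATE = f"""
    SELECT workout_count, COUNT(*) AS users
    FROM (
        SELECT user_id, COUNT(*) AS workout_count
        FROM workouts
        WHERE 1 = 1 {PLACEHOLDER}
        GROUP BY user_id
    )
    GROUP BY workout_count
    ORDER BY workout_count
"""

ALLOWED_COLUMNS = {"os", "created_at"}
ALLOWED_OPERATORS = {"=", "<", "<=", ">", ">="}

def render_query(template, filters):
    """Replace the marker with AND-conditions built from (column, op, value)."""
    conditions, params = [], []
    for column, operator, value in filters:
        # Column and operator names go into the SQL text, so whitelist them.
        if column not in ALLOWED_COLUMNS or operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported filter: {column} {operator}")
        conditions.append(f"AND {column} {operator} ?")
        params.append(value)
    return template.replace(PLACEHOLDER, " ".join(conditions)), params

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE workouts (user_id INTEGER, os TEXT, created_at TEXT);
    INSERT INTO workouts VALUES
        (1, 'iOS', '2017-01-02'), (1, 'iOS', '2017-01-20'),
        (2, 'Android', '2017-01-10'), (3, 'iOS', '2017-02-11');
""")

sql, params = render_query(QUERY_TEMPLATE, [("os", "=", "iOS")])
ios_only = conn.execute(sql, params).fetchall()

sql, params = render_query(QUERY_TEMPLATE, [])
unfiltered = conn.execute(sql, params).fetchall()

print(ios_only)    # [(1, 1), (2, 1)]
print(unfiltered)  # [(1, 2), (2, 1)]
```

The `WHERE 1 = 1` prefix keeps the template valid SQL even when no filters are selected, so the marker can simply be replaced with an empty string.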

Another option was to use Superset’s SQL Lab feature. However, we couldn’t do that because we needed to provide custom filters for end users in the marketing department, who couldn’t write SQL queries.

Here is a summary of the advantages and disadvantages of using Superset:

Advantages: 

  • Multiple data sources are supported.
  • A wide range of visualizations is available, with interactive filters.
  • You can create visualizations even if you don’t know SQL.
  • The visualizations look great.
  • The team responds quickly to GitHub issues.

Disadvantages:

  • Because the tool is still in the early stages of development, be prepared to find bugs and report them as GitHub issues. We ran into some problems with non-string filters, although these have been resolved in the latest version of the master branch.
  • Customization is difficult unless you are willing to dig into the source code.

That’s all for now. Thank you for taking the time to read this article; I hope you learned something new about Superset. Don’t be afraid to try out open-source data visualization tools and dig deeper into their capabilities!
