Tableau, one of the most widely used business intelligence tools, is designed to simplify analysis for business users by letting them interact with their data visually, using drag-and-drop operations, without writing complex code or SQL queries. Depending on their requirements, users can build dashboards and hold real-time conversations with data that lives across multiple systems and platforms.
But what happens when the size of data rises to big data proportions? Do the users still have the same agility as they had when analyzing smaller datasets?
Tableau users often report slowdowns when they connect directly to Hadoop and try to perform complex analysis on large volumes of data.
Why Tableau slows down
Tableau is designed to enable the Cycle of Visual Analysis: get data, view data, ask questions, get answers, repeat. This cycle becomes challenging on Hadoop. The main reason is that Tableau generates and executes a SQL statement for every interaction and every visualization. As the size of the data increases, direct queries from Tableau to Hadoop take a long time to return – sometimes minutes or even hours instead of seconds. And as the number of dimensions and their cardinalities grow, it takes even longer to fetch query results.
From the user’s perspective, as the analysis deepens, they want to drill down, drill up, or drill across their data to get useful business insights. However, Tableau has to connect to Hadoop to fetch the results for each interaction. This leads to a lag in dashboard refreshes, and interactivity suffers. Though Tableau is the perfect tool to help business users visualize their data, it is not designed to handle big data analysis.
Optimize Tableau performance
Analysts and data experts have tried several approaches to maximize the performance of Tableau on Hadoop. They put massive effort into tuning their queries. Instead of pulling detailed reports, they make do with summary reports. Sometimes they run queries on smaller subsets of their data rather than on the entire dataset. These approaches are restrictive and cannot meet the analytical needs of a growing business.
Another popular method is to pull data out of the big data platform in the form of extracts and then point Tableau at those extracts. However, this approach places scalability and performance limits on the amount of data that can be processed. It is also resource-heavy and introduces latency, because the data is no longer live.
Over the years, several enhancements have been made to Tableau to speed up big data analytics, such as query optimization capabilities and named connectors for Hadoop and other big data platforms. However, these improvements still struggle to keep pace with the rate at which data volumes are growing.
Instant analytics on Hadoop
An innovative way to improve the performance of Tableau on Hadoop is to create a BI acceleration layer between the big data platform and Tableau that enables quick access to big data. Once this layer is in place, Tableau can connect to this layer instead of connecting directly to the big data platform.
This layer is built using OLAP on big data technology. In this approach, data is pre-aggregated using the processing and storage capacity of Hadoop. Once the OLAP cubes are ready, queries become incredibly lightweight, and responses are near-instant, even for the most complex queries. Tableau can connect to this layer using standard connectors. Another key advantage of this approach is that the layer is transparent to end-users. They can continue using their familiar Tableau interface for big data analytics and enjoy the same speed and interactivity as before, without limitations on the amount of data they can bring into their visualizations.
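To make the pre-aggregation idea concrete, here is a minimal Python sketch. The fact table, dimension names, and helper functions are invented for illustration; a real OLAP-on-big-data layer materializes cubes at far larger scale using Hadoop's distributed processing, but the principle is the same: aggregate once offline, then answer interactive queries from the small cube instead of scanning raw data.

```python
from collections import defaultdict

# Hypothetical fact table: (region, product, year, sales) rows.
# In practice this would be billions of rows stored in Hadoop.
fact_rows = [
    ("East", "Widget", 2020, 100.0),
    ("East", "Widget", 2021, 150.0),
    ("West", "Gadget", 2020, 200.0),
    ("West", "Widget", 2021, 50.0),
]

def build_cube(rows, dims, measure_index=3):
    """Pre-aggregate the fact table over a fixed set of dimension
    columns -- analogous to materializing an OLAP cube. This is the
    expensive step, run once (or incrementally) on the cluster."""
    cube = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in dims)
        cube[key] += row[measure_index]
    return dict(cube)

# Cube over (region, year), built offline before any dashboard opens.
cube = build_cube(fact_rows, dims=(0, 2))

def query_cube(cube, region=None, year=None):
    """Answer an aggregate query from the cube alone; no scan of the
    underlying fact table happens at interaction time, which is why
    dashboard filters and drill-downs stay fast."""
    return sum(
        sales for (r, y), sales in cube.items()
        if (region is None or r == region) and (year is None or y == year)
    )

print(query_cube(cube, region="East"))  # total East sales: 250.0
print(query_cube(cube, year=2021))      # total 2021 sales: 200.0
```

Each dashboard interaction (a filter on region, a drill into a year) maps to a lookup over the tiny cube, whose size depends on dimension cardinalities rather than raw row count.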
Tableau was initially developed to work on relational technologies. Therefore, it is unfair to expect it to deal with massive volumes of data on Hadoop. By creating a BI acceleration layer on Hadoop, you can quickly scale up and use Tableau’s compelling visualizations for analyzing big data with high performance and unlimited scalability.