<iframe src="//www.googletagmanager.com/ns.html?id=GTM-TT4L49" height="0" width="0" style="display:none;visibility:hidden">
Jethro - We Make Real-Time Business Intelligence Work on Hadoop
tableau-jethro-tip-1

Big Data, BI

Top 10 Jedi Tips for Using Tableau on Jethro

By Michael Kravec on April 09, 2018

Share

Last week's post was an ode to my love for using Tableau on Big Data. Now, I'm excited to share my top tips on doing so with Jethro.

To see the full documentation of best practices that we have found at Jethro, you can download it here: http://info.jethro.io/tableau-jethro-best-practices.  But since it is a long document, I wanted to highlight to top 10 tips for using Tableau on top of Jethro:

Use multiple table data connections not Custom SQL data connections.

This one is a general best practice, but we see this often in the field.  Many customers transfer their Tableau workbook from using extracts to Jethro.  It is easy to create an extract with a Custom SQL data connection since the SQL is only executed upon creation and refreshes.  When using Jethro, it is a live query experience and the Custom SQL is generated every query. All of the standard recommendations apply.  If there are calculations in your SQL, put them in a calculated field in Tableau.  Unions can be done with a standard connection.  Nearly all the things that you can do in Custom SQL will be better served within Tableau’s data connection.  The benefits on the Jethro side are that we do not have the overhead of unwrapping the complex sub queries that result from Custom SQL.  Also, we can do our own dimension table culling from queries. 

Dates – use continuous dates wherever possible

Dates in Tableau can be tricky.  You can do the same Month/Year type of visualization a few ways.  They may look nearly identical to the end user, but you can see the difference in the queries.  Jethro performs best with truncation SQL functions as opposed to finding the month and the year.  The easy way to force this in Tableau is to leverage the continuous date type in Tableau.  This changes the query to do a TRUNC(date_field) instead of MONTH(date_field) and YEAR(date_field). In nearly all cases, the rendered visuals are identical.  When filtering on Month/Year, the user experience is better as well.

Dates -  use rawsql_date to force trunc functions

This is related to the previous tip, but it is worthy of its own discussion since it is more advanced.  Since Jethro is essentially a SQL engine, the best way to prescriptively define the query is via rawsql calculations.  Yes, I know, rawsql is a bear.   I have used it in only a handful of use cases at my time at Tableau, but this is one of those times.  Creating a calc like this: RAWSQL_DATE("trunc (%1,%2)",[DateField],'month') will make a month/year field that is reusable and hides all of the complexity of continuous fields.  This is worth the effort in the data connection creation and can be published to Data Server for the easy reuse for your Tableau authors.

LOD calcs can be replicated with rawsql

This can be accelerated by taking advantage of Jethro’s optimizations for Qlik’s Set analysis.  This can be accessed through rawsql calculations.  The high level syntax of the qlik function is QLIK( [aggregation-column] , [filter-expression] ). 

While we are on the subject to of rawsql, here is another one.  LOD calculations are extremely flexible and powerful.  One thing that it does do is issue its own query.  Since Jethro automatically can create cubes and aggregations for queries, this does add extra work on the Jethro side.  So first off, do not use LOD calcs if it can be done with a Table calc or an aggregated calc.  if an LOD calc is required, you can leverage some additional optimizations we have done on other platforms.  There is a qlik function that performs set analysis.  You can use that for some LOD calculations.  The syntax can get very deep quickly, so refer to the “Jethro Best Practices with Tableau” doc that I created to help you with writing the rawsql for this.

 “Use all” on filters avoids where clause from being included

The two following filter selections will select all three segments in superstore and provide the same results.  The difference is the one on the left will select where segment is Consumer, Corporate and Home Office and the one on the right will not use segment in the where clause.  This can have material effect on query performance since you are asking Jethro to do extra work without any benefit what so ever. 

tableau-jethro-tip-1 tableau-jethro-tip-2

When creating dashboards in Tableau, make sure the Show “All” Value option is selected.  This will give the user an easy path to avoid the situation on the left.  When using web editing or desktop, you need to do this yourself. 

Be judicious with relevant values quick filters

This is a general best practice that get magnified when working with big data.  It doesn’t matter how fast or slow your queries are, when setting up a chain of relative value quick filters, you will make the dashboard render slower by the number of filters in the chain.  If you have ten quick filters that generate queries that take 250 milliseconds to complete they will take about 250 milliseconds to execute if done in parallel on a production Jethro server.  If they are all set to be relative value, they may need to be executed in serial taking 2.5 seconds to execute.  And that is just for the filters.

So, the recommendation is to use them only when necessary.  Save the relative values filters for hierarchies and things that only make sense.  Don’t be afraid to talk to the users to see if they are fine with a faster dashboard with less relative value filters.  I bet you they will be happy with the perceived tradeoff.

Use actions to prevent too many details from being queried

There is still a tendency to dashboard to be created with showing lots of detail first and then filter.  An unfiltered detail sheet can result in millions of returned records.  This technique would have worked on a dashboard from data with 10 million records, but not now with billions of records.  Please stop doing this.  There is an easy fix.  When creating actions for detail drilldowns, select the “Clearing the selection will: Exclude all values” option.  This technique is worth its weight in gold.

tableau-jethro-tip-3

First off, it prevents the query from being executed during the initial rendering of the dashboard.  The next thing it does is it helps the user focus on the higher level of the dashboard first. Any selection on the higher level can then issue the query filtered to the selection bringing a smaller data set back to be rendered.  This is a win-win-win.  The win for the user is a faster dashboard that feed them information and analysis on demand. It is a win for Tableau since it is less rendering effort. And it is a win for Jethro since the fastest query you can send it is no query at all. 

Avoid data transformation calculations that can be done in metadata

Know what Tableau data connections can do.  This can help avoid work and make performance better.  You can do basic transformations, grouping, rounding, aliasing and probably a few other things that I forgot.  Plus it is all saved in the data connection and reusable via Data Server.  An example of this is – aliasing gender from ‘M’ to Male and ‘F’ to Female.

Use Boolean calcs instead of If…Then calcs

Another general best practice that is magnified with Jethro’s optimizations.  I want everyone to go to their workbooks right now and change any if then statements that result in binary results to Boolean calculations.  Instead of:

If [Profit] > 0 then “In the Black”
else “In the Red”
End

Use the Boolean version:

[Profit] >0

Then you can alias True to “In the Back” and False to “In the Red”

Filter First when authoring

And last, and possibly most important… Remember that one of Jethro’s unique values is being an indexing and cubing engine.  We index every field in the data connection, so if you can leverage that, then you will get a faster experience.  Filter first has always been a best practice of mine when working with big datasets or slow databases (is there really a difference?).  With Jethro, this mantra is important since we deliver performance with this technique better than anyone else.  This is most important in web editing or Tableau Desktop since you are working with a large dataset in an ad hoc fashion.  If you are doing a new analysis, filter first, do some analysis.  If you need a more data, pull back on your filters – we probably just built a cube for you to use in the background!