Jethro - We Make Real-Time Business Intelligence Work on Hadoop

BI, Big Data

An Introduction to Jethro - Interactive BI on Big Data

By Eli Singer on June 07, 2018


Jethro is a leading vendor in the Big Data acceleration space. Our proven solution was developed over 5 years and is in production use by some of the most demanding global enterprises. 

Jethro’s solution is focused exclusively on, and highly optimized for, enabling interactive BI on Big Data. This use case differs from other analytic use cases such as Machine Learning or Predictive Modeling in that it must support high user concurrency and interactive response times. As such, it requires a different technology approach from tools that are optimized for other big data workloads.

When it comes to high concurrency and fast performance of SQL queries, there are two options:

  • More hardware – by doubling the compute resources, throughput and response time can be improved 2x.
  • Less work – by pre-indexing and pre-summarizing the data, the actual work needed to respond to a SQL query can be cut by 10x-50x, resulting in significantly faster response and higher throughput.

While several solutions in the market focus on reducing the work by pre-aggregating the data (e.g. OLAP Cubes), Jethro is the only solution to combine such cubes with full indexing. This is a critical requirement, as BI queries range from highly summarized to highly detailed, and both kinds need to be accelerated to provide a consistent interactive BI experience.

  • Summary / Aggregated Queries
    Such queries summarize a large number of rows into a small aggregate. For example, the query: SUM(sales) WHERE state='NY' could potentially aggregate billions of rows into a single figure. A cube which has pre-aggregated all rows by state will be able to respond to such a query with little effort.
  • Detailed / Granular Queries
    These queries search for a small subset of the rows based on narrow filtering. For example, the query: SUM(sales) WHERE customer_id=123456789 will likely need only a few dozen rows out of many billions. An index based on customer_id will have no problem responding to this query in seconds.
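
The two query types above can be illustrated with a minimal Python sketch. It is only a toy model of the general idea of a pre-aggregated cube and a column index, not Jethro's implementation; the table contents and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: (customer_id, state, sales)
ROWS = [
    (101, "NY", 250.0),
    (102, "CA", 120.0),
    (101, "NY", 75.0),
    (103, "NY", 300.0),
    (102, "CA", 80.0),
]

def full_scan_sum(rows, predicate):
    """Baseline: read every row and sum sales for the matching ones."""
    return sum(row[2] for row in rows if predicate(row))

def build_state_cube(rows):
    """Pre-aggregate SUM(sales) per state (a one-dimensional 'cube')."""
    cube = defaultdict(float)
    for _customer_id, state, sales in rows:
        cube[state] += sales
    return dict(cube)

def build_customer_index(rows):
    """Map each customer_id to the positions of its rows."""
    index = defaultdict(list)
    for pos, (customer_id, _state, _sales) in enumerate(rows):
        index[customer_id].append(pos)
    return dict(index)

def sum_sales_for_customer(rows, index, customer_id):
    """Detailed query: read only the rows the index points to."""
    return sum(rows[pos][2] for pos in index.get(customer_id, []))

STATE_CUBE = build_state_cube(ROWS)
CUSTOMER_INDEX = build_customer_index(ROWS)

# Summary query: SUM(sales) WHERE state='NY' is a single cube lookup.
ny_total = STATE_CUBE.get("NY", 0.0)

# Detailed query: SUM(sales) WHERE customer_id=102 touches two rows only.
customer_total = sum_sales_for_customer(ROWS, CUSTOMER_INDEX, 102)
```

In this sketch the cube answers the summary query without touching the table at all, while the index limits the detailed query to the matching rows; `full_scan_sum` is the fallback that reads everything.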

While indexes are optimal for detailed queries, cubes are largely irrelevant to them, since covering such queries would require impractically large and inefficient cubes. As a result, cube-only solutions need to rely on the underlying SQL-on-Hadoop engine (e.g. Hive, Impala) to perform all detailed queries. Such engines will often perform full scans of the data, resulting in slow response times and heavy cluster compute load.

Jethro’s unique combination of cubes and indexes is the only solution that can accelerate both summary and detailed queries and ensure that BI dashboards perform at interactive speed not just half the time, but for every user interaction.

Another key advantage of Jethro’s solution is that it is designed to be self-driving and fully automated. This means that both cubes and indexes are created automatically and do not require manual IT work:

  • Auto Indexing – Jethro automatically indexes all columns of the BI dataset. This ensures that regardless of how application developers and users want to access the data (i.e. which filters they use), there will always be indexes to support their choices and accelerate their queries.
  • Auto Cubes – Jethro automatically defines and generates cubes based on actual queries submitted by the BI tool.
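
As a rough illustration of what such automation could look like, the Python sketch below indexes every column of a small dataset and derives cube definitions from a log of observed queries. It is not Jethro's actual algorithm, which the article does not describe; the query log format, the `min_hits` threshold, and every function name are assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical dataset: one dict per row.
ROWS = [
    {"state": "NY", "year": 2017, "customer_id": 1, "sales": 100.0},
    {"state": "NY", "year": 2018, "customer_id": 2, "sales": 50.0},
    {"state": "CA", "year": 2017, "customer_id": 1, "sales": 70.0},
    {"state": "NY", "year": 2017, "customer_id": 3, "sales": 30.0},
]

# Hypothetical query log: the columns each observed BI query grouped by.
QUERY_LOG = [
    ("state",),
    ("state", "year"),
    ("state",),
    ("customer_id",),
    ("year", "state"),
    ("state",),
]

def index_all_columns(rows):
    """Auto indexing: for every column, map each value to its row positions."""
    indexes = defaultdict(lambda: defaultdict(list))
    for pos, row in enumerate(rows):
        for column, value in row.items():
            indexes[column][value].append(pos)
    return {column: dict(values) for column, values in indexes.items()}

def pick_cube_dimensions(query_log, min_hits=2):
    """Auto cubes, step 1: keep column sets seen at least min_hits times."""
    counts = Counter(tuple(sorted(columns)) for columns in query_log)
    return [dims for dims, hits in counts.items() if hits >= min_hits]

def build_cube(rows, dims, measure="sales"):
    """Auto cubes, step 2: pre-aggregate SUM(measure) for one column set."""
    cube = defaultdict(float)
    for row in rows:
        cube[tuple(row[d] for d in dims)] += row[measure]
    return dict(cube)

INDEXES = index_all_columns(ROWS)
CUBES = {dims: build_cube(ROWS, dims) for dims in pick_cube_dimensions(QUERY_LOG)}
```

Here the single query on `customer_id` does not justify a cube, so it would be served from `INDEXES` instead, while the repeated groupings by `state` and by `state` and `year` each get a cube without anyone defining it by hand.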

Unlike Jethro, cube-on-Hadoop solutions rely on manual definition and modification of cubes. An expert data modeler with deep knowledge of the applications, queries, and data is required to carefully design each and every cube so that it covers many possible queries while still maintaining a relatively small size. While this is a significant up-front undertaking, its impact on ongoing changes is much greater. Whenever BI developers change their BI app (which happens frequently in today’s agile enterprises), they may need to go back and request a change to the cube that was defined for them. Since changing a cube, or defining a new one, is a consequential decision, this manual process is slow and costly and requires significant headcount to support. With Jethro none of that is needed, as cubes are defined and changed automatically with no delay to the app developer.

Other benefits of Jethro include tight integration with BI tools, certification on various Hadoop platforms, support for multiple storage protocols, a small cluster footprint, enterprise-grade security, a flexible deployment model to support multi-tenancy, and an easy-to-use management interface.

Jethro’s solution is purpose-built to accelerate interactive BI on Big Data. It is the only solution that can accelerate both summary and detailed BI queries thanks to its combination of cubes and indexes. And it is the only solution that fully automates cube definition, resulting in significant IT headcount savings.