What's new in JethroData 1.0

By Jethro on April 07, 2015

As we announced earlier today, JethroData 1.0 was just released. Since launching the public beta six months ago, we have added numerous improvements and bug fixes across the board. I would like to share some highlights:

Performance

  • Adaptive Cache - Users typically access Jethro from dashboards and BI tools, which generate SQL in predictable ways: for example, a dashboard always starts with a set of top-level aggregations. The Jethro adaptive cache automatically caches intermediate result sets and bitmaps and reuses them across queries when appropriate. It is also incremental, combining previous result sets with newly loaded data.
  • Join Elimination - We have found that some BI tools generate excessive joins that could be optimized under certain conditions. We leverage our index metadata to see if that is the case, and if so, eliminate the joins altogether.
  • Count Distinct - We recently refactored our count distinct implementation, improving its parallelism and memory utilization. One of our beta customers reported that queries with count distinct on large data sets now run 5x faster on average.
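
To make the incremental part of the adaptive cache more concrete, here is a minimal Python sketch of the general idea, not Jethro's actual implementation. It assumes an append-only table modeled as a list of (group, value) rows; the class name IncrementalAggregateCache and its query method are hypothetical and exist only for this illustration. The cache keeps per-group sums and counts and, on each query, folds in only the rows loaded since the previous query.

```python
from collections import defaultdict


class IncrementalAggregateCache:
    """Hypothetical sketch: caches SUM/COUNT per group and folds in only new rows."""

    def __init__(self):
        self.rows_seen = 0                           # rows already folded into the cache
        self.totals = defaultdict(lambda: [0, 0])    # group -> [sum, count]

    def query(self, table):
        """Return ({group: (sum, count)}, rows_scanned) for an append-only table."""
        new_rows = table[self.rows_seen:]            # only data loaded since the last query
        for group, value in new_rows:
            entry = self.totals[group]
            entry[0] += value
            entry[1] += 1
        self.rows_seen = len(table)
        result = {g: (s, c) for g, (s, c) in self.totals.items()}
        return result, len(new_rows)


table = [("US", 10), ("EU", 5), ("US", 7)]
cache = IncrementalAggregateCache()

first, scanned_first = cache.query(table)     # cold cache: scans all three rows
table.append(("EU", 4))                       # a new load appends one row
second, scanned_second = cache.query(table)   # warm cache: scans only the new row
```

The second query combines the cached totals with the single new row instead of rescanning the table, which is the behavior the Adaptive Cache bullet describes at a much larger scale.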

Operations

  • Client-side load-balancing - Automatically load-balances submitted SQL queries across all JethroServer hosts.
  • Query queuing - Smooths out spikes in SQL query loads by automatically queuing queries under high concurrency or memory pressure. Status can be checked with the SHOW ACTIVE QUERIES command.
  • OVERWRITE option - When loading new data, a user can choose between APPEND and OVERWRITE modes. OVERWRITE can be used to atomically replace a dimension with a newer copy, or to automatically replace the content of specific partitions of a fact table (for example, as part of a data correction process).
  • JethroClient - Various improvements to our CLI tool, including a CSV output mode.
  • SHOW TABLE COLUMNS - A useful command that shows up-to-date column-level information, including the number of distinct values, the number of NULLs, size on disk and more. Leverages our index metadata.
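
The first two items above, client-side load balancing and query queuing, can be illustrated with a small Python sketch. This is a simplified model under stated assumptions and not how JethroClient or JethroServer are implemented: the class names RoundRobinBalancer and QueryQueue, the host names, and the max_running limit are all hypothetical, and the queue models only a concurrency limit, not memory pressure.

```python
from collections import deque
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical client-side balancer: hands out hosts in rotation."""

    def __init__(self, hosts):
        self._hosts = cycle(hosts)

    def next_host(self):
        return next(self._hosts)


class QueryQueue:
    """Hypothetical admission queue: runs up to max_running queries, queues the rest."""

    def __init__(self, max_running):
        self.max_running = max_running
        self.running = set()
        self.waiting = deque()

    def submit(self, query_id):
        if len(self.running) < self.max_running:
            self.running.add(query_id)
            return "running"
        self.waiting.append(query_id)      # over the limit: wait in FIFO order
        return "queued"

    def finish(self, query_id):
        self.running.remove(query_id)
        if self.waiting:                   # promote the oldest waiting query
            self.running.add(self.waiting.popleft())


balancer = RoundRobinBalancer(["host-a", "host-b"])
assigned = [balancer.next_host() for _ in range(3)]

queue = QueryQueue(max_running=2)
states = [queue.submit(q) for q in ("q1", "q2", "q3")]
queue.finish("q1")                         # q3 moves from waiting to running
```

Here the third query is queued rather than rejected and starts as soon as a slot frees up, which is how queuing smooths out a spike in submitted queries.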

SQL

  • Views - Support for SQL views.
  • Additional query constructs - Items from our backlog, including CASE expressions, an explicit casting function and operator, relaxed outer join limitations, and more.
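
As a rough illustration of the kind of SQL these constructs enable, the Python sketch below defines a view that uses a CASE expression and the CAST function. It runs against an in-memory SQLite database purely as a stand-in, since the exact Jethro syntax is not shown in this post and may differ; the sales table, the sales_by_size view, and their columns are made up for the example.

```python
import sqlite3

# SQLite is used here only as a stand-in engine for standard SQL constructs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount TEXT);
    INSERT INTO sales VALUES ('US', '120'), ('EU', '40'), ('US', '30');

    -- A view that casts a text column to an integer and buckets it with CASE.
    CREATE VIEW sales_by_size AS
    SELECT region,
           CAST(amount AS INTEGER) AS amount,
           CASE WHEN CAST(amount AS INTEGER) >= 100 THEN 'large'
                ELSE 'small'
           END AS size_class
    FROM sales;
""")

# Query the view like a regular table.
rows = conn.execute(
    "SELECT size_class, SUM(amount) FROM sales_by_size "
    "GROUP BY size_class ORDER BY size_class"
).fetchall()
print(rows)
```

BI tools often generate queries of this shape, so views, CASE, and casting support tend to matter for compatibility as much as for convenience.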

Want to learn more? Read our technical whitepaper or download Jethro and its documentation and get started.