JethroData Blog

Being "Creative" with TPC-DS Benchmark - Dynamic Partition Pruning

Posted by Ofir Manor on Nov 10, 2014 9:01:00 AM

In this post, I would like to present a common optimization challenge, how it is solved in JethroData, and how some other SQL-on-Hadoop products "overcame" that challenge by manually modifying their benchmark scripts and queries to avoid the situation (which was quite a surprise for us when we figured it out while running our own benchmarks).


Topics: Blog

Partitioning in Hive and Impala Versus JethroData (and some TPC-DS gossip)

Posted by Ofir Manor on Oct 30, 2014 5:40:00 PM

In my previous post, I explained how partitioning works in JethroData. In this post, I would like to explain how partitioning was implemented in Hive and Impala, why their design is very problematic, and how our implementation avoids those problems. Design matters!


Topics: Blog

Simple, Automatic Range Partitioning in JethroData

Posted by Ofir Manor on Oct 23, 2014 7:39:00 AM

This post will introduce how the partitioning feature is implemented in JethroData. In a nutshell, we added a simple, automatic range partitioning mechanism that is very easy to work with.

Why Use Partitioning?

Generally, there are two reasons to partition a large table:

Ease of maintenance - partitioning allows implementing a data retention policy and enables efficient purging of old data when it is no longer needed (rolling window). Also, it allows removing part of the data if invalid data was accidentally loaded (for example, removing a specific day's data).

Performance and Scalability - regular parallel databases (like Impala or Hive on Tez in the Hadoop space) execute queries by doing a full scan of the local data in all nodes, in parallel. Partitioning allows each node to scan less data (partition pruning), improving performance. Also, it improves scalability - if a report accesses only one month of data, it will have the same performance even if we add many more months of data to the table.
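Both benefits can be illustrated with a minimal Python sketch. It is not JethroData's implementation; the `Partition` class and the helper functions are hypothetical names invented for this example, and the table is just an in-memory list of monthly range partitions. The point is that a date filter only touches partitions whose range overlaps it, and that retention drops whole partitions at once.

```python
from datetime import date

# Hypothetical stand-in for one range partition covering [start, end).
class Partition:
    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.rows = []  # list of (day, value) tuples

def month_partitions(year):
    """Create twelve monthly range partitions for one year."""
    parts = []
    for m in range(1, 13):
        start = date(year, m, 1)
        end = date(year + 1, 1, 1) if m == 12 else date(year, m + 1, 1)
        parts.append(Partition(start, end))
    return parts

def insert(parts, day, value):
    """Route a row to the partition whose range contains its date."""
    for p in parts:
        if p.start <= day < p.end:
            p.rows.append((day, value))
            return
    raise ValueError("no partition covers %s" % day)

def prune(parts, lo, hi):
    """Partition pruning: keep only partitions overlapping [lo, hi)."""
    return [p for p in parts if p.start < hi and lo < p.end]

def query_sum(parts, lo, hi):
    """Sum values in [lo, hi); also report how many partitions were scanned."""
    scanned = prune(parts, lo, hi)
    total = sum(v for p in scanned for (d, v) in p.rows if lo <= d < hi)
    return total, len(scanned)

def purge_before(parts, cutoff):
    """Retention: drop every partition that ends on or before the cutoff."""
    return [p for p in parts if p.end > cutoff]

parts = month_partitions(2014)
insert(parts, date(2014, 3, 15), 10)
insert(parts, date(2014, 3, 20), 5)
insert(parts, date(2014, 7, 1), 7)

# A one-month report scans a single partition, however many months exist.
total, scanned = query_sum(parts, date(2014, 3, 1), date(2014, 4, 1))
print(total, scanned)

# Rolling window: discard everything older than April in one step.
remaining = purge_before(parts, date(2014, 4, 1))
print(len(remaining))
```

In this sketch, adding more years of partitions would not change how many partitions the one-month report scans, which is the scalability argument made above.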

Partitioning in JethroData

Topics: Blog

Connecting To JethroData from Tableau

Posted by Ofir Manor on Oct 13, 2014 6:58:05 PM

JethroData allows fast interactive queries over big data, by indexing all your data. Tableau is a popular BI tool that can be used with JethroData. Tableau runs on Windows 64-bit servers and connects to databases using ODBC. In order to connect to JethroData from Tableau, all you need to do is install our ODBC driver, define a DSN and do a quick Tableau setup. Here is the way to do it:


Topics: Blog

Announcing our public beta

Posted by Ofir Manor on Sep 23, 2014 3:19:00 PM

We are happy to announce, after two years of hard work, that our beta is now publicly available for download!

JethroData is a fully indexed SQL engine for Hadoop, focused on enabling fast interactive queries for business users. It uses hierarchical bitmap indexes to accelerate most queries. It is easy to get started with, does not require any modification of your Hadoop cluster, and adds only minimal load to your production environment.


Topics: Blog