JethroData Blog

Simple, Automatic Range Partitioning in JethroData

Posted by Ofir Manor on Oct 23, 2014 7:39:00 AM

This post introduces how the partitioning feature is implemented in JethroData. In a nutshell, we added a simple, automatic range partitioning mechanism that is very easy to work with.

Why Use Partitioning?

Generally, there are two reasons to partition a large table:

Ease of maintenance - partitioning allows implementing a data retention policy and enables efficient purging of old data when it is no longer needed (rolling window). It also allows removing part of the data if invalid data was accidentally loaded (for example, removing a specific day).

Performance and Scalability - regular parallel databases (like Impala or Hive on Tez in the Hadoop space) execute queries by doing a full scan of the local data in all nodes, in parallel. Partitioning allows each node to scan less data (partition pruning), improving performance. Also, it improves scalability - if a report accesses only one month of data, it will have the same performance even if we add many more months of data to the table.
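To make the two reasons above concrete, here is a minimal Python sketch of range partitioning on a toy in-memory table. It is an illustration only and not JethroData's implementation: the class name `RangePartitionedTable`, its methods, and the choice of monthly partitions are all assumptions made for this example.

```python
from datetime import date


class RangePartitionedTable:
    """Toy in-memory table split into monthly range partitions (hypothetical)."""

    def __init__(self):
        # Maps a partition key (first day of the month) to a list of (day, value) rows.
        self.partitions = {}

    @staticmethod
    def partition_key(day):
        # Assumption for this sketch: one partition per calendar month.
        return day.replace(day=1)

    def insert(self, day, value):
        self.partitions.setdefault(self.partition_key(day), []).append((day, value))

    def scan(self, start, end):
        """Return rows with start <= day <= end, plus the number of partitions read."""
        lo, hi = self.partition_key(start), self.partition_key(end)
        rows, partitions_read = [], 0
        for key in sorted(self.partitions):
            if key < lo or key > hi:
                continue  # partition pruning: skip partitions outside the range
            partitions_read += 1
            rows.extend(r for r in self.partitions[key] if start <= r[0] <= end)
        return rows, partitions_read

    def drop_before(self, cutoff):
        """Rolling window: drop every whole partition older than the cutoff's month."""
        cutoff_key = self.partition_key(cutoff)
        old = [k for k in self.partitions if k < cutoff_key]
        for k in old:
            del self.partitions[k]
        return len(old)


# Load two rows per month for 2014.
table = RangePartitionedTable()
for month in range(1, 13):
    for day in (1, 15):
        table.insert(date(2014, month, day), month * 100 + day)

# A one-month report touches a single partition, however many months are loaded.
rows, partitions_read = table.scan(date(2014, 10, 1), date(2014, 10, 31))

# Purging old data removes whole partitions instead of deleting row by row.
dropped = table.drop_before(date(2014, 7, 1))
```

In this sketch the one-month scan reads the same single partition no matter how many other months are loaded, which is the scalability point made above, and the purge drops whole partitions without touching individual rows.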

Partitioning in JethroData

Topics: Blog

Connecting To JethroData from Tableau

Posted by Ofir Manor on Oct 13, 2014 6:58:05 PM

JethroData allows fast interactive queries over big data, by indexing all your data. Tableau is a popular BI tool that can be used with JethroData. Tableau runs on Windows 64-bit servers and connects to databases using ODBC. In order to connect to JethroData from Tableau, all you need to do is install our ODBC driver, define a DSN and do a quick Tableau setup. Here is the way to do it:

Topics: Blog

Announcing our public beta

Posted by Ofir Manor on Sep 23, 2014 3:19:00 PM

We are happy to announce, after two years of hard work, that our beta is now publicly available for download!

JethroData is a fully indexed SQL engine for Hadoop, focused on enabling fast interactive queries for business users. It uses its hierarchical bitmap indexes to accelerate most queries. It is easy to get started with, does not require any modification of your Hadoop cluster, and adds only minimal load to your production environment.

Topics: Blog

JethroData and eBay’s Big Data Lab project

Posted by Ofir Manor on Aug 24, 2014 10:38:00 AM

At the end of 2013, JethroData was invited to participate in the first round of the eBay Big Data Lab project. The project grants selected startups and researchers access to anonymized data sets on eBay's large internal clusters. We used this fantastic opportunity to harden our alpha software, identify and solve performance and scalability bottlenecks, and add basic Kerberos support, to name a few.

Topics: Press Releases

Comparing JethroData to Impala's Interactive Benchmark

Posted by Ofir Manor on Jun 8, 2014 9:45:00 PM

Last week, Cloudera published a benchmark on its blog comparing Impala's performance to some of its alternatives - specifically Impala 1.3.0, Hive 0.13 on Tez, Shark 0.9.2 and Presto 0.6.0. While it drew some criticism for its atypical hardware sizing, for modifying the original SQL queries and for avoiding fact-to-fact joins, it still provides a valuable data point:

Topics: Blog