Share
Installation Prerequisites
Before starting to use external tables, ensure that the conditions specified in the following sections are met according to your environment:
Java Installation
- Check which JAVA version is installed on your machine by running the command:
java -version
- In case JAVA version is lower than 1.8 :
- Uninstall all previous installed versions of JAVA by using one of the following methods:
- Run the command:
sudo yum remove {JAVA version}
For example:
sudo yum remove java-1.7.0
- Show all installed packages on server and locate relevant java packages for removal:
yum list installed
Remove the located java packages:
sudo rpm -e {package name}
For example:
sudo rpm -e java-1.7.0-openjdk.x86_64
- Run the command:
- Install JAVA 1.8 by using one of the following methods:
- Run the command:
sudo yum install {java version}
For example:
sudo yum install java-1.8.0
- Install JAVA based on RPM package with JRE 1.8 previously downloaded from Java's site:
- JAVA latest version can be found here:
https://java.com/en/download/linux_manual.jsp - Copy the link of Linux x64 RPM
-
wget {copied link}
-
mv {downloaded file} java1.8.rpm
-
rpm -Uhiv java1.8.rpm
- JAVA latest version can be found here:
- Additional information regarding this installation process can be found here:
https://docs.oracle.com/javase/8/docs/technotes/guides/install/linux_jre.html#CFHBHAGI
- Run the command:
- Uninstall all previous installed versions of JAVA by using one of the following methods:
Hive client
Latest hive-JDBC version needs to be installed through yum install:
sudo yum install hive
For that purpose make sure you already retrieved the relevant installation repo file and installed necessary Hadoop components, depending on your Hadoop distribution as specified on Jethro Installation Guide – Installing Jethro on Hadoop (“Configuring Jethro Connection with Hadoop” section)
For example:
For Cloudera CDH 5.x distribution:
- Switch to user root.
- Retrieve the installation repo file by running the following commands:
rpm --quiet --import http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
wget http://archive.cloudera.com/cdh5/one-click-install/redhat/6/x86_64/cloudera-cdh-5-0.x86_64.rpm
- Install Hadoop client component:
yum -y localinstall cloudera-cdh-5-0.x86_64.rpm
- Install hive-jdbc:
yum -y install hive
Using External Tables
Create External Data Source in Jethro Client
Defining a connector between Jethro and a foreign Hive data source (schema).
Basic Syntax
CREATE EXTERNAL DATA SOURCE <new_data_source_name>TYPE = HIVE
LOCATION = [sub-protocol://]<host>:<port>/<db>
credentials=<user name>/<password>
For example:
CREATE EXTERNAL DATA SOURCE hive_simple_test TYPE=HIVE LOCATION='jdbc:hive2://10.1.1.61:10000/simpletest' credentials=''/'';
Note: A general connector can be created without specifying data base name. In this case the reference to the schema will be done later on upon external table creation.
For example:
CREATE EXTERNAL DATA SOURCE hive_simple_test TYPE=HIVE LOCATION='jdbc:hive2://10.1.1.61:10000/default' credentials=''/'';CREATE EXTERNAL TABLE hive_ext_jethro_simple_test
(
a STRING,
b STRING,
c STRING,
d STRING
)
DATA SOURCE= hive_simple_test
LOCATION='simpletest.simple_test';
Additional information regarding External Data Source can be found here.
Create External Data Source With KERBEROS in Jethro Client
Defining a connector between Jethro and a foreign Hive data source (schema) with KERBEROS.
Basic Syntax
CREATE EXTERNAL DATA SOURCE <new_data_source_name>TYPE = HIVE
LOCATION = [sub-protocol://]<host>:<port>/<db>;principal=<principal name>/<server name or IP address>@<KERBEROS domain>
credentials=<user name>/<password>
For example:
CREATE EXTERNAL DATA SOURCE hive_simple_test TYPE=HIVE LOCATION='jdbc:hive2://10.1.1.61:10000/simpletest;principal=hive/ip-10-1-1-116.us-west-2.compute.internal@JETHRO.COM' credentials=''/'';
Notes:
- Active KERBEROS ticket for user Jethro is needed.
- Use principal which has access to hive (in the example above: “hive”)
- Use server name/IP address (In the example above: “ip-10-1-1-116.us-west-2.compute.internal”) that is connected to KERBEROS domain (in the example above: “JETHRO.COM”)
Create Target Table in Jethro Client
Creating the target table in Jethro which will be populated with the data driven from the Hive data source table.
For example:
create table jethro_simple_test(
j_a STRING,
j_b STRING,
j_c STRING,
j_d STRING
);
Create External Table in Jethro Client
Creating a table in Jethro which is mapped to a foreign Hive data source table. It will use as an intermediate between Hive source table and Jethro Target Table and will allow loading the Hive source table data into Jethro Target Table using INSERT INTO command.
Basic Syntax
CREATE EXTERNAL TABLE <new_external_table_name>(
<column name><column_type>
….
<column name>
<column_type>
)
DATA_SOURCE=<existing_external_data_source_name>
LOCATION= <table_name_from_Hive_data_source>
For example:
CREATE EXTERNAL TABLE hive_ext_jethro_simple_test(
a STRING,
b STRING,
c STRING,
d STRING
)
DATA SOURCE=hive_simple_test
LOCATION='simple_test';
Additional information regarding External Table can be found here.
Loading Data Into Jethro Target Table in Jethro Client
Basic Syntax
Insert into <Target Table Name in Jethro> select * from <External Table in Jethro>
For example:
insert into jethro_simple_test select * from hive_ext_jethro_simple_test;