Sunday 17 September 2017

Installation of Hive 2.3.0 on Ubuntu 16.04.3

Step-by-Step Tutorial to Install
Hive on Ubuntu
(with Detailed Explanations)

[Note: Here, Hive 2.3.0 is being installed on Ubuntu 16.04.3, but this document can be used as a reference for installing any version of Hive on any version of Ubuntu (14.04 or above).]

Prerequisites:
i) Log in as hduser by running the command $ su hduser.
ii) Start all Hadoop components, and verify with jps that all of them are running, as shown below.
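For example, on a typical single-node setup where the Hadoop sbin scripts are on the PATH (an assumption; adjust the paths to your installation), the components can be started and checked like this:

hduser@Soumitra-PC:~$ start-dfs.sh
hduser@Soumitra-PC:~$ start-yarn.sh
hduser@Soumitra-PC:~$ jps

Before proceeding, jps should list entries such as NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (exact daemon names vary with the Hadoop version).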

Follow the steps below to install Apache Hive on Ubuntu:
Step 1: Download the Hive tarball.

hduser@Soumitra-PC:/tmp$ wget http://archive.apache.org/dist/hive/hive-2.3.0/apache-hive-2.3.0-bin.tar.gz

Step 2:  Extract the tar file.

hduser@Soumitra-PC:/tmp$ sudo tar -xzf apache-hive-2.3.0-bin.tar.gz -C /usr/local
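An optional quick check to confirm the archive was extracted to the expected location:

hduser@Soumitra-PC:/tmp$ ls /usr/local | grep hive
apache-hive-2.3.0-bin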

Step 3: Edit the “~/.bashrc” file.


hduser@Soumitra-PC:/tmp$ gedit ~/.bashrc
Note: sudo is not needed here because ~/.bashrc belongs to hduser. On systems where gedit does not work, the file can be edited with 'vi' or 'nano' instead.
Add the following at the end of the file:
#Hive Variables Start
export HIVE_HOME=/usr/local/apache-hive-2.3.0-bin
export HIVE_CONF_DIR=/usr/local/apache-hive-2.3.0-bin/conf
export PATH=$HIVE_HOME/bin:$PATH
export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:.
export CLASSPATH=$CLASSPATH:/usr/local/apache-hive-2.3.0-bin/lib/*:.
#Hive Variables End
Also, make sure the Hadoop path is set.
Run the command below to apply the changes in the current terminal.

hduser@Soumitra-PC:/tmp$ source ~/.bashrc
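To confirm the variables took effect in the current shell, print one of them:

hduser@Soumitra-PC:/tmp$ echo $HIVE_HOME
/usr/local/apache-hive-2.3.0-bin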

Step 4: Check the Hive version.
hduser@Soumitra-PC:/tmp$ hive --version
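If the PATH is set correctly, the output should begin with a line like:

Hive 2.3.0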

Step 5: Create Hive directories within HDFS.
The 'warehouse' directory is the location where Hive stores its tables and related data.

hduser@Soumitra-PC:~$ hdfs dfs -mkdir -p /user/hive/warehouse
hduser@Soumitra-PC:~$ hdfs dfs -mkdir -p /tmp
Set group write permission on these directories:
hduser@Soumitra-PC:~$ hdfs dfs -chmod g+w /tmp
hduser@Soumitra-PC:~$ hdfs dfs -chmod g+w /user/hive/warehouse
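You can verify the result; the listing should show group write permission (drwxrwxr-x) on the warehouse directory:

hduser@Soumitra-PC:~$ hdfs dfs -ls /user/hive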

Step 6: Configure Hive
To configure Hive with Hadoop, we need to edit the hive-env.sh file, which is located in the $HIVE_HOME/conf directory. The following commands change to the Hive conf folder and copy the template file:
hduser@Soumitra-PC:/tmp$ cd $HIVE_HOME/conf
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ sudo cp hive-env.sh.template hive-env.sh

Edit the “hive-env.sh” file to point Hive at the Hadoop installation.

hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ sudo gedit hive-env.sh
Note: On systems where sudo gedit does not work, the file can be edited with 'vi' or 'nano' instead.
Add the following line at the end of the file:
export HADOOP_HOME=/usr/local/hadoop

Step 7: Edit hive-site.xml
hduser@Soumitra-PC:/tmp$ cd $HIVE_HOME/conf
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ sudo cp hive-default.xml.template hive-site.xml


Edit the “hive-site.xml” file to configure the Hive metastore.

hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ sudo gedit hive-site.xml
Note: On systems where sudo gedit does not work, the file can be edited with 'vi' or 'nano' instead.
Replace everything inside hive-site.xml with the below code snippet:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/usr/local/apache-hive-2.3.0-bin/metastore_db;create=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide a database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for a postgres database.
    </description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value/>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.PersistenceManagerFactoryClass</name>
    <value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
    <description>class implementing the jdo persistence</description>
  </property>
</configuration>

Create a file named jpox.properties inside $HIVE_HOME/conf and add the following lines to it:
javax.jdo.PersistenceManagerFactoryClass = org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema = true
org.jpox.validateTables = false
org.jpox.validateColumns = false
org.jpox.validateConstraints = false
org.jpox.storeManagerType = rdbms
org.jpox.autoStartMechanismMode = checked
org.jpox.transactionIsolation = read_committed
javax.jdo.option.DetachAllOnCommit = true
javax.jdo.option.NontransactionalRead = true
javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL = jdbc:derby://localhost:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName = APP
javax.jdo.option.ConnectionPassword = mine
Finally, we need to give hduser ownership of the Hive folder:
hduser@Soumitra-PC:/usr/local$ sudo chown -R hduser:hadoop apache-hive-2.3.0-bin
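You can verify the new ownership with a long listing:

hduser@Soumitra-PC:/usr/local$ ls -ld apache-hive-2.3.0-bin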
Step 8: Download and Install Apache Derby
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ cd /tmp
hduser@Soumitra-PC:/tmp$ wget http://archive.apache.org/dist/db/derby/db-derby-10.13.1.1/db-derby-10.13.1.1-bin.tar.gz

hduser@Soumitra-PC:/tmp$ sudo tar xvzf db-derby-10.13.1.1-bin.tar.gz -C /usr/local

Let's set up the Derby environment by appending the following lines to the ~/.bashrc file:

#DERBY Variables Start

export DERBY_HOME=/usr/local/db-derby-10.13.1.1-bin
export PATH=$PATH:$DERBY_HOME/bin
export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar
#DERBY Variables End
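Reload the file so that DERBY_HOME is available in the current terminal (otherwise the mkdir in the next step cannot expand it):

hduser@Soumitra-PC:/tmp$ source ~/.bashrc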

We need to create a directory named 'data' inside the Derby home to store metastore data.

hduser@Soumitra-PC:/tmp$ sudo mkdir $DERBY_HOME/data

This completes the Derby installation and environment setup.
Step 9: Initialize the metastore schema. By default, Hive uses Derby as its metastore database; initialize its schema with schematool.
hduser@Soumitra-PC:/usr/local$ cd $HIVE_HOME/bin
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/bin$ schematool -dbType derby -initSchema
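If the initialization succeeds, the output ends with a line like:

schemaTool completed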

Step 10: Launch Hive.
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin$ hive
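Once the hive> prompt appears, a quick sanity check (a minimal example; the table name 'demo' is arbitrary):

hive> show databases;
hive> create table demo (id int, name string);
hive> show tables;
hive> exit;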



References

1. http://www.bogotobogo.com/Hadoop/BigData_hadoop_Hive_Install_On_Ubuntu_16_04.php
2. https://www.edureka.co/blog/apache-hive-installation-on-ubuntu



Document prepared by 
Mr. Soumitra Ghosh

Assistant Professor, Information Technology,
C.V.Raman College of Engineering, Bhubaneswar
Contact: soumitraghosh@cvrce.edu.in
