Step-by-Step Tutorial to Install Apache Hive
(with Detailed Commands and Explanations)
[Note: Here, Hive-2.3.0 is being installed on Ubuntu-16.04.3, but this document can be referred to for installing any version of Hive on any version of Ubuntu (14.04 or above)]
Prerequisites:
i) Log in as hduser by running the command $ su hduser.
ii) Start all Hadoop components, and verify with jps that all of them are running.
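In case the Hadoop daemons are not running yet, they can be started as shown below (a sketch assuming a standard single-node Hadoop setup with Hadoop's sbin scripts on the PATH; daemon names can vary slightly across Hadoop versions):
hduser@Soumitra-PC:~$ start-dfs.sh
hduser@Soumitra-PC:~$ start-yarn.sh
hduser@Soumitra-PC:~$ jps
The jps output should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager, along with Jps itself.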
Please follow the below steps to install Apache Hive on Ubuntu:
Step 1: Download Hive tar.
hduser@Soumitra-PC:/tmp$ wget http://archive.apache.org/dist/hive/hive-2.3.0/apache-hive-2.3.0-bin.tar.gz
Step 2: Extract the tar file.
hduser@Soumitra-PC:/tmp$ sudo tar -xzf apache-hive-2.3.0-bin.tar.gz -C /usr/local
Step 3: Edit the “~/.bashrc” file.
hduser@Soumitra-PC:/tmp$ sudo gedit ~/.bashrc
Note: On systems where sudo gedit does not work, the file can be edited with 'vi' or 'nano' instead.
Add the following at the end of the file:
#Hive Variables Start
export HIVE_HOME=/usr/local/apache-hive-2.3.0-bin
export HIVE_CONF_DIR=/usr/local/apache-hive-2.3.0-bin/conf
export PATH=$HIVE_HOME/bin:$PATH
export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:.
export CLASSPATH=$CLASSPATH:/usr/local/apache-hive-2.3.0-bin/lib/*:.
#Hive Variables End
Also make sure the Hadoop path is set in ~/.bashrc.
Run the command below to make the changes take effect in the same terminal.
hduser@Soumitra-PC:/tmp$ source ~/.bashrc
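A quick sanity check that the variables are visible in the current shell:
hduser@Soumitra-PC:/tmp$ echo $HIVE_HOME
/usr/local/apache-hive-2.3.0-bin
hduser@Soumitra-PC:/tmp$ which hive
/usr/local/apache-hive-2.3.0-bin/bin/hive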
Step 4: Check the Hive version.
hduser@Soumitra-PC:/tmp$ hive --version
Step 5: Create Hive directories within HDFS.
The 'warehouse' directory is the default location where Hive stores its tables and related data.
hduser@Soumitra-PC:~$ hdfs dfs -mkdir -p /user/hive/warehouse
hduser@Soumitra-PC:~$ hdfs dfs -mkdir -p /tmp
Give the group write permission on these directories:
hduser@Soumitra-PC:~$ hdfs dfs -chmod g+w /tmp
hduser@Soumitra-PC:~$ hdfs dfs -chmod g+w /user/hive/warehouse
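To confirm that the directories were created with group write permission:
hduser@Soumitra-PC:~$ hdfs dfs -ls /user/hive
The warehouse entry should be listed with drwxrwxr-x permissions.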
Step 6: Configure Hive
To configure Hive with Hadoop, we need to edit the hive-env.sh file, located in the $HIVE_HOME/conf directory. The following commands change to the Hive conf directory and copy the template file:
hduser@Soumitra-PC:/tmp$ cd $HIVE_HOME/conf
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ sudo cp hive-env.sh.template hive-env.sh
Edit the “hive-env.sh” file to set the Hadoop environment variable for Hive.
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ sudo gedit hive-env.sh
Note: On systems where sudo gedit does not work, the file can be edited with 'vi' or 'nano' instead.
Append the following line to the file:
export HADOOP_HOME=/usr/local/hadoop
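Besides HADOOP_HOME, the hive-env.sh template also exposes a few optional settings that can be uncommented if needed; the values below are illustrative assumptions, not required for a basic setup:
# Optional: heap size (in MB) for Hive client processes
# export HADOOP_HEAPSIZE=512
# Optional: explicitly point Hive at its configuration directory
# export HIVE_CONF_DIR=/usr/local/apache-hive-2.3.0-bin/conf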
Step 7: Edit hive-site.xml
hduser@Soumitra-PC:/tmp$ cd $HIVE_HOME/conf
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ sudo cp hive-default.xml.template hive-site.xml
Edit the “hive-site.xml” file to configure the Hive properties.
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ sudo gedit hive-site.xml
Note: On systems where sudo gedit does not work, the file can be edited with 'vi' or 'nano' instead.
Replace everything inside hive-site.xml with the below code snippet:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/usr/local/apache-hive-2.3.0-bin/metastore_db;create=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide a database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for a postgres database.
    </description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value/>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.PersistenceManagerFactoryClass</name>
    <value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
    <description>class implementing the jdo persistence</description>
  </property>
</configuration>
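Optionally, check that the edited file is well-formed XML before moving on (assumes xmllint is available; on Ubuntu it is provided by the libxml2-utils package):
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ xmllint --noout hive-site.xml
No output means the file parsed cleanly.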
Create a file named jpox.properties inside $HIVE_HOME/conf and add the following lines into it:
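As with the other configuration files, it can be created and edited with gedit (or 'vi'/'nano'):
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ sudo gedit jpox.properties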
javax.jdo.PersistenceManagerFactoryClass = org.jpox.PersistenceManagerFactoryImpl
org.jpox.validateTables = false
org.jpox.validateColumns = false
org.jpox.validateConstraints = false
org.jpox.storeManagerType = rdbms
org.jpox.autoCreateSchema = true
org.jpox.autoStartMechanismMode = checked
org.jpox.transactionIsolation = read_committed
javax.jdo.option.DetachAllOnCommit = true
javax.jdo.option.NontransactionalRead = true
javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL = jdbc:derby://localhost:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName = APP
javax.jdo.option.ConnectionPassword = mine
Finally, set ownership of the Hive folder to hduser:
hduser@Soumitra-PC:/usr/local$ sudo chown -R hduser:hadoop apache-hive-2.3.0-bin
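To confirm the ownership change:
hduser@Soumitra-PC:/usr/local$ ls -ld apache-hive-2.3.0-bin
The listing should show hduser and hadoop as the owner and group of the directory.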
Step 8: Download and Install Apache Derby
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ cd /tmp
hduser@Soumitra-PC:/tmp$ wget http://archive.apache.org/dist/db/derby/db-derby-10.13.1.1/db-derby-10.13.1.1-bin.tar.gz
hduser@Soumitra-PC:/tmp$ sudo tar xvzf db-derby-10.13.1.1-bin.tar.gz -C /usr/local
Let's set up the Derby environment by appending the following lines to the ~/.bashrc file:
#DERBY Variables Start
export DERBY_HOME=/usr/local/db-derby-10.13.1.1-bin
export PATH=$PATH:$DERBY_HOME/bin
export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar
#DERBY Variables End
Reload ~/.bashrc so that $DERBY_HOME is visible in the current shell, then create a directory named data to store Metastore data.
hduser@Soumitra-PC:/tmp$ source ~/.bashrc
hduser@Soumitra-PC:/tmp$ sudo mkdir $DERBY_HOME/data
Derby installation and environment setup are now complete.
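To confirm that Derby is reachable from the PATH, the sysinfo utility that ships with Derby can be run; it prints the Java and Derby version details:
hduser@Soumitra-PC:/tmp$ sysinfo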
Step 9: Initialize the Derby database. By default, Hive uses Derby as its metastore database.
hduser@Soumitra-PC:/usr/local$ cd $HIVE_HOME/bin
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/bin$ schematool -dbType derby -initSchema
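If initialization succeeds, the output ends with a 'schemaTool completed' message. The metastore schema version can also be inspected afterwards:
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/bin$ schematool -dbType derby -info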
Step 10: Launch Hive.
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin$ hive
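Once the hive> prompt appears, a quick smoke test confirms that the metastore and warehouse directory are working (the table name 'demo' below is just an example):
hive> SHOW DATABASES;
hive> CREATE TABLE demo (id INT, name STRING);
hive> SHOW TABLES;
hive> DROP TABLE demo;
hive> quit;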
Document prepared by Mr. Soumitra Ghosh
Assistant Professor, Information Technology,
C.V.Raman College of Engineering, Bhubaneswar
Contact: soumitraghosh@cvrce.edu.in