Monday, 18 September 2017

Installation of Apache Pig 0.17.0 on Ubuntu 16.04.3

Step 1:  Download Pig tar file.

hduser@Soumitra-PC:~$ wget http://www-us.apache.org/dist/pig/pig-0.17.0/pig-0.17.0.tar.gz

Step 2: Extract the tar file using the tar command.
In the tar command below, x means extract the archive, z means filter the archive through gzip, and f specifies the archive file name.
hduser@Soumitra-PC:~$ tar -xzf pig-0.17.0.tar.gz 
hduser@Soumitra-PC:~$ ls

Step 3: Move the extracted directory to /usr/local/.

hduser@Soumitra-PC:~$ sudo mv /home/hduser/pig-0.17.0 /usr/local


Step 4: Edit the “~/.bashrc” file to update the environment variables of Apache Pig.
hduser@Soumitra-PC:~$ sudo gedit ~/.bashrc
Add the following at the end of the file:
#PIG VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PIG_HOME=/usr/local/pig-0.17.0
export PATH=$PATH:$PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_CONF_DIR
export HADOOP_USER_CLASSPATH_FIRST=true
#PIG VARIABLES END
Also make sure that the Hadoop variables (such as HADOOP_HOME and HADOOP_CONF_DIR) are already set in ~/.bashrc.

Run the source command so that the changes made to ~/.bashrc take effect in the current shell.
hduser@Soumitra-PC:~$ source ~/.bashrc
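To confirm that the new variables are visible in the current shell, their values can be checked, for example:
hduser@Soumitra-PC:~$ echo $PIG_HOME
hduser@Soumitra-PC:~$ which pig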

Step 5: Check the Pig version.
hduser@Soumitra-PC:~$ pig -version

Step 6: Run Pig. 
The Grunt shell can be started using the following command:
hduser@Soumitra-PC:~$ pig
The Grunt shell is an interactive shell used to run Pig Latin statements and scripts.

Apache Pig can run in two execution modes; by default it starts in MapReduce mode. MapReduce mode can also be selected explicitly with the following command:
hduser@Soumitra-PC:~$ pig -x mapreduce

The other mode, local mode, can be started as:
hduser@Soumitra-PC:~$ pig -x local
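As a quick sanity check, a small Pig Latin job can be run from the Grunt shell in local mode. The session below is only a sketch: it assumes a hypothetical comma-separated file /home/hduser/sample.txt (with lines such as 1,apple and 2,banana); LOAD with PigStorage(','), FOREACH ... GENERATE and DUMP are standard Pig Latin operators.
hduser@Soumitra-PC:~$ pig -x local
grunt> lines = LOAD '/home/hduser/sample.txt' USING PigStorage(',') AS (id:int, name:chararray);
grunt> names = FOREACH lines GENERATE name;
grunt> DUMP names;
grunt> quit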


References
https://www.edureka.co/blog/apache-pig-installation

Document prepared by Mr. Soumitra Ghosh

Assistant Professor, Information Technology,
C.V.Raman College of Engineering, Bhubaneswar
Contact: soumitraghosh@cvrce.edu.in

Sunday, 17 September 2017

Running a Hive Script

Step 1: Create an input file containing the records that need to be inserted into the table.

hduser@Soumitra-PC:~$ sudo gedit input.txt


Edit the file and add a few comma-separated records, one per line, matching the (id, name, sal) columns of the table created in the next step.
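For example, the file might contain records such as the following (purely illustrative values):
1,John,45000
2,Asha,52000
3,Ravi,38000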


Step 2: Write the Hive script and save the file with a .sql extension. In this script, a table will be created and described, and data will be loaded into it and retrieved from it.
hduser@Soumitra-PC:~$ sudo gedit hivescript.sql

The above command opens the file in which the Hive commands to be executed will be written.
Now, write the following lines inside the hivescript.sql file:
create table employees ( id INT, name STRING, sal DOUBLE ) row format delimited fields terminated by ',';
describe employees;
load data local inpath '/home/hduser/input.txt' into table employees;
select * from employees;
The significance of each line is explained below:
Line 1 -> Creating the Table in Hive:
Command: create table employees ( id INT, name STRING, sal DOUBLE ) row format delimited fields terminated by ',';
Here, employees is the table name and {id, name, sal} are its columns. The clause fields terminated by ‘,’ indicates that the columns in the input file are separated by the symbol ‘,’. By default, the records in the input file are separated by newlines.
Line 2 -> Describing the Table:
Command: describe employees;

Line 3 -> Loading the Data into the Table.

Command : load data local inpath '/home/hduser/input.txt' into table employees;

This loads the data from the input.txt file created earlier into the employees table.
Line 4 -> Retrieving the Data:
Command : select * from employees;
The above command retrieves the values of all the columns present in the table.
The complete script should contain exactly the four lines shown above.
Now we are done with writing the Hive script. The file hivescript.sql can be saved.

Step 3: Running the Hive Script
The following is the command to run the Hive script:
hduser@Soumitra-PC:~$ hive -f /home/hduser/hivescript.sql
While executing the script, make sure to give the complete path to the location of the script file.

If everything is set up correctly, all the commands execute successfully. This is how Hive scripts are run in Hadoop.
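As an aside, for a quick one-off statement the Hive CLI also accepts an inline query with the -e option instead of a script file, for example:
hduser@Soumitra-PC:~$ hive -e 'select * from employees;'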

References
https://www.edureka.co/blog/how-to-run-hive-scripts/


Document prepared by 
Mr. Soumitra Ghosh

Assistant Professor, Information Technology,
C.V.Raman College of Engineering, Bhubaneswar
Contact: soumitraghosh@cvrce.edu.in

Installation of Hive 2.3.0 on Ubuntu 16.04.3

Step-by-step tutorial to install Hive on Ubuntu (with detailed explanations)

[Note: Here, Hive 2.3.0 is being installed on Ubuntu 16.04.3, but this document can be referred to for installation of any version of Hive on any version of Ubuntu (14.04 or above).]

Prerequisites: 
i) Log in as hduser by running the command $ su hduser.
ii) Start all Hadoop components, and verify with jps that all of them are running.

Please follow the below steps to install Apache Hive on Ubuntu:
Step 1:  Download Hive tar.

hduser@Soumitra-PC:/tmp$ wget http://archive.apache.org/dist/hive/hive-2.3.0/apache-hive-2.3.0-bin.tar.gz

Step 2:  Extract the tar file.

hduser@Soumitra-PC:/tmp$ sudo tar -xzf apache-hive-2.3.0-bin.tar.gz -C /usr/local

Step 3: Edit the “~/.bashrc” file.


hduser@Soumitra-PC:/tmp$  sudo gedit ~/.bashrc
Note: On systems where sudo gedit does not work, the file can also be edited with the 'vi' or 'nano' command.
Add the following at the end of the file:
#Hive Variables Start
export HIVE_HOME=/usr/local/apache-hive-2.3.0-bin
export HIVE_CONF_DIR=/usr/local/apache-hive-2.3.0-bin/conf
export PATH=$HIVE_HOME/bin:$PATH
export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:.
export CLASSPATH=$CLASSPATH:/usr/local/apache-hive-2.3.0-bin/lib/*:.
#Hive Variables End
Also, make sure that the Hadoop path is set.
Run the command below to make the changes take effect in the same terminal.

hduser@laptop:/tmp$ source ~/.bashrc

Step 4: Check the Hive version.
hduser@laptop:/tmp$ hive --version

Step 5:  Create Hive directories within HDFS
The ‘warehouse’ directory is the location where Hive stores its tables and related data.

hduser@Soumitra-PC:~$ hdfs dfs -mkdir -p /user/hive/warehouse
hduser@Soumitra-PC:~$ hdfs dfs -mkdir -p /tmp
Give the group write permission on these directories.
hduser@Soumitra-PC:~$ hdfs dfs -chmod g+w /tmp
hduser@Soumitra-PC:~$ hdfs dfs -chmod g+w /user/hive/warehouse
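The directories and their permissions can be verified with a listing, for example:
hduser@Soumitra-PC:~$ hdfs dfs -ls /user/hive
hduser@Soumitra-PC:~$ hdfs dfs -ls /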

Step 6:  Configuring Hive
To configure Hive with Hadoop, we need to edit the hive-env.sh file, which is placed in the $HIVE_HOME/conf directory. The following commands change to the Hive conf folder and copy the template file:
hduser@Soumitra-PC:/tmp$ cd $HIVE_HOME/conf
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ sudo cp hive-env.sh.template hive-env.sh

Edit the “hive-env.sh” file to set the required environment variables.

hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ sudo gedit hive-env.sh
Note: On systems where sudo gedit does not work, the file can also be edited with the 'vi' or 'nano' command.
Set the HADOOP_HOME parameter as shown below:
export HADOOP_HOME=/usr/local/hadoop

Step 7: Edit hive-site.xml
hduser@Soumitra-PC:/tmp$ cd $HIVE_HOME/conf
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ sudo cp hive-default.xml.template hive-site.xml


Edit the “hive-site.xml” file to set the required configuration properties.

hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ sudo gedit hive-site.xml
Note: On systems where sudo gedit does not work, the file can also be edited with the 'vi' or 'nano' command.
Replace everything inside hive-site.xml with the below code snippet:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/home/hduser/apache-hive-2.3.0-bin/metastore_db;create=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value/>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.PersistenceManagerFactoryClass</name>
    <value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
    <description>class implementing the jdo persistence</description>
  </property>
</configuration>

Create a file named jpox.properties inside $HIVE_HOME/conf and add the following lines into it:
javax.jdo.PersistenceManagerFactoryClass = org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema = false
org.jpox.validateTables = false
org.jpox.validateColumns = false
org.jpox.validateConstraints = false
org.jpox.storeManagerType = rdbms
org.jpox.autoCreateSchema = true
org.jpox.autoStartMechanismMode = checked
org.jpox.transactionIsolation = read_committed
javax.jdo.option.DetachAllOnCommit = true
javax.jdo.option.NontransactionalRead = true
javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL = jdbc:derby://hadoop1:1527/metastore_db;create = true
javax.jdo.option.ConnectionUserName = APP
javax.jdo.option.ConnectionPassword = mine
We also need to set the ownership of the Hive folder:
hduser@Soumitra-PC:/usr/local$ sudo chown -R hduser:hadoop apache-hive-2.3.0-bin
Step 8: Downloading and Installing Apache Derby
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/conf$ cd /tmp
hduser@Soumitra-PC:/tmp$ wget http://archive.apache.org/dist/db/derby/db-derby-10.13.1.1/db-derby-10.13.1.1-bin.tar.gz

hduser@Soumitra-PC:/tmp$ sudo tar xvzf db-derby-10.13.1.1-bin.tar.gz -C /usr/local

Let's set up the Derby environment by appending the following lines to the ~/.bashrc file:

#DERBY Variables Start

export DERBY_HOME=/usr/local/db-derby-10.13.1.1-bin
export PATH=$PATH:$DERBY_HOME/bin
export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar
#DERBY Variables End

Run source ~/.bashrc again so that DERBY_HOME is available in the current shell, and then create a directory named data to store the Metastore data.


hduser@Soumitra-PC:/tmp$ sudo mkdir $DERBY_HOME/data




This completes the Derby installation and environment setup.
Step 9: Initialize the Derby metastore database. By default, Hive uses Derby as its metastore database.
hduser@Soumitra-PC:/usr/local$ cd $HIVE_HOME/bin
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/bin$ schematool -dbType derby -initSchema
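If the initialization succeeds, the metastore schema version can be verified with the same tool, for example:
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin/bin$ schematool -dbType derby -info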

Step 10: Launch Hive.
hduser@Soumitra-PC:/usr/local/apache-hive-2.3.0-bin$ hive
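Once the hive> prompt appears, a few simple statements can serve as a smoke test, for example:
hive> show databases;
hive> create table test_hive (id int, name string);
hive> show tables;
hive> drop table test_hive;
hive> exit;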



References

1. http://www.bogotobogo.com/Hadoop/BigData_hadoop_Hive_Install_On_Ubuntu_16_04.php
2. https://www.edureka.co/blog/apache-hive-installation-on-ubuntu



Document prepared by 
Mr. Soumitra Ghosh

Assistant Professor, Information Technology,
C.V.Raman College of Engineering, Bhubaneswar
Contact: soumitraghosh@cvrce.edu.in