Friday 8 September 2017

Frequently used Hadoop / HDFS Shell Commands


Here we will discuss some frequently used Hadoop Distributed File System (HDFS) shell commands that help us manage files on HDFS.

Most of the commands here resemble the corresponding Unix commands in both syntax and functionality. We only need to prefix the Unix command with "hadoop fs -", e.g. "hadoop fs -<UNIX-Command>".

Error information is sent to stderr and the output is sent to stdout. So, let's get started.

The 'fs' in 'hadoop fs' refers to a generic file system client that can point to any file system Hadoop supports, such as the local file system or HDFS, so its source and destination can be on either one. 'hdfs dfs', in contrast, is specific to HDFS. The examples below use 'hdfs dfs'.




Before executing any Hadoop command, we need to start Hadoop on our machine while logged in as the dedicated user created during the Hadoop installation.

In my case, the dedicated user is 'hduser'. So, we need to run 'su hduser', go to the directory where Hadoop is installed (in my case /usr/local/hadoop), and from the 'sbin' folder execute 'start-all.sh' to start all the necessary Hadoop components. Then run 'jps' to confirm that all the components are running.
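
The jps check above can be partly automated. The sketch below reports which daemons are absent from the jps output. The daemon list (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager) is an assumption based on a typical single-node Hadoop 2.x setup, and 'missing_daemons' is a hypothetical helper name, so adjust both to your installation.

```shell
# Hypothetical helper: reads `jps` output on stdin and prints every expected
# daemon that is not listed. The daemon names are an assumption for a typical
# single-node Hadoop 2.x setup; edit the list to match your installation.
missing_daemons() {
    jps_output=$(cat)
    expected="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"
    for name in $expected; do
        # jps prints "<pid> <ClassName>"; compare the second field exactly so
        # that "NameNode" does not also match "SecondaryNameNode".
        if ! printf '%s\n' "$jps_output" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            echo "$name"
        fi
    done
}

# Usage on a machine where Hadoop has been started:
#   jps | missing_daemons
```

An empty result means every daemon in the list was found.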




1) version Command : Prints the version of Hadoop that is installed.

hduser@Soumitra-PC:/sbin$ hadoop version

2) mkdir and ls Command : 'mkdir' creates one or more directories in HDFS, and 'ls' lists the contents of a directory.

#Creating multiple directories at the same time
hduser@Soumitra-PC:/sbin$ hdfs dfs -mkdir /hadoop /soumitra
hduser@Soumitra-PC:/sbin$ hdfs dfs -ls /

#Creating sub-directory(sub_folder) inside another directory(soumitra)

hduser@Soumitra-PC:/sbin$ hdfs dfs -mkdir /soumitra/sub_folder
hduser@Soumitra-PC:/sbin$ hdfs dfs -ls /soumitra

3) put and copyFromLocal Command : 

put : Copies a single source, or multiple sources, from the local file system to the destination file system.

hduser@Soumitra-PC:/sbin$ hdfs dfs -put /home/soumitra/simple.java /
hduser@Soumitra-PC:/sbin$ hdfs dfs -ls /

copyFromLocal : Like the 'put' command, it copies file(s) from the local file system to HDFS, except that the source must be a local file.

hduser@Soumitra-PC:/sbin$ hdfs dfs -copyFromLocal /home/soumitra/simple.java /hadoop
hduser@Soumitra-PC:/sbin$ hdfs dfs -ls /hadoop

4) get and copyToLocal Command : 

get : Copies one or more files from the Hadoop file system to the local file system.

hduser@Soumitra-PC:/sbin$ hdfs dfs -get /file1 /home/hduser
hduser@Soumitra-PC:/sbin$ ls /home/hduser

copyToLocal : Like the 'get' command, it copies file(s) from HDFS to the local file system.

hduser@Soumitra-PC:/sbin$ hdfs dfs -copyToLocal /file1 /home/hduser
hduser@Soumitra-PC:/sbin$ ls /home/hduser



5) df Command : Displays the capacity, used space and free space of the file system at the given HDFS destination.

hduser@Soumitra-PC:/sbin$ hdfs dfs -df hdfs:/

6) count Command : Counts the number of directories, files and bytes under the paths that match the specified file pattern.

hduser@Soumitra-PC:/sbin$ hdfs dfs -count hdfs:/
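
Run without options, count prints four unlabeled columns: DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME. As a small sketch, the hypothetical helper 'label_count' below names each column so the output is easier to read. It assumes paths without spaces.

```shell
# Hypothetical helper: labels the four columns that `hdfs dfs -count` prints
# when run without options (DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME).
# Assumes the path contains no whitespace.
label_count() {
    awk 'NF >= 4 { printf "dirs=%s files=%s bytes=%s path=%s\n", $1, $2, $3, $4 }'
}

# Usage on a machine where Hadoop has been started:
#   hdfs dfs -count / | label_count
```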

7) fsck Command : HDFS Command to check the health of the Hadoop file system.

hduser@Soumitra-PC:/sbin$ hdfs fsck /
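
The fsck report ends with a status line for the checked path. The sketch below turns that into an exit code that a script can test. It assumes the status line contains the text "is HEALTHY" for a healthy file system, and 'fsck_is_healthy' is a hypothetical helper name.

```shell
# Hypothetical helper: reads an fsck report on stdin and succeeds only if it
# contains the status text "is HEALTHY". The exact wording is an assumption
# about the report format; check it against your Hadoop version.
fsck_is_healthy() {
    grep -q "is HEALTHY"
}

# Usage on a machine where Hadoop has been started:
#   hdfs fsck / | fsck_is_healthy && echo "HDFS looks healthy"
```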


8) balancer Command : Run a cluster balancing utility.

hduser@Soumitra-PC:/sbin$ hdfs balancer



9) du Command : Displays the sizes of the files and directories contained in the given directory, or the size of the file if the path is just a file.

hduser@Soumitra-PC:/sbin$ hdfs dfs -du /
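
Since the size is the first column of the du output, the listing can be piped through standard Unix tools. A minimal sketch, with 'largest_first' as a hypothetical helper name:

```shell
# Hypothetical helper: sorts `hdfs dfs -du` output by its first column (the
# size in bytes), largest entry first.
largest_first() {
    sort -k1,1nr
}

# Usage on a machine where Hadoop has been started:
#   hdfs dfs -du / | largest_first | head -n 5
```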


10) rm Command : HDFS Command to remove the file from HDFS.

hduser@Soumitra-PC:/sbin$ hdfs dfs -rm /hadoop/simple.java


11) rm -r Command : HDFS Command to remove the entire directory and all of its content from HDFS.

hduser@Soumitra-PC:/sbin$ hdfs dfs -rm -r /soumitra




12) expunge Command : HDFS Command that empties the trash.


hduser@Soumitra-PC:/sbin$ hdfs dfs -expunge


13) touchz Command : HDFS Command to create a file in HDFS with file size 0 bytes.

hduser@Soumitra-PC:/sbin$ hdfs dfs -touchz /empty_file
hduser@Soumitra-PC:/sbin$ hdfs dfs -ls /


14) chmod Command : Change the permissions of files.

hduser@Soumitra-PC:/sbin$ hdfs dfs -chmod 777 /empty_file
hduser@Soumitra-PC:/sbin$ hdfs dfs -ls /
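
The mode 777 used above is octal: one digit each for owner, group and others, where 4 is read, 2 is write and 1 is execute. The sketch below converts such a mode into the rwx string that 'ls' shows after the file type character. 'octal_to_symbolic' is a hypothetical helper name and only handles plain three-digit modes.

```shell
# Hypothetical helper: converts a three-digit octal mode (e.g. 777) into the
# nine-character rwx string shown by `ls` after the file type character.
octal_to_symbolic() {
    mode=$1
    case $mode in
        [0-7][0-7][0-7]) ;;
        *) echo "expected three octal digits, got: $mode" >&2; return 1 ;;
    esac
    out=""
    # Split the mode into single digits: owner, group, others.
    for digit in $(printf '%s' "$mode" | sed 's/./& /g'); do
        case $digit in
            0) out="${out}---" ;;
            1) out="${out}--x" ;;
            2) out="${out}-w-" ;;
            3) out="${out}-wx" ;;
            4) out="${out}r--" ;;
            5) out="${out}r-x" ;;
            6) out="${out}rw-" ;;
            7) out="${out}rwx" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Usage:
#   octal_to_symbolic 777
#   octal_to_symbolic 644
```

So 777 gives read, write and execute permission to everyone, which is convenient for a demo but usually too open for real data.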


15) cat Command : HDFS Command that prints the contents of the source file(s) to stdout.

hduser@Soumitra-PC:/sbin$ hdfs dfs -cat /simple.java


16) text Command : HDFS Command that takes a source file and outputs the file in text format.

hduser@Soumitra-PC:/sbin$ hdfs dfs -text /simple.java


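17) mv Command : HDFS Command to move files from source to destination. This command allows multiple sources as well, in which case the destination needs to be a directory. Note that /soumitra was removed in step 11, so recreate it with 'hdfs dfs -mkdir /soumitra' before running this example.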

hduser@Soumitra-PC:/sbin$ hdfs dfs -mv /simple.java /soumitra
hduser@Soumitra-PC:/sbin$ hdfs dfs -ls /soumitra


18) cp Command : HDFS Command to copy files from source to destination. This command allows multiple sources as well, in which case the destination must be a directory.

hduser@Soumitra-PC:/sbin$ hdfs dfs -cp /soumitra/simple.java /
hduser@Soumitra-PC:/sbin$ hdfs dfs -ls /


19) tail Command : Displays the last kilobyte of the file to stdout.

hduser@Soumitra-PC:/sbin$ hdfs dfs -tail /simple.java


20) chown Command : HDFS command to change the owner, and optionally the group, of files. The user running it must be the HDFS superuser.

hduser@Soumitra-PC:/sbin$ hdfs dfs -chown ubuntu:hadoop /empty_file
hduser@Soumitra-PC:/sbin$ hdfs dfs -ls /


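21) setrep Command : The default replication factor of a file is 3 (controlled by the dfs.replication property). The HDFS command below changes the replication factor of a file. The example sets it to 3, so pass a different number to see a change.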

hduser@Soumitra-PC:/sbin$ hdfs dfs -setrep 3 /simple.java


22) stat Command : Prints statistics about the file/directory at <path> in the specified format. The format accepts file size in blocks (%b), type (%F), group name of owner (%g), name (%n), block size (%o), replication (%r), user name of owner (%u), and modification date (%y, %Y). %y shows the UTC date as “yyyy-MM-dd HH:mm:ss” and %Y shows milliseconds since January 1, 1970 UTC. If the format is not specified, %y is used by default.

hduser@Soumitra-PC:/sbin$ hdfs dfs -stat "%F %u:%g %b %o %y %n" /hadoop
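
The %Y value is in milliseconds, while the Unix 'date' command works in seconds. As a rough sketch, the hypothetical helper 'epoch_ms_to_utc' below converts a %Y value into the same “yyyy-MM-dd HH:mm:ss” layout that %y uses. It assumes GNU date (as on Ubuntu) and falls back to the BSD form of the command.

```shell
# Hypothetical helper: converts milliseconds since January 1, 1970 UTC (the
# value printed by %Y) into a "yyyy-MM-dd HH:mm:ss" UTC timestamp.
epoch_ms_to_utc() {
    secs=$(( $1 / 1000 ))   # drop the millisecond part
    # GNU date accepts "@<seconds>" with -d; BSD date uses -r <seconds>.
    date -u -d "@$secs" '+%Y-%m-%d %H:%M:%S' 2>/dev/null ||
        date -u -r "$secs" '+%Y-%m-%d %H:%M:%S'
}

# Usage on a machine where Hadoop has been started:
#   epoch_ms_to_utc "$(hdfs dfs -stat "%Y" /hadoop)"
```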


23) getfacl Command : Displays the Access Control Lists (ACLs) of files and directories. If a directory has a default ACL, then getfacl also displays the default ACL.

hduser@Soumitra-PC:/sbin$ hdfs dfs -getfacl /hadoop


24) du -s Command : Displays a summary of file lengths.

hduser@Soumitra-PC:/sbin$ hdfs dfs -du -s /simple.java


25) checksum Command : Returns the checksum information of a file.

hduser@Soumitra-PC:/sbin$ hdfs dfs -checksum /simple.java



26) moveFromLocal Command : 

moveFromLocal : Moves file(s) from a single source, or multiple sources, on the local file system to HDFS. It works like 'put', except that the local source is deleted after it is copied.

hduser@Soumitra-PC:/sbin$ hdfs dfs -moveFromLocal /home/hduser/test.cc /
hduser@Soumitra-PC:/sbin$ ls /home/hduser


27) moveToLocal Command : 

moveToLocal : This command is not implemented yet. Running it only displays a "Not implemented yet" message.

hduser@Soumitra-PC:/sbin$ hdfs dfs -moveToLocal /test.cc /home/hduser






Document Created by Mr. Soumitra Ghosh

Assistant Professor, Information Technology,
C.V.Raman College of Engineering, Bhubaneswar

Contact: soumitraghosh@cvrce.edu.in
