Setting and Altering Replication and Blocksize in HDFS


1. Configuring Blocksize in HDFS

1.a. Setting the default blocksize for the entire file system

We can set the default blocksize for HDFS in the hdfs-site.xml configuration file. To do this, add the dfs.blocksize property shown below inside the <configuration> tag of hdfs-site.xml. This example sets the default to 64MB, which is 67108864 bytes:

<property>
        <name>dfs.blocksize</name>
        <value>67108864</value>
        <description>Default blocksize in HDFS</description>
</property>
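
The byte value in <value> is simply the size in megabytes multiplied by 1024 x 1024. As a minimal sketch, the shell helper below does that conversion. The name mb_to_bytes is hypothetical, chosen for illustration; it is not a Hadoop command.

```shell
# mb_to_bytes is a hypothetical helper, not part of Hadoop.
# It converts a size in MB (1 MB = 1024 * 1024 bytes) to the
# byte count used as the dfs.blocksize value.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mb_to_bytes 64     # prints 67108864
mb_to_bytes 128    # prints 134217728
```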

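We can change this <value> whenever our requirements change. The new default applies only to files written after the change; existing files keep the blocksize they were written with.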

To check the default blocksize of the Hadoop File System, we need to run the command below:

hduser@Soumitra-PC:~$ hdfs getconf -confKey dfs.blocksize
67108864


Now, if we need to change this default blocksize (to 128MB, which is 134217728 bytes), we can do it by updating the value of the dfs.blocksize property in hdfs-site.xml as shown below:

<property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
        <description>Default blocksize in HDFS</description>
</property>

We can verify the change by running the same command again:

hduser@Soumitra-PC:~$ hdfs getconf -confKey dfs.blocksize
134217728
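
Since getconf reports the value in bytes, it can help to convert it back to megabytes when reading the output. The sketch below assumes the property is configured as a plain byte count that is a whole number of megabytes, as in this post. The name bytes_to_mb is a hypothetical helper.

```shell
# bytes_to_mb is a hypothetical helper, not part of Hadoop.
# Assumes the input is a plain byte count that is a whole number of MB.
bytes_to_mb() {
    echo $(( $1 / 1024 / 1024 ))
}

bytes_to_mb 67108864     # prints 64
bytes_to_mb 134217728    # prints 128
```

On a live cluster, the two could be combined as: bytes_to_mb "$(hdfs getconf -confKey dfs.blocksize)"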



1.b. Setting the blocksize of a particular file

To check the blocksize of a particular file in HDFS, we can use the -stat command with the %o format specifier:

hduser@Soumitra-PC:~$ hdfs dfs -stat %o /file1
67108864

The blocksize of an existing file cannot be changed in place; the file has to be written again. To change the blocksize of file1 to 128MB (134217728 bytes), we delete it from HDFS and re-copy it from the local file system, passing dfs.blocksize=134217728 through the -D option in the command itself. The commands are as follows:

hduser@Soumitra-PC:~$ hdfs dfs -rm /file1
17/09/14 22:09:57 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /file1

hduser@Soumitra-PC:~$ hdfs dfs -D dfs.blocksize=134217728 -put /home/hduser/file1 /

hduser@Soumitra-PC:~$ hdfs dfs -stat %o /file1
134217728


In this way, we can set the blocksize for individual files without changing the default blocksize of the whole file system.
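
The blocksize determines how many blocks a file is split into: one block for every full blocksize of data, plus one more for any remainder. The sketch below illustrates that arithmetic only. The name block_count is a hypothetical helper, and the 300MB file is an assumed example, not a file from this setup.

```shell
# block_count is a hypothetical helper, not part of Hadoop.
# $1 = file size in bytes, $2 = blocksize in bytes.
# Integer ceiling division: full blocks plus one for any remainder.
block_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

block_count 314572800 67108864     # assumed 300MB file, 64MB blocks: prints 5
block_count 314572800 134217728    # assumed 300MB file, 128MB blocks: prints 3
block_count 43 134217728           # the 43-byte /file1: prints 1
```

A block that is only partly filled takes up only as much disk space as the data it holds, so the 43-byte file1 does not consume 128MB.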


2. Configuring Replication in HDFS

2.a. Setting the default replication for the entire file system

We can set the default replication for HDFS in the hdfs-site.xml configuration file. To do this, add or modify the dfs.replication property shown below inside the <configuration> tag of hdfs-site.xml. This example sets the default replication factor to 1:

 <property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
 </property>

We can change this <value> whenever our requirements change.

To check the replication of files in HDFS, we can run a simple ls and look at the second column of the output, which shows the replication of each file:

hduser@Soumitra-PC:~$ hdfs dfs -ls /
Found 6 items
drwxr-xr-x   - hduser supergroup          0 2017-09-12 07:26 /WordCount
-rw-r--r--   1 hduser supergroup         43 2017-09-14 22:11 /file1
-rw-r--r--   1 hduser supergroup         30 2017-09-13 13:42 /file2
drwxr-xr-x   - hduser supergroup          0 2017-09-10 19:41 /hadoop
drwxr-xr-x   - hduser supergroup          0 2017-09-09 21:56 /system
-rw-r--r--   1 hduser supergroup        209 2017-09-14 13:50 /test.cc

We can see that all the files have a replication of 1 (directories show "-" because replication does not apply to them). This is most likely because the default replication was set to 1 in hdfs-site.xml when the files were written.
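
When a directory holds many files, picking out the second column by eye gets tedious. As a rough sketch, the listing can be filtered with awk so that only files and their replication are shown. The name list_replication is a hypothetical helper, and the sketch assumes paths contain no spaces.

```shell
# list_replication is a hypothetical helper, not part of Hadoop.
# It reads `hdfs dfs -ls` output on stdin and prints
# "<replication> <path>" for each regular file.
# Assumes paths contain no spaces, since the path is taken as the last field.
list_replication() {
    # File lines start with "-" in the permissions column;
    # the "Found N items" header and directory lines ("d...") are skipped.
    awk '$1 ~ /^-/ { print $2, $NF }'
}

# A trimmed sample of the listing above
list_replication <<'EOF'
Found 3 items
drwxr-xr-x   - hduser supergroup          0 2017-09-12 07:26 /WordCount
-rw-r--r--   1 hduser supergroup         43 2017-09-14 22:11 /file1
-rw-r--r--   1 hduser supergroup        209 2017-09-14 13:50 /test.cc
EOF
# prints:
# 1 /file1
# 1 /test.cc
```

On a live cluster the helper would be used as: hdfs dfs -ls / | list_replication. For a single file, hdfs dfs -stat %r /file1 prints the replication directly.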


Now, if we need to change this default replication (to 3), we can do it by updating the value of the dfs.replication property in hdfs-site.xml as shown below:

 <property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
 </property>

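Now, let's copy a file from the local file system to the HDFS root directory (/) and run ls to check the replication. The new file shows a replication of 3, while files written earlier keep their replication of 1: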

hduser@Soumitra-PC:~$ hdfs dfs -put /home/soumitra/test.txt /

hduser@Soumitra-PC:~$ hdfs dfs -ls /
Found 7 items
drwxr-xr-x   - hduser supergroup          0 2017-09-12 07:26 /WordCount
-rw-r--r--   1 hduser supergroup         43 2017-09-14 22:11 /file1
-rw-r--r--   1 hduser supergroup         30 2017-09-13 13:42 /file2
drwxr-xr-x   - hduser supergroup          0 2017-09-10 19:41 /hadoop
drwxr-xr-x   - hduser supergroup          0 2017-09-09 21:56 /system
-rw-r--r--   1 hduser supergroup        209 2017-09-14 13:50 /test.cc
-rw-r--r--   3 hduser supergroup         21 2017-09-14 22:45 /test.txt



2.b. Setting the replication of a particular file

If we need to change the replication of a particular file in HDFS, rather than the default replication of the whole file system, we can use the setrep command as below:

hduser@Soumitra-PC:~$ hdfs dfs -setrep 2 /test.txt
Replication 2 set: /test.txt

hduser@Soumitra-PC:~$ hdfs dfs -ls /
Found 7 items
drwxr-xr-x   - hduser supergroup          0 2017-09-12 07:26 /WordCount
-rw-r--r--   1 hduser supergroup         43 2017-09-14 22:11 /file1
-rw-r--r--   1 hduser supergroup         30 2017-09-13 13:42 /file2
drwxr-xr-x   - hduser supergroup          0 2017-09-10 19:41 /hadoop
drwxr-xr-x   - hduser supergroup          0 2017-09-09 21:56 /system
-rw-r--r--   1 hduser supergroup        209 2017-09-14 13:50 /test.cc
-rw-r--r--   2 hduser supergroup         21 2017-09-14 22:45 /test.txt

Here, we have changed the replication of the test.txt file from 3 to 2 using the setrep command and displayed the result using ls. The change is reflected in the second column of the output.
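
Lowering the replication also lowers the raw disk space the file occupies across the cluster, since each replica stores a full copy of the data. The sketch below shows that arithmetic for the 21-byte test.txt from the listing. The name raw_bytes is a hypothetical helper; it ignores metadata and checksum overhead and assumes the cluster has enough DataNodes to hold every replica.

```shell
# raw_bytes is a hypothetical helper, not part of Hadoop.
# $1 = file size in bytes, $2 = replication factor.
# Ignores metadata and checksum overhead.
raw_bytes() {
    echo $(( $1 * $2 ))
}

raw_bytes 21 3    # test.txt at replication 3: prints 63
raw_bytes 21 2    # test.txt at replication 2: prints 42
```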






Document prepared by Mr. Soumitra Ghosh

Assistant Professor, Information Technology,
C.V.Raman College of Engineering, Bhubaneswar
Contact: soumitraghosh@cvrce.edu.in
