Saturday, June 7, 2014

Installing and Running Apache Pig on Hadoop 2.x versions

Pig is a high-level platform for creating MapReduce programs used with Hadoop. The language for this platform is called Pig Latin. Pig Latin abstracts the Java MapReduce idiom into a higher-level notation, much as SQL does for relational databases. Pig Latin can be extended with UDFs (user-defined functions), which you can write in Java, Python, JavaScript, Ruby or Groovy and then call directly from the language.
Installing Pig is simple. Here is what you need to do:
  1. Download the desired Pig distribution from one of the Apache mirrors. It is best to choose the latest stable version of Pig. Download the file with a name like pig-0.12.1.tar.gz, where "0.12.1" is the version number.
  2. Extract Pig to a desired directory.
    1. A simple way is to copy the tar.gz file to the directory where you want your installation to reside.
    2. Now, run the following command in a bash shell from that directory:
      tar -xzf pig-0.12.1.tar.gz
    3. You will have the installation ready.
  3. Edit the PATH
    1. To access the Pig installation easily and run your scripts from anywhere, add Pig's bin directory to your PATH.
    2. To do this, open the file /home/user/.bashrc in your favorite editor and add the following line at the end of the file, replacing the placeholder path and version with your own:
      export PATH=/<my path to pig>/pig-n.n.n/bin:$PATH
    3. Open a new terminal or run source ~/.bashrc so the change takes effect. After this, your Pig installation is ready for further configuration.
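Steps 2 and 3 can also be scripted. The following is a minimal bash sketch, not an official installer. The function names extract_pig and add_pig_to_path are made up for illustration, and the sketch assumes the tarball unpacks into a directory named after the file (pig-0.12.1.tar.gz into pig-0.12.1).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for steps 2 and 3; names and arguments are illustrative.

# Extract the tarball into install_root and print the resulting Pig home.
# Assumes the archive contains a top-level directory named like the tarball.
extract_pig() {
  local tarball="$1" install_root="$2"
  local name
  name="$(basename "$tarball" .tar.gz)"   # e.g. pig-0.12.1
  mkdir -p "$install_root"
  tar -xzf "$tarball" -C "$install_root"
  printf '%s\n' "$install_root/$name"
}

# Append the PATH export to rc_file, unless the exact line is already present.
add_pig_to_path() {
  local pig_home="$1" rc_file="$2"
  local line="export PATH=$pig_home/bin:\$PATH"
  touch "$rc_file"
  grep -qxF "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
}

# Example usage (paths are placeholders):
#   pig_home="$(extract_pig pig-0.12.1.tar.gz "$HOME")"
#   add_pig_to_path "$pig_home" "$HOME/.bashrc"
```

Checking for the line before appending keeps .bashrc from collecting duplicate entries if the script is run more than once.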
You might get the following error when running Pig scripts with Apache Hadoop 2.x. Here is the key line of the error:
Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

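Now, that's a problem. It happens because the prebuilt Pig jar in releases such as 0.12.1 is compiled against Hadoop 1.x, where JobContext is a class, while in Hadoop 2.x it is an interface. Don't worry, solving it is simple: rebuild Pig against Hadoop 2.x. This is what you will have to do:
  • cd to your Pig installation directory. Yes, inside the Pig directory.
  • Run this command (it needs Apache Ant and a JDK installed):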
    ant clean jar-withouthadoop -Dhadoopversion=23

After that, try running your Pig script again. The error should no longer appear.
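If you hit this often, the check and the rebuild can be wrapped in two small bash functions. This is only a sketch: has_jobcontext_error and rebuild_pig_for_hadoop2 are hypothetical names, the log file path is whatever file you captured Pig's output in, and the rebuild assumes ant is on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and arguments are illustrative.

# Succeeds (exit 0) if the given log file contains the Hadoop 2.x mismatch error.
has_jobcontext_error() {
  local log_file="$1"
  grep -qF 'Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected' "$log_file"
}

# Rebuild Pig against Hadoop 2.x from inside the Pig installation directory.
# The subshell keeps the caller's working directory unchanged.
rebuild_pig_for_hadoop2() {
  local pig_home="$1"
  ( cd "$pig_home" && ant clean jar-withouthadoop -Dhadoopversion=23 )
}

# Example usage (paths are placeholders):
#   if has_jobcontext_error pig_run.log; then
#     rebuild_pig_for_hadoop2 "$HOME/pig-0.12.1"
#   fi
```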
Having trouble getting this to work, or have any suggestions? Just leave a comment below :)