Apache Pig Interview Questions and Answers 2022

Apache Pig is an abstraction over MapReduce. It is a high-level data flow platform for analyzing large data sets: programs are expressed as data flows and executed as MapReduce jobs on Hadoop. The language used for Pig is Pig Latin. Apache Pig is generally used with Hadoop, and most data manipulation operations in Hadoop can be performed with it.

Features of Apache Pig

Below are the features of Apache Pig:

1) Ease of programming

2) Optimization opportunities

3) Extensibility

4) Flexibility

5) In-built operators

An Apache Pig Latin program consists of a series of operations or transformations that are applied to the input data to produce output. These operations describe a data flow, which the Pig execution environment translates into an executable representation. Underneath, the transformations run as a series of MapReduce jobs that the programmer does not need to be aware of. In this way, Pig allows the programmer to focus on the data rather than on the nature of execution.
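The load, transform, store flow described above can be sketched in plain Python. This is only an analogy under stated assumptions: the function names (`load`, `filter_rows`, `project`, `store`), the sample records, and the file names are hypothetical, and the Pig Latin statements in the comments are illustrative rather than taken from the original text.

```python
import csv
import io

# Hypothetical sample input; in Pig this would be a file read by LOAD.
RAW = "alice,34\nbob,17\ncarol,52\n"

def load(text):
    # Models: people = LOAD 'people.csv' USING PigStorage(',') AS (name:chararray, age:int);
    return [(name, int(age)) for name, age in csv.reader(io.StringIO(text))]

def filter_rows(rows, predicate):
    # Models: adults = FILTER people BY age >= 18;
    return [row for row in rows if predicate(row)]

def project(rows, index):
    # Models: names = FOREACH adults GENERATE name;
    return [(row[index],) for row in rows]

def store(rows):
    # Models: STORE names INTO 'out'; here we only build tab-separated lines.
    return "\n".join("\t".join(map(str, row)) for row in rows)

people = load(RAW)
adults = filter_rows(people, lambda row: row[1] >= 18)
names = project(adults, 0)
output = store(names)
print(output)
```

Each step takes a relation and returns a new one, which mirrors how a Pig Latin script names the result of every statement and feeds it to the next.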

Differences between Apache MapReduce and Apache Pig

Apache MapReduce | Apache Pig
It is a low-level data processing tool. | It is a high-level data flow tool.
Here, it is required to develop complex programs using Java or Python. | It is not required to develop complex programs.
It is difficult to perform data operations in MapReduce. | It provides built-in operators to perform data operations like union, sorting and ordering.
It doesn’t allow nested data types. | It provides nested data types like tuple, bag, and map.

Apache Pig questions

Q.1 You can run Pig in batch mode using __ .

1)Pig shell command

2)Pig Latin statements

3)Pig scripts

4)All of the options

Correct Answer :Pig scripts

Q.2 Which of the following is correct about Pig?

1)Pig may generate a different number of Hadoop jobs given a particular script, dependent on the amount/type of data that is being processed

2)Pig replaces the MapReduce core with its own execution engine

3)When doing a default join, Pig will detect which join-type is probably the most efficient

4)Pig always generates the same number of Hadoop jobs given a particular script, independent of the amount/type of data that is being processed

Correct Answer :Pig may generate a different number of Hadoop jobs given a particular script, dependent on the amount/type of data that is being processed

Q.3 Pig Latin statements are generally organized in __.

1)A series of “transformation” statements to process the data

2)A DUMP statement to view results or a STORE statement to save the results

3)A LOAD statement to read data from the file system

4)All of the options

Correct Answer :All of the options

Q.4 Which of the following is false about Pig operators?

1)To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation

2)The DISPLAY operator will display the results to your terminal screen

3)To run Pig in local mode, you need access to a single machine

4)All of the options

Correct Answer :The DISPLAY operator will display the results to your terminal screen

Q.5 Command to run pig in local mode?

1)pig -x local

2)pig -x tez_local

3)pig

4)None of the options

Correct Answer :pig -x local

Q.6 Which of the following is correct?

1)Pig is an execution engine that utilizes the MapReduce core in Hadoop

2)Pig is an execution engine that compiles Pig Latin scripts into HDFS

3)Pig is an execution engine that replaces the MapReduce core in Hadoop

4)Pig is an execution engine that compiles Pig Latin scripts into database queries

Correct Answer :Pig is an execution engine that utilizes the MapReduce core in Hadoop

Q.7 Pig Latin statements are generally organized in __.

1)A LOAD statement to read data from the file system

2)A series of “transformation” statements to process the data

3)A DUMP statement to view results or a STORE statement to save the results

4)All of the options

Correct Answer :All of the options

Q.8 Interactive mode of Pig is __

1)grunt

2)FS

3)HDFS

4)None of the options

Correct Answer :grunt

Q.9 In which mode does PigUnit work by default?

1)tez

2)mapreduce

3)local

4)None of the options

Correct Answer :local

Q.10 Which of the following is an entry in jobconf ?

1)pig.feature

2)pig.job

3)pig.input.dirs

4)None of the options

Correct Answer :pig.input.dirs

Q.11 Which of the following helps to test Pig scripts?

1)PigUnitX

2)PigXUnit

3)PigUnit

4)None of the options

Correct Answer :PigUnit

Q.12 You are asked to find the unique names in the file. Which operator will you choose?

1)filter, distinct

2)filter

3)foreach, distinct

4)foreach

Correct Answer :foreach, distinct
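A minimal sketch of why both operators are needed, written in plain Python as an analogy. DISTINCT compares whole tuples, so the name field has to be projected first with FOREACH. The helper names and the sample records are hypothetical, and the Pig Latin in the comments is illustrative.

```python
# Hypothetical (name, city) records; in Pig these would come from LOAD.
records = [("alice", "Pune"), ("bob", "Delhi"), ("alice", "Goa"), ("bob", "Delhi")]

def foreach_generate(rows, index):
    # Models: names = FOREACH records GENERATE name;
    return [(row[index],) for row in rows]

def distinct(rows):
    # Models: uniq = DISTINCT names;
    # Duplicates are detected on the whole tuple. The result is sorted here
    # only to make the output deterministic.
    return sorted(set(rows))

names = foreach_generate(records, 0)
unique_names = distinct(names)
print(unique_names)

# Without the projection, only exact duplicate rows collapse.
unique_rows = distinct(records)
print(len(unique_rows))
```

Applying `distinct` to the unprojected records still leaves both "alice" rows, because their city fields differ.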

Q.13 Which of the following is used to deal with metadata?

1)LoadCaster

2)LoadPushDown

3)LoadMetadata

4)All of the options

Correct Answer :LoadMetadata

Q.14 Which function is used to return HDFS files to ship to the distributed cache?

1)getShipFiles()

2)setUdfContextSignature()

3)relativeToAbsolutePath()

4)getCacheFiles()

Correct Answer :getCacheFiles()

Q.15 TOP() is used to find the top N tuples in the group.

1)True

2)False

Correct Answer :True
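The behaviour of TOP can be approximated in plain Python as follows. This is a sketch under assumptions: the sample `scores` data and the helper names are hypothetical, and the argument order `top(n, column, bag)` mirrors Pig's TOP(topN, column, relation) with a zero-based column index.

```python
import heapq
from collections import defaultdict

# Hypothetical (student, score) records.
scores = [("alice", 70), ("alice", 95), ("alice", 82), ("bob", 60), ("bob", 88)]

def group_by(rows, key_index):
    # Models: grouped = GROUP scores BY student;
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return dict(groups)

def top(n, column, bag):
    # Models: TOP(n, column, bag), keeping the n tuples with the largest
    # value in the given column.
    return heapq.nlargest(n, bag, key=lambda row: row[column])

top2 = {student: top(2, 1, bag) for student, bag in group_by(scores, 0).items()}
print(top2)
```

`heapq.nlargest` returns the tuples in descending order; the ordering inside the bag that Pig returns should not be relied on in the same way.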

Q.16 What is Piggybank?

1)It’s a framework

2)It’s a platform

3)It’s a repository

4)None of the options

Correct Answer :It’s a repository

Q.17 PigLatin is __ while SQL is declarative.

1)procedural

2)functional

3)declarative

4)None of the options

Correct Answer :procedural

Q.18 Pig uses:-

1)Lazy evaluation

2)pipeline splits

3)ETL

4)All of the options

Correct Answer :All of the options
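Lazy evaluation means that Pig only builds up a plan while statements are entered and runs it when a DUMP or STORE is reached. The idea can be sketched in plain Python; the `Relation` class and its method names are hypothetical and are not part of any Pig API.

```python
class Relation:
    """Records transformation steps; nothing runs until dump() is called."""

    def __init__(self, source, steps=()):
        self.source = source
        self.steps = tuple(steps)

    def filter(self, predicate):
        # Only extends the plan, like a FILTER statement before any DUMP/STORE.
        return Relation(self.source, self.steps + (("filter", predicate),))

    def foreach(self, func):
        # Only extends the plan, like a FOREACH ... GENERATE statement.
        return Relation(self.source, self.steps + (("foreach", func),))

    def dump(self):
        # Execution happens here, analogous to DUMP triggering the job.
        rows = list(self.source())
        for kind, func in self.steps:
            if kind == "filter":
                rows = [row for row in rows if func(row)]
            else:
                rows = [func(row) for row in rows]
        return rows

load_calls = []

def source():
    load_calls.append(1)  # lets us observe when the data is actually read
    return [1, 2, 3, 4]

plan = Relation(source).filter(lambda x: x % 2 == 0).foreach(lambda x: x * 10)
reads_before_dump = len(load_calls)  # the plan exists, but no data was read
result = plan.dump()
reads_after_dump = len(load_calls)
print(reads_before_dump, result, reads_after_dump)
```

Because the whole plan is known before anything runs, an engine built this way is free to reorder or combine steps, which is where Pig's optimization opportunities come from.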

Q.19 Which of the following is not a scalar data type?

1)long

2)int

3)float

4)Map

Correct Answer :Map

Q.20 Which operator is used to view the schema of a table?

1)DUMP

2)DESCRIBE

3)STORE

4)EXPLAIN

Correct Answer :DESCRIBE

Q.21 Which of the following is true about Pig?

1)Pig works with data from many sources

2)LoadPredicatePushdown is same as LoadMetadata.setPartitionFilter

3)getOutputFormat() is called by Pig to get the InputFormat used by the loader

4)None of the options

Correct Answer :Pig works with data from many sources

Q.22 There is no connection between aggregate functions and group.

1)True

2)False

Correct Answer :False

Q.23 Which operator is used to show the values assigned to keys in Pig?

1)show

2)declare

3)DESCRIBE

4)set

Correct Answer :set

Q.24 Which command is used to run a Pig script in the Grunt shell?

1)run

2)All of the options

3)fetch

4)declare

Correct Answer :run

Q.25 Which of the following is used for debugging?

1)exec

2)execute

3)error

4)throw

Correct Answer :exec

Q.26 Which of the following is used to view the map reduce execution steps?

1)DESCRIBE

2)explain

3)declare

4)show

Correct Answer :explain

Q.27 Which of the following is not true?

1)To implement a task, the number of lines of code in Pig and Hadoop are roughly the same

2)Code written for the Pig engine is directly compiled into machine code

3)Pig makes use of Hadoop job chaining

4)None of the options

Correct Answer :Code written for the Pig engine is directly compiled into machine code

Q.28 Pig Latin statements are generally organized in __.

1)A DUMP statement to view results or a STORE statement to save the results

2)A series of “transformation” statements to process the data

3)A LOAD statement to read data from the file system

4)All of the options

Correct Answer :All of the options

Q.29 pig -x tez_local will enable ____ mode in Pig.

1)Mapreduce

2)tez

3)local

4)None of the options

Correct Answer :local (Tez local mode: Pig runs on a single machine while using the Tez runtime engine)

Q.30 Which of the following is the default mode ?

1)Mapreduce

2)Local

3)Tez

4)None of the options

Correct Answer :Mapreduce

Q.31 The data can be loaded with or without defining the schema.

1)True

2)False

Correct Answer :True

Q.32 Which of the following guarantees that Hadoop provides does Pig break?

1)All values associated with a single key are processed by the same Reducer

2)The Combiner (if defined) may run multiple times, on the Map-side as well as the Reduce-side

3)Task stragglers due to slow machines (not data skew) can be sped up through speculative execution

4)Calls to the Reducer’s reduce() method only occur after the last Mapper has finished running

Correct Answer :All values associated with a single key are processed by the same Reducer
