Question # 1
Problem Scenario 84 : In continuation of the previous question, please accomplish the following activities.
1. Select all the products which have a null product code.
2. Select all the products whose name starts with 'Pen', with the results ordered by price in descending order.
3. Select all the products whose name starts with 'Pen', with the results ordered by price in descending order and quantity in ascending order.
4. Select the top 2 products by price.
Answer: See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Select all the products which have a null product code.
val results = sqlContext.sql("""SELECT * FROM products WHERE code IS NULL""")
results.show()
Note that an equality comparison with NULL never evaluates to true, so the following variant returns no rows:
val results = sqlContext.sql("""SELECT * FROM products WHERE code = NULL""")
results.show()
Step 2 : Select all the products whose name starts with 'Pen', ordered by price descending.
val results = sqlContext.sql("""SELECT * FROM products WHERE name LIKE 'Pen %' ORDER BY price DESC""")
results.show()
Step 3 : Select all the products whose name starts with 'Pen', ordered by price descending and quantity ascending.
val results = sqlContext.sql("""SELECT * FROM products WHERE name LIKE 'Pen %' ORDER BY price DESC, quantity ASC""")
results.show()
Step 4 : Select the top 2 products by price.
val results = sqlContext.sql("""SELECT * FROM products ORDER BY price DESC LIMIT 2""")
results.show()
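The same lookups can also be written against the DataFrame API instead of SQL strings. A minimal sketch for steps 1, 2 and 4, assuming the products data is available as a DataFrame named prdDF (as in the Scenario 87 setup later in this set):
// Step 1 equivalent: rows whose product code is null
prdDF.filter(prdDF("code").isNull).show()
// Step 2 equivalent: names starting with 'Pen ', highest price first
prdDF.filter(prdDF("name").startsWith("Pen ")).orderBy(prdDF("price").desc).show()
// Step 4 equivalent: top 2 products by price
prdDF.orderBy(prdDF("price").desc).limit(2).show()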
Question # 2
Problem Scenario 74 : You have been given MySQL DB with following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of orders table : (order_id, order_date, order_customer_id, order_status)
Columns of order_items table : (order_item_id, order_item_order_id, order_item_product_id, order_item_quantity, order_item_subtotal, order_item_product_price)
Please accomplish following activities.
1. Copy "retail_db.orders" and "retail_db.order_items" tables to hdfs in respective directories p89_orders and p89_order_items.
2. Join these data using order_id in Spark and Python.
3. Now fetch selected columns from the joined data: order_id, order date and the amount collected on this order.
4. Calculate the total orders placed for each date, and produce the output sorted by date.
Answer: See the explanation for Step by Step Solution and configuration.
Explanation:
Solution:
Step 1 : Import each table on its own.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=orders --target-dir=p89_orders -m 1
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=order_items --target-dir=p89_order_items -m 1
Note : Please check that you don't have a space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from the RDBMS to hdfs.
Step 2 : Read the data from one of the partitions created by the above commands.
hadoop fs -cat p89_orders/part-m-00000
hadoop fs -cat p89_order_items/part-m-00000
Step 3 : Load the above two directories as RDDs using Spark and Python (open a pyspark terminal and do the following).
orders = sc.textFile("p89_orders")
orderItems = sc.textFile("p89_order_items")
Step 4 : Convert each RDD into key-value form, with order_id as the key and the whole line as the value.
#First field is order_id
ordersKeyValue = orders.map(lambda line: (int(line.split(",")[0]), line))
#Second field is the order_id (order_item_order_id)
orderItemsKeyValue = orderItems.map(lambda line: (int(line.split(",")[1]), line))
Step 5 : Join both RDDs on order_id.
joinedData = orderItemsKeyValue.join(ordersKeyValue)
#print the joined data
for line in joinedData.collect(): print(line)
Format of joinedData is (order_id, (all columns from order_items, all columns from orders)).
Step 6 : Now fetch the selected values: order_id, order date and the amount collected on this order. Field 4 of order_items is order_item_subtotal, so this yields one record per order item.
revenuePerOrderPerDay = joinedData.map(lambda row: (row[0], row[1][1].split(",")[1], float(row[1][0].split(",")[4])))
#print the result
for line in revenuePerOrderPerDay.collect(): print(line)
Step 7 : Select distinct (date, order_id) pairs.
distinctOrdersDate = joinedData.map(lambda row: row[1][1].split(",")[1] + "," + str(row[0])).distinct()
for line in distinctOrdersDate.collect(): print(line)
Step 8 : Similar to word count, generate a (date, 1) record for each row.
newLineTuple = distinctOrdersDate.map(lambda line: (line.split(",")[0], 1))
Step 9 : Do the count for each key (date), to get the total orders per date.
totalOrdersPerDate = newLineTuple.reduceByKey(lambda a, b: a + b)
#print results
for line in totalOrdersPerDate.collect(): print(line)
Step 10 : Sort the results by date.
sortedData = totalOrdersPerDate.sortByKey().collect()
#print results
for line in sortedData: print(line)
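For comparison, the same distinct-orders-per-date count can be written in the Scala spark shell. A minimal sketch, assuming the same p89_orders and p89_order_items directories produced by Step 1:
// load both exports and key them by order_id
val ordersS = sc.textFile("p89_orders").map(l => (l.split(",")(0).toInt, l))
val itemsS = sc.textFile("p89_order_items").map(l => (l.split(",")(1).toInt, l))
// join, keep distinct (date, order_id) pairs, then count per date
val perDate = itemsS.join(ordersS)
  .map { case (id, (item, order)) => (order.split(",")(1), id) }
  .distinct()
  .map { case (date, _) => (date, 1) }
  .reduceByKey(_ + _)
  .sortByKey()
perDate.collect().foreach(println)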
Question # 3
Problem Scenario 46 : You have been given the below list in Scala, with (name, sex, cost) for each work done.
List( ("Deeapak" , "male", 4000), ("Deepak" , "male", 2000), ("Deepika" , "female", 2000), ("Deepak" , "female", 2000), ("Deepak" , "male", 1000), ("Neeta" , "female", 2000))
Now write a Spark program to load this list as an RDD and do the sum of cost for each combination of name and sex (as the key).
Answer: See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Create an RDD out of this list
val rdd = sc.parallelize(List( ("Deeapak" , "male", 4000), ("Deepak" , "male", 2000), ("Deepika" , "female", 2000), ("Deepak" , "female", 2000), ("Deepak" , "male", 1000), ("Neeta" , "female", 2000)))
Step 2 : Convert this RDD into a pair RDD
val byKey = rdd.map({case (name,sex,cost) => (name,sex)->cost})
Step 3 : Now group by key
val byKeyGrouped = byKey.groupByKey
Step 4 : Now sum the cost for each group
val result = byKeyGrouped.map{case ((id1,id2),values) => (id1,id2,values.sum)}
Step 5 : Save the results
result.repartition(1).saveAsTextFile("spark12/result.txt")
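Note that groupByKey shuffles every individual value across the network before the sum. For a plain sum, reduceByKey gives the same result while pre-aggregating map-side; a minimal alternative sketch for steps 2-4 above:
// reduceByKey combines values on each partition before shuffling,
// so only one partial sum per key crosses the network
val result2 = rdd.map { case (name, sex, cost) => ((name, sex), cost) }
  .reduceByKey(_ + _)
  .map { case ((name, sex), total) => (name, sex, total) }
result2.collect().foreach(println)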
Question # 4
Problem Scenario 87 : You have been given below three files (create them in hdfs).
product.csv
productID,productCode,name,quantity,price,supplierid
1001,PEN,Pen Red,5000,1.23,501
1002,PEN,Pen Blue,8000,1.25,501
1003,PEN,Pen Black,2000,1.25,501
1004,PEC,Pencil 2B,10000,0.48,502
1005,PEC,Pencil 2H,8000,0.49,502
1006,PEC,Pencil HB,0,9999.99,502
2001,PEC,Pencil 3B,500,0.52,501
2002,PEC,Pencil 4B,200,0.62,501
2003,PEC,Pencil 5B,100,0.73,501
2004,PEC,Pencil 6B,500,0.47,502
supplier.csv
supplierid,name,phone
501,ABC Traders,88881111
502,XYZ Company,88882222
503,QQ Corp,88883333
products_suppliers.csv
productID,supplierID
2001,501
2002,501
2003,501
2004,502
2001,503
Now accomplish all the queries given in the solution.
Select the product, its price and its supplier name where the product price is less than 0.6, using SparkSQL.
Answer: See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Create the directory in hdfs and upload the three files.
hdfs dfs -mkdir sparksql2
hdfs dfs -put product.csv sparksql2/
hdfs dfs -put supplier.csv sparksql2/
hdfs dfs -put products_suppliers.csv sparksql2/
Step 2 : Now in spark shell
// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._
// Import Spark SQL data types and Row.
import org.apache.spark.sql._
// load the data into new RDDs
val products = sc.textFile("sparksql2/product.csv")
val supplier = sc.textFile("sparksql2/supplier.csv")
val prdsup = sc.textFile("sparksql2/products_suppliers.csv")
// Return the first element of each RDD
products.first()
supplier.first()
prdsup.first()
// define the schemas using case classes
case class Product(productid: Integer, code: String, name: String, quantity: Integer, price: Float, supplierid: Integer)
case class Suplier(supplierid: Integer, name: String, phone: String)
case class PRDSUP(productid: Integer, supplierid: Integer)
// create an RDD of case-class objects for each file
val prdRDD = products.map(_.split(",")).map(p => Product(p(0).toInt, p(1), p(2), p(3).toInt, p(4).toFloat, p(5).toInt))
val supRDD = supplier.map(_.split(",")).map(p => Suplier(p(0).toInt, p(1), p(2)))
val prdsupRDD = prdsup.map(_.split(",")).map(p => PRDSUP(p(0).toInt, p(1).toInt))
Note : if the files were uploaded with their header rows, filter the header line out of each RDD before applying toInt.
prdRDD.first()
prdRDD.count()
supRDD.first()
supRDD.count()
prdsupRDD.first()
prdsupRDD.count()
// change each RDD of case-class objects to a DataFrame
val prdDF = prdRDD.toDF()
val supDF = supRDD.toDF()
val prdsupDF = prdsupRDD.toDF()
// register each DataFrame as a temp table
prdDF.registerTempTable("products")
supDF.registerTempTable("suppliers")
prdsupDF.registerTempTable("productssuppliers")
// Select product, its price, its supplier name where product price is less than 0.6
val results = sqlContext.sql("""SELECT products.name, price, suppliers.name AS sup_name FROM products JOIN suppliers ON products.supplierid = suppliers.supplierid WHERE price < 0.6""")
results.show()
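The same join can also be expressed with the DataFrame API instead of a SQL string; a minimal sketch reusing the prdDF and supDF DataFrames built above:
// join products to suppliers and keep only the rows priced under 0.6
val cheap = prdDF.filter(prdDF("price") < 0.6)
  .join(supDF, prdDF("supplierid") === supDF("supplierid"))
  .select(prdDF("name"), prdDF("price"), supDF("name").as("sup_name"))
cheap.show()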
Question # 5
Problem Scenario 5 : You have been given following mysql database details.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish following activities.
1. List all the tables using sqoop command from retail_db
2. Write simple sqoop eval command to check whether you have permission to read database tables or not.
3. Import all the tables as avro files in /user/hive/warehouse/retail_cca174.db
4. Import departments table as a text file in /user/cloudera/departments.
Answer: See the explanation for Step by Step Solution and configuration.
Explanation:
Solution:
Step 1 : List tables using sqoop
sqoop list-tables --connect jdbc:mysql://quickstart:3306/retail_db --username retail_dba --password cloudera
Step 2 : Eval command, just run a count query on one of the tables.
sqoop eval \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--query "select count(1) from order_items"
Step 3 : Import all the tables as avro files.
sqoop import-all-tables \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--as-avrodatafile \
--warehouse-dir=/user/hive/warehouse/retail_stage.db \
-m 1
Step 4 : Import departments table as a text file in /user/cloudera/departments
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--table departments \
--as-textfile \
--target-dir=/user/cloudera/departments
Step 5 : Verify the imported data.
hdfs dfs -ls /user/cloudera/departments
hdfs dfs -ls /user/hive/warehouse/retail_stage.db
hdfs dfs -ls /user/hive/warehouse/retail_stage.db/products
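As an optional cross-check from the spark shell (not part of the required sqoop commands), the text import can be inspected directly; a minimal sketch, assuming the Step 4 import above succeeded:
// read the departments export back and spot-check a few rows
val departments = sc.textFile("/user/cloudera/departments")
departments.take(5).foreach(println)
println("rows imported: " + departments.count())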