Question # 1
Problem Scenario 13 : You have been given the following MySQL database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following.
1. Create a table in retail_db with the following definition.
CREATE TABLE departments_export (department_id int(11), department_name varchar(45), created_date TIMESTAMP DEFAULT NOW());
2. Now import the data from the following directory into the departments_export table: /user/cloudera/departments_new
Answer: See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Login to mysql db
mysql --user=retail_dba --password=cloudera
show databases;
use retail_db;
show tables;
Step 2 : Create a table as given in problem statement.
CREATE TABLE departments_export (department_id int(11), department_name varchar(45), created_date TIMESTAMP DEFAULT NOW());
show tables;
Step 3 : Export data from /user/cloudera/departments_new to the new table departments_export
sqoop export --connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table departments_export \
--export-dir /user/cloudera/departments_new \
--batch
Step 4 : Now check whether the export was done correctly or not.
mysql --user=retail_dba --password=cloudera
show databases;
use retail_db;
show tables;
select * from departments_export;
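Before running the sqoop export, it can help to confirm that the HDFS files really match the table layout, since a default sqoop export expects comma-delimited text whose column order matches the target table. A minimal pyspark sanity check, assuming the files under /user/cloudera/departments_new are comma-delimited with (department_id, department_name):
#Open pyspark terminal and inspect the export directory.
departments = sc.textFile("/user/cloudera/departments_new")
#Show a few raw lines to confirm the delimiter and the column order.
for line in departments.take(5): print(line)
#Count the records; this total should match SELECT COUNT(*) FROM departments_export after the export.
print(departments.count())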
Question # 2
Problem Scenario 80 : You have been given MySQL DB with following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.products
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of products table : (product_id | product_category_id | product_name | product_description | product_price | product_image)
Please accomplish following activities.
1. Copy "retail_db.products" table to hdfs in a directory p93_products
2. Now sort the products data by product price per category; use the product_category_id column to group by category
Answer: See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Import single table.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=products --target-dir=p93_products
Note : Please check that you don't have a space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from RDBMS to hdfs.
Step 2 : Read the data from one of the partitions, created using above command.
hadoop fs -cat p93_products/part-m-00000
Step 3 : Load this directory as RDD using Spark and Python (open pyspark terminal and do following).
productsRDD = sc.textFile("p93_products")
Step 4 : Filter empty prices, if they exist.
#filter out lines with empty prices
nonempty_lines = productsRDD.filter(lambda x: len(x.split(",")[4]) > 0)
Step 5 : Create a data set like (categoryId, (id, name, price)).
mappedRDD = nonempty_lines.map(lambda line: (line.split(",")[1], (line.split(",")[0], line.split(",")[2], float(line.split(",")[4]))))
for line in mappedRDD.collect(): print(line)
Step 6 : Now group all records by categoryId, which is the key of mappedRDD; it will produce output like (categoryId, iterable of all values for that key/categoryId).
groupByCategoryId = mappedRDD.groupByKey()
for line in groupByCategoryId.collect(): print(line)
Step 7 : Now sort the data in each category by price in ascending order.
# sorted is a function to sort an iterable; we can also specify the key on which to sort, in this case the price.
groupByCategoryId.map(lambda tuple: sorted(tuple[1], key=lambda tupleValue: tupleValue[2])).take(5)
Step 8 : Now sort the data in each category by price in descending order.
groupByCategoryId.map(lambda tuple: sorted(tuple[1], key=lambda tupleValue: tupleValue[2], reverse=True)).take(5)
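The groupByKey plus sorted approach returns one iterable per category; if flat rows are easier to verify, the sorted groups can be flattened back into individual records. A minimal follow-on sketch, under the same assumptions as above (comma-delimited rows, price at column index 4):
#Flatten each sorted category back into (categoryId, id, name, price) records.
sortedFlat = groupByCategoryId.flatMap(lambda kv: [(kv[0], v[0], v[1], v[2]) for v in sorted(kv[1], key=lambda t: t[2])])
for rec in sortedFlat.take(10): print(rec)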
Question # 3
Problem Scenario 54 : You have been given below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"))
val b = a.map(x => (x.length, x))
operation1
Write a correct code snippet for operation1 which will produce desired output, shown below.
Array[(Int, String)] = Array((4,lion), (7,panther), (3,dogcat), (5,tigereagle))
Answer: See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
b.foldByKey("")(_ + _).collect
foldByKey [Pair]
Very similar to fold, but performs the folding separately for each key of the RDD. This function is only available if the RDD consists of two-component tuples.
Listing Variants
def foldByKey(zeroValue: V)(func: (V, V) => V): RDD[(K, V)]
def foldByKey(zeroValue: V, numPartitions: Int)(func: (V, V) => V): RDD[(K, V)]
def foldByKey(zeroValue: V, partitioner: Partitioner)(func: (V, V) => V): RDD[(K, V)]
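The same result can be checked from pyspark as well; a minimal sketch, assuming the usual sc from the pyspark shell (the within-key concatenation order can vary with partitioning):
a = sc.parallelize(["dog", "tiger", "lion", "cat", "panther", "eagle"])
b = a.map(lambda x: (len(x), x))
print(b.foldByKey("", lambda acc, x: acc + x).collect())
#Expected output (pair order may differ): [(4, 'lion'), (7, 'panther'), (3, 'dogcat'), (5, 'tigereagle')]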
Question # 4
Problem Scenario 85 : In continuation of previous question, please accomplish following activities.
1. Select all the columns from product table with output header as below.
productID AS ID
code AS Code
name AS Description
price AS 'Unit Price'
2. Select code and name both separated by ' -' and the header name should be 'Product Description'.
3. Select all distinct prices.
4. Select distinct price and name combination.
5. Select all price data sorted by both code and productID combination.
6. Count number of products.
7. Count number of products for each code.
Answer: See the explanation for Step by Step Solution and configuration.
Explanation:
Solution :
Step 1 : Select all the columns from product table with output header as below: productID AS ID, code AS Code, name AS Description, price AS 'Unit Price'.
val results = sqlContext.sql("""SELECT productID AS ID, code AS Code, name AS Description, price AS `Unit Price` FROM products ORDER BY ID""")
results.show()
Step 2 : Select code and name both separated by ' -' and header name should be 'Product Description'.
val results = sqlContext.sql("""SELECT CONCAT(code, ' -', name) AS `Product Description`, price FROM products""")
results.show()
Step 3 : Select all distinct prices.
val results = sqlContext.sql("""SELECT DISTINCT price AS `Distinct Price` FROM products""")
results.show()
Step 4 : Select distinct price and name combination.
val results = sqlContext.sql("""SELECT DISTINCT price, name FROM products""")
results.show()
Step 5 : Select all price data sorted by both code and productID combination.
val results = sqlContext.sql("""SELECT * FROM products ORDER BY code, productID""")
results.show()
Step 6 : Count number of products.
val results = sqlContext.sql("""SELECT COUNT(*) AS `Count` FROM products""")
results.show()
Step 7 : Count number of products for each code.
val results = sqlContext.sql("""SELECT code, COUNT(*) FROM products GROUP BY code""")
results.show()
val results = sqlContext.sql("""SELECT code, COUNT(*) AS count FROM products GROUP BY code ORDER BY count DESC""")
results.show()
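All of these queries assume a products table is already registered, carried over from the previous question. If that state is missing, a minimal pyspark sketch can recreate it (the same SQL runs unchanged from the pyspark shell); the sample rows below are hypothetical, illustrative only, not exam data:
from pyspark.sql import Row
#Hypothetical sample rows matching the (productID, code, name, price) columns used above.
rows = [Row(productID=1001, code="PEN", name="Gel Pen", price=12.0), Row(productID=1002, code="PEN", name="Ball Pen", price=5.0), Row(productID=1003, code="PAD", name="Note Pad", price=22.5)]
productsDF = sqlContext.createDataFrame(rows)
productsDF.registerTempTable("products")
#Any of the queries above can then be run, e.g.
sqlContext.sql("SELECT code, COUNT(*) AS count FROM products GROUP BY code ORDER BY count DESC").show()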
Question # 5
Problem Scenario 74 : You have been given MySQL DB with following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of orders table : (order_id, order_date, order_customer_id, order_status)
Columns of order_items table : (order_item_id, order_item_order_id, order_item_product_id, order_item_quantity, order_item_subtotal, order_item_product_price)
Please accomplish following activities.
1. Copy "retail_db.orders" and "retail_db.order_items" table to hdfs in respective directories p89_orders and p89_order_items.
2. Join these data using order_id in Spark and Python
3. Now fetch selected columns from joined data: order_id, order date and amount collected on this order.
4. Calculate total orders placed for each date, and produce the output sorted by date.
Answer: See the explanation for Step by Step Solution and configuration.
Explanation:
Solution:
Step 1 : Import single tables.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=orders --target-dir=p89_orders -m 1
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=order_items --target-dir=p89_order_items -m 1
Note : Please check that you don't have a space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from RDBMS to hdfs.
Step 2 : Read the data from one of the partitions, created using above command.
hadoop fs -cat p89_orders/part-m-00000
hadoop fs -cat p89_order_items/part-m-00000
Step 3 : Load these above two directories as RDDs using Spark and Python (open pyspark terminal and do following).
orders = sc.textFile("p89_orders")
orderItems = sc.textFile("p89_order_items")
Step 4 : Convert each RDD into key-value pairs (order_id as the key and the whole line as the value).
#First column is the order_id
ordersKeyValue = orders.map(lambda line: (int(line.split(",")[0]), line))
#Second column is the order_id
orderItemsKeyValue = orderItems.map(lambda line: (int(line.split(",")[1]), line))
Step 5 : Join both the RDDs using order_id.
joinedData = orderItemsKeyValue.join(ordersKeyValue)
#print the joined data
for line in joinedData.collect(): print(line)
Format of joinedData as below.
(order_id, (all columns from orderItemsKeyValue, all columns from ordersKeyValue))
Step 6 : Now fetch selected values: order_id, order date and amount collected on this order.
revenuePerOrderPerDay = joinedData.map(lambda row: (row[0], row[1][1].split(",")[1], float(row[1][0].split(",")[4])))
#print the result
for line in revenuePerOrderPerDay.collect(): print(line)
Step 7 : Select distinct order ids for each date.
#distinct(date, order_id)
distinctOrdersDate = joinedData.map(lambda row: row[1][1].split(",")[1] + "," + str(row[0])).distinct()
for line in distinctOrdersDate.collect(): print(line)
Step 8 : Similar to word count, generate a (date, 1) record for each row.
newLineTuple = distinctOrdersDate.map(lambda line: (line.split(",")[0], 1))
Step 9 : Do the count for each key (date), to get total orders per date.
totalOrdersPerDate = newLineTuple.reduceByKey(lambda a, b: a + b)
#print results
for line in totalOrdersPerDate.collect(): print(line)
Step 10 : Sort the results by date.
sortedData = totalOrdersPerDate.sortByKey().collect()
#print results
for line in sortedData: print(line)
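Note that Step 6 emits one record per order item, so an order with several items appears multiple times. If a single collected amount per order is wanted, the subtotals can be summed per (order_id, order_date); a minimal follow-on sketch, same data layout as above:
#Sum order_item_subtotal per (order_id, order_date) so each order appears once with its total.
orderTotals = joinedData.map(lambda row: ((row[0], row[1][1].split(",")[1]), float(row[1][0].split(",")[4]))).reduceByKey(lambda a, b: a + b)
for rec in orderTotals.take(10): print(rec)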