
Date: 2024-03-12



IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the
submitted homework.
I declare that the assignment submitted on Elearning system is original
except for source material explicitly acknowledged, and that the same or
related material has not been previously submitted for another course. I
also acknowledge that I am aware of University policy and regulations on
honesty in academic work, and of the disciplinary guidelines and
procedures applicable to breaches of such policy and regulations, as
contained in the website
http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________
Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must
be created COMPLETELY by oneself ALONE. A student may not share ANY written work or
pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has
discussed or worked with. If the answer includes content from any other source, the
student MUST STATE THE SOURCE. Failure to do so is cheating and will result in
sanctions. Copying answers from someone else is cheating even if one lists their name(s) on
the homework.
If there is information you need to solve a problem, but the information is not stated in the
problem, try to find the data somewhere. If you cannot find it, state what data you need,
make a reasonable estimate of its value, and justify any assumptions you make. You will be
graded not only on whether your answer is correct, but also on whether you have done an
intelligent analysis.
Submit your output, explanation, and your commands/scripts in one SINGLE PDF file.
Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of
Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in
books from books.google.com along with some statistics.
In this question, you will use only the Google Books 1-gram data (single words; the
format line below labels this field "bigram"). Go to References [1] and [2] to download
the two datasets. Each line in these two files has the following format (TAB-separated):
bigram year match_count volume_count
An example for 1-grams would be:
circumvallate 1978 335 91
circumvallate 1979 261 95
This means that in 1978 (1979), the word "circumvallate" occurred 335 (261) times overall,
in 91 (95) distinct books.
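To make the record layout concrete, here is a minimal Python sketch that parses one TAB-separated line into typed fields. It is only an illustration for sanity-checking sample lines, not part of the required Pig solution; the names `NgramRecord` and `parse_record` are hypothetical.

```python
from typing import NamedTuple


class NgramRecord(NamedTuple):
    """One row of the dataset; field order follows the format line above."""
    ngram: str
    year: int
    match_count: int
    volume_count: int


def parse_record(line: str) -> NgramRecord:
    """Split one TAB-separated line into its four typed fields."""
    ngram, year, match_count, volume_count = line.rstrip("\n").split("\t")
    return NgramRecord(ngram, int(year), int(match_count), int(volume_count))


if __name__ == "__main__":
    # Sample line taken from the example above.
    print(parse_record("circumvallate\t1978\t335\t91"))
```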
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop
cluster from IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 on
the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop
cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7]
to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per
year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Notes: The denominator is the number of years in which that word has appeared.
Assume the data set contains all the 1-grams in the last 100 years, and the above
records are the only records for the word ‘circumvallate’. Then the average value is:
(335 + 261) / 2 = 298,
instead of
(335 + 261) / 100 = 5.96
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences
per year, along with their corresponding average values, sorted in descending order. If
multiple bigrams have the same average value, output any of them (that is,
break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
Hints:
● This problem is very similar to the word counting example shown in the lecture notes
of Pig. You can use the code there and just make some minor changes to perform
this task.
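The averaging rule in (c) and the top-20 selection in (d) can be sketched in plain Python as a reference for checking results on a small sample. This is not the Pig script the question asks for; the function names `average_per_year` and `top_n` are hypothetical, and the input is assumed to be already parsed into `(ngram, year, match_count, volume_count)` tuples.

```python
from collections import defaultdict
import heapq


def average_per_year(records):
    """Return {ngram: total match_count / number of distinct years it appears in}.

    records: iterable of (ngram, year, match_count, volume_count) tuples.
    """
    totals = defaultdict(int)   # sum of match_count per n-gram
    years = defaultdict(set)    # distinct years per n-gram
    for ngram, year, match_count, _volume_count in records:
        totals[ngram] += match_count
        years[ngram].add(year)
    return {ngram: totals[ngram] / len(years[ngram]) for ngram in totals}


def top_n(averages, n=20):
    """Return the n (ngram, average) pairs with the largest averages, descending."""
    return heapq.nlargest(n, averages.items(), key=lambda item: item[1])


if __name__ == "__main__":
    sample = [
        ("circumvallate", 1978, 335, 91),
        ("circumvallate", 1979, 261, 95),
    ]
    print(top_n(average_per_year(sample)))
```

The denominator is the count of distinct years per word, which matches the note in (c): a word seen in only two years is divided by 2, not by the length of the whole period.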
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance
between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your
Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Hive
2.3.8 on the master node of your Hadoop cluster.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop
cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 with
the same datasets stored in the HDFS. Rerun the Pig script in this cluster and
compare the performance between Pig and Hive in terms of overall run-time and
explain your observation.
Hints:
● Hive stores its tables on HDFS, and those locations need to be bootstrapped
(-p creates missing parent directories and does not fail if the directory exists):
$ hdfs dfs -mkdir -p /tmp
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small
subset of the data instead of the whole data set. Once your Hive commands/scripts
work as desired, you can then run them on the complete data set.
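For the run-time comparison in (b), one simple option is to measure wall-clock time around each job. The sketch below is a hedged illustration: `time_command` is a hypothetical helper, and the script names `q1.pig` and `q2.hql` are placeholders for your own files. It relies only on the `-f <file>` option that both the `pig` and `hive` command-line tools accept for running a script, and it skips any tool that is not on the PATH.

```python
import shutil
import subprocess
import time


def time_command(command):
    """Run a command to completion and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(command)
    return completed.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder script names (assumptions); replace with your own files.
    jobs = {
        "pig": ["pig", "-f", "q1.pig"],
        "hive": ["hive", "-f", "q2.hql"],
    }
    for label, command in jobs.items():
        if shutil.which(command[0]) is None:
            print(f"{label}: '{command[0]}' not found on PATH, skipping")
            continue
        code, seconds = time_command(command)
        print(f"{label}: exit code {code}, {seconds:.1f} s")
```

Wall-clock time includes job submission and start-up overhead, so running each script several times on the same cluster and the same input gives a fairer comparison than a single run.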
Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in
the MovieLens Dataset using Pig
Similar-user detection, which aims to group users with similar interests, behaviors,
actions, or general patterns, has drawn a lot of attention in the machine learning field. In this
homework, you will implement a similar-users-detection algorithm for an online movie rating
system. The underlying idea is that users who have watched and rated the same movies may share
tastes or interests and can therefore be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this
homework, the similarity between a given pair of users (e.g. A and B) is measured as the
total number of movies both A and B have watched divided by the total number of
movies watched by either A or B. The following is the formal definition of similarity: Let
M(A) be the set of all the movies user A has watched. Then the similarity between user A
and user B is defined as:
$$\mathrm{Similarity}(A, B) = \frac{|M(A) \cap M(B)|}{|M(A) \cup M(B)|} \qquad (**)$$
where $|S|$ means the cardinality of set $S$.
(Note: if $|M(A) \cup M(B)| = 0$, we set the similarity to be 0.)
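As a small worked illustration of definition (**), the following Python sketch computes the similarity of two users from their sets of watched movies, including the special case of an empty union. The function name `similarity` is hypothetical, and the sketch is not the Pig implementation the question asks for.

```python
def similarity(movies_a, movies_b):
    """Similarity per (**): |M(A) ∩ M(B)| / |M(A) ∪ M(B)|, or 0 if the union is empty."""
    set_a, set_b = set(movies_a), set(movies_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    # Two movies in common (2 and 3) out of four distinct movies overall.
    print(similarity({1, 2, 3}, {2, 3, 4}))
```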
The following figure illustrates the idea:
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented
by its unique userID and each movie is represented by its unique movieID. The format of the
data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the
cluster you built for Q1 and Q2 or you can use the IE DIC or one provided by the Google
Cloud/AWS [5, 6, 7].
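One way to picture the pairwise computation is to index users by movie, count co-watched movies per user pair, and then derive the union size as |M(A)| + |M(B)| - |M(A) ∩ M(B)|. The Python sketch below illustrates that idea on a tiny in-memory input; it is a reference for checking small samples, not the required Pig program. All function names are hypothetical, and the input is assumed to contain exactly two comma-separated fields per line with no header row.

```python
from collections import defaultdict
from itertools import combinations
import heapq


def parse_ratings(lines):
    """Build {userID: set of movieIDs} from '<userID>, <movieID>' lines."""
    movies_by_user = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user_id, movie_id = (field.strip() for field in line.split(","))
        movies_by_user[user_id].add(movie_id)
    return movies_by_user


def common_counts(movies_by_user):
    """Count co-watched movies for each user pair via a movie -> users index."""
    users_by_movie = defaultdict(set)
    for user_id, movies in movies_by_user.items():
        for movie_id in movies:
            users_by_movie[movie_id].add(user_id)
    counts = defaultdict(int)
    for users in users_by_movie.values():
        for pair in combinations(sorted(users), 2):
            counts[pair] += 1
    return counts


def top_k_similar(movies_by_user, k):
    """Return {user: up to k (similarity, other_user) tuples, highest first}."""
    candidates = defaultdict(list)
    for (a, b), common in common_counts(movies_by_user).items():
        # common >= 1 here, so the union size is always positive.
        union = len(movies_by_user[a]) + len(movies_by_user[b]) - common
        score = common / union
        candidates[a].append((score, b))
        candidates[b].append((score, a))
    return {user: heapq.nlargest(k, scored) for user, scored in candidates.items()}


if __name__ == "__main__":
    sample = ["1, 10", "1, 11", "2, 10", "2, 11", "2, 12", "3, 12"]
    print(top_k_similar(parse_ratings(sample), k=1))
```

Only pairs with at least one movie in common are generated, so pairs whose similarity is 0 never appear and a user with no overlap with anyone is absent from the result. IDs are kept as strings, and ties are broken by comparing the other user's ID.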
(a) [10 marks] For each pair of users in each of the datasets [3] and [4], output the number of
movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the
list of the 10 pairs of users having the largest number of movies watched by
both users in the pair within the corresponding dataset. The format of your
answer should be as follows: