

Posted: 2024-03-11



IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the submitted homework.
I declare that the assignment submitted on Elearning system is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________ Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must be created COMPLETELY by oneself ALONE. A student may not share ANY written work or pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has discussed or worked with. If the answer includes content from any other source, the student MUST STATE THE SOURCE. Failure to do so is cheating and will result in sanctions. Copying answers from someone else is cheating even if one lists their name(s) on the homework.
If there is information you need to solve a problem, but the information is not stated in the problem, try to find the data somewhere. If you cannot find it, state what data you need, make a reasonable estimate of its value, and justify any assumptions you make. You will be graded not only on whether your answer is correct, but also on whether you have done an intelligent analysis.
Submit your output, explanation, and your commands/scripts in one SINGLE pdf file.

 Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in books from books.google.com along with some statistics.
In this question, you only use the Google Books 1-grams (the format line below labels this field "bigram"). Please go to References [1] and [2] to download the two datasets. Each line in these two files has the following format (TAB separated):
bigram year match_count volume_count
An example for 1-grams would be:
circumvallate 1978 335 91
circumvallate 1979 261 95
This means that in 1978 (1979), the word "circumvallate" occurred 335 (261) times overall, from 91 (95) distinct books.
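To make the record layout concrete, here is a minimal Python sketch that parses one TAB-separated line into its four fields. The names `NgramRecord` and `parse_line` are hypothetical and only illustrate the format; the assignment itself asks for Pig and Hive scripts.

```python
from typing import NamedTuple


class NgramRecord(NamedTuple):
    """One row of the dataset; field names follow the format line above."""
    bigram: str
    year: int
    match_count: int
    volume_count: int


def parse_line(line: str) -> NgramRecord:
    # Fields are TAB separated, in the order given in the format line.
    bigram, year, match_count, volume_count = line.rstrip("\n").split("\t")
    return NgramRecord(bigram, int(year), int(match_count), int(volume_count))


record = parse_line("circumvallate\t1978\t335\t91\n")
print(record.match_count, record.volume_count)  # prints "335 91"
```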
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop cluster in IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 on the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7] to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Notes: The denominator is the number of years in which that word has appeared. Assume the data set contains all the 1-grams in the last 100 years, and the above records are the only records for the word ‘circumvallate’. Then the average value is (335 + 261) / 2 = 298, instead of (335 + 261) / 100 = 5.96.
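The averaging rule in the note can be sketched in plain Python as follows. This is only an in-memory illustration of the arithmetic, not the required Pig script; the function name `average_per_year` and the tuple layout are assumptions made for the example.

```python
from collections import defaultdict


def average_per_year(rows):
    """rows: iterable of (bigram, year, match_count) tuples."""
    totals = defaultdict(int)   # bigram -> sum of match_count
    years = defaultdict(set)    # bigram -> distinct years with a record
    for bigram, year, match_count in rows:
        totals[bigram] += match_count
        years[bigram].add(year)
    # Denominator: number of years in which the bigram appears,
    # not the total span of years covered by the dataset.
    return {b: totals[b] / len(years[b]) for b in totals}


rows = [("circumvallate", 1978, 335), ("circumvallate", 1979, 261)]
print(average_per_year(rows))  # prints {'circumvallate': 298.0}
```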
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences per year along with their corresponding average values sorted in descending order. If multiple bigrams have the same average value, write down any one you like (that is, break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
Hints:
● This problem is very similar to the word counting example shown in the lecture notes of Pig. You can use the code there and just make some minor changes to perform this task.
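As a rough illustration of the selection step in part (d), the sketch below picks the entries with the largest averages from a small in-memory dictionary. The helper name `top_n` and the sample values are made up for the example; ties are broken arbitrarily, as the question allows.

```python
import heapq


def top_n(averages, n=20):
    """Return the n (bigram, average) pairs with the largest averages, descending."""
    return heapq.nlargest(n, averages.items(), key=lambda item: item[1])


sample = {"a": 3.0, "b": 10.5, "c": 7.0}
print(top_n(sample, n=2))  # prints [('b', 10.5), ('c', 7.0)]
```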
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your Hadoop cluster in IEMS 5730 HW#0 and refer to the following link to install Hive 2.3.8 on the master node of your Hadoop cluster:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 with the same datasets stored in the HDFS. Rerun the Pig script in this cluster and compare the performance between Pig and Hive in terms of overall run-time and explain your observation.
Hints:
● Hive will store its tables on HDFS and those locations need to be bootstrapped:
$ hdfs dfs -mkdir -p /tmp
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small subset of the data instead of the whole data set. Once your Hive commands/scripts work as desired, you can then run them on the complete data set.
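For the run-time comparison in Q2(b), one possible way to measure overall wall-clock time is a small wrapper like the one below. It is a sketch under assumptions: the helper name `time_command` is hypothetical, and the script names mentioned in the comment are placeholders rather than files defined by the assignment.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


# Stand-in command so the sketch runs anywhere. On the cluster one might pass
# something like ["pig", "-f", "q1.pig"] or ["hive", "-f", "q2.hql"]
# (both script names are hypothetical).
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"{elapsed:.3f} s")
```

Running each script several times and comparing the averages gives a fairer picture than a single run, since cluster load and caching can vary between runs.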
 
 Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
Similar-user detection, which aims to group users with similar interests, behaviors, actions, or general patterns, has drawn a lot of attention in the machine learning field. In this homework, you will implement a similar-users-detection algorithm for an online movie rating system. Basically, users who give similar scores to the same movies may have common tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this homework, the similarity between a given pair of users (e.g. A and B) is measured as the total number of movies both A and B have watched divided by the total number of movies watched by either A or B. The following is the formal definition of similarity: Let M(A) be the set of all the movies user A has watched. Then the similarity between user A and user B is defined as:
Similarity(A, B) = |M(A) ∩ M(B)| / |M(A) ∪ M(B)|    ...........(**)
where |S| means the cardinality of set S.
(Note: if |M(A) ∪ M(B)| = 0, we set the similarity to be 0.)
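The definition in (**), including the zero-union rule from the note, can be written directly with Python sets. This is a minimal sketch for checking small cases by hand; the function name `similarity` is chosen for the example.

```python
def similarity(movies_a, movies_b):
    """Equation (**): |M(A) ∩ M(B)| / |M(A) ∪ M(B)|, or 0 if the union is empty."""
    union = movies_a | movies_b
    if not union:
        return 0.0
    return len(movies_a & movies_b) / len(union)


print(similarity({1, 2, 3}, {2, 3, 4}))  # prints 0.5
print(similarity(set(), set()))          # prints 0.0
```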
The following figure illustrates the idea:
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented by its unique userID and each movie is represented by its unique movieID. The format of the data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the cluster you built for Q1 and Q2 or you can use the IE DIC or one provided by the Google Cloud/AWS [5, 6, 7].
(a) [10 marks] For each pair of users in datasets [3] and [4], output the number of movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the list of the 10 pairs of users having the largest number of movies watched by both users in the pair within the corresponding dataset. The format of your answer should be as follows:
<userID A>, <userID B>, <the number of movies both A and B have watched> //top 1
...
<userID X>, <userID Y>, <the number of movies both X and Y have watched> //top 10
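One way to picture the pair counting in part (a) is to group viewers by movie and count every pair of viewers within each group, which mirrors a self-join on movieID. The Python sketch below runs on a toy input; the function name `common_movie_counts` and the sample ratings are assumptions for illustration, and the submitted solution still has to be a Pig script.

```python
from collections import Counter, defaultdict
from itertools import combinations


def common_movie_counts(ratings):
    """ratings: iterable of (userID, movieID). Returns Counter {(a, b): count}, a < b."""
    viewers = defaultdict(set)  # movieID -> users who watched it
    for user, movie in ratings:
        viewers[movie].add(user)
    counts = Counter()
    for users in viewers.values():
        # Every pair of viewers of the same movie shares that movie.
        for pair in combinations(sorted(users), 2):
            counts[pair] += 1
    return counts


ratings = [(1, 10), (2, 10), (1, 11), (2, 11), (3, 11)]
print(common_movie_counts(ratings).most_common(1))  # prints [((1, 2), 2)]
```

Sorting each group before pairing keeps every pair in a canonical (smaller ID, larger ID) order, so (A, B) and (B, A) are never counted separately.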
(b) [20 marks] By modifying/extending part of your code in part (a), find the Top-K (K=3) most similar users (as defined by Equation (**)) for every user in the datasets [3], [4]. If multiple users have the same similarity, you can just pick any three of them.
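The Top-K selection in part (b) can be illustrated with a brute-force Python sketch that compares every user with every other user. It is only practical for tiny inputs and is not how a Pig job over the full dataset would be structured; the function name `top_k_similar` and the sample data are hypothetical.

```python
import heapq
from collections import defaultdict


def top_k_similar(ratings, k=3):
    """ratings: iterable of (userID, movieID). Returns {user: [(similarity, other), ...]}."""
    movies = defaultdict(set)  # userID -> M(user)
    for user, movie in ratings:
        movies[user].add(movie)
    result = {}
    for a in movies:
        scored = []
        for b in movies:
            if a == b:
                continue
            union = movies[a] | movies[b]
            sim = len(movies[a] & movies[b]) / len(union) if union else 0.0
            scored.append((sim, b))
        # Keep the k highest similarities; ties fall to the larger userID here,
        # which is one arbitrary choice among those the question permits.
        result[a] = heapq.nlargest(k, scored)
    return result


ratings = [(1, 10), (1, 11), (2, 10), (2, 11), (3, 11), (4, 12)]
print(top_k_similar(ratings, k=2)[1])  # prints [(1.0, 2), (0.5, 3)]
```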
(c)
Hint:
1. In part (b), to facilitate the computation of the similarity measure as defined in (**), you can use the inclusion-exclusion principle, i.e.
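For two sets, the inclusion-exclusion principle gives |M(A) ∪ M(B)| = |M(A)| + |M(B)| - |M(A) ∩ M(B)|, so the similarity in (**) can be computed from three counts without materializing the union. The sketch below shows that arithmetic; the function name `similarity_from_counts` and its parameters are assumptions for the example, not part of the original hint.

```python
def similarity_from_counts(count_a, count_b, count_common):
    """Similarity (**) from |M(A)|, |M(B)| and |M(A) ∩ M(B)|."""
    # Inclusion-exclusion for two sets: |A ∪ B| = |A| + |B| - |A ∩ B|.
    union = count_a + count_b - count_common
    return count_common / union if union else 0.0


print(similarity_from_counts(3, 3, 2))  # prints 0.5
```

This pairs naturally with part (a): the per-pair common count is already available, and the per-user movie counts need only one extra grouping pass.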