COM4511/COM6511 Speech Technology - Practical Exercise - Keyword Search
Anton Ragni
Note that for any module assignment, full marks will only be obtained for outstanding performance that
goes well beyond the questions asked. The marks allocated for each assignment are 20%. Marks will be
assigned according to the following general criteria. For every assignment handed in:
1. Fulfilling the basic requirements (5%)
Full marks will be given for completing the work as described, in the source code and results submitted.
2. Submitting high quality documentation (5%)
Full marks will be given to a write-up that is at the highest standard of technical writing and illustration.
3. Showing good reasoning (5%)
Full marks will be given if the experiments and their outcomes are explained to the highest standard.
4. Going beyond what was asked (5%)
Full marks will be given for interesting, well-motivated and clearly described ideas on how to extend the work.
1 Background
The aim of this task is to build and investigate the simplest form of a keyword search (KWS) system, which allows information
to be found in large volumes of spoken data. The figure below shows an example of a typical KWS system, which consists of an index and
a search module. The index provides a compact representation of the spoken data. Given a set of keywords, the search module
queries the index to retrieve all possible occurrences, ranked according to likelihood. The quality of a KWS system is assessed
by how accurately it can retrieve all true occurrences of the keywords.

[Figure: keywords are passed to the search module, which queries the index and returns search results.]
A number of index representations have been proposed and examined for KWS. The most popular representations are derived
from the output of an automatic speech recognition (ASR) system. Various forms of output have been examined; these differ
in the amount of information retained about the content of the spoken data. The simplest form is the most likely word
sequence, or 1-best. Additional information, such as start and end times and recognition confidence, may also be provided for
each word. Given a collection of 1-best sequences, the following index can be constructed:
w_1: (f_{1,1}, s_{1,1}, e_{1,1}), ..., (f_{1,n_1}, s_{1,n_1}, e_{1,n_1})
w_2: (f_{2,1}, s_{2,1}, e_{2,1}), ..., (f_{2,n_2}, s_{2,n_2}, e_{2,n_2})
...
w_N: (f_{N,1}, s_{N,1}, e_{N,1}), ..., (f_{N,n_N}, s_{N,n_N}, e_{N,n_N})    (1)
where w_i is a word, n_i is the number of times word w_i occurs, f_{i,j} is the file in which word w_i occurs for the j-th time, and s_{i,j}
and e_{i,j} are the corresponding start and end times. Searching such an index for single-word keywords can be as simple as finding the
correct row (e.g. row k) and returning all of its tuples (f_{k,1}, s_{k,1}, e_{k,1}), ..., (f_{k,n_k}, s_{k,n_k}, e_{k,n_k}).
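As an illustration only, the following Python sketch builds such an index in memory from CTM-style records and looks up a single-word keyword; the dictionary layout, the lower-casing of words and the use of dev.ctm as input are assumptions of this sketch rather than a required format.

from collections import defaultdict

def build_index(ctm_path):
    """Build a simple 1-best index: word -> list of (file, start, end, confidence)."""
    index = defaultdict(list)
    with open(ctm_path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(";;"):
                continue  # blank lines and ;; comments are ignored in CTM files
            file_id, _channel, start, dur, word, conf = line.split()[:6]
            start, dur, conf = float(start), float(dur), float(conf)
            index[word.lower()].append((file_id, start, start + dur, conf))
    return index

def search_single_word(index, keyword):
    """Return all (file, start, end, confidence) tuples for a single-word keyword."""
    return index.get(keyword.lower(), [])

# Example usage (assuming dev.ctm is available):
# index = build_index("dev.ctm")
# occurrences = search_single_word(index, "yes")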
The search module is expected to retrieve all possible keyword occurrences. If the ASR system made no mistakes, such a module
could be created rather trivially. To account for possible retrieval errors, the search module provides each potential occurrence
with a relevance score. Relevance scores reflect confidence in a given occurrence being relevant. Occurrences with extremely
low relevance scores may be eliminated. If these scores are accurate, each eliminated occurrence will decrease the number of
false alarms; if not, the number of misses will increase. What exactly counts as an extremely low score may not be easy
to determine: multiple factors may affect a relevance score, such as the confidence score, duration, word confusability, word context
and keyword length. Therefore, simple relevance scores, such as those based on confidence scores, may have a wide dynamic range
and may be incomparable across different keywords. To ensure that relevance scores are comparable among different
keywords, they need to be calibrated. A simple calibration scheme is sum-to-one (STO) normalisation:
r̂_{i,j} = r_{i,j}^γ / Σ_{k=1}^{n_i} r_{i,k}^γ    (2)
where r_{i,j} is the original relevance score for the j-th occurrence of the i-th keyword and γ is a scaling parameter that either sharpens or
flattens the distribution of relevance scores. More complex schemes have also been examined. Given a set of occurrences with
associated relevance scores, there are several options for eliminating spurious occurrences. One popular approach
is thresholding: given a global or keyword-specific threshold, any occurrence falling below it is eliminated. Simple calibration
schemes such as STO require thresholds to be estimated on a development set and adjusted to different collection sizes. More
complex approaches, such as Keyword Specific Thresholding (KST), yield a fixed threshold across different keywords and
collection sizes.
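A minimal Python sketch of the STO normalisation in Equation (2); the function name and the default γ = 1.0 are illustrative choices, not part of the assignment specification.

def sto_normalise(scores, gamma=1.0):
    """Sum-to-one normalisation (Equation 2) of one keyword's relevance scores."""
    powered = [s ** gamma for s in scores]
    total = sum(powered)
    if total == 0.0:
        return [0.0 for _ in scores]  # guard against an all-zero score list
    return [p / total for p in powered]

# Example: sto_normalise([0.5, 0.7, 0.1]) ≈ [0.385, 0.538, 0.077];
# gamma > 1 sharpens the distribution, gamma < 1 flattens it.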
The accuracy of KWS systems can be assessed in multiple ways. Standard approaches include precision (the proportion of relevant retrieved occurrences among all retrieved occurrences), recall (the proportion of relevant retrieved occurrences among all
relevant occurrences), mean average precision and term weighted value. A collection of precision and recall values computed
for different thresholds yields a precision-recall (PR) curve. The area under the PR curve (AUC) provides a threshold-independent summary statistic for comparing different retrieval approaches. The mean average precision (mAP) is another popular,
threshold-independent, precision-based metric. Consider a KWS system returning 3 correct and 4 incorrect occurrences, ranked
by relevance score as follows: ✓, ✗, ✗, ✓, ✓, ✗, ✗, where ✓ stands for a correct occurrence and ✗ for an incorrect one.
The precision at each rank (from 1 to 7), set to zero at ranks returning an incorrect occurrence, is 1/1, 0/2, 0/3, 2/4, 3/5, 0/6, 0/7.
If the number of true correct occurrences is 3, the average precision for this keyword is (1/1 + 2/4 + 3/5)/3 = 0.7. A collection-level
mAP can be computed by averaging keyword-specific average precisions. Once a KWS system operates at a reasonable AUC or mAP level,
it is possible to use the term weighted value (TWV) to assess the accuracy of thresholding. The TWV is defined by
TWV(K, θ) = 1 − (1/|K|) Σ_{k ∈ K} [ P_miss(k, θ) + β P_fa(k, θ) ]    (3)
where k ∈ K is a keyword, P_miss and P_fa are the probabilities of a miss and a false alarm, and β is a penalty assigned to false alarms.
These probabilities can be computed as

P_miss(k, θ) = N_miss(k, θ) / N_correct(k)    (4)

P_fa(k, θ) = N_fa(k, θ) / N_trial(k)    (5)

where N_<event> is the number of events of the given type. The number of trials is given by

N_trial(k) = T − N_correct(k)    (6)

where T is the duration of speech in seconds.
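The sketch below computes the average precision of the worked example above and the TWV of Equations (3) to (6), assuming that retrieved occurrences have already been scored as correct or incorrect at a given threshold; the function names, the data layout and the default β are illustrative.

def average_precision(ranked_correct, num_true):
    """Average precision for one keyword; ranked_correct is a list of booleans for
    retrieved occurrences sorted by decreasing relevance score, and num_true is the
    number of true occurrences in the reference."""
    hits, total = 0, 0.0
    for rank, is_correct in enumerate(ranked_correct, start=1):
        if is_correct:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / num_true if num_true > 0 else 0.0

def twv(per_keyword_counts, total_duration, beta=20.0):
    """Term weighted value (Equation 3) at a fixed threshold.
    per_keyword_counts: list of (n_miss, n_fa, n_correct) tuples, one per keyword.
    total_duration: duration of speech T in seconds."""
    penalty = 0.0
    for n_miss, n_fa, n_correct in per_keyword_counts:
        p_miss = n_miss / n_correct if n_correct > 0 else 0.0  # Equation (4)
        n_trial = total_duration - n_correct                   # Equation (6)
        p_fa = n_fa / n_trial if n_trial > 0 else 0.0          # Equation (5)
        penalty += p_miss + beta * p_fa
    return 1.0 - penalty / len(per_keyword_counts)

# Worked example from the text (ranks: correct, wrong, wrong, correct, correct, wrong, wrong):
# average_precision([True, False, False, True, True, False, False], 3) == 0.7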
2 Objective
Given a collection of 1-bests, write code that retrieves all possible occurrences of the keywords in the provided keyword list. Describe the search
process, including the index format, the handling of multi-word keywords, the criterion for matching, relevance score calibration and
the threshold setting methodology. Write code to assess retrieval performance against the reference transcriptions according to the AUC,
mAP and TWV criteria using β = 20. Comment on the differences between these criteria, including the impact of the parameter β.
The start and end times of hypothesised occurrences must be within 0.5 seconds of the true occurrences to be considered for matching.
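A minimal sketch of this 0.5-second matching criterion, assuming each occurrence is represented as a (file, start, end) tuple and that the tolerance applies to the start and end times independently (this interpretation is an assumption):

def matches(hyp, ref, tolerance=0.5):
    """True if a hypothesised occurrence matches a true occurrence:
    same file, and start and end times each within the tolerance in seconds."""
    hyp_file, hyp_start, hyp_end = hyp
    ref_file, ref_start, ref_end = ref
    return (hyp_file == ref_file
            and abs(hyp_start - ref_start) <= tolerance
            and abs(hyp_end - ref_end) <= tolerance)

# Example: matches(("7654", 11.3, 11.6), ("7654", 11.34, 11.54)) -> True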
3 Marking scheme
Two critical elements are assessed: retrieval (65%) and assessment (35%). Note: even if you cannot complete this task as a
whole, you can certainly provide a description of what you were planning to accomplish.
1. Retrieval
1.1 Index Write code that takes the provided CTM files (and any other file you deem relevant) and creates indices in
your own format. For example, if the Python language is used, the execution of your code may look like
python index.py dev.ctm dev.index
where dev.ctm is a CTM file and dev.index is an index; a minimal skeleton is sketched after the bullets below.
Marks are distributed based on the handling of multi-word keywords:
• Efficient handling of single-word keywords
• No ability to handle multi-word keywords
• Inefficient ability to handle multi-word keywords
• Or efficient ability to handle multi-word keywords
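A possible skeleton for the index.py invocation shown above, writing one occurrence per line to a tab-separated index file; the output format and field order are illustrative assumptions, not a required layout.

import sys
from collections import defaultdict

def main(ctm_path, index_path):
    """Read a CTM file and write a simple index with one line per occurrence:
    word <TAB> file <TAB> start <TAB> end <TAB> confidence."""
    index = defaultdict(list)
    with open(ctm_path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(";;"):
                continue
            file_id, _channel, start, dur, word, conf = line.split()[:6]
            index[word.lower()].append((file_id, float(start), float(start) + float(dur), float(conf)))
    with open(index_path, "w") as out:
        for word in sorted(index):
            for file_id, start, end, conf in index[word]:
                out.write(f"{word}\t{file_id}\t{start:.2f}\t{end:.2f}\t{conf}\n")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])  # e.g. python index.py dev.ctm dev.index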
1.2 Search Write code that takes the provided keyword file and index file (and any other file you deem relevant)
and produces a list of occurrences for each provided keyword; one possible approach to multi-word keywords is sketched
after the bullets below. For example, if the Python language is used, the execution of your code may look like
python search.py dev.index keywords dev.occ
where dev.index is an index, keywords is a list of keywords and dev.occ is a list of occurrences for each
keyword.
Marks are distributed based on the handling of multi-word keywords:
• Efficient handling of single-word keywords
• No ability to handle multi-word keywords
• Inefficient ability to handle multi-word keywords
• Or efficient ability to handle multi-word keywords
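One possible (not prescribed) way of handling multi-word keywords is to scan each file's time-ordered 1-best word sequence for consecutive matches; the data layout and the use of the mean word confidence as a relevance score are assumptions of this sketch.

def find_multiword(transcripts, keyword):
    """Find occurrences of a possibly multi-word keyword in 1-best transcripts.
    transcripts: dict mapping file_id -> list of (word, start, end, confidence)
    tuples sorted by start time. Returns (file_id, start, end, score) tuples,
    where score is the mean word confidence."""
    words = keyword.lower().split()
    hits = []
    for file_id, entries in transcripts.items():
        for i in range(len(entries) - len(words) + 1):
            window = entries[i:i + len(words)]
            if all(w == e[0].lower() for w, e in zip(words, window)):
                start, end = window[0][1], window[-1][2]
                score = sum(e[3] for e in window) / len(words)
                hits.append((file_id, start, end, score))
    return hits

# Example with the CTM excerpt from the Resources section:
# transcripts = {"7654": [("YES", 11.34, 11.54, 0.5), ("YOU", 12.00, 12.34, 0.7),
#                         ("CAN", 13.30, 13.80, 0.1)]}
# find_multiword(transcripts, "you can") -> [("7654", 12.0, 13.8, 0.4)]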
1.3 Description Provide a technical description of the following elements:
• Index file format
• Handling multi-word keywords
• Criterion for matching keywords to possible occurrences
• Search process
• Score calibration
• Threshold setting
2. Assessment Write code that takes the provided keyword file, the list of found keyword occurrences and the corresponding reference transcript file in STM format, and computes the metrics described in the Background section. For
instance, if the Python language is used, the execution of your code may look like
python <metric>.py keywords dev.occ dev.stm
where <metric> is one of precision-recall, mAP and TWV, keywords is the provided keyword file, dev.occ is the
list of found keyword occurrences and dev.stm is the reference transcript file.
Hint: In order to simplify assessment, consider converting the reference transcript from STM file format to CTM file format.
Using the indexing and search code above, obtain a list of true occurrences. The list of found keyword occurrences can then
be assessed more easily by comparing it with the list of true occurrences rather than with the reference transcript in STM
file format; a sketch of this conversion is given below.
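A sketch of the hinted STM-to-CTM conversion using uniform segmentation, i.e. distributing each segment's duration equally across its words (see the note in the Resources section); the handling of fields and the fixed confidence of 1.0 are assumptions of this sketch.

def stm_to_ctm_records(stm_path):
    """Convert STM segments to approximate word-level records (file, channel,
    start, duration, word, confidence) using uniform segmentation."""
    records = []
    with open(stm_path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(";;"):
                continue
            fields = line.split()
            file_id, channel = fields[0], fields[1]
            start, end = float(fields[3]), float(fields[4])
            words = fields[6:]  # fields[2] is the speaker, fields[5] the topic label
            if not words:
                continue
            word_dur = (end - start) / len(words)
            for j, word in enumerate(words):
                records.append((file_id, channel, start + j * word_dur,
                                word_dur, word.upper(), 1.0))
    return records

# The confidence is fixed at 1.0 because the reference transcript is taken as correct.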
2.1 Implementation
• AUC Integrate an existing implementation of AUC computation into your code. For example, for the Python
language such an implementation is available in the sklearn package, as sketched after this list.
• mAP Write your own implementation or integrate a freely available one.
• TWV Write your own implementation or integrate a freely available one.
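For the PR curve and AUC, one possible starting point using the sklearn package mentioned above; the labels and scores below are toy values, and appending missed true occurrences with a score of 0 so that recall covers all relevant occurrences is an assumption of this sketch.

import numpy as np
from sklearn.metrics import auc, precision_recall_curve

# 1 marks a correct retrieved occurrence, 0 a false alarm; scores are relevance scores.
# Missed true occurrences can be added with label 1 and score 0.0 so that recall is
# computed over all relevant occurrences, not just the retrieved ones.
y_true = np.array([1, 0, 0, 1, 1, 0, 0, 1])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.0])

precision, recall, thresholds = precision_recall_curve(y_true, scores)
pr_auc = auc(recall, precision)  # area under the precision-recall curve
print(f"AUC = {pr_auc:.3f}")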
2.2 Description
• AUC Plot the precision-recall curve. Report the AUC value. Discuss performance in the high-precision, low-recall
area. Discuss performance in the high-recall, low-precision area. Suggest which keyword search
applications might be particularly interested in good performance in each of those two areas (either high precision
and low recall, or high recall and low precision).
• mAP Report the mAP value. Report the mAP value for each keyword length (1-word, 2-word, etc.). Compare and
discuss differences in mAP values.
• TWV Report the TWV value. Report the TWV value for each keyword length (1-word, 2-word, etc.). Compare and
discuss differences in TWV values. Plot TWV values for a range of threshold values. Report the maximum TWV
value, or MTWV. Report the actual TWV value, or ATWV, obtained with the method used for threshold selection.
• Comparison Describe the use of AUC, mAP and TWV in the development of your KWS approach. Compare
these metrics and discuss their advantages and disadvantages.
4 Hand-in procedure
All outcomes, however complete, are to be submitted together as a single package file (zip/tar/gzip) that includes
a directory for each task containing the associated required files. Submission will be performed via MOLE.
5 Resources
Three resources are provided for this task:
• 1-best transcripts in NIST CTM file format (dev.ctm, eval.ctm). The CTM file format consists of multiple records
of the following form
<F> <H> <T> <D> <W> <C>
where <F> is an audio file name, <H> is a channel, <T> is a start time in seconds, <D> is a duration in seconds, <W> is a
word and <C> is a confidence score. Each record corresponds to one recognised word. Any blank lines or lines starting with
;; are ignored. An excerpt from a CTM file is shown below:
7654 A 11.34 0.2 YES 0.5
7654 A 12.00 0.34 YOU 0.7
7654 A 13.30 0.5 CAN 0.1
• Reference transcripts in NIST STM file format (dev.stm, eval.stm). The STM file format consists of multiple records
of the following form
<F> <H> <S> <T> <E> <L> <W>...<W>
where <S> is a speaker, <E> is an end time, <L> is a topic label and <W>...<W> is a word sequence. Each record corresponds to
one manually transcribed segment of an audio file. An excerpt from an STM file is shown below:
2345 A 2345-a 0.10 2.03 <soap> uh huh yes i thought
2345 A 2345-b 2.10 3.04 <soap> dog walking is a very
2345 A 2345-a 3.50 4.59 <soap> yes but it’s worth it
Note that exact start and end times for each word are not available; use uniform segmentation as an approximation (a possible
implementation is sketched in the Assessment hint above). The duration of speech in dev.stm and eval.stm is estimated to be
57474.2 and 25694.3 seconds, respectively.
• Keyword list keywords. Each keyword contains one or more words as shown below