First author: Znamenskij
Pages: 5
ID: 453704
Abstract: The ROUGE-W algorithm for calculating the similarity of texts has been referred to in more than 500 scientific publications since 2004. The power of the algorithm depends on the choice of weight function. The optimal selection of the weight function is studied. The weight functions used previously are far from optimal. An example of incorrect output of the algorithm is provided. Simple changes that ensure the expected result are described.
UDC: 519.686
Znamenskij, Sergej V. Simple Essential Improvements to the ROUGE-W Algorithm / Sergej V. Znamenskij // Journal of Siberian Federal University. Mathematics & Physics. 2015. No. 4. P. 123-127. URL: https://rucont.ru/efd/453704 (accessed: 03.05.2024)

Preview (excerpts from the work)

Mathematics & Physics 2015, 8(4), 497–501
UDC 519.686
Simple Essential Improvements to the ROUGE-W Algorithm
Sergej V. Znamenskij∗
Ailamazyan Program Systems Institute of RAS, Peter the First Street, 4, Veskovo village, Pereslavl area, Yaroslavl region, 152021, Russia
Received 10.10.2015, received in revised form 01.11.2015, accepted 16.11.2015

The ROUGE-W algorithm for calculating the similarity of texts has been referred to in more than 500 scientific publications since 2004. <...> The power of the algorithm depends on the choice of weight function. <...> The optimal selection of the weight function is studied. <...> Simple changes that ensure the expected result are described. <...>

Keywords: sequence alignment, longest common subsequence, ROUGE-W, edit distance, string similarity, optimization, complexity bounds. <...>

A string is a finite sequence of letters X = (x1, ..., xn) that can be considered as a function X: {1, ..., n} → Σ returning the letter located at a given position. <...> The well-known sequence alignment problem is to find the most valuable common subsequence of any given strings. <...> A common subsequence P is a Longest Common Subsequence (LCS) if "the most valuable" means "with the maximal possible length k". <...> It has been well known since the 1970s that this meaning does not meet practical needs [1]: when alignment is intended to identify the common part and the differences in computer logs, LCS often finds an unnaturally fragmented common part containing many frequently used letters that are aligned sporadically. <...> For example, the option pbe a refversenced being revfersenced has a common sequence of length 11, against only 10 symbols for be a reversed preference being reversed, which is much better for text editing.
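The LCS criterion described above can be illustrated with a minimal sketch. This is the textbook dynamic-programming computation of the LCS length, not code from the paper; the function name `lcs_length` is hypothetical.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence of x and y (textbook DP)."""
    # prev[j] is the LCS length of the already processed prefix of x and y[:j].
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]  # LCS with the empty prefix of y is always 0
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                # A matching letter extends the best alignment of the shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop the last letter of x or of y, whichever scores higher.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

Only the number of matched letters is maximized here, so an alignment made of scattered single-letter matches can score as high as, or higher than, one built from long contiguous fragments. This is the weakness the excerpt points out.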
<...> Many other applications (some of them mentioned in [2]) give reason to prefer a long string, or a chain of close longer strings, in the common part over a long list of very short meaningless matches, as in this example of edit distance for texts. <...> Comparison of string (A) to strings (B) and (C) shown below: (A) preference being reversed, (B) be a reversed <...>

∗svz@latex.pereslavl.ru
© Siberian Federal University. All rights reserved
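To make the role of the weight function concrete, here is a hedged sketch of the weighted-LCS recurrence as it is commonly described for ROUGE-W. It is an assumption that this is the form the paper refers to. The improvements the paper proposes are not shown in the excerpt and are not reproduced here. The name `wlcs_score` and the default weight f(k) = k² are illustrative choices only.

```python
from typing import Callable


def wlcs_score(x: str, y: str,
               f: Callable[[int], float] = lambda k: k * k) -> float:
    """Weighted LCS score via the commonly described ROUGE-W style recurrence.

    f is the weight function (assumed here: f(k) = k**2), so a run of k
    consecutive matches contributes f(k) instead of k.
    """
    n, m = len(x), len(y)
    # c[i][j]: weighted score for the prefixes x[:i] and y[:j]
    # w[i][j]: length of the run of consecutive matches ending at (i, j)
    c = [[0.0] * (m + 1) for _ in range(n + 1)]
    w = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                k = w[i - 1][j - 1]
                # Extending a run from length k to k + 1 adds f(k + 1) - f(k).
                c[i][j] = c[i - 1][j - 1] + f(k + 1) - f(k)
                w[i][j] = k + 1
            else:
                # No match: carry the better neighbour; w[i][j] stays 0 (run broken).
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[n][m]


# Usage: the same four letters match in both cases, but contiguity is rewarded.
contiguous = wlcs_score("ABCDEFG", "ABCDHIK")  # one run of four matches
scattered = wlcs_score("ABCDEFG", "AHBKCID")   # four isolated matches
```

With f(k) = k the score reduces to the plain LCS length, so the choice of f alone decides how strongly contiguous matches are preferred. The recurrence is greedy in the run length it carries forward, which is consistent with the abstract's remark that the algorithm can produce incorrect output.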
