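perf(5-1.py): optimize token weight calculation

- Changed how the token weight products are computed: a product is now calculated only when the token is present in both the Amazon and Google data
- This reduces unnecessary computation and improves execution efficiency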
parent 38917b896f
commit 036a740505
5
5-1.py
@@ -131,9 +131,8 @@ def fast_cosine_similarity(record):
     tokens = record[1]

     # Use .get() to safely access dictionary elements and avoid KeyError
-    s = sum([amazon_weights_broadcast.value[amazon_id].get(token, 0) * google_weights_broadcast.value[google_url].get(
-        token, 0)
-             for token in tokens])
+    s = sum([amazon_weights_broadcast.value[amazon_id].get(token, 0) * google_weights_broadcast.value[google_url].get(token, 0)
+             for token in tokens if token in amazon_weights_broadcast.value[amazon_id] and token in google_weights_broadcast.value[google_url]])

     # Compute the cosine similarity using the broadcast variables
     value = s / (amazon_norms_broadcast.value[amazon_id] * google_norms_broadcast.value[google_url])
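The filtered sum in this hunk can be looked at in isolation. The following is a minimal sketch, assuming plain dictionaries in place of the broadcast variables; `amazon_weights`, `google_weights`, `dot_with_get`, `dot_with_filter` and the sample weights are hypothetical names and values, not part of 5-1.py. It compares the sum before the change with a filtered variant that indexes the dictionaries directly, since the `.get(token, 0)` defaults can no longer be reached once the membership test has passed.

```python
# Hypothetical stand-ins for the broadcast weight dictionaries (record id -> {token: weight}).
amazon_weights = {"a1": {"laptop": 0.5, "bag": 0.75, "black": 0.25}}
google_weights = {"g1": {"laptop": 0.5, "case": 0.75, "black": 0.5}}


def dot_with_get(tokens, a, g):
    """Form before the change: visit every token; a missing weight defaults to 0."""
    return sum(a.get(t, 0) * g.get(t, 0) for t in tokens)


def dot_with_filter(tokens, a, g):
    """Filtered form: skip tokens missing from either side, then index directly."""
    return sum(a[t] * g[t] for t in tokens if t in a and t in g)


tokens = ["laptop", "black", "bag", "case"]
a, g = amazon_weights["a1"], google_weights["g1"]

# Tokens missing from one side contribute 0 in the first form,
# so both forms produce the same dot product.
print(dot_with_get(tokens, a, g))
print(dot_with_filter(tokens, a, g))
```

Both functions return the same value for any input, because a token absent from either dictionary contributes a zero product in the `.get()` form. Whether the filter is faster in practice depends on how many tokens are shared, so it is worth timing on real records.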