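perf(5-1.py): optimize token weight calculation logic

- Changed how the token weight products are computed: a product is now calculated only when the token exists in both the Amazon and the Google data
- This reduces unnecessary computation and improves execution efficiency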
fly6516 2025-04-20 03:01:42 +08:00
parent 38917b896f
commit 036a740505

5-1.py

@@ -131,9 +131,8 @@ def fast_cosine_similarity(record):
     tokens = record[1]
     # Use .get() to access dictionary entries safely and avoid KeyError
-    s = sum([amazon_weights_broadcast.value[amazon_id].get(token, 0) * google_weights_broadcast.value[google_url].get(
-        token, 0)
-             for token in tokens])
+    s = sum([amazon_weights_broadcast.value[amazon_id].get(token, 0) * google_weights_broadcast.value[google_url].get(token, 0)
+             for token in tokens if token in amazon_weights_broadcast.value[amazon_id] and token in google_weights_broadcast.value[google_url]])
     # Compute the cosine similarity using the broadcast variables
     value = s / (amazon_norms_broadcast.value[amazon_id] * google_norms_broadcast.value[google_url])
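
For reference, the change in this hunk can be sketched outside Spark roughly as follows. This is a minimal illustration, not the code in 5-1.py: plain dicts stand in for the broadcast variables, and the names `dot_shared`, `dot_with_get`, `norm`, and the sample weights are hypothetical.

```python
import math


def dot_shared(weights_a, weights_b, tokens):
    """Sum weight products only over tokens present in both dicts (new form)."""
    return sum(weights_a[t] * weights_b[t]
               for t in tokens
               if t in weights_a and t in weights_b)


def dot_with_get(weights_a, weights_b, tokens):
    """Sum weight products over all tokens, using 0 for missing ones (old form)."""
    return sum(weights_a.get(t, 0) * weights_b.get(t, 0) for t in tokens)


def norm(weights):
    """Euclidean norm of a token -> weight mapping."""
    return math.sqrt(sum(w * w for w in weights.values()))


def cosine_similarity(weights_a, weights_b, tokens):
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    denom = norm(weights_a) * norm(weights_b)
    return dot_shared(weights_a, weights_b, tokens) / denom if denom else 0.0


# Hypothetical sample data standing in for one Amazon/Google record pair.
amazon = {"sony": 0.5, "camera": 0.25, "black": 0.125}
google = {"sony": 0.5, "camera": 0.5, "lens": 0.25}
tokens = ["sony", "camera", "black", "lens"]

shared = dot_shared(amazon, google, tokens)
unfiltered = dot_with_get(amazon, google, tokens)
similarity = cosine_similarity(amazon, google, tokens)
```

Both forms return the same sum, because a token missing from either side contributes a zero product. Whether the extra membership checks actually save time depends on the data and would need to be measured.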