A Brief Look at How Elasticsearch Computes Search Scores

Published: 2020-03-03 23:44:32 | Source: web | Views: 336 | Author: baizhihua0809 | Category: Development

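Key terms in search score calculation

  • TF: term frequency, the number of times a token produced by analyzing the search text occurs in the searched field of a document

  • IDF: inverse document frequency, a weight that shrinks as the number of documents containing the term grows, so rarer terms count for more

  • TFNORM: term frequency normalized, the term frequency normalized by field length
  • BM25: the scoring algorithm; its TF part is freq / (freq + k1 * (1 - b + b * dl / avgdl))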
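The two formulas behind these terms can be written down as small functions. This is a minimal sketch, not Elasticsearch's own code: the function names are made up for illustration, the IDF expression is the one quoted by the explain output later in this article, and k1 = 1.2 and b = 0.75 are the parameter values that output reports.

```python
import math


def bm25_idf(total_docs: int, docs_with_term: int) -> float:
    """IDF as stated in the explain output: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))


def bm25_tf(freq: float, dl: float, avgdl: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Normalized term frequency: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


# A term found in few documents gets a larger IDF than a common one.
print(bm25_idf(1000, 2), bm25_idf(1000, 500))

# TF saturates: every extra occurrence adds less, and the value stays below 1.
for freq in (1, 2, 5, 50):
    print(freq, round(bm25_tf(freq, dl=10, avgdl=10), 4))
```

The saturation is the point of dividing freq by (freq + something): repeating a term many times cannot push its TF contribution past 1.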

Take the following two documents:

{
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "321697",
        "_score" : 6.6273837,
        "_source" : {
          "title" : "Steve Jobs"
      }
}
{
        "_index" : "movies",
        "_type" : "_doc",
        "_id" : "23706",
        "_score" : 6.0948296,
        "_source" : {
          "title" : "All About Steve"
      }
}

Suppose we run a match query on the title field:

GET /movies/_search
{
  "query": {
    "match": {
      "title": "steve"
    }
  }
}

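The scores show that the first document ranks above the second. The reasons are as follows:

In terms of TF, the term occurs the same number of times (once) in the searched field of both documents.

In terms of IDF, the term's frequency across all documents is the same for both.

TFNORM is where they differ: the term makes up 1/2 of the first document's title but only 1/3 of the second's, so the first document scores higher than the second for this search. This is why ES computes TF in its normalized form, freq / (freq + k1 * (1 - b + b * dl / avgdl)).

In short, the TF that ES ends up using is BM25's length-normalized term frequency.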
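The effect of title length can be checked in isolation with a few lines of Python. This is a sketch under stated assumptions: `bm25_tf` is an illustrative helper, and k1, b and avgdl are the values reported by the explain output shown later in this article. The second document's real score also depends on the statistics of the shard it lives on, so the comparison only isolates the length effect.

```python
def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """Normalized term frequency: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


AVGDL = 2.1474154  # average title length, taken from the explain output

tf_steve_jobs = bm25_tf(freq=1, dl=2, avgdl=AVGDL)       # "Steve Jobs": 2 tokens
tf_all_about_steve = bm25_tf(freq=1, dl=3, avgdl=AVGDL)  # "All About Steve": 3 tokens

print(tf_steve_jobs)       # close to the 0.46767938 reported by explain
print(tf_all_about_steve)  # smaller, because the title is longer
```

With freq fixed at 1, the only thing that changes between the two calls is dl, which is why the shorter title comes out ahead.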

To see exactly how Elasticsearch arrived at a score, add explain to the request as follows:

GET /movies/_search
{
  // similar to an execution plan in MySQL
  "explain": true, 
  "query": {
    "match": {
      "title": "steve"
    }
  }
}
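The same request can be sent from a script. Below is a minimal sketch that uses only the Python standard library; the helper names `build_explain_query` and `search` and the URL `http://localhost:9200` are assumptions for illustration and are not part of the original article.

```python
import json
import urllib.request


def build_explain_query(field: str, text: str) -> dict:
    """Build a match query body with scoring explanations switched on."""
    return {"explain": True, "query": {"match": {field: text}}}


def search(base_url: str, index: str, body: dict) -> dict:
    """POST the body to <base_url>/<index>/_search and return the parsed JSON."""
    request = urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_explain_query("title", "steve")
print(json.dumps(body, indent=2))

# With a reachable cluster (the URL is an assumption):
# hits = search("http://localhost:9200", "movies", body)["hits"]["hits"]
# for hit in hits:
#     print(hit["_score"], hit["_source"]["title"])
```

Building the body separately from sending it keeps the query easy to inspect and reuse.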

Here is the result; we look at one of the hits:

{
    "_shard": "[movies][1]",
    "_node": "pqNhgutvQfqcLqLEzIDnbQ",
    "_index": "movies",
    "_type": "_doc",
    "_id": "321697",
    "_score": 6.6273837,
    "_source": {
        "overview": "Set backstage at three iconic product launches and ending in 1998 with the unveiling of the iMac, Steve Jobs takes us behind the scenes of the digital revolution to paint an intimate portrait of the brilliant man at its epicenter.",
        "voteAverage": 6.8,
        "keywords": [
            {
                "id": 5565,
                "name": "biography"
            },
            {
                "id": 6104,
                "name": "computer"
            },
            {
                "id": 15300,
                "name": "father daughter relationship"
            },
            {
                "id": 157935,
                "name": "apple computer"
            },
            {
                "id": 161160,
                "name": "steve jobs"
            },
            {
                "id": 185722,
                "name": "based on true events"
            }
        ],
        "releaseDate": "2015-01-01T00:00:00.000Z",
        "runtime": 122,
        "originalLanguage": "en",
        "title": "Steve Jobs",
        "productionCountries": [
            {
                "iso_3166_1": "US",
                "name": "United States of America"
            }
        ],
        "revenue": 34441873,
        "genres": [
            {
                "id": 18,
                "name": "Drama"
            },
            {
                "id": 36,
                "name": "History"
            }
        ],
        "originalTitle": "Steve Jobs",
        "popularity": 53.670525,
        "tagline": "Can a great man be a good man?",
        "spokenLanguages": [
            {
                "iso_639_1": "en",
                "name": "English"
            }
        ],
        "id": 321697,
        "voteCount": 1573,
        "productionCompanies": [
            {
                "name": "Universal Pictures",
                "id": 33
            },
            {
                "name": "Scott Rudin Productions",
                "id": 258
            },
            {
                "name": "Legendary Pictures",
                "id": 923
            },
            {
                "name": "The Mark Gordon Company",
                "id": 1557
            },
            {
                "name": "Management 360",
                "id": 4220
            },
            {
                "name": "Cloud Eight Films",
                "id": 6708
            }
        ],
        "budget": 30000000,
        "homepage": "http://www.stevejobsthefilm.com",
        "status": "Released"
    }
}

The result now contains the following additional block of data (the explanation, comparable to an execution plan):

{
    "_explanation": {
        "value": 6.6273837,
        // weight of the term steve in the title field for the document with internal doc ID 1526 (not a count of matching documents)
        "description": "weight(title:steve in 1526) [PerFieldSimilarity], result of:",
        "details": [
            {
                // value = idf.value * tf.value * 2.2
                // 6.6273837 = 6.4412656 * 0.46767938 * 2.2
                "value": 6.6273837,
                "description": "score(freq=1.0), product of:",
                "details": [
                    {
                        "value": 2.2,
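                        // boost factor: the query boost (default 1.0) multiplied by (k1 + 1), which gives 2.2 with the default k1 = 1.2; k1 can be changed in the index's similarity settings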
                        "description": "boost",
                        "details": []
                    },
                    {
                        "value": 6.4412656,
                        "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                        "details": [
                            {
                                "value": 2,
                                "description": "n, number of documents containing term",
                                "details": []
                            },
                            {
                                "value": 1567,
                                "description": "N, total number of documents with field",
                                "details": []
                            }
                        ]
                    },
                    {
                        "value": 0.46767938,
                        "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                        "details": [
                            {
                                "value": 1,
                                "description": "freq, occurrences of term within document",
                                "details": []
                            },
                            // this is where BM25 shows up: the denominator freq + k1 * (1 - b + b * dl / avgdl)
                            {
                                "value": 1.2,
                                "description": "k1, term saturation parameter",
                                "details": []
                            },
                            {
                                "value": 0.75,
                                "description": "b, length normalization parameter",
                                "details": []
                            },
                            // dl / avgdl is the normalization step: the field length relative to the average field length
                            {
                                "value": 2,
                                "description": "dl, length of field",
                                "details": []
                            },
                            {
                                "value": 2.1474154,
                                "description": "avgdl, average length of field",
                                "details": []
                            }
                        ]
                    }
                ]
            }
        ]
    }
}
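The numbers in the explanation can be recomputed by hand to confirm that the score really is boost * idf * tf. The sketch below walks an abbreviated copy of the tree; the shortened description strings and the helper names `outline` and `leaves` are illustrative assumptions. Elasticsearch does this arithmetic in single precision, so the recomputed values agree only to about six significant digits.

```python
import math

# Abbreviated copy of the explanation above: values kept, descriptions shortened.
explanation = {
    "value": 6.6273837,
    "description": "score(freq=1.0), product of:",
    "details": [
        {"value": 2.2, "description": "boost", "details": []},
        {"value": 6.4412656, "description": "idf", "details": [
            {"value": 2, "description": "n", "details": []},
            {"value": 1567, "description": "N", "details": []},
        ]},
        {"value": 0.46767938, "description": "tf", "details": [
            {"value": 1, "description": "freq", "details": []},
            {"value": 1.2, "description": "k1", "details": []},
            {"value": 0.75, "description": "b", "details": []},
            {"value": 2, "description": "dl", "details": []},
            {"value": 2.1474154, "description": "avgdl", "details": []},
        ]},
    ],
}


def outline(node, depth=0):
    """Yield one indented 'value  description' line per node, depth first."""
    yield f"{'  ' * depth}{node['value']}  {node['description']}"
    for child in node["details"]:
        yield from outline(child, depth + 1)


def leaves(node):
    """Map each direct child's description to its value."""
    return {child["description"]: child["value"] for child in node["details"]}


boost, idf_node, tf_node = explanation["details"]

p = leaves(idf_node)
idf = math.log(1 + (p["N"] - p["n"] + 0.5) / (p["n"] + 0.5))

q = leaves(tf_node)
tf = q["freq"] / (q["freq"] + q["k1"] * (1 - q["b"] + q["b"] * q["dl"] / q["avgdl"]))

score = boost["value"] * idf * tf

print("\n".join(outline(explanation)))
print(idf, tf, score)
```

Comparing `idf`, `tf` and `score` with the values in the tree is a quick way to see which factor is responsible when two hits are ranked in an unexpected order.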