How to integrate Chinese word segmentation with Elasticsearch

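For indexing, word segmentation is probably the biggest concern. With es, the usual default choice for Chinese is smartcn, but its results are not very good.

Two alternatives are the ik plugin and the mmseg plugin. The usage of each is described below; the two are much alike. First install the plugin from the command line:

Install the ik plugin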

bin/plugin -install medcl/elasticsearch-analysis-ik/1.1.0

Download the ik configuration and dictionary files into the config directory

unzip ik.zip

rm ik.zip

Analyzer configuration

To configure the ik analyzer, add the following to elasticsearch.yml:

index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider

or

index.analysis.analyzer.ik.type : "ik"

Install the mmseg plugin:

bin/plugin -install medcl/elasticsearch-analysis-mmseg/1.1.0

Download the mmseg configuration and dictionary files into the config directory

cd config

wget http://github.com/downloads/medcl/elasticsearch-analysis-mmseg/mmseg.zip --no-check-certificate

unzip mmseg.zip

rm mmseg.zip

The mmseg analyzer configuration also goes in elasticsearch.yml:

index:
  analysis:
    analyzer:
      mmseg:
        alias: [news_analyzer, mmseg_analyzer]
        type: org.elasticsearch.index.analysis.MMsegAnalyzerProvider

or

index.analysis.analyzer.default.type : "mmseg"

mmseg also offers some finer-grained tokenizer settings, as follows:

index:
  analysis:
    tokenizer:
      mmseg_maxword:
        type: mmseg
        seg_type: "max_word"
      mmseg_complex:
        type: mmseg
        seg_type: "complex"
      mmseg_simple:
        type: mmseg
        seg_type: "simple"

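With this configuration in place, the plugin installation is complete; es loads the plugin when it starts.

Defining the mapping

When adding the mapping for an index, the analyzers can be specified like this: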

{
  "page": {
    "properties": {
      "title": {
        "type": "string",
        "indexAnalyzer": "ik",
        "searchAnalyzer": "ik"
      },
      "content": {
        "type": "string",
        "indexAnalyzer": "ik",
        "searchAnalyzer": "ik"
      }
    }
  }
}

indexAnalyzer is the analyzer used at index time, and searchAnalyzer is the analyzer used at search time.

The Java code for the mapping is as follows:

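import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

// Build the mapping for the "page" type; jsonBuilder() can throw IOException
XContentBuilder content = XContentFactory.jsonBuilder().startObject()
    .startObject("page")
      .startObject("properties")
        .startObject("title")
          .field("type", "string")
          .field("indexAnalyzer", "ik")
          .field("searchAnalyzer", "ik")
        .endObject()
        .startObject("code")
          .field("type", "string")
          .field("indexAnalyzer", "ik")
          .field("searchAnalyzer", "ik")
        .endObject()
      .endObject()
    .endObject()
  .endObject();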

To test the segmentation, call the analyze API against an index. Note that indexname is the index name; any existing index will do.
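
As a rough illustration of such a test request, the sketch below only builds the URI for an index-level analyze call using the Java standard library. It assumes the `_analyze` endpoint accepts `analyzer` and `text` as query parameters, as older Elasticsearch releases did; the class name `AnalyzeRequestSketch`, the helper `buildAnalyzeUri`, the host `http://localhost:9200`, and the sample text are all hypothetical and not taken from the article.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not from the article): builds an analyze request URI. */
public class AnalyzeRequestSketch {

    // Assumption: the index-level _analyze endpoint takes "analyzer" and "text"
    // as query parameters, as older Elasticsearch releases did.
    static URI buildAnalyzeUri(String baseUrl, String indexName, String analyzer, String text) {
        // URLEncoder emits "+" for spaces; switch to "%20" so the value is unambiguous in a URI.
        String encodedText = URLEncoder.encode(text, StandardCharsets.UTF_8).replace("+", "%20");
        String encodedAnalyzer = URLEncoder.encode(analyzer, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/" + indexName
                + "/_analyze?analyzer=" + encodedAnalyzer + "&text=" + encodedText);
    }

    public static void main(String[] args) {
        // Placeholder host and index; the Unicode escapes spell a short Chinese sample phrase.
        URI uri = buildAnalyzeUri("http://localhost:9200", "indexname", "ik",
                "\u4e2d\u6587\u5206\u8bcd");
        System.out.println(uri);
    }
}
```

If a node with the plugin is running, fetching the printed URI with any HTTP client should return the tokens the chosen analyzer produces; swapping "ik" for "mmseg" would exercise the other plugin in the same way.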
