How to Implement Automatic Clustering in Solr


This article explains how to set up automatic clustering of search results in Solr, walking through the required configuration step by step.

Solr implements clustering with Carrot2, which automatically groups the retrieved documents into categories.

[Figure: Carrot2 clustering example]

To enable clustering in Solr, first copy dist/solr-clustering-4.2.0.jar from the Solr distribution into \solr\contrib\analysis-extras\lib. Then open solrconfig.xml and add the following configuration:

<searchComponent name="clustering"
                 enable="${solr.clustering.enabled:true}"
                 class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">default</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="LingoClusteringAlgorithm.desiredClusterCountBase">30</str> <!-- 2~100 -->
    <str name="LingoClusteringAlgorithm.clusterMergingThreshold">0.70</str> <!-- 0~1 -->
    <str name="LingoClusteringAlgorithm.scoreWeight">0</str> <!-- 0~1 -->
    <str name="LingoClusteringAlgorithm.labelAssigner">org.carrot2.clustering.lingo.SimpleLabelAssigner</str> <!-- org.carrot2.clustering.lingo.UniqueLabelAssigner -->
    <str name="LingoClusteringAlgorithm.phraseLabelBoost">1.5</str> <!-- 0~10 -->
    <str name="LingoClusteringAlgorithm.phraseLengthPenaltyStart">8</str> <!-- 2~8 -->
    <str name="LingoClusteringAlgorithm.phraseLengthPenaltyStop">8</str> <!-- 2~8 -->
    <str name="TermDocumentMatrixReducer.factorizationQuality">HIGH</str> <!-- LOW,MEDIUM,HIGH -->
    <!--
      org.carrot2.matrix.factorization.PartialSingularValueDecompositionFactory
      org.carrot2.matrix.factorization.NonnegativeMatrixFactorizationEDFactory
      org.carrot2.matrix.factorization.NonnegativeMatrixFactorizationKLFactory
      org.carrot2.matrix.factorization.LocalNonnegativeMatrixFactorizationFactory
      org.carrot2.matrix.factorization.KMeansMatrixFactorizationFactory
    -->
    <str name="TermDocumentMatrixReducer.factorizationFactory">org.carrot2.matrix.factorization.NonnegativeMatrixFactorizationEDFactory</str>
    <str name="TermDocumentMatrixBuilder.maximumMatrixSize">37500</str> <!-- MinValue 5000 -->
    <str name="TermDocumentMatrixBuilder.titleWordsBoost">2.0</str> <!-- 2~10 -->
    <str name="TermDocumentMatrixBuilder.maxWordDf">0.9</str> <!-- 0~1 -->
    <!-- org.carrot2.text.vsm.LogTfIdfTermWeighting, org.carrot2.text.vsm.LinearTfIdfTermWeighting -->
    <str name="TermDocumentMatrixBuilder.termWeighting">org.carrot2.text.vsm.TfTermWeighting</str>
    <str name="MultilingualClustering.defaultLanguage">CHINESE_SIMPLIFIED</str>
    <str name="MultilingualClustering.languageAggregationStrategy">org.carrot2.text.clustering.MultilingualClustering.LanguageAggregationStrategy.FLATTEN_MAJOR_LANGUAGE</str> <!-- FLATTEN_ALL, FLATTEN_NONE -->
    <str name="GenitiveLabelFilter.enabled">true</str>
    <str name="StopWordLabelFilter.enabled">true</str>
    <str name="NumericLabelFilter.enabled">true</str>
    <str name="QueryLabelFilter.enabled">true</str>
    <str name="MinLengthLabelFilter.enabled">true</str>
    <str name="StopLabelFilter.enabled">true</str>
    <str name="CompleteLabelFilter.enabled">true</str>
    <str name="CompleteLabelFilter.labelOverrideThreshold">0.65</str> <!-- 0~1 -->
    <str name="DocumentAssigner.exactPhraseAssignment">false</str>
    <str name="DocumentAssigner.minClusterSize">2</str> <!-- 1~100 -->
    <str name="merge-resources">true</str>
    <str name="CaseNormalizer.dfThreshold">1</str> <!-- 1~100 -->
    <str name="PhraseExtractor.dfThreshold">1</str> <!-- 1~100 -->
    <str name="carrot.lexicalResourcesDir">clustering/carrot2</str>
    <str name="SolrDocumentSource.solrIdFieldName">id</str>
  </lst>
</searchComponent>
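The value ranges noted in the comments of the configuration can be checked mechanically before Solr is restarted. The following is only a minimal sketch in Python: the helper names `engine_params` and `out_of_range` are hypothetical, the embedded sample is a trimmed-down copy of the configuration, and the ranges are the ones given in the comments, not values verified against Carrot2 itself.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the searchComponent; only a few parameters are kept.
SAMPLE_CONFIG = """
<searchComponent name="clustering" enable="${solr.clustering.enabled:true}">
  <lst name="engine">
    <str name="name">default</str>
    <str name="LingoClusteringAlgorithm.desiredClusterCountBase">30</str> <!-- 2~100 -->
    <str name="LingoClusteringAlgorithm.clusterMergingThreshold">0.70</str>
    <str name="TermDocumentMatrixBuilder.maxWordDf">0.9</str>
    <str name="DocumentAssigner.minClusterSize">2</str>
  </lst>
</searchComponent>
"""

# Ranges copied from the comments in the article's configuration.
RANGES = {
    "LingoClusteringAlgorithm.desiredClusterCountBase": (2, 100),
    "LingoClusteringAlgorithm.clusterMergingThreshold": (0, 1),
    "TermDocumentMatrixBuilder.maxWordDf": (0, 1),
    "DocumentAssigner.minClusterSize": (1, 100),
}


def engine_params(xml_text):
    """Return {parameter name: text value} from the <lst name="engine"> block."""
    root = ET.fromstring(xml_text)  # XML comments are dropped by the default parser
    engine = root.find("lst[@name='engine']")
    return {el.get("name"): (el.text or "").strip() for el in engine}


def out_of_range(params, ranges=RANGES):
    """List the parameters whose numeric value falls outside its noted range."""
    problems = []
    for name, (low, high) in ranges.items():
        if name in params and not (low <= float(params[name]) <= high):
            problems.append(name)
    return problems


if __name__ == "__main__":
    print(out_of_range(engine_params(SAMPLE_CONFIG)))
```

To check a real file, read solrconfig.xml, locate the `searchComponent` element named `clustering`, and pass its serialized text to `engine_params`.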

With the clustering component configured, next configure the requestHandler:

<requestHandler name="/clustering"
                startup="lazy"
                enable="${solr.clustering.enabled:true}"
                class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>
    <str name="carrot.title">category_s</str>
    <str name="carrot.snippet">content</str>
    <str name="carrot.url">path</str>
    <str name="carrot.produceSummary">true</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
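The handler maps carrot.title and carrot.snippet to the fields category_s and content, and both fields need stored="true" in the schema. As a quick check, here is a minimal Python sketch that reads the explicit `stored` attribute from a schema.xml fragment. The sample schema, its field types, the field `body_raw`, and the helper `stored_flag` are all assumptions made for illustration. The sketch takes the first matching `dynamicField` and does not resolve defaults inherited from the field type.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical schema.xml fragment; names and types are illustrative only.
SAMPLE_SCHEMA = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="content" type="text_general" indexed="true" stored="true"/>
    <field name="body_raw" type="text_general" indexed="true" stored="false"/>
    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  </fields>
</schema>
"""


def _as_bool(value):
    """Map an attribute string to True/False, or None when it is absent."""
    return None if value is None else value.strip().lower() == "true"


def stored_flag(schema_xml, field_name):
    """Return the explicit stored flag for field_name.

    None means the field is not declared, or the declaration does not
    set the stored attribute explicitly.
    """
    root = ET.fromstring(schema_xml)
    # Exact field declarations take priority over dynamic patterns.
    for el in root.iter("field"):
        if el.get("name") == field_name:
            return _as_bool(el.get("stored"))
    # Fall back to the first dynamicField whose pattern matches the name.
    for el in root.iter("dynamicField"):
        if fnmatch.fnmatchcase(field_name, el.get("name", "")):
            return _as_bool(el.get("stored"))
    return None


if __name__ == "__main__":
    for name in ("category_s", "content"):
        print(name, stored_flag(SAMPLE_SCHEMA, name))
```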

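Two parameters deserve attention: carrot.title and carrot.snippet name the fields that clustering is computed on, and both fields must be declared with stored="true". carrot.title carries more weight than carrot.snippet. If only one field is used for the computation, carrot.snippet can be dropped (remove the parameter entirely rather than leaving its value empty). Once everything is configured, query with the following URL: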

http://localhost:8080/skyCore/clustering?q=*%3A*&wt=xml&indent=true
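For reading the clusters programmatically, the sketch below builds the same request URL and extracts cluster labels and document ids from an XML response. It is a minimal illustration under stated assumptions: the function names are hypothetical, and `SAMPLE_RESPONSE` is a hand-written stand-in for the `clusters` section of a Solr response, so the exact shape should be verified against the output of your own instance.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "http://localhost:8080/skyCore/clustering"  # host and core from the article

# Hand-written stand-in for a clustering response (assumed shape, made-up values).
SAMPLE_RESPONSE = """
<response>
  <arr name="clusters">
    <lst>
      <arr name="labels"><str>Label A</str></arr>
      <double name="score">1.5</double>
      <arr name="docs"><str>doc1</str><str>doc2</str></arr>
    </lst>
    <lst>
      <arr name="labels"><str>Other Topics</str></arr>
      <arr name="docs"><str>doc3</str></arr>
    </lst>
  </arr>
</response>
"""


def clustering_url(query="*:*", base_url=BASE_URL):
    """Build the request URL with properly encoded query parameters."""
    return base_url + "?" + urlencode({"q": query, "wt": "xml", "indent": "true"})


def parse_clusters(xml_text):
    """Return a list of (labels, document ids) tuples, one per cluster."""
    root = ET.fromstring(xml_text)
    clusters = []
    for lst in root.findall("arr[@name='clusters']/lst"):
        labels = [s.text for s in lst.findall("arr[@name='labels']/str")]
        docs = [s.text for s in lst.findall("arr[@name='docs']/str")]
        clusters.append((labels, docs))
    return clusters


if __name__ == "__main__":
    print(clustering_url())
    for labels, docs in parse_clusters(SAMPLE_RESPONSE):
        print(", ".join(labels), "->", docs)
```

Against a running server, the response body could be fetched with `urllib.request.urlopen(clustering_url())` and its decoded text passed to `parse_clusters`.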
