This article explains how to fix a runtime error in SimpleKMeansClustering, walking through a concrete example: the environment, the error, the fix, and the full source file.
Environment

Software    Version
hadoop      0.20.2
mahout      0.4
eclipse     Kepler Service Release 1
Error:
ClassNotFoundException: org.apache.mahout.math.function.IntDoubleProcedure
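A ClassNotFoundException means the JVM could not find the named class on the runtime classpath. The sketch below is one way to check this from inside the same Eclipse project: it tries to resolve a class by name and reports where it was loaded from. The class name ClassLocator and the method locate are hypothetical helpers made up for illustration, not part of Mahout or Hadoop.

```java
import java.security.CodeSource;

public class ClassLocator {
    /**
     * Returns the location (jar or directory) a class was loaded from,
     * or null if the class cannot be found on the classpath.
     * Hypothetical helper for illustration only.
     */
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers
            Class<?> c = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK itself usually have no code source
            return src == null ? "(JDK runtime)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "org.apache.mahout.math.function.IntDoubleProcedure";
        String where = locate(name);
        if (where == null) {
            System.out.println(name + " is NOT on the classpath");
        } else {
            System.out.println(name + " was loaded from " + where);
        }
    }
}
```

Run with the same classpath as SimpleKMeansClustering, it should report the class as missing until the jar that contains it has been added.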
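Solution:
At first I assumed that IntDoubleProcedure was in mahout-math-0.4.jar, but checking showed that the class is not in that jar.
It turned out that IntDoubleProcedure lives in mahout-collections-1.0.jar. After adding mahout-collections-1.0.jar to the project's classpath, the error above no longer occurs.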
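The search for the right jar can also be automated. The following is a minimal sketch, assuming all candidate jars sit directly in one directory (the default directory name "lib" is an assumption, as are the names JarClassFinder, entryName and findJars). It relies only on the standard java.util.jar API and the fact that a class is stored in a jar under its package path with a .class suffix.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

public class JarClassFinder {
    /** Converts a fully qualified class name to its entry path inside a jar. */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns the names of all jars directly inside dir that contain className. */
    static List<String> findJars(File dir, String className) {
        List<String> hits = new ArrayList<>();
        File[] files = dir.listFiles();
        if (files == null) {
            return hits; // not a directory, or not readable
        }
        String entry = entryName(className);
        for (File f : files) {
            if (!f.isFile() || !f.getName().endsWith(".jar")) {
                continue;
            }
            try (JarFile jar = new JarFile(f)) {
                if (jar.getEntry(entry) != null) {
                    hits.add(f.getName());
                }
            } catch (IOException e) {
                // Unreadable or corrupt archive: skip it in this sketch
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "lib" is an assumed default; pass the real jar directory as the first argument
        File dir = new File(args.length > 0 ? args[0] : "lib");
        String name = "org.apache.mahout.math.function.IntDoubleProcedure";
        System.out.println(name + " found in: " + findJars(dir, name));
    }
}
```

Pointed at the directory holding the Mahout 0.4 jars, this should list mahout-collections-1.0.jar and not mahout-math-0.4.jar, matching the finding above.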
Source file:
package com.mahout.cluster;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.clustering.WeightedVectorWritable;
import org.apache.mahout.clustering.kmeans.Cluster;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;
public class SimpleKMeansClustering {
    // Nine 2-D sample points: five in the lower-left corner, four in the upper-right
    public static final double[][] points = { {1, 1}, {2, 1}, {1, 2},
                                              {2, 2}, {3, 3}, {8, 8},
                                              {9, 8}, {8, 9}, {9, 9}};
    // Writes the vectors to a SequenceFile keyed by a running record number
    public static void writePointsToFile(List<Vector> points,
                                         String fileName,
                                         FileSystem fs,
                                         Configuration conf) throws IOException {
        Path path = new Path(fileName);
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,
                path, LongWritable.class, VectorWritable.class);
        long recNum = 0;
        VectorWritable vec = new VectorWritable();
        for (Vector point : points) {
            vec.set(point);
            writer.append(new LongWritable(recNum++), vec);
        }
        writer.close();
    }
    // Converts the raw double arrays into Mahout vectors
    public static List<Vector> getPoints(double[][] raw) {
        List<Vector> points = new ArrayList<Vector>();
        for (int i = 0; i < raw.length; i++) {
            double[] fr = raw[i];
            Vector vec = new RandomAccessSparseVector(fr.length);
            vec.assign(fr);
            points.add(vec);
        }
        return points;
    }
    public static void main(String args[]) throws Exception {
        int k = 3;
        List<Vector> vectors = getPoints(points);

        // Create the input directories if they do not exist yet
        File testData = new File("testdata");
        if (!testData.exists()) {
            testData.mkdir();
        }
        testData = new File("testdata/points");
        if (!testData.exists()) {
            testData.mkdir();
        }

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        writePointsToFile(vectors, "testdata/points/file1", fs, conf);

        // Use the first k points as the initial cluster centers
        Path path = new Path("testdata/clusters/part-00000");
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,
                path, Text.class, Cluster.class);
        for (int i = 0; i < k; i++) {
            Vector vec = vectors.get(i);
            Cluster cluster = new Cluster(vec, i, new EuclideanDistanceMeasure());
            writer.append(new Text(cluster.getIdentifier()), cluster);
        }
        writer.close();

        // Run k-means: convergence delta 0.001, at most 10 iterations
        KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"),
                new Path("output"), new EuclideanDistanceMeasure(), 0.001, 10,
                true, false);

        // Read back the clustered points and print the cluster each one belongs to
        SequenceFile.Reader reader = new SequenceFile.Reader(fs,
                new Path("output/" + Cluster.CLUSTERED_POINTS_DIR
                        + "/part-m-00000"), conf);
        IntWritable key = new IntWritable();
        WeightedVectorWritable value = new WeightedVectorWritable();
        while (reader.next(key, value)) {
            System.out.println(value.toString() + " belongs to cluster "
                    + key.toString());
        }
        reader.close();
    }
}
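To make it easier to see what the listing asks Mahout to compute, here is a minimal plain-Java sketch of k-means on the same nine points, with no Mahout or Hadoop dependency. It mirrors the listing's choices (first k points as initial centers, Euclidean distance, convergence delta 0.001, at most 10 iterations), but it is not Mahout's implementation: the class PlainKMeans and its methods are made up for illustration, and leaving an empty cluster's center where it is counts as a simplifying assumption.

```java
import java.util.Arrays;

public class PlainKMeans {
    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns, for each point, the index of the cluster it ends up in. */
    static int[] cluster(double[][] points, int k, double delta, int maxIter) {
        int dim = points[0].length;
        // Initial centers: copies of the first k points, as in the listing above
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = points[c].clone();
        }
        int[] assign = new int[points.length];
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: attach each point to its nearest center
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestDist = distance(points[p], centers[0]);
                for (int c = 1; c < k; c++) {
                    double d = distance(points[p], centers[c]);
                    if (d < bestDist) {
                        best = c;
                        bestDist = d;
                    }
                }
                assign[p] = best;
            }
            // Update step: move each center to the mean of its assigned points
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assign[p]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assign[p]][d] += points[p][d];
                }
            }
            double maxShift = 0;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // assumption: an empty cluster keeps its old center
                }
                double[] next = new double[dim];
                for (int d = 0; d < dim; d++) {
                    next[d] = sums[c][d] / counts[c];
                }
                maxShift = Math.max(maxShift, distance(centers[c], next));
                centers[c] = next;
            }
            // Stop once no center moved by more than the convergence delta
            if (maxShift <= delta) {
                break;
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        double[][] points = { {1, 1}, {2, 1}, {1, 2},
                              {2, 2}, {3, 3}, {8, 8},
                              {9, 8}, {8, 9}, {9, 9} };
        int[] assign = cluster(points, 3, 0.001, 10);
        for (int p = 0; p < points.length; p++) {
            System.out.println(Arrays.toString(points[p])
                    + " belongs to cluster " + assign[p]);
        }
    }
}
```

In this sketch the five lower-left points and the four upper-right points end up in two different clusters, and with k = 3 the third center attracts no points. Mahout's own output for the listing may differ in cluster ids and in how it treats such a cluster.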