Ian Hannah
Hi,
We are using ML.NET, the Microsoft machine learning library (Microsoft.ML), and have the following script working:
var trainingData = mlContext.Data.LoadFromTextFile<CSOData>(
    path: @"C:\Users\Administrator\source\repos\WindowsFormsApp1\WindowsFormsApp1\level-data - reduced.txt",
    hasHeader: false,
    separatorChar: ',');

// set up a learning pipeline
// step 1: concatenate input features into a single column
var pipeline = mlContext.Transforms.Concatenate(
        "Features",
        "Level")
    // step 2: use the k-means clustering algorithm
    // assume there are 3 clusters
    .Append(mlContext.Clustering.Trainers.KMeans(
        "Features",
        numberOfClusters: 3));

// train the model on the data file
Debug.WriteLine("Start training model....");
TransformerChain<ClusteringPredictionTransformer<KMeansModelParameters>> model = pipeline.Fit(trainingData);
Debug.WriteLine("Model training complete!");

// transform the data (adds the predicted cluster for each row)
IDataView transformedData = model.Transform(trainingData);

// read the centroids from the trained k-means parameters;
// the transformer exposes them through its Model property, so no reflection is needed
VBuffer<float>[] centroids = null;
var last = model.LastTransformer;
KMeansModelParameters kparams = last.Model;
kparams.GetClusterCentroids(ref centroids, out int k);

// there is only one feature ("Level"), so each centroid holds a single value
float cluster1 = centroids[0].GetValues().ToArray().FirstOrDefault();
float cluster2 = centroids[1].GetValues().ToArray().FirstOrDefault();
float cluster3 = centroids[2].GetValues().ToArray().FirstOrDefault();
Debug.WriteLine(cluster1);
Debug.WriteLine(cluster2);
Debug.WriteLine(cluster3);
This gives us the centroid of each cluster. What we also need is the number of samples in each cluster and the withinss value (the within-cluster sum of squares) for each cluster, but we cannot work out how to get them.
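To be clear about what we are after, here is a minimal hand-rolled sketch of those two quantities (sample count and within-cluster sum of squares) for our one-feature case. It does not use any ML.NET API: ClusterStats and Compute are names we made up, the sample numbers are invented, and we are assuming the cluster ids would come from the PredictedLabel column of transformedData as 1-based values. Ideally we would read these from the model rather than recompute them.

```csharp
using System;

public static class ClusterStats
{
    // values:    one feature value per sample (our single "Level" feature)
    // labels:    cluster id per sample, assumed 1-based (0 = no cluster assigned)
    // centroids: one centroid value per cluster, index 0 = cluster 1
    public static (int[] Counts, double[] WithinSS) Compute(
        float[] values, uint[] labels, float[] centroids)
    {
        if (values.Length != labels.Length)
            throw new ArgumentException("values and labels must have the same length");

        var counts = new int[centroids.Length];
        var withinSS = new double[centroids.Length];

        for (int i = 0; i < values.Length; i++)
        {
            // skip samples with a missing or out-of-range cluster id
            if (labels[i] == 0 || labels[i] > centroids.Length) continue;

            int c = (int)labels[i] - 1;
            double diff = values[i] - centroids[c];
            counts[c]++;                 // number of samples in the cluster
            withinSS[c] += diff * diff;  // squared distance to the cluster centroid
        }
        return (counts, withinSS);
    }

    public static void Main()
    {
        // invented sample data, not taken from our data file
        float[] values = { 1f, 3f, 10f, 12f, 14f };
        uint[] labels = { 1, 1, 2, 2, 2 };
        float[] centroids = { 2f, 12f };

        var (counts, withinSS) = Compute(values, labels, centroids);
        for (int c = 0; c < counts.Length; c++)
            Console.WriteLine($"cluster {c + 1}: n = {counts[c]}, withinss = {withinSS[c]}");
        // prints:
        // cluster 1: n = 2, withinss = 2
        // cluster 2: n = 3, withinss = 8
    }
}
```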
Does anyone know how to access these values?
Regards
Ian Hannah