Class ClusterFinder


  • public class ClusterFinder
    extends java.lang.Object
    ClusterFinder assigns HBase rows to clusters produced by an earlier clustering run; the scaler, PCA and cluster-center parameters are read from JSON model files.
    Author:
    J.Hrivnac
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private org.apache.commons.math3.linear.RealMatrix _clusterCenters  
      private double[] _explainedVariance  
      private double[] _mean  
      private org.apache.commons.math3.linear.RealMatrix _pcaComponents  
      private static double _separation  
      private double[] _std  
      private static org.apache.logging.log4j.Logger log
      Logging.
    • Constructor Summary

      Constructors 
      Constructor Description
      ClusterFinder(java.lang.String scalerFile, java.lang.String pcaFile, java.lang.String clustersFile)
    • Method Summary

      Modifier and Type Method Description
      private double[] applyPCA(double[] standardizedInput)
      private int findClosestCluster(double[] transformedData)
      Find the closest cluster to the transformed data.
      private void loadClusterCenters(java.lang.String filePath)
      private void loadPCAParams(java.lang.String filePath)
      private void loadScalerParams(java.lang.String filePath)
      static void main(java.lang.String[] args)
      private static void setSeparation(double separation)
      Set the minimal separation quotient.
      private double[] standardize(double[] input)
      int transformAndPredict(double[] inputData)
      Transform the provided data array and find the closest cluster.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • _mean

        private double[] _mean
      • _std

        private double[] _std
      • _pcaComponents

        private org.apache.commons.math3.linear.RealMatrix _pcaComponents
      • _clusterCenters

        private org.apache.commons.math3.linear.RealMatrix _clusterCenters
      • log

        private static org.apache.logging.log4j.Logger log
        Logging.
    • Constructor Detail

      • ClusterFinder

        public ClusterFinder(java.lang.String scalerFile,
                             java.lang.String pcaFile,
                             java.lang.String clustersFile)
                      throws java.io.IOException
        Throws:
        java.io.IOException
    • Method Detail

      • main

        public static void main(java.lang.String[] args)
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • loadScalerParams

        private void loadScalerParams(java.lang.String filePath)
                               throws java.io.IOException
        Throws:
        java.io.IOException
      • loadPCAParams

        private void loadPCAParams(java.lang.String filePath)
                            throws java.io.IOException
        Throws:
        java.io.IOException
      • loadClusterCenters

        private void loadClusterCenters(java.lang.String filePath)
                                 throws java.io.IOException
        Throws:
        java.io.IOException
      • standardize

        private double[] standardize(double[] input)
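A minimal sketch of what a z-score standardization step like `standardize` typically does, assuming the scaler file provides one mean and one standard deviation per input feature (cf. the `_mean` and `_std` fields). Treating a zero standard deviation as 1 is an assumption added here to avoid division by zero; the class name `StandardizeSketch` is hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of z-score standardization.
 * Assumes one mean and one standard deviation per feature, as in _mean and _std.
 */
final class StandardizeSketch {

    /** Returns a new array with (input[i] - mean[i]) / std[i] for every feature i. */
    static double[] standardize(double[] input, double[] mean, double[] std) {
        if (input.length != mean.length || input.length != std.length) {
            throw new IllegalArgumentException("Feature count mismatch");
        }
        double[] out = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            // Assumption: a zero standard deviation is treated as 1 to avoid division by zero.
            double scale = std[i] == 0.0 ? 1.0 : std[i];
            out[i] = (input[i] - mean[i]) / scale;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] z = standardize(new double[] {3.0, 10.0, 5.0},
                                 new double[] {1.0, 4.0, 5.0},
                                 new double[] {2.0, 3.0, 0.0});
        System.out.println(Arrays.toString(z)); // [1.0, 2.0, 0.0]
    }
}
```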
      • applyPCA

        private double[] applyPCA(double[] standardizedInput)
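A minimal sketch of a PCA projection step, assuming the PCA file stores a k x n component matrix with one principal axis per row (the layout of scikit-learn's `components_`; whether the model files follow it is an assumption) and that no further centering is applied because the input is already standardized. If the PCA model also stores its own mean vector, it would have to be subtracted first. With the `RealMatrix` field, the same product is a single `operate(double[])` call; plain arrays keep the sketch dependency-free. `PcaProjectionSketch` is a hypothetical name.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of projecting a standardized vector onto principal axes.
 * components[k][n]: each row is one principal axis over n input features (assumed layout).
 */
final class PcaProjectionSketch {

    /** Returns the k projections, i.e. the matrix-vector product components * standardized. */
    static double[] applyPCA(double[] standardized, double[][] components) {
        double[] projected = new double[components.length];
        for (int k = 0; k < components.length; k++) {
            if (components[k].length != standardized.length) {
                throw new IllegalArgumentException("Component " + k + " has the wrong length");
            }
            double sum = 0.0;
            for (int j = 0; j < standardized.length; j++) {
                sum += components[k][j] * standardized[j];
            }
            projected[k] = sum;
        }
        return projected;
    }

    public static void main(String[] args) {
        double[][] components = {{1.0, 0.0, 0.0}, {0.0, 0.6, 0.8}};
        double[] p = applyPCA(new double[] {2.0, 1.0, 1.0}, components);
        System.out.println(Arrays.toString(p));
    }
}
```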
      • findClosestCluster

        private int findClosestCluster(double[] transformedData)
        Find the closest cluster to the transformed data.
        Parameters:
        transformedData - The transformed input data.
        Returns:
        The number of the closest cluster, or -1 if it is not separated clearly enough from the second closest cluster.
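A minimal sketch of nearest-center assignment with the separation (ratio) test described for `setSeparation`. It assumes Euclidean distance and cluster centers expressed in the same PCA space as the point; the actual metric is not stated in this documentation. `ClosestClusterSketch` and its parameters are hypothetical.

```java
/**
 * Hypothetical sketch: nearest cluster center plus a closest/second-closest ratio test.
 * Assumes Euclidean distance and centers in the same space as the point.
 */
final class ClosestClusterSketch {

    /** Euclidean distance; assumes both arrays have the same length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Returns the index of the nearest center, or -1 if it is not clearly closer than the runner-up. */
    static int findClosestCluster(double[] point, double[][] centers, double separation) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        double secondDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double d = distance(point, centers[c]);
            if (d < bestDist) {
                secondDist = bestDist; // previous best becomes the runner-up
                bestDist = d;
                best = c;
            } else if (d < secondDist) {
                secondDist = d;
            }
        }
        if (best < 0) {
            return -1; // no centers at all
        }
        // A single center leaves secondDist infinite, so the ratio is 0 and the result is accepted.
        // separation >= 1 disables the test ("1 gives no restriction").
        boolean reliable = separation >= 1.0 || bestDist / secondDist < separation;
        return reliable ? best : -1;
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {10.0, 0.0}};
        System.out.println(findClosestCluster(new double[] {1.0, 0.0}, centers, 0.5)); // 0
        System.out.println(findClosestCluster(new double[] {4.0, 0.0}, centers, 0.5)); // -1
        System.out.println(findClosestCluster(new double[] {4.0, 0.0}, centers, 1.0)); // 0
    }
}
```

For the point (4, 0) the distances are 4 and 6, a ratio of about 0.67, so it is rejected at the default 0.5 but accepted when the separation is 1.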
      • transformAndPredict

        public int transformAndPredict(double[] inputData)
        Transform the provided data array and find the closest cluster.
        Parameters:
        inputData - The original (untransformed) input data.
        Returns:
        The number of the closest cluster, or -1 if it is not separated clearly enough from the second closest cluster.
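A minimal end-to-end sketch of how a `transformAndPredict`-style pipeline could chain the steps: standardize, project onto the principal axes, then pick the nearest center with the separation test. Model parameters are passed in directly instead of being read from the JSON files; the component layout (one axis per row), the Euclidean metric and the class name `PipelineSketch` are assumptions, not the class's confirmed internals.

```java
/**
 * Hypothetical sketch of the pipeline: standardize -> PCA projection -> nearest center.
 * The real class reads these parameters from JSON model files instead.
 */
final class PipelineSketch {
    private final double[] mean;
    private final double[] std;
    private final double[][] components; // k x n, one principal axis per row (assumed)
    private final double[][] centers;    // m x k, cluster centers in PCA space (assumed)
    private final double separation;

    PipelineSketch(double[] mean, double[] std, double[][] components,
                   double[][] centers, double separation) {
        this.mean = mean;
        this.std = std;
        this.components = components;
        this.centers = centers;
        this.separation = separation;
    }

    int transformAndPredict(double[] input) {
        // 1. Standardize every feature (z-score).
        double[] z = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            z[i] = (input[i] - mean[i]) / std[i];
        }
        // 2. Project the standardized vector onto the principal axes.
        double[] p = new double[components.length];
        for (int k = 0; k < components.length; k++) {
            for (int j = 0; j < z.length; j++) {
                p[k] += components[k][j] * z[j];
            }
        }
        // 3. Track the closest and second-closest Euclidean distances.
        int best = -1;
        double d1 = Double.POSITIVE_INFINITY;
        double d2 = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double sum = 0.0;
            for (int k = 0; k < p.length; k++) {
                double diff = p[k] - centers[c][k];
                sum += diff * diff;
            }
            double d = Math.sqrt(sum);
            if (d < d1) {
                d2 = d1;
                d1 = d;
                best = c;
            } else if (d < d2) {
                d2 = d;
            }
        }
        if (best < 0) {
            return -1; // no centers
        }
        // 4. Accept only if the closest center is clearly closer than the runner-up.
        boolean reliable = separation >= 1.0 || d1 / d2 < separation;
        return reliable ? best : -1;
    }

    public static void main(String[] args) {
        PipelineSketch finder = new PipelineSketch(
            new double[] {2.0, 0.0}, new double[] {2.0, 1.0},
            new double[][] {{1.0, 0.0}, {0.0, 1.0}},   // identity projection for illustration
            new double[][] {{0.0, 0.0}, {10.0, 0.0}},
            0.5);
        System.out.println(finder.transformAndPredict(new double[] {4.0, 0.0}));  // 0
        System.out.println(finder.transformAndPredict(new double[] {10.0, 0.0})); // -1
    }
}
```

Here the input (10, 0) standardizes to (4, 0), which lies 4 and 6 units from the two centers, so the ratio test rejects it.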
      • setSeparation

        private static void setSeparation(double separation)
        Set the minimal separation quotient.
        Parameters:
        separation - The minimal separation quotient. The ratio of the distance to the closest cluster to the distance to the second closest cluster must be smaller than separation; otherwise the assignment is not considered reliable. A value of 1 imposes no restriction. The default is 0.5.
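The separation rule can be written as follows, assuming Euclidean distances d_j between the transformed point y and each cluster center c_j (the metric itself is an assumption), with d_(1) and d_(2) the smallest and second smallest of them and s the separation:

```latex
d_j = \lVert \mathbf{y} - \mathbf{c}_j \rVert_2, \qquad d_{(1)} \le d_{(2)} \le \dots

\text{result} =
\begin{cases}
  \arg\min_j d_j & \text{if } d_{(1)} / d_{(2)} < s,\\
  -1             & \text{otherwise.}
\end{cases}
```

With the default s = 0.5, distances of 2 and 5 give a ratio of 0.4, so the closest cluster is accepted; distances of 3 and 4 give 0.75, so the result is -1. Since d_(1) never exceeds d_(2), the ratio is at most 1, which is why s = 1 removes the restriction.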