Where traditional Data Mining methods fail

Specifics of the innovative SRMD technology

Discovery of so-called “strong” logical “if-then” rules (the most complete rules at a given accuracy) in multidimensional data is a fundamental scientific and practical problem.

This problem is widely believed to be solvable only by exhaustive search over combinations of elementary logical events. Such a search requires a tremendous amount of computing power and is hardly feasible even on modern supercomputers. To obtain a result, traditional methods either impose artificial restrictions on the search or use “greedy” (locally optimal) algorithms such as decision trees.
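To make the cost of exhaustive search concrete, here is a minimal Python sketch of the brute-force approach described above. It is not the SRMD method. It assumes categorical attributes, treats an "elementary logical event" as a test of the form attribute == value, and measures a rule by accuracy (share of covered objects that belong to the target class) and completeness (share of target-class objects that the rule covers). The function names and the toy data are hypothetical.

```python
from itertools import combinations, product
from math import comb


def candidate_count(num_attrs, max_len, values_per_attr=2):
    """Number of conjunctions of up to max_len events that must be tried."""
    return sum(comb(num_attrs, k) * values_per_attr ** k
               for k in range(1, max_len + 1))


def exhaustive_rule_search(rows, target, max_len, min_accuracy):
    """Try every conjunction of (attribute == value) events up to max_len.

    Returns the most complete rule whose accuracy is at least min_accuracy,
    together with the number of candidate rules that were evaluated.
    """
    attrs = [a for a in rows[0] if a != target]
    values = {a: sorted({r[a] for r in rows}) for a in attrs}
    positives = sum(1 for r in rows if r[target])
    best, tried = None, 0
    for k in range(1, max_len + 1):
        for chosen in combinations(attrs, k):
            for vals in product(*(values[a] for a in chosen)):
                tried += 1
                cond = dict(zip(chosen, vals))
                covered = [r for r in rows
                           if all(r[a] == v for a, v in cond.items())]
                if not covered:
                    continue
                hits = sum(1 for r in covered if r[target])
                accuracy = hits / len(covered)
                completeness = hits / positives if positives else 0.0
                if accuracy >= min_accuracy and (
                        best is None or completeness > best[1]):
                    best = (cond, completeness, accuracy)
    return best, tried


# Hypothetical toy data: the target "y" holds exactly when a == 1 and c == 1.
rows = [
    {"a": 1, "b": 0, "c": 1, "y": True},
    {"a": 1, "b": 1, "c": 1, "y": True},
    {"a": 1, "b": 0, "c": 0, "y": False},
    {"a": 0, "b": 1, "c": 1, "y": False},
    {"a": 0, "b": 0, "c": 0, "y": False},
    {"a": 1, "b": 1, "c": 0, "y": False},
]
best_rule, tried = exhaustive_rule_search(rows, "y", max_len=2, min_accuracy=1.0)
print(best_rule, tried)
```

On this toy table the search evaluates every candidate that `candidate_count(3, 2)` predicts. The count grows combinatorially with the number of attributes and the allowed rule length, which is why practical systems restrict the search or fall back on greedy heuristics.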

Over years of theoretical research and practical studies, we have demonstrated much better results by searching for an effect of structural resonance at a certain step of a multidimensional data aggregation algorithm. The structural resonance (SR) effect is an abrupt change in the homology characteristics of object groups, which can be found at different points of the multidimensional object description space.

This effect makes it possible to discover strong “if-then” rules in multidimensional data that reflect complex systemic relations usually invisible to traditional methods.

Deep Data Diver characteristics

Deep Data Diver applies the innovative SRMD technology to solve multidimensional data classification problems.

1. Accuracy
For data with no complex systemic relations inside (typical for some business domains) Deep Data Diver shows at least the same accuracy as traditional methods. “The hardest thing of all is to find a black cat in a dark room, especially if there is no cat.” — Confucius.
Applied in areas with complex systemic interrelations (bioinformatics, medicine, chemometrics), Deep Data Diver produces results with significantly better accuracy. This is reflected in a noticeably higher value of AUC[CROC], the area under the ROC curve.
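For readers unfamiliar with the metric, the area under the ROC curve can be computed directly from classifier scores. The sketch below uses the standard rank interpretation: AUC equals the probability that a randomly chosen positive object receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the sample scores are illustrative and do not come from Deep Data Diver.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Compares every positive score with every negative score; a positive
    scoring higher counts 1, a tie counts 0.5, otherwise 0.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative object")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative scores: one positive (0.4) is ranked below one negative (0.5).
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(example_auc)  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so "higher AUC" means the classifier orders positive objects ahead of negative ones more consistently.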

2. Computational complexity
Limited search algorithms: exponential complexity.
Decision trees: O(PNL).
Deep Data Diver: O(PN).
Here P is the number of attributes, N is the number of objects, and L is the tree depth.
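As a rough illustration of what these bounds mean, the sketch below plugs hypothetical sizes into the leading terms. Big-O notation hides constant factors, so the numbers compare growth only and are not measured running times; the sizes chosen are assumptions for the example.

```python
def tree_cost(P, N, L):
    """Leading term of the decision-tree bound O(PNL)."""
    return P * N * L


def linear_cost(P, N):
    """Leading term of the O(PN) bound quoted for Deep Data Diver."""
    return P * N


# Hypothetical problem size: 100 attributes, one million objects, depth 10.
P, N, L = 100, 1_000_000, 10
ratio = tree_cost(P, N, L) / linear_cost(P, N)
print(tree_cost(P, N, L), linear_cost(P, N), ratio)
```

The ratio of the two leading terms is simply L, so under these bounds the gap widens as trees get deeper.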

3. Result visibility
Strong rules are much more readily understood and accepted by analysts, which gives the Deep Data Diver system a clear advantage over competitors.

Big Data KDK

• Interpretability and objectivity of clusters. Clusters, localized as groups in local spaces, have a clear interpretation in terms of logical rules and reflect the analysis context defined by the target parameters.

• Parallelism. The cluster analysis procedure is arranged as a set of similar algorithms for building local metrics, which can naturally be run in parallel.

• Solution for the “incomplete object description” problem. Missing attribute values, a common problem in Big Data, are handled by building a separate local description space for each selected object.

• Dynamic training. Updates to the database are checked for membership in already determined clusters and, if necessary, new local spaces are built for this data and new clusters are found.

• Data validation. The procedure of building local context-sensitive metrics allows the information value of individual objects, and of the data array as a whole, to be checked.
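The source does not say how a local description space is built, so the following Python sketch shows only one simple reading of the “incomplete object description” point above. It assumes that the local space of a selected object consists of the attributes actually observed for that object, and that other objects are compared with it only on attributes both have. The function names, the distance formula, and the data are all hypothetical.

```python
import math


def local_attributes(obj):
    """Attributes that are actually observed (not None) for this object."""
    return [a for a, v in obj.items() if v is not None]


def local_distance(anchor, other):
    """Root-mean-square difference over attributes observed in both objects.

    Only the anchor's own observed attributes define the space; objects
    sharing no observed attribute with the anchor are infinitely far away.
    """
    shared = [a for a in local_attributes(anchor) if other.get(a) is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((anchor[a] - other[a]) ** 2 for a in shared)
                     / len(shared))


# Hypothetical objects with missing values marked as None.
anchor = {"x": 1.0, "y": None, "z": 3.0}
other = {"x": 4.0, "y": 2.0, "z": None}
print(local_attributes(anchor), local_distance(anchor, other))
```

The point of the sketch is that no values are imputed: each object is described only in terms of what is known about it, which is the idea the bullet attributes to local description spaces.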

Big Basket

The Big Basket system is aimed at market basket analysis. It uses a new technology for finding association rules, based on the notion of local geometry and the effect of structural resonance in multidimensional data. The system’s unique abilities allow it to determine highly accurate associations between elements of the base transaction set and a given element. These sets form a basket with high support and long itemsets.
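For context, the quantities usually used in market basket analysis can be sketched in a few lines of Python. The example below is a plain brute-force search, not the structural resonance technique used by Big Basket. It looks for itemsets associated with a given element and reports the standard support (share of all transactions containing the itemset together with the element) and confidence (share of transactions containing the itemset that also contain the element). The function name, thresholds, and transactions are illustrative assumptions.

```python
from itertools import combinations


def rules_for_item(transactions, target, max_len, min_support, min_confidence):
    """Brute-force search for rules of the form itemset -> target."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    items = sorted({i for b in baskets for i in b if i != target})
    found = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            a = frozenset(antecedent)
            n_a = sum(1 for b in baskets if a <= b)
            if n_a == 0:
                continue
            n_both = sum(1 for b in baskets if a <= b and target in b)
            support = n_both / n
            confidence = n_both / n_a
            if support >= min_support and confidence >= min_confidence:
                found.append((antecedent, support, confidence))
    return found


# Hypothetical transaction set.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]
milk_rules = rules_for_item(transactions, "milk", max_len=2,
                            min_support=0.5, min_confidence=0.7)
for antecedent, support, confidence in milk_rules:
    print(antecedent, "-> milk", support, confidence)
```

Because every itemset up to `max_len` is enumerated, this approach becomes impractical for long itemsets, which is the regime the paragraph above says Big Basket targets.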