Interpretability of machine learning models
Vyacheslav A. Duke, scientific director of Deep Patterns, gave a presentation, «Logical methods of machine learning», at the «Transportation problem and transport facilities» session held at the House of Scientists on January 16, 2019.
The first part of the presentation gave an overall picture of modern machine learning methods, which are normally regarded as an important part of artificial intelligence systems. It was noted that a significant number of such methods offer no clear interpretation of their results; among them are discriminant analysis (on heterogeneous data sets), the neural network approach, the nearest neighbors method, support vector machines, etc. Machine learning practice of the last decade has shown that the requirement of result interpretability has become a secondary issue.
The emphasis shifted to the stability of the obtained models. This trend gave rise to ensemble approaches, in which hundreds or even thousands of models built by different methods and algorithms are combined into a single solution. It turned out that such committees of even «weak» algorithms (often built with boosting, bagging, the random subspace method, and stacking) can show the same or even better accuracy than isolated, so-called «strong» algorithms aimed at finding deep, consistent patterns in data arrays. A minimal sketch of this effect is given below.
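As an illustration of this point (not part of the presentation), the following sketch compares a single deep decision tree with a bagging committee of shallow «weak» trees using scikit-learn; the synthetic data set and all parameters are arbitrary assumptions made for the example.

```python
# Illustrative sketch (assumed setup): a committee of "weak" shallow trees
# vs. a single deeper tree, using scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real data array.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single, isolated learner grown to full depth.
single = DecisionTreeClassifier(max_depth=None, random_state=0)
single.fit(X_train, y_train)

# A bagging committee of 500 shallow ("weak") trees.
# (In scikit-learn versions before 1.2 this parameter was `base_estimator`.)
committee = BaggingClassifier(estimator=DecisionTreeClassifier(max_depth=2),
                              n_estimators=500, random_state=0)
committee.fit(X_train, y_train)

print("single tree accuracy:", single.score(X_test, y_test))
print("committee accuracy:  ", committee.score(X_test, y_test))
```

On data like this the committee typically matches or exceeds the single tree, while remaining far harder to interpret: the point the presentation goes on to make.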
This tendency was reflected, in particular, in a New York Times publication of November 13, 2017, where a list of modern technologies that will change the world was headed by artificial intelligence methods in healthcare. At the same time, the article emphasizes the prospects of deep learning neural networks, based mainly on the effectiveness of this approach in visual pattern recognition tasks. This view of artificial intelligence overlooks the fact that many medical diagnostics problems require not only a formal result, such as diagnostic model accuracy, but also an explanation of that result, i.e., its interpretability.
The neural network approach is not the only one that fails the interpretability requirement. Many other modern artificial intelligence methods are based on building large ensembles of «weak» algorithms, implementing so-called «swarm intelligence» and producing «black box» models that contribute little to understanding the essence of diagnostic and prognostic decisions.
Thus, significant changes have taken place in the machine learning world over the last decade. Discovering «strong» methods and algorithms is no longer attractive to most machine learning professionals; their interest has shifted to applying large ensembles of «weak» methods and algorithms to big data. This trend deserves separate analysis: it is an obvious deviation from the basic machine learning ideal of extracting knowledge from data rather than building «black box» models.
At the same time, the problem of building «strong» models remains relevant. Strong models are clearly better suited to interpretation and can be combined into far less bulky collections while still achieving effective results on classification and prognosis tasks. Of special value in this field are logical analysis methods, which express interconnections in data as «if-then» rules, allowing the essence of the built models to be extracted and understood; a sketch of such rule extraction is given below.
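As an illustration (not from the presentation), readable «if-then» rules can be extracted from a fitted decision tree with scikit-learn's `export_text`; the data set and depth limit here are assumptions made for the example.

```python
# Illustrative sketch (assumed data and parameters): extracting readable
# «if-then» rules from a fitted decision tree with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data.data, data.target)

# export_text prints the tree as nested if-then conditions over named
# features, which a domain expert can read and check directly.
print(export_text(tree, feature_names=list(data.feature_names)))
```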
Regular polls among data scientists conducted by the KDnuggets portal (www.kdnuggets.com) show the stable popularity of logical methods of data mining. The results of these polls show that modern data mining software tools employ a wide spectrum of procedures and special instruments for building and tuning large ensembles of decision rules. At the same time, logical data analysis methods remain among the top three approaches, which are far ahead of the others in terms of usage frequency.
In the next part of the presentation, Prof. V. Duke described the characteristics of the most commonly used methods of finding logical rules in data, decision (classification) trees, and showed that these methods can give reliable solutions only for simple data structures. Another popular approach to building logical rules reviewed in the presentation is the selection of binary features (elementary predicates), implemented in practice as procedures of limited search in the set of these elementary predicates; a toy version of such a search is sketched below. Special attention was paid to the evolutionary approach (mainly genetic algorithms).
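The following sketch shows what a limited (depth-bounded) search over elementary predicates might look like; the binary data, the hidden target rule, and the confidence-based scoring are all invented for the illustration and are not the presenter's algorithm.

```python
# Illustrative sketch (assumed setup): depth-limited search over elementary
# binary predicates, scoring every conjunction of up to two predicates.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 6)).astype(bool)   # 6 binary features
y = X[:, 0] & X[:, 2]                                # hidden target rule

# Elementary predicates: each feature taken positively and negated.
predicates = [(j, v) for j in range(X.shape[1]) for v in (True, False)]

def holds(X, rule):
    """Boolean mask of rows on which every (feature, value) literal holds."""
    mask = np.ones(len(X), dtype=bool)
    for j, v in rule:
        mask &= (X[:, j] == v)
    return mask

# Limited search: conjunctions of one or two predicates only.
best_rule, best_score = None, -1.0
for depth in (1, 2):
    for rule in combinations(predicates, depth):
        mask = holds(X, rule)
        if mask.sum() == 0:              # contradictory pairs cover nothing
            continue
        score = y[mask].mean()           # rule confidence on covered rows
        if score > best_score:
            best_rule, best_score = rule, score

print("best rule:", best_rule, "confidence:", best_score)
```

Even this toy version makes the combinatorial cost visible: the number of candidate conjunctions grows rapidly with search depth, which is why practical systems limit the search or turn to evolutionary methods.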
The emphasis in the presentation was placed on the achievements of Deep Patterns, LLC, which promotes the innovative SRMD (Structural Resonance in Multidimensional Data) technology, having no analogs worldwide. The SRMD technology is based on the author's concept of context-dependent local metrics and the effect of structural resonance in multidimensional data. SRMD brings special value to areas characterized by complex systemic relations, where traditional machine learning methods fail to give reliable results.
The Deep Data Diver system, which implements SRMD technology for classification in multidimensional data, can be described as follows:
1. Accuracy: Deep Data Diver shows much better accuracy in areas with highly complex interrelations (bioinformatics, healthcare, chemometrics). This is reflected in a significant increase of the ROC AUC metric (area under the ROC curve).
2. Computational complexity: the algorithms used in Deep Data Diver run in O(PN) time, where P is the number of object properties and N is the number of objects in the data matrix, i.e., the cost grows linearly with the size of the data table.
3. Result interpretability: Deep Data Diver finds in multidimensional data so-called «strong» logical rules (the most complete for a given accuracy), which allow significantly better interpretation; the accuracy/completeness trade-off behind this notion is sketched after this list.
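To make the «most complete for a given accuracy» criterion concrete, the two quantities attached to a rule can be computed as follows. This sketch is an interpretation of the terminology, not the SRMD algorithm itself, and the rule and data are invented.

```python
# Illustrative sketch (not the SRMD algorithm): for an «if-then» rule,
# accuracy (confidence) is the share of covered objects belonging to the
# target class; completeness is the share of the target class that the
# rule covers. A «strong» rule maximizes completeness at a given accuracy.
import numpy as np

def rule_quality(covered, y, target=1):
    """covered: boolean mask of objects satisfying the rule's «if» part."""
    accuracy = (y[covered] == target).mean()    # confidence of the rule
    completeness = covered[y == target].mean()  # recall of the rule
    return accuracy, completeness

# Invented example: 10 objects, a rule covering 4 of them.
y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
covered = np.array([True, True, True, False, False,
                    True, False, False, False, False])
acc, comp = rule_quality(covered, y)
print(f"accuracy={acc:.2f}, completeness={comp:.2f}")  # 0.75 and 0.60
```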
At the end of the presentation, examples were given of solutions to practical tasks from different areas, including transportation problems, finding long associative chains in data, et cetera.