Top 10 Machine Learning (ML) Algorithms

A good article on Data Science Central, based on a survey of data scientists: "Top 10 Machine Learning Algorithms". What sets the article apart is its practical value: it lists the most popular machine learning algorithms from 2006 to 2016.

This was the subject of a question asked on Quora: What are the top 10 data mining or machine learning algorithms?

Some modern algorithms, such as collaborative filtering, recommendation engines, segmentation, or attribution modeling, are missing from the lists below. Algorithms from graph theory (to find the shortest path in a graph, or to detect connected components), from operations research (the simplex method, to optimize the supply chain), or from time series are not listed either. And I could not find MCMC (Markov Chain Monte Carlo) and related algorithms used to process hierarchical, spatio-temporal and other Bayesian models. What else is missing?

In 2006, the IEEE Conference on Data Mining identified the following as the top 10 ML algorithms:

  1. C4.5 (Decision Trees)
  2. k-Means (clustering)
  3. Support Vector Machines (SVM)
  4. Apriori
  5. Expectation Maximization (EM)
  6. PageRank
  7. AdaBoost
  8. k-Nearest Neighbors (kNN)
  9. Naive Bayes
  10. Classification and Regression Tree (CART)

An answer to the Quora question, in 2011, lists the following as potential candidates or additions:

  1. Kernel Density Estimation and Non-parametric Bayes Classifier
  2. K-Means
  3. Kernel Principal Components Analysis
  4. Linear Regression
  5. Neighbors (Nearest, Farthest, Range, k, Classification)
  6. Non-Negative Matrix Factorization
  7. Support Vector Machines
  8. Dimensionality Reduction
  9. Fast Singular Value Decomposition
  10. Decision Tree
  11. Bootstrapped SVM
  12. Decision Tree
  13. Gaussian Processes
  14. Logistic Regression
  15. Logit Boost
  16. Model Tree
  17. Naïve Bayes
  18. Nearest Neighbors
  19. PLS
  20. Random Forest
  21. Ridge Regression
  22. Support Vector Machine
  23. Classification: logistic regression, naïve bayes, SVM, decision tree
  24. Regression: multiple regression, SVM
  25. Attribute importance: MDL
  26. Anomaly detection: one-class SVM
  27. Clustering: k-means, orthogonal partitioning
  28. Association: Apriori
  29. Feature extraction: NNMF

And a 2015 answer provides the following:

  1. Linear regression
  2. Logistic regression
  3. k-means
  4. SVMs
  5. Random Forests
  6. Matrix Factorization/SVD
  7. Gradient Boosted Decision Trees/Machines
  8. Naive Bayes
  9. Artificial Neural Networks
  10. For the last one I’d let you pick one of the following:
        • Bayesian Networks
        • Elastic Nets
        • Any other clustering algo besides k-means
        • LDA
        • Conditional Random Fields
        • HDPs or other Bayesian non-parametric model
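
k-means is the one clustering method that shows up in all three lists above. As a reminder of why it is so widely used, here is a minimal sketch of the standard Lloyd iteration in plain Python. The function names, the choice to pass the initial centroids in explicitly, and the toy data are assumptions made for illustration; none of them come from the lists themselves.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps.

    The caller supplies the starting centroids (an assumption of this
    sketch); real implementations usually draw them at random.
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Toy data: two well-separated groups, with a deliberately poor start
# (both initial centroids sit in the first group).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, [(0.0, 0.0), (0.0, 1.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Even with both starting centroids in the same group, the two groups are separated after a couple of iterations on this toy data. On harder data the result depends on the initialization.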

My point of view is of course biased, but I would also like to add some algorithms developed or re-developed at Data Science Central’s research lab:

  • Jackknife regression
  • Feature extraction / selection (mentioned above, but this version is very different)
  • Hidden decision trees
  • Indexation and tagging algorithms

These algorithms are described in the article What you won’t learn in statistics classes.

Regarding the indexation algorithms (see Part 2 after clicking on this link): the technique must be at least 20 years old. It is an incredibly fast clustering technique indeed: it does not require n × n memory storage, only n, where n is the number of observations. Also, it is easy to implement in distributed MapReduce or Hadoop environments. It is a fundamental algorithm: the core algorithm used to build taxonomies, catalogs (see this article about Amazon), search engines, and enterprise search solutions. DSC used it successfully in numerous contexts, including for IoT automated growth hacking for digital publishing, to categorize articles and boost them depending (among other things) on category, for maximum efficiency. Here’s another illustration.
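
The memory claim is easier to see with a toy example. The text above does not spell out the algorithm, so what follows is only a sketch of one plausible reading: each observation is tagged in a single pass by looking up its tokens in a keyword-to-category hash table, so no pairwise distance matrix is ever built. The table contents, the function name `index_observations`, and the "uncategorized" bucket are all hypothetical.

```python
from collections import defaultdict

# Hypothetical keyword-to-category index. In a real system this table
# would be built from a seed taxonomy; these entries are made up.
KEYWORD_INDEX = {
    "regression": "statistics",
    "bayes": "statistics",
    "hadoop": "big data",
    "mapreduce": "big data",
}


def index_observations(texts, keyword_index):
    """Group observations by category in one pass over the data.

    Memory grows with the number of observations (one label each),
    not with the number of pairs of observations.
    """
    clusters = defaultdict(list)
    for i, text in enumerate(texts):
        votes = defaultdict(int)
        for token in text.lower().split():
            category = keyword_index.get(token.strip(".,;:!?()"))
            if category is not None:
                votes[category] += 1
        # Most frequent category wins; observations that match no
        # keyword go to a catch-all bucket.
        label = max(votes, key=votes.get) if votes else "uncategorized"
        clusters[label].append(i)
    return dict(clusters)


articles = [
    "Ridge regression and naive Bayes explained",
    "Running MapReduce jobs on Hadoop",
    "A recipe for bread",
]
print(index_observations(articles, KEYWORD_INDEX))
```

Because each observation is processed independently of the others, the loop body maps naturally onto a MapReduce-style job: the map step emits (category, observation id) pairs and the reduce step collects them per category.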
