未加星标

Basic Machine Learning in Python with Scikit-learn

字体大小 | |
[开发(python) 所属分类 开发(python) | 发布者 店小二03 | 时间 2016 | 作者 红领巾 ] 0人收藏点击收藏

Machine learning has become a hot topic in the last few years and it is for a reason. It provides data analysts with efficientways of extracting information from data, allowing it to be usedfor analysis and modeling purposes.

The Scikit-learn python library has implementations ofdozens of learning algorithms and is freelyavailable for academic and commercial use under the terms of the BSD licence. Some of these algorithms can be extremely useful for our job as water systems analysts, so given the overwelming amount of algorithmsimplemented in Scikit-learn, I though I would mention a few I find particularly useful for my research. For each method belowI included link swith an examples from the Scikit-learn’s website. Instalation and use instructions can be found in their website .

CART Trees

CARTtrees that can used forregression or classification. Any any tree,CART trees are considered poor (generally high variance) classifiers unless bootstrapped or boosted (see supervised learning), but the resulting rules are easily interpretable.

CART Trees: http://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart

Dimensionality reduction

Principal component Analysis (PCA) is perhaps the most widely useddimensionality reduction technique.It works byfinding the basis the maximizes the data’s variance, allowing for the elimination of axis that have low variances.Among its uses arenoise reduction, data visualization, as it preserves the distances between data points, and improvement ofcomputational efficiency of other algorithms bygetting rid of redundant information.PCA can me used in its pure form or it can be kernelized to handledata setswhose variance is maximumin a non-linear direction.Manifold learning is another way of performing dimensionality reduction by unwinding the lower dimensional manifold where the information lies.

PCA: http://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_3d.html#sphx-glr-auto-examples-decomposition-plot-pca-3d-py

Kernel PCA: http://scikit-learn.org/stable/auto_examples/decomposition/plot_kernel_pca.html#sphx-glr-auto-examples-decomposition-plot-kernel-pca-py

Manifold learning: http://scikit-learn.org/stable/auto_examples/manifold/plot_compare_methods.html#sphx-glr-auto-examples-manifold-plot-compare-methods-py

Clustering

Clustering is used to group similar points in a data set. One example is the problem of find customer niches based on the products each customer buys. The most famous clustering algorithm is k-means, which,as any other machine learning algorithm, works well on some data sets but not in others.There are several alternative algorithms, all of which exemplified in the following two links:

Clustering algorithmscomparison: http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html#sphx-glr-auto-examples-cluster-plot-cluster-comparison-py

Gaussian Mixture Models(finds the sameresultsas k-means but also provides variances): http://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_covariances.html#sphx-glr-auto-examples-mixture-plot-gmm-covariances-py

Reducing the dimentionality of a dataset with PCA or kernel PCA may speed up clustering algorithms.

Supervised learning

Supervised learning algorithms can be used forregression or classification problems (e.g. classify a point as pass/fail) based on labeled data sets. The most “trendy” one nowadays is neural networks, but support vector machines, boosted andbagged trees, and others are also options that should be considered and tested on your data set. Bellow are links to some of the supervised learning algorithms implemented in Scikit-learn:

Comparison between supervised learning algorithms : http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html#sphx-glr-auto-examples-classification-plot-classifier-comparison-py

Neural networks: http://scikit-learn.org/stable/modules/neural_networks_supervised.html

Gaussian Processes is also a supervised learning algorithm (regression) which isalso be used for Bayesian optimization:

Gaussian processes: http://scikit-learn.org/stable/auto_examples/gaussian_process/plot_gpr_noisy_targets.html#sphx-glr-auto-examples-gaussian-process-plot-gpr-noisy-targets-py

本文开发(python)相关术语:python基础教程 python多线程 web开发工程师 软件开发工程师 软件开发流程

主题: Python
分页:12
转载请注明
本文标题:Basic Machine Learning in Python with Scikit-learn
本站链接:http://www.codesec.net/view/480150.html
分享请点击:


1.凡CodeSecTeam转载的文章,均出自其它媒体或其他官网介绍,目的在于传递更多的信息,并不代表本站赞同其观点和其真实性负责;
2.转载的文章仅代表原创作者观点,与本站无关。其原创性以及文中陈述文字和内容未经本站证实,本站对该文以及其中全部或者部分内容、文字的真实性、完整性、及时性,不作出任何保证或承若;
3.如本站转载稿涉及版权等问题,请作者及时联系本站,我们会及时处理。
登录后可拥有收藏文章、关注作者等权限...
技术大类 技术大类 | 开发(python) | 评论(0) | 阅读(41)