x lines of Python: machine learning


You might have noticed that our web address has changed to agilescientific.com, reflecting our continuing journey as a company. Links and emails to agilegeoscience.com will redirect for the foreseeable future, but if you have bookmarks or other links, you might want to change them. If you find anything that's broken, we'd love it if you could let us know.

Artificial intelligence in 10 lines of python? Is this really the world we live in? Yes. Yes it is.

After reminding you about the SEG machine learning contest just before Christmas, I thought I'd show you how to train a model in a supervised learning problem, then use it to make predictions on unseen data. So we'll break a simple contest entry down into ten easy steps (note that you could apply this approach to almost any problem; it doesn't have to be this one).

A machine learning primer

Before we start, let's quickly review what a machine learning problem looks like, and introduce a bit of jargon. To begin, we have a dataset (e.g. the 'Old' well in the diagram below). This consists of records, called instances. In this problem, each instance is a depth location. Each instance is described by a feature vector: a row vector comprising attributes or features, which in our case are wireline log values for GR, ILD, and so on. Each feature vector is a row in a matrix we conventionally call \(X\). Associated with each instance is some target label ― the thing we want to predict ― which is a continuous quantity in a regression problem and discrete in a classification problem. The vector of labels is usually called \(y\). In the problem below, the labels are integers representing 9 different facies.
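To make the layout of \(X\) and \(y\) concrete, here's a toy sketch with NumPy; the numbers are purely illustrative, not from the contest dataset:

```python
import numpy as np

# Toy feature matrix X: 4 instances (depth locations) x 3 features (log values).
X = np.array([
    [65.2, 0.55, 12.1],   # e.g. GR, ILD_log10, PHIND at one depth
    [78.9, 0.42, 14.3],
    [52.4, 0.61, 10.7],
    [91.0, 0.38, 16.8],
])

# Label vector y: one facies integer per instance (one label per row of X).
y = np.array([3, 2, 3, 8])

print(X.shape)  # one row per instance, one column per feature
print(y.shape)  # one label per instance
```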

A cartoon overview of a simple machine learning classification task, using a k-nearest neighbours model.

You can read much more about the dataset I'm using in Brendon Hall's tutorial (The Leading Edge, October 2016).

The ten steps to glory

Well, maybe not glory, but something. A prediction of facies at two wells, based on measurements made at 10 other wells. You can follow along in the notebook, but all the highlights are included here. We start by loading the data into a 'dataframe', which you can think of as a spreadsheet:
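A minimal sketch of this step. In the notebook the data comes from a CSV file via pandas; the filename below is an assumption, so this snippet builds a tiny stand-in dataframe with the same column names to be self-contained:

```python
import pandas as pd

# In practice you'd read the contest CSV (filename is an assumption):
# df = pd.read_csv('training_data.csv')

# A tiny stand-in dataframe so this snippet runs on its own:
df = pd.DataFrame({
    'GR':        [65.2, 78.9, 52.4],
    'ILD_log10': [0.55, 0.42, 0.61],
    'DeltaPHI':  [2.1, -1.3, 0.8],
    'PHIND':     [12.1, 14.3, 10.7],
    'PE':        [3.6, 4.1, 3.2],
    'Facies':    [3, 2, 3],
})

print(df.head())
```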

Now we specify the features we want to use, and make the matrix \(X\) and label vector \(y\):

features = ['GR', 'ILD_log10', 'DeltaPHI', 'PHIND', 'PE']
X = df[features].values
y = df.Facies.values

Since this dataset is all we have, we'd like to set aside some data to test our model on. The library we're using, scikit-learn, has functions to do this sort of thing; by default, it'll split \(X\) and \(y\) into train and test datasets, with 25% of the data going into the test part:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)

Now we're ready to choose a model, instantiate it (with some parameters if we want), and train the model (i.e. 'fit' the data). I am calling the trained model augur , because I like that word.

from sklearn.ensemble import ExtraTreesClassifier
model = ExtraTreesClassifier()
augur = model.fit(X_train, y_train)

Now we're ready to take the part of the dataset we set aside for testing, X_test , and predict its labels. Then we can compare those with the known labels, y_test , to see how well we did:

y_pred = augur.predict(X_test)

We can get a quick idea of the quality of prediction with sklearn.metrics.accuracy_score(y_test, y_pred), but it's more interesting to look at the classification report, which shows us the precision and recall for each class, along with their harmonic mean, the F1 score:

from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
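To make the precision, recall, and F1 relationship concrete, here's a sketch on a toy two-class prediction; the labels are illustrative, not from the contest data:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy true and predicted labels (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_hat  = [1, 1, 1, 0, 0, 0, 1, 1]

p = precision_score(y_true, y_hat)   # of everything predicted 1, how much was right
r = recall_score(y_true, y_hat)      # of all the true 1s, how many we found
f1 = f1_score(y_true, y_hat)

# F1 is the harmonic mean of precision and recall:
assert abs(f1 - 2 * p * r / (p + r)) < 1e-9

print(p, r, f1, accuracy_score(y_true, y_hat))
```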

