x lines of Python: machine learning
Artificial intelligence in 10 lines of Python? Is this really the world we live in? Yes. Yes it is.
After reminding you about the SEG machine learning contest just before Christmas, I thought I'd show you how to train a model in a supervised learning problem, then use it to make predictions on unseen data. We'll break a simple contest entry down into ten easy steps. (Note that you could do this on any supervised learning problem; it doesn't have to be this one.)

A machine learning primer
Before we start, let's review quickly what a machine learning problem looks like, and introduce a bit of jargon. To begin, we have a dataset (e.g. the 'Old' well in the diagram below). This consists of records, called instances. In this problem, each instance is a depth location. Each instance is represented by a feature vector: a row vector comprising attributes or features, which in our case are wireline log values for GR, ILD, and so on. Each feature vector is a row in a matrix we conventionally call \(X\). Associated with each instance is some target label ― the thing we want to predict ― which is a continuous quantity in a regression problem, and discrete in a classification problem. The vector of labels is usually called \(y\). In the problem below, the labels are integers representing 9 different facies.
A cartoon overview of a simple machine learning classification task, using a k-nearest neighbours model.
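The k-nearest neighbours idea in the cartoon can be sketched in a few lines. This is a toy example with made-up data, not the contest dataset; it just shows the fit-then-predict mechanics:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data: four instances, two features each, two facies labels.
X_toy = np.array([[1.0, 2.0], [1.1, 1.9], [8.0, 9.0], [7.9, 9.2]])
y_toy = np.array([0, 0, 1, 1])

# Fit a 3-nearest-neighbours classifier.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_toy, y_toy)

# A new instance near the first cluster gets the majority label
# of its three nearest neighbours: two 0s and one 1, so 0.
print(knn.predict([[1.05, 2.1]]))  # → [0]
```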
You can read much more about the dataset I'm using in Brendon Hall's tutorial (The Leading Edge, October 2016).

The ten steps to glory
Well, maybe not glory, but something. A prediction of facies at two wells, based on measurements made at 10 other wells. You can follow along in the notebook , but all the highlights are included here. We start by loading the data into a 'dataframe', which you can think of like a spreadsheet:
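Loading looks something like this. For a self-contained sketch I'm using an inline stand-in for the contest CSV; in practice you'd point pd.read_csv at the file from the contest repo, and the values here are purely illustrative:

```python
import io
import pandas as pd

# A tiny stand-in for the contest CSV, with the columns we'll use.
csv = io.StringIO(
    "Facies,GR,ILD_log10,DeltaPHI,PHIND,PE\n"
    "3,77.45,0.664,9.9,11.915,4.6\n"
    "2,78.26,0.661,14.2,12.565,4.1\n"
)
df = pd.read_csv(csv)

print(df.shape)  # → (2, 6)
```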
Now we specify the features we want to use, and make the matrix \(X\) and label vector \(y\):

features = ['GR', 'ILD_log10', 'DeltaPHI', 'PHIND', 'PE']
X = df[features].values
y = df.Facies.values
Since this dataset is all we have, we'd like to set aside some data to test our model on. The library we're using, scikit-learn, has functions to do this sort of thing; by default, train_test_split will split \(X\) and \(y\) into train and test datasets, with 25% of the data going into the test part:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)
Now we're ready to choose a model, instantiate it (with some parameters if we want), and train the model (i.e. 'fit' the data). I am calling the trained model augur, because I like that word.

from sklearn.ensemble import ExtraTreesClassifier

model = ExtraTreesClassifier()
augur = model.fit(X_train, y_train)
Now we're ready to take the part of the dataset we reserved for validation, X_test, and predict its labels. Then we can compare those with the known labels, y_test, to see how well we did:

y_pred = augur.predict(X_test)
We can get a quick idea of the quality of prediction with sklearn.metrics.accuracy_score(y_test, y_pred), but it's more interesting to look at the classification report, which shows us the precision and recall for each class, along with their harmonic mean, the F1 score:
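If you're not sure what accuracy means here, it's simply the fraction of predictions that match the true labels. A quick check with made-up labels:

```python
from sklearn.metrics import accuracy_score

# 4 of these 6 predictions match the truth.
y_true = [1, 2, 2, 3, 3, 3]
y_hat = [1, 2, 3, 3, 3, 1]

print(accuracy_score(y_true, y_hat))  # → 0.666..., i.e. 4/6
```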
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))
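Once you're happy with the validation scores, the same trained model can predict facies for genuinely unseen data, like the contest's blind wells. Here's a self-contained sketch of that last step using random stand-in data with the same shape as ours (5 features, facies labels 1 to 9), since the real blind wells aren't loaded here:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Stand-in training data: 100 instances, 5 features, 9 facies.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
y_train = rng.integers(1, 10, size=100)

augur = ExtraTreesClassifier(random_state=0).fit(X_train, y_train)

# 'Unseen' instances: same five features, no labels this time.
X_blind = rng.normal(size=(3, 5))
y_blind_pred = augur.predict(X_blind)

print(y_blind_pred.shape)  # → (3,)
```

The point is that prediction on unseen data is exactly the same call as prediction on the test split; the only difference is that there is no \(y\) to compare against.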