Tips Implementing Classification Models with Scikit-Learn

hieuhoc96 · Sep 29, 2023

**Implementing Classification Models with Scikit-Learn**

Classification is a supervised learning task where the goal is to predict the class label of a given input data point. Scikit-Learn is a Python library that provides a variety of machine learning algorithms, including classifiers. In this article, we will show you how to implement classification models with Scikit-Learn.

We will use the [iris dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) to train and evaluate our models. The iris dataset contains 150 data points, each of which represents a flower with four features: sepal length, sepal width, petal length, and petal width. The target variable is the flower species, which can be one of three classes: Iris-setosa, Iris-versicolor, or Iris-virginica.

## 1. Loading the Data

The first step is to load the data into a Scikit-Learn dataset object. We can do this using the `load_iris()` function.

```python
from sklearn.datasets import load_iris

iris = load_iris()
```

The `iris` object is a `Bunch`, a dictionary-like object whose keys can also be read as attributes. The features are stored in the `data` attribute, and the target values are stored in the `target` attribute.

```python
print(iris.data.shape)
print(iris.target.shape)
```

```
(150, 4)
(150,)
```

The data is a 150 x 4 array, and the target values are a 150-element array.
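To see what else the loaded object carries, here is a minimal sketch that inspects the feature and class names. The variable `same_array` is only an illustrative name, and `return_X_y=True` is an optional `load_iris()` parameter that the steps above do not use.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Names of the four feature columns and of the three classes.
print(iris.feature_names)
print(iris.target_names)

# The Bunch supports both dictionary-style and attribute access.
same_array = iris["data"] is iris.data
print(same_array)

# return_X_y=True skips the Bunch and returns the two arrays directly.
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)
```

Note that scikit-learn stores the class names in short form (`setosa`, `versicolor`, `virginica`) and encodes the targets as the integers 0, 1, and 2.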

## 2. Splitting the Data into Training and Test Sets

Before we can train a model, we need to split the data into training and test sets. The training set will be used to train the model, and the test set will be used to evaluate the model's performance.

We can split the data using the `train_test_split()` function.

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)
```

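In the call above, `train_test_split()` receives the following arguments:

* `iris.data` (conventionally named `X`): The features data.
* `iris.target` (conventionally named `y`): The target values.
* `test_size`: The proportion of the data to be used for the test set (here 0.2, i.e. 30 of the 150 samples).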

The function returns four arrays:

* `X_train`: The training features data.
* `X_test`: The test features data.
* `y_train`: The training target values.
* `y_test`: The test target values.
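Because the split is random, each run produces a different partition. The sketch below shows one way to make it repeatable and class-balanced. The seed value `42` is an arbitrary assumption, and `random_state` and `stratify` are optional parameters that the call above does not set.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,  # arbitrary seed (assumption) so the split is repeatable
    stratify=y,       # keep the class proportions equal in both sets
)

# 120 training rows and 30 test rows, each with 4 features.
print(X_train.shape, X_test.shape)

# Number of test samples per class.
print(np.bincount(y_test))
```

With stratification, each of the three species contributes the same share to the test set, which keeps the accuracy estimate from being skewed by an unlucky draw.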

## 3. Training the Model

Now that we have split the data into training and test sets, we can train the model. We will use the [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) class to train a logistic regression model.

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)
```

The `fit()` method takes the training features data and target values as arguments. It trains the model on the training data and returns the fitted estimator.
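As a rough illustration of what fitting produces, the sketch below inspects a few attributes of the trained estimator. The `max_iter=1000` setting and the seed `0` are assumptions added here: with the default iteration limit the solver can emit a `ConvergenceWarning` on unscaled data, and raising the limit is one common way to avoid it.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0  # arbitrary seed (assumption)
)

# max_iter is raised above its default as a precaution (assumption).
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Class labels the model learned to distinguish.
print(model.classes_)

# One row of coefficients per class, one column per feature.
print(model.coef_.shape)
print(model.intercept_.shape)
```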

## 4. Evaluating the Model

Now that we have trained the model, we can evaluate its performance on the test set. We can do this using the model's `score()` method.

```python
print(model.score(X_test, y_test))
```

```
0.97
```

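The `score()` method returns the model's mean accuracy on the test set. In this run, the model achieved an accuracy of about 0.97, which corresponds to 29 of the 30 test samples being classified correctly. Because `train_test_split()` shuffles the data randomly, the exact value can change from run to run unless `random_state` is set.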
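To make the accuracy figure less of a black box, here is a minimal sketch that recomputes it by hand and breaks the errors down per class. The names `manual_accuracy` and `cm`, the seed `0`, and `max_iter=1000` are assumptions for illustration. `accuracy_score` and `confusion_matrix` come from `sklearn.metrics`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0  # arbitrary seed (assumption)
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_pred = model.predict(X_test)

# Accuracy is the fraction of test samples whose prediction matches the label.
manual_accuracy = np.mean(y_pred == y_test)
print(manual_accuracy, accuracy_score(y_test, y_pred), model.score(X_test, y_test))

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)
```

The diagonal of the confusion matrix counts the correct predictions, so dividing its sum by the number of test samples gives the same accuracy again.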

## 5. Predicting New Data

Once we have trained and evaluated the model, we can use it to predict the class label of new data points.
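A minimal sketch of prediction on unseen measurements, using the estimator's `predict()` and `predict_proba()` methods. The two rows in `new_flowers` are hypothetical values made up for illustration, and the seed `0` and `max_iter=1000` are likewise assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0  # arbitrary seed
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Hypothetical flowers: sepal length, sepal width, petal length, petal width (cm).
new_flowers = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [6.7, 3.0, 5.2, 2.3],
])

# predict() returns one integer class label per row.
predicted = model.predict(new_flowers)
print(iris.target_names[predicted])

# predict_proba() returns one probability per class for each row.
probabilities = model.predict_proba(new_flowers)
print(probabilities.round(3))
```

The input must be a 2D array with the same four feature columns, in the same order, as the training data.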
