<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>K-Nearest Neighbors (KNN): A Complete Guide</title>
<script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
<style>
* {
box-sizing: border-box;
margin: 0;
padding: 0;
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
}
body {
background-color: #f5f7fa;
color: #333;
line-height: 1.6;
padding: 20px;
max-width: 1200px;
margin: 0 auto;
}
header {
text-align: center;
margin-bottom: 40px;
padding: 20px;
background: linear-gradient(135deg, #6a11cb 0%, #2575fc 100%);
color: white;
border-radius: 10px;
box-shadow: 0 4px 15px rgba(0, 0, 0, 0.1);
}
h1 {
font-size: 2.5rem;
margin-bottom: 10px;
}
h2 {
font-size: 1.8rem;
margin: 25px 0 15px;
color: #2c3e50;
border-left: 5px solid #2575fc;
padding-left: 10px;
}
h3 {
font-size: 1.4rem;
margin: 20px 0 10px;
color: #3498db;
}
p {
margin-bottom: 15px;
font-size: 1.1rem;
}
.container {
display: flex;
flex-wrap: wrap;
gap: 20px;
margin-bottom: 30px;
}
.explanation {
flex: 1;
min-width: 300px;
background: white;
padding: 25px;
border-radius: 10px;
box-shadow: 0 4px 10px rgba(0, 0, 0, 0.1);
}
.visualization {
flex: 1;
min-width: 300px;
background: white;
padding: 25px;
border-radius: 10px;
box-shadow: 0 4px 10px rgba(0, 0, 0, 0.1);
}
.controls {
display: flex;
flex-wrap: wrap;
gap: 15px;
margin: 20px 0;
align-items: center;
}
.control-group {
display: flex;
flex-direction: column;
gap: 5px;
}
label {
font-weight: bold;
color: #2c3e50;
}
input[type="range"] {
width: 200px;
}
input[type="number"] {
width: 70px;
padding: 5px;
border: 1px solid #ddd;
border-radius: 4px;
}
button {
padding: 10px 20px;
background: #2575fc;
color: white;
border: none;
border-radius: 5px;
cursor: pointer;
font-size: 1rem;
transition: background 0.3s;
}
button:hover {
background: #1a60c0;
}
.chart-container {
position: relative;
height: 400px;
margin-top: 20px;
}
.point-info {
margin-top: 20px;
padding: 15px;
background: #f8f9fa;
border-radius: 5px;
border-left: 4px solid #3498db;
}
.example {
background: #e8f4fc;
padding: 15px;
border-radius: 5px;
margin: 15px 0;
}
.formula {
font-family: 'Courier New', Courier, monospace;
background: #2c3e50;
color: white;
padding: 10px;
border-radius: 5px;
margin: 10px 0;
overflow-x: auto;
}
.movie-table {
width: 100%;
border-collapse: collapse;
margin: 20px 0;
}
.movie-table th, .movie-table td {
border: 1px solid #ddd;
padding: 10px;
text-align: center;
}
.movie-table th {
background-color: #3498db;
color: white;
}
.movie-table tr:nth-child(even) {
background-color: #f2f2f2;
}
.movie-table tr:hover {
background-color: #e6f7ff;
}
footer {
text-align: center;
margin-top: 40px;
padding: 20px;
color: #7f8c8d;
}
@media (max-width: 768px) {
.container {
flex-direction: column;
}
}
</style>
</head>
<body>
<header>
<h1>K-Nearest Neighbors (KNN): A Complete Guide</h1>
<p>A simple yet powerful algorithm for classification and regression</p>
</header>
<div class="container">
<div class="explanation">
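<h2>What Is the K-Nearest Neighbors Algorithm?</h2>
<p>K-Nearest Neighbors (KNN) is a basic, widely used machine learning algorithm that handles both classification and regression tasks.</p>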
<div class="example">
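<p><strong>Core idea:</strong> if most of the k most similar samples (the nearest neighbors) of a sample in feature space belong to one class, the sample is assigned to that class as well.</p>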
</div>
<h3>How It Works</h3>
<ol>
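<li>Compute the distance from the unknown sample to every sample in the training set</li>
<li>Sort the training samples by increasing distance</li>
<li>Take the k samples with the smallest distances</li>
<li>Count how often each class occurs among those k samples</li>
<li>Return the most frequent class among the k samples as the prediction</li>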
</ol>
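<p>The five steps above can be sketched in a few lines of JavaScript. This is only a minimal illustration: the helper names euclidean and knnClassify and the six 2-D sample points are made up for this example, and a tie is resolved in favor of whichever class is listed first in the count object.</p>
<pre class="formula">
```javascript
// Hypothetical helpers for illustration; not part of the page's own script.
function euclidean(a, b) {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

function knnClassify(features, labels, query, k) {
  // Step 1: distance from the query to every training sample
  const neighbors = features.map((point, i) => ({
    distance: euclidean(point, query),
    label: labels[i],
  }));
  // Step 2: sort by increasing distance
  neighbors.sort((a, b) => a.distance - b.distance);
  // Step 3: keep the k closest samples
  const nearest = neighbors.slice(0, k);
  // Step 4: count how often each class occurs
  const counts = {};
  for (const n of nearest) {
    counts[n.label] = (counts[n.label] || 0) + 1;
  }
  // Step 5: return the most frequent class
  // (on a tie, the class listed first in counts is kept)
  return Object.keys(counts).reduce((best, label) =>
    counts[label] > counts[best] ? label : best
  );
}

// Made-up sample: three points near (1, 1) and three near (8, 8)
const features = [[1, 1], [2, 1], [1, 2], [8, 8], [9, 8], [8, 9]];
const labels = ['A', 'A', 'A', 'B', 'B', 'B'];
console.log(knnClassify(features, labels, [2, 2], 3)); // A
console.log(knnClassify(features, labels, [7, 8], 3)); // B
```
</pre>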
<h3>Distance Calculation</h3>
<p>KNN usually measures the similarity between samples with the Euclidean distance:</p>
<div class="formula">
d(x, y) = √[(x₁ - y₁)² + (x₂ - y₂)² + ... + (xₙ - yₙ)²]
</div>
<p>Here x and y are two sample points, and x₁, x₂, ..., xₙ and y₁, y₂, ..., yₙ are their feature values.</p>
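<p>As a rough sketch, the formula translates directly into a function that works for any number of features. The name euclideanDistance and the length check are assumptions added for this example.</p>
<pre class="formula">
```javascript
// Hypothetical helper: Euclidean distance between two equal-length arrays.
function euclideanDistance(x, y) {
  if (x.length !== y.length) {
    throw new Error('Samples must have the same number of features');
  }
  let sumOfSquares = 0;
  x.forEach((xi, i) => {
    const diff = xi - y[i];
    sumOfSquares += diff * diff; // (x_i - y_i)^2
  });
  return Math.sqrt(sumOfSquares);
}

console.log(euclideanDistance([0, 0], [3, 4])); // 5
console.log(euclideanDistance([1, 2, 3], [1, 2, 3])); // 0
```
</pre>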
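<h3>Choosing K</h3>
<p>The choice of K has a major impact on the result:</p>
<ul>
<li><strong>K too small:</strong> the model is complex and easily swayed by outliers, so it tends to overfit</li>
<li><strong>K too large:</strong> the model is overly simple and ignores local structure, so it may underfit</li>
</ul>
<p>Cross-validation is commonly used to select the best K.</p>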
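<p>One simple form of cross-validation is leave-one-out: hold out each sample in turn, predict it from the remaining samples, and keep the K with the highest accuracy. The sketch below assumes this variant; the function names and the six sample points are made up for illustration. On this tiny data set K = 5 is too large, because every held-out point is outvoted by the other cluster.</p>
<pre class="formula">
```javascript
// Hypothetical helpers for illustration; not part of the page's own script.
function distance(a, b) {
  return Math.sqrt(a.reduce((s, ai, i) => s + (ai - b[i]) ** 2, 0));
}

function predict(features, labels, query, k) {
  const nearest = features
    .map((p, i) => ({ d: distance(p, query), label: labels[i] }))
    .sort((a, b) => a.d - b.d)
    .slice(0, k);
  const counts = {};
  nearest.forEach((n) => { counts[n.label] = (counts[n.label] || 0) + 1; });
  return Object.keys(counts).reduce((best, l) => (counts[l] > counts[best] ? l : best));
}

// Leave-one-out accuracy: hold out each sample once, predict it from the rest.
function looAccuracy(features, labels, k) {
  let correct = 0;
  features.forEach((query, i) => {
    const restX = features.filter((_, j) => j !== i);
    const restY = labels.filter((_, j) => j !== i);
    if (predict(restX, restY, query, k) === labels[i]) correct += 1;
  });
  return correct / features.length;
}

// Try every candidate K and keep the most accurate one (first candidate wins ties).
function chooseK(features, labels, candidates) {
  let best = { k: candidates[0], accuracy: -1 };
  for (const k of candidates) {
    const accuracy = looAccuracy(features, labels, k);
    if (accuracy > best.accuracy) best = { k, accuracy };
  }
  return best;
}

// Made-up sample: two well-separated clusters with string labels
const points = [[1, 1], [2, 1], [1, 2], [8, 8], [9, 8], [8, 9]];
const classes = ['A', 'A', 'A', 'B', 'B', 'B'];
console.log(chooseK(points, classes, [1, 3, 5]));
```
</pre>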
</div>
<div class="visualization">
<h2>KNN Visualization</h2>
<p>Adjust the parameters and watch how K affects the classification result:</p>
<div class="controls">
<div class="control-group">
<label for="k-value">K (n_neighbors):</label>
<input type="range" id="k-value" min="1" max="15" value="3">
<input type="number" id="k-value-display" min="1" max="15" value="3">
</div>
<div class="control-group">
<label for="x-value">X coordinate:</label>
<input type="number" id="x-value" value="5">
</div>
<div class="control-group">
<label for="y-value">Y coordinate:</label>
<input type="number" id="y-value" value="5">
</div>
<button id="add-point">Add test point</button>
<button id="reset">Reset</button>
</div>
<div class="chart-container">
<canvas id="knn-chart"></canvas>
</div>
<div class="point-info">
<p id="prediction">Prediction: please add a test point</p>
<p id="distances">Distances: -</p>
</div>
</div>
</div>
<div class="explanation">
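<h2>Experiment Requirements</h2>
<p>Suppose we have the following movie data and want to classify each movie by its number of funny shots, hugging shots, and fight shots:</p>
<p>Experiment data (labels: 0 = comedy, 1 = romance, 2 = action)<br>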
x = [<br>
[5, 100, 5], # 1<br>
[39, 0, 31], # 0<br>
[3, 2, 65], # 2<br>
[2, 3, 55], # 2<br>
[9, 38, 2], # 1<br>
[8, 34, 17], # 1<br>
[5, 2, 57], # 2<br>
[21, 17, 5], # 0<br>
[45, 2, 9] # 0<br>
]<br>
y = [1, 0, 2, 2, 1, 1, 2, 0, 0]
</p>
<table class="movie-table">
<thead>
<tr>
<th>Movie</th>
<th>Funny shots</th>
<th>Hugging shots</th>
<th>Fight shots</th>
<th>Genre</th>
</tr>
</thead>
<tbody>
<tr>
<td>Titanic</td>
<td>5</td>
<td>100</td>
<td>5</td>
<td>Romance</td>
</tr>
<tr>
<td>超能失控</td>
<td>39</td>
<td>0</td>
<td>31</td>
<td>Comedy</td>
</tr>
<tr>
<td>光明纪元</td>
<td>3</td>
<td>2</td>
<td>65</td>
<td>Action</td>
</tr>
<tr>
<td>月球之下</td>
<td>2</td>
<td>3</td>
<td>55</td>
<td>Action</td>
</tr>
<tr>
<td>心灵传输</td>
<td>9</td>
<td>38</td>
<td>2</td>
<td>Romance</td>
</tr>
<tr>
<td>无尽星海</td>
<td>8</td>
<td>34</td>
<td>17</td>
<td>Romance</td>
</tr>
<tr>
<td>时空破裂</td>
<td>5</td>
<td>2</td>
<td>57</td>
<td>Action</td>
</tr>
<tr>
<td>命运之刃</td>
<td>21</td>
<td>17</td>
<td>5</td>
<td>Comedy</td>
</tr>
<tr>
<td>影武者</td>
<td>45</td>
<td>2</td>
<td>9</td>
<td>Comedy</td>
</tr>
<tr style="background-color: #ffe6cc;">
<td>光影纪元 (to be classified)</td>
<td>23</td>
<td>3</td>
<td>17</td>
<td>?</td>
</tr>
</tbody>
</table>
<p>To predict the genre of 《光影纪元》, we compute its distance to every labeled movie and then find the K nearest neighbors.</p>
<div class="formula">
// Distance to 《月球之下》:<br>
d = √[(23-2)² + (3-3)² + (17-55)²] <br>
= √[21² + 0² + (-38)²] <br>
= √[441 + 0 + 1444] <br>
= √1885 ≈ 43.42
</div>
<p>Once all distances are computed, the K nearest neighbors vote on the genre of 《光影纪元》.</p>
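<p>The whole exercise can be sketched as follows, using the feature rows and labels from the experiment data. K = 3 is an assumed value chosen for illustration, and the genreNames mapping (0 = comedy, 1 = romance, 2 = action) is inferred by matching the labels in y against the table.</p>
<pre class="formula">
```javascript
// Training data from the exercise: [funny, hugging, fight] shots per movie.
const movies = [
  [5, 100, 5], [39, 0, 31], [3, 2, 65],
  [2, 3, 55], [9, 38, 2], [8, 34, 17],
  [5, 2, 57], [21, 17, 5], [45, 2, 9],
];
const genres = [1, 0, 2, 2, 1, 1, 2, 0, 0];
const genreNames = ['comedy', 'romance', 'action']; // inferred mapping for 0, 1, 2
const query = [23, 3, 17]; // the movie to classify
const k = 3; // assumed value for illustration

// Distance from the query to every movie, sorted from nearest to farthest
const ranked = movies
  .map((m, i) => ({
    distance: Math.hypot(m[0] - query[0], m[1] - query[1], m[2] - query[2]),
    genre: genres[i],
  }))
  .sort((a, b) => a.distance - b.distance);

// Majority vote among the k nearest movies
const votes = [0, 0, 0];
ranked.slice(0, k).forEach((n) => { votes[n.genre] += 1; });
const predicted = votes.indexOf(Math.max(...votes));

console.log(ranked.map((n) => n.distance.toFixed(2)).join(', '));
console.log(genreNames[predicted]); // comedy
```
</pre>
<p>With K = 3 the three nearest movies are all comedies, so the vote is unanimous.</p>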
</div>
<div class="container">
<div class="explanation">
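<h2>Pros and Cons of KNN</h2>
<h3>Pros</h3>
<ul>
<li>Simple to understand and easy to implement</li>
<li>No explicit training phase: the model only stores the training data, so new samples can be added without retraining</li>
<li>Fairly robust to outliers when K is large</li>
<li>Works for multi-class problems</li>
</ul>
<h3>Cons</h3>
<ul>
<li>High computational cost at prediction time (the distance to every training sample must be computed)</li>
<li>High memory usage (all training data must be stored)</li>
<li>Sensitive to imbalanced data sets</li>
<li>Requires choosing a suitable K and distance metric</li>
</ul>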
</div>
<div class="explanation">
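<h2>KNN Application Scenarios</h2>
<h3>Classification: the labels are discrete</h3>
<ul>
<li>Compute the distance from the unknown sample to every training sample</li>
<li>Sort the training samples by increasing distance</li>
<li>Take the K nearest training samples</li>
<li>Take a majority vote: find the class with the most samples among the K neighbors</li>
</ul>
<h3>Regression: the labels are continuous</h3>
<ul>
<li>Compute the distance from the unknown sample to every training sample</li>
<li>Sort the training samples by increasing distance</li>
<li>Take the K nearest training samples</li>
<li>Compute the mean of the target values of these K samples</li>
<li>Use that mean as the predicted value for the unknown sample</li>
</ul>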
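<p>The averaging variant can be sketched as below. The function name knnRegress and the one-dimensional data are made up for this example; a real data set would usually have several features.</p>
<pre class="formula">
```javascript
// Hypothetical helper: predict a continuous value as the mean of the k nearest targets.
function knnRegress(features, targets, query, k) {
  const nearest = features
    .map((point, i) => ({
      distance: Math.hypot(...point.map((v, j) => v - query[j])),
      target: targets[i],
    }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k);
  const sum = nearest.reduce((acc, n) => acc + n.target, 0);
  return sum / nearest.length;
}

// Made-up 1-D data: the target is 10 times the feature
const xs = [[1], [2], [3], [4], [5]];
const ys = [10, 20, 30, 40, 50];
// The 3 nearest features to 2.2 are 2, 3 and 1, so the mean target is (20 + 30 + 10) / 3
console.log(knnRegress(xs, ys, [2.2], 3)); // 20
```
</pre>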
</div>
</div>
<footer>
<p>© 2023 K-Nearest Neighbors (KNN) Visual Tutorial | Designed for educational purposes</p>
</footer>
<script>
document.addEventListener('DOMContentLoaded', function() {
// Training data: features and labels
const trainingData = {
features: [
[2, 3], [5, 4], [9, 6], [4, 7], [8, 1],
[7, 2], [5, 8], [3, 5], [2, 9], [1, 4],
[7, 7], [9, 9], [4, 3], [6, 5], [8, 8]
],
labels: [
'A', 'A', 'B', 'B', 'A',
'A', 'B', 'A', 'B', 'A',
'B', 'B', 'A', 'B', 'B'
]
};
// Color map
const colors = {
A: 'rgba(255, 99, 132, 0.8)',
B: 'rgba(54, 162, 235, 0.8)',
unknown: 'rgba(200, 200, 200, 0.8)'
};
// Get the canvas 2D context
const ctx = document.getElementById('knn-chart').getContext('2d');
// Create the chart
const knnChart = new Chart(ctx, {
type: 'scatter',
data: {
datasets: [
{
label: 'Class A',
data: trainingData.features.map((point, i) =>
trainingData.labels[i] === 'A' ? {x: point[0], y: point[1]} : null
).filter(point => point !== null),
backgroundColor: colors.A,
pointRadius: 8,
pointHoverRadius: 10
},
{
label: 'Class B',
data: trainingData.features.map((point, i) =>
trainingData.labels[i] === 'B' ? {x: point[0], y: point[1]} : null
).filter(point => point !== null),
backgroundColor: colors.B,
pointRadius: 8,
pointHoverRadius: 10
},
{
label: 'Test point',
data: [],
backgroundColor: colors.unknown,
pointRadius: 10,
pointHoverRadius: 12
}
]
},
options: {
responsive: true,
maintainAspectRatio: false,
scales: {
x: {
title: {
display: true,
text: 'Feature 1'
},
min: 0,
max: 10
},
y: {
title: {
display: true,
text: 'Feature 2'
},
min: 0,
max: 10
}
},
plugins: {
tooltip: {
callbacks: {
label: function(context) {
return `(${context.parsed.x}, ${context.parsed.y})`;
}
}
}
}
}
});
// Get DOM elements
const kValueSlider = document.getElementById('k-value');
const kValueDisplay = document.getElementById('k-value-display');
const xValueInput = document.getElementById('x-value');
const yValueInput = document.getElementById('y-value');
const addPointButton = document.getElementById('add-point');
const resetButton = document.getElementById('reset');
const predictionElement = document.getElementById('prediction');
const distancesElement = document.getElementById('distances');
// Keep the K slider and the number input in sync
kValueSlider.addEventListener('input', function() {
kValueDisplay.value = this.value;
});
kValueDisplay.addEventListener('input', function() {
let value = Math.min(Math.max(parseInt(this.value, 10) || 1, 1), 15);
this.value = value;
kValueSlider.value = value;
});
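// Add a test point
addPointButton.addEventListener('click', function() {
// Clamp K to the slider range so an empty or invalid input cannot yield NaN
const k = Math.min(Math.max(parseInt(kValueDisplay.value, 10) || 1, 1), 15);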
const x = parseFloat(xValueInput.value);
const y = parseFloat(yValueInput.value);
if (isNaN(x) || isNaN(y) || x < 0 || x > 10 || y < 0 || y > 10) {
alert('Please enter valid coordinates (0-10)');
return;
}
// Compute the distance to every training point
const distances = trainingData.features.map((point, i) => {
return {
index: i,
distance: Math.sqrt(Math.pow(point[0] - x, 2) + Math.pow(point[1] - y, 2)),
label: trainingData.labels[i]
};
});
// Sort by distance
distances.sort((a, b) => a.distance - b.distance);
// Take the K nearest neighbors
const nearestNeighbors = distances.slice(0, k);
// Count the neighbors in each class
const count = {
A: 0,
B: 0
};
nearestNeighbors.forEach(neighbor => {
count[neighbor.label]++;
});
// Pick the predicted class; with an even K a tie is possible and is resolved in favor of A
let prediction = count.A >= count.B ? 'A' : 'B';
// Show the test point on the chart
knnChart.data.datasets[2].data = [{x, y}];
// Show the prediction
predictionElement.textContent = `Prediction: ${prediction} (A: ${count.A}, B: ${count.B})`;
// Show the distances
distancesElement.textContent = `Distances to the ${k} nearest neighbors: ${nearestNeighbors.map(n => n.distance.toFixed(2)).join(', ')}`;
// Redraw the chart
knnChart.update();
});
// Reset the chart
resetButton.addEventListener('click', function() {
knnChart.data.datasets[2].data = [];
predictionElement.textContent = 'Prediction: please add a test point';
distancesElement.textContent = 'Distances: -';
knnChart.update();
});
});
</script>
</body>
</html>