Implementing the K-Means Algorithm

The K-Means algorithm is one of the simplest and most effective algorithms for grouping data. I wrote about it in an earlier post:

K-means

However, it still has two obvious drawbacks. First, K-Means must keep the entire data set: if the training set is large this takes a great deal of storage, and a distance has to be computed for every data point, which is time-consuming.
The other drawback is that it cannot give any information about the underlying structure of the data. I will write about ways to address these issues in later posts.

This post implements the algorithm in both Python and MATLAB, and at the end gives a simple handwriting-recognition demo built on top of it.
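As a rough sketch of what the implementation does (the function and variable names below are mine, not those of the linked source files), the core K-Means loop assigns every point to its nearest centroid and then moves each centroid to the mean of its assigned points, repeating until the centroids stop moving:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-Means: X is an (n, d) array, k is the number of clusters."""
    rng = np.random.default_rng(seed)
    # Initialise centroids by picking k distinct samples at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign every point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # centroids no longer move: converged
        centroids = new_centroids
    return centroids, labels

# Tiny usage example with two well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids)
```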

Source code and test files

More

Machine-Learning-8

Clustering

Unsupervised Learning: Introduction

Unsupervised learning is contrasted with supervised learning because it uses an unlabeled training set rather than a labeled one.

In other words, we don’t have the vector y of expected results; we only have a dataset of features in which we can find structure.
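As a small illustration (my own example, not from the lecture), a clustering algorithm is handed nothing but the feature matrix X, with no y, and the structure it finds comes back as cluster assignments:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: only features X, no target vector y.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (30, 2)), rng.normal(2, 0.5, (30, 2))])

# The algorithm is given nothing but X and still finds structure:
# two groups of points and their centers.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(model.labels_[:5])        # cluster assignment for the first few points
print(model.cluster_centers_)   # the discovered group centers
```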

More

Machine-Learning-7

SVM and Kernels

Optimization Objective

P.S. “Vector machine” (VM for short) is an abbreviation of “vector computing machine.”

The Support Vector Machine (SVM) is yet another type of supervised machine learning algorithm. Compared with logistic regression and neural networks, it sometimes gives a cleaner and more powerful way of learning complex non-linear functions.

Recall that in logistic regression, we use the following rules:
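For convenience, here is my restatement of those rules as they appear in the earlier logistic regression notes:

```latex
h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}, \qquad
\begin{cases}
\text{if } y = 1, & \text{we want } h_\theta(x) \approx 1,\ \text{i.e. } \theta^T x \gg 0,\\
\text{if } y = 0, & \text{we want } h_\theta(x) \approx 0,\ \text{i.e. } \theta^T x \ll 0.
\end{cases}
```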

More

Machine-Learning-6

Advice for Applying Machine Learning

Deciding What to Try Next

Errors in your predictions can be diagnosed by trying the following (a small sketch of weighing one of these choices on a validation set follows the list):

  • Getting more training examples
  • Trying smaller sets of features
  • Trying additional features
  • Trying polynomial features
  • Increasing or decreasing λ
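As a toy sketch of the last point (my own example; it uses scikit-learn's Ridge as a stand-in for a regularized model, which is not something the course code does), the effect of increasing or decreasing λ can be judged by refitting at several values and keeping the one with the lowest error on a held-out validation set:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data; in practice this would be your own training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=200)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Increase/decrease lambda (called alpha in scikit-learn) and keep the value
# that gives the lowest error on the held-out validation set.
for lam in [0.01, 0.1, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=lam).fit(X_train, y_train)
    err = mean_squared_error(y_val, model.predict(X_val))
    print(f"lambda={lam:<7} validation MSE={err:.3f}")
```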

More

Machine-Learning-4

Neural Networks

Model Representation I

Let’s examine how we will represent a hypothesis function using neural networks. At a very simple level, neurons are basically computational units that take inputs (dendrites) as electrical inputs (called “spikes”) that are channeled to outputs (axons). In our model, our dendrites are like the input features x1…xn, and the output is the result of our hypothesis function. In this model our x0 input node is sometimes called the “bias unit.” It is always equal to 1. In neural networks, we use the same logistic function as in classification, 1/(1 + e^(−Θᵀx)), yet we sometimes call it a sigmoid (logistic) activation function. In this situation, our “theta” parameters are sometimes called “weights”.
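To make this concrete, here is a small sketch (the numbers and names are mine, not the course's) of a single such unit: the bias input x0 = 1 is prepended to the features, the weights Θ are the unit's parameters, and the output is the sigmoid of Θᵀx:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) activation g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Example input features x1..x3 and weights ("theta") for one unit.
x = np.array([2.0, -1.0, 0.5])
theta = np.array([0.1, 0.8, -0.4, 1.2])   # theta[0] multiplies the bias unit

# Prepend the bias unit x0 = 1; the unit's output is g(theta^T x).
x_with_bias = np.insert(x, 0, 1.0)
output = sigmoid(theta @ x_with_bias)
print(output)
```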

More