# 1.9. Naive Bayes

Bayes' theorem states the following relationship, given class variable $y$ and dependent feature vector $x_1$ through $x_n$:

$P(y \mid x_1, \dots, x_n) = \frac{P(y) P(x_1, \dots, x_n \mid y)} {P(x_1, \dots, x_n)}$

Using the naive conditional independence assumption that

$P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y),$

for all $i$, this relationship is simplified to

$P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)} {P(x_1, \dots, x_n)}$

Since $P(x_1, \dots, x_n)$ is constant given the input, we can use the following classification rule:

\begin{align}\begin{aligned}
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)\\
\Downarrow\\
\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y),
\end{aligned}\end{align}

and we can use Maximum A Posteriori (MAP) estimation to estimate $P(y)$ and $P(x_i \mid y)$; the former is then the relative frequency of class $y$ in the training set.
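The MAP rule above can be sketched numerically; the priors and likelihoods below are made-up numbers for two classes and three Bernoulli features, not estimates from any data:

```python
import numpy as np

# Hypothetical model: two classes, three binary features, observed x = (1, 0, 1).
priors = np.array([0.6, 0.4])              # P(y)
likelihoods = np.array([[0.8, 0.3, 0.7],   # P(x_i = 1 | y = 0)
                        [0.2, 0.6, 0.4]])  # P(x_i = 1 | y = 1)
x = np.array([1, 0, 1])

# P(y) * prod_i P(x_i | y); the Bernoulli form picks P or 1-P per observed value.
posterior = priors * np.prod(likelihoods**x * (1 - likelihoods)**(1 - x), axis=1)
y_hat = int(np.argmax(posterior))
print(y_hat)  # class 0: 0.6*0.8*0.7*0.7 = 0.2352 vs class 1: 0.4*0.2*0.4*0.4 = 0.0128
```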

## 1.9.1. Gaussian Naive Bayes

GaussianNB implements the Gaussian naive Bayes algorithm for classification. The likelihood of each feature, i.e. the class-conditional probability density, is assumed to be Gaussian:

$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma^2_y}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma^2_y}\right)$

>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> from sklearn.naive_bayes import GaussianNB
>>> gnb = GaussianNB()
>>> y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)
>>> print("Number of mislabeled points out of a total %d points : %d"
...       % (iris.data.shape[0],(iris.target != y_pred).sum()))
Number of mislabeled points out of a total 150 points : 6


## 1.9.2. Multinomial Naive Bayes

MultinomialNB implements the naive Bayes algorithm for multinomially distributed data, and is one of the two classic naive Bayes variants used in text classification (where the data are typically represented as word count vectors, although tf-idf vectors are also known to work well in practice). The distribution is parametrized by vectors $\theta_y = (\theta_{y1},\ldots,\theta_{yn})$ for each class $y$, where $n$ is the number of features (in text classification, the size of the vocabulary) and $\theta_{yi}$ is the probability $P(x_i \mid y)$ of feature $i$ appearing in a sample belonging to class $y$.

The parameters $\theta_y$ are estimated by a smoothed version of maximum likelihood, i.e. relative frequency counting:

$\hat{\theta}_{yi} = \frac{ N_{yi} + \alpha}{N_y + \alpha n}$

where $N_{yi}$ is the number of times feature $i$ appears in the samples of class $y$ in the training set, and $N_y = \sum_{i=1}^{n} N_{yi}$ is the total count of all features for class $y$. The smoothing prior $\alpha \ge 0$ accounts for features not present in the learning samples and prevents zero probabilities; setting $\alpha = 1$ is called Laplace smoothing, while $\alpha < 1$ is called Lidstone smoothing.
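The smoothed estimate can be checked against MultinomialNB directly; the toy term-count matrix below is invented for illustration, and MultinomialNB exposes the fitted $\log \hat{\theta}_{yi}$ as `feature_log_prob_`:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy term-count matrix: 4 documents, 3 vocabulary terms, 2 classes.
X = np.array([[2, 1, 0],
              [3, 0, 1],
              [0, 2, 3],
              [1, 1, 4]])
y = np.array([0, 0, 1, 1])

alpha = 1.0
clf = MultinomialNB(alpha=alpha).fit(X, y)

# Reproduce theta_hat_yi = (N_yi + alpha) / (N_y + alpha * n) by hand.
N_yi = np.array([X[y == c].sum(axis=0) for c in (0, 1)])
N_y = N_yi.sum(axis=1, keepdims=True)
theta = (N_yi + alpha) / (N_y + alpha * X.shape[1])
print(np.allclose(theta, np.exp(clf.feature_log_prob_)))  # the two estimates agree
```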

## 1.9.3. Complement Naive Bayes

ComplementNB implements the complement naive Bayes (CNB) algorithm. CNB is an adaptation of the standard multinomial naive Bayes (MNB) algorithm that is particularly suited for imbalanced data sets. Specifically, CNB uses statistics from the complement of each class to compute the model’s weights. The inventors of CNB show empirically that the parameter estimates for CNB are more stable than those for MNB. Further, CNB regularly outperforms MNB (often by a considerable margin) on text classification tasks. The procedure for calculating the weights is as follows:

\begin{align}\begin{aligned}
\hat{\theta}_{ci} = \frac{\alpha_i + \sum_{j:y_j \neq c} d_{ij}}
                         {\alpha + \sum_{j:y_j \neq c} \sum_{k} d_{kj}}\\
w_{ci} = \log \hat{\theta}_{ci}\\
w_{ci} = \frac{w_{ci}}{\sum_{j} |w_{cj}|}
\end{aligned}\end{align}

where the summations are over all documents $j$ not in class $c$, $d_{ij}$ is either the count or tf-idf value of term $i$ in document $j$, $\alpha_i$ is a smoothing hyperparameter like that found in MNB, and $\alpha = \sum_{i} \alpha_i$. The second normalization addresses the tendency for longer documents to dominate parameter estimates in MNB. The classification rule is:

$\hat{c} = \arg\min_c \sum_{i} t_i w_{ci}$

i.e., a document is assigned to the class that is the poorest complement match.
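As a minimal sketch, ComplementNB can be exercised on an invented, imbalanced toy count matrix (three documents in class 0, one in class 1); a test document resembling the minority-class document should be assigned to it:

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Invented term counts: class 0 has many documents, class 1 only one.
X = np.array([[4, 1, 0],
              [3, 0, 1],
              [5, 2, 0],
              [0, 1, 6]])
y = np.array([0, 0, 0, 1])

clf = ComplementNB().fit(X, y)
# A document dominated by the third term, like the single class-1 document.
print(clf.predict(np.array([[0, 1, 5]])))
```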


## 1.9.4. Bernoulli Naive Bayes

BernoulliNB implements the naive Bayes training and classification algorithms for data that is distributed according to multivariate Bernoulli distributions; i.e., there may be multiple features, but each one is assumed to be a binary-valued (Bernoulli, boolean) variable. This class therefore requires samples to be represented as binary-valued feature vectors; if handed any other kind of data, a BernoulliNB instance will binarize its input (depending on the binarize parameter).

The decision rule for Bernoulli naive Bayes is based on

$P(x_i \mid y) = P(i \mid y) x_i + (1 - P(i \mid y)) (1 - x_i),$

which differs from the multinomial rule in that it explicitly penalizes the non-occurrence of a feature $i$ that is an indicator for class $y$, where the multinomial variant would simply ignore a non-occurring feature.
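A small sketch of the binarize behavior, using invented count data: thresholding at 0 before fitting should yield the same fitted parameters as handing BernoulliNB the presence/absence matrix directly.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy count data; BernoulliNB(binarize=0.0) thresholds each feature at 0,
# so fitting on raw counts reduces to fitting on presence/absence.
X_counts = np.array([[3, 0, 1],
                     [0, 2, 0],
                     [1, 1, 0],
                     [0, 0, 4]])
y = np.array([0, 1, 0, 1])

clf_counts = BernoulliNB(binarize=0.0).fit(X_counts, y)
clf_binary = BernoulliNB(binarize=None).fit((X_counts > 0).astype(int), y)

# Both models learned identical per-feature log probabilities.
print(np.allclose(clf_counts.feature_log_prob_, clf_binary.feature_log_prob_))
```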

## 1.9.5. Out-of-core naive Bayes model fitting

Naive Bayes models can be used to tackle large-scale classification problems for which the full training set might not fit in memory. To handle this case, MultinomialNB, BernoulliNB, and GaussianNB expose a partial_fit method that can be used incrementally. Unlike the fit method, the first call to partial_fit needs to be passed the list of all the expected class labels.
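A minimal sketch with invented mini-batches:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB()

# The first call must declare all expected classes, since any single
# batch may not contain examples of every class.
clf.partial_fit(np.array([[1, 0], [0, 1]]), np.array([0, 1]),
                classes=np.array([0, 1, 2]))

# Subsequent batches can omit `classes`.
clf.partial_fit(np.array([[1, 1]]), np.array([2]))
print(clf.classes_)
```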
