Quote from The Hours

To look life in the face, always, to look life in the face and to know it for what it is.

At last to know it, to love it for what it is, and then, to put it away.

-- Virginia Woolf


For self-motivation

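A gale at the end of its strength cannot lift a feather;

a bolt from a strong crossbow at the end of its flight cannot pierce thin Lu silk.

-- Book of Han, "Biography of Han Anguo" (《汉书·韩安国传》)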

 

Repost: A survey of common feature selection algorithms

http://www.cnblogs.com/heaad/archive/2011/01/02/1924088.html

 

 

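1 Overview

(1) What is feature selection

Feature selection, also called feature subset selection (FSS) or attribute selection, means choosing a subset of the full feature set so that the model built from it performs better.

(2) Why do feature selection

In practical machine learning applications the number of features is often large. Some features may be irrelevant, and some may depend on one another, which tends to cause the following problems:

  • The more features there are, the longer it takes to analyze them and train the model.
  • The more features there are, the more likely the "curse of dimensionality" becomes; the model also grows more complex and generalizes worse.

Feature selection removes irrelevant or redundant features, which reduces the number of features, improves model accuracy, and shortens running time. In addition, keeping only the truly relevant features simplifies the model and makes it easier for researchers to understand the process that generated the data.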

 

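2 The feature selection process

2.1 The general process

The general feature selection process is shown in Figure 1. First, a feature subset is generated from the full feature set. An evaluation function then scores that subset, and the score is compared with the stopping criterion. If the score satisfies the stopping criterion, the search stops; otherwise the next feature subset is generated and the process repeats. The selected subset is usually validated afterwards.

In summary, feature selection generally consists of four parts: the generation procedure, the evaluation function, the stopping criterion, and the validation procedure.

  (1) Generation procedure

The generation procedure searches the space of feature subsets and supplies candidate subsets to the evaluation function. There are several ways to search, described in Section 2.2.

  (2) Evaluation function

The evaluation function is a criterion that measures how good a feature subset is. It is described in Section 2.3.

  (3) Stopping criterion

The stopping criterion is tied to the evaluation function. It is usually a threshold: the search can stop once the evaluation function reaches it.

  (4) Validation procedure

The selected feature subset is checked for effectiveness on a validation data set.

Figure 1. The feature selection process (M. Dash and H. Liu, 1997)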

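2.2 The generation procedure

The generation procedure searches the space of feature subsets. Search algorithms fall into three broad classes: complete search, heuristic search, and random search, as shown in Figure 2.

Figure 2. Classification of generation procedures (M. Dash and H. Liu, 1997)

The common search algorithms are introduced briefly below.

2.2.1 Complete search

  Complete search divides into exhaustive and non-exhaustive search.

  (1) Breadth-first search

  Description: traverse the space of feature subsets breadth-first.

  Assessment: it enumerates every combination of features, so it is an exhaustive search. Its time complexity is O(2^n) for n features, which makes it impractical.

  (2) Branch and bound

  Description: add pruning to exhaustive search. For example, if a branch provably cannot lead to a solution better than the best one found so far, that branch can be cut.

  (3) Beam search

  Description: first select the N highest-scoring features as the initial feature subset and put it into a priority queue of bounded length. At each step, take the highest-scoring subset from the queue, enumerate every feature set obtained by adding one feature to it, and put those sets into the queue.

  (4) Best-first search

  Description: similar to beam search; the only difference is that the length of the priority queue is not bounded.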

 

 

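2.2.2 Heuristic search

  (1) Sequential forward selection (SFS)

  Description: the feature subset X starts as the empty set. At each step one feature x is added to X so that the evaluation function J(X) is optimal. Put simply, each step adds the feature that gives the best value of the evaluation function, so this is a simple greedy algorithm.

  Assessment: the drawback is that features can only be added, never removed. For example, suppose feature A is completely determined by features B and C, so that A is redundant once B and C are included. If sequential forward selection adds A first and then adds B and C, the subset ends up containing the redundant feature A.

  (2) Sequential backward selection (SBS)

  Description: start from the full feature set O. At each step remove one feature x from O so that the evaluation function is optimal after the removal.

  Assessment: sequential backward selection is the mirror image of sequential forward selection. Its drawback is that features can only be removed, never added back.

  In addition, SFS and SBS are both greedy algorithms and easily get stuck in local optima.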
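The greedy step of SFS can be sketched in a few lines of Python 3. This is only an illustration under stated assumptions: the function name sequential_forward_selection, the fixed subset size k used as the stopping criterion, and the toy evaluation function toy_score are all hypothetical and do not come from the original article. The toy scores are chosen to reproduce the A/B/C example above, where A looks best on its own but becomes redundant once B and C are both present.

```python
def sequential_forward_selection(features, score, k):
    """Greedy SFS: grow the subset one feature at a time.

    features: candidate feature names
    score:    callable mapping a frozenset of features to a number (higher is better)
    k:        number of features to select (a simple stopping criterion)
    """
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        # Add the feature that gives the best score at this step.
        best = max(remaining, key=lambda f: score(frozenset(selected + [f])))
        selected.append(best)
        remaining.remove(best)
    return selected


def toy_score(subset):
    """Hypothetical J(X): A is fully determined by B and C."""
    if {"B", "C"} <= subset:
        return 1.0          # B and C together carry all the information
    if "A" in subset:
        return 0.6          # A alone is the best single feature
    return 0.3 * len(subset & {"B", "C"})


# A is picked first and can never be dropped, even though {B, C} scores the same.
print(sequential_forward_selection(["A", "B", "C"], toy_score, 3))
print(toy_score(frozenset({"B", "C"})) == toy_score(frozenset({"A", "B", "C"})))
```

Because SFS never removes a feature, the redundant A stays in the result; the floating variants described later exist to correct exactly this.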

 

 

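  (3) Bidirectional search (BDS)

  Description: run sequential forward selection (SFS) from the empty set and, at the same time, sequential backward selection (SBS) from the full set. Stop when both searches reach the same feature subset C.

  The motivation for bidirectional search is that two searches meeting in the middle cover less of the search space than a single search spanning the whole distance. In the figure below, point O is the start of the search and point A is the target. The gray circle is the region a one-directional search may cover, and the two green circles are the regions covered by one bidirectional search. It is easy to show that the green area is always smaller than the gray area.

Figure 2. Bidirectional search

  (4) Plus-L minus-R selection (LRS)

  The algorithm has two forms:

    <1> Start from the empty set. In each round first add L features, then remove R features, so that the evaluation function is optimal. (L > R)

    <2> Start from the full set. In each round first remove R features, then add L features, so that the evaluation function is optimal. (L < R)

  Assessment: plus-L minus-R selection combines the ideas of sequential forward and sequential backward selection. The choice of L and R is the key to the algorithm.

  (5) Sequential floating selection

  Description: sequential floating selection grew out of plus-L minus-R selection. The difference is that L and R are not fixed but "float", that is, they change during the search.

    Depending on the search direction there are two variants.

    <1> Sequential floating forward selection (SFFS)

      Description: start from the empty set. In each round choose a subset x of the unselected features such that adding x optimizes the evaluation function, then choose a subset z of the selected features such that removing z optimizes the evaluation function.

    <2> Sequential floating backward selection (SFBS)

      Description: similar to SFFS, except that SFBS starts from the full set and in each round first removes features, then adds features.

  Assessment: sequential floating selection combines the characteristics of sequential forward selection, sequential backward selection, and plus-L minus-R selection, and makes up for their drawbacks.

  (6) Decision tree method (DTM)

  Description: run C4.5 or another decision tree induction algorithm on the training set. Once the tree is fully grown, run a pruning algorithm on it. The features at the branch nodes of the final tree are the selected feature subset. The decision tree method generally uses information gain as its evaluation function.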

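2.2.3 Random algorithms

  (1) Random generation plus sequential selection (RGSS)

  Description: generate a random feature subset, then run SFS and SBS starting from that subset.

  Assessment: it complements SFS and SBS and helps them escape local optima.

  (2) Simulated annealing (SA)

    For background on simulated annealing, see the article "大白话解析模拟退火算法" (Simulated Annealing Explained in Plain Language).

  Assessment: simulated annealing overcomes, to some extent, the tendency of sequential search to get stuck in local optima. However, if the region containing the optimum is very small (the so-called "golf hole" landscape), simulated annealing has difficulty finding it.

  (3) Genetic algorithms (GA)

    For background on genetic algorithms, see the article "遗传算法入门" (An Introduction to Genetic Algorithms).

  Description: first generate a batch of random feature subsets and score them with the evaluation function. Then breed the next generation of subsets through crossover and mutation, where higher-scoring subsets are more likely to be chosen for breeding. After N generations of breeding and survival of the fittest, the population may contain the feature subset with the highest evaluation score.

  Common drawback of random algorithms: they depend on random factors, so experimental results are hard to reproduce.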

 

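2.3 Evaluation functions

The evaluation function judges how good the feature subsets supplied by the generation procedure are.

By working principle, evaluation functions fall into two main classes: filters and wrappers.

A filter measures the quality of a feature subset by analyzing properties of the subset itself. Filters are generally used as a preprocessing step and are independent of the choice of classifier. The principle is shown in Figure 3.

Figure 3. How a filter works (Ricardo Gutierrez-Osuna, 2008)

A wrapper is essentially a classifier. It classifies the sample set using the selected feature subset, and the classification accuracy serves as the measure of the subset's quality. The principle is shown in Figure 4.

Figure 4. How a wrapper works (Ricardo Gutierrez-Osuna, 2008)

The common evaluation functions are introduced briefly below.

  (1) Correlation

Using correlation to measure a feature subset rests on this assumption: a good feature subset contains features that are highly correlated with the class (high relevance) and weakly correlated with one another (low redundancy).

The linear correlation coefficient can be used to measure the linear correlation between vectors.

  (2) Distance metrics

Using a distance metric for feature selection rests on this assumption: a good feature subset makes samples of the same class as close together as possible and samples of different classes as far apart as possible.

Common distance metrics (similarity measures) include the Euclidean distance, the standardized Euclidean distance, and the Mahalanobis distance.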

 

 

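  (3) Information gain

Suppose Y is a discrete variable that takes values in {y1, y2, ..., ym}, and yi occurs with probability Pi. The information entropy of Y is defined as:

$$H(Y) = -\sum_{i=1}^{m} P_i \log_2 P_i$$

Entropy has the following property: the "purer" the distribution of Y, the smaller its entropy; the more "disordered" the distribution, the larger its entropy. In the extreme cases: if Y can take only one value, that is P1 = 1, then H(Y) takes its minimum value 0; conversely, if all values are equally likely, each with probability 1/m, then H(Y) takes its maximum value log2(m).

Given another variable X as a condition, and knowing that X = xi, the conditional entropy of Y is:

$$H(Y \mid X) = \sum_{i} P(X = x_i)\, H(Y \mid X = x_i)$$

The information gain of Y from adding the condition X is defined as:

$$IG(Y \mid X) = H(Y) - H(Y \mid X)$$

Similarly, the entropy H(C) of the class label C can be written as:

$$H(C) = -\sum_{i} P(C = c_i) \log_2 P(C = c_i)$$

The conditional entropy H(C | Fj) of the class C after feature Fj is used for classification is:

$$H(C \mid F_j) = \sum_{v} P(F_j = v)\, H(C \mid F_j = v)$$

The change in the entropy of C before and after using feature Fj is called the information gain of C, written IG(C | Fj):

$$IG(C \mid F_j) = H(C) - H(C \mid F_j)$$

Suppose there are feature subsets A and B and a class variable C. If IG(C | A) > IG(C | B), then classification using subset A is considered better than using B, so subset A is preferred.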
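The definitions above translate directly into code. The following Python 3 sketch estimates probabilities by counting; the helper names entropy, conditional_entropy, and information_gain, and the toy data, are assumptions made for illustration and are not part of the original article.

```python
import math
from collections import Counter


def entropy(labels):
    """H(C) = -sum_i p_i * log2(p_i), with p_i estimated by counting."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(feature, labels):
    """H(C | F) = sum_v P(F = v) * H(C | F = v)."""
    n = len(labels)
    groups = {}
    for f, c in zip(feature, labels):
        groups.setdefault(f, []).append(c)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(feature, labels):
    """IG(C | F) = H(C) - H(C | F)."""
    return entropy(labels) - conditional_entropy(feature, labels)


labels = ["yes", "yes", "no", "no"]
f_good = ["a", "a", "b", "b"]   # separates the two classes perfectly
f_bad = ["a", "b", "a", "b"]    # carries no information about the class

print(information_gain(f_good, labels))  # 1.0
print(information_gain(f_bad, labels))   # 0.0
```

A filter based on information gain would prefer f_good over f_bad, matching the rule IG(C | A) > IG(C | B) stated above.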

 

 

 

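  (4) Consistency

If sample 1 and sample 2 belong to different classes but have exactly the same values on features A and B, then the subset {A, B} should not be chosen as the final feature set.

  (5) Classifier error rate

Classify the sample set with a specific classifier using the given feature subset, and use the classification accuracy to measure the quality of the subset.

Of the five measures above, correlation, distance, information gain, and consistency are filters, while classifier error rate is a wrapper.

Because a filter is independent of any particular classification algorithm, its results transfer well across classifiers and it is cheap to compute. A wrapper applies a specific classification algorithm during evaluation, so its results may transfer poorly to other classifiers and it is more expensive to compute.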

 

References

[1] M. Dash, H. Liu. Feature Selection for Classification. Intelligent Data Analysis 1 (1997) 131–156.

[2] Lei Yu, Huan Liu. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution.

[3] Ricardo Gutierrez-Osuna. Introduction to Pattern Analysis (Lecture 11: Sequential Feature Selection).

http://courses.cs.tamu.edu/rgutier/cpsc689_f08/l11.pdf

 

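Repost: Similarity measures in machine learning

http://www.cnblogs.com/heaad/archive/2011/03/08/1977733.html

Classification often requires estimating how similar two samples are (a similarity measurement), and the usual approach is to compute a "distance" between them. The choice of distance matters a great deal and can even decide whether the classification is correct.

This article summarizes the commonly used similarity measures.

Contents:

1. Euclidean distance

2. Manhattan distance

3. Chebyshev distance

4. Minkowski distance

5. Standardized Euclidean distance

6. Mahalanobis distance

7. Cosine of the angle

8. Hamming distance

9. Jaccard distance and Jaccard similarity coefficient

10. Correlation coefficient and correlation distance

11. Information entropy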

 

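1. Euclidean distance

The Euclidean distance is the easiest distance to understand. It comes from the formula for the distance between two points in Euclidean space.

(1) Euclidean distance between two points a(x1,y1) and b(x2,y2) in the plane:

$$d_{12} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

(2) Euclidean distance between two points a(x1,y1,z1) and b(x2,y2,z2) in three-dimensional space:

$$d_{12} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$

(3) Euclidean distance between two n-dimensional vectors a(x11,x12,…,x1n) and b(x21,x22,…,x2n):

$$d_{12} = \sqrt{\sum_{k=1}^{n} (x_{1k} - x_{2k})^2}$$

It can also be written as a vector operation:

$$d_{12} = \sqrt{(a - b)(a - b)^T}$$

(4) Computing the Euclidean distance in Matlab

Matlab computes distances mainly with the pdist function. If X is an M×N matrix, pdist(X) treats each of the M rows of X as an N-dimensional vector and computes the distance between every pair of these M vectors.

Example: compute the pairwise Euclidean distances between the vectors (0,0), (1,0), and (0,2)

X = [0 0 ; 1 0 ; 0 2]

D = pdist(X,'euclidean')

Result:

D =

1.0000    2.0000    2.2361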

 

 

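2. Manhattan distance

The name gives away how this distance is computed. Imagine driving in Manhattan from one intersection to another. Is the driving distance the straight-line distance between the two points? Clearly not, unless you can drive through buildings. The actual driving distance is the "Manhattan distance", which is where the name comes from. It is also called the city block distance.

(1) Manhattan distance between two points a(x1,y1) and b(x2,y2) in the plane:

$$d_{12} = |x_1 - x_2| + |y_1 - y_2|$$

(2) Manhattan distance between two n-dimensional vectors a(x11,x12,…,x1n) and b(x21,x22,…,x2n):

$$d_{12} = \sum_{k=1}^{n} |x_{1k} - x_{2k}|$$

(3) Computing the Manhattan distance in Matlab

Example: compute the pairwise Manhattan distances between the vectors (0,0), (1,0), and (0,2)

X = [0 0 ; 1 0 ; 0 2]

D = pdist(X, 'cityblock')

Result:

D =

1     2     3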

 

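3. Chebyshev distance

Have you played chess? In one move the king can go to any of the 8 adjacent squares. So what is the minimum number of moves for the king to get from square (x1,y1) to square (x2,y2)? Try it yourself. You will find the minimum is always max( | x2-x1 | , | y2-y1 | ) moves. A distance measure defined in the same way is called the Chebyshev distance.

(1) Chebyshev distance between two points a(x1,y1) and b(x2,y2) in the plane:

$$d_{12} = \max(|x_1 - x_2|, |y_1 - y_2|)$$

(2) Chebyshev distance between two n-dimensional vectors a(x11,x12,…,x1n) and b(x21,x22,…,x2n):

$$d_{12} = \max_{k} |x_{1k} - x_{2k}|$$

An equivalent form of this formula is:

$$d_{12} = \lim_{p \to \infty} \left( \sum_{k=1}^{n} |x_{1k} - x_{2k}|^p \right)^{1/p}$$

Can't see why the two formulas are equivalent? A hint: bound the sum from above and below and apply the squeeze theorem.

(3) Computing the Chebyshev distance in Matlab

Example: compute the pairwise Chebyshev distances between the vectors (0,0), (1,0), and (0,2)

X = [0 0 ; 1 0 ; 0 2]

D = pdist(X, 'chebychev')

Result:

D =

1     2     2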

 

 

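4. Minkowski distance

The Minkowski distance is not a single distance but a family of distances.

(1) Definition

The Minkowski distance between two n-dimensional variables a(x11,x12,…,x1n) and b(x21,x22,…,x2n) is defined as:

$$d_{12} = \left( \sum_{k=1}^{n} |x_{1k} - x_{2k}|^p \right)^{1/p}$$

where p is a variable parameter.

When p=1, it is the Manhattan distance.

When p=2, it is the Euclidean distance.

When p→∞, it is the Chebyshev distance.

Depending on the parameter, the Minkowski distance represents a whole class of distances.

(2) Drawbacks of the Minkowski distance

The Minkowski distances, including the Manhattan, Euclidean, and Chebyshev distances, share an obvious drawback.

An example: take two-dimensional samples (height, weight), where height ranges from 150 to 190 and weight from 50 to 60, and three samples a(180,50), b(190,50), c(180,60). The Minkowski distance between a and b (whether Manhattan, Euclidean, or Chebyshev) equals the Minkowski distance between a and c. But is 10 cm of height really equivalent to 10 kg of weight? Measuring the similarity of these samples with a Minkowski distance is therefore questionable.

In short, the Minkowski distance has two main drawbacks: (1) it treats the scales, that is the "units", of all components as the same; (2) it ignores the fact that the components may have different distributions (expectation, variance, and so on).

(3) Computing the Minkowski distance in Matlab

Example: compute the pairwise Minkowski distances between the vectors (0,0), (1,0), and (0,2) (using p = 2, the Euclidean distance)

X = [0 0 ; 1 0 ; 0 2]

D = pdist(X,'minkowski',2)

Result:

D =

1.0000    2.0000    2.2361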
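For readers without Matlab, the three special cases can be checked with a short Python 3 sketch. It uses only the standard library; the function names minkowski and chebyshev are assumptions chosen for illustration, and the data are the three vectors used in the Matlab examples above.

```python
from itertools import combinations


def minkowski(a, b, p):
    """Minkowski distance: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def chebyshev(a, b):
    """Limit of the Minkowski distance as p goes to infinity."""
    return max(abs(x - y) for x, y in zip(a, b))


X = [(0, 0), (1, 0), (0, 2)]
pairs = list(combinations(X, 2))  # pair order (1,2), (1,3), (2,3), as in pdist

print([round(minkowski(a, b, 1), 4) for a, b in pairs])  # [1.0, 2.0, 3.0]
print([round(minkowski(a, b, 2), 4) for a, b in pairs])  # [1.0, 2.0, 2.2361]
print([chebyshev(a, b) for a, b in pairs])               # [1, 2, 2]

# A large p already comes very close to the Chebyshev distance.
print(minkowski((1, 0), (0, 2), 50))
```

The last line illustrates the limit claim: for p = 50 the Minkowski distance between (1,0) and (0,2) is already nearly indistinguishable from the Chebyshev value 2.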

 

 

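5. Standardized Euclidean distance

(1) Definition

The standardized Euclidean distance is an improvement that addresses the drawbacks of the plain Euclidean distance. The idea: since the components of the data are distributed differently, first "standardize" every component so that they all have the same mean and variance. What should the mean and variance be standardized to? A quick statistics refresher: if a sample set X has mean m and standard deviation s, then the "standardized variable" of X is:

$$X^* = \frac{X - m}{s}$$

The standardized variable has expectation 0 and variance 1. The standardization of a sample set can therefore be described as:

standardized value = ( original value - mean of the component ) / standard deviation of the component

A short derivation gives the standardized Euclidean distance between two n-dimensional vectors a(x11,x12,…,x1n) and b(x21,x22,…,x2n):

$$d_{12} = \sqrt{\sum_{k=1}^{n} \left( \frac{x_{1k} - x_{2k}}{s_k} \right)^2}$$

If the reciprocal of the variance is viewed as a weight, this formula can be seen as a weighted Euclidean distance.

(2) Computing the standardized Euclidean distance in Matlab

Example: compute the pairwise standardized Euclidean distances between the vectors (0,0), (1,0), and (0,2) (assuming the standard deviations of the two components are 0.5 and 1)

X = [0 0 ; 1 0 ; 0 2]

D = pdist(X, 'seuclidean',[0.5,1])

Result:

D =

2.0000    2.0000    2.8284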
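The same three numbers can be reproduced by dividing each component difference by its standard deviation before taking the Euclidean norm. This is a minimal Python 3 sketch; the function name seuclidean is borrowed from the Matlab option name purely for illustration, and the standard deviations are the assumed values 0.5 and 1 from the example.

```python
import math


def seuclidean(a, b, std):
    """Euclidean distance after scaling each component by its standard deviation."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, std)))


X = [(0, 0), (1, 0), (0, 2)]
std = (0.5, 1.0)  # assumed per-component standard deviations, as in the example

print(round(seuclidean(X[0], X[1], std), 4))  # 2.0
print(round(seuclidean(X[0], X[2], std), 4))  # 2.0
print(round(seuclidean(X[1], X[2], std), 4))  # 2.8284
```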

 


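6. Mahalanobis distance

(1) Definition

Given M sample vectors X1~Xm with covariance matrix S and mean vector μ, the Mahalanobis distance from a sample vector X to μ is:

$$D(X) = \sqrt{(X - \mu)^T S^{-1} (X - \mu)}$$

The Mahalanobis distance between vectors Xi and Xj is defined as:

$$D(X_i, X_j) = \sqrt{(X_i - X_j)^T S^{-1} (X_i - X_j)}$$

If the covariance matrix is the identity matrix (the components are uncorrelated and each has unit variance), the formula becomes:

$$D(X_i, X_j) = \sqrt{(X_i - X_j)^T (X_i - X_j)}$$

which is the Euclidean distance.

If the covariance matrix is diagonal, the formula becomes the standardized Euclidean distance.

(2) Properties of the Mahalanobis distance: it is independent of scale and removes the interference of correlations between variables.

(3) Computing the pairwise Mahalanobis distances between (1 2), (1 3), (2 2), (3 1) in Matlab

X = [1 2; 1 3; 2 2; 3 1]

Y = pdist(X,'mahalanobis')

Result:

Y =

2.3452    2.0000    2.3452    1.2247    2.4495    1.2247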

 

 

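7. Cosine of the angle

Wait, this is not a geometry class, so why are we talking about the cosine of an angle? Bear with me. In geometry the cosine of the angle measures the difference in direction between two vectors; machine learning borrows the idea to measure the difference between sample vectors.

(1) Cosine of the angle between vectors A(x1,y1) and B(x2,y2) in two-dimensional space:

$$\cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}}$$

(2) Cosine of the angle between two n-dimensional sample points a(x11,x12,…,x1n) and b(x21,x22,…,x2n)

Similarly, for two n-dimensional sample points a(x11,x12,…,x1n) and b(x21,x22,…,x2n), a notion analogous to the cosine of the angle can be used to measure how similar they are:

$$\cos\theta = \frac{a \cdot b}{|a|\,|b|} = \frac{\sum_{k=1}^{n} x_{1k} x_{2k}}{\sqrt{\sum_{k=1}^{n} x_{1k}^2}\,\sqrt{\sum_{k=1}^{n} x_{2k}^2}}$$

The cosine takes values in [-1,1]. A larger cosine means a smaller angle between the two vectors, and a smaller cosine means a larger angle. When the two vectors point in the same direction the cosine takes its maximum value 1; when they point in exactly opposite directions it takes its minimum value -1.

For a concrete application of the cosine, see reference [1].

(3) Computing the cosine in Matlab

Example: compute the pairwise cosines between (1,0), (1,1.732), and (-1,0)

X = [1 0 ; 1 1.732 ; -1 0]

D = 1- pdist(X, 'cosine')  % pdist(X, 'cosine') in Matlab returns 1 minus the cosine

Result:

D =

0.5000   -1.0000   -0.5000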
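A direct Python 3 translation of the n-dimensional formula reproduces these values. The function name cosine_similarity is an assumption for illustration; unlike Matlab's pdist option, it returns the cosine itself rather than 1 minus the cosine.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|); undefined for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


X = [(1, 0), (1, 1.732), (-1, 0)]

print(round(cosine_similarity(X[0], X[1]), 4))  # 0.5
print(round(cosine_similarity(X[0], X[2]), 4))  # -1.0
print(round(cosine_similarity(X[1], X[2]), 4))  # -0.5
```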

 

 

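8. Hamming distance

(1) Definition

The Hamming distance between two strings s1 and s2 of equal length is the minimum number of substitutions needed to turn one into the other. For example, the Hamming distance between "1111" and "1001" is 2.

Application: information coding (to improve error tolerance, the minimum Hamming distance between code words should be as large as possible).

(2) Computing the Hamming distance in Matlab

Matlab defines the Hamming distance between two vectors as the fraction of components in which they differ.

Example: compute the pairwise Hamming distances between the vectors (0,0), (1,0), and (0,2)

X = [0 0 ; 1 0 ; 0 2];

D = pdist(X, 'hamming')

Result:

D =

0.5000    0.5000    1.0000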

 

 

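9. Jaccard similarity coefficient

(1) Jaccard similarity coefficient

The proportion of the union of two sets A and B that is taken up by their intersection is called the Jaccard similarity coefficient of the two sets, written J(A,B):

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

The Jaccard similarity coefficient is a measure of how similar two sets are.

(2) Jaccard distance

The opposite notion is the Jaccard distance, given by:

$$J_\delta(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$$

The Jaccard distance measures how distinct two sets are by the proportion of all elements that the two sets do not share.

(3) Applications of the Jaccard similarity coefficient and Jaccard distance

The Jaccard similarity coefficient can be used to measure the similarity of samples.

Let samples A and B be two n-dimensional vectors whose components all take the value 0 or 1, for example A(0111) and B(1011). Treat each sample as a set: 1 means the set contains the element and 0 means it does not.

p: the number of dimensions where A and B are both 1

q: the number of dimensions where A is 1 and B is 0

r: the number of dimensions where A is 0 and B is 1

s: the number of dimensions where A and B are both 0

The Jaccard similarity coefficient of samples A and B can then be written as:

$$J = \frac{p}{p + q + r}$$

Here p+q+r can be understood as the number of elements in the union of A and B, and p as the number of elements in their intersection.

The Jaccard distance between samples A and B is:

$$J_\delta = \frac{q + r}{p + q + r}$$

(4) Computing the Jaccard distance in Matlab

The Jaccard distance defined by Matlab's pdist differs slightly from the definition given here: Matlab defines it as the proportion of differing dimensions among the dimensions that are not zero in both vectors.

Example: compute the pairwise Jaccard distances between (1,1,0), (1,-1,0), and (-1,1,0)

X = [1 1 0; 1 -1 0; -1 1 0]

D = pdist( X , 'jaccard')

Result:

D =

0.5000    0.5000    1.0000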
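The p, q, r counting for 0/1 vectors from part (3) can be sketched in Python 3 as follows. The function names are assumptions for illustration, and the sketch follows the set-based definition from this article rather than Matlab's variant; it is undefined when both vectors are all zeros, because p+q+r is then 0.

```python
def jaccard_counts(a, b):
    """Return (p, q, r) for two 0/1 vectors of equal length."""
    p = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # both 1
    q = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)  # only A is 1
    r = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)  # only B is 1
    return p, q, r


def jaccard_similarity(a, b):
    p, q, r = jaccard_counts(a, b)
    return p / (p + q + r)


A = (0, 1, 1, 1)
B = (1, 0, 1, 1)

J = jaccard_similarity(A, B)
print(jaccard_counts(A, B))  # (2, 1, 1)
print(J, 1 - J)              # 0.5 0.5
```

For the example A(0111) and B(1011), two dimensions are shared and two differ, so both the similarity coefficient and the distance come out to 0.5.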

 

 

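10. Correlation coefficient and correlation distance

(1) Definition of the correlation coefficient

$$\rho_{XY} = \frac{Cov(X, Y)}{\sqrt{D(X)}\,\sqrt{D(Y)}} = \frac{E\big((X - EX)(Y - EY)\big)}{\sqrt{D(X)}\,\sqrt{D(Y)}}$$

The correlation coefficient measures how strongly the random variables X and Y are correlated. It takes values in [-1,1]. The larger its absolute value, the more strongly X and Y are correlated. When X and Y are linearly related, the correlation coefficient is 1 (positive linear correlation) or -1 (negative linear correlation).

(2) Definition of the correlation distance

$$D_{XY} = 1 - \rho_{XY}$$

(3) Computing the correlation coefficient and correlation distance between (1, 2, 3, 4) and (3, 8, 7, 6) in Matlab

X = [1 2 3 4 ; 3 8 7 6]

C = corrcoef( X' )   % returns the matrix of correlation coefficients

D = pdist( X , 'correlation')

Result:

C =

1.0000    0.4781

0.4781    1.0000

D =

0.5219

Here 0.4781 is the correlation coefficient and 0.5219 is the correlation distance.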
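The two numbers can be checked by hand with a small Python 3 sketch of the Pearson formula. The function name pearson is an assumption for illustration; the normalizing factors of the covariance and the variances cancel, so plain sums of deviations are enough.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


r = pearson([1, 2, 3, 4], [3, 8, 7, 6])
print(round(r, 4))      # 0.4781  (correlation coefficient)
print(round(1 - r, 4))  # 0.5219  (correlation distance)
```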

 

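11. Information entropy

Information entropy is not actually a similarity measure. So why is it in this article? Well... I don't know either. (╯▽╰)

Information entropy measures how disordered or spread out a distribution is. The more spread out (or the more even) the distribution, the larger the entropy. The more ordered (or the more concentrated) the distribution, the smaller the entropy.

The entropy of a given sample set X is computed as:

$$Entropy(X) = \sum_{i=1}^{n} -p_i \log_2 p_i$$

Meaning of the parameters:

n: the number of classes in the sample set X

pi: the probability that an element of X belongs to class i

A larger entropy means the classes in the sample set X are more spread out, and a smaller entropy means they are more concentrated. When all n classes in X are equally likely (each with probability 1/n), the entropy takes its maximum value log2(n). When X contains only one class, the entropy takes its minimum value 0.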

 

References:

[1] Wu Jun (吴军). 数学之美 系列 12 (The Beauty of Mathematics, Part 12): The Law of Cosines and News Classification.

http://www.google.com.hk/ggblog/googlechinablog/2006/07/12_4010.html

[2] Wikipedia. Jaccard index.

http://en.wikipedia.org/wiki/Jaccard_index

[3] Wikipedia. Hamming distance.

http://en.wikipedia.org/wiki/Hamming_distance

[4] 求马氏距离(Mahalanobis distance )matlab版 (Computing the Mahalanobis Distance, Matlab Version).

http://junjun0595.blog.163.com/blog/static/969561420100633351210/

[5] Wikipedia. Pearson product-moment correlation coefficient.

http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

Excerpt from unfinished Reading List (The universe-unsolved!)

Excerpt — Chapter 7: Are we living in a programmed reality

Four major categories of evidence:

1. Quantization

2. The improbability of the world timeline

3. The tuning of our reality

4. Anomalous occurrences

IF…By Rudyard Kipling

IF you can keep your head when all about you
Are losing theirs and blaming it on you,
If you can trust yourself when all men doubt you,
But make allowance for their doubting too;
If you can wait and not be tired by waiting,
Or being lied about, don’t deal in lies,
Or being hated, don’t give way to hating,
And yet don’t look too good, nor talk too wise:

If you can dream – and not make dreams your master;
If you can think – and not make thoughts your aim;
If you can meet with Triumph and Disaster
And treat those two impostors just the same;
If you can bear to hear the truth you’ve spoken
Twisted by knaves to make a trap for fools,
Or watch the things you gave your life to, broken,
And stoop and build ’em up with worn-out tools:

If you can make one heap of all your winnings
And risk it on one turn of pitch-and-toss,
And lose, and start again at your beginnings
And never breathe a word about your loss;
If you can force your heart and nerve and sinew
To serve your turn long after they are gone,
And so hold on when there is nothing in you
Except the Will which says to them: ‘Hold on!’

If you can talk with crowds and keep your virtue,
Or walk with Kings – nor lose the common touch,
If neither foes nor loving friends can hurt you,
If all men count with you, but none too much;
If you can fill the unforgiving minute
With sixty seconds’ worth of distance run,
Yours is the Earth and everything that’s in it,
And – which is more – you’ll be a Man, my son!

Paper Tigers

Paper Tigers.

Forward from nymag.com.

 

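Thread management

http://www.cnblogs.com/huxi/archive/2010/06/26/1765808.html

1. Thread basics

1.1. Thread states

A thread has 5 states. The transitions between them are shown in the figure below:

[Figure: thread_stat_simple]

1.2. Thread synchronization (locks)

The advantage of multithreading is that several tasks can run at the same time (or at least it feels that way). But when threads need to share data, the data can get out of sync. Consider this situation: a list contains only 0s, thread "set" changes every element to 1 from back to front, and thread "print" reads the list from front to back and prints it. Thread "print" may come along to print the list just after thread "set" has started changing it, and the output is then half 0s and half 1s. That is data out of sync. Locks were introduced to avoid this.

A lock has two states: locked and unlocked. Whenever a thread such as "set" wants to access the shared data, it must first acquire the lock. If another thread such as "print" already holds the lock, thread "set" is suspended, which is called synchronous blocking. Once thread "print" has finished and released the lock, thread "set" continues. With this in place, printing the list outputs either all 0s or all 1s, and the embarrassing half-and-half output no longer occurs.

The interaction between threads and a lock is shown in the figure below:

[Figure: thread_lock]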
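The "set" and "print" scenario can be sketched as follows. This is a minimal illustration in Python 3 syntax (the listings later in this article are written for Python 2); the names set_ones, take_snapshot, and snapshots are assumptions made for the sketch, and the reader thread stores a copy of the list instead of printing it so the result can be inspected.

```python
import threading

data = [0] * 10
lock = threading.Lock()
snapshots = []


def set_ones():
    # Hold the lock for the whole update so a reader never sees a half-written list.
    with lock:
        for i in reversed(range(len(data))):
            data[i] = 1


def take_snapshot():
    # The reader takes the same lock, so it runs entirely before or entirely after the writer.
    with lock:
        snapshots.append(list(data))


threads = [threading.Thread(target=set_ones), threading.Thread(target=take_snapshot)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(snapshots[0])  # all 0s or all 1s, never a mixture
```

Which of the two outcomes appears depends on which thread gets the lock first; the lock only guarantees that the snapshot is never a mixture.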

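1.3. Thread communication (condition variables)

There is another awkward situation, however: the list does not exist from the start but is created by a thread "create". If "set" or "print" accesses the list before "create" has run, an exception is raised. A lock can solve this, but "set" and "print" would then each need an endless polling loop, because they do not know when "create" will run. Having "create" notify "set" and "print" after it has run is clearly a better solution. Hence condition variables.

A condition variable lets threads such as "set" and "print" wait while a condition is not met (the list is None). When the condition is met (the list has been created), a notification is sent to tell "set" and "print" that the condition now holds and it is time to get up and work; only then do "set" and "print" continue.

The interaction between threads and a condition variable is shown in the figures below:

[Figure: thread_condition_wait]

[Figure: thread_condition_notify]

1.4. Transitions between the running and blocked states

Finally, a look at the transitions between the running and blocked states of a thread.

[Figure: thread_stat]

There are three kinds of blocking:
Synchronous blocking is the state of competing for a lock. A thread enters this state when it requests a lock and returns to the running state once it has acquired the lock.
Wait blocking is the state of waiting for a notification from another thread. After acquiring the condition's lock, a thread enters this state by calling "wait". Once another thread sends a notification, the thread enters synchronous blocking and competes for the condition's lock again.
Other blocking is the blocking caused by calling time.sleep() or anotherthread.join(), or by waiting for IO. A thread in this state does not release the locks it holds.

Tip: if you understand this material, the topics that follow will be easy, and the same concepts apply in most popular programming languages. (In other words, you really have to understand it >_< even if you think the author is not good enough and go read someone else's tutorial instead.)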

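2. thread

Python supports threads through two standard library modules, thread and threading. thread provides low-level, primitive threads and a simple lock. (The listings in this article are Python 2; in Python 3 the thread module was renamed _thread.)

# encoding: UTF-8
import thread
import time

# A function to run in a thread
def func():
    for i in range(5):
        print 'func'
        time.sleep(1)

    # End the current thread
    # Equivalent to thread.exit_thread()
    thread.exit() # the thread also ends when func returns

# Start a thread; it begins running immediately
# Equivalent to thread.start_new_thread()
# The first argument is the function, the second is its arguments
thread.start_new(func, ()) # pass an empty tuple when the function takes no arguments

# Create a lock (LockType cannot be instantiated directly)
# Equivalent to thread.allocate_lock()
lock = thread.allocate()

# Check whether the lock is locked or released
print lock.locked()

# Locks are usually used to control access to a shared resource
count = 0

# Acquire the lock; returns True once the lock has been acquired
# With no argument the call blocks until the lock is acquired
# (Python 2 takes an optional blocking flag here, not a timeout;
# with a false flag the call returns False at once if the lock is held)
if lock.acquire():
    count += 1

    # Release the lock
    lock.release()

# Threads created by the thread module end when the main thread ends
time.sleep(6)

Other functions provided by the thread module:
thread.interrupt_main(): interrupts the main thread from another thread (a KeyboardInterrupt is raised in the main thread).
thread.get_ident(): returns a magic number that identifies the current thread, often used to look up thread-specific data in a dictionary. The number itself has no meaning and may be reused by a new thread after the thread ends.

thread also provides a thread-local class for managing thread-specific data, named thread._local; threading refers to this class.

Because thread offers few features, its threads cannot keep running after the main thread ends, and it has no condition variables, among other reasons, the thread module is generally not used, so it is not covered further here.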

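3. threading

threading is designed after Java's threading model. In Java, locks and condition variables are basic behavior of every object (each object comes with its own lock and condition variable), whereas in Python they are separate objects. Python's Thread provides a subset of the behavior of Java's Thread: there are no priorities or thread groups, and a thread cannot be stopped, suspended, resumed, or interrupted. The static methods of Java's Thread that Python implements are provided as module-level functions in threading.

Commonly used functions of the threading module:
threading.currentThread(): returns the current thread object.
threading.enumerate(): returns a list of the running threads. Running means after the thread has started and before it has ended; threads that have not started or have already terminated are not included.
threading.activeCount(): returns the number of running threads; the result equals len(threading.enumerate()).

Classes provided by the threading module:
Thread, Lock, RLock, Condition, [Bounded]Semaphore, Event, Timer, local.

3.1. Thread

Thread is the thread class. As in Java, there are two ways to use it: pass in the function to run, or subclass Thread and override run():

# encoding: UTF-8
import threading

# Method 1: pass the function to run to the Thread constructor
def func():
    print 'func() passed to Thread'

t = threading.Thread(target=func)
t.start()

# Method 2: subclass Thread and override run()
class MyThread(threading.Thread):
    def run(self):
        print 'MyThread extended from Thread'

t = MyThread()
t.start()

Constructor:
Thread(group=None, target=None, name=None, args=(), kwargs={})
group: the thread group; not implemented yet, and the library reference says it must be None;
target: the function to run;
name: the thread name;
args/kwargs: the arguments to pass to the function.

Instance methods:
isAlive(): returns whether the thread is running. Running means after it has started and before it has terminated.
get/setName(name): gets/sets the thread name.
is/setDaemon(bool): gets/sets whether the thread is a daemon thread. The initial value is inherited from the thread that creates it. The program exits when no non-daemon threads are left running.
start(): starts the thread.
join([timeout]): blocks the calling thread until the thread whose join() was called terminates or the optional timeout elapses.

An example using join():

# encoding: UTF-8
import threading
import time

def context(tJoin):
    print 'in threadContext.'
    tJoin.start()

    # Blocks tContext until tJoin terminates.
    tJoin.join()

    # Continues after tJoin has terminated.
    print 'out threadContext.'

def join():
    print 'in threadJoin.'
    time.sleep(1)
    print 'out threadJoin.'

tJoin = threading.Thread(target=join)
tContext = threading.Thread(target=context, args=(tJoin,))

tContext.start()

Output:

in threadContext.
in threadJoin.
out threadJoin.
out threadContext.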

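3.2. Lock

Lock is the lowest-level synchronization primitive available. When a Lock is locked, it is not owned by any particular thread. A Lock has two states, locked and unlocked, and two basic methods.

You can think of a Lock as having a lock pool. When a thread requests the lock, it is placed in the pool and leaves the pool once it acquires the lock. Threads in the pool are in the synchronous blocking state of the state diagram.

Constructor:
Lock()

Instance methods:
acquire([blocking]): puts the thread into the synchronous blocking state and tries to acquire the lock. (In Python 2 the optional argument is a blocking flag, not a timeout; a timeout parameter was added in Python 3.2.)
release(): releases the lock. The lock must be held when this is called, otherwise an exception is raised.

# encoding: UTF-8
import threading
import time

data = 0
lock = threading.Lock()

def func():
    global data
    print '%s acquire lock...' % threading.currentThread().getName()

    # acquire() blocks the thread until the lock is acquired
    # and returns whether the lock was acquired.
    if lock.acquire():
        print '%s get the lock.' % threading.currentThread().getName()
        data += 1
        time.sleep(2)
        print '%s release lock...' % threading.currentThread().getName()

        # release() releases the lock.
        lock.release()

t1 = threading.Thread(target=func)
t2 = threading.Thread(target=func)
t3 = threading.Thread(target=func)
t1.start()
t2.start()
t3.start()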

3.3. RLock

RLock (reentrant lock) is a synchronization primitive that the same thread can acquire several times. RLock uses the notions of "owning thread" and "recursion level": when locked, an RLock is owned by some thread. The owning thread may call acquire() again, and must call release() the same number of times to release the lock.

You can think of an RLock as containing a lock pool and a counter that starts at 0. Each successful acquire()/release() call adds 1 to/subtracts 1 from the counter, and the lock is unlocked when the counter is 0.

Constructor:
RLock()

Instance methods:
acquire([blocking])/release(): much the same as for Lock.

# encoding: UTF-8
import threading
import time

rlock = threading.RLock()

def func():
    # First request for the lock
    print '%s acquire lock...' % threading.currentThread().getName()
    if rlock.acquire():
        print '%s get the lock.' % threading.currentThread().getName()
        time.sleep(2)

        # Second request for the lock
        print '%s acquire lock again...' % threading.currentThread().getName()
        if rlock.acquire():
            print '%s get the lock.' % threading.currentThread().getName()
            time.sleep(2)

        # First release
        print '%s release lock...' % threading.currentThread().getName()
        rlock.release()
        time.sleep(2)

        # Second release
        print '%s release lock...' % threading.currentThread().getName()
        rlock.release()

t1 = threading.Thread(target=func)
t2 = threading.Thread(target=func)
t3 = threading.Thread(target=func)
t1.start()
t2.start()
t3.start()
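The difference between Lock and RLock can be seen without starting any extra threads. The following sketch is in Python 3 syntax (an assumption, since the listings above are Python 2) and uses a non-blocking acquire so that the second attempt on the plain Lock returns False instead of deadlocking; the variable names are hypothetical.

```python
import threading

lock = threading.Lock()
rlock = threading.RLock()

# A plain Lock has no owner: a second acquire by the same thread cannot succeed.
lock.acquire()
second_lock = lock.acquire(blocking=False)
lock.release()

# An RLock remembers its owner and counts nested acquisitions.
rlock.acquire()
second_rlock = rlock.acquire(blocking=False)
rlock.release()  # counter 2 -> 1, still locked
rlock.release()  # counter 1 -> 0, unlocked

print(second_lock, second_rlock)  # False True
```

With a blocking acquire, the second call on the plain Lock would wait forever, which is the self-deadlock that RLock is designed to avoid.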

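3.4. Condition

A Condition (condition variable) is usually associated with a lock. When several Conditions need to share one lock, pass a Lock/RLock instance to the constructor; otherwise the Condition creates its own RLock instance.

You can think of a Condition as having, besides the lock pool that comes with the lock, a wait pool. Threads in the wait pool are in the wait blocking state of the state diagram until another thread calls notify()/notifyAll(); once notified, a thread moves to the lock pool and waits for the lock.

Constructor:
Condition([lock/rlock])

Instance methods:
acquire([blocking])/release(): call the corresponding method of the associated lock.
wait([timeout]): puts the thread into the Condition's wait pool to wait for a notification, and releases the lock. The lock must be held when this is called, otherwise an exception is raised.
notify(): picks one thread from the wait pool and notifies it. The notified thread automatically calls acquire() to try to get the lock (it enters the lock pool); the other threads stay in the wait pool. This call does not release the lock. The lock must be held when this is called, otherwise an exception is raised.
notifyAll(): notifies every thread in the wait pool; all of them enter the lock pool and try to get the lock. This call does not release the lock. The lock must be held when this is called, otherwise an exception is raised.

The example is the familiar producer/consumer pattern:

# encoding: UTF-8
import threading
import time

# The product
product = None
# The condition variable
con = threading.Condition()

# Producer
def produce():
    global product

    if con.acquire():
        while True:
            if product is None:
                print 'produce...'
                product = 'anything'

                # Notify the consumer that the product has been made
                con.notify()

            # Wait for a notification
            con.wait()
            time.sleep(2)

# Consumer
def consume():
    global product

    if con.acquire():
        while True:
            if product is not None:
                print 'consume...'
                product = None

                # Notify the producer that the product is gone
                con.notify()

            # Wait for a notification
            con.wait()
            time.sleep(2)

t1 = threading.Thread(target=produce)
t2 = threading.Thread(target=consume)
t2.start()
t1.start()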

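3.5. Semaphore/BoundedSemaphore

The Semaphore is one of the oldest synchronization primitives in the history of computer science. A Semaphore manages an internal counter that is decremented by each acquire() call and incremented by each release() call. The counter cannot go below 0; when it is 0, acquire() blocks the thread in the synchronous blocking state until another thread calls release().

Because of this, a Semaphore is often used to synchronize objects with a "visitor limit", such as a connection pool.

The only difference between BoundedSemaphore and Semaphore is that the former checks on each release() whether the counter would exceed its initial value, and raises an exception if it would.

Constructor:
Semaphore(value=1): value is the initial value of the counter.

Instance methods:
acquire([blocking]): requests the Semaphore. If the counter is 0, the thread blocks in the synchronous blocking state; otherwise the counter is decremented and the call returns immediately.
release(): releases the Semaphore and increments the counter; a BoundedSemaphore also checks the number of releases. release() does not check whether the thread has acquired the Semaphore.

# encoding: UTF-8
import threading
import time

# The counter starts at 2
semaphore = threading.Semaphore(2)

def func():

    # Request the Semaphore; on success the counter is decremented; blocks when the counter is 0
    print '%s acquire semaphore...' % threading.currentThread().getName()
    if semaphore.acquire():

        print '%s get semaphore' % threading.currentThread().getName()
        time.sleep(4)

        # Release the Semaphore; the counter is incremented
        print '%s release semaphore' % threading.currentThread().getName()
        semaphore.release()

t1 = threading.Thread(target=func)
t2 = threading.Thread(target=func)
t3 = threading.Thread(target=func)
t4 = threading.Thread(target=func)
t1.start()
t2.start()
t3.start()
t4.start()

time.sleep(2)

# The main thread can call release even though it never acquired the semaphore
# With a BoundedSemaphore, t4 would raise an exception when it releases the semaphore
print 'MainThread release semaphore without acquire'
semaphore.release()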
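The release check that distinguishes the two classes can be shown in isolation. This is a small Python 3 sketch (the syntax is an assumption, as the listings above are Python 2); it relies on the standard library behavior that BoundedSemaphore.release() raises ValueError when the counter would exceed its initial value.

```python
import threading

# A plain Semaphore accepts an extra release: the counter silently grows to 2.
plain = threading.Semaphore(1)
plain.release()
extra_slots = plain.acquire(blocking=False) and plain.acquire(blocking=False)

# A BoundedSemaphore refuses to go above its initial value.
bounded = threading.BoundedSemaphore(1)
try:
    bounded.release()
    raised = False
except ValueError:
    raised = True

print(extra_slots, raised)  # True True
```

The bounded variant is therefore useful for catching bugs where release() is called more often than acquire().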

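3.6. Event

An Event is one of the simplest mechanisms for thread communication: one thread signals an event and other threads wait for it. An Event has an internal flag that starts as False; set() sets it to True and clear() resets it to False. wait() blocks the thread in the wait blocking state.

An Event is really a simplified Condition. An Event exposes no lock, so it cannot put a thread into the synchronous blocking state.

Constructor:
Event()

Instance methods:
isSet(): returns True when the internal flag is True.
set(): sets the flag to True and notifies all threads in the wait blocking state so that they resume running.
clear(): sets the flag to False.
wait([timeout]): returns immediately if the flag is True; otherwise blocks the thread in the wait blocking state until another thread calls set().

# encoding: UTF-8
import threading
import time

event = threading.Event()

def func():
    # Wait for the event; enters the wait blocking state
    print '%s wait for event...' % threading.currentThread().getName()
    event.wait()

    # Back in the running state after the event is received
    print '%s recv event.' % threading.currentThread().getName()

t1 = threading.Thread(target=func)
t2 = threading.Thread(target=func)
t1.start()
t2.start()

time.sleep(2)

# Send the event notification
print 'MainThread set event.'
event.set()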

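3.7. Timer

Timer is a subclass of Thread that calls a function after a specified amount of time.

Constructor:
Timer(interval, function, args=[], kwargs={})
interval: the delay, in seconds
function: the function to run
args/kwargs: the arguments of the function

Instance methods:
Timer derives from Thread and adds one method, cancel(), which stops the timer if it has not fired yet.

# encoding: UTF-8
import threading

def func():
    print 'hello timer!'

timer = threading.Timer(5, func)
timer.start()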
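A pending Timer can be stopped with the standard cancel() method before its delay has elapsed. The sketch below is in Python 3 syntax (an assumption, as the listing above is Python 2); the list fired is a hypothetical helper used only to record whether the callback ran.

```python
import threading

fired = []

# The callback would run after 5 seconds if the timer were left alone.
timer = threading.Timer(5, lambda: fired.append(True))
timer.start()

# Cancel before the delay elapses: the callback is never called,
# and the timer thread finishes right away.
timer.cancel()
timer.join()

print(fired)  # []
```

cancel() only has an effect while the timer is still waiting; once the callback has started it runs to completion.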

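3.8. local

local is a class whose name starts with a lowercase letter. It manages thread-local data. For the same local object, a thread cannot see attributes set by other threads, and an attribute set by one thread is not replaced by an attribute of the same name set by another thread.

You can think of local as a dictionary of "thread to attribute dictionary" entries. local hides the details of looking up the attribute dictionary using the current thread as the key, and then looking up the attribute value using the attribute name as the key.

# encoding: UTF-8
import threading

local = threading.local()
local.tname = 'main'

def func():
    local.tname = 'notmain'
    print local.tname

t1 = threading.Thread(target=func)
t1.start()
t1.join()

print local.tname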
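The "dictionary of dictionaries" picture can be made concrete with a toy class. This Python 3 sketch only illustrates the idea; DictLocal is a hypothetical name and is not how threading.local is actually implemented. It keys the outer dictionary on threading.get_ident(), so entries of finished threads are never cleaned up, which the real class handles properly.

```python
import threading


class DictLocal:
    """Toy stand-in for threading.local: one attribute dictionary per thread."""

    def __init__(self):
        # Bypass our own __setattr__ to create the outer dictionary.
        object.__setattr__(self, "_store", {})

    def _attrs(self):
        # Outer lookup: the current thread's identifier is the key.
        return self._store.setdefault(threading.get_ident(), {})

    def __getattr__(self, name):
        # Inner lookup: the attribute name is the key.
        try:
            return self._attrs()[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self._attrs()[name] = value


local = DictLocal()
local.tname = "main"
seen = []


def func():
    local.tname = "notmain"   # stored under the worker thread's key only
    seen.append(local.tname)


t1 = threading.Thread(target=func)
t1.start()
t1.join()

print(seen[0], local.tname)  # notmain main
```

The main thread still sees 'main' after the worker has set its own value, which matches the behavior of the listing above.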

 

Mastering Thread, Lock, and Condition is enough for the vast majority of situations that call for threads, and local is also very useful in some cases. To finish, these classes are used to implement the scenario from the thread basics section:

# encoding: UTF-8
import threading

alist = None
condition = threading.Condition()

def doSet():
    if condition.acquire():
        # Wait until the list has been created
        while alist is None:
            condition.wait()
        # Set every element to 1, from back to front
        for i in range(len(alist))[::-1]:
            alist[i] = 1
        condition.release()

def doPrint():
    if condition.acquire():
        # Wait until the list has been created
        while alist is None:
            condition.wait()
        for i in alist:
            print i,
        print
        condition.release()

def doCreate():
    global alist
    if condition.acquire():
        if alist is None:
            alist = [0 for i in range(10)]
            # Wake up every thread waiting for the list
            condition.notifyAll()
        condition.release()

tset = threading.Thread(target=doSet,name='tset')
tprint = threading.Thread(target=doPrint,name='tprint')
tcreate = threading.Thread(target=doCreate,name='tcreate')
tset.start()
tprint.start()
tcreate.start()

Semi-annual Report

I can’t remember the last time I linked to this blog. I miss Space’s simple and clear style, where you could still see the background while typing, whereas WordPress is a pure editing tool. Needless to say, I doubt I can concentrate on writing while distracted by the many links around the editing page. Even though the beginning of 2010 seems like yesterday to me, I still had to browse my posts from the first half of the year to gather my memories. So I think keeping a blog or journal becomes a necessity as one grows old.

I finally managed to come back home this summer. I wondered if my parents would still recognize me. Mom’s first words after inspecting me were: “You look like you’ve been drinking a lot of coffee.” How on earth did she know! She actually looked much younger. Dad had aged more than she had, yet he was as energetic as ever. Before long I was told that Grandpa was ill in the hospital. I knew that my uncle’s family hadn’t been taking care of him, even though Grandpa is such a proud person that he wouldn’t want to be a burden to others. But leaving him in pain at home while they were busy with their own ‘business’, I don’t think I’ll ever forgive them for that – actually, mark this. Not only because they treated my grandpa 50 times worse than he treated them, but also because they’ve made such a strong man so sad. Grandpa had suddenly turned into such a pessimistic person that I barely knew him. He told me this is fate, that everything is determined by one’s personality. I told him one can still change one’s living conditions if one tries, and change one’s personality if one wishes. But yeah, Grandpa is too nice/stubborn to care about himself instead of others. He still took care of my cousin when he got back from the hospital. In their living room, I lost my temper and ordered my cousin to go buy fruit for Grandpa instead of planting himself in front of his computer. Sigh, he was such a sweet little boy 12 years ago; what has changed him? Mom and Dad could only cook some nutritious meals and bring them to Grandpa, with me worrying whether he would eat them all or just save them for my cousin. Every time I watched him eat, I wished he could turn back into that old man with the fast and steady pace, the loud optimistic laughter, and the careless impish smile of a child, who gardens in his square of whatever wild grass he collected from who knows where, and who collects what we call junk: doorbells, radio components, and his web-like electronic workstations.
But my wishes hadn’t come true by the time I left home again, even though my visa to the USA was delayed, which granted me more days at home.

In my first few days at home, I got very grumpy and judged everything around me. I felt Mom was picturing me as one of those bad-mannered, vulgar people returned from overseas. I couldn’t defend myself. Yeah, I hate that everyone drives like crazy on the road as if the lane lines were invisible; I hate that drivers seldom stop for pedestrians at zebra crossings, so that I learned to bravely put myself in front of the cars in order to cross the road; I hate that my parents ration my coffee in the morning; I hate that my elder relatives keep asking me to eat more, and that I then really eat a lot to please them; and I hate most that when a golden retriever whose master was missing was hit on the bridge, a group of people gathered to discuss the news while nobody moved. Of course, I also hate to see the rich get richer while the poor get poorer. But while I was in Chengdu fussing over my visa appointment, a kind woman taught me a good lesson. I was in the train station checking my luggage, pushed around by the crowds, and I got upset and decided to put my luggage on the top of the shelf. The woman in charge of the registration said, “Little sister (and Sichuan is the only province where I don’t feel offended being called sister), you can put it on the lower layer so it’s easy to get out later.” Having been trained to be so skeptical about strangers’ nice motives, I blurted out without thinking: “No, you just want me to put it there because you are worried the shelves will be occupied soon. I can put it wherever I want!” She didn’t get angry at rude me; instead, she smiled and said: “Sister, you can put it anywhere, and it doesn’t matter where you put it, as long as you feel right and are in a good mood when you do things.” I felt so ashamed.
She reminded me of most of the nice compatriots I’ve known, no matter if they have suddenly become so cheeky as to call strangers ‘handsome’ or ‘beauty’ on the streets or in shops in order to get a good deal, no matter if they are struggling for survival and dreams of a house while becoming a bit cynical and snobbish. Deep down they are still the same people who believe that kindness will be rewarded in the world. At least I hope they are.

At home, I found myself not feeling guilty for taking time off just to enjoy pure thinking or idling. In the morning, I read ‘useless’ philosophical books while sipping my limited amount of coffee. At lunch, I brought my grandpa his meal, hoping he would eat more under my orders. In the afternoon, I jogged to the valley I used to hang around in summer when I was a little girl; it is a busy bridge now. I stayed on the bridge watching the trains coming from afar in the mountains one by one, just like the first time Grandpa showed me that long, long, huge black dragon. At night, the trains look much more beautiful, with lights brightly illuminated inside each carriage, where, as I know too well, people are talking and playing cards with each other, friends or strangers. Each time I walked with Mom on this bridge, I’d unconsciously linger longer in the direction of the trains. Mom understandingly asked me if I was watching the trains. Yes, she always knew me; I’m still that little girl of hers. In the evening, I went to learn badminton from my teacher and to practice. I tried really hard, with Dad’s sarcastic words for motivation. And at the end of the day, oops, I seemed to have forgotten my work again. Days like this didn’t last long; maybe they started too late, as I only settled down after the visa appointments. Soon I got my visa and jumped on the airplane that brought me back to the USA.

I started to pick up my work as soon as I came back. Why is America such a place that makes one only want to work? With the jet lag, the heat, and the sudden change to abysmal food, I became helplessly sick after a pointless business trip to VT. During two days of lying in bed in contemplation, I decided to slow down in the future (though I’m slow enough) before I kill myself. So when I recovered, I went on working on a project purely out of interest. And when my advisor asked me about the demo for the conference 2 weeks later, I was so relaxed that I still thought of it as a very distant deadline. This attitude obviously pissed everyone off, though they didn’t need to provide anything for the demo, not even suggestions. But I wonder: if I hadn’t been so relaxed, could I have come up with practical ideas to solve the practical issues in the demo, such as a tennis grip and an elastic Velcro method to secure the mounting, instead of whatever junk they were using or the complicated algorithm they were trying to come up with?

In San Diego, I met my old friends and their cute little girl. She’s growing more and more beautiful and understanding. A perfect kid, sweet and smart, who naturally makes one want to protect and love her. I wish I could visit them again this year. Sister Yuanyan’s new home is in San Diego now, yet I haven’t had a chance to see her little baby girl. I think that means another trip to San Diego in 2011 for sure. The conference had an 8:00am – 8:00pm schedule, with refreshments served from time to time and also some big names’ big talks. The guy from the FDA actually scared me. Instead of talking about the regulations that everyone cares about, he spent one hour showing his overly reduced slides, by which I mean a black background with one big white word in Helvetica, talking about ‘future, innovation, thinking ahead…’. Is that the department I should trust to examine the food and pills I take? During demo time, I became an adept salesperson, running around introducing our demo. By the way, ours wasn’t the best demo, though it was definitely the coolest and most popular one. My dear lab-mates were great helpers too, except I don’t get how they could keep on drinking at the hotel bar after returning from outside bars. One night with them in a Mexican bar, I had the cheesiest cheese fries in the world. It made me doubt whether San Diego would be a city I’d like to live in, though I’m sure there are a lot of places I ought to see besides the zoo and the sea.

Returning from the conference, I started to organize my thoughts, of course, after such an exciting conference. The remaining work, such as noise and error characterization of the electronic measurement system (undone, but it should have been done by the guys who developed the board), gait event classification, gait identification, etc., could all lead to a dissertation. I was dreaming about my proposal then. Speaking of which, I realized I hadn’t passed my qualifying exam yet, so I set a date with 3 chosen professors, meeting them in December to earn the PhD student title. In the days of preparing for my qualifying exam, I avoided the noisy lab, sitting back and thinking about the fundamental concepts. And I passed it, despite tripping over one fundamental DSP question (shame on me, but I feel I never reach a deep understanding of DSP; every time I review a concept, it is indeed different). Anyhow, I think the holiday desserts I prepared might have helped me pass it. I still doubt my quality as a PhD student; I know I could do better. But that’s for 2011. I enjoyed a nice night of drinking with my friends as a symbolic ending to 2010.

Some writings

Though with a thousand ‘No’s in my heart, I still went to the English as a Second Language class that was recommended to me, or rather imposed on me. Luckily I had a fun teacher and learned something.

“A multicultural and multilingual society is a much healthier society than a one race one language society.”

At the first glance, this statement may seem true. In a modern society, it is a universal belief that a highly diversified society is healthier and provides better opportunity for human development. However, even with the general truth behind this almost self-evident statement, we should still revisit and re-evaluate it in a scientific way. It is difficult to provide a quantitative measure for identifying how healthy a society is. Generally speaking, a healthy society means a fair, free and humanistic society which provides its people equal rights and opportunity to develop and itself opportunity to become more prosperous.

History has shown diversity will improve and propel a society for better advancement. Take China for example, the paramount of its federalism dynasty, Tang, was the most open and diversified period in its history. During Tang period, the emperors encouraged and welcomed the immigrants from all over the world, to study, research and exchange ideologies, trade, reside and intermarry in China.  After Tang dynasty, the emperors of China took the more conservative routes and closed its door to other nations and stopped from learning from other nations. This policy had become to the extreme in Qing dynasty. The Qing Empire declined and fell hard inevitably, not long after living in the dream of being the top of the world. The USA is another good example for being a successful multicultural and multilingual society. Since its establishment, the US government has encouraged immigrations and received great results.  The diverse races, nationalities and cultures from all over the world has injected the motivation and vibe to propel its development, based on the “Everyone is born equal” creed they believe.

But the distinction between different races and cultures in one society has also created unpleasant stories. In New York City, where can be considered the most diversified place in the USA, the fights between different racial gangsters perform every day.  Less extremely but not negligible, the partial favor to one’s own race in competition is also a prevailing phenomenon. The miscommunications and misunderstandings between different cultures and language speakers may escalate to a situation that the law and policy enforcement to guarantee the peace could become very tedious to make the society very inefficient. In my opinion, there should be a balance of the diversity in a society to assure the healthiness. That being said, however, extreme advocates of single race single language, such as the inhumane purgatory of other races and nationalities inside one society, should never be tolerated by the world. Those extremes of single race conservative ideas had created too many miserable memories in the world history.

Additional instructor’s comments about your submission

Shanshan,

You set up your argument really well in the first paragraph. This was really good! As usual, you make a lot of really great points. I don’t have the crossout feature on the computer I am using, so I just highlighted words below if I deleted a word before them… So I hope my suggestions make sense! Anyway, nice job.

Alison

At first glance, this statement may seem true. In a modern society, it is a universal belief that a highly diversified society is healthier and provides better opportunity for human development. However, even with the general truth behind this almost self-evident statement, we should still revisit and re-evaluate it in a scientific way. It is difficult to provide a quantitative measure for identifying how healthy a society is. Generally speaking, a healthy society means a fair, free and humanistic society which provides its people equal rights and opportunity to develop and itself opportunity to become more prosperous.

History has shown diversity will improve and propel a society for better advancement. Take China for example, the paramount of its federalism dynasty, Tang, was the most open and diversified period in its history. During the Tang period, the emperors encouraged and welcomed the immigrants from all over the world, to study, research and exchange ideologies, trade, reside and intermarry in China.  After the Tang dynasty, the emperors of China took the more conservative routes and closed its doors to other nations and stopped learning from other nations. This policy had become extreme in the Qing dynasty. The Qing Empire declined and fell hard inevitably, not long after living in the dream of being the top of the world.

The USA is another good example for being a successful multicultural and multilingual society. Since its establishment, the US government has encouraged immigrations and received great results.  Diverse races, nationalities and cultures from all over the world have injected the motivation and vibe to propel its development, based on the “Everyone is born equal” creed they believe.

But the distinction between different races and cultures in one society has also created unpleasant stories. In New York City, which can be considered the most diversified place in the USA, fights between different racial gangsters happen every day.  Less extremely but not negligible, the partial favor to one’s own race in competition is also a prevailing phenomenon. The miscommunication and misunderstandings between different cultures and language speakers may escalate to a situation that requires the law and policy enforcement to guarantee the peace, which could become very tedious and make the society very inefficient. In my opinion, there should be a balance of diversity in a society to assure its healthiness. That being said, however, extreme advocates of single race single language, such as the inhumane purgatory of other races (*Do you mean ‘ethnic genocide’?) and nationalities inside one society, should never be tolerated by the world. Those extremes of single race conservative ideas have created too many miserable memories in world history.

Gait analysis, which means the study of animal locomotion, esp. of human beings, has been an established research area for various medical and healthcare diagnoses. In orthopedics and prosthetics, gait analysis is essential to identify the pathology and assess the efficacy of the orthopedic assistant or prosthetics prescribed. For example, the efficacy of Ankle-Foot Orthosis (AFO) usually prescribed to cerebral palsy (CP) patients, remains unclear. In study on osteoarthritis, knee surgery recovery and rehabilitation, gait analysis focusing on knee joints angle is the key to evaluate the treatment. Studies on recovery and rehabilitation from knee surgery have shown that gait analysis focusing on knee joint angles is the key to evaluating the efficacy of treatment. In elderly healthcare, gait analysis has also played an important role in studies of fall risks and fall preventions, in reason that falls can worsen the health situation of, or be detrimental to the aged population. Even in cognitive and neuropsychology studies, gait analysis becomes an important parameter because of the close relation between human cognitive skills and motor function. For example, some researchers have shown the research value of gait analysis in Parkinson disease and early childhood autism diagnosis, respectively.

The quality of gait research can highly depends on the tools used in the gait analysis. Initially, gait analysis was mainly relied on observation by eyes, later with augmented tools such as camera with capability of continuous picture capturing. Such gait analysis are only limited to the qualitative level from the non-quantitative tools, which fail to provide further information in discriminate the nuances in gaits that vary slightly different. Today, optical motion capturing system, which tracks the spatial position of body segments by infrared camera, provides highly accurate data for gait research. However, while many gait analysis requires natural environment, out-of-lab use, such optical motion capturing system can be expensive and limited to in-lab use. More recently, the development of on-body wearable inertial sensor enables gait analysis to be realized in real-world. With MEMS technology, these motion sensors can be made in a very small form factor and be wearable to human body with less invasiveness.

Additional instructor’s comments about your submission

Shanshan,

Nice clarification in the first sentence and good beginning. I think you may have repeated or forgotten to edit out a sentence in the middle of the first paragraph. There are just a few suggestions below. This is really interesting! I didn’t know you were interested in things like this.

Alison

Gait analysis, which means the study of animal locomotion, esp. of human beings, has been an established research area for various medical and healthcare diagnoses. In orthopedics and prosthetics, gait analysis is essential to identify the pathology and assess the efficacy of the orthopedic assistant or prosthetics prescribed. For example, the efficacy of Ankle-Foot Orthosis (AFO) usually prescribed to cerebral palsy (CP) patients, remains unclear. In studies on osteoarthritis, knee surgery recovery and rehabilitation, gait analysis focusing on knee joints angle is the key to evaluating the treatment. Studies on recovery and rehabilitation from knee surgery have shown that gait analysis focusing on knee joint angles is the key to evaluating the efficacy of treatment.(*Is this the same piece from the sentence before?) In elderly healthcare, gait analysis has also played an important role in studies of fall risks and fall preventions, in reason that falls can worsen the health situation of, or be detrimental to the aged population. Even in cognitive and neuropsychology studies, gait analysis becomes an important parameter because of the close relationship between human cognitive skills and motor functions. For example, some researchers have shown the research value of gait analysis in Parkinson’s Disease and early childhood autism diagnosis, respectively.

The quality of gait research can highly depends on the tools used in the gait analysis. Initially, gait analysis was mainly relied on visual observation by eyes, later with augmented tools such as a camera with capability of continuous picture capturing. Such gait analysis are only limited to the qualitative level from the non-quantitative tools, which fail to provide further information in discriminating between nuances in gaits that vary slightly different. Today, optical motion capturing systems, which tracks the spatial position of body segments by infrared camera, provides highly accurate data for gait research. However, while much gait analysis requires a natural environment, out-of-lab use, such an optical motion capturing system can be expensive and limited to in-lab use. More recently, the development of an on-body, wearable, inertial sensor enables gait analysis to be realized in the real-world. With MEMS technology, these motion sensors can be made in a very small form factor and be worn on the human body with less invasiveness.

What is your heritage?

When heritage is brought up in a conversation, I always wonder what exactly it is to represent my country. For a Chinese, this word has infinite interpretations. However, the first thing comes to my mind is always Chinese food, which usually foster each Chinese’s appetite she/he can never get rid of in his/her whole life.  Generally speaking, Chinese base their staple food on rice and noodles, with countless kinds of dishes they’ve created and inherited since they mastered the usage of fire.  It is mission impossible to identify some certain types of food to represent Chinese food. However, there is a stereotype that people in the North tend to favor saltier taste food, Southeast sweet, Southwest spicy and West sour. One can only know what Chinese food is when she/he visits China.

Ideology also says a lot about a nation. For China, it is a mixture of the traditional philosophies, contemporary communism and influence from other countries. It takes a while to explain this complicated matter.  The traditional ideology was also a mixture of Confucianism, Taoism and Buddhism. Confucianism was the most dominating mainstream philosophy rooted in every Chinese’s heart, while Taoism can be considered as the primitive ancient Chinese metaphysics to explain the world. Both Confucianism and Taoism were born natively in China, whereas Buddhism was imported from India. Though the fact Buddhism was originally a foreign religion didn’t prevent Chinese people’s adopting nature to tailor to their own pleasure. The mixture of Taoism and Buddhism is usually referred as Zen. I don’t have the level to show the depth of Zen, I can just say every ancient Chinese intellect had studied Zen in their life. Since the establishment of the New China, communism has been adopted for the national ideology.  This ideology has been serving as the beautiful belief of this nation for more than half a century, however, it is under heavy influence of western culture and the foundation of this belief has begun shaken.  Some still hold communism belief and some don’t. Nevertheless, the desire of pursuing a happy life is the same to every Chinese with the idea of Confucianism they’ve never given up.

Cultural traditions such as festivals and rituals are also conversed a lot when talking about China. I’m fascinated about traditional festivals because of every festival has a enchanting mythology behind it. Of course, the special food associated with these festivals cannot be omitted for the charm of these festivals. Several major festivals include Spring Festival to celebrate the Chinese Lunar New Year, Dragon Boat Day in memory of a great poet Qu Yuan, Mid-autumn Day to celebrate the harvest and family get-together (kind of sounds like Halloween and Thanksgiving), with delicious food as dumpling, Zongzi and Mooncake associated with each festival respectively.

Additional instructor’s comments about your submission

This is great! I think it’s also interesting how many other countries try to recreate what they call “Chinese food” and never seem to get it quite right. I have only really had food from Taiwan, but I wish I was more familiar with the mainland food.

China’s history and heritage is also very fascinating to me. The country has held on to powerful beliefs such as Buddhism and Confucian ideology and seems to have an interesting mix now. Thanks for sharing!! I really enjoyed it. I made some corrections in blue so you can compare with the original.

Alison

What is your heritage?

When heritage is brought up in a conversation, I always wonder what exactly it is to represent my country. For a Chinese, this word has infinite interpretations. However, the first thing comes to my mind is always Chinese food, which usually fosters each Chinese’s person’s appetite which she/he can never get rid of in his/her whole life.  Generally speaking, Chinese base their staple food on rice and noodles, with countless kinds of dishes they’ve created and inherited since they mastered the usage of fire.  It is mission impossible to identify some certain types of food to represent Chinese food. However, there is a stereotype that people in the North tend to favor saltier taste food, Southeast sweet, Southwest spicy and West sour. One can only know what Chinese food is when she/he visits China.

Ideology also says a lot about a nation. For China, it is a mixture of the traditional philosophies, contemporary communism and influence from other countries. It takes a while to explain this complicated matter.  The traditional ideology was also a mixture of Confucianism, Taoism and Buddhism. Confucianism was the most dominating mainstream philosophy rooted in every Chinese’s heart, while Taoism can be considered as the primitive ancient Chinese metaphysics used to explain the world. Both Confucianism and Taoism were born natively in China, whereas Buddhism was imported from India. Though the fact that Buddhism was originally a foreign religion didn’t prevent Chinese people’s adopting nature to tailor it to their own pleasure. The mixture of Taoism and Buddhism is usually referred as Zen. I don’t have the level to show the depth of Zen, I can just say every ancient Chinese intellect had studied Zen in their life. Since the establishment of the New China, communism has been adopted for the national ideology.  This ideology has been serving as the beautiful belief of this nation for more than half a century, however, it is under heavy influence of western culture and the foundation of this belief has begun to shake. Some still hold communist beliefs and some don’t. Nevertheless, the desire of pursuing a happy life is the same to every Chinese with the idea of Confucianism they’ve never given up.

Cultural traditions such as festivals and rituals are also conversed a lot when talking about China. I’m fascinated about traditional festivals because of every festival has an enchanting mythology behind it. Of course, the special food associated with these festivals cannot be omitted for the charm of these festivals. Several major festivals include Spring Festival to celebrate the Chinese Lunar New Year, Dragon Boat Day in memory of a great poet Qu Yuan, Mid-

Great opening sentence! However, it isn’t clear if it has caught the attention of German society or the international society. Also, Neo-Nazi should be capitalized. :) “Neo-Nazi” is either used as an adjective or a single person, where “Neo-Nazism” is the movement.

I have highlighted mistakes and places where it sounds a little awkward. Although it may seem unnecessary, you also need to explain why Nazism was bad and what the implications are of a rise in a Neo-Nazi mentality or movement. In other words, make the problem clear from the beginning.

I really like the perspective you give. You seem to have good insight into this problem. You just have a few problems with some of the word forms.

Suggested Grade: A-

Though appearing to be a global phenomenon, the rise of neo-nazi in Germany has increasingly caught the attention of society. It is a political extreme attempting to justify the historical impact of Nazism and revive the Nazism. Inheriting the Nazism ideology of World War II, neo-nazists seek opportunities to convene in the form of public movements, causing stirs in German society or even globally.

To solve this social issue, further investigation on the reason behind this phenomenon is needed. It is interesting to notice that although with intense emphasis on “denazification” in the retrospect to their war crime in Germany after World War II, the neo-nazism becomes popular among the young German generations, gaining a considerable amount of followers in East Germany.  This is partially because the former Nazi remembers retained their ideology and beliefs and passed it to the new generation. To prevent the rise of the neo-nazi trend, a possible solution is to launch and enforce strict laws and policies. However, in fact, the German government does enact strict laws to prohibit the neo-nazi movement without desirable outcome. From my perspective, I highly doubt that stricter laws would be more effective. I think new education ideas in humanistic should be discussed and developed to educate the young generations. Meanwhile, since this is a worldwide issue residing in the wide divide between people with different religions, nationalities, and various cultural backgrounds, a global dialogue on such matter should be initialized as a worldwide movement, discussed  not only by academia and politic elites, but also by the general public.
