Python Essential

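2015-04-10 | 振导社会 | Programming, Python

In Python, everything is an object! Python is an interpreted, object-oriented, extensible, embeddable, small-core, dynamically typed and strongly typed programming language.

Assignment does not copy data; it only binds a name to an object. The id() function returns the unique id of an object. del x removes the binding of the name x from the namespace.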
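The binding behaviour can be checked with a short sketch. The names a, b and c are illustrative only, and the code is written to run on both Python 2.7 and Python 3.

```python
a = [1, 2, 3]
b = a                      # binds a second name to the same list; nothing is copied
same_object = id(a) == id(b)

b.append(4)                # the change is visible through either name

del b                      # removes only the name b; the list survives through a

c = list(a)                # an explicit copy creates a new object
```

After this runs, a is [1, 2, 3, 4], while c is equal to a but is a different object.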

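Data types

Basic types

Dynamic typing: variables need no type declaration in advance;
Strong typing: every value has a definite type, and values of different types are not converted implicitly (for example, '1' + 1 raises TypeError).

Python has 4 basic numeric types:

Integer (int): implemented with the C long type, giving at least 32 bits of precision; sys.maxint is the largest integer value on the current system;
Long integer (long): unlimited precision;
Floating point (float): implemented with the C double type; sys.float_info describes the floating-point type of the current system;
Complex (complex): both the real and the imaginary part are floats.

An integer literal with an L or l suffix is a long integer. A numeric literal with a j or J suffix is an imaginary number, and the sum of a real number and an imaginary number is a complex number.

Booleans are a special kind of integer. The standard library provides the fractions and decimal modules, which handle rational numbers and floating-point numbers with user-defined precision respectively.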

The `None` type

In CPython 2, None compares smaller than any number:

sorted([1, 2, None, 4.5, float('-inf')])

# [None, -inf, 1, 2, 4.5]

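Iterator types

Python supports iteration over containers. A user-defined class becomes an iterator by defining the two methods __iter__() and next().

Generators offer a convenient way to implement the iterator protocol. If a container's __iter__() method is written as a generator, the generator object it returns automatically supports the __iter__() and next() methods.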
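Both approaches can be sketched side by side. Countdown and Bag are hypothetical classes invented for this illustration; the alias __next__ = next is added only so that the same code also runs on Python 3, where the protocol method was renamed.

```python
class Countdown(object):
    """Iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self              # an iterator returns itself

    def next(self):              # Python 2 spelling of the protocol method
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    __next__ = next              # Python 3 spelling


class Bag(object):
    """Container whose __iter__ is written as a generator."""

    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        for item in self.items:
            yield item           # the generator object supplies the iterator methods


countdown_values = list(Countdown(3))
bag_values = list(Bag('ab'))
```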

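Sequence types

Python has 7 sequence types: strings (str), unicode strings, lists, tuples, byte arrays (bytearray), buffers and xrange objects.

Strings and tuples are immutable sequence types: they cannot be changed after creation.

1. Strings

Strings are created with single quotes ' or double quotes ".

Strings and unicode strings have their own set of methods, some of which also apply to byte arrays.

The basic way to format a string is s % <tuple>, where s determines the format and the tuple supplies the arguments.

# Basic formatting
print "%s's height is %dcm" % ("Charles", 170)
# Output: Charles's height is 170cm

# Another way, with named fields:
print "%(name)s's height is %(height)dcm" \
    ", %(name)s's weight is %(weight)dkg." % \
    {"name":"Charles", "weight":70, "height":170}
# Output: Charles's height is 170cm, Charles's weight is 70kg.

The re module provides string processing based on regular expressions.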

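2. Unicode strings

A unicode string is like a string, with the prefix u added to the literal.

ustr = u'abc'

3. Lists

A list is a sequential storage structure whose elements may be of any type:

[ 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 ]
[ 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday' ]

4. Tuples

A tuple is a constant list that cannot be changed after creation. A tuple with a single element must be written with a trailing comma: (d,).

# Creating tuples
a = (0, 1, 2, 3, 4)
a = 0, 1, 2, 3, 4

# Assigning to several variables at once
x, y = (4, 44)
x, y = 444, 4444

5. Byte arrays

Byte arrays are created with the built-in function bytearray().

6. Buffers

Buffer objects have no dedicated syntax. They are created with the built-in function buffer() and do not support concatenation or repetition.

7. xrange objects

Like buffers, xrange objects have no dedicated syntax; they are created with the built-in function xrange().

An xrange object is an immutable sequence, normally used for looping. It takes the same amount of memory whatever the length of the range it represents, but it offers no consistent performance advantage. xrange supports only indexing, iteration and len(); in, not in, min() and max() work but are inefficient, and slicing, concatenation and repetition are not supported.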

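Set types

A set is an unordered collection of distinct hashable objects¹. Python has two built-in set types, set and frozenset. A set is mutable and supports add() and remove(); because it is mutable, it has no hash value and cannot be used as a dictionary key or as an element of another set. A frozenset is immutable and hashable, so it can be used as a dictionary key or as an element of another set.

In Python 2.7, a non-empty set can be written with braces:

{'jack', 'sjoerd'}

Sets are typically used for membership testing, removing duplicates from a sequence, and mathematical operations such as intersection, union and difference. Sets support x in set, len(set) and for x in set, but not sequence operations such as indexing and slicing. Sets support comparison, including between set and frozenset: set('abc') == frozenset('abc') and set('abc') in set([frozenset('abc')]) both return True. Sets do not implement the __cmp__() method.

As with dictionary keys, the elements of a set must be hashable.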
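A minimal sketch of the difference between the two set types and of the usual set operations follows. The variable names are illustrative only.

```python
fruits = set(['apple', 'pear', 'apple'])      # duplicates collapse
citrus = frozenset(['lemon', 'orange'])

union = fruits | citrus
common = fruits & set(['pear', 'plum'])
only_apple = fruits - set(['pear'])

# A frozenset is hashable, so it can serve as a dictionary key.
prices = {citrus: 3}

# A plain set is mutable and therefore not hashable.
try:
    hash(fruits)
    set_is_hashable = True
except TypeError:
    set_is_hashable = False
```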

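Mapping types

A mapping type maps hashable values to arbitrary objects; mappings are mutable. The dictionary is currently the only mapping type. A dictionary is an unordered storage structure in which every element is a key/value pair.

Each element has a key and a value; the key is usually an integer or a string (any hashable object will do), and the value may be of any type;
Keys are never duplicated;
D[key] returns the value that the dictionary holds for key.

# Ways to build a dictionary
pricelist = {"clock":12, "table":100, "xiao":100}
pricelist = dict([("clock",12), ("table",100), ("xiao",100)])

# Assigning to a key that does not exist adds an element
pricelist["apple"] = 12
# pricelist = {'table': 100, 'apple': 12, 'xiao': 100, 'clock': 12}

Reading a key that does not exist raises an exception (KeyError). D.get(key) returns the corresponding value when key is in D and None otherwise; D.get(key) is the same as D.get(key, None).

a = {"four": 4, "three": 3, "two": 2, "one": 1}
a.items()
# Returns: [('four', 4), ('three', 3), ('two', 2), ('one', 1)]

dict.viewkeys(), dict.viewvalues() and dict.viewitems() return view objects of the dictionary. A view gives a dynamic look at the dictionary's entries: when the dictionary changes, the view changes with it. Because dictionary keys are hashable and unique, the key view is set-like; the item view is set-like too when all values are hashable (the value view is not), so set operations can be applied to them.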

>>> dishes = {'eggs': 2, 'sausage': 1, 'bacon': 1, 'spam': 500}
>>> keys = dishes.viewkeys()
>>> values = dishes.viewvalues()

>>> ##### iteration
>>> n = 0
>>> for val in values:
...     n += val
>>> print(n)
504

>>> ##### keys and values are iterated over in the same order
>>> list(keys)
['eggs', 'bacon', 'sausage', 'spam']
>>> list(values)
[2, 1, 1, 500]

>>> ##### view objects are dynamic and reflect dict changes
>>> del dishes['eggs']
>>> del dishes['sausage']
>>> list(keys)
['spam', 'bacon']

>>> ##### set operations
>>> keys & {'eggs', 'bacon', 'salad'}
{'bacon'}

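Other built-in types

Modules, classes, functions, methods, code objects, the Ellipsis object, the NotImplemented object, …

Operations and precedence

Basic operations

Operations on the 4 basic numeric types:

| Operation | Result | Notes |
|---|---|---|
| `x + y`, `x - y`, `x * y`, `x / y` | sum, difference, product and quotient of `x` and `y` | Operations between integers give an integer: `1 / 2 = 0`. |
| `-x`, `+x` | `x` negated, `x` unchanged | |
| `x // y` | (floored) quotient of `x` and `y` | |
| `x % y` | remainder of `x / y` | |
| `abs(x)` | absolute value or magnitude of `x` | |
| `int(x)` | `x` converted to integer | |
| `long(x)` | `x` converted to long integer | |
| `float(x)` | `x` converted to floating point | |
| `complex(re,im)` | a complex number with real part `re`, imaginary part `im`. `im` defaults to zero. | |
| `c.conjugate()` | conjugate of the complex number `c`. (Identity on real numbers) | |
| `divmod(x, y)` | the pair `(x // y, x % y)` | |
| `pow(x, y)` | `x` to the power `y` | `pow(0, 0) = 1` |
| `x ** y` | `x` to the power `y` | `0 ** 0 = 1` |

To convert a float to an integer, use math.trunc(), math.floor() or math.ceil(), which round towards zero, down and up respectively;
Since version 2.3, the //, % and divmod() operations are deprecated for complex numbers;
Further integer operations: int.bit_length() and long.bit_length();
Further float operations: float.as_integer_ratio(), float.is_integer(), float.hex() and float.fromhex(s).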
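The rounding functions and a few of the methods named above can be compared on one negative value. This is a minimal sketch; the wrapping int() calls are there because math.floor() and math.ceil() return floats on Python 2.

```python
import math

x = -2.7
truncated = math.trunc(x)             # rounds towards zero
floored = int(math.floor(x))          # rounds down
ceiled = int(math.ceil(x))            # rounds up

# Floor division pairs with a remainder that has the sign of the divisor.
quotient, remainder = divmod(7, -2)

bits = (255).bit_length()             # number of bits needed to represent 255
ratio = (0.5).as_integer_ratio()      # exact fraction behind the float
```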

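Boolean operations

Every Python object can be tested for truth value. The following are considered False²:

None;
False;
zero of any numeric type, for example 0, 0L, 0.0, 0j;
any empty sequence, for example '', (), [];
any empty mapping, for example {};
instances of user-defined classes, if the class defines __nonzero__() or __len__() and that method returns 0 or False.

Everything else is considered True.

| Operation | Rule | Notes |
|---|---|---|
| `x or y` | If `x` is false, the value of `y`; otherwise the value of `x`. | `y` is evaluated only when `x` is false. |
| `x and y` | If `x` is false, the value of `x`; otherwise the value of `y`. | `y` is evaluated only when `x` is true. |
| `not x` | If `x` is false, `True`; otherwise `False`. | `not` has lower priority than non-Boolean operators, so `not a == b` means `not (a == b)` (`a == not b` is a syntax error). |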
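Short-circuit evaluation can be made visible with a small sketch. The helper record is hypothetical: it notes that it was evaluated and then returns the value it was given.

```python
calls = []

def record(name, value):
    """Remember that this operand was evaluated, then return its value."""
    calls.append(name)
    return value

# 'a' is false, so 'b' is evaluated and its value becomes the result.
result_or = record('a', 0) or record('b', 'fallback')

# 'c' is false, so 'd' is never evaluated and the result is the value of 'c'.
result_and = record('c', []) and record('d', 'unused')
```

Note that or and and return one of their operands rather than a strict True or False.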

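Comparison operations

Every Python object supports comparison.

| Operation | Meaning | Notes |
|---|---|---|
| `<` | strictly less than | |
| `<=` | less than or equal | |
| `>` | strictly greater than | |
| `>=` | greater than or equal | |
| `==` | equal³ | |
| `!=` | not equal | `<>` is also accepted for backward compatibility. |
| `is` | object identity | |
| `is not` | negated object identity | |

Priority: all comparison operators have the same priority, which is higher than that of the Boolean operators;
Comparisons can be chained: x < y <= z is equivalent to x < y and y <= z, except that y is evaluated only once, and z is not evaluated at all when x < y is false.

Special comparison rules⁴:

Except for different numeric types and different string types⁵, objects of different types never compare equal; such objects are ordered consistently but arbitrarily;
File objects never compare equal to one another; they are ordered arbitrarily but consistently;
<, <=, > and >= raise TypeError when either operand is a complex number;
Different instances of the same class compare unequal unless the class defines __eq__() or __cmp__();
Different instances of the same class cannot be ordered⁶ unless the class defines enough of the comparison methods __lt__(), __le__(), __gt__() and __ge__(), or the __cmp__() method.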
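The rules for instances and for chained comparisons can be sketched as follows. Version is a hypothetical class invented for this example; it defines only the methods needed for equality and for sorting.

```python
class Version(object):
    """Value class ordered by its (major, minor) pair."""

    def __init__(self, major, minor):
        self.key = (major, minor)

    def __eq__(self, other):
        return self.key == other.key

    def __ne__(self, other):          # Python 2 does not derive != from ==
        return not self == other

    def __lt__(self, other):
        return self.key < other.key


same = Version(1, 2) == Version(1, 2)               # equal by value, distinct objects
ordered = sorted([Version(2, 0), Version(1, 5)])    # sorting relies on __lt__

low, mid, high = 1, 5, 10
in_range = low < mid <= high                        # low < mid and mid <= high
```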

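Bitwise operations

Bitwise operations apply only to integers; negative integers are treated as stored in two's complement. The operators below are listed in order of ascending priority:

| Operation | Result | Notes |
|---|---|---|
| `x \| y` | bitwise or of `x` and `y` | |
| `x ^ y` | bitwise exclusive or of `x` and `y` | |
| `x & y` | bitwise and of `x` and `y` | |
| `x << n` | `x` shifted left by `n` bits | An integer result that goes out of range is converted to a long integer automatically; a negative `n` raises `ValueError`. |
| `x >> n` | `x` shifted right by `n` bits | A negative `n` raises `ValueError`. |
| `~x` | the bits of `x` inverted | |

Bitwise operations have lower priority than the basic numeric operations and higher priority than comparisons;
The unary ~ has the same priority as the unary + and -.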
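A short sketch of the operators and of the precedence rule, with illustrative values chosen so the results are easy to verify by hand:

```python
flags = 0b0101
mask = 0b0011

both = flags & mask        # bits set in both operands
either = flags | mask      # bits set in at least one operand
differ = flags ^ mask      # bits set in exactly one operand
shifted = flags << 2       # same as multiplying by 4
inverted = ~flags          # two's complement: -(flags + 1)

# Arithmetic binds tighter than shifts, so this is 1 << (2 + 1).
precedence = 1 << 2 + 1

# A negative shift count is rejected.
count = -1
try:
    1 << count
    negative_shift_ok = True
except ValueError:
    negative_shift_ok = False
```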

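Sequence operations

Sequences support comparison. In particular, lists and tuples are compared element by element. Two sequences are equal only if they have the same type, the same length and equal corresponding elements.

s and t are sequences of the same type; n, i and j are integers. The operations below are listed in order of ascending priority:

| Operation | Result |
|---|---|
| `x in s` | `True` if an item of `s` is equal to `x`, else `False` |
| `x not in s` | `False` if an item of `s` is equal to `x`, else `True` |
| `s + t` | the concatenation of `s` and `t` |
| `s * n`, `n * s` | `n` shallow copies of `s` concatenated |
| `s[i]` | `i`th item of `s`, origin `0` |
| `s[i:j]` | slice of `s` from `i` to `j` |
| `s[i:j:k]` | slice of `s` from `i` to `j` with step `k` |
| `len(s)` | length of `s` |
| `min(s)` | smallest item of `s` |
| `max(s)` | largest item of `s` |
| `s.index(x)` | index of the first occurrence of `x` in `s` |
| `s.count(x)` | total number of occurrences of `x` in `s` |

in and not in have the same priority as the comparison operations; + and * have the same priority as the corresponding numeric operations;
In s * n, a value of n less than 0 is treated as 0;
Valid indexes run from -len(s) to len(s) - 1; a negative i refers to index len(s) + i (but -0 is still 0);
The first element is s[0] or s[-len(s)];
For s[i:j]: the slice contains the items with index k such that i <= k < j; an i or j greater than len(s) is replaced by len(s); an omitted i, or i equal to None, is replaced by 0; an omitted j, or j equal to None, is replaced by len(s); the slice is empty when i >= j;
s[::-1] builds a new sequence in reverse order.

# Shallow copies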
>>> lists = [[]] * 3
>>> lists
[[], [], []]
>>> lists[0].append(3)
>>> lists
[[3], [3], [3]]

>>> lists = [[] for i in range(3)]
>>> lists[0].append(3)
>>> lists[1].append(5)
>>> lists[2].append(7)
>>> lists
[[3], [5], [7]]
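The indexing and slicing rules for sequences can be checked with a few expressions on a string. The variable names are illustrative only.

```python
s = 'python'

first = s[0]
also_first = s[-len(s)]      # negative indexes count from the end
tail = s[2:100]              # an end beyond len(s) is clamped to len(s)
empty = s[4:2]               # i >= j gives an empty slice
backwards = s[::-1]          # a new sequence in reverse order
nothing = s * -1             # a negative repeat count is treated as 0
```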

Lists and byte arrays are mutable sequence types. They support the following operations that modify the sequence in place (x is an arbitrary object):

| Operation | Result |
|---|---|
| `s[i] = x` | item `i` of `s` is replaced by `x` |
| `s[i:j] = t` | slice of `s` from `i` to `j` is replaced by the contents of the iterable `t` |
| `del s[i:j]` | same as `s[i:j] = []` |
| `s[i:j:k] = t` | the elements of `s[i:j:k]` are replaced by those of `t` |
| `del s[i:j:k]` | removes the elements of `s[i:j:k]` from the list |
| `s.append(x)` | same as `s[len(s):len(s)] = [x]` |
| `s.extend(x)` | same as `s[len(s):len(s)] = x` |
| `s.count(x)` | return number of `i`'s for which `s[i] == x` |
| `s.index(x[, i[, j]])` | return smallest `k` such that `s[k] == x and i <= k < j` |
| `s.insert(i, x)` | same as `s[i:i] = [x]` |
| `s.pop([i])` | same as `x = s[i]; del s[i]; return x` |
| `s.remove(x)` | same as `del s[s.index(x)]` |
| `s.reverse()` | reverses the items of `s` in place |
| `s.sort([cmp[, key[, reverse]]])` | sort the items of `s` in place |

When assigning to an extended slice s[i:j:k], t must have the same length as the slice it replaces;
The default argument of pop() is -1;
Since version 2.3, sort() is guaranteed to be stable.

Control flow

Branching

if <expr1>:
    <statement-block>
elif <expr2>:
    <statement-block>
elif <expr3>:
    <statement-block>
...
else:
    <statement-block>

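if may be followed by any expression. False, None, 0, "" (empty string), [] (empty list), () (empty tuple) and {} (empty dictionary) count as false; everything else counts as true;
A switch structure is written with elif.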

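Loops

# for
for x in <sequence>:
    <statement-block>
else:
    <else-block>
# The else part is optional. If present, <else-block> runs when the loop has visited every element, that is, when it was not left through break.

# while
while <expr1>:
    <block>
else:
    <else-block>
# <else-block> runs if expr1 is false when the loop ends.

break and continue change the flow of a loop.

Use the enumerate() function when the index of each element is needed.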

>>> for i, v in enumerate(['tic', 'tac', 'toe']):
...     print i, v
...
0 tic
1 tac
2 toe
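The else clause of a loop is easiest to see in a search. The function below is a hypothetical example: the else branch runs only when the loop finishes without reaching break.

```python
def find_first_even(numbers):
    """Return the first even number, or None if there is none."""
    for n in numbers:
        if n % 2 == 0:
            found = n
            break
    else:
        # Reached only when the loop was not left through break.
        found = None
    return found
```

For example, find_first_even([1, 3, 4, 5]) stops at 4, while find_first_even([1, 3]) runs the else branch.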

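From functions to packages

Functions

def <function_name> ( <parameters_list> ):
    <code block>

A function has no declared return type; return may return an object of any type;
A function name is just a variable, so a function can be assigned to another variable.

Functions support default arguments.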

def parrot(voltage, state='a stiff', action='voom', type='Norwegian Blue'):
    print "-- This parrot wouldn't", action,
    print "if you put", voltage, "volts through it."
    print "-- Lovely plumage, the", type
    print "-- It's", state, "!"

parrot(1000)                                          # 1 positional argument
parrot(voltage=1000)                                  # 1 keyword argument
parrot(voltage=1000000, action='VOOOOOM')             # 2 keyword arguments
parrot(action='VOOOOOM', voltage=1000000)             # 2 keyword arguments
parrot('a million', 'bereft of life', 'jump')         # 3 positional arguments
parrot('a thousand', state='pushing up the daisies')  # 1 positional, 1 keyword

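Arguments are passed by binding the parameter name to the argument object, a form of passing by reference that differs from C++ references.

Variable numbers of arguments: `*arg` & `**keyword`

def printf(format, *arg):
    print format%arg
# `*arg` must come after the ordinary parameters; `*` accepts any number of extra positional arguments, which are passed as a tuple.

# Another way, collecting a variable number of arguments in a dictionary:
def printf(format, **keyword):
    for k in keyword.keys():
        print "keyword[%s] is %s"%(k,keyword[k])
# `**keyword` accepts any number of named arguments.

When both *arg and **keyword are present, *arg must come before **keyword.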
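The sketch below illustrates both points: what binding an argument means for the caller, and how extra positional and keyword arguments arrive. All three functions are hypothetical helpers.

```python
def append_one(items):
    items.append(1)          # mutates the object the caller passed in

def rebind(items):
    items = [99]             # rebinds the local name only; the caller is unaffected

def describe(label, *args, **kwargs):
    """Return what was received: args is a tuple, kwargs is a dict."""
    return label, args, sorted(kwargs.items())

data = []
append_one(data)
rebind(data)

received = describe('demo', 1, 2, x=3, y=4)
```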

Anonymous lambda functions

f = lambda a, b: a + b
f(1, 2)
# Returns: 3

>>> pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
>>> pairs.sort(key=lambda pair: pair[1])
>>> pairs
[(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]

Nested functions

def outfun(a, b):
    def innerfun(x, y):
        return x + y
    return innerfun(a, b)

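Classes

Python's class mechanism is a mixture of those found in C++ and Modula-3 and supports all the standard features of object-oriented programming. Unlike in C++ and Modula-3, built-in types can also be used as base classes. Classes share Python's dynamic nature: they are created at runtime and can be modified after creation.

A class is itself an object, supporting attribute reference and instantiation.

To keep an internal attribute from being accessed from outside, prefix its name with two underscores __; the name is then mangled to _ClassName__name, which discourages outside access rather than strictly preventing it.

Defining the special variable __slots__ restricts the attributes that can be added to instances of the class. The attributes listed in __slots__ apply only to the class that defines them, not to subclasses that inherit from it.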
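Both mechanisms can be observed in a few lines. Point and Account are hypothetical classes used only for this illustration.

```python
class Point(object):
    __slots__ = ('x', 'y')             # only these attributes may be set

    def __init__(self, x, y):
        self.x = x
        self.y = y


class Account(object):
    def __init__(self, balance):
        self.__balance = balance       # stored under the mangled name _Account__balance


p = Point(1, 2)
try:
    p.z = 3                            # not listed in __slots__
    extra_allowed = True
except AttributeError:
    extra_allowed = False

acct = Account(10)
hidden = not hasattr(acct, '__balance')    # the plain name is not visible from outside
mangled = acct._Account__balance           # the mangled name still is
```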

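`__new__` and `__init__`

__new__ creates the instance; __init__ initialises it.

__new__(cls[, ...]) creates an instance of the class cls. It is a static method whose first argument is the class being instantiated. It is typically implemented by calling the superclass's __new__(), as in super(currentclass, cls).__new__(cls[, ...]), and returning the new instance after modifying it as needed.

If __new__() returns an instance of cls, the new instance's __init__(self[, ...]) is called, with the remaining arguments the same as those passed to __new__();
If __new__() does not return an instance of cls, __init__() is not called.

__new__() is mainly used to customise instance creation in subclasses of immutable types (such as int, str and tuple), and to customise class creation in metaclasses.

__init__(self[, ...]) is called after the instance has been created.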
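A minimal sketch of the typical use case, a subclass of an immutable type. Celsius is a hypothetical class; because a float cannot be changed after creation, the rounding has to happen in __new__, while __init__ only adds an ordinary attribute.

```python
class Celsius(float):
    """Float subclass that rounds its value to one decimal on creation."""

    def __new__(cls, value):
        # The float value is fixed here; __init__ would be too late.
        return super(Celsius, cls).__new__(cls, round(value, 1))

    def __init__(self, value):
        # Called after __new__ with the same argument.
        self.original = value


t = Celsius(21.456)
```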

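`__getattribute__` and `__getattr__`

The `super` function

Things to Know About Python Super ([1/3], [2/3])

`@staticmethod` and `@classmethod`

A static method, decorated with @staticmethod, knows nothing about the class it lives in and has no implicit first argument; it can be called on the class or on an instance.

A class method, decorated with @classmethod, interacts with the class rather than with an instance and receives the class as its implicit first argument; it can be called on the class or on an instance.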
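The difference shows up most clearly with a subclass. Temperature and Kelvin below are hypothetical classes made up for this sketch.

```python
class Temperature(object):
    def __init__(self, degrees):
        self.degrees = degrees

    @staticmethod
    def to_fahrenheit(celsius):
        # No self or cls: a plain function kept in the class namespace.
        return celsius * 9.0 / 5 + 32

    @classmethod
    def freezing(cls):
        # cls is the class the method was called on, so subclasses get their own type.
        return cls(0)


class Kelvin(Temperature):
    pass


from_class = Temperature.to_fahrenheit(100)        # called on the class
from_instance = Temperature(1).to_fahrenheit(100)  # called on an instance
k = Kelvin.freezing()
```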

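Descriptors

Fundamentally, a descriptor is a reusable attribute. Descriptors are the mechanism behind properties, functions and methods, static methods and class methods.

For descriptors to work, they must be defined at class level.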

class Broken(object):
    y = NonNegative(5)           # descriptor
    def __init__(self):
        self.x = NonNegative(0)  # NOT a good descriptor

b = Broken()
print "X is %s, Y is %s" % (b.x, b.y)

# Output: X is <__main__.NonNegative object at 0x10432c250>, Y is 5
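The NonNegative class used above is not defined in this article. The following is one possible sketch of it, under the assumption that it should keep one value per owning instance and reject negative numbers; keying the storage by id(instance) is a simplification for illustration, not a recommendation. Product is likewise a hypothetical class.

```python
class NonNegative(object):
    """Descriptor that stores a value per instance and rejects negatives."""

    def __init__(self, default):
        self.default = default
        self.values = {}                       # id(instance) -> value (sketch only)

    def __get__(self, instance, owner):
        if instance is None:
            return self                        # accessed on the class itself
        return self.values.get(id(instance), self.default)

    def __set__(self, instance, value):
        if value < 0:
            raise ValueError('negative value not allowed: %r' % (value,))
        self.values[id(instance)] = value


class Product(object):
    price = NonNegative(5)                     # class level, so the protocol applies


item = Product()
default_price = item.price
item.price = 12
try:
    item.price = -1
    rejected = False
except ValueError:
    rejected = True
```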

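The `metaclass`

A class is an instance of a metaclass. New-style classes are created with type() by default: once the class definition has been read, the class is created by calling type(name, bases, dict).

A = type('A', (object,), {'a': 'I am a string.'})

If __metaclass__ is defined when the class definition is read, the callable it names is used instead of type() to create the class.

A class's __metaclass__ may be a subclass of type, or a function that takes 3 arguments:

name: a string, the name of the new type object;
bases: a tuple, the parent classes of the new type object;
properties: a dictionary, the attributes of the new type object.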

def meta_func(name, bases, properties):
    # you can modify bases, properties here to override class creation
    return type(name, bases, properties)

class A(object):
    __metaclass__ = meta_func
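The __metaclass__ attribute is honoured only by Python 2. To show what such a function does in a version-independent way, the sketch below calls a hypothetical metaclass function, upper_attrs, directly with the same three arguments.

```python
def upper_attrs(name, bases, properties):
    """Create the class with every non-dunder attribute name upper-cased."""
    renamed = {}
    for key, value in properties.items():
        if key.startswith('__'):
            renamed[key] = value
        else:
            renamed[key.upper()] = value
    return type(name, bases, renamed)


# In Python 2, `__metaclass__ = upper_attrs` in a class body triggers this same call.
Config = upper_attrs('Config', (object,), {'debug': True, 'name': 'demo'})
```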

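Modules and packages

A module is a collection of functions or classes in one file. A package is a collection of modules in one directory, which must contain an __init__.py file.

A module's name is available through the global variable __name__. Each module has its own private symbol table, which serves as the global symbol table for the functions defined in it. Modules and packages are brought in with import.

The __all__ variable in a package's __init__.py limits which modules from package import * imports.

Debugging

Common debugging tools for Python include pdb, pudb and ipdb; PyCharm also offers convenient graphical debugging.

Debugging with pdb:

python -m pdb test.py

Debugging with pudb:

pudb test.py # method 1
python -m pudb.run test.py # method 2

Debugging with ipdb:

python -m ipdb test.py

Exception handling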

import sys

try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except IOError as e:
    print "I/O error({0}): {1}".format(e.errno, e.strerror)
except ValueError:
    print "Could not convert data to an integer."
except:
    print "Unexpected error:", sys.exc_info()[0]
    raise

Catching several exceptions at once:

... except (RuntimeError, TypeError, NameError):
...     pass

The else part runs when try raises no exception:

for arg in sys.argv[1:]:
    try:
        f = open(arg, 'r')
    except IOError:
        print 'cannot open', arg
    else:
        print arg, 'has', len(f.readlines()), 'lines'
        f.close()

Raising an exception:

>>> try:
...    raise Exception('spam', 'eggs')
... except Exception as inst:
...    print type(inst)     # the exception instance
...    print inst.args      # arguments stored in .args
...    print inst           # __str__ allows args to be printed directly
...    x, y = inst.args
...    print 'x =', x
...    print 'y =', y
...
<type 'exceptions.Exception'>
('spam', 'eggs')
('spam', 'eggs')
x = spam
y = eggs

Detecting an exception without handling it, then re-raising it:

>>> try:
...     raise NameError('HiThere')
... except NameError:
...     print 'An exception flew by!'
...     raise
...
An exception flew by!
Traceback (most recent call last):
  File "<stdin>", line 2, in ?
NameError: HiThere

User-defined exceptions:

>>> class MyError(Exception):
...     def __init__(self, value):
...         self.value = value
...     def __str__(self):
...         return repr(self.value)
...
>>> try:
...     raise MyError(2*2)
... except MyError as e:
...     print 'My exception occurred, value:', e.value
...
My exception occurred, value: 4
>>> raise MyError('oops!')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
__main__.MyError: 'oops!'

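Clean-up actions

The finally clause defines clean-up actions:

The finally clause always runs before leaving try, whatever happens;
An exception that no except clause handled is re-raised after finally has run;
The with statement is a convenient shorthand for the try...except...finally pattern.

In practice, finally is mostly used to release external resources.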

>>> def divide(x, y):
...     try:
...         result = x / y
...     except ZeroDivisionError:
...         print "division by zero!"
...     else:
...         print "result is", result
...     finally:
...         print "executing finally clause"
...
>>> divide(2, 1)
result is 2
executing finally clause
>>> divide(2, 0)
division by zero!
executing finally clause
>>> divide("2", "1")
executing finally clause
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in divide
TypeError: unsupported operand type(s) for /: 'str' and 'str'

Idiomatic Python

List comprehensions

# Syntax
[ <expr1> for k in L if <expr2> ]

# A filtering if condition goes after the for
a = ["123", "456.7", "abc", "Abc", "AAA"]
[ k.upper() for k in a if k.isalpha() ] # returns ['ABC', 'ABC', 'AAA']
[ int(k) for k in a if k.isdigit() ] # returns [123]

# An if ... else expression goes before the for
[(x, y) if x != y else (x * 100, y * 100) for x in [1,2,3] for y in [3,1,4]]
# [(1, 3), (100, 100), (1, 4), (2, 3), (2, 1), (2, 4), (300, 300), (3, 1), (3, 4)]

# Nested loops
[(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
# [(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]

# Building tuples
[(x, x**2) for x in range(6)]
# [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]

# Flattening a matrix
>>> vec = [[1,2,3], [4,5,6], [7,8,9]]
>>> [num for elem in vec for num in elem]
[1, 2, 3, 4, 5, 6, 7, 8, 9]

# Transposing a matrix
>>> matrix = [
...     [1, 2, 3, 4],
...     [5, 6, 7, 8],
...     [9, 10, 11, 12],
... ]
>>> [[row[i] for row in matrix] for i in range(4)]
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

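There are also set comprehensions and dict comprehensions. There is no tuple comprehension: the parenthesized form builds a generator instead.

with and context managers

The with statement ensures that an object is cleaned up properly in every case.

with open("myfile.txt") as f:
    for line in f:
        print line,

`yield` and generators

Decorators⁷

A decorator is a function that returns another function; it is usually applied with the @wrapper syntax to transform a function. classmethod() and staticmethod() are typical decorators.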
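As an illustration of the pattern, here is a hypothetical decorator, count_calls, that wraps a function and counts how often it is called. functools.wraps is used so that the wrapper keeps the name and docstring of the original function.

```python
import functools

def count_calls(func):
    """Return a wrapper around func that counts its calls."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls                  # same as: square = count_calls(square)
def square(x):
    """Return x squared."""
    return x * x

results = [square(2), square(3)]
```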

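Functional programming

Three useful functions that work with sequences are filter(), map() and reduce().

filter(function, sequence): returns a sequence made of the items of sequence for which function(item) is true. If sequence is a string or a tuple, the result has the same type; otherwise it is always a list.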

>>> def f(x): return x % 3 == 0 or x % 5 == 0
...
>>> filter(f, range(2, 25))
[3, 5, 6, 9, 10, 12, 15, 18, 20, 21, 24]

map(function, sequence): returns a list of the values function(item). When several sequences are given, each supplies one argument of function, and shorter sequences are padded with None.

>>> def cube(x): return x*x*x
...
>>> map(cube, range(1, 11))
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]

>>> seq = range(8)
>>> def add(x, y): return x+y
...
>>> map(add, seq, seq)
[0, 2, 4, 6, 8, 10, 12, 14]

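reduce(function, sequence): calls the two-argument function on the first two items of sequence, then on the result and the next item, and so on, and returns the final result. If sequence has only one item, that item is returned; if sequence is empty, an exception is raised.

The third argument of reduce is a starting value for function. It acts as if it were placed in front of sequence before reducing, which avoids the exception for an empty sequence.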

>>> def add(x,y): return x+y
...
>>> reduce(add, range(1, 11))
55

>>> def sum(seq):
...     def add(x,y): return x+y
...     return reduce(add, seq, 0)
...
>>> sum(range(1, 11))
55
>>> sum([])
0

Docstring

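Every function object has a __doc__ attribute. If the first statement in the function body is a string literal, that string becomes __doc__; otherwise __doc__ is None.

def testfun():
    """
    this function does nothing, it just demonstrates the use of the doc string.
    """
    pass

print testfun.__doc__
# Prints: this function does nothing, it just demonstrates the use of the doc string.

Writing documentation this way helps keep the program and its documentation consistent.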

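Common functions

`enumerate()`: getting the index of sequence items

`zip()`: merging and un-merging sequences

For sequences of equal length, zip() is equivalent to map() with None as its function argument.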

>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> zipped = zip(x, y)
>>> zipped
[(1, 4), (2, 5), (3, 6)]
>>> x2, y2 = zip(*zipped)
>>> x == list(x2) and y == list(y2)
True

`dir()`: listing the names a module defines

>>> import __builtin__
>>> dir(__builtin__)

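Practical techniques[1]

Avoid comparing directly with `True`, `False` and `None`

Avoid comparing directly with False, None or empty containers ([], {}, ()); if the list my_list is empty, the condition in if my_list: is false. Comparing directly with None is sometimes the right choice, for example when checking whether a function argument still has its default value None:

def insert_value(value, position=None):
    """Inserts a value into my container, optionally at the specified position"""
    if position is not None:
        # ...
        pass

if position cannot be used in the example above, because position may legitimately be 0.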

Avoid mutable objects such as `[]` and `{}` as default arguments

def f(a, L=[]):
    L.append(a)
    return L 

print(f(1)) 
print(f(2)) 
print(f(3))

# This will print
#
# [1]
# [1, 2]
# [1, 2, 3]
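The usual way around the pitfall shown above is a None sentinel. The function name append_to is hypothetical; the point is that the list is created inside the body, so every call without an explicit target gets a fresh one.

```python
def append_to(item, target=None):
    """Append item to target, creating a new list when none is given."""
    if target is None:        # `is None`, so an empty list passed in is still used
        target = []
    target.append(item)
    return target

first = append_to(1)
second = append_to(2)

shared = []
append_to(3, shared)
```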

Use `*` for the remaining elements of a list [Python 3]

some_list = ['a', 'b', 'c', 'd', 'e'] 
(first, second, *rest) = some_list 
print(rest)
(first, *middle, last) = some_list 
print(middle)
(*head, penultimate, last) = some_list 
print(head)

Use the `default` argument of `dict.get` to return a default value

log_severity = None
if 'severity' in configuration:
    log_severity = configuration['severity'] 
else:
    log_severity = 'Info'

# Equivalent to:
log_severity = configuration.get('severity', 'Info')

Build dictionaries with dict comprehensions

user_email = {user.name: user.email for user in users_list if user.email}

Format strings with the `format` function

Formatting strings with + or % is not recommended.

def get_formatted_user_info(user):
    output = 'Name: {user.name}, Age: {user.age}, Sex: {user.sex}'.format(user=user)
    return output

Join list elements into a string with `''.join`

result_list = ['True', 'False', 'File not found']
result_string = ''.join(result_list)

Chain several string operations

book_info = ' The Three Musketeers: Alexandre Dumas'
formatted_book_info = book_info.strip().upper().replace(':', ' by')

Evaluating expressions

import ast
expr = "[1, 2, 3]" 
my_list = ast.literal_eval(expr)

Define `__str__` on a class to make it readable

class Point():
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __str__(self):
        return '{0}, {1}'.format(self.x, self.y) 

p = Point(1, 2)
print (p)

# Prints '1, 2'

Use a set to remove duplicates from an iterable

unique_surnames = set(employee_surnames) 
def display(elements, output_format='html'):
    if output_format == 'std_out': 
        for element in elements:
            print(element)
    elif output_format == 'html':
        as_html = '<ul>'
        for element in elements:
            as_html += '<li>{}</li>'.format(element) 
        return as_html + '</ul>'
    else:
        raise RuntimeError('Unknown format {}'.format(output_format))

Use set comprehensions

users_first_names = {user.first_name for user in users}

Use generators to load infinite sequences lazily

def get_twitter_stream_for_keyword(keyword):
    imaginary_twitter_api = ImaginaryTwitterAPI()
    while imaginary_twitter_api.can_get_stream_data(keyword):
        yield imaginary_twitter_api.get_stream(keyword)

for tweet in get_twitter_stream_for_keyword('#jeffknupp'):
    if got_stop_signal: 
        break
    process_tweet(tweet)

def get_list_of_incredibly_complex_calculation_results(data):
    yield first_incredibly_long_calculation(data) 
    yield second_incredibly_long_calculation(data) 
    yield third_incredibly_long_calculation(data)

Use generator expressions instead of list comprehensions

for uppercase_name in (name.upper() for name in get_all_usernames()): 
    process_normalized_username(uppercase_name)

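Use context managers, through the `with` statement, to manage resources correctly

A user-defined class becomes a context manager by defining the __enter__ and __exit__ methods; a function can do the same with the help of the contextlib module.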

with open(path_to_file, 'r') as file_handle: 
    for line in file_handle:
        if raise_exception(line): 
            print('No! An Exception!')

Use tuples to unpack data

list_from_comma_separated_value_file = ['dog', 'Fido', 10]
(animal, name, age) = list_from_comma_separated_value_file
output = ('{name} the {animal} is {age} years old'.format(animal=animal, name=name, age=age))

Use `_` as a placeholder for tuple elements that are not needed

(name, age, _, _) = get_user_info(user)
if age > 21:
    output = '{name} can drink!'.format(name=name)

Swap values without a temporary variable

foo = 'Foo'
bar = 'Bar'
(foo, bar) = (bar, foo)

Use `sys.exit` to make a script return an error code

This keeps the script compatible with other programs, for example in Unix pipelines.

import sys

def main():
    if len(sys.argv) < 2:
        sys.exit('You forgot to pass an argument')
    argument = sys.argv[1]
    result = do_stuff(argument)
    if not result:
        sys.exit(1)
    do_stuff_with_result(result)
    # Optional, since the return value without this return
    # statement would default to None, which sys.exit treats
    # as 'exit with 0'
    return 0

if __name__ == '__main__':
    sys.exit(main())

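Do not use `from foo import *`

from foo import (bar, baz, qux, quux, quuux)
# or even better...
import foo

Order of `import` statements:

The standard library;
Third-party libraries in site-packages;
Libraries of the current project.

Character encodings

Unicode is the bridge between encodings; decode and encode perform the conversions;
- decode converts a string in some other encoding to Unicode: name.decode("GB2312") converts the GB2312-encoded string name to Unicode;
- encode converts Unicode to a string in some other encoding: name.encode("GB2312") converts the Unicode string name to GB2312;
- The encoding of name must be known before it can be converted.
To test whether the string s is Unicode: isinstance(s, unicode);
To get the system default encoding: sys.getdefaultencoding();
For a unicode string s, str(s) is shorthand for s.encode('ascii').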
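Converting between two encodings through Unicode can be sketched as follows. The sample text is written with escapes so the source stays pure ASCII, and the code is written to run on both Python 2.7 and Python 3.

```python
text = u'\u4e2d\u6587'                   # a two-character Unicode string
gb_bytes = text.encode('gb2312')         # Unicode -> GB2312 bytes
utf8_bytes = text.encode('utf-8')        # Unicode -> UTF-8 bytes

# GB2312 -> UTF-8 always goes through Unicode: decode first, then encode.
round_trip = gb_bytes.decode('gb2312').encode('utf-8')
```

The same two characters take a different number of bytes in each encoding, which is why the encoding of a byte string must be known before it can be decoded.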

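Ternary operation: emulating the `?:` operator

max_value = A if A > B else B

Pretty printing

Dictionaries and lists can be pretty-printed as follows:

from pprint import pprint
pprint(my_dict)

This is particularly effective for dictionaries. To pretty-print a JSON document stored in a file, use:

cat file.json | python -m json.tool

Remove duplicates directly with a set

If a set is acceptable for later use, this takes O(n) time:

seq = ['a', 'a', 'b']
res = set(seq)

Implementing switch/case

Because switch/case can be replaced entirely by if/else, Python has no switch/case syntax, but it can be emulated with a dictionary or with lambdas.

The switch/case structure: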

switch (var)
{
    case v1: func1();
    case v2: func2();
    ...
    case vN: funcN();
    default: default_func();
}

Dictionary implementation:

values = {
           v1: func1,
           v2: func2,
           ...
           vN: funcN,
         }
values.get(var, default_func)()

Lambda implementation:

{
  '1': lambda: func1(),
  '2': lambda: func2(),
  '3': lambda: func3()
}[value]()

The default case can be handled with try...except; personally, I recommend the dict implementation.
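A runnable sketch of both ways to handle the default branch follows. The handler functions and the command names are hypothetical and stand in for func1 ... funcN.

```python
def handle_start():
    return 'starting'

def handle_stop():
    return 'stopping'

def handle_unknown():
    return 'unknown command'

HANDLERS = {
    'start': handle_start,
    'stop': handle_stop,
}

def dispatch(command):
    # dict.get supplies the default branch.
    return HANDLERS.get(command, handle_unknown)()

def dispatch_with_except(command):
    # The same default branch written with try...except.
    try:
        handler = HANDLERS[command]
    except KeyError:
        handler = handle_unknown
    return handler()
```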

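Prefer built-in functions

When a function object is needed, operator.add is preferable to a hand-written lambda a, b: a + b;
When concatenating strings, use join instead of the + operator, which incurs copying overhead;
When either a regular expression or a built-in string method would do, choose the built-in method, for example str.isalpha(), str.isdigit(), str.startswith(('x', 'yz')) and str.endswith(('x', 'yz')).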

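Coding style: format code according to PEP8

Naming rules:

| Kind | Format | Example |
|---|---|---|
| Class | CamelCase | `class StringManipulator():` |
| Variable | words joined by `_` | `joined_by_underscore = True` |
| Function | words joined by `_` | `def multi_word_name(words):` |
| Constant | all upper case | `SECRET_KEY = 42` |

Avoid abbreviations where possible.

Worked examples

Timing code with a `with` context manager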

import time


class Timer(object):
    def __enter__(self):
        self.start = time.clock()
        return self

    def __exit__(self, *args):
        self.end = time.clock()
        self.interval = self.end - self.start


with Timer() as t:
    dosomesuch()
print t.interval

Printing a nested dictionary with recursion and `yield`

def superPrint(inidic={},indent=chr(32)):
    length=len(inidic)
    for i,d in enumerate(inidic.items()):
        #if the k or v is string object,add ' to both sides
        k,v=["'%s'"%x if isinstance(x,(str,unicode)) else x for x in d]
        #if the v is dict object,recurse across v and return a string
        if isinstance(v,dict):
            v=''.join(superPrint(v,indent+chr(32)*(len(str(k))+3)))
        if length==1:
            yield "{ %s: %s}"%(k,v)
        elif i==0:
            yield "{ %s: %s,\n"%(k,v)
        elif i==length-1:
            yield "%s%s: %s}"%(indent,k,v)
        else:
            yield "%s%s: %s,\n"%(indent,k,v)

x={
'y': {'integer': 6,'decimal': {9: 7, 'tryHarder': {'wow': 8, 'seemsGood': {'wooow': 9}}}},
'x': {'integer': {23: {0: 0, 244: 2}},'decimal': 5},
 2: 'absolute', 
 'zeros': 'leading',
 'gerber-command': 'FS'
 }

print ''.join(superPrint(x))

A one-line tree with `defaultdict`

from collections import defaultdict

def tree(): 
    return defaultdict(tree)
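A short usage sketch: intermediate levels of the tree appear on demand when a deep key is assigned. The helper to_dict and the sample keys are hypothetical additions, used only to inspect the result as plain dictionaries.

```python
from collections import defaultdict

def tree():
    return defaultdict(tree)

taxonomy = tree()
taxonomy['animals']['mammals']['cat'] = 'Felis catus'   # no intermediate setup needed

def to_dict(node):
    """Convert nested defaultdicts to plain dicts for display or comparison."""
    if isinstance(node, defaultdict):
        return dict((key, to_dict(value)) for key, value in node.items())
    return node

plain = to_dict(taxonomy)
```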

Emulating enumeration constants

def enum(**enums):
    return type('Enum', (), enums)

Color = enum(RED=0, GREEN=1, BLUE=2)

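# >>> Color.RED
# 0
# >>> Color.GREEN
# 1
# >>> Color.BLUE
# 2

A beginner's guide to concurrency and parallelism

Commonly used libraries

itertools

collections

Use the container types in Python's collections module as alternatives to the built-in containers. Three of them are:

deque: a list-like type with extra capabilities;
defaultdict: a dict-like type;
namedtuple: a tuple-like type.

Lists are implemented on top of arrays, whereas a deque is implemented as a doubly linked list, so inserting or removing elements at either end of a deque is much faster than at the front of a list (insertion in the middle of a deque is still O(n)). defaultdict adds a default factory for new keys, which avoids writing an extra test to initialise a mapping entry and is more efficient than dict.setdefault.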
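The three types can be tried out in a few lines. The data and the Point record type are illustrative only.

```python
from collections import deque, defaultdict, namedtuple

queue = deque([2, 3])
queue.appendleft(1)                   # O(1) at the left end, unlike list.insert(0, x)
queue.append(4)
leftmost = queue.popleft()

words_by_length = defaultdict(list)   # a missing key starts as an empty list
for word in ['a', 'bb', 'cc']:
    words_by_length[len(word)].append(word)

Point = namedtuple('Point', ['x', 'y'])
p = Point(x=1, y=2)                   # fields are reachable by name or by position
```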

functools

contextlib

weakref

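Industry applications

Computer vision

The main Python packages used for computer vision are scikit-image, OpenCV, Mahotas, SimpleCV, ilastik and Menpo.

Machine learning

The main Python packages used for machine learning are scikit-learn and Milk.

Milk: Milk focuses on supervised classification and offers several effective classifiers: SVMs (based on libsvm), k-NN, random forests and decision trees. It also performs feature selection. These classifiers can be combined in many ways to form different classification systems. For unsupervised learning, it provides k-means and affinity propagation clustering.

Python 3

Compatibility with Python 2

Using the future package…

References

[1] J. Knupp, “Writing Idiomatic Python.” 2013. [Online]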

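¹ Values that are not hashable, that is, values containing lists, dictionaries or other mutable types (that are compared by value rather than by object identity) may not be used as keys.
² [] == False, [] == True, [] is False and [] is True all return False, but [] counts as false when used as the condition of if and similar statements; so write if foo: rather than if foo == True:.
³ == tests whether two objects have the same value; is tests whether they are the same object.
⁴ CPython implementation: objects of different types are ordered by type name, and objects of the same type that do not support comparison are ordered by address.
⁵ What does "different string types" mean?
⁶ Instances of a class cannot be ordered with respect to other instances of the same class, or other types of object, unless …
⁷ Python装饰器学习（九步入门） (Learning Python decorators: a nine-step introduction).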
