Step deep in python

Old style class

class A:
    def __init__(self):
        self.data = {}
 
    def fun(self):
        return 0
 
    def __private_method(self):
        return "only used inside class"

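Note: __private_method can in fact be accessed from outside (name mangling only renames it to _A__private_method), but by convention this spelling says that I do not want it called directly from outside. See: https://www.liaoxuefeng.com/wiki/1016959663602400/1017496679217440

Class A above uses the old-style class definition (the only kind before Python 2.2; in Python 2, any class that does not inherit from object is still old-style). Because it violates object completeness (everything inherits from the object class), it may cause unnecessary problems in later versions of Python, so it is not recommended. Use the new-style class definition instead: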

class B(object):
    def __init__(self):
        self.data = {}
 
    def fun(self):
        pass

Overly broad exception handling

def some_really_complicated_method():
    try:
        foo = bar...
        # other long codes...
    except Exception as e:
        print(e)

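When try...except wraps the entire function, every exception raised by the code inside is caught by the broad outer handler. This makes locating the failure painful, because it hides the point where the exception was raised and its details.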
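
A minimal sketch of the alternative, in Python 3 syntax (an assumption; these notes otherwise use Python 2). The names load_config and describe_failure are hypothetical and exist only for this illustration. The try block wraps only the statement expected to fail, and the traceback is kept instead of printing just the message:

```python
import traceback


def load_config(raw):
    # Hypothetical step: only this lookup/conversion is expected to fail.
    try:
        port = int(raw["port"])
    except (KeyError, ValueError) as exc:
        # Narrow handler: re-raise with context instead of swallowing it.
        raise ValueError("bad 'port' setting: %r" % (raw,)) from exc
    return {"port": port}


def describe_failure(func, *args):
    # Hypothetical boundary handler that contrasts the two views of an error.
    try:
        func(*args)
    except Exception as exc:
        # str(exc) is what print(e) shows; format_exc() keeps file names,
        # line numbers and the whole call chain.
        return str(exc), traceback.format_exc()
    return None, None


short, full = describe_failure(load_config, {"port": "abc"})
print(short)
print(full)
```

The short form only says what went wrong; the full traceback also says where, which is the information a catch-all print(e) throws away.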

Everything is object (except keywords)

  • In Python 2.x, print is a keyword (a statement), not an object
  • In Python 3, print is a function, and therefore an object

Iterators

for i in xrange(n): pass
 
# 3 ways to iterate a dict
for k in dict_obj.iterkeys(): pass
for v in dict_obj.itervalues(): pass
for k, v in dict_obj.iteritems(): pass
 
for index, value in enumerate(seq_obj): pass
for v1, v2 in zip(seq1, seq2): pass
for line in open("bigfile.txt", "rt"): pass

Implement your iterator

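It must satisfy the iteration protocol:

  • __iter__(), which is what iter() calls
  • next() (spelled __next__() in Python 3)
  • raising StopIteration when the elements are exhausted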

Semantics of for

for v in lst:
    print v

is equivalent to

it = iter(lst)
try:
    while True:
        v = it.next()
        print v
except StopIteration:
    pass

For example,

class Foo(object):
    def __init__(self):
        self._list = [1,2,3,4,5]
        self._curr = 0
 
    def __iter__(self):
        self._curr = 0
        return self
 
    def next(self):  # spelled __next__ in Python 3
        if self._curr < len(self._list):
            res = self._list[self._curr]
            self._curr += 1
            return res
        else:
            raise StopIteration

Foo is iterable, let’s test

a = Foo()
for e in a:
    print e
 
# 1
# 2
# 3
# 4
# 5

one more

for e1 in a:
    for e2 in a:
        print (e1, e2)
 
# (1, 1)
# (1, 2)
# (1, 3)
# (1, 4)
# (1, 5)

What’s the problem?

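The inner loop exhausts the elements and raises StopIteration. Because __iter__ returns self, both loops share the single cursor _curr, so when control returns to the outer loop there is nothing left to iterate.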

What’s the solution?

class FooIterator(object):
    def __init__(self, foo):
        self._foo = foo
        self._curr = 0
 
    def next(self):
        if self._curr < len(self._foo):
            res = self._foo[self._curr]
            self._curr += 1
            return res
        else:
            raise StopIteration
 
class Foo(object):
    def __init__(self):
        self._list = [1,2,3,4,5]
 
    def __iter__(self):
        return FooIterator(self)
 
    def __len__(self):
        return len(self._list)
 
    def __getitem__(self, index):
        return self._list[index]

Now let’s test

a = Foo()
for e1 in a:
    for e2 in a:
        print (e1, e2)
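
To check the result, here is the same design as a sketch in Python 3 syntax (an assumption; there the method is named __next__ rather than next, and the iterator also gets an __iter__ that returns itself). Each call to Foo.__iter__ hands out a fresh FooIterator with its own cursor, so the nested loops no longer interfere:

```python
class FooIterator:
    """Holds its own cursor, so every loop gets independent state."""

    def __init__(self, foo):
        self._foo = foo
        self._curr = 0

    def __iter__(self):
        # Iterators conventionally return themselves from __iter__.
        return self

    def __next__(self):
        if self._curr < len(self._foo):
            res = self._foo[self._curr]
            self._curr += 1
            return res
        raise StopIteration


class Foo:
    def __init__(self):
        self._list = [1, 2, 3, 4, 5]

    def __iter__(self):
        # A fresh iterator per call: nested loops do not share a cursor.
        return FooIterator(self)

    def __len__(self):
        return len(self._list)

    def __getitem__(self, index):
        return self._list[index]


a = Foo()
pairs = [(e1, e2) for e1 in a for e2 in a]
print(len(pairs))   # 25
print(pairs[:2])    # [(1, 1), (1, 2)]
```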

Semantics of yield

def foo():
    lst = [1,2,3]
    for v in lst:
        yield v
 
g = foo()
print g.next()
print g.next()
print g.next()
print g.next()  # the generator is exhausted: raises StopIteration

  • A function that contains yield is treated as a generator function
  • A generator object is created each time the function is called
  • A generator object supports the iteration protocol

See the next:

print foo()
print foo()
print foo()
print foo()
 
# <generator object foo at 0x0000012DF0C1CDC8>
# <generator object foo at 0x0000012DF0C1CDC8>
# <generator object foo at 0x0000012DF0C1CDC8>
# <generator object foo at 0x0000012DF0C1CDC8>

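In fact, each call really does create a different object; the identical addresses come from memory locality. Python's garbage collection (in CPython) uses reference counting and destroys an object as soon as its count drops to 0. The object created by foo() is not referenced by anything after it is printed, so it is destroyed immediately. When the next foo() is created, memory locality makes it very likely that the allocator reuses the block that was just freed, and since the two objects have the same size and alignment, landing on the same address is even more likely. So every line prints the same address, yet these are different objects.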
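
One way to confirm this is to keep the generator objects alive at the same time, so that their memory cannot be reused. A small sketch in Python 3 syntax (an assumption; next(g) replaces g.next()):

```python
def foo():
    for v in [1, 2, 3]:
        yield v


# Binding the generators to names keeps both objects alive.
g1 = foo()
g2 = foo()

print(g1 is g2)             # False
print(id(g1) == id(g2))     # False: two live objects never share an id
print(next(g1), next(g1))   # 1 2
print(next(g2))             # 1  (g2 has its own, independent state)
```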

Iterator with closure

  • ctor
  • copy ctor
  • copy assignment
  • move ctor
  • move assignment

Python basic type

  • numeric: immutable
  • string: immutable
  • list: mutable
  • tuple: immutable
  • dict: mutable

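Argument passing: arguments of immutable types behave as if passed by value; arguments of mutable types behave as if passed by reference. (Strictly speaking, Python always passes a reference to the object; an immutable object simply cannot be changed in place through that reference.)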

def foo(a, items=[], added=True):
    if added:
        items.append(a)
    print items
 
foo(1)
foo(2)
foo(3, added=False)
    
# [1]
# [1, 2]
# [1, 2]

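Why? Python stores parameter default values on the function object (everything is an object). Whenever a default is needed, it is read from foo.param_list (a conceptual name; the real attribute is foo.__defaults__, or foo.func_defaults in Python 2). A list is a mutable object, and this one lives on the function object, so it is not re-initialized on the second call: every call that falls back to the default gets the same object.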
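
The behaviour, and the usual way around it, can be sketched as follows in Python 3 syntax (an assumption). The names foo_shared and foo_fresh are hypothetical; the None sentinel is a common idiom rather than something these notes prescribe:

```python
def foo_shared(a, items=[]):
    # The default list is created once, when the def statement runs.
    items.append(a)
    return items


def foo_fresh(a, items=None):
    # None is a sentinel: a new list is built on every call that omits items.
    if items is None:
        items = []
    items.append(a)
    return items


print(foo_shared(1))            # [1]
print(foo_shared(2))            # [1, 2]
print(foo_shared.__defaults__)  # ([1, 2],)
print(foo_fresh(1))             # [1]
print(foo_fresh(2))             # [2]
```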

Namespace and object space

A namespace is a mapping from names to objects. Most namespaces are currently implemented as python dictionaries.

  • each function has its own local namespace
  • each module has its own global namespace
  • python built-in namespace (built-in is actually a module (__builtins__), and has its own global namespace, which is called built-in namespace)

Namespace locating

LGB rule: local -> parent (enclosing) local -> global -> built-in

The global keyword just tells the interpreter that the name lives in the global namespace, not the local one.

Note: a write operation (an assignment) to a name shields the name lookup process, see below

class Foo(object):
    def __init__(self):
        self._list = [1, 2, 3, 4, 5]
 
    def __iter__(self):
        counter = 0
 
        def get():
            if counter >= len(self._list):
                return None
            res = self._list[counter]
            counter += 1
            return res
 
        return iter(get, None)
 
a = Foo()
for e1 in a:
    for e2 in a:
        print (e1, e2)

outputs

Traceback (most recent call last):
  File "e:/Documents/learn2live/notes/test.py", line 38, in <module>
    for e1 in a:
  File "e:/Documents/learn2live/notes/test.py", line 29, in get
    if counter >= len(self._list):
UnboundLocalError: local variable 'counter' referenced before assignment

What’s the problem?

When the interpreter reads the function get, it creates a local namespace for it. Because get contains the write operation counter += 1, the local namespace gets an entry for counter:

__localnamespace__ = {
    'counter': ,   # empty map
    'others': ...,
    ...
}

At this point the interpreter considers the name counter to be found inside get, so it does not search any further up.
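
In Python 3 the nonlocal statement is the direct fix; the sketch below assumes Python 3, since Python 2 has no nonlocal (a common workaround there is to keep the counter in a mutable container such as a one-element list). Declaring the name nonlocal makes counter += 1 rebind the variable of the enclosing __iter__ call instead of creating a local entry:

```python
class Foo:
    def __init__(self):
        self._list = [1, 2, 3, 4, 5]

    def __iter__(self):
        counter = 0

        def get():
            # nonlocal: counter refers to the variable in the enclosing
            # __iter__ scope, so no local entry is created for it.
            nonlocal counter
            if counter >= len(self._list):
                return None
            res = self._list[counter]
            counter += 1
            return res

        # iter(callable, sentinel) keeps calling get() until it returns None.
        return iter(get, None)


a = Foo()
pairs = [(e1, e2) for e1 in a for e2 in a]
print(len(pairs))  # 25
```

Every call to __iter__ creates a new closure with its own counter, so the nested loops are independent as well.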

Reflection

  • dir(obj): returns a list of names that can be accessed in the form obj.xxx
  • obj.__dict__: a map of (name, object) pairs that can be accessed in the object’s own (local) namespace

== vs. is

a = 'abcdefg'
b = ''.join([chr(ord('a') + i) for i in xrange(7)])
 
print a == b
print a is b
 
# True
# False

Hold on…

a = 'a'
b = 'abcdefg'[0]
print a == b	# True
print a is b	# True
 
a = 10**9
b = 10**9
print a == b	# True
print a is b	# False
 
a = 10
b = 10
print a == b	# True
print a is b	# True

  • semantics of ==: the values of the two objects are the same
  • semantics of is: the objects referenced by the two names are one and the same
    • a is b is equivalent to id(a) == id(b)
    • some types (str or int) use an object pool to manage certain special objects for optimization purposes
    • the type NoneType has only one instance: None
    • the type bool has only two instances: True and False

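Small-integer and single-character optimization: experience shows that most values used in programs are fairly small integers or single characters, so Python keeps a cache pool for objects of this kind (in CPython, the integers from -5 to 256). Such objects are taken straight from the pool, so the two ids are the same. Outside the cached range a new object is constructed, so the ids differ.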
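
Because the caching is an implementation detail, the difference between the two operators is easier to see with objects that are never pooled. A small sketch in Python 3 syntax (an assumption), using lists:

```python
a = [1, 2, 3]
b = list(a)   # a new list object with equal contents
c = a         # a second name for the same object

print(a == b)            # True: same value
print(a is b)            # False: different objects
print(a is c)            # True: same object
print(id(a) == id(c))    # True

value = None
print(value is None)     # True: NoneType has a single instance
```

This is why comparisons against None are usually written with is, while value comparisons use ==.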

Semantics of import

import m

  1. Check whether it is in sys.modules (sys.modules.has_key('m'))
  2. If it is, go to step 4
  3. Otherwise, load the module “m” to create the module object
  4. Place the name “m” in the current namespace, and let the name refer to the object sys.modules['m']
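
The steps above can be observed with a standard-library module. The sketch below assumes Python 3 (where has_key is gone and the in operator is used instead) and picks json only as a convenient example:

```python
import importlib
import sys

# After the first import, the module object is registered in sys.modules.
json_mod = importlib.import_module("json")
print("json" in sys.modules)             # True
print(sys.modules["json"] is json_mod)   # True

# A later import finds the entry in step 1, so the module body is not
# executed again; the name is simply bound to the cached object.
import json
print(json is json_mod)                  # True
```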

Load the module

  1. Open the carrier (i.e., xxx.py, xxx.pyc, xxx.pyd, xxx.o, xxx.dll, etc.) corresponding to “m” - the carrier differs in cases
  2. Create an empty module object named “m” and place it into sys.modules
  3. Execute the statements in the module sequentially within the namespace of the module object, i.e., code-scan

# file: t014.py
print 't014'
import t015
 
# file: t015.py
print 't015'
import t016
a_in_t015 = 10
 
# file t016.py
print 't016'
import t015
print t015.a_in_t015
 
# now execute t014.py
t014
t015
t016
Traceback (most recent call last):
  File "e:/Documents/learn2live/notes/t014.py", line 2, in <module>
    import t015
  File "e:\Documents\learn2live\notes\t015.py", line 2, in <module>
    import t016
  File "e:\Documents\learn2live\notes\t016.py", line 3, in <module>
    print t015.a_in_t015
AttributeError: 'module' object has no attribute 'a_in_t015'

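Why? Look closely at the semantics of import:

  1. t014.py is executed: it first prints t014, then runs import t015
  2. t015 is looked up in sys.modules and not found, so it is created and the code of t015.py is executed: it prints t015, then runs import t016
  3. t016 is looked up in sys.modules and not found, so it is created and the code of t016.py is executed: it prints t016, then runs import t015
  4. t015 is looked up in sys.modules and found, so the import finishes and t016.py continues by printing t015.a_in_t015. At this point the module t015 does not have this member yet, because its code has not finished executing, hence the error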

NOTE

  • Do NOT do heavy, time-consuming work at module level
  • Do NOT form import (reference) cycles
  • Do NOT place all the import statements at the very beginning of the module

from m import sth

  1. Execute the import semantics above, WITHOUT putting the name “m” into the current namespace
  2. Search for the name “sth” in the namespace of the module “m”
  3. If it exists, put the name “sth” into the current namespace, and make it refer to the object that “m.sth” refers to
  4. If it doesn’t exist, raise an exception

# t017.py
MaxHp = 100
a = [1, 2]
 
# t018.py
from t017 import MaxHp, a
import t017
 
print MaxHp		# 100
print t017.MaxHp # 100
print a			# [1, 2]
print t017.a	# [1, 2]
 
t017.MaxHp = 200
a = [1, 2, 3]
 
print MaxHp			# 100
print t017.MaxHp	# 200
print a			# [1, 2, 3]
print t017.a	# [1, 2]

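Why? Look closely at the from/import semantics. The first group of prints is easy to understand. Now the second group: the from/import semantics put MaxHp into the namespace of t018 and make it refer to the object that t017.MaxHp refers to, namely the int object 100. So executing t017.MaxHp = 200 merely binds the object 200 to the variable t017.MaxHp, while MaxHp in the namespace of t018 still refers to the object 100. Likewise, a = [1, 2, 3] binds the new list object [1, 2, 3] to a, so the variable a no longer refers to the object that t017.a refers to.

Assignment in Python only changes which object a variable refers to. There are no references to variables, only references to objects.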

a = 1
b = a	# b refers to the object that a refers to
a = 2	# a now refers to a different object, but b does not!
 
print a	# 2
print b	# 1
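
With a mutable object the same rule shows both sides: mutating the object is visible through every name that refers to it, while rebinding a name affects only that name. A small sketch in Python 3 syntax (an assumption):

```python
a = [1, 2]
b = a          # a and b are two names for the same list object

a.append(3)    # mutation: the shared object changes
print(b)       # [1, 2, 3]

a = [9]        # rebinding: only the name a moves to a new object
print(a)       # [9]
print(b)       # [1, 2, 3]
```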

NOTE

  • from m import * would pollute the current namespace!

Semantics of del

See: https://docs.python.org/3/reference/datamodel.html

Note

del x doesn’t directly call x.__del__() — the former decrements the reference count for x by one, and the latter is only called when x’s reference count reaches zero.
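
A small sketch of this behaviour, assuming CPython's reference counting (other implementations may run finalizers later) and Python 3 syntax. The class Resource and the list events are hypothetical names used only to record when __del__ runs:

```python
events = []


class Resource:
    def __del__(self):
        # Record the moment the object is finalized.
        events.append("finalized")


x = Resource()
y = x          # a second reference to the same object

del x          # removes the name x; the reference held by y remains
print(events)  # []

del y          # the last reference is gone; CPython finalizes the object
print(events)  # ['finalized']
```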

Decorator

Ref

Generator

see: https://stackoverflow.com/a/231855

Enum

Ref: https://www.liaoxuefeng.com/wiki/1016959663602400/1017595944503424

To use enum in python, see

from enum import Enum, unique
 
@unique
class Month(Enum):
    Jan = 0
    Feb = 1
    Mar = 2
    # ...
    Dec = 11
    
for name, member in Month.__members__.items():
    print name, "=>", member, '=>', member.value

Function creation at runtime in Python

class A(object):
    def func(self):
        pass
    
>>> a = A()
>>> id(a.func)
44690448
>>> id(a.func)
44690448
>>> b = a.func
>>> id(b)
44690448
>>> id(b)
44690448
>>> id(a.func)
44610264
>>> id(A.func)
44690528
>>> id(A.func)
44610264
>>> id(A.func)
44610264
>>> c = A.func
>>> id(c)
44610264
>>> id(c)
44610264
>>> id(A.func)
44690528
>>> id(A.func)
44690528
>>>

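As you can see, Python creates methods on the fly when they are used: a function (method) object is created only at the moment of the call, and destroyed as soon as the call is over. In the example above, although b and a.func are logically identical, they are two different objects.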
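
Comparing ids of short-lived objects can mislead because addresses get reused, so the sketch below keeps two method objects alive and compares them directly. It assumes Python 3, where A.func is a plain function and a.func is a bound method wrapping it:

```python
class A:
    def func(self):
        pass


a = A()

m1 = a.func   # each attribute access builds a new bound method object
m2 = a.func

print(m1 is m2)                # False: two distinct objects
print(m1 == m2)                # True: same function bound to the same instance
print(m1.__func__ is A.func)   # True: both wrap the one function on the class
print(m1.__self__ is a)        # True
```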

Meta class

see: https://medium.com/@dboyliao/%E6%B7%BA%E8%AB%87-python-metaclass-dfacf24d6dd5

Keep in mind two things:

  1. Everything is an object in python.
  2. Every object has a type.

For me, on this question I like to quote the definition of object from C language standards such as C99:

region of data storage in the execution environment, the contents of which can represent values

class A(object):
    def func(self):
        print 'A.func'
 
>>> a = A()
>>> type(a)
<class '__main__.A'>
>>> type(A)
<type 'type'>
>>> isinstance(a, object)
True
>>> isinstance(A, object)
True  # A is a type, everything (type) is an object
>>> isinstance(type, object)
True  # everything (type) is an object
 
# here comes weird things
>>> isinstance(object, type)
True  # object is a type
>>> isinstance(object, object)
True  # everything (type) is an object
 
>>> type(type)
<type 'type'>  # type is not a function! It is a class, then an object, then a type!
>>> type(object)
<type 'type'>
 
>>> issubclass(type, object)
True
>>> isinstance(type, object)
True
>>> isinstance(object, type)
True

Oh, what the fuck!

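According to Python's data model, the signature of type.__new__ looks like this:

type.__new__(mcls, name, base, attribs)

  • mcls: the metaclass object itself
  • name: the name of the class to be created
  • base: the other classes that the class to be created inherits from
  • attribs: the attributes of the class to be created

For example, the following two forms are equivalent in Python: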

class MyClass(object):
    ANSWER = 42
    
    def speak(self):
        print('the answer to life is {}'.format(self.ANSWER))
# or
MyClass = type('MyClass',
              (object,),
              {
                  'ANSWER': 42,
                  'speak': lambda self: print('the answer to life is {}'.format(self.ANSWER))
              }
)

Sharp-eyed readers may ask: where did mcls go?

In fact, when you write type(...), mcls is type itself. In other words, it is equivalent to type.__new__(type, 'MyClass', (object,), ...)
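
The equivalence can be checked with a short sketch in Python 3 syntax (an assumption). Here speak returns the string instead of printing it, purely so that the result is easy to compare; MyClass1 and MyClass2 are hypothetical names:

```python
attribs = {
    "ANSWER": 42,
    "speak": lambda self: "the answer to life is {}".format(self.ANSWER),
}

# Three-argument call: type acts as the metaclass implicitly.
MyClass1 = type("MyClass", (object,), dict(attribs))

# The same creation with mcls spelled out explicitly.
MyClass2 = type.__new__(type, "MyClass", (object,), dict(attribs))

print(type(MyClass1) is type)   # True
print(type(MyClass2) is type)   # True
print(MyClass1().speak())       # the answer to life is 42
print(MyClass2().speak())       # the answer to life is 42
```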