Step deep in python
Old style class
class A:
    def __init__(self):
        self.data = {}

    def fun(self):
        return 0

    def __private_method(self):
        return "only used inside class"
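The above is the old-style class definition, the only kind available before Python 2.2. It breaks object completeness (everything should inherit from the object class) and may cause unnecessary problems in later Python versions, so it is not recommended.
Note: __private_method can in fact still be reached from outside the class (through the mangled name _A__private_method), but writing it this way signals, by convention, that it is not meant to be called directly from outside. See: https://www.liaoxuefeng.com/wiki/1016959663602400/1017496679217440
Use the new-style class definition instead: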
class B(object):
    def __init__(self):
        self.data = {}

    def fun(self):
        pass
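The note about __private_method can be checked directly. The following is a minimal sketch for Python 3; the wrapper method call_private is a hypothetical name added only for illustration.

```python
class B(object):
    def __private_method(self):
        return "only used inside class"

    def call_private(self):
        # Inside the class body the name is rewritten to _B__private_method.
        return self.__private_method()


b = B()
inside = b.call_private()
# Outside the class the plain name does not exist ...
has_plain_name = hasattr(b, "__private_method")
# ... but the mangled name is still reachable, so privacy is a convention only.
outside = b._B__private_method()
```

Name mangling only renames the attribute; it does not enforce access control.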
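Overly broad exception handling
def some_really_complicated_method():
    try:
        foo = bar...
        # other long codes...
    except Exception as e:
        print(e)
Wrapping the whole function in try...except means that every exception raised inside it is caught by the broad outer handler. This makes failures hard to locate, because it hides the point where the exception was raised and its details.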
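One possible alternative, sketched below for Python 3, is to guard only the statement that can fail with the specific exception expected, and to keep the traceback when a broad handler cannot be avoided. The functions parse_ratio and describe_failure are hypothetical examples, not part of the original notes.

```python
import traceback


def parse_ratio(text):
    # Only the conversion can raise ValueError, so only it is guarded.
    try:
        numerator, denominator = (int(part) for part in text.split("/"))
    except ValueError:
        return None
    # A ZeroDivisionError here is a different failure and is not swallowed.
    return numerator / denominator


def describe_failure(func, *args):
    # If a broad handler is unavoidable, keep the traceback instead of print(e).
    try:
        return func(*args)
    except Exception:
        return traceback.format_exc()


report = describe_failure(parse_ratio, "1/0")
```

The report string names the exception type and the function where it was raised, which print(e) alone would not show.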
Everything is an object (except keywords)
- In Python 2.x, print is a keyword (a statement), not an object.
- In Python 3, print is a function, and therefore an object.
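A small sketch of what "print is an object" means in Python 3: it has a type, and it can be bound to another name and called through it. The names emit and buffer are illustrative only.

```python
import io

# In Python 3, print is an ordinary object with a type.
kind = type(print).__name__
emit = print  # bind another name to the same function object

buffer = io.StringIO()
emit("hello", file=buffer)
```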
Iterators
for i in xrange(n): pass
# 3 ways to iterate a dict
for k in dict_obj.iterkeys(): pass
for v in dict_obj.itervalues(): pass
for k, v in dict_obj.iteritems(): pass
for index, value in enumerate(seq_obj): pass
for v1, v2 in zip(seq1, seq2): pass
for line in open("bigfile.txt", "rt"): pass
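Implement your own iterator
An iterator must meet the iteration protocol:
- iter(): calls __iter__() and returns the iterator object
- next(): returns the next element (the method is named __next__() in Python 3)
- StopIteration: raised when the elements are exhausted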
Semantics of for
for v in lst:
    print v
is equivalent to
it = iter(lst)
try:
    while True:
        v = it.next()
        print v
except StopIteration:
    pass
For example,
class Foo(object):
    def __init__(self):
        self._list = [1, 2, 3, 4, 5]
        self._curr = 0

    def __iter__(self):
        self._curr = 0
        return self

    def next(self):  # Python 2 protocol name; Python 3 uses __next__
        if self._curr < len(self._list):
            res = self._list[self._curr]
            self._curr += 1
            return res
        else:
            raise StopIteration
Foo is iterable, let’s test
a = Foo()
for e in a:
    print e
# 1
# 2
# 3
# 4
# 5
One more test, this time with nested loops:
for e1 in a:
    for e2 in a:
        print (e1, e2)
# (1,1)
# (1,2)
# (1,3)
# (1,4)
# (1,5)
What’s the problem?
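Both loops share the same iterator object and therefore the same cursor. The inner loop exhausts the elements and raises StopIteration, so the outer loop has nothing left to iterate over.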
What’s the solution?
class FooIterator(object):
    def __init__(self, foo):
        self._foo = foo
        self._curr = 0  # every iterator object keeps its own cursor

    def __iter__(self):
        return self

    def next(self):
        if self._curr < len(self._foo):
            res = self._foo[self._curr]
            self._curr += 1
            return res
        else:
            raise StopIteration

class Foo(object):
    def __init__(self):
        self._list = [1, 2, 3, 4, 5]

    def __iter__(self):
        return FooIterator(self)  # a fresh iterator for every loop

    def __len__(self):
        return len(self._list)

    def __getitem__(self, index):
        return self._list[index]
Now let’s test
a = Foo()
for e1 in a:
    for e2 in a:
        print (e1, e2)
Semantics of yield
def foo():
    lst = [1, 2, 3]
    for v in lst:
        yield v

g = foo()
print g.next()  # 1
print g.next()  # 2
print g.next()  # 3
print g.next()  # raises StopIteration: the generator is exhausted
- A function that contains yield is treated as a generator function
- A generator object is created each time the function is called
- A generator object supports the iteration protocol
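Because every call creates a new generator object, a generator-based __iter__ is one way to give each loop its own cursor without writing a separate iterator class. This is a minimal sketch for Python 3; the class name Numbers is hypothetical.

```python
class Numbers(object):
    def __init__(self):
        self._list = [1, 2, 3]

    def __iter__(self):
        # Each call builds a new generator object with its own position.
        for value in self._list:
            yield value


nums = Numbers()
pairs = [(e1, e2) for e1 in nums for e2 in nums]
```

The nested iteration yields all nine pairs, since the inner and outer loops no longer share state.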
See the next:
print foo()
print foo()
print foo()
print foo()
# <generator object foo at 0x0000012DF0C1CDC8>
# <generator object foo at 0x0000012DF0C1CDC8>
# <generator object foo at 0x0000012DF0C1CDC8>
# <generator object foo at 0x0000012DF0C1CDC8>
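In fact every call does create a different object; the identical addresses are a result of memory reuse. CPython's garbage collection uses reference counting, and an object is destroyed as soon as its count drops to 0. The object created by foo() is never referenced, so it is destroyed immediately after being printed. When the next foo() object is created, the allocator is very likely to reuse the block of memory that was just freed, and because the two objects have the same size and alignment, they are even more likely to land at the same address. So the printed address is the same each time, but the objects are different.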
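The explanation can be checked by keeping the generator objects alive at the same time, which prevents their memory from being reused. A minimal Python 3 sketch:

```python
def foo():
    for v in [1, 2, 3]:
        yield v


# Holding references keeps all four objects alive simultaneously.
generators = [foo() for _ in range(4)]
distinct_ids = {id(g) for g in generators}
```

With all four objects alive, the four ids are necessarily different.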
Iterator with closure
- ctor
- copy ctor
- copy assignment
- move ctor
- move assignment
Python basic type
- numeric: immutable
- string: immutable
- list: mutable
- tuple: immutable
- dict: mutable
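Argument passing: Python always passes a reference to the object, so the parameter name is bound to the caller's object. For immutable types this behaves like pass-by-value, because the callee cannot change the object; for mutable types it behaves like pass-by-reference, because in-place changes are visible to the caller.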
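The difference between rebinding a parameter and mutating the object it refers to can be sketched as follows (Python 3; rebind and mutate are hypothetical helper names):

```python
def rebind(items):
    # Assignment only rebinds the local name; the caller's object is untouched.
    items = items + [99]
    return items


def mutate(items):
    # An in-place change goes through the shared reference and is visible outside.
    items.append(99)


original = [1, 2]
rebound = rebind(original)
after_rebind = list(original)  # snapshot taken before mutate() runs
mutate(original)
```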
def foo(a, items=[], added=True):
    if added:
        items.append(a)
    print items

foo(1)
foo(2)
foo(3, added=False)
# [1]
# [1, 2]
# [1, 2]
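Why? Python stores parameter default values on the function object (everything is an object). Whenever a default is needed, it is read from foo.__defaults__ (also spelled foo.func_defaults in Python 2). A list is a mutable object, so the same list stays on the function object and is not re-initialized on the second call; every call that uses the default gets the same object.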
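A common way around this is to use None as the default and build the list inside the function. The sketch below (Python 3) returns the list instead of printing it so the result can be inspected; the function shared is a hypothetical counterexample that keeps the mutable default.

```python
def foo(a, items=None, added=True):
    # None is a sentinel; a fresh list is built on every call that omits items.
    if items is None:
        items = []
    if added:
        items.append(a)
    return items


def shared(a, items=[]):
    # For comparison: the default list lives in shared.__defaults__.
    items.append(a)
    return items


first, second = foo(1), foo(2)
shared(1)
shared(2)
stored_default = shared.__defaults__[0]
```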
Namespace and object space
A namespace is a mapping from names to objects. Most namespaces are currently implemented as Python dictionaries.
- each function has its own local namespace
- each module has its own global namespace
- the Python built-in namespace: built-in is actually a module (__builtins__) with its own global namespace, which is called the built-in namespace
Namespace locating
LGB rules: local → parent local → … → global → built-in
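The lookup order can be traced with one name defined at several levels. A minimal Python 3 sketch; all function names here are illustrative.

```python
x = "global"


def outer():
    x = "enclosing"

    def inner():
        # No local x here, so lookup continues to the enclosing function.
        return x

    return inner()


def read_global():
    # No local or enclosing x, so the module-level name is found.
    return x


def read_builtin():
    # len is found in neither the local nor the global scope, only in built-ins.
    return len("abc")
```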
The global keyword just tells the interpreter that the name is located in the global namespace, not the local one.
Note: a write operation shields the name lookup process, see below
class Foo(object):
    def __init__(self):
        self._list = [1, 2, 3, 4, 5]

    def __iter__(self):
        counter = 0

        def get():
            if counter >= len(self._list):
                return None
            res = self._list[counter]
            counter += 1
            return res

        return iter(get, None)

a = Foo()
for e1 in a:
    for e2 in a:
        print (e1, e2)
This outputs
Traceback (most recent call last):
File "e:/Documents/learn2live/notes/test.py", line 38, in <module>
for e1 in a:
File "e:/Documents/learn2live/notes/test.py", line 29, in get
if counter >= len(self._list):
UnboundLocalError: local variable 'counter' referenced before assignment
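What's the problem?
When get is compiled, its local namespace is determined. Because get contains the write operation counter += 1, that local namespace gets an entry named counter:
__localnamespace__ = {
    'counter': ,  # entry exists, but no value is bound yet
    'others': ...,
    ...
}
The interpreter therefore considers the name counter to be found inside get and does not search the enclosing scopes. Since the local counter has no value yet when it is first read, the read raises UnboundLocalError.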
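In Python 3 the nonlocal statement tells the compiler that the write targets the enclosing function's variable. Python 2 has no nonlocal, so this sketch applies to Python 3 only; the list is shortened to three elements for brevity.

```python
class Foo(object):
    def __init__(self):
        self._list = [1, 2, 3]

    def __iter__(self):
        counter = 0

        def get():
            # nonlocal makes the write target the enclosing counter
            # instead of creating a new local name.
            nonlocal counter
            if counter >= len(self._list):
                return None
            res = self._list[counter]
            counter += 1
            return res

        # iter(callable, sentinel) calls get() until it returns None.
        return iter(get, None)


a = Foo()
pairs = [(e1, e2) for e1 in a for e2 in a]
```

Each call to __iter__ creates its own counter, so the nested loops do not interfere with each other.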
Reflection
- dir(obj): returns a list of names that can be accessed in the form obj.xxx
- obj.__dict__: a map of (name, object) pairs that can be accessed in the object's own (local) namespace
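The difference between the two can be seen with a class attribute that is inherited rather than stored on the instance. A minimal Python 3 sketch with hypothetical classes Base and Child:

```python
class Base(object):
    shared = "class attribute"


class Child(Base):
    def __init__(self):
        self.own = "instance attribute"


obj = Child()
own_names = sorted(obj.__dict__)  # only the instance's own namespace
reachable = dir(obj)              # everything reachable as obj.xxx
```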
== vs. is
a = 'abcdefg'
b = ''.join([chr(ord('a') + i) for i in xrange(7)])
print a == b
print a is b
# True
# False
Hold on…
a = 'a'
b = 'abcdefg'[0]
print a == b # True
print a is b # True
a = 10**9
b = 10**9
print a == b # True
print a is b # False
a = 10
b = 10
print a == b # True
print a is b # True
- semantics of ==: the values of the two objects are the same
- semantics of is: the objects referenced by the two names are the same one; a is b is equivalent to id(a) == id(b)
- some types (str or int) use an object pool to manage some special objects for optimization purposes
- the type NoneType has only one instance: None
- the type bool has only two instances: True and False
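Small-integer and single-character optimization: experience shows that most of the values used in programs are small integers or single characters, so Python keeps a cache pool for such objects (in CPython, the integers from -5 through 256). They are taken straight from the pool, which is why the two ids are the same. Outside that range a new object is constructed, so the ids differ.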
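Since caching is an implementation detail, identity comparison is best kept for cases where object identity is really what is being asked, such as aliases and the None singleton. A minimal Python 3 sketch that does not depend on the cache; is_missing is a hypothetical helper.

```python
a = [1, 2, 3]
b = list(a)   # equal contents, separate object
c = a         # second name for the same object

equal_but_distinct = (a == b, a is b)
same_object = (a == c, a is c, id(a) == id(c))


def is_missing(value):
    # None is a singleton, so identity is the idiomatic test.
    return value is None
```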
Semantics of import
import m
1. Check whether it is in sys.modules (sys.modules.has_key('m'))
2. If it is, go to step 4
3. Otherwise, load the module "m" to create the module object
4. Place a name "m" in the current namespace, and let the name refer to the object sys.modules['m']
Load the module
- Open the carrier (i.e., xxx.py, xxx.pyc, xxx.pyd, xxx.o, xxx.dll, etc.) corresponding to "m"; the carrier differs from case to case
- Create an empty module object named "m" and place it into sys.modules
- Execute the statements in the module sequentially within the namespace of the module object, i.e., code-scan
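The cache lookup can be observed with any standard-library module. The sketch below uses json and is written for Python 3, where the membership test is spelled with in (has_key exists only in Python 2).

```python
import sys
import json            # first import: loads the module and caches it
import json as again   # second import: a cache hit in sys.modules

cached = sys.modules["json"]
```

Both names refer to the single module object stored in sys.modules.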
# file: t014.py
print 't014'
import t015
# file: t015.py
print 't015'
import t016
a_in_t015 = 10
# file: t016.py
print 't016'
import t015
print t015.a_in_t015
# now execute t014.py
t014
t015
t016
Traceback (most recent call last):
File "e:/Documents/learn2live/notes/t014.py", line 2, in <module>
import t015
File "e:\Documents\learn2live\notes\t015.py", line 2, in <module>
import t016
File "e:\Documents\learn2live\notes\t016.py", line 3, in <module>
print t015.a_in_t015
AttributeError: 'module' object has no attribute 'a_in_t015'
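Why? Look closely at the import semantics:
- t014.py runs: it prints t014, then executes import t015.
- t015 is not found in sys.modules, so a module object is created and the code of t015.py is executed: it prints t015, then executes import t016.
- t016 is not found in sys.modules, so a module object is created and the code of t016.py is executed: it prints t016, then executes import t015.
- t015 is found in sys.modules, so the import finishes and t016.py continues with print t015.a_in_t015. At this point the module t015 does not have that member yet, because its code has not finished executing, hence the error.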
NOTE
- Do NOT do much time-consuming work at module level
- Do NOT form reference cycles between modules
- Do NOT place all the import statements at the very beginning of the module
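One way to act on the last point is to defer an import into the function that needs it, so that it runs only after both modules have finished executing. The sketch below (Python 3) writes two hypothetical modules, cyc_a and cyc_b, to a temporary directory; the names and contents are assumptions made for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module names and contents, written to a temporary directory.
SOURCES = {
    "cyc_a.py": "import cyc_b\nvalue = 10\n",
    # The import of cyc_a is deferred until read() is called,
    # by which time cyc_a has finished executing.
    "cyc_b.py": "def read():\n    import cyc_a\n    return cyc_a.value\n",
}

workdir = tempfile.mkdtemp()
for filename, text in SOURCES.items():
    with open(os.path.join(workdir, filename), "w") as handle:
        handle.write(text)

sys.path.insert(0, workdir)
try:
    cyc_a = importlib.import_module("cyc_a")
    result = cyc_a.cyc_b.read()
finally:
    sys.path.remove(workdir)
```

Had cyc_b imported cyc_a at module level and read cyc_a.value immediately, it would have hit the same AttributeError as t016.py.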
from m import sth
- Execute the import semantics above, WITHOUT putting the name "m" into the current namespace
- Search for the name "sth" in the namespace of the module "m"
- If it exists, put the name "sth" into the current namespace, and make it refer to the object that "m.sth" refers to
- If it doesn't exist, raise an exception
# t017.py
MaxHp = 100
a = [1, 2]
# t018.py
from t017 import MaxHp, a
import t017
print MaxHp # 100
print t017.MaxHp # 100
print a # [1, 2]
print t017.a # [1, 2]
t017.MaxHp = 200
a = [1, 2, 3]
print MaxHp # 100
print t017.MaxHp # 200
print a # [1, 2, 3]
print t017.a # [1, 2]
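Why? Look closely at the from/import semantics. The first group of prints is easy to understand. For the second group: from/import puts the name MaxHp into the namespace of t018 and points it at the object that t017.MaxHp refers to, the int object 100. Executing t017.MaxHp = 200 only rebinds the variable t017.MaxHp to the object 200, while MaxHp in the namespace of t018 still refers to the object 100. Likewise, a = [1, 2, 3] binds the name a to a new list object, so a no longer refers to the object that t017.a refers to.
Assignment in Python only changes which object a variable refers to. There are no references to variables, only references to objects.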
a = 1
b = a # b refers to the object that a refers to
a = 2 # a now refers to a different object, but b does not!
print a # 2
print b # 1
NOTE: from m import * would pollute the current namespace!
Semantics of del
See: https://docs.python.org/3/reference/datamodel.html
Note: del x doesn't directly call x.__del__(). The former decrements the reference count for x by one, and the latter is only called when x's reference count reaches zero.
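The note can be checked with an object that records its own finalization. The sketch assumes CPython, where reference counting finalizes an object as soon as its last reference disappears; other implementations may delay the call. The class Tracked is a hypothetical example.

```python
class Tracked(object):
    finalized = []

    def __del__(self):
        Tracked.finalized.append("gone")


x = Tracked()
y = x              # second reference to the same object
del x              # removes the name x; the object is still referenced by y
after_first_del = list(Tracked.finalized)
del y              # last reference removed; CPython finalizes immediately
after_second_del = list(Tracked.finalized)
```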
Decorator
Ref
Generator
see: https://stackoverflow.com/a/231855
Enum
Ref: https://www.liaoxuefeng.com/wiki/1016959663602400/1017595944503424
To use enum in Python:
from enum import Enum, unique

@unique
class Month(Enum):
    Jan = 0
    Feb = 1
    Mar = 2
    # ...
    Dec = 11

for name, member in Month.__members__.items():
    print name, "=>", member, '=>', member.value
Runtime creation of function objects in Python
class A(object):
    def func(self):
        pass
>>> a = A()
>>> id(a.func)
44690448
>>> id(a.func)
44690448
>>> b = a.func
>>> id(b)
44690448
>>> id(b)
44690448
>>> id(a.func)
44610264
>>> id(A.func)
44690528
>>> id(A.func)
44610264
>>> id(A.func)
44610264
>>> c = A.func
>>> id(c)
44610264
>>> id(c)
44610264
>>> id(A.func)
44690528
>>> id(A.func)
44690528
>>>
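As the session shows, Python creates method objects on demand: a new method object is created each time the method is looked up on the class or instance, and it is destroyed as soon as nothing refers to it. In the example above, b and a.func have exactly the same logic, yet they are two different objects.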
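Comparing ids of temporary objects is fragile because of memory reuse, so the sketch below keeps both method objects alive and compares them directly. It is written for Python 3, where A.func is the plain function; in Python 2 it would be an unbound method.

```python
class A(object):
    def func(self):
        pass


a = A()
m1 = a.func
m2 = a.func

fresh_each_time = m1 is not m2   # two separate bound-method objects
same_behaviour = m1 == m2        # equal: same function, same instance
wraps_one_function = m1.__func__ is m2.__func__ is A.func
```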
Meta class
see: https://medium.com/@dboyliao/%E6%B7%BA%E8%AB%87-python-metaclass-dfacf24d6dd5
Keep in mind two things:
- Everything is an object in python.
- Every object has a type.
On this question, I like to quote the definition of object found in C language specifications such as C99:
region of data storage in the execution environment, the contents of which can represent values
class A(object):
    def func(self):
        print 'A.func'
>>> a = A()
>>> type(a)
<class '__main__.A'>
>>> type(A)
<type 'type'>
>>> isinstance(a, object)
True
>>> isinstance(A, object)
True # A is a type, everything (type) is an object
>>> isinstance(type, object)
True # everything (type) is an object
# here comes weird things
>>> isinstance(object, type)
True # object is a type
>>> isinstance(object, object)
True # everything (type) is an object
>>> type(type)
<type 'type'> # type is not a function! It is a class, then an object, then a type!
>>> type(object)
<type 'type'>
>>> issubclass(type, object)
True
>>> isinstance(type, object)
True
>>> isinstance(object, type)
True
Oh, what the fuck!
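According to Python's data model, the signature of type.__new__ looks like this:
type.__new__(mcls, name, base, attribs)
- mcls: the metaclass object itself
- name: the name of the class to be created
- base: the other classes that the class to be created inherits from
- attribs: the attributes of the class to be created
For example, the following two ways of writing a class are equivalent in Python: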
from __future__ import print_function  # lets print() be used inside a lambda on Python 2

class MyClass(object):
    ANSWER = 42

    def speak(self):
        print('the answer to life is {}'.format(self.ANSWER))

# or
MyClass = type('MyClass',
               (object,),
               {
                   'ANSWER': 42,
                   'speak': lambda self: print('the answer to life is {}'.format(self.ANSWER))
               }
)
A sharp-eyed reader may ask: where did mcls go?
When you write type(...), mcls is type itself. In other words, it is equivalent to using type.__new__(type, 'MyClass', (object,), ...).
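To see mcls take a value other than type, a custom metaclass can override __new__. This is a minimal Python 3 sketch; the metaclass Meta and the attribute built_by are hypothetical names used only for illustration.

```python
class Meta(type):
    def __new__(mcls, name, bases, attribs):
        # Record which metaclass built the class, then defer to type.__new__.
        attribs["built_by"] = mcls.__name__
        return super().__new__(mcls, name, bases, attribs)


# Python 3 syntax; Python 2 would set __metaclass__ = Meta in the class body.
class MyClass(metaclass=Meta):
    ANSWER = 42


# Calling the metaclass directly builds a class the same way the statement does.
Other = Meta("Other", (object,), {"ANSWER": 42})
Plain = type.__new__(type, "Plain", (object,), {"ANSWER": 42})
```

Here mcls is Meta for MyClass and Other, and type for Plain.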