A Detailed Look at Python Dictionary Lookup Performance

timeit.repeat

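By default, timeit.repeat runs 3 rounds (5 since Python 3.7) of 1,000,000 executions each and returns a list with the total execution time of each round, in seconds.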
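As a minimal sketch, both parameters can be passed explicitly so the result does not depend on the interpreter's defaults. The function sample and the small number value are illustrative assumptions, not part of the original benchmark.

```python
import timeit

def sample():
    # Hypothetical workload, used only to have something to time.
    return sum(range(10))

# repeat = how many rounds, number = how many calls per round.
timings = timeit.repeat(sample, repeat=3, number=10_000)

# One total (in seconds) per round; the minimum is usually the least noisy.
print(len(timings), min(timings))
```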

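Dictionary lookup performance

A dictionary value can be read in two ways:

Bracket access, which raises KeyError when the key is missing

get, which returns a default value when the key is missing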
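The difference between the two forms can be sketched as follows; the dictionary levels and its keys are made up for illustration.

```python
levels = {"INFO": 20, "DEBUG": 10}  # hypothetical sample data

# Bracket access raises KeyError when the key is missing.
try:
    levels["TRACE"]
    outcome = "found"
except KeyError:
    outcome = "KeyError"

# get returns None by default, or the second argument if one is given.
missing = levels.get("TRACE")
fallback = levels.get("TRACE", 0)

print(outcome, missing, fallback)  # prints: KeyError None 0
```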

The performance of the two approaches is compared below.

Test data

One simple dictionary and one complex one

import timeit

# The level dictionary from the logging standard library
level_mapping = {"CRITICAL": 50, "FATAL": 50, "ERROR": 40, "WARN": 30, "WARNING": 30, "INFO": 20, "DEBUG": 10, "NOTSET": 0}
# An Elasticsearch log record
record = {"_index": "logstash-project.test-env.release-user.root-2021", "_type": "doc", "_id": "2f60jn0BaH-cdSPUSkiF", "_version": 1, "_score": None, "_source": {"method": "GET", "index_name": "project.test-env.release-user.root", "@version": "flask", "path": "D:alphaflasklogstashcoreflask.py", "logger_name": "flask.exception", "stack_info": None, "user": "root", "@timestamp": "2021-12-06T07:45:20.056Z", "level": "ERROR", "thread_name": "Thread-5", "type": "exception", "env": "release", "process": 8716, "funcName": "exceptions", "port": 55792, "project": "test", "tags": [], "lineno": 89, "request": {"headers": {"Accept-Encoding": "gzip, deflate, br", "Connection": "keep-alive", "Postman-Token": "359faa6e-9527-4de7-82ff-eecb92656875", "User-Agent": "PostmanRuntime/7.28.4", "Cookie": "csrftoken=bf58fmaG5wBVabJwBeD8srVsfw7EjKe0VN7xD8mu817UzVm", "Accept": "*/*", "Host": "127.0.0.1:5000"}, "args": {"a": "11", "b": "22"}}, "message": "division by zero", "host": "DESKTOP-JCQ9527", "status_code": 500, "stack_trace": "Traceback (most recent call last):\n  File \"D:Envslogstashlibsite-packagesflaskapp.py\", line 1950, in full_dispatch_request\n    rv = self.dispatch_request()\n  File \"D:Envslogstashlibsite-packagesflaskapp.py\", line 1936, in dispatch_request\n    return self.view_functions[rule.endpoint](**req.view_args)\n  File \"D:flasklogstashapp.py\", line 112, in get_raise\n    a/0\nZeroDivisionError: division by zero\n", "remote_addr": "127.0.0.1", "url": "http://127.0.0.1:5000/raise?a=11&b=22"}, "fields": {"@timestamp": ["2021-12-06T07:45:20.056Z"]}, "sort": [1638776720056]}
# Plain bracket access
def test():
    level_mapping["CRITICAL"]
timeit.repeat(lambda: test())
[0.08700739999994767, 0.0864886999997907, 0.08675769999990735]
# Bracket access wrapped in try/except
def test1():
    try:
        # "CRITICA" is a missing key; use "CRITICAL" for the key-exists timing
        level_mapping["CRITICA"]
    except KeyError:
        pass
timeit.repeat(lambda: test1())
[0.09164779999991879, 0.0921809999999823, 0.09076550000099814]  # key exists
[0.17694680000022345, 0.1759290999998484, 0.17659119999916584]  # key missing
# get access
def test2():
    level_mapping.get("CRITICAL")
timeit.repeat(lambda: test2())
[0.131671400000414, 0.12985489999982747, 0.13035420000005615]

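Bracket access is about 1.5 times as fast as get: get takes roughly 50% longer (about 0.13 s versus 0.087 s per million calls).

With brackets inside try/except, however, a missing key makes the lookup nearly twice as slow as when the key exists (about 0.18 s versus 0.09 s), because raising and catching KeyError adds cost.

With get, the timing is the same whether or not the key exists and whether or not a default value is supplied.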
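The measurements above go through a lambda and a wrapper function, so each figure also includes call overhead. A minimal sketch of a tighter comparison, assuming a small stand-in dictionary and a hypothetical helper best_of, passes the statements to timeit as strings and keeps the fastest round. Absolute numbers differ by machine and Python version, so none are shown here.

```python
import timeit

level_mapping = {"CRITICAL": 50, "ERROR": 40, "INFO": 20}  # stand-in data

def best_of(stmt, number=100_000):
    # Hypothetical helper: fastest of 3 rounds, in seconds.
    return min(timeit.repeat(stmt, repeat=3, number=number,
                             globals={"d": level_mapping}))

results = {
    "bracket, key exists": best_of('d["CRITICAL"]'),
    "get, key exists": best_of('d.get("CRITICAL")'),
    "get, key missing": best_of('d.get("CRITICA", 0)'),
    "try/except, key missing": best_of(
        'try:\n    d["CRITICA"]\nexcept KeyError:\n    pass'),
}

for label, seconds in results.items():
    print(f"{label}: {seconds:.4f}")
```

Passing the statement as a string lets timeit compile it into its own timing loop, which removes the extra Python function calls from the measurement.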

Nested lookups

def test3():
    level_mapping[record["_source"]["level"]]
timeit.repeat(lambda: test3())
[0.1141027999999551, 0.11351319999994303, 0.11431539999989582]
def test4():
    level_mapping.get(record.get("_source").get("level"))
timeit.repeat(lambda: test4())
[0.22142400000007, 0.21937850000017534, 0.21913369999992938]

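With nested data and chained calls the gap widens: bracket access is now about twice as fast (about 0.11 s versus 0.22 s per million calls).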
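One caveat about the chained form is worth a sketch: get returns None for a missing key, so a chained .get on that result raises AttributeError instead of returning a default. The example below uses a made-up record and a hypothetical helper level_of to show this, along with two ways to handle an absent intermediate key.

```python
record = {"_source": {"level": "ERROR"}}  # trimmed, made-up record
empty = {}                                # record without "_source"

# Chained get fails when an intermediate key is missing,
# because None has no .get method.
try:
    empty.get("_source").get("level")
    chained = "ok"
except AttributeError:
    chained = "AttributeError"

# Supplying an empty dict as the default keeps the chain safe.
safe = empty.get("_source", {}).get("level")

def level_of(rec):
    # Hypothetical helper: bracket access, with one try/except
    # around the whole chain.
    try:
        return rec["_source"]["level"]
    except KeyError:
        return None

print(chained, safe, level_of(record), level_of(empty))
# prints: AttributeError None ERROR None
```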

Summary

When the key is known to exist and the data is read frequently, prefer bracket access.

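In addition, CPython 3.6 reimplemented the dictionary's underlying data structure, which made dictionaries preserve insertion order; Python 3.7 made this ordering a guaranteed part of the language.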
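A quick way to see the ordering behavior, assuming Python 3.7 or later where it is guaranteed; the keys are arbitrary.

```python
d = {}
d["b"] = 1
d["a"] = 2
d["c"] = 3

# Iteration follows insertion order, not sorted order.
keys = list(d)
print(keys)  # prints: ['b', 'a', 'c']

# Reassigning an existing key keeps its original position.
d["b"] = 10
print(list(d))  # prints: ['b', 'a', 'c']
```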
