1.12 Finding the Most Frequently Occurring Elements in a Sequence

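Problem

How do you find the element that occurs most often in a sequence?

Solution

The collections.Counter class is designed for exactly this kind of problem. It even provides a handy most_common() method that gives you the answer directly.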

from collections import Counter

words = [
    "look", "into", "my", "eyes", "look", "into", "my", "eyes",
    "the", "eyes", "the", "eyes", "the", "eyes", "not", "around", "the",
    "eyes", "don't", "look", "around", "the", "eyes", "look", "into",
    "my", "eyes", "you're", "under",
]
word_counts = Counter(words)
# The three most common words, as (word, count) pairs
top_three = word_counts.most_common(3)  # [("eyes", 8), ("the", 5), ("look", 4)]
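
The question as posed asks for the single most frequent element. A minimal sketch of that case follows, assuming an empty input should yield None; most_frequent is a hypothetical helper name and is not part of the collections module.

```python
from collections import Counter


def most_frequent(items):
    """Return the most common element of items, or None if items is empty.

    Hypothetical helper for illustration; not part of collections.
    """
    # most_common(1) returns a list holding at most one (element, count) pair
    top = Counter(items).most_common(1)
    return top[0][0] if top else None


print(most_frequent(["look", "into", "my", "eyes", "look", "eyes", "eyes"]))  # eyes
print(most_frequent([]))  # None
```

When several elements share the highest count, which one comes back depends on how most_common() orders ties, so code that needs a specific tie-breaking rule should probably sort explicitly.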

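Discussion

A Counter object is a very useful tool in almost any situation where you need to tabulate or count data. For this kind of problem, prefer it over a hand-written solution built on a dictionary.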
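
For comparison, here is a rough sketch of what the hand-written dictionary version might look like. The helper names count_manually and top_n_manually are hypothetical, and the sample list is made up for illustration. Both helpers reproduce behavior that Counter already provides.

```python
from collections import Counter


def count_manually(items):
    """Tally items with a plain dict (hypothetical helper, for comparison only)."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


def top_n_manually(counts, n):
    """Return the n (item, count) pairs with the highest counts."""
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)[:n]


sample = ["the", "eyes", "the", "look", "eyes", "eyes"]
manual = count_manually(sample)

print(manual == dict(Counter(sample)))                              # True
print(top_n_manually(manual, 2) == Counter(sample).most_common(2))  # True
```

The manual version takes two functions and a sort to match what Counter(sample).most_common(2) does in one expression.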

A Counter can be fed any sequence of hashable items. Under the covers, a Counter is a dictionary that maps each item to the number of times it occurs.

word_counts["not"]  # 1
word_counts["eyes"]  # 8
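
One way a Counter differs from a plain dictionary on lookup is that a missing key yields a count of 0 rather than raising KeyError, and the lookup does not add the key. A small sketch with made-up sample data:

```python
from collections import Counter

counts = Counter(["eyes", "the", "eyes"])

print(counts["eyes"])       # 2
print(counts["missing"])    # 0, no KeyError is raised
print("missing" in counts)  # False: the lookup above did not insert the key
```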

If you want to increment the counts manually, simply use addition:

morewords = ["why", "are", "you", "not", "looking", "in", "my", "eyes"]
for word in morewords:
    word_counts[word] += 1
# word_counts["eyes"] is now 9

Alternatively, use the update() method:

word_counts.update(morewords)
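
Note that Counter.update() adds to existing counts instead of replacing them, which is different from dict.update(). It accepts either an iterable of items or a mapping of items to counts. A short sketch with made-up data:

```python
from collections import Counter

counts = Counter(["a", "b"])   # a: 1, b: 1

counts.update(["a", "c"])      # iterable: each occurrence adds 1
counts.update({"b": 3})        # mapping: the value is added to the existing count

print(counts["a"], counts["b"], counts["c"])  # 2 4 1
```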

Counter instances can also be combined using mathematical operations. For example:

from collections import Counter

words = [
    "look", "into", "my", "eyes", "look", "into", "my", "eyes",
    "the", "eyes", "the", "eyes", "the", "eyes", "not", "around", "the",
    "eyes", "don't", "look", "around", "the", "eyes", "look", "into",
    "my", "eyes", "you're", "under",
]
morewords = ["why", "are", "you", "not", "looking", "in", "my", "eyes"]

a = Counter(words)
b = Counter(morewords)

# Combine counts
c = a + b
# c = Counter({"eyes": 9, "the": 5, "look": 4, "my": 4, "into": 3, "not": 2,
#              "around": 2, "don't": 1, "you're": 1, "under": 1, "why": 1,
#              "are": 1, "you": 1, "looking": 1, "in": 1})

# Subtract counts; items whose count drops to zero or below ("not") are omitted
d = a - b
# d = Counter({"eyes": 7, "the": 5, "look": 4, "into": 3, "my": 2, "around": 2,
#              "don't": 1, "you're": 1, "under": 1})
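
The subtraction above drops any item whose resulting count is not positive. The sketch below uses small made-up counters to show that behavior alongside the related & (minimum of counts) and | (maximum of counts) operators, and the in-place subtract() method, which keeps zero and negative counts. The variable names are chosen here for illustration.

```python
from collections import Counter

x = Counter(a=3, b=1)
y = Counter(a=1, b=2)

difference = x - y    # a: 2; b would be -1, so it is dropped
intersection = x & y  # minimum of each count: a: 1, b: 1
union = x | y         # maximum of each count: a: 3, b: 2

print(dict(difference))    # {'a': 2}
print(dict(intersection))  # {'a': 1, 'b': 1}
print(dict(union))         # {'a': 3, 'b': 2}

# subtract() modifies x in place and keeps non-positive counts
x.subtract(y)
print(x["a"], x["b"])      # 2 -1
```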