02-迭代器与工具函数

难度：⭐⭐⭐ 专家 预计时间：45分钟 前置知识：迭代器基础、生成器概念、for 循环、列表推导式 引入版本：Python 2.3+

itertools 是 Python 标准库中最强大的模块之一，提供高效迭代器工具，底层由 C 实现，支持惰性求值。

为什么需要 itertools？

问题场景：大数据处理

python

# ❌ 方案一：列表推导式，立即生成全部数据
nums = [x * 2 for x in range(10_000_000)]  # 内存爆炸

# ❌ 方案二：多层嵌套循环，代码冗长
result = []
for c in colors:
    for s in sizes:
        for m in materials:
            result.append((c, s, m))

# ✅ 方案三：itertools 惰性迭代，零内存占用
import itertools
nums_iter = map(lambda x: x * 2, itertools.count(0))  # 惰性
combos = itertools.product(colors, sizes, materials)   # 惰性

问题：如何高效处理大数据、复杂迭代，而不占用内存？

概念铺垫

┌─────────────────────────────────────────────────────────────┐
│          itertools 模块核心组件                              │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. 无限迭代器                                              │
│  ─────────────────────────────────────────────              │
│  • count(start, step)  → 无限递增序列                       │
│  • cycle(iterable)     → 无限循环遍历                       │
│  • repeat(obj, times)  → 无限重复对象                       │
│                                                             │
│  2. 序列处理                                                │
│  ─────────────────────────────────────────────              │
│  • chain               → 拼接多个迭代器                     │
│  • islice              → 惰性切片                           │
│  • groupby             → 分组（需先排序）                   │
│  • zip_longest         → 不等长配对                         │
│                                                             │
│  3. 排列组合                                                │
│  ─────────────────────────────────────────────              │
│  • product             → 笛卡尔积（嵌套循环替身）           │
│  • permutations        → 排列（有序）                       │
│  • combinations        → 组合（无序）                       │
│                                                             │
│  4. 过滤分组                                                │
│  ─────────────────────────────────────────────              │
│  • filterfalse         → 反向过滤                           │
│  • takewhile           → 满足条件时取值                     │
│  • dropwhile           → 满足条件时跳过                     │
│  • accumulate          → 累积计算                           │
│                                                             │
│  生活类比：                                                   │
│  count     = 数字流水线（源源不断）                         │
│  cycle     = 旋转餐厅（轮流服务）                           │
│  chain     = 传送带拼接（无缝衔接）                         │
│  product   = 餐厅菜单组合（所有搭配）                       │
│                                                             │
│  核心特性：                                                   │
│  ⚡ 惰性求值：不立即生成，按需计算                           │
│  ⚡ C 语言速度：比纯 Python 循环快 10 倍+                    │
│  ⚡ 零内存占用：迭代器只存储状态                             │
│                                                             │
└─────────────────────────────────────────────────────────────┘

L1 理解层：会用

无限迭代器

语法结构

┌─────────────────────────────────────────────────────────────┐
│  无限迭代器语法                                              │
│                                                             │
│  count(start=0, step=1)                                     │
│  → 从 start 开始，每次增加 step，无限递增                   │
│                                                             │
│  cycle(iterable)                                            │
│  → 无限循环遍历 iterable                                    │
│                                                             │
│  repeat(object, times=None)                                 │
│  → 无限重复 object（times 指定次数则有限）                  │
│                                                             │
│  ⚠️ 警告：无限迭代器永不停止，需配合 break 或 islice         │
│                                                             │
└─────────────────────────────────────────────────────────────┘

最简示例

python

import itertools

# count: 无限计数
counter = itertools.count(start=0, step=2)
print(next(counter))  # 0
print(next(counter))  # 2
print(next(counter))  # 4

# cycle: 无限循环
colors = itertools.cycle(['R', 'G', 'B'])
print(next(colors))  # 'R'
print(next(colors))  # 'G'
print(next(colors))  # 'B'
print(next(colors))  # 'R' (循环)

# repeat: 无限重复
repeater = itertools.repeat('Hello')
print(next(repeater))  # 'Hello'
print(next(repeater))  # 'Hello'

详细示例

python

import itertools

# count 支持浮点数
counter = itertools.count(start=1.5, step=0.5)
for i, val in enumerate(counter):
    if i >= 5:
        break
    print(val)  # 1.5, 2.0, 2.5, 3.0, 3.5

# cycle 实现轮询
servers = ['Server-A', 'Server-B', 'Server-C']
lb = itertools.cycle(servers)
for i in range(6):
    print(f"请求 {i+1} -> {next(lb)}")
# 请求1->A, 请求2->B, 请求3->C, 请求4->A, ...

# repeat 配合 map
squares = map(pow, range(5), itertools.repeat(2))
print(list(squares))  # [0, 1, 4, 9, 16]

# repeat 指定次数
limited = itertools.repeat('X', 3)
print(list(limited))  # ['X', 'X', 'X']

关键代码说明

代码	含义	为什么这样写
`count(0, 2)`	从0开始，每次+2	比range更灵活，支持浮点、无限延续
`cycle(servers)`	轮询服务器列表	实现负载均衡的简洁方式
`repeat(2)`	无限提供参数2	配合map实现批量平方，避免lambda

序列处理

语法结构

┌─────────────────────────────────────────────────────────────┐
│  序列处理函数语法                                            │
│                                                             │
│  chain(*iterables)                                          │
│  → 将多个迭代器首尾相连                                     │
│                                                             │
│  chain.from_iterable(iterable_of_iterables)                 │
│  → 扁平化嵌套迭代器                                         │
│                                                             │
│  islice(iterable, start, stop[, step])                      │
│  → 对迭代器进行切片（惰性）                                 │
│                                                             │
│  groupby(iterable, key=None)                                │
│  → 按key分组（需先排序）                                    │
│                                                             │
│  zip_longest(*iterables, fillvalue=None)                    │
│  → 不等长配对，用fillvalue填充                              │
│                                                             │
└─────────────────────────────────────────────────────────────┘

最简示例

python

import itertools

# chain: 拼接序列
result = list(itertools.chain([1, 2], [3, 4], [5]))
print(result)  # [1, 2, 3, 4, 5]

# chain.from_iterable: 展平
matrix = [[1, 2], [3, 4], [5]]
flat = list(itertools.chain.from_iterable(matrix))
print(flat)  # [1, 2, 3, 4, 5]

# islice: 惰性切片
result = list(itertools.islice(range(100), 5))
print(result)  # [0, 1, 2, 3, 4]

详细示例

python

import itertools

# chain 处理多个文件
file1 = ['line1', 'line2']
file2 = ['line3', 'line4']
file3 = ['line5']
all_lines = itertools.chain(file1, file2, file3)
print(list(all_lines))  # ['line1', 'line2', 'line3', 'line4', 'line5']

# chain.from_iterable 展平多层
nested = [[1, 2], [3, [4, 5]], [6]]
# 注意：只展平一层，不深度展平
flat = list(itertools.chain.from_iterable(nested))
print(flat)  # [1, 2, 3, [4, 5], 6]

# islice 带步长
result = list(itertools.islice(range(20), 2, 10, 2))
print(result)  # [2, 4, 6, 8]

# groupby 分组（必须先排序）
data = [
    {'name': 'A', 'type': 'X'},
    {'name': 'B', 'type': 'X'},
    {'name': 'C', 'type': 'Y'},
]
data.sort(key=lambda x: x['type'])
for key, group in itertools.groupby(data, key=lambda x: x['type']):
    print(f'{key}: {list(group)}')
# X: [{'name': 'A', 'type': 'X'}, {'name': 'B', 'type': 'X'}]
# Y: [{'name': 'C', 'type': 'Y'}]

# zip_longest 不等长配对
a = [1, 2, 3]
b = ['x', 'y']
result = list(itertools.zip_longest(a, b, fillvalue='-'))
print(result)  # [(1, 'x'), (2, 'y'), (3, '-')]

关键代码说明

代码	含义	为什么这样写
`chain(a, b, c)`	拼接多个序列	避免a+b+c创建临时列表
`chain.from_iterable(matrix)`	展平二维列表	内存O(1)，不创建新列表
`islice(iter, 5)`	取前5个	对无限迭代器也有效
`groupby(data, key)`	分组	必须先排序，否则错误

排列组合

语法结构

┌─────────────────────────────────────────────────────────────┐
│  排列组合函数语法                                            │
│                                                             │
│  product(*iterables, repeat=1)                              │
│  → 笛卡尔积（嵌套循环替身）                                 │
│                                                             │
│  permutations(iterable, r=None)                             │
│  → 排列（有序，AB ≠ BA）                                    │
│                                                             │
│  combinations(iterable, r)                                  │
│  → 组合（无序，AB = BA）                                    │
│                                                             │
│  combinations_with_replacement(iterable, r)                 │
│  → 组合（允许重复）                                         │
│                                                             │
└─────────────────────────────────────────────────────────────┘

最简示例

python

import itertools

# product: 笛卡尔积
colors = ['R', 'G']
sizes = ['S', 'M']
result = list(itertools.product(colors, sizes))
print(result)  # [('R', 'S'), ('R', 'M'), ('G', 'S'), ('G', 'M')]

# permutations: 排列（有序）
result = list(itertools.permutations(['A', 'B', 'C'], 2))
print(result)  # [('A','B'), ('A','C'), ('B','A'), ('B','C'), ('C','A'), ('C','B')]

# combinations: 组合（无序）
result = list(itertools.combinations(['A', 'B', 'C'], 2))
print(result)  # [('A','B'), ('A','C'), ('B','C')]

# combinations_with_replacement: 允许重复
result = list(itertools.combinations_with_replacement(['A', 'B'], 2))
print(result)  # [('A','A'), ('A','B'), ('B','B')]

详细示例

python

import itertools

# product 嵌套循环替身
colors = ['Red', 'Blue']
sizes = ['S', 'M', 'L']
materials = ['Cotton', 'Polyester']

# 原生写法：三层嵌套
# for c in colors:
#     for s in sizes:
#         for m in materials:
#             ...

# itertools 写法
combos = itertools.product(colors, sizes, materials)
print(f"总组合数: {len(list(combos))}")  # 2 * 3 * 2 = 12

# product repeat 参数（自身笛卡尔积）
dice = itertools.product(range(1, 7), repeat=2)  # 两颗骰子
print(f"骰子组合数: {len(list(dice))}")  # 36

# permutations 全排列
all_perms = list(itertools.permutations([1, 2, 3]))
print(f"全排列数: {len(all_perms)}")  # 3! = 6

# combinations 团队组建
members = ['Alice', 'Bob', 'Charlie', 'David']
teams_2 = list(itertools.combinations(members, 2))
print(f"2人团队数: {len(teams_2)}")  # C(4,2) = 6

关键代码说明

代码	含义	为什么这样写
`product(A, B)`	A×B 笛卡尔积	替代嵌套for循环
`permutations(A, r)`	A选r排列	有序，AB≠BA
`combinations(A, r)`	A选r组合	无序，AB=BA
`repeat=2`	自身笛卡尔积	product(A, A)简写

过滤分组

语法结构

┌─────────────────────────────────────────────────────────────┐
│  过滤分组函数语法                                            │
│                                                             │
│  filterfalse(predicate, iterable)                           │
│  → 过滤掉满足条件的元素（反向filter）                       │
│                                                             │
│  takewhile(predicate, iterable)                             │
│  → 满足条件时取值，遇到第一个不满足就停止                   │
│                                                             │
│  dropwhile(predicate, iterable)                             │
│  → 满足条件时跳过，遇到第一个不满足就开始取                 │
│                                                             │
│  accumulate(iterable, func=operator.add)                    │
│  → 累积计算（前缀和）                                       │
│                                                             │
└─────────────────────────────────────────────────────────────┘

最简示例

python

import itertools

# filterfalse: 反向过滤
nums = [1, 2, 3, 4, 5, 6]
odd = list(itertools.filterfalse(lambda x: x % 2 == 0, nums))
print(odd)  # [1, 3, 5]（保留奇数）

# takewhile: 满足条件时取值
nums = [1, 2, 3, 10, 4, 5]
result = list(itertools.takewhile(lambda x: x < 5, nums))
print(result)  # [1, 2, 3]（遇到10停止）

# dropwhile: 满足条件时跳过
nums = [1, 2, 3, 10, 4, 5]
result = list(itertools.dropwhile(lambda x: x < 5, nums))
print(result)  # [10, 4, 5]（跳过1,2,3）

# accumulate: 累积求和
nums = [1, 2, 3, 4, 5]
result = list(itertools.accumulate(nums))
print(result)  # [1, 3, 6, 10, 15]

详细示例

python

import itertools
import operator

# filterfalse 过滤空值
data = [1, None, 2, None, 3]
cleaned = list(itertools.filterfalse(lambda x: x is None, data))
print(cleaned)  # [1, 2, 3]

# takewhile 解析日志（遇到空行停止）
lines = ['ERROR: file1', 'ERROR: file2', '', 'WARNING: file3']
errors = list(itertools.takewhile(lambda x: x.startswith('ERROR'), lines))
print(errors)  # ['ERROR: file1', 'ERROR: file2']

# dropwhile 跳过前导空行
lines = ['', '', 'START', 'DATA1', 'DATA2']
content = list(itertools.dropwhile(lambda x: x == '', lines))
print(content)  # ['START', 'DATA1', 'DATA2']

# accumulate 不同函数
nums = [1, 2, 3, 4, 5]

# 累积乘积
products = list(itertools.accumulate(nums, operator.mul))
print(products)  # [1, 2, 6, 24, 120]

# 累积最大值
maxs = list(itertools.accumulate(nums, max))
print(maxs)  # [1, 2, 3, 4, 5]

# 自定义累积函数
def concat(a, b):
    return f"{a}-{b}"

strings = ['a', 'b', 'c']
result = list(itertools.accumulate(strings, concat))
print(result)  # ['a', 'a-b', 'a-b-c']

关键代码说明

代码	含义	为什么这样写
`filterfalse(pred, iter)`	过滤掉满足pred的	filter的反向，语义更清晰
`takewhile(pred, iter)`	满足条件时取值	遇到第一个False停止
`dropwhile(pred, iter)`	满足条件时跳过	遇到第一个False开始
`accumulate(nums)`	前缀和	Python 3.2+ 内置

L2 实践层：用好

无限迭代器推荐做法

做法	原因	示例
配合 islice 限制	防止无限循环	`islice(count(0), 10)`
用 repeat 替代 lambda	更高效	`map(pow, nums, repeat(2))`
cycle 实现轮询	简洁明了	`cycle(['A','B','C'])`

无限迭代器反模式

python

# ❌ 无限迭代器不限制，程序卡死
for x in itertools.count():
    print(x)  # 永不停止！

# ✅ 配合 islice 或 break
for x in itertools.islice(itertools.count(), 10):
    print(x)

python

# ❌ 用 lambda 重复参数
result = map(lambda x: x ** 2, range(10))

# ✅ 用 repeat 更高效
result = map(pow, range(10), itertools.repeat(2))

无限迭代器适用场景

场景	是否推荐	工具	原因
浮点数递增	✅	count	range不支持浮点
负载均衡轮询	✅	cycle	简洁实现
批量参数传递	✅	repeat	比lambda高效
有限序列	❌	range	用内置range更直接

序列处理推荐做法

做法	原因	示例
用 chain 代替 +	不创建临时列表	`chain(a, b)` vs `a + b`
用 islice 切片迭代器	列表切片对迭代器无效	`islice(iter, 10)`
groupby 前先排序	否则分组错误	`sort(); groupby()`

序列处理反模式

python

# ❌ 用 + 拼接列表（创建临时列表）
result = list1 + list2 + list3  # 浪费内存

# ✅ 用 chain 惰性拼接
result = list(itertools.chain(list1, list2, list3))

python

# ❌ groupby 不排序
data = [{'t': 'X'}, {'t': 'Y'}, {'t': 'X'}]
for k, g in itertools.groupby(data, key=lambda x: x['t']):
    print(k, list(g))  # X, Y, X (分组错误！)

# ✅ 先排序再分组
data.sort(key=lambda x: x['t'])
for k, g in itertools.groupby(data, key=lambda x: x['t']):
    print(k, list(g))  # X, Y (正确)

序列处理适用场景

场景	工具	原因
展平二维列表	chain.from_iterable	惰性，省内存
处理无限迭代器	islice	列表切片无效
分类汇总	groupby	需先排序
不等长配对	zip_longest	zip会截断

排列组合推荐做法

做法	原因	示例
product 替代嵌套循环	简洁高效	`product(A, B, C)`
组合数计算用公式	避免生成全部	`n!/(r!(n-r)!)`
排列用于密码破解	暴力尝试所有顺序	`permutations(chars, 4)`

排列组合反模式

python

# ❌ 嵌套循环生成笛卡尔积
result = []
for c in colors:
    for s in sizes:
        for m in materials:
            result.append((c, s, m))

# ✅ 用 product
result = list(itertools.product(colors, sizes, materials))

python

# ❌ 手动实现组合
def manual_combinations(items, r):
    result = []
    for i in range(len(items)):
        for j in range(i+1, len(items)):
            result.append((items[i], items[j]))
    return result

# ✅ 用 combinations
result = list(itertools.combinations(items, r))

排列组合适用场景

场景	工具	原因
菜单选项组合	product	所有搭配
密码暴力破解	permutations	尝试所有顺序
团队组建	combinations	无序组合
允许重复选择	combinations_with_replacement	抽奖场景

过滤分组推荐做法

做法	原因	示例
filterfalse 过滤None	语义清晰	`filterfalse(lambda x: x is None, data)`
takewhile 解析头部	遇到分隔符停止	`takewhile(lambda x: x != '', lines)`
accumulate 前缀计算	高效	`accumulate(nums, operator.mul)`

过滤分组反模式

python

# ❌ 用 filter 过滤满足条件的（语义混淆）
filtered = filter(lambda x: x is None, data)  # 保留None！

# ✅ 用 filterfalse 语义清晰
filtered = itertools.filterfalse(lambda x: x is None, data)  # 去除None

python

# ❌ 手动实现前缀和
nums = [1, 2, 3, 4, 5]
prefix_sum = []
total = 0
for n in nums:
    total += n
    prefix_sum.append(total)

# ✅ 用 accumulate
prefix_sum = list(itertools.accumulate(nums))

过滤分组适用场景

场景	工具	原因
去除空值	filterfalse	语义清晰
解析文件头部	takewhile	遇分隔符停止
跳过前导空白	dropwhile	跳过空白后开始
前缀计算	accumulate	高效内置

L3 专家层：深入

无限迭代器底层原理

Python 如何实现

┌─────────────────────────────────────────────────────────────┐
│  无限迭代器内部实现                                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  count 内部结构：                                           │
│  ─────────────────────────────────────────────              │
│  class count:                                               │
│      __iter__ = self                                        │
│      def __next__(self):                                    │
│          result = self.current                              │
│          self.current += self.step                          │
│          return result                                      │
│                                                             │
│  内存占用：                                                  │
│  • 只存储 current, step 两个数值                            │
│  • sys.getsizeof(count()) ≈ 48 bytes                       │
│  • vs 列表 [0,1,2,...,10**8] = 数百MB                       │
│                                                             │
│  cycle 内部结构：                                           │
│  ─────────────────────────────────────────────              │
│  • 保存原始 iterable 的副本                                 │
│  • 内部指针循环遍历                                         │
│  • 每轮结束后重置指针                                       │
│                                                             │
│  repeat 内部结构：                                          │
│  ─────────────────────────────────────────────              │
│  • 只存储 object 和 times                                   │
│  • times=None 时永不停止                                    │
│  • times=N 时计数递减至0停止                                │
│                                                             │
└─────────────────────────────────────────────────────────────┘

python

import itertools
import sys

# 验证内存占用
counter = itertools.count()
print(f"count 内存: {sys.getsizeof(counter)} bytes")  # ~48

big_list = list(range(1000))
print(f"列表内存: {sys.getsizeof(big_list)} bytes")  # ~9000+

无限迭代器性能考量

操作	时间复杂度	空间复杂度	说明
`next(count)`	O(1)	O(1)	单次计算
`next(cycle)`	O(1)	O(n)	需存储副本
`next(repeat)`	O(1)	O(1)	直接返回
创建迭代器	O(1)	O(1)	只初始化状态

无限迭代器设计动机

设计选择	原因
惰性求值	大数据处理不占内存
C 语言实现	比 Python 循环快10倍+
无限设计	数学抽象，配合islice限制

无限迭代器知识关联

无限迭代器知识关联：
                    ┌─────────────┐
                    │   iterator  │
                    │   迭代器    │
                    └─────────────┘
                          │
          ┌───────────────┼───────────────┐
          ↓               ↓               ↓
    ┌───────────┐   ┌───────────┐   ┌───────────┐
    │   count   │   │   cycle   │   │  repeat   │
    │   计数    │   │   循环    │   │   重复    │
    └───────────┘   └───────────┘   └───────────┘
          │               │               │
          └───────────────┴───────────────┘
                          ↓
                    ┌─────────────┐
                    │   islice    │
                    │   切片限制  │
                    └─────────────┘

序列处理底层原理

Python 如何实现

┌─────────────────────────────────────────────────────────────┐
│  chain 内部实现                                              │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  class chain:                                               │
│      def __iter__(self):                                    │
│          for it in self.iterables:                          │
│              for element in it:                             │
│                  yield element                              │
│                                                             │
│  内存占用：                                                  │
│  • 只存储 iterables 引用                                    │
│  • 不创建合并后的新列表                                     │
│  • 逐个迭代，惰性生成                                       │
│                                                             │
│  groupby 内部实现：                                         │
│  ─────────────────────────────────────────────              │
│  • 维护当前 key 和 group                                    │
│  • key 变化时 yield 新 group                                │
│  • 相邻相同 key 才归为一组                                  │
│                                                             │
│  ⚠️ 关键：groupby 只对相邻元素分组                          │
│                                                             │
└─────────────────────────────────────────────────────────────┘

序列处理性能考量

操作	时间复杂度	空间复杂度	说明
`chain(a, b)` 创建	O(1)	O(1)	只存引用
`chain` 遍历	O(n)	O(1)	逐个yield
`a + b` 列表拼接	O(n)	O(n)	创建新列表
`groupby` 遍历	O(n)	O(1)	惰性分组

序列处理设计动机

设计选择	原因
chain 惰性拼接	避免大列表拼接内存爆炸
groupby 相邻分组	高效实现，但需排序
islice 惰性切片	列表切片不适用于迭代器

序列处理知识关联

序列处理知识关联：
                    ┌─────────────┐
                    │   迭代器    │
                    └─────────────┘
                          │
          ┌───────────────┼───────────────┐
          ↓               ↓               ↓
    ┌───────────┐   ┌───────────┐   ┌───────────┐
    │   chain   │   │  islice   │   │  groupby  │
    │   拼接    │   │   切片    │   │   分组    │
    └───────────┘   └───────────┘   └───────────┘
          │               │               │
          ↓               ↓               ↓
    ┌───────────┐   ┌───────────┐   ┌───────────┐
    │  惰性拼接 │   │  惰性切片 │   │   排序    │
    │ 省内存    │   │ 限制无限  │   │   先排序  │
    └───────────┘   └───────────┘   └───────────┘

排列组合底层原理

数学原理

┌─────────────────────────────────────────────────────────────┐
│  排列组合数学公式                                            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  product(A, B):                                             │
│  ─────────────────────────────────────────────              │
│  |A| × |B|  (笛卡尔积)                                      │
│                                                             │
│  permutations(A, r):                                        │
│  ─────────────────────────────────────────────              │
│  P(n, r) = n! / (n-r)!                                      │
│                                                             │
│  combinations(A, r):                                        │
│  ─────────────────────────────────────────────              │
│  C(n, r) = n! / (r! × (n-r)!)                               │
│                                                             │
│  combinations_with_replacement(A, r):                       │
│  ─────────────────────────────────────────────              │
│  C(n+r-1, r) = (n+r-1)! / (r! × (n-1)!)                     │
│                                                             │
│  示例：n=4, r=2                                             │
│  • permutations: 4!/2! = 12                                 │
│  • combinations: 4!/(2!×2!) = 6                             │
│  • with_replacement: C(5,2) = 10                            │
│                                                             │
└─────────────────────────────────────────────────────────────┘

排列组合性能考量

操作	时间复杂度	空间复杂度	说明
`product(A, B)`	O(	A	×
`permutations(n, r)`	O(n!/(n-r)!)	O(1)	惰性生成
`combinations(n, r)`	O(n!/(r!(n-r)!))	O(1)	惰性生成

注意：数量级增长快，r大时避免list()全部生成。

排列组合设计动机

设计选择	原因
惰性生成	大组合数不占内存
C语言实现	比Python循环快100倍+
统一接口	product/permutations/combinations语义清晰

排列组合知识关联

排列组合知识关联：
                    ┌─────────────┐
                    │   数学      │
                    │   集合论    │
                    └─────────────┘
                          │
          ┌───────────────┼───────────────┐
          ↓               ↓               ↓
    ┌───────────┐   ┌───────────┐   ┌───────────┐
    │  product  │   │permutations│   │combinations│
    │ 笛卡尔积  │   │   排列     │   │   组合    │
    │ 有序有序  │   │ 有序无重复 │   │ 无序无重复│
    └───────────┘   └───────────┘   └───────────┘
          │               │               │
          ↓               ↓               ↓
    ┌───────────┐   ┌───────────┐   ┌───────────┐
    │ A×B×C     │   │  AB≠BA    │   │  AB=BA    │
    │ 嵌套循环  │   │ 密码破解  │   │ 团队组建  │
    └───────────┘   └───────────┘   └───────────┘

过滤分组底层原理

Python 如何实现

┌─────────────────────────────────────────────────────────────┐
│  accumulate 内部实现                                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  def accumulate(iterable, func=operator.add):               │
│      it = iter(iterable)                                    │
│      try:                                                   │
│          total = next(it)                                   │
│      except StopIteration:                                  │
│          return                                             │
│      yield total                                            │
│      for element in it:                                     │
│          total = func(total, element)                       │
│          yield total                                        │
│                                                             │
│  内存占用：                                                  │
│  • 只存储 total 和当前 element                              │
│  • 惰性生成，不创建中间列表                                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘

过滤分组性能考量

操作	时间复杂度	空间复杂度	说明
`filterfalse`	O(n)	O(1)	惰性过滤
`takewhile`	O(k)	O(1)	k为停止前元素数
`dropwhile`	O(k)	O(1)	k为跳过元素数
`accumulate`	O(n)	O(1)	惰性累积

过滤分组设计动机

设计选择	原因
filterfalse命名	filter的语义补充
takewhile/dropwhile	处理边界数据
accumulate函数参数	支持多种累积操作

过滤分组知识关联

过滤分组知识关联：
                    ┌─────────────┐
                    │   filter    │
                    │   过滤      │
                    └─────────────┘
                          │
                          ↓
                    ┌─────────────┐
                    │ filterfalse │
                    │ 反向过滤    │
                    └─────────────┘
                          │
          ┌───────────────┼───────────────┐
          ↓               ↓               ↓
    ┌───────────┐   ┌───────────┐   ┌───────────┐
    │ takewhile │   │ dropwhile │   │ accumulate│
    │  取到False│   │ 跳到False │   │   累积    │
    └───────────┘   └───────────┘   └───────────┘

实战案例：数据清洗管道

python

import itertools

# 脏数据：嵌套列表，包含None和空列表
raw_data = [
    [1, 2, None, 3],
    [None, None],
    [],
    [4, 5, None],
    [6]
]

# 目标：展平 → 去None → 过滤奇数 → 前5个

# 惰性管道（零内存）
pipeline = (
    itertools.chain.from_iterable(raw_data)  # 展平
    | itertools.filterfalse(lambda x: x is None)  # 去None
)

# 添加更多过滤
odd_numbers = itertools.filterfalse(lambda x: x % 2 == 0, pipeline)
first_5 = itertools.islice(odd_numbers, 5)

result = list(first_5)
print(result)  # [1, 3, 5]

关键点：整条管道直到 list() 才执行，内存始终 O(1)。

本章小结

┌─────────────────────────────────────────────────────────────┐
│                 itertools 常用函数速查                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  无限迭代：                                                  │
│  count(start, step)    - 计数器（支持浮点）                  │
│  cycle(iterable)       - 循环器（轮询）                      │
│  repeat(obj, times)    - 重复器（配合map）                   │
│                                                             │
│  序列操作：                                                  │
│  chain(a, b)           - 拼接（省内存）                      │
│  chain.from_iterable   - 展平二维                            │
│  islice(iter, n)       - 惰性切片                            │
│  groupby(iter, key)    - 分组（需先排序）                    │
│                                                             │
│  排列组合：                                                  │
│  product               - 笛卡尔积（嵌套循环替身）            │
│  permutations          - 排列（有序）                        │
│  combinations          - 组合（无序）                        │
│                                                             │
│  过滤分组：                                                  │
│  filterfalse           - 反向过滤                            │
│  takewhile             - 满足时取值                          │
│  dropwhile             - 满足时跳过                          │
│  accumulate            - 累积计算                            │
│                                                             │
│  选择指南：                                                  │
│  • 大数据处理 → chain + islice（惰性）                      │
│  • 嵌套循环 → product（简洁）                                │
│  • 分类汇总 → groupby + sort                                │
│  • 前缀计算 → accumulate                                     │
│                                                             │
│  记忆口诀：                                                   │
│  count数，cycle转，repeat传                                │
│  chain接，islice切，groupby分                               │
│  product积，perm排，comb组                                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

自检清单

回答以下问题，检查你是否掌握了核心概念：

itertools 的惰性求值有什么好处？
count 和 range 有什么区别？
groupby 为什么需要先排序？
permutations 和 combinations 有什么区别？
accumulate 默认做什么计算？

答案

惰性求值好处：不立即生成数据，按需计算，内存占用O(1)，适合大数据处理。
count vs range：count无限延续，支持浮点数；range有限，只支持整数。
groupby 需排序：groupby只对相邻元素分组，未排序时相同元素被分开。
permutations vs combinations：permutations有序（AB≠BA），combinations无序（AB=BA）。
accumulate 默认：operator.add，即前缀求和。可传入operator.mul等其他函数。

本章术语表

术语	定义	本章位置
惰性求值	不立即生成结果，按需计算	概念铺垫
无限迭代器	永不停止的迭代器，需限制	L1理解层
笛卡尔积	所有元素的组合配对	L1理解层
排列	有序选择，AB≠BA	L1理解层
组合	无序选择，AB=BA	L1理解层
前缀和	累积求和，[1,3,6,10,...]	L1理解层
islice	惰性切片，对迭代器有效	L1理解层

扩展阅读

Python 官方文档：itertools 模块
《流畅的Python》第14章：可迭代的对象、迭代器和生成器
Python Cookbook：迭代器与生成器章节
more-itertools 第三方库：更多高级迭代器工具

02-迭代器与工具函数 ​

为什么需要 itertools？ ​

问题场景：大数据处理 ​

概念铺垫 ​

L1 理解层：会用 ​

无限迭代器 ​

语法结构 ​

最简示例 ​

详细示例 ​

关键代码说明 ​

序列处理 ​

语法结构 ​

最简示例 ​

详细示例 ​

关键代码说明 ​

排列组合 ​

语法结构 ​

最简示例 ​

详细示例 ​

关键代码说明 ​

过滤分组 ​

语法结构 ​

最简示例 ​

详细示例 ​

关键代码说明 ​

L2 实践层：用好 ​

无限迭代器 推荐做法 ​

无限迭代器 反模式 ​

无限迭代器 适用场景 ​

序列处理 推荐做法 ​

序列处理 反模式 ​

序列处理 适用场景 ​

排列组合 推荐做法 ​

排列组合 反模式 ​

排列组合 适用场景 ​

过滤分组 推荐做法 ​

过滤分组 反模式 ​

过滤分组 适用场景 ​

L3 专家层：深入 ​

无限迭代器 底层原理 ​

Python 如何实现 ​

无限迭代器 性能考量 ​

无限迭代器 设计动机 ​

无限迭代器 知识关联 ​

序列处理 底层原理 ​

Python 如何实现 ​

序列处理 性能考量 ​

序列处理 设计动机 ​

序列处理 知识关联 ​

排列组合 底层原理 ​

数学原理 ​

排列组合 性能考量 ​

排列组合 设计动机 ​

排列组合 知识关联 ​

过滤分组 底层原理 ​

Python 如何实现 ​

过滤分组 性能考量 ​

过滤分组 设计动机 ​

过滤分组 知识关联 ​

实战案例：数据清洗管道 ​

本章小结 ​

自检清单 ​

本章术语表 ​

扩展阅读 ​

02-迭代器与工具函数

为什么需要 itertools？

问题场景：大数据处理

概念铺垫

L1 理解层：会用

无限迭代器

语法结构

最简示例

详细示例

关键代码说明

序列处理

语法结构

最简示例

详细示例

关键代码说明

排列组合

语法结构

最简示例

详细示例

关键代码说明

过滤分组

语法结构

最简示例

详细示例

关键代码说明

L2 实践层：用好

无限迭代器推荐做法

无限迭代器反模式

无限迭代器适用场景

序列处理推荐做法

序列处理反模式

序列处理适用场景

排列组合推荐做法

排列组合反模式

排列组合适用场景

过滤分组推荐做法

过滤分组反模式

过滤分组适用场景

L3 专家层：深入

无限迭代器底层原理

Python 如何实现

无限迭代器性能考量

无限迭代器设计动机

无限迭代器知识关联

序列处理底层原理

Python 如何实现

序列处理性能考量

序列处理设计动机

序列处理知识关联

排列组合底层原理

数学原理

排列组合性能考量

排列组合设计动机

排列组合知识关联

过滤分组底层原理

Python 如何实现

过滤分组性能考量

过滤分组设计动机

过滤分组知识关联

实战案例：数据清洗管道

本章小结

自检清单

本章术语表

扩展阅读