Python: ipextract
A Python package for fast IP address extraction from text, powered by the ip-extract Rust crate via PyO3.
Installation
pip install ipextract
Requires Python 3.10+. Pre-built wheels are available for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x64). No Rust toolchain needed.
Quick Start
import ipextract
# Extract all IPs from a string
ips = ipextract.extract("Connection from 192.168.1.1 and 8.8.8.8")
# ["192.168.1.1", "8.8.8.8"]
# Deduplicate, preserving first-seen order
ips = ipextract.extract_unique("8.8.8.8 1.1.1.1 8.8.8.8")
# ["8.8.8.8", "1.1.1.1"]
# bytes input works too (useful when reading log files directly)
ips = ipextract.extract(b"host 10.0.0.1 connected")
# ["10.0.0.1"]
Reusable Extractor
For processing many lines (log analysis, batch jobs), create an Extractor once and reuse it. This avoids redundant initialization on each call.
extractor = ipextract.Extractor().only_public()
for line in log_file:
    ips = extractor.extract(line)
    if ips:
        process(ips)
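As a rough sketch of the batch pattern above, the helper below tallies IPs across many lines. It takes the extraction callable as a parameter, so one extractor is built once and shared. `count_ips` and `fake_extract` are hypothetical names introduced for illustration. `fake_extract` is only a stand-in so the sketch runs without the package installed, and it does not reflect how ipextract finds addresses.

```python
from collections import Counter
from typing import Callable, Iterable


def count_ips(lines: Iterable[str], extract_fn: Callable[[str], list[str]]) -> Counter:
    """Tally how often each IP appears across many lines.

    extract_fn is any callable mapping one line to a list of IP strings,
    for example the bound extract method of a reusable Extractor.
    """
    counts: Counter = Counter()
    for line in lines:
        counts.update(extract_fn(line))
    return counts


# Stand-in extractor (assumption, for demonstration only): keeps
# whitespace-separated tokens that contain exactly three dots.
def fake_extract(line: str) -> list[str]:
    return [tok for tok in line.split() if tok.count(".") == 3]


sample = ["GET / from 8.8.8.8", "POST /x from 1.1.1.1", "GET /y from 8.8.8.8"]
print(count_ips(sample, fake_extract).most_common(1))
# [('8.8.8.8', 2)]
```

With the package installed, passing `extractor.extract` from the loop above in place of `fake_extract` should give the same shape of result.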
Filtering
By default, all IP addresses are extracted — IPv4, IPv6, private ranges, loopback, and broadcast. Use the builder methods to filter:
# Only publicly routable IPs (excludes RFC 1918, loopback, broadcast)
e = ipextract.Extractor().only_public()
# Specific exclusions
e = ipextract.Extractor().ignore_private().ignore_loopback()
# Constructor kwargs for one-shot config
e = ipextract.Extractor(private=False, loopback=False, ipv6=False)
Fluent methods return a new Extractor — the original is not modified, making partial configs safe to reuse:
base = ipextract.Extractor()
public = base.only_public() # new object
ipv4_only = base.ipv6(False) # new object, base unchanged
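The copy-on-configure behaviour can be illustrated in plain Python with a frozen dataclass. This is a sketch of the general pattern, not the package's implementation: `FilterConfig` and its fields are hypothetical names chosen to mirror the methods shown above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FilterConfig:
    """Hypothetical stand-in for an extractor's filter settings."""
    ipv4: bool = True
    ipv6: bool = True
    private: bool = True
    loopback: bool = True
    broadcast: bool = True

    def ignore_private(self) -> "FilterConfig":
        # replace() builds a copy with one field changed; self is untouched
        return replace(self, private=False)

    def ignore_loopback(self) -> "FilterConfig":
        return replace(self, loopback=False)

    def only_public(self) -> "FilterConfig":
        return replace(self, private=False, loopback=False, broadcast=False)


base = FilterConfig()
public = base.only_public()
assert base.private is True      # base keeps its defaults
assert public.private is False   # the derived config carries the change
```

Because every method returns a fresh object, a shared base configuration can be specialised in several directions without one caller affecting another.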
Filter reference
| Method | Effect |
|---|---|
| .only_public() | Exclude private, loopback, and broadcast |
| .ignore_private() | Exclude RFC 1918 (IPv4) and ULA/link-local (IPv6) |
| .ignore_loopback() | Exclude 127.0.0.0/8 and ::1 |
| .ignore_broadcast() | Exclude 255.255.255.255 and link-local ranges |
| .ipv4(False) | Skip IPv4 entirely |
| .ipv6(False) | Skip IPv6 entirely |
| .private_ips(bool) | Enable/disable private IPs |
| .loopback_ips(bool) | Enable/disable loopback IPs |
| .broadcast_ips(bool) | Enable/disable broadcast IPs |
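To make the categories in the table concrete, here is a sketch that classifies an address with the standard-library `ipaddress` module. It approximates the listed ranges and is not the crate's own logic. `classify` is a hypothetical helper. The link-local ranges mentioned under broadcast are left out because the table does not say which ranges are meant.

```python
import ipaddress

# Ranges copied from the filter table; anything not listed there is omitted.
PRIVATE_NETS = [
    ipaddress.ip_network(n)
    for n in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918
        "fc00::/7",   # IPv6 unique local (ULA)
        "fe80::/10",  # IPv6 link-local
    )
]
LOOPBACK_NETS = [ipaddress.ip_network("127.0.0.0/8"), ipaddress.ip_network("::1/128")]
BROADCAST = ipaddress.ip_address("255.255.255.255")


def classify(ip: str) -> str:
    """Return 'broadcast', 'loopback', 'private', or 'public' for an IP string."""
    addr = ipaddress.ip_address(ip)
    if addr == BROADCAST:
        return "broadcast"
    # Membership tests across IP versions simply evaluate to False.
    if any(addr in net for net in LOOPBACK_NETS):
        return "loopback"
    if any(addr in net for net in PRIVATE_NETS):
        return "private"
    # "public" in the table's sense: not private, loopback, or broadcast
    return "public"


for ip in ("192.168.1.1", "127.0.0.1", "255.255.255.255", "8.8.8.8"):
    print(ip, classify(ip))
```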
Extraction Methods
extract(text) → list[str]
Returns all IP addresses found, in order of appearance.
ipextract.Extractor().extract("a 1.1.1.1 b 2.2.2.2")
# ["1.1.1.1", "2.2.2.2"]
extract_unique(text) → list[str]
Returns unique IP addresses, preserving first-seen order.
ipextract.Extractor().extract_unique("1.1.1.1 2.2.2.2 1.1.1.1")
# ["1.1.1.1", "2.2.2.2"]
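The first-seen ordering can be pictured with a small plain-Python equivalent. This only sketches the semantics: `unique_in_order` is a hypothetical helper that compares strings exactly, and whether the package treats differently written forms of the same address as duplicates is not covered here.

```python
def unique_in_order(ips: list[str]) -> list[str]:
    # dicts preserve insertion order, so fromkeys() keeps each first occurrence
    return list(dict.fromkeys(ips))


print(unique_in_order(["1.1.1.1", "2.2.2.2", "1.1.1.1"]))
# ['1.1.1.1', '2.2.2.2']
```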
extract_with_offsets(text) → list[tuple[str, int, int]]
Returns (ip, start, end) tuples. The byte offsets index directly into the original input — useful for annotation, highlighting, or structured log parsing.
text = "host 1.2.3.4 port 80"
for ip, start, end in ipextract.Extractor().extract_with_offsets(text):
    print(f"{ip} at [{start}:{end}]")
    assert text[start:end] == ip
# 1.2.3.4 at [5:12]
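One way the offsets might be used for highlighting is sketched below. `highlight` is a hypothetical helper that takes tuples shaped like the `(ip, start, end)` results described above. The spans are hard-coded so the sketch runs on its own, and it assumes ASCII input, where byte offsets and string indices coincide.

```python
def highlight(text: str, spans: list[tuple[str, int, int]]) -> str:
    """Wrap each (ip, start, end) span of text in square brackets."""
    out = []
    cursor = 0
    # Process spans left to right so untouched text is copied exactly once.
    for _ip, start, end in sorted(spans, key=lambda s: s[1]):
        out.append(text[cursor:start])
        out.append(f"[{text[start:end]}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "host 1.2.3.4 port 80"
spans = [("1.2.3.4", 5, 12)]  # same shape as the extract_with_offsets result above
print(highlight(text, spans))
# host [1.2.3.4] port 80
```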
Performance
ipextract is designed for high-throughput applications. It uses a compile-time DFA (deterministic finite automaton) from the Rust ip-extract crate, which scans the input in O(n) time without backtracking.
Typical benchmark results comparing ipextract to Python re + ipaddress.ip_address() validation (both sides extract and validate):
| Scenario | re + ipaddress (ms) | ipextract (ms) | Speedup |
|---|---|---|---|
| Dense IPs (1000 mixed v4+v6) | 2.3 | 0.25 | 9x |
| Sparse Logs (1000 IPs in noise) | 7.4 | 0.46 | 16x |
| Pure Text (100KB with zero IPs) | 4.0 | 0.16 | 25x |
| Defanged IPs (1000 mixed) | 2.5 | 0.35 | 7x |
The larger the input and the more non-IP text it contains, the greater the performance advantage. ipextract excels at high-speed log scanning because it can reject non-IP text much faster than a backtracking regex engine.
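For context, the re + ipaddress side of such a comparison might look roughly like the sketch below. It is not the benchmark's actual code: it is an IPv4-only illustration of the find-candidates-then-validate approach, and `baseline_extract_ipv4` and the pattern are assumptions made for this example.

```python
import ipaddress
import re

# Loose candidate pattern: four dot-separated groups of 1-3 digits.
# Range checking (0-255) is left to the ipaddress module.
IPV4_CANDIDATE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def baseline_extract_ipv4(text: str) -> list[str]:
    """Find IPv4-looking substrings and keep those that validate."""
    found = []
    for match in IPV4_CANDIDATE.finditer(text):
        candidate = match.group()
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # e.g. an octet above 255
        found.append(candidate)
    return found


print(baseline_extract_ipv4("bad 999.1.1.1 ok 10.0.0.1"))
# ['10.0.0.1']
```

Each candidate costs a regex match plus a Python-level validation call, which is the per-match overhead a single-pass scanner avoids.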
Module-Level Convenience Functions
ipextract.extract() and ipextract.extract_unique() are shorthand for Extractor().extract() and Extractor().extract_unique(), respectively, with default settings (all IPs included). For repeated calls, prefer creating an Extractor instance explicitly.
Source
The ipextract Python package lives in crates/ipextract-py/ and wraps the ip-extract Rust crate.