geoipsed
Fast, inline geolocation decoration of IPv4 and IPv6 addresses, written in Rust.
IP geolocation enriches logs with city, country, ASN, and timezone metadata. geoipsed finds and decorates IP addresses in place, leaving the surrounding text intact, which makes it well suited to incident response and network analysis.
Quick Start
cargo install geoipsed
echo "Connection from 81.2.69.205 to 175.16.199.37" | geoipsed
Output:
Connection from <81.2.69.205|AS0_|GB|London> to <175.16.199.37|AS0_|CN|Changchun>
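Decoration is essentially a find-and-replace that touches only the addresses and leaves the rest of the line alone. The Python sketch below illustrates the idea and is not geoipsed's implementation: it handles IPv4 only, finds candidates with a regex, validates them with the standard ipaddress module, and looks them up in LOOKUP, a hypothetical in-memory table standing in for the MMDB databases. The <ip|asn|country|city> layout simply mimics the output above.

```python
import ipaddress
import re

# Hypothetical stand-in for the MMDB lookups geoipsed performs.
LOOKUP = {
    "81.2.69.205": ("AS0_", "GB", "London"),
    "175.16.199.37": ("AS0_", "CN", "Changchun"),
}

# IPv4 candidate: not preceded by a digit or dot, not followed by ".digit" or a digit,
# so "1.2.3.4.5" never yields a partial match, while "8.8.8.8." still matches.
CANDIDATE = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?!\.?\d)")


def decorate(line: str) -> str:
    """Replace each valid IPv4 address in `line` with a <ip|asn|cc|city> tag."""

    def tag(match: re.Match) -> str:
        text = match.group(0)
        try:
            ipaddress.IPv4Address(text)  # rejects octets above 255
        except ValueError:
            return text  # not a real address: leave the text untouched
        asn, country, city = LOOKUP.get(text, ("", "", ""))
        return f"<{text}|{asn}|{country}|{city}>"

    return CANDIDATE.sub(tag, line)


print(decorate("Connection from 81.2.69.205 to 175.16.199.37"))
```

Everything outside the matched spans is copied through unchanged, which is what keeps the original log context intact.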
Features
- IPv4 and IPv6 support with strict validation
- City, country, ASN, and timezone metadata
- Flexible templating via -t/--template
- Inline decoration or JSON output modes (--tag, --tag-files)
- Fine-grained filtering: --all, --no-private, --no-loopback, --no-broadcast
- Color support with -C/--color
- Streaming input (stdin or multiple files)
- ~60x faster than Python implementations (benchmarked with hyperfine)
Databases
Supports MaxMind (default), IP2Location, and IPinfo MMDB databases. Select a provider with --provider, and specify the directory containing the database files with -I or the GEOIP_MMDB_DIR environment variable.
Usage
geoipsed --help
Inline decoration of IPv4 and IPv6 address geolocations
Usage: geoipsed [OPTIONS] [FILE]...
Arguments:
[FILE]... Input file(s) to process. Leave empty or use "-" to read from stdin
Options:
-o, --only-matching Show only nonempty parts of lines that match
-C, --color <COLOR> Use markers to highlight the matching strings [default: auto] [possible values: always, never, auto]
-t, --template <TEMPLATE> Specify the format of the IP address decoration. Use the --list-templates option to see which fields are available. Field names are enclosed in {}, for example "{field1} any fixed string {field2} & {field3}"
--tag Output matches as JSON with tag information for each line
--tag-files Output matches as JSON with tag information for entire files
--all Include all types of IP addresses in matches
--no-private Exclude private IP addresses from matches
--no-loopback Exclude loopback IP addresses from matches
--no-broadcast Exclude broadcast/link-local IP addresses from matches
--only-routable Only include internet-routable IP addresses (requires valid ASN entry)
--provider <PROVIDER> Specify the MMDB provider to use (default: maxmind) [default: maxmind]
-I <DIR> Specify directory containing the MMDB database files [env: GEOIP_MMDB_DIR=]
--list-providers List available MMDB providers and their required files
-L, --list-templates Display a list of available template substitution parameters to use in --template format string
-h, --help Print help
-V, --version Print version
Examples
# Decoration mode
geoipsed access.log
# Only matching IPs (with decoration)
geoipsed -o access.log
# Custom template
geoipsed -t "{ip} in {country_iso}" access.log
# Filter: public IPs only
geoipsed --no-private --no-loopback --no-broadcast access.log
# Advanced: JSON output of matching ranges with before and after decoration
geoipsed --tag access.log
Performance
Processes 100K lines (3.9MB) in 15.3ms, versus 1.0s for an equivalent Python implementation (a 65x speedup). The speedup grows to 72x on larger datasets (500K lines).
User Guide
This guide covers the usage and configuration of geoipsed.
Installation
cargo install geoipsed
Basic Usage
The simplest way to use geoipsed is to pipe text into it:
echo "8.8.8.8" | geoipsed
Configuration
MMDB Databases
geoipsed requires MMDB files to perform geolocations. You can specify the directory containing these files using the -I flag or the GEOIP_MMDB_DIR environment variable.
Templates
You can customize the output using the -t/--template flag. Use {field} placeholders for metadata.
Example:
geoipsed -t "{ip} is in {country_name}"
Use geoipsed --list-templates to see all available fields.
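Template expansion behaves like ordinary format-string substitution. As a rough Python analogue (not geoipsed's own code), str.format_map fills each {field} from a mapping of lookup results. The values below are invented sample data; apart from ip, country_iso and country_name, which appear in this guide, treat any field name as an assumption until checked against geoipsed --list-templates.

```python
# Sample field values for one address. These are made up for illustration;
# geoipsed fills the real values from the MMDB databases.
fields = {
    "ip": "81.2.69.205",
    "country_iso": "GB",
    "country_name": "United Kingdom",
}


def render(template: str, values: dict) -> str:
    """Fill {field} placeholders from `values`; in Python an unknown field raises KeyError."""
    return template.format_map(values)


print(render("{ip} in {country_iso}", fields))      # 81.2.69.205 in GB
print(render("{ip} is in {country_name}", fields))  # 81.2.69.205 is in United Kingdom
```

How geoipsed itself reports an unknown field may differ from Python's KeyError.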
Subcrates
The geoipsed project is organized as a workspace with several specialized subcrates.
ip-extract
The core engine for finding and validating IP addresses in strings. It uses a compile-time DFA for O(n) scanning performance.
Defang Support
Defanged IP addresses (192[.]168[.]1[.]1, 2001[:]db8[:]0[:]0[:]0[:]0[:]0[:]1) are recognized
automatically, with no configuration needed. The DFA pattern subsumes normal notation,
so there is no performance cost on normal input.
Callers choose the form they need:
- IpMatch::as_str(): returns Cow<str>; zero-copy for fanged input, strips brackets for defanged input
- IpMatch::as_matched_str(): returns the raw matched text exactly as it appeared (may contain brackets)
- IpMatch::ip(): always returns a parsed IpAddr (brackets stripped internally)
justips
A standalone CLI for fast IP extraction. Uses parallel mmap + rayon for maximum throughput on files, with built-in deduplication.
- -u/--unique: unordered dedup (HashSet, fastest)
- -U/--unique-ordered: first-seen order (IndexSet)
Benchmarked at 857ms on 1.7GB Suricata logs (7x faster than ripgrep).
ipextract (Python)
A Python package wrapping ip-extract via PyO3/maturin. Install with pip install ipextract.
ip-extract
A fast IP address extraction library for Rust.
Extract IPv4 and IPv6 addresses from unstructured text with minimal overhead. This crate powers the core extraction engine for geoipsed and is designed for high-throughput scanning of large datasets.
Features
- ⚡ Performance Optimized: Compile-time DFA with O(n) scanning, no runtime regex compilation
- 🎯 Strict Validation: Deep validation eliminates false positives (e.g., rejects 1.2.3.4.5)
- 🛡️ Defang Support: Automatically matches defanged IPs (192[.]168[.]1[.]1, 2001[:]db8[:]...) with negligible overhead
- ⚙️ Configurable: Fine-grained control over address types (private, loopback, broadcast)
- 🔢 Byte-Oriented: Zero-copy scanning directly on byte slices, no UTF-8 validation overhead
Basic Example
Extract all IP addresses (default behavior):
use ip_extract::ExtractorBuilder;

fn main() -> anyhow::Result<()> {
    // Extracts all IPs: IPv4, IPv6, private, loopback, broadcast
    let extractor = ExtractorBuilder::new().build()?;
    let input = b"Connection from 192.168.1.1 and 8.8.8.8";

    for range in extractor.find_iter(input) {
        let ip = std::str::from_utf8(&input[range]).unwrap();
        println!("Found IP: {}", ip);
    }
    Ok(())
}
Configuration Examples
use ip_extract::ExtractorBuilder;

fn main() -> anyhow::Result<()> {
    // Extract only public IPs (recommended for most use cases)
    let _public = ExtractorBuilder::new()
        .only_public()
        .build()?;

    // Extract only IPv4, ignoring loopback
    let _ipv4_no_loopback = ExtractorBuilder::new()
        .ipv6(false)
        .ignore_loopback()
        .build()?;

    // Fine-grained control
    let _custom = ExtractorBuilder::new()
        .ipv4(true)
        .ipv6(true)
        .ignore_private()
        .ignore_broadcast()
        .build()?;

    Ok(())
}
Defanged IP Support
Threat intelligence reports and security logs commonly use “defanged” IPs to prevent accidental connections. ip-extract recognizes these automatically, with no opt-in needed.
use ip_extract::ExtractorBuilder;

fn main() -> anyhow::Result<()> {
    let extractor = ExtractorBuilder::new().build()?;
    let input = b"IOC: 192[.]168[.]1[.]1 and 2001[:]db8[:]0[:]0[:]0[:]0[:]0[:]1";

    for m in extractor.match_iter(input) {
        // as_str() returns the normalized (refanged) IP; zero-copy for normal input
        println!("{}", m.as_str()); // first match: "192.168.1.1"
        // as_matched_str() returns exactly what was in the input
        println!("{}", m.as_matched_str()); // first match: "192[.]168[.]1[.]1"
        // ip() parses to std::net::IpAddr
        println!("{:?}", m.ip());
    }
    Ok(())
}
Supported notation
| Type | Bracket | Example |
|---|---|---|
| IPv4 | [.] | 192[.]168[.]1[.]1 |
| IPv6 | [:] | 2001[:]db8[:]0[:]0[:]0[:]0[:]0[:]1 |
Note: IPv6 defanged notation requires the fully expanded form; [::] compression is not supported.
Performance impact
Defang patterns are expanded into the DFA at compile time (+3KB binary size). There is no measurable regression on normal (fanged) input, and on defanged input the DFA approach is 16% faster than normalizing the text in a separate pre-processing pass.
Benchmarks
Typical throughput on modern hardware:
| Scenario | Throughput |
|---|---|
| Dense IPs (mostly IP addresses) | 160+ MiB/s |
| Sparse logs (mixed with text) | 360+ MiB/s |
| Pure scanning (no IPs) | 620+ MiB/s |
Performance Architecture
ip-extract achieves its throughput with a build-time stage and a runtime stage, plus strict validation of the candidates the scan produces:
- Compile-Time DFA (Build Phase)
  - Regex patterns compiled into dense forward DFAs during build
  - DFA serialized and embedded in the binary (~600KB)
  - Eliminates all runtime regex compilation
- Zero-Cost Scanning (Runtime)
  - O(n) byte scanning with lazy DFA initialization
  - Single forward pass, no backtracking
  - Validation only on candidates, not on every scanned byte
- Strict Validation
  - Hand-optimized¹ IPv4 parser (20-30% faster than std::net)
  - Boundary checking prevents false matches (e.g., 1.2.3.4.5 is rejected)
  - Configurable filters for special ranges

¹ AI wrote all of this. It does not have hands.
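The split between a cheap forward scan and validation of candidates only can be illustrated in a few lines of Python. This is a simplified, IPv4-only stand-in and not the crate's code: a plain loop replaces the compiled DFA (no DFA is built here), and ipaddress.IPv4Address stands in for the hand-optimized parser. Boundary checking falls out of treating each maximal run of digits and dots as one candidate, so 1.2.3.4.5 is validated as a whole and rejected.

```python
import ipaddress

DIGITS = frozenset(b"0123456789")      # byte values of '0'..'9'
CANDIDATE_BYTES = DIGITS | {ord(".")}  # digits plus '.'


def scan_ipv4(data: bytes) -> list[str]:
    """One forward pass over `data`; only maximal digit/dot runs are validated."""
    found = []
    i, n = 0, len(data)
    while i < n:
        if data[i] not in DIGITS:  # cheap skip of non-candidate bytes
            i += 1
            continue
        start = i
        while i < n and data[i] in CANDIDATE_BYTES:
            i += 1
        # The whole run is the candidate, so "1.2.3.4.5" is never split into "1.2.3.4".
        candidate = data[start:i].rstrip(b".")  # tolerate a sentence-ending period
        try:
            found.append(str(ipaddress.IPv4Address(candidate.decode("ascii"))))
        except ValueError:
            pass  # e.g. "1.2.3.4.5" or "999.1.1.1"
    return found


print(scan_ipv4(b"from 10.0.0.1 to 8.8.8.8. bad: 1.2.3.4.5 and 999.1.1.1"))
# ['10.0.0.1', '8.8.8.8']
```

Each byte is inspected once by the outer scan, and the comparatively expensive parser only runs on the short candidate runs.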
justips
Blazing fast, standalone IP address extraction.
justips finds and extracts IPv4 and IPv6 addresses from unstructured text as fast as possible. It is powered by the same compile-time DFA engine as geoipsed but purpose-built for raw extraction — a faster, validating alternative to grep -o.
Installation
cargo install justips
Usage
# Extract all IPs from a file
justips access.log
# Unique IPs, unordered (fastest dedup)
justips -u access.log
# Unique IPs, preserving first-seen order
justips -U access.log
# Extract from stdin
tail -f access.log | justips
# Filter for only routable IPs
justips --no-private --no-loopback --no-broadcast network.txt
# Multiple files
justips access.log error.log firewall.log
Options
| Flag | Description |
|---|---|
| -u, --unique | Deduplicate IPs (unordered, fastest) |
| -U, --unique-ordered | Deduplicate IPs, preserving first-seen order |
| --all | Include all IPs (private, loopback, etc.) |
| --no-private | Exclude RFC 1918 and ULA ranges |
| --no-loopback | Exclude 127.0.0.1 and ::1 |
| --no-broadcast | Exclude broadcast and link-local ranges |
Deduplication Modes
No dedup (default)
Streams IPs directly to stdout as they are found. No deduplication state is kept in memory, and output begins immediately.
-u / --unique (unordered)
Hash-based dedup using HashSet. Each rayon chunk builds its own HashSet<String>, then all sets are merged. Output order is not guaranteed.
Best for: feeding into other tools where order doesn’t matter (e.g., enrichment pipelines, blocklist generation).
-U / --unique-ordered (first-seen order)
Order-preserving dedup using IndexSet. Each rayon chunk builds its own IndexSet<String>, then sets are merged in chunk order. The first occurrence of each IP determines its position in the output.
Best for: preserving chronological context (e.g., “which IPs appeared first in these logs?”).
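Both merge strategies can be sketched in Python, with a set standing in for HashSet (-u) and an insertion-ordered dict standing in for IndexSet (-U). The per-chunk lists below are hypothetical; justips does this in Rust with rayon, so treat this as an illustration of the semantics only.

```python
from collections.abc import Iterable


def merge_unordered(chunks: Iterable[list[str]]) -> set[str]:
    """-u style: each chunk builds its own set, then the sets are unioned (order not kept)."""
    merged: set[str] = set()
    for chunk in chunks:
        merged |= set(chunk)
    return merged


def merge_ordered(chunks: Iterable[list[str]]) -> list[str]:
    """-U style: per-chunk ordered sets merged in chunk order; first occurrence wins."""
    merged: dict[str, None] = {}
    for chunk in chunks:
        for ip in dict.fromkeys(chunk):  # ordered, deduplicated within the chunk
            merged.setdefault(ip, None)  # keeps the position of the first sighting
    return list(merged)


chunks = [["10.0.0.1", "8.8.8.8", "10.0.0.1"], ["1.1.1.1", "8.8.8.8"]]
print(merge_ordered(chunks))  # ['10.0.0.1', '8.8.8.8', '1.1.1.1']
```

Merging in chunk order is what makes the ordered result match a sequential first-seen scan, because chunks are contiguous slices of the input.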
Performance
Benchmarked against a 1.7GB Suricata log dataset (15.4M lines, 30.7M IPs):
| Mode | Time | Overhead |
|---|---|---|
| Stream (default) | 857ms | — |
| Unique unordered (-u) | 925ms | +8% |
| Unique ordered (-U) | 967ms | +13% |
Architecture
- Files: memory-mapped (mmap) and split into ~4MB chunks at newline boundaries, processed in parallel with rayon
- stdin: line-buffered streaming via ripline (single-threaded, suitable for pipes)
- Defang: automatically recognizes and normalizes defanged IPs (192[.]168[.]1[.]1 → 192.168.1.1)
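Cutting chunks only at newline boundaries keeps every line inside exactly one chunk, so no address is split between two workers. A minimal Python sketch of that boundary logic, assuming an in-memory bytes buffer in place of the mmap; the target size is a parameter here, whereas justips uses roughly 4MB.

```python
def newline_chunks(data: bytes, target: int) -> list[tuple[int, int]]:
    """Return (start, end) byte ranges of about `target` bytes, each ending after a newline."""
    ranges = []
    start, n = 0, len(data)
    while start < n:
        end = min(start + target, n)
        if end < n:
            # Extend to the next newline; searching from end - 1 keeps a chunk
            # that already ends exactly on a newline at its natural size.
            nl = data.find(b"\n", end - 1)
            end = n if nl == -1 else nl + 1
        ranges.append((start, end))
        start = end
    return ranges


data = b"a 1.1.1.1\nb 2.2.2.2\nc 3.3.3.3\n"
print(newline_chunks(data, 12))  # [(0, 20), (20, 30)]
```

Because mmap.mmap objects also support find() and slicing, the same function could be pointed at a memory-mapped file.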
When to use justips vs geoipsed
| Need | Tool |
|---|---|
| Raw list of IPs | justips |
| Unique IPs from logs | justips -u or justips -U |
| IPs with geolocation metadata | geoipsed |
| Inline decoration of log lines | geoipsed |
| JSON output with IP positions | geoipsed --tag |
Python: ipextract
A Python package for fast IP address extraction from text, powered by the ip-extract Rust crate via PyO3.
Installation
pip install ipextract
Requires Python 3.10+. Pre-built wheels are available for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x64). No Rust toolchain needed.
Quick Start
import ipextract
# Extract all IPs from a string
ips = ipextract.extract("Connection from 192.168.1.1 and 8.8.8.8")
# ["192.168.1.1", "8.8.8.8"]
# Deduplicate, preserving first-seen order
ips = ipextract.extract_unique("8.8.8.8 1.1.1.1 8.8.8.8")
# ["8.8.8.8", "1.1.1.1"]
# bytes input works too (useful when reading log files directly)
ips = ipextract.extract(b"host 10.0.0.1 connected")
# ["10.0.0.1"]
Reusable Extractor
For processing many lines (log analysis, batch jobs), create an Extractor once and reuse it. This avoids redundant initialization on each call.
extractor = ipextract.Extractor().only_public()
with open("access.log") as log_file:
    for line in log_file:
        ips = extractor.extract(line)
        if ips:
            print(ips)  # replace with your own processing
Filtering
By default, all IP addresses are extracted — IPv4, IPv6, private ranges, loopback, and broadcast. Use the builder methods to filter:
# Only publicly routable IPs (excludes RFC 1918, loopback, broadcast)
e = ipextract.Extractor().only_public()
# Specific exclusions
e = ipextract.Extractor().ignore_private().ignore_loopback()
# Constructor kwargs for one-shot config
e = ipextract.Extractor(private=False, loopback=False, ipv6=False)
Fluent methods return a new Extractor and leave the original unmodified, so partial configurations are safe to reuse:
base = ipextract.Extractor()
public = base.only_public() # new object
ipv4_only = base.ipv6(False) # new object, base unchanged
Filter reference
| Method | Effect |
|---|---|
| .only_public() | Exclude private, loopback, and broadcast |
| .ignore_private() | Exclude RFC 1918 (IPv4) and ULA/link-local (IPv6) |
| .ignore_loopback() | Exclude 127.0.0.0/8 and ::1 |
| .ignore_broadcast() | Exclude 255.255.255.255 and link-local ranges |
| .ipv4(False) | Skip IPv4 entirely |
| .ipv6(False) | Skip IPv6 entirely |
| .private_ips(bool) | Enable/disable private IPs |
| .loopback_ips(bool) | Enable/disable loopback IPs |
| .broadcast_ips(bool) | Enable/disable broadcast IPs |
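For a feel of which addresses these filters target, Python's standard ipaddress module exposes properties with similar intent. The is_public helper below is a hypothetical approximation of only_public(), written for illustration and not taken from ipextract; note that ipaddress.is_private covers more ranges than RFC 1918 and ULA, so results can differ at the edges.

```python
import ipaddress

BROADCAST_V4 = ipaddress.IPv4Address("255.255.255.255")


def is_public(ip: str) -> bool:
    """Rough stand-in for only_public(): drop private, loopback, broadcast/link-local."""
    addr = ipaddress.ip_address(ip)
    if addr.is_loopback or addr.is_link_local or addr == BROADCAST_V4:
        return False
    return not addr.is_private


sample = ["10.0.0.1", "8.8.8.8", "127.0.0.1", "::1", "fe80::1",
          "255.255.255.255", "2606:4700:4700::1111", "fd00::1"]
print([ip for ip in sample if is_public(ip)])  # ['8.8.8.8', '2606:4700:4700::1111']
```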
Extraction Methods
extract(text) → list[str]
Returns all IP addresses found, in order of appearance.
ipextract.Extractor().extract("a 1.1.1.1 b 2.2.2.2")
# ["1.1.1.1", "2.2.2.2"]
extract_unique(text) → list[str]
Returns unique IP addresses, preserving first-seen order.
ipextract.Extractor().extract_unique("1.1.1.1 2.2.2.2 1.1.1.1")
# ["1.1.1.1", "2.2.2.2"]
extract_with_offsets(text) → list[tuple[str, int, int]]
Returns (ip, start, end) tuples. The byte offsets index directly into the original input — useful for annotation, highlighting, or structured log parsing.
text = "host 1.2.3.4 port 80"
for ip, start, end in ipextract.Extractor().extract_with_offsets(text):
    print(f"{ip} at [{start}:{end}]")
    assert text[start:end] == ip
# 1.2.3.4 at [5:12]
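Offsets make annotation straightforward. The highlight helper below is a hypothetical illustration, not part of ipextract: it takes (ip, start, end) tuples like the one above and wraps each span in markers, working from the last span backwards so that inserted text never shifts offsets still waiting to be processed. It assumes ASCII input, since the offsets are described as byte offsets and Python str slicing counts characters.

```python
def highlight(text: str, spans: list[tuple[str, int, int]],
              left: str = "[", right: str = "]") -> str:
    """Wrap each (ip, start, end) span in markers, processing spans right to left."""
    out = text
    for _ip, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        out = out[:start] + left + out[start:end] + right + out[end:]
    return out


text = "host 1.2.3.4 port 80"
spans = [("1.2.3.4", 5, 12)]  # the tuple shown above; normally from extract_with_offsets(text)
print(highlight(text, spans))  # host [1.2.3.4] port 80
```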
Performance
ipextract is designed for high-throughput applications. It uses a compile-time DFA (Deterministic Finite Automaton) from the Rust ip-extract crate, which scans at O(n) without backtracking.
Typical benchmark results comparing ipextract to Python re + ipaddress.ip_address() validation (both sides extract and validate):
| Scenario | re + ipaddress (ms) | ipextract (ms) | Speedup |
|---|---|---|---|
| Dense IPs (1000 mixed v4+v6) | 2.3 | 0.25 | 9x |
| Sparse logs (1000 IPs in noise) | 7.4 | 0.46 | 16x |
| Pure text (100KB with zero IPs) | 4.0 | 0.16 | 25x |
| Defanged IPs (1000 mixed) | 2.5 | 0.35 | 7x |
The larger the input and the more non-IP text it contains, the greater the performance advantage. ipextract excels at high-speed log scanning because it can reject non-IP text much faster than a backtracking regex engine.
Module-Level Convenience Functions
ipextract.extract() and ipextract.extract_unique() are shorthand for Extractor().extract() with default settings (all IPs included). For repeated calls, prefer creating an Extractor instance explicitly.
Source
The ipextract Python package lives in crates/ipextract-py/ and wraps the ip-extract Rust crate.