geoipsed

Fast, inline geolocation decoration of IPv4 and IPv6 addresses written in Rust

IP geolocation enriches logs with City, Country, ASN, and timezone metadata. geoipsed finds and decorates IP addresses in-place, leaving existing context intact—perfect for incident response and network analysis.

Quick Start

cargo install geoipsed
echo "Connection from 81.2.69.205 to 175.16.199.37" | geoipsed

Output:

Connection from <81.2.69.205|AS0_|GB|London> to <175.16.199.37|AS0_|CN|Changchun>
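
To make the decoration format concrete, here is a minimal Python sketch of the idea. It is not geoipsed's implementation: TOY_DB is a hypothetical in-memory table standing in for an MMDB lookup, the IPv4 pattern is deliberately loose, and the values simply mirror the sample output above.

```python
import re

# Hypothetical stand-in for an MMDB lookup (assumption: not a real database).
TOY_DB = {
    "81.2.69.205": {"asn": "AS0_", "country": "GB", "city": "London"},
    "175.16.199.37": {"asn": "AS0_", "country": "CN", "city": "Changchun"},
}

# Loose IPv4 pattern with simple boundary checks; no octet range validation.
IPV4 = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def decorate(line: str) -> str:
    """Wrap each known IP as <ip|asn|country|city>, leaving other text untouched."""

    def repl(match: re.Match) -> str:
        ip = match.group(0)
        meta = TOY_DB.get(ip)
        if meta is None:
            return ip  # unknown address: keep the original text
        return f"<{ip}|{meta['asn']}|{meta['country']}|{meta['city']}>"

    return IPV4.sub(repl, line)


decorated = decorate("Connection from 81.2.69.205 to 175.16.199.37")
print(decorated)
```

The surrounding text is passed through unchanged, which is the property that makes inline decoration useful on existing log lines.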

Features

  • IPv4 and IPv6 support with strict validation
  • City, Country, ASN, timezone metadata
  • Flexible templating via -t/--template
  • Inline decoration or JSON output modes (--tag, --tag-files)
  • Fine-grained filtering: --all, --no-private, --no-loopback, --no-broadcast
  • Color support with -C/--color
  • Streaming input (stdin or multiple files)
  • ~60x faster than Python implementations (benchmarked with hyperfine)

Databases

Supports MaxMind (default), IP2Location, and IPinfo MMDB formats. Specify location with -I or GEOIP_MMDB_DIR environment variable.

Usage

geoipsed --help
Inline decoration of IPv4 and IPv6 address geolocations

Usage: geoipsed [OPTIONS] [FILE]...

Arguments:
  [FILE]...  Input file(s) to process. Leave empty or use "-" to read from stdin

Options:
  -o, --only-matching        Show only nonempty parts of lines that match
  -C, --color <COLOR>        Use markers to highlight the matching strings [default: auto] [possible values: always, never, auto]
  -t, --template <TEMPLATE>  Specify the format of the IP address decoration. Use the --list-templates option to see which fields are available. Field names are enclosed in {}, for example "{field1} any fixed string {field2} & {field3}"
      --tag                  Output matches as JSON with tag information for each line
      --tag-files            Output matches as JSON with tag information for entire files
      --all                  Include all types of IP addresses in matches
      --no-private           Exclude private IP addresses from matches
      --no-loopback          Exclude loopback IP addresses from matches
      --no-broadcast         Exclude broadcast/link-local IP addresses from matches
      --only-routable        Only include internet-routable IP addresses (requires valid ASN entry)
      --provider <PROVIDER>  Specify the MMDB provider to use (default: maxmind) [default: maxmind]
  -I <DIR>                   Specify directory containing the MMDB database files [env: GEOIP_MMDB_DIR=]
      --list-providers       List available MMDB providers and their required files
  -L, --list-templates       Display a list of available template substitution parameters to use in --template format string
  -h, --help                 Print help
  -V, --version              Print version

Examples

# Decoration mode
geoipsed access.log

# Only matching IPs (with decoration)
geoipsed -o access.log

# Custom template
geoipsed -t "{ip} in {country_iso}" access.log

# Filter: public IPs only
geoipsed --no-private --no-loopback --no-broadcast access.log

# Advanced: JSON output of matching ranges with before and after decoration
geoipsed --tag access.log

Performance

Processes 100K lines (3.9MB) in 15.3ms, versus 1.0s for an equivalent Python implementation (a 65x speedup). The advantage grows to 72x on a larger dataset (500K lines).

User Guide

This guide covers the usage and configuration of geoipsed.

Installation

cargo install geoipsed

Basic Usage

The simplest way to use geoipsed is to pipe text into it:

echo "8.8.8.8" | geoipsed

Configuration

MMDB Databases

geoipsed requires MMDB files to perform geolocations. You can specify the directory containing these files using the -I flag or the GEOIP_MMDB_DIR environment variable.

Templates

You can customize the output using the -t/--template flag. Use {field} placeholders for metadata.

Example:

geoipsed -t "{ip} is in {country_name}"

Use geoipsed --list-templates to see all available fields.
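
As a rough illustration of how {field} placeholders behave, the following Python sketch renders a template against a metadata record. The record values and the render helper are hypothetical, and rendering unknown fields as empty strings is an assumption; the source does not say how geoipsed treats them.

```python
# Hypothetical metadata record; the field names come from the examples in this guide.
record = {"ip": "81.2.69.205", "country_iso": "GB", "country_name": "United Kingdom"}


class BlankMissing(dict):
    """dict that yields an empty string for unknown fields (an assumption)."""

    def __missing__(self, key):
        return ""


def render(template: str, fields: dict) -> str:
    # str.format_map substitutes each {name} with fields[name].
    return template.format_map(BlankMissing(fields))


rendered = render("{ip} is in {country_name}", record)
print(rendered)
```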

Subcrates

The geoipsed project is organized as a workspace with several specialized subcrates.

ip-extract

The core engine for finding and validating IP addresses in strings. It uses a compile-time DFA for O(n) scanning performance.

Defang Support

Defanged IP addresses (192[.]168[.]1[.]1, 2001[:]db8[:]0[:]0[:]0[:]0[:]0[:]1) are recognized automatically — no configuration needed. The DFA pattern subsumes normal notation, so there is no performance cost on normal input.

Callers can choose between three accessors depending on the form they need:

  • IpMatch::as_str() returns the normalized IP as a Cow<str>: zero-copy for fanged input, brackets stripped for defanged input
  • IpMatch::as_matched_str() returns the raw matched bytes, which may contain brackets
  • IpMatch::ip() always returns a parsed IpAddr (brackets are stripped internally)

justips

A standalone CLI for fast IP extraction. Uses parallel mmap + rayon for maximum throughput on files, with built-in deduplication.

  • -u / --unique — unordered dedup (HashSet, fastest)
  • -U / --unique-ordered — first-seen order (IndexSet)

Benchmarked at 857ms on 1.7GB Suricata logs (7x faster than ripgrep).

ipextract (Python)

A Python package wrapping ip-extract via PyO3/maturin. Install with pip install ipextract.

ip-extract

A fast IP address extraction library for Rust.

Extract IPv4 and IPv6 addresses from unstructured text with minimal overhead. This crate powers the core extraction engine for geoipsed and is designed for high-throughput scanning of large datasets.

Features

  • Performance Optimized: Compile-time DFA with O(n) scanning, no runtime regex compilation
  • 🎯 Strict Validation: Deep validation eliminates false positives (e.g., rejects 1.2.3.4.5)
  • 🛡️ Defang Support: Automatically matches defanged IPs (192[.]168[.]1[.]1, 2001[:]db8[:]...) with negligible overhead
  • ⚙️ Configurable: Fine-grained control over address types (private, loopback, broadcast)
  • 🔢 Byte-Oriented: Zero-copy scanning directly on byte slices, no UTF-8 validation overhead

Basic Example

Extract all IP addresses (default behavior):

use ip_extract::ExtractorBuilder;

fn main() -> anyhow::Result<()> {
    // Extracts all IPs: IPv4, IPv6, private, loopback, broadcast
    let extractor = ExtractorBuilder::new().build()?;

    let input = b"Connection from 192.168.1.1 and 8.8.8.8";

    for range in extractor.find_iter(input) {
        let ip = std::str::from_utf8(&input[range]).unwrap();
        println!("Found IP: {}", ip);
    }

    Ok(())
}

Configuration Examples

#![allow(unused)]
use ip_extract::ExtractorBuilder;

fn main() -> anyhow::Result<()> {
    // Extract only public IPs (recommended for most use cases)
    let extractor = ExtractorBuilder::new()
        .only_public()
        .build()?;

    // Extract only IPv4, ignoring loopback
    let extractor = ExtractorBuilder::new()
        .ipv6(false)
        .ignore_loopback()
        .build()?;

    // Fine-grained control
    let extractor = ExtractorBuilder::new()
        .ipv4(true)
        .ipv6(true)
        .ignore_private()
        .ignore_broadcast()
        .build()?;

    // `?` requires main to return a Result
    Ok(())
}

Defanged IP Support

Threat intelligence reports and security logs commonly use “defanged” IPs to prevent accidental connections. ip-extract recognizes these automatically — no opt-in needed.

use ip_extract::ExtractorBuilder;

fn main() -> anyhow::Result<()> {
    let extractor = ExtractorBuilder::new().build()?;

    let input = b"IOC: 192[.]168[.]1[.]1 and 2001[:]db8[:]0[:]0[:]0[:]0[:]0[:]1";
    for m in extractor.match_iter(input) {
        // as_str() returns the normalized (refanged) IP; zero-copy for normal input
        println!("{}", m.as_str());         // "192.168.1.1"

        // as_matched_str() returns exactly what was in the input
        println!("{}", m.as_matched_str()); // "192[.]168[.]1[.]1"

        // ip() parses to std::net::IpAddr
        println!("{:?}", m.ip());           // Ok(V4(192.168.1.1))
    }

    Ok(())
}

Supported notation

Type  Bracket  Example
IPv4  [.]      192[.]168[.]1[.]1
IPv6  [:]      2001[:]db8[:]0[:]0[:]0[:]0[:]0[:]1

Note: IPv6 defanged notation requires fully-expanded form — [::] compression is not supported.
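
The normalization step can be pictured with a short Python sketch. This is the pre-processing style of refanging, not the DFA-based matching that ip-extract performs; the refang helper is hypothetical and only handles the two bracket forms listed in the table.

```python
import ipaddress


def refang(matched: str) -> str:
    """Strip the defang brackets: [.] becomes '.', [:] becomes ':'."""
    return matched.replace("[.]", ".").replace("[:]", ":")


samples = (
    "192[.]168[.]1[.]1",
    "2001[:]db8[:]0[:]0[:]0[:]0[:]0[:]1",
    "8.8.8.8",  # normal notation passes through unchanged
)

for raw in samples:
    normalized = refang(raw)
    # ipaddress parses the normalized text; IPv6 is printed in compressed form.
    parsed = ipaddress.ip_address(normalized)
    print(f"{raw} -> {normalized} -> {parsed}")
```

Note that the parsed IPv6 address prints in its compressed form even though the defanged input must be fully expanded.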

Performance impact

Defang patterns are expanded into the DFA at compile time (+3KB binary size). There is no measurable regression on normal (fanged) input. On defanged input, the DFA approach is 16% faster than pre-processing normalization.

Benchmarks

Typical throughput on modern hardware:

Scenario                         Throughput
Dense IPs (mostly IP addresses)  160+ MiB/s
Sparse logs (mixed with text)    360+ MiB/s
Pure scanning (no IPs)           620+ MiB/s

Performance Architecture

ip-extract achieves high throughput through a two-stage design (build-time DFA compilation, then runtime scanning), followed by strict validation of each candidate:

  1. Compile-Time DFA (Build Phase)

    • Regex patterns compiled into dense Forward DFAs during build
    • DFA serialized and embedded in binary (~600KB)
    • Eliminates all runtime regex compilation
  2. Zero-Cost Scanning (Runtime)

    • O(n) byte scanning with lazy DFA initialization
    • Single forward pass, no backtracking
    • Validation only on candidates, not all scanned bytes
  3. Strict Validation

    • Hand-optimized[1] IPv4 parser (20-30% faster than std::net)
    • Boundary checking prevents false matches (e.g., 1.2.3.4.5 rejected)
    • Configurable filters for special ranges

[1] AI wrote all of this. It does not have hands.
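
The "scan for candidates, then validate only the candidates" split can be sketched in Python. This uses a regex and the stdlib ipaddress module rather than a compile-time DFA, so it illustrates the control flow only; extract_ipv4 and the candidate pattern are hypothetical, and accepting an address followed by a sentence-ending dot is an assumption of the sketch.

```python
import ipaddress
import re

# Stage 1: cheap candidate scan. Any run of digits and dots is a candidate.
CANDIDATE = re.compile(rb"[0-9.]+")


def extract_ipv4(data: bytes) -> list[str]:
    """Return validated IPv4 addresses found in a byte string."""
    found = []
    for match in CANDIDATE.finditer(data):
        # Stage 2: strict validation runs only on candidates.
        text = match.group(0).decode("ascii").strip(".")
        try:
            found.append(str(ipaddress.IPv4Address(text)))
        except ValueError:
            # Rejects 1.2.3.4.5 (too many octets), 999.1.1.1 (out of range), etc.
            continue
    return found


sample = b"ok 10.0.0.1, bad 1.2.3.4.5, big 999.1.1.1, end 8.8.8.8."
addresses = extract_ipv4(sample)
print(addresses)
```

Because the candidate run 1.2.3.4.5 is validated as a whole, no four-octet prefix of it leaks out as a false match.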

justips

Blazing fast, standalone IP address extraction.

justips finds and extracts IPv4 and IPv6 addresses from unstructured text as fast as possible. It is powered by the same compile-time DFA engine as geoipsed but purpose-built for raw extraction — a faster, validating alternative to grep -o.

Installation

cargo install justips

Usage

# Extract all IPs from a file
justips access.log

# Unique IPs, unordered (fastest dedup)
justips -u access.log

# Unique IPs, preserving first-seen order
justips -U access.log

# Extract from stdin
tail -f access.log | justips

# Filter for only routable IPs
justips --no-private --no-loopback --no-broadcast network.txt

# Multiple files
justips access.log error.log firewall.log

Options

Flag                  Description
-u, --unique          Deduplicate IPs (unordered, fastest)
-U, --unique-ordered  Deduplicate IPs, preserving first-seen order
--all                 Include all IPs (private, loopback, etc)
--no-private          Exclude RFC 1918 and ULA ranges
--no-loopback         Exclude 127.0.0.1 and ::1
--no-broadcast        Exclude broadcast and link-local ranges

Deduplication Modes

No dedup (default)

Streams IPs directly to stdout as they are found. Zero memory overhead — output begins immediately.

-u / --unique (unordered)

Hash-based dedup using HashSet. Each rayon chunk builds its own HashSet<String>, then all sets are merged. Output order is not guaranteed.

Best for: feeding into other tools where order doesn’t matter (e.g., enrichment pipelines, blocklist generation).

-U / --unique-ordered (first-seen order)

Order-preserving dedup using IndexSet. Each rayon chunk builds its own IndexSet<String>, then sets are merged in chunk order. The first occurrence of each IP determines its position in the output.

Best for: preserving chronological context (e.g., “which IPs appeared first in these logs?”).
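
The two merge strategies can be compared in a small Python sketch. It runs sequentially and uses a dict as an ordered set, whereas justips uses Rust's HashSet and IndexSet across rayon chunks; the function names and sample chunks are hypothetical.

```python
def dedup_chunk(ips):
    """Per-chunk ordered set: dict keys keep insertion order."""
    return dict.fromkeys(ips)


def merge_ordered(chunk_sets):
    """Merge per-chunk sets in chunk order so the first occurrence wins."""
    merged = {}
    for chunk in chunk_sets:  # chunk order mirrors position in the file
        for ip in chunk:
            merged.setdefault(ip, None)
    return list(merged)


# Three chunks, as if a file had been split into three pieces.
chunks = [
    ["8.8.8.8", "1.1.1.1", "8.8.8.8"],
    ["1.1.1.1", "9.9.9.9"],
    ["9.9.9.9", "8.8.4.4", "8.8.8.8"],
]

ordered = merge_ordered(dedup_chunk(c) for c in chunks)  # like -U
unordered = set().union(*chunks)                         # like -u

print(ordered)
print(sorted(unordered))  # sorted only to make the printout stable
```

Both modes yield the same set of addresses; only the ordered merge guarantees where each one appears.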

Performance

Benchmarked against a 1.7GB Suricata log dataset (15.4M lines, 30.7M IPs):

Mode                   Time   Overhead
Stream (default)       857ms  (baseline)
Unique unordered (-u)  925ms  +8%
Unique ordered (-U)    967ms  +13%

Architecture

  • Files: Memory-mapped (mmap) and split into ~4MB chunks at newline boundaries, processed in parallel with rayon
  • stdin: Line-buffered streaming via ripline (single-threaded, suitable for pipes)
  • Defang: Automatically recognizes and normalizes defanged IPs (192[.]168[.]1[.]1 → 192.168.1.1)
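
Splitting at newline boundaries matters because a chunk edge in the middle of a line could cut an address in half. A minimal Python sketch of that splitting step follows; split_at_newlines is a hypothetical helper, and the tiny target size stands in for the ~4MB used by justips.

```python
def split_at_newlines(data: bytes, target: int) -> list[bytes]:
    """Split data into chunks of roughly `target` bytes, each ending on a newline."""
    chunks = []
    start = 0
    total = len(data)
    while start < total:
        end = min(start + target, total)
        if end < total:
            # Extend the chunk to the next newline so no line is cut in half.
            newline = data.find(b"\n", end - 1)
            end = total if newline == -1 else newline + 1
        chunks.append(data[start:end])
        start = end
    return chunks


log = b"a 1.1.1.1\nb 2.2.2.2\nc 3.3.3.3\n"
parts = split_at_newlines(log, target=12)
for part in parts:
    print(part)
```

Each chunk can then be scanned independently, and concatenating the chunks reproduces the original input.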

When to use justips vs geoipsed

Need                            Tool
Raw list of IPs                 justips
Unique IPs from logs            justips -u or justips -U
IPs with geolocation metadata   geoipsed
Inline decoration of log lines  geoipsed
JSON output with IP positions   geoipsed --tag

Python: ipextract

A Python package for fast IP address extraction from text, powered by the ip-extract Rust crate via PyO3.

Installation

pip install ipextract

Requires Python 3.10+. Pre-built wheels are available for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x64). No Rust toolchain needed.

Quick Start

import ipextract

# Extract all IPs from a string
ips = ipextract.extract("Connection from 192.168.1.1 and 8.8.8.8")
# ["192.168.1.1", "8.8.8.8"]

# Deduplicate, preserving first-seen order
ips = ipextract.extract_unique("8.8.8.8 1.1.1.1 8.8.8.8")
# ["8.8.8.8", "1.1.1.1"]

# bytes input works too (useful when reading log files directly)
ips = ipextract.extract(b"host 10.0.0.1 connected")
# ["10.0.0.1"]

Reusable Extractor

For processing many lines (log analysis, batch jobs), create an Extractor once and reuse it. This avoids redundant initialization on each call.

extractor = ipextract.Extractor().only_public()

for line in log_file:
    ips = extractor.extract(line)
    if ips:
        process(ips)

Filtering

By default, all IP addresses are extracted — IPv4, IPv6, private ranges, loopback, and broadcast. Use the builder methods to filter:

# Only publicly routable IPs (excludes RFC 1918, loopback, broadcast)
e = ipextract.Extractor().only_public()

# Specific exclusions
e = ipextract.Extractor().ignore_private().ignore_loopback()

# Constructor kwargs for one-shot config
e = ipextract.Extractor(private=False, loopback=False, ipv6=False)

Fluent methods return a new Extractor — the original is not modified, making partial configs safe to reuse:

base = ipextract.Extractor()
public = base.only_public()   # new object
ipv4_only = base.ipv6(False)  # new object, base unchanged

Filter reference

Method                Effect
.only_public()        Exclude private, loopback, and broadcast
.ignore_private()     Exclude RFC 1918 (IPv4) and ULA/link-local (IPv6)
.ignore_loopback()    Exclude 127.0.0.0/8 and ::1
.ignore_broadcast()   Exclude 255.255.255.255 and link-local ranges
.ipv4(False)          Skip IPv4 entirely
.ipv6(False)          Skip IPv6 entirely
.private_ips(bool)    Enable/disable private IPs
.loopback_ips(bool)   Enable/disable loopback IPs
.broadcast_ips(bool)  Enable/disable broadcast IPs
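
For readers who want to see what these categories mean in terms of address ranges, here is a stdlib-only Python sketch. The grouping is one reading of the table above (IPv6 link-local under "private", IPv4 link-local under "broadcast") and is an assumption; the library's exact range lists may differ.

```python
import ipaddress


def nets(*cidrs):
    return [ipaddress.ip_network(c) for c in cidrs]


# Assumed range lists, based on the filter reference table.
PRIVATE = nets("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "fc00::/7", "fe80::/10")
LOOPBACK = nets("127.0.0.0/8", "::1/128")
BROADCAST = nets("255.255.255.255/32", "169.254.0.0/16")


def in_any(ip, networks) -> bool:
    # Compare only against networks of the same IP version.
    return any(ip.version == n.version and ip in n for n in networks)


def classify(text: str) -> str:
    ip = ipaddress.ip_address(text)
    if in_any(ip, LOOPBACK):
        return "loopback"
    if in_any(ip, PRIVATE):
        return "private"
    if in_any(ip, BROADCAST):
        return "broadcast"
    return "public"


sample = ["10.0.0.1", "127.0.0.1", "255.255.255.255", "169.254.1.1",
          "8.8.8.8", "fd00::1", "fe80::1", "::1", "2001:4860:4860::8888"]
public = [ip for ip in sample if classify(ip) == "public"]
print(public)
```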

Extraction Methods

extract(text) -> list[str]

Returns all IP addresses found, in order of appearance.

ipextract.Extractor().extract("a 1.1.1.1 b 2.2.2.2")
# ["1.1.1.1", "2.2.2.2"]

extract_unique(text) -> list[str]

Returns unique IP addresses, preserving first-seen order.

ipextract.Extractor().extract_unique("1.1.1.1 2.2.2.2 1.1.1.1")
# ["1.1.1.1", "2.2.2.2"]

extract_with_offsets(text) -> list[tuple[str, int, int]]

Returns (ip, start, end) tuples. The byte offsets index directly into the original input — useful for annotation, highlighting, or structured log parsing.

text = "host 1.2.3.4 port 80"
for ip, start, end in ipextract.Extractor().extract_with_offsets(text):
    print(f"{ip} at [{start}:{end}]")
    assert text[start:end] == ip
# 1.2.3.4 at [5:12]
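
One way to use such offsets for highlighting is to splice markers into the text from right to left, so earlier offsets stay valid. The sketch below is stdlib-only: find_with_offsets is a hypothetical stand-in that produces the same (ip, start, end) tuple shape, and the text is ASCII so offsets and string indexes coincide.

```python
def find_with_offsets(text, ips):
    """Hypothetical stand-in yielding (ip, start, end) tuples for known IPs."""
    matches = []
    for ip in ips:
        start = text.find(ip)
        if start != -1:
            matches.append((ip, start, start + len(ip)))
    return matches


def highlight(text, matches, left="[", right="]"):
    """Wrap each matched span in markers, working from the end of the text."""
    out = text
    for _ip, start, end in sorted(matches, key=lambda m: m[1], reverse=True):
        out = out[:start] + left + out[start:end] + right + out[end:]
    return out


text = "host 1.2.3.4 port 80 peer 8.8.8.8"
matches = find_with_offsets(text, ["1.2.3.4", "8.8.8.8"])
highlighted = highlight(text, matches)
print(highlighted)
```

Processing matches in reverse order avoids recomputing offsets after each insertion.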

Performance

ipextract is designed for high-throughput applications. It uses a compile-time DFA (Deterministic Finite Automaton) from the Rust ip-extract crate, which scans at O(n) without backtracking.

Typical benchmark results comparing ipextract to Python re + ipaddress.ip_address() validation (both sides extract and validate):

Scenario                         re + ipaddress (ms)  ipextract (ms)  Speedup
Dense IPs (1000 mixed v4+v6)     2.3                  0.25            9x
Sparse Logs (1000 IPs in noise)  7.4                  0.46            16x
Pure Text (100KB with zero IPs)  4.0                  0.16            25x
Defanged IPs (1000 mixed)        2.5                  0.35            7x

The larger the input and the more non-IP text it contains, the greater the performance advantage. ipextract excels at high-speed log scanning because it can reject non-IP text much faster than a backtracking regex engine.

Module-Level Convenience Functions

ipextract.extract() and ipextract.extract_unique() are shorthand for Extractor().extract() and Extractor().extract_unique() with default settings (all IPs included). For repeated calls, prefer creating an Extractor instance explicitly.

Source

The ipextract Python package lives in crates/ipextract-py/ and wraps the ip-extract Rust crate.

API Documentation