Keyboard shortcuts

Press ← or β†’ to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

ip-extract

A fast IP address extraction library for Rust.

Extract IPv4 and IPv6 addresses from unstructured text with minimal overhead. This crate powers the core extraction engine for geoipsed and is designed for high-throughput scanning of large datasets.

Features

  • ⚑ Performance Optimized: Compile-time DFA with O(n) scanning, no runtime regex compilation
  • 🎯 Strict Validation: Deep validation eliminates false positives (e.g., rejects 1.2.3.4.5)
  • πŸ›‘οΈ Defang Support: Automatically matches defanged IPs (192[.]168[.]1[.]1, 2001[:]db8[:]...) with negligible overhead
  • βš™οΈ Configurable: Fine-grained control over address types (private, loopback, broadcast)
  • πŸ”’ Byte-Oriented: Zero-copy scanning directly on byte slices, no UTF-8 validation overhead

Basic Example

Extract all IP addresses (default behavior):

use ip_extract::ExtractorBuilder;

fn main() -> anyhow::Result<()> {
    // Extracts all IPs: IPv4, IPv6, private, loopback, broadcast
    let extractor = ExtractorBuilder::new().build()?;

    let input = b"Connection from 192.168.1.1 and 8.8.8.8";

    for range in extractor.find_iter(input) {
        let ip = std::str::from_utf8(&input[range]).unwrap();
        println!("Found IP: {}", ip);
    }

    Ok(())
}

Configuration Examples

#![allow(unused)]
fn main() {
use ip_extract::ExtractorBuilder;

// Extract only public IPs (recommended for most use cases)
let extractor = ExtractorBuilder::new()
    .only_public()
    .build()?;

// Extract only IPv4, ignoring loopback
let extractor = ExtractorBuilder::new()
    .ipv6(false)
    .ignore_loopback()
    .build()?;

// Fine-grained control
let extractor = ExtractorBuilder::new()
    .ipv4(true)
    .ipv6(true)
    .ignore_private()
    .ignore_broadcast()
    .build()?;
}

Defanged IP Support

Threat intelligence reports and security logs commonly use β€œdefanged” IPs to prevent accidental connections. ip-extract recognizes these automatically β€” no opt-in needed.

#![allow(unused)]
fn main() {
let extractor = ExtractorBuilder::new().build()?;

let input = b"IOC: 192[.]168[.]1[.]1 and 2001[:]db8[:]0[:]0[:]0[:]0[:]0[:]1";
for m in extractor.match_iter(input) {
    // as_str() returns the normalized (refanged) IP β€” zero-copy for normal input
    println!("{}", m.as_str());         // "192.168.1.1"

    // as_matched_str() returns exactly what was in the input
    println!("{}", m.as_matched_str()); // "192[.]168[.]1[.]1"

    // ip() parses to std::net::IpAddr
    println!("{:?}", m.ip());           // Ok(V4(192.168.1.1))
}
}

Supported notation

TypeBracketExample
IPv4[.]192[.]168[.]1[.]1
IPv6[:]2001[:]db8[:]0[:]0[:]0[:]0[:]0[:]1

Note: IPv6 defanged notation requires fully-expanded form β€” [::] compression is not supported.

Performance impact

Defang patterns are expanded into the DFA at compile time (+3KB binary size). There is no measurable regression on normal (fanged) input. On defanged input, the DFA approach is 16% faster than pre-processing normalization.

Benchmarks

Typical throughput on modern hardware:

ScenarioThroughput
Dense IPs (mostly IP addresses)160+ MiB/s
Sparse logs (mixed with text)360+ MiB/s
Pure scanning (no IPs)620+ MiB/s

Performance Architecture

ip-extract achieves maximum throughput through a two-stage design:

  1. Compile-Time DFA (Build Phase)

    • Regex patterns compiled into dense Forward DFAs during build
    • DFA serialized and embedded in binary (~600KB)
    • Eliminates all runtime regex compilation
  2. Zero-Cost Scanning (Runtime)

    • O(n) byte scanning with lazy DFA initialization
    • Single forward pass, no backtracking
    • Validation only on candidates, not all scanned bytes
  3. Strict Validation

    • Hand-optimized1 IPv4 parser (20-30% faster than std::net)
    • Boundary checking prevents false matches (e.g., 1.2.3.4.5 rejected)
    • Configurable filters for special ranges

  1. AI wrote all of this. It does not have hands. ↩