nwgrep

Grep your dataframes

Search and filter dataframes with grep-like patterns. Works with pandas, polars, and any backend supported by Narwhals.

License: MIT

# Find what you're looking for
df.grep("active")              # Simple search
df.grep("@gmail.com")          # Find patterns
df.grep(r"^\d{3}-\d{4}$", regex=True)  # Regex support

What is nwgrep?

nwgrep brings the familiar power of grep to dataframes. Search across columns, filter by patterns, use regex - all with a simple, intuitive interface that works seamlessly with any dataframe library thanks to Narwhals.

Why nwgrep?

  • Familiar Interface - If you know grep, you know nwgrep. Same flags (-i, -v, -E), same intuition.
  • Backend Agnostic - Write once, run anywhere. Switch from pandas to polars without changing your code.
  • Simple to Use - Three ways to use: function call, pipe method, or accessor. Choose what feels natural.
  • Lightning Fast - Lazy evaluation with polars/daft. Process multi-GB files efficiently.
  • Type Safe - Full type hints. Catch errors before runtime with ty.

Quick Start

Install with your preferred backend:

uv add "nwgrep[polars]"  # or pandas, dask, pyarrow, cudf (quoted so the shell does not expand the brackets)

Search your data:

import pandas as pd
from nwgrep import nwgrep

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Eve"],
    "status": ["active", "locked", "active"],
})

# Find all rows containing "active"
result = nwgrep(df, "active")

That's it. No complex queries, no backend-specific syntax.

Three Ways to Use

Choose the style that fits your workflow:

Function call: simple and explicit.

from nwgrep import nwgrep
result = nwgrep(df, "active")

Best for: Simple scripts, one-off searches, maximum clarity.

Pipe method: functional style for data pipelines.

result = (
    df
    .pipe(nwgrep, "active")
    .pipe(nwgrep, "@example.com", columns=["email"])
    .pipe(lambda x: x.sort_values('name'))
)

Best for: Data pipelines, method chaining, functional programming.
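
The pipe style works because `pipe(func, *args, **kwargs)` calls `func` with the dataframe as its first argument, which is the documented contract of pandas' `DataFrame.pipe`. The sketch below illustrates that forwarding with a tiny stand-in class. `Table` and `keep_containing` are hypothetical names invented for this illustration and are not part of nwgrep or pandas.

```python
class Table:
    """Tiny stand-in for a dataframe, just enough to show how pipe forwards calls."""

    def __init__(self, rows):
        self.rows = rows

    def pipe(self, func, *args, **kwargs):
        # Same contract as pandas' DataFrame.pipe: pass self as the first argument.
        return func(self, *args, **kwargs)


def keep_containing(table, text, columns=None):
    """Hypothetical filter: keep rows where any searched cell contains `text`."""
    kept = []
    for row in table.rows:
        cells = [row[c] for c in columns] if columns else list(row.values())
        if any(text in str(cell) for cell in cells):
            kept.append(row)
    return Table(kept)


rows = [
    {"name": "Alice", "email": "alice@example.com", "status": "active"},
    {"name": "Bob", "email": "bob@other.org", "status": "active"},
    {"name": "Eve", "email": "eve@example.com", "status": "locked"},
]

result = (
    Table(rows)
    .pipe(keep_containing, "active")                            # Alice and Bob remain
    .pipe(keep_containing, "@example.com", columns=["email"])   # only Alice remains
)
print([row["name"] for row in result.rows])
```

Each step receives the output of the previous one, so chained filters combine with AND logic.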

Accessor: cleanest syntax for interactive use.

from nwgrep import register_grep_accessor
register_grep_accessor()  # Once at startup

df.grep("active")
df.grep("ALICE", case_sensitive=False)
df.grep("example.com", columns=["email"])

Best for: Notebooks, interactive analysis, frequent searching.

Command line: search binary formats directly.

nwgrep "error" logfile.parquet
nwgrep -i "warning" data.feather
nwgrep --format ndjson "pattern" data.parquet | jq .

Best for: Shell scripts, one-liners, exploring data files.
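
NDJSON output can be consumed by any tool that reads one JSON object per line, not only jq. As a minimal sketch, the Python reader below parses such a stream. The sample records and their field names are made up for illustration and do not reflect the actual columns the CLI would print for your file.

```python
import json


def read_ndjson(lines):
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample standing in for the CLI's NDJSON output.
sample = (
    '{"level": "ERROR", "msg": "disk full"}\n'
    '{"level": "CRITICAL", "msg": "out of memory"}\n'
)

records = list(read_ndjson(sample.splitlines()))
print([record["level"] for record in records])
```

In a shell pipeline, passing `sys.stdin` to `read_ndjson` instead of the sample string would process the rows as they arrive.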

Powerful Search Options

All the grep features you know and love:

# Case-insensitive search
df.grep("ACTIVE", case_sensitive=False)

# Invert match (like grep -v)
df.grep("test", invert=True)

# Regex patterns
df.grep(r".*@example\.com", regex=True)

# Multiple patterns (OR logic)
df.grep(["Alice", "Bob"])

# Whole word matching
df.grep("active", whole_word=True)

# Column-specific search
df.grep("pattern", columns=["name", "email"])
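
To make the meaning of these options concrete, here is a plain-Python sketch of the matching rules over a list of row dicts. It is not nwgrep's implementation: `grep_rows` is a hypothetical helper, and details such as using `\b` for whole-word matching are assumptions made for illustration.

```python
import re


def grep_rows(rows, patterns, *, case_sensitive=True, invert=False,
              regex=False, whole_word=False, columns=None):
    """Keep rows (dicts) where any searched cell matches any pattern."""
    if isinstance(patterns, str):
        patterns = [patterns]
    # Literal patterns are escaped so characters like "." match themselves.
    parts = [p if regex else re.escape(p) for p in patterns]
    combined = "|".join(f"(?:{p})" for p in parts)  # OR logic across patterns
    if whole_word:
        combined = rf"\b(?:{combined})\b"  # assumed whole-word rule
    flags = 0 if case_sensitive else re.IGNORECASE
    matcher = re.compile(combined, flags)

    kept = []
    for row in rows:
        cells = [row[c] for c in columns] if columns else list(row.values())
        hit = any(matcher.search(str(cell)) for cell in cells)
        if hit != invert:  # invert=True keeps the non-matching rows
            kept.append(row)
    return kept


rows = [
    {"name": "Alice", "status": "active"},
    {"name": "Bob", "status": "inactive"},
    {"name": "Eve", "status": "ACTIVE"},
]

print(grep_rows(rows, "active"))                            # Alice and Bob (substring match)
print(grep_rows(rows, "active", whole_word=True))           # Alice only
print(grep_rows(rows, "ACTIVE", case_sensitive=False))      # all three rows
print(grep_rows(rows, "active", invert=True))               # Eve only
print(grep_rows(rows, ["Alice", "Eve"], columns=["name"]))  # Alice and Eve
```

Note how a plain substring search for "active" also hits "inactive", which is the case whole-word matching is meant to exclude.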

Backend Support

Works seamlessly with any dataframe library:

Backend   Notes
pandas    Full support
polars    DataFrame and LazyFrame
pyarrow   Table support
dask      Distributed dataframes
daft      Lazy evaluation
cuDF      GPU acceleration
modin     Parallel pandas

Same code, any backend. Switch freely without rewriting your filters.

Real-World Examples

Find Active Users

users = df.grep("active", columns=["status"])
gmail_users = df.grep("@gmail.com", columns=["email"])

Log Analysis

errors = df.grep(["ERROR", "CRITICAL"], columns=["level"])

Data Quality Checks

# Find rows without email addresses
missing_email = df.grep(r"\w+@\w+\.\w+", regex=True, invert=True)
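
It helps to check what the pattern accepts before trusting an inverted filter. The sketch below uses Python's standard `re` module on a few made-up cell values, assuming the pattern is searched anywhere within a cell rather than matched against the whole value.

```python
import re

EMAIL_LIKE = re.compile(r"\w+@\w+\.\w+")

# Hypothetical cell values, chosen to show hits and misses.
cells = [
    "alice@example.com",            # plain address
    "first.last@mail.example.org",  # matches on the "last@mail.example" part
    "bob at example.com",           # no "@" sign
    "n/a",                          # placeholder value
    "user@my-host.com",             # "\w" does not cover "-", so this is missed
]

for cell in cells:
    print(f"{cell!r}: {bool(EMAIL_LIKE.search(cell))}")
```

The last value shows a limitation: a row whose only address contains a hyphenated domain would be reported as missing an email, so a broader character class may be needed for real data.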

Pipeline Filtering

result = (
    df
    .grep("active", columns=["status"])      # Active users
    .grep("@company.com", columns=["email"]) # Company emails
    .grep("admin", invert=True)              # Exclude admins
)

Why Narwhals?

Narwhals provides a unified API across dataframe libraries. This means:

  • Write once, run anywhere - Same code for pandas, polars, or any backend
  • No vendor lock-in - Switch backends without rewriting code
  • Automatic optimization - Each backend uses its strengths
  • Future-proof - Support for new backends as they emerge

nwgrep is a certified Narwhals plugin, enabling truly backend-agnostic filtering.

Next Steps

  • Installation - Get nwgrep installed with your preferred backends (Installation Guide)
  • Usage - Learn all the ways to search and filter your data (Usage Examples)
  • API Reference - Complete function and parameter documentation
  • CLI Reference - Command-line interface for binary formats

Credit

Built using Narwhals for dataframe abstraction.

Special thanks to Claude and Gemini for their assistance in developing this project.