# Usage Guide

This guide covers all the ways to use nwgrep for searching and filtering dataframes.
## Basic Search

Search for a pattern across all columns:
```python
import pandas as pd
from nwgrep import nwgrep

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Eve"],
    "email": ["alice@example.com", "bob@test.org", "eve@example.com"],
    "status": ["active", "locked", "active"],
})

# Find all rows containing "active"
result = nwgrep(df, "active")
print(result)
```
Output:

```
    name              email  status
0  Alice  alice@example.com  active
2    Eve    eve@example.com  active
```
## Search Options
### Case-Insensitive Search

Pass `case_sensitive=False` to ignore letter case when matching.
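Elsewhere in this guide the option is spelled `case_sensitive=False` (for example, `df.grep("ACTIVE", case_sensitive=False)` in the accessor examples). Its effect can be sketched in plain pandas; this illustrates the behavior, not nwgrep's implementation:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Eve"],
    "status": ["ACTIVE", "locked", "Active"],
})

# With nwgrep: result = nwgrep(df, "active", case_sensitive=False)
# Plain-pandas sketch: ignore letter case in every column
mask = df.apply(
    lambda col: col.astype(str).str.contains("active", case=False, regex=False)
).any(axis=1)
result = df[mask]  # keeps the "ACTIVE" and "Active" rows
```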
### Invert Match

Return rows that *don't* match the pattern (like `grep -v`) by passing `invert=True`.
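Inversion can be sketched in plain pandas by negating the match mask (assuming `invert=True`, the parameter used in the data-quality examples later in this guide; this is an illustration, not nwgrep's implementation):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Eve"],
    "status": ["active", "locked", "active"],
})

# With nwgrep: result = nwgrep(df, "active", invert=True)
# Plain-pandas sketch: build the match mask, then keep non-matching rows
mask = df.apply(
    lambda col: col.astype(str).str.contains("active", regex=False)
).any(axis=1)
result = df[~mask]  # keeps only Bob's row
```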
### Column-Specific Search

Search only in specific columns:
```python
# Only search in the email column
result = nwgrep(df, "example.com", columns=["email"])
print(result)
```
Output:

```
    name              email  status
0  Alice  alice@example.com  active
2    Eve    eve@example.com  active
```
### Regex Search

Use regular expressions for complex patterns:
```python
# Find emails with .com domain
result = nwgrep(df, r"\.com$", regex=True, columns=["email"])

# Find names starting with A or E
result = nwgrep(df, r"^(A|E)", regex=True, columns=["name"])
```
### Multiple Patterns

Search for any of multiple patterns (OR logic) by passing a list of patterns.
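The log-analysis example later in this guide passes a list of patterns (`nwgrep(df, ["ERROR", "WARN"], ...)`). OR logic over several patterns can be sketched in plain pandas as follows (an illustration, not nwgrep's implementation):

```python
import pandas as pd

df = pd.DataFrame({
    "level": ["INFO", "ERROR", "WARN"],
    "message": ["Started", "Connection failed", "Slow query"],
})

patterns = ["ERROR", "WARN"]

# With nwgrep: result = nwgrep(df, ["ERROR", "WARN"])
# Plain-pandas sketch: a row matches if any cell contains any of the patterns
mask = df.apply(
    lambda col: col.astype(str).apply(
        lambda cell: any(p in cell for p in patterns)
    )
).any(axis=1)
result = df[mask]  # keeps the ERROR and WARN rows
```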
### Whole Word Matching

Match complete words only:
```python
df = pd.DataFrame({
    "text": ["activate", "active", "actor"]
})

# Only matches "active", not "activate" or "actor"
result = nwgrep(df, "active", whole_word=True)
```
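Whole-word matching is conventionally implemented with regex word boundaries (`\b`); a plain-pandas sketch of the behavior above (an illustration, not nwgrep's internals):

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["activate", "active", "actor"]
})

# \b is a word boundary, so "activate" and "actor" do not match
mask = df["text"].str.contains(r"\bactive\b", regex=True)
result = df[mask]  # keeps only the "active" row
```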
## Usage Patterns
### Method 1: Direct Function Call

The simplest approach is to call `nwgrep()` directly, as shown in the Basic Search example above.

**Best for:** Simple scripts, one-off searches, maximum compatibility.
### Method 2: Pipe Method

Use with pandas/polars `.pipe()` for functional-style data pipelines:
```python
result = df.pipe(nwgrep, "pattern")

# Chain multiple filters
result = (
    df
    .pipe(nwgrep, "active")
    .pipe(nwgrep, "example.com", columns=["email"])
    .pipe(lambda x: x.sort_values("name"))
)
```
**Best for:** Data pipelines, functional programming style, method chaining.
### Method 3: Accessor Method

Register `.grep()` as a dataframe method for the cleanest syntax:
```python
from nwgrep import register_grep_accessor

# Register once at the start of your script/notebook
register_grep_accessor()

# Now use .grep() directly
result = df.grep("active")
result = df.grep("ACTIVE", case_sensitive=False)
result = df.grep("pattern", columns=["email"])
```
**Best for:** Interactive use (notebooks), frequent searching, cleanest syntax.
> **Note:** The accessor method works with both pandas and polars DataFrames.
### Method 4: Narwhals Native

Work directly with Narwhals objects in backend-agnostic code:
```python
import narwhals as nw
from nwgrep import nwgrep

# Accept any backend
def process_data(df_native):
    df = nw.from_native(df_native)
    result = nwgrep(df, "pattern")
    return nw.to_native(result)

# Works with pandas
import pandas as pd
df_pandas = pd.DataFrame({"col": ["a", "b"]})
result = process_data(df_pandas)

# Works with polars
import polars as pl
df_polars = pl.DataFrame({"col": ["a", "b"]})
result = process_data(df_polars)
```
**Best for:** Library code, backend-agnostic functions, writing reusable components.
## Working with Different Backends

nwgrep works seamlessly across the backends supported by Narwhals, including pandas, polars, dask, and cuDF.
## Highlighting Results

For pandas and polars backends, you can highlight the specific cells containing matches in notebooks (Jupyter, Marimo):
```python
from nwgrep import nwgrep
import polars as pl

df = pl.DataFrame({
    "timestamp": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "level": ["INFO", "ERROR", "WARN"],
    "message": ["Started", "Connection failed", "Slow query"]
})

# Highlight cells containing "ERROR" with yellow background
result = nwgrep(df, "ERROR", highlight=True)

# In Jupyter/Marimo, only cells containing the match are highlighted
```
**Styling Requirements:**

- **Pandas**: uses the built-in `pandas.Styler` (no additional dependencies)
- **Polars**: requires the `great-tables` library (install with `uv add 'nwgrep[notebook]'`)

**Highlighting Features:**

- Only cells containing the matched text are highlighted with a yellow background
- Returns a styled object that displays beautifully in notebooks
- Works across all columns, highlighting only the specific cells with matches
- Incompatible with `count=True` (raises `ValueError`)
- Works with all search options (regex, case-insensitive, column filtering, etc.)
## Advanced Examples

### Email Domain Search
```python
df = pd.DataFrame({
    "user": ["alice", "bob", "charlie"],
    "email": ["alice@gmail.com", "bob@company.com", "charlie@gmail.com"]
})

# Find all Gmail users
gmail_users = nwgrep(df, "gmail.com", columns=["email"])
```
### Log File Analysis
```python
df = pd.DataFrame({
    "timestamp": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "level": ["INFO", "ERROR", "WARN"],
    "message": ["Started", "Connection failed", "Slow query"]
})

# Find all errors and warnings
issues = nwgrep(df, ["ERROR", "WARN"], columns=["level"])

# Find connection-related messages
conn_logs = nwgrep(df, "connection", case_sensitive=False)
```
### Data Quality Checks
```python
# Find rows with email addresses (regex)
has_email = nwgrep(df, r"\w+@\w+\.\w+", regex=True)

# Find rows without phone numbers (invert)
no_phone = nwgrep(df, r"\d{3}-\d{3}-\d{4}", regex=True, invert=True)
```
### Complex Pipeline
```python
from nwgrep import register_grep_accessor

register_grep_accessor()

# Chain multiple operations
result = (
    df
    .grep("active", columns=["status"])       # Only active users
    .grep("@company.com", columns=["email"])  # Company emails
    .grep("admin", invert=True)               # Exclude admins
    .sort_values("name")                      # Sort by name
    .reset_index(drop=True)
)
```
## Performance Tips

### Use Column Filtering

When you know which columns contain your data, specify them:
```python
# Faster - only searches the email column
result = nwgrep(df, "example.com", columns=["email"])

# Slower - searches all columns
result = nwgrep(df, "example.com")
```
### Leverage Lazy Evaluation

With polars or daft, use lazy frames for better performance:
```python
import polars as pl

# Lazy - builds a query plan, executes once
df = (
    pl.scan_parquet("huge_file.parquet")
    .pipe(nwgrep, "pattern")
    .collect()
)
```
### Choose the Right Backend

Different backends excel at different tasks:

- **pandas**: best for small-to-medium data and interactive work
- **polars**: best for large data and complex transformations
- **dask**: best for data larger than memory and distributed computing
- **cuDF**: best when you have GPU acceleration available
## Common Patterns

### Quick Data Exploration
```python
from nwgrep import register_grep_accessor

register_grep_accessor()

# Quick searches in notebooks
df.grep("TODO")                         # Find TODO items
df.grep("@")                            # Find rows with email addresses
df.grep("error", case_sensitive=False)  # Find errors
```
### Data Cleaning
```python
# Remove test/dummy data
clean_df = df.pipe(nwgrep, "test", invert=True)

# Keep only valid email addresses
valid_emails = df.pipe(
    nwgrep,
    r"^[\w\.-]+@[\w\.-]+\.\w+$",
    regex=True,
    columns=["email"]
)
```
### Filtering Pipelines
```python
def filter_active_users(df):
    return df.pipe(nwgrep, "active", columns=["status"])

def filter_premium(df):
    return df.pipe(nwgrep, "premium", columns=["tier"])

# Compose filters
result = (
    raw_data
    .pipe(filter_active_users)
    .pipe(filter_premium)
)
```
## Next Steps

- **API Reference**: complete parameter documentation
- **CLI Reference**: command-line usage
- **Contributing**: help improve nwgrep