LavenderTown

A Streamlit-first Python package for detecting and visualizing "data ghosts": type inconsistencies, nulls, invalid values, schema drift, and anomalies in tabular datasets.

LavenderTown helps you quickly identify data quality issues in your datasets through an intuitive, interactive Streamlit interface. Perfect for data scientists, analysts, and engineers who need to understand their data quality before diving into analysis.

Features

Version Information

See Version Mapping for details on when features were introduced.

Quick Start

import streamlit as st
from lavendertown import Inspector
import pandas as pd

# Load your data
df = pd.read_csv("your_data.csv")

# Create inspector and render
inspector = Inspector(df)
inspector.render()  # This must be called within a Streamlit app context

That's it! Save this code in a file (e.g., app.py) and launch it with the command streamlit run app.py to see the interactive data quality dashboard.

Installation

pip install lavendertown

For optional features:

# Polars support
pip install lavendertown[polars]

# Ecosystem integrations
pip install lavendertown[pandera]
pip install lavendertown[great_expectations]

# Enhanced CLI with Rich formatting
pip install lavendertown[cli]

# ML and time-series features
pip install lavendertown[ml]          # PyOD + scikit-learn for 40+ ML algorithms
pip install lavendertown[timeseries]  # Ruptures + statsmodels + tsfresh for time-series analysis
pip install lavendertown[profiling]   # ydata-profiling for comprehensive reports
pip install lavendertown[parquet]     # PyArrow for Parquet export
pip install lavendertown[stats]       # scipy.stats for statistical tests

# Phase 7 features (v0.7.0)
pip install lavendertown[plotly]      # Plotly for interactive visualizations
pip install lavendertown[ui]          # Streamlit Extras for enhanced UI components
pip install lavendertown[database]    # SQLAlchemy for database backend

# All optional dependencies
pip install lavendertown[all]
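
To see which optional backends are importable in the current environment, a small standard-library check can help. This is only a sketch: the module names below are the usual import names of the third-party packages mentioned in the comments above, and the exact set of modules each extra installs is an assumption, not something LavenderTown documents here.

```python
from importlib.util import find_spec

# Usual import names of some third-party packages named above.
# Which extra pulls in which module is assumed, not confirmed.
OPTIONAL_MODULES = [
    "polars",
    "pandera",
    "great_expectations",
    "pyod",
    "sklearn",
    "pyarrow",
    "scipy",
    "plotly",
    "sqlalchemy",
]


def available_modules(names):
    """Map each top-level module name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}


status = available_modules(OPTIONAL_MODULES)
for name, present in status.items():
    print(f"{name:20s} {'installed' if present else 'missing'}")
```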

Documentation

Ghost Categories

LavenderTown detects four main categories of data quality issues:

  1. Structural Ghosts - Mixed dtypes, schema drift, unexpected nullability
  2. Value Ghosts - Out-of-range values, regex violations, enum violations
  3. Completeness Ghosts - Null density thresholds, conditional nulls
  4. Statistical Ghosts - Outliers (IQR method), distribution shifts
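
To make the four categories concrete, here is a minimal plain-Python sketch of one check per category. It is an illustration of the ideas only, not LavenderTown's implementation or API: the function names, the sample data, and the choice of the "inclusive" quartile method are all assumptions made for this example.

```python
from statistics import quantiles


def mixed_types(values):
    """Structural: set of Python type names among the non-null values."""
    return {type(v).__name__ for v in values if v is not None}


def out_of_range_indices(values, low, high):
    """Value: positions of non-null values outside [low, high]."""
    return [
        i for i, v in enumerate(values)
        if v is not None and not (low <= v <= high)
    ]


def null_density(values):
    """Completeness: fraction of entries that are None."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def iqr_outlier_indices(values, k=1.5):
    """Statistical: positions outside [Q1 - k*IQR, Q3 + k*IQR]."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return []  # quartiles are not meaningful for fewer than two points
    q1, _, q3 = quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    return out_of_range_indices(values, q1 - k * iqr, q3 + k * iqr)


# Hypothetical sample columns
ages = [10, 12, 11, None, 13, 12, 11, 95]
codes = [1, "2", 3.0, None]

print(sorted(mixed_types(codes)))         # ['float', 'int', 'str']
print(out_of_range_indices(ages, 0, 90))  # [7]
print(null_density(ages))                 # 0.125
print(iqr_outlier_indices(ages))          # [7]
```

Each function returns either row positions or a single summary number.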

Each finding includes:

  • Ghost type: Category of the issue
  • Column: Affected column name
  • Severity: info, warning, or error
  • Description: Human-readable explanation
  • Row indices: Specific rows affected (when applicable)
  • Metadata: Additional diagnostic information
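
One way to picture a finding is as a small record type with the fields listed above. The sketch below is hypothetical: the class name FindingSketch, its attribute names, and the severity ordering used for sorting are assumptions for illustration, not LavenderTown's actual classes.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Severity levels named in the text, ordered least to most severe (ordering assumed).
SEVERITIES = ("info", "warning", "error")


@dataclass
class FindingSketch:
    """Hypothetical record mirroring the fields of a finding."""

    ghost_type: str
    column: str
    severity: str
    description: str
    row_indices: Optional[list[int]] = None  # None when not applicable
    metadata: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        # Reject severities other than the three named in the text.
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")


findings = [
    FindingSketch("completeness", "email", "info", "3% of values are null"),
    FindingSketch(
        "statistical", "age", "error", "1 outlier by the IQR method",
        row_indices=[7], metadata={"method": "iqr"},
    ),
    FindingSketch("structural", "code", "warning", "mixed int and str values"),
]

# Show the most severe findings first.
ranked = sorted(findings, key=lambda f: SEVERITIES.index(f.severity), reverse=True)
for f in ranked:
    print(f"[{f.severity}] {f.column}: {f.description}")
```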

  • GitHub Repository: https://github.com/eddiethedean/lavendertown
  • PyPI Package: https://pypi.org/project/lavendertown/
  • Issues: https://github.com/eddiethedean/lavendertown/issues

Made with ❤️ for the data quality community