Roadmap¶
Phase 0 — Foundations ✅ COMPLETE¶
- Repo setup & packaging
- Core Inspector class
- Pandas support
- Streamlit rendering shell
Phase 1 — MVP ✅ COMPLETE¶
- Null detection & visualization
- Type inconsistency detection
- Basic outlier detection
- CSV upload UI
- Findings export (JSON/CSV)
Phase 2 — Power Features ✅ COMPLETE¶
- Polars support
- Rule authoring UI
- Dataset comparison (drift)
- Performance optimizations
- Theme support
Phase 3 — Ecosystem Integration ✅ COMPLETE¶
- Pandera rule export
- Great Expectations export
- CLI wrapper
- Streamlit Cloud demo app
Phase 4 — Advanced Ghosts ✅ COMPLETE¶
- Time-series anomalies
- Cross-column logic
- ML-assisted anomaly detection
- Collaboration features
Phase 5 — Quick Wins ✅ COMPLETE¶
Based on research recommendations (see docs/RESEARCH_PYTHON_PACKAGES.md):
- Rich CLI output enhancement
- orjson JSON serialization
- Hypothesis property-based testing
- python-dotenv configuration
Phase 6 — Feature Enhancements ✅ COMPLETE¶
- Typer CLI framework migration
- PyOD anomaly detection (40+ ML algorithms)
- Ruptures change point detection
- scipy.stats statistical tests for drift detection
- PyArrow Parquet export
- Faker test data generation
- ydata-profiling integration
Phase 7 — Advanced Integrations ✅ COMPLETE¶
- Plotly interactive visualizations
- tsfresh time-series features
- Streamlit Extras UI components
- SQLAlchemy database backend
- Joblib parallel detector execution (deferred to future release)
Success Metrics¶
- ✅ <5 lines to usable dashboard
- ⚠️ Sub-2s load for 100k rows (varies by system)
- ✅ Clear ghost explanations
Future Enhancements (Post-Phase 7)¶
- Cloud-based collaboration storage integration
- Real-time collaboration features (WebSocket-based)
- Advanced ML models (deep learning, autoencoders)
- Time-series forecasting integration (Prophet)
- Automated rule suggestion based on ML findings
- Performance benchmarking and optimization for large-scale datasets
- Dask integration for cluster-scale deployments
Stretch Ideas¶
- VS Code extension
- Jupyter widget
- dbt integration