Base Detector
Base class for all ghost detectors.
GhostDetector
Bases: ABC
Abstract base class for all ghost detectors.
Detectors are stateless and UI-agnostic modules that analyze DataFrames
and return normalized GhostFinding objects. They implement a single
detect() method that takes a DataFrame (Pandas or Polars) and
returns a list of findings.
Detectors should be designed to work with both Pandas and Polars DataFrames by detecting the backend and using the appropriate API.
Subclasses must implement the detect() method. The get_name()
method provides a default implementation that returns the class name,
but can be overridden for custom naming.
Example
Implement a custom detector::

    from lavendertown.detectors.base import GhostDetector, detect_dataframe_backend
    from lavendertown.models import GhostFinding


    class CustomDetector(GhostDetector):
        def detect(self, df) -> list[GhostFinding]:
            # Determine whether df is a Pandas or Polars DataFrame
            backend = detect_dataframe_backend(df)
            findings = []
            # Custom detection logic here
            return findings

Source code in lavendertown/detectors/base.py
Functions
detect (abstractmethod)
Detect ghosts in the given DataFrame.
This is the main method that subclasses must implement. It should analyze the DataFrame for specific types of data quality issues and return a list of findings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | object | DataFrame to analyze. Can be a pandas.DataFrame or polars.DataFrame. The detector should use detect_dataframe_backend() to determine which API to use. | required |
Returns:
| Type | Description |
|---|---|
| list[GhostFinding] | List of GhostFinding objects representing all detected issues of this detector's type. Can be an empty list if no issues are found. |
Note
Detectors should handle both Pandas and Polars DataFrames.
Use detect_dataframe_backend() to determine which API to use.
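Because detect() is declared abstract, Python refuses to instantiate a subclass that leaves it unimplemented. The following is a minimal sketch of that contract using a stand-in base class. DetectorSketch, IncompleteDetector, NoOpDetector, and can_instantiate are hypothetical names used only for illustration and are not part of lavendertown.

```python
from abc import ABC, abstractmethod


class DetectorSketch(ABC):
    """Hypothetical stand-in mirroring the documented GhostDetector contract."""

    @abstractmethod
    def detect(self, df: object) -> list:
        """Analyze df and return a list of findings (possibly empty)."""


class IncompleteDetector(DetectorSketch):
    """Does not implement detect(), so it remains abstract."""


class NoOpDetector(DetectorSketch):
    """Implements detect() but never reports anything."""

    def detect(self, df: object) -> list:
        # An empty list is a valid result when no issues are found.
        return []


def can_instantiate(cls: type) -> bool:
    """Return True if cls() succeeds, False if ABC blocks it with TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Under these assumptions, can_instantiate(IncompleteDetector) is False and can_instantiate(NoOpDetector) is True, so a missing detect() surfaces when the detector is constructed rather than when it is first run.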
get_name
Get the name of this detector.
Returns the human-readable name of the detector. By default, this returns the class name, but subclasses can override this method to provide a more descriptive name.
Returns:
| Type | Description |
|---|---|
| str | String name of the detector. Used in UI displays and progress indicators. Defaults to the class name (e.g., "NullGhostDetector"). |
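The default and an override can be compared side by side. This sketch assumes the default is equivalent to returning self.__class__.__name__, which matches the documented behavior but is not taken from the library source. DetectorBase and DuplicateRowDetector are hypothetical, and NullGhostDetector here is a local stand-in that only borrows the name from the description above.

```python
class DetectorBase:
    """Hypothetical stand-in for the documented get_name() default."""

    def get_name(self) -> str:
        # Documented default: the class name.
        return self.__class__.__name__


class NullGhostDetector(DetectorBase):
    """Local stand-in that keeps the default name."""


class DuplicateRowDetector(DetectorBase):
    """Overrides get_name() with a more descriptive label."""

    def get_name(self) -> str:
        return "Duplicate Rows"


# Collect the display names in the order the detectors are listed.
names = [d.get_name() for d in (NullGhostDetector(), DuplicateRowDetector())]
```

Overriding is mainly useful when the class name is too terse or too technical for a UI label.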
detect_dataframe_backend
Detect whether a DataFrame is Pandas or Polars.
Examines the DataFrame object's attributes to determine which backend library it belongs to. This allows LavenderTown to use the appropriate API for each backend.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | object | DataFrame object to inspect. Should be a pandas.DataFrame or polars.DataFrame instance. | required |
Returns:
| Type | Description |
|---|---|
| str | String indicating the backend: "pandas" or "polars". |
Raises:
| Type | Description |
|---|---|
| ValueError | If the DataFrame type cannot be determined (not Pandas or Polars). This typically means an unsupported DataFrame type was passed. |
Example
Detect backend before performing operations::

    backend = detect_dataframe_backend(df)
    if backend == "pandas":
        # Use pandas API
        result = df[column].isna().sum()
    else:
        # Use polars API
        result = df[column].null_count()

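How the backend is identified is not shown on this page. One possible approach, sketched here as an assumption rather than as LavenderTown's implementation, is to inspect the module in which the object's class is defined. The function name backend_from_type is hypothetical. Walking the method resolution order means subclasses of a Pandas or Polars DataFrame are also recognized, and neither library needs to be imported.

```python
def backend_from_type(df: object) -> str:
    """Guess the DataFrame backend from the module that defines its class.

    Illustrative only: this is an assumed approach, not lavendertown's code.
    """
    for cls in type(df).__mro__:
        # Top-level package of the class, e.g. "pandas" for pandas.core.frame.
        root = cls.__module__.split(".")[0]
        if root in ("pandas", "polars") and cls.__name__ == "DataFrame":
            return root
    # Mirrors the documented behavior for unsupported inputs.
    raise ValueError(
        f"Unsupported DataFrame type: {type(df).__name__} "
        "(expected Pandas or Polars)"
    )
```

Passing anything else, such as a plain list, raises ValueError, which matches the documented failure mode for unsupported types.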