Models¶

Core data models for ghost findings.

GhostFinding¶

Represents a single data quality issue detected in a dataset.

GhostFinding `dataclass` ¶

Represents a single data quality issue (ghost) detected in a dataset.

GhostFindings are the primary output of data quality detectors. They encapsulate all information about a detected issue in a structured format that can be displayed in the UI, exported to various formats, or processed programmatically.

Attributes:

Name	Type	Description
`ghost_type`	`str`	Category of the ghost/issue. Common types include: - "null": Excessive null values - "type": Type inconsistencies - "outlier": Statistical outliers - "drift": Schema or distribution drift - "rule": Custom rule violations
`column`	`str`	Name of the affected column. Empty string for issues that don't relate to a specific column.
`severity`	`str`	Severity level of the issue. Valid values: - "info": Informational, minor issue - "warning": Warning-level issue that may need attention - "error": Error-level issue requiring immediate attention
`description`	`str`	Human-readable description of the issue, suitable for display in UI or reports.
`row_indices`	`list[int] \| None`	Optional list of row indices (0-based) affected by the issue. None if specific row indices are not available or not applicable. For Polars DataFrames, this may often be None as Polars doesn't maintain index concepts.
`metadata`	`dict[str, Any]`	Additional context information as key-value pairs. Contains detector-specific metadata such as statistics, thresholds, or diagnostic information. Empty dict by default.

Example

Create a finding manually::

finding = GhostFinding(
    ghost_type="null",
    column="email",
    severity="warning",
    description="Column 'email' has 15 null values (25% of 60 rows)",
    row_indices=[2, 5, 8, 12, 15],
    metadata={"null_count": 15, "null_percentage": 0.25}
)

Convert to/from dict for serialization::

# Serialize
data = finding.to_dict()

# Deserialize
finding = GhostFinding.from_dict(data)

Source code in lavendertown/models.py

@dataclass
class GhostFinding:
    """Represents a single data quality issue (ghost) detected in a dataset.

    GhostFindings are the primary output of data quality detectors. They
    encapsulate all information about a detected issue in a structured format
    that can be displayed in the UI, exported to various formats, or processed
    programmatically.

    Attributes:
        ghost_type: Category of the ghost/issue. Common types include:
            - "null": Excessive null values
            - "type": Type inconsistencies
            - "outlier": Statistical outliers
            - "drift": Schema or distribution drift
            - "rule": Custom rule violations
        column: Name of the affected column. Empty string for issues that
            don't relate to a specific column.
        severity: Severity level of the issue. Valid values:
            - "info": Informational, minor issue
            - "warning": Warning-level issue that may need attention
            - "error": Error-level issue requiring immediate attention
        description: Human-readable description of the issue, suitable for
            display in UI or reports.
        row_indices: Optional list of row indices (0-based) affected by the
            issue. None if specific row indices are not available or not
            applicable. For Polars DataFrames, this may often be None as
            Polars doesn't maintain index concepts.
        metadata: Additional context information as key-value pairs.
            Contains detector-specific metadata such as statistics, thresholds,
            or diagnostic information. Empty dict by default.

    Example:
        Create a finding manually::

            finding = GhostFinding(
                ghost_type="null",
                column="email",
                severity="warning",
                description="Column 'email' has 15 null values (25% of 60 rows)",
                row_indices=[2, 5, 8, 12, 15],
                metadata={"null_count": 15, "null_percentage": 0.25}
            )

        Convert to/from dict for serialization::

            # Serialize
            data = finding.to_dict()

            # Deserialize
            finding = GhostFinding.from_dict(data)
    """

    ghost_type: str
    column: str
    severity: str
    description: str
    row_indices: list[int] | None = None
    metadata: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        """Validate severity values after initialization.

        Ensures that the severity attribute is one of the valid values.
        Called automatically by the dataclass decorator after initialization.

        Raises:
            ValueError: If severity is not one of "info", "warning", or "error".
        """
        valid_severities = {"info", "warning", "error"}
        if self.severity not in valid_severities:
            raise ValueError(
                f"Severity must be one of {valid_severities}, got {self.severity}"
            )

    def to_dict(self) -> dict[str, Any]:
        """Convert finding to dictionary for serialization.

        Converts the GhostFinding instance to a dictionary representation
        suitable for JSON serialization or caching. All fields are included
        in the dictionary.

        Returns:
            Dictionary containing all finding attributes. Keys match the
            dataclass field names: "ghost_type", "column", "severity",
            "description", "row_indices", and "metadata".

        Example:
            Serialize for JSON export::

                import json
                finding = GhostFinding(...)
                data = finding.to_dict()
                json_str = json.dumps(data)
        """
        return {
            "ghost_type": self.ghost_type,
            "column": self.column,
            "severity": self.severity,
            "description": self.description,
            "row_indices": self.row_indices,
            "metadata": self.metadata,
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "GhostFinding":
        """Create finding from dictionary.

        Deserializes a dictionary back into a GhostFinding instance.
        Used for loading findings from JSON files or retrieving cached
        findings from Streamlit's cache.

        Args:
            data: Dictionary containing finding data. Must include the
                required fields: "ghost_type", "column", "severity", and
                "description". Optional fields "row_indices" and "metadata"
                will use defaults if not present.

        Returns:
            GhostFinding instance initialized with data from the dictionary.

        Raises:
            ValueError: If required fields ("ghost_type", "column", "severity",
                "description") are missing from the dictionary.

        Example:
            Deserialize from JSON::

                import json
                json_str = '{"ghost_type": "null", "column": "email", ...}'
                data = json.loads(json_str)
                finding = GhostFinding.from_dict(data)
        """
        required_fields = ["ghost_type", "column", "severity", "description"]
        missing_fields = [field for field in required_fields if field not in data]
        if missing_fields:
            raise ValueError(
                f"Missing required fields in GhostFinding data: {', '.join(missing_fields)}"
            )
        return cls(
            ghost_type=data["ghost_type"],
            column=data["column"],
            severity=data["severity"],
            description=data["description"],
            row_indices=data.get("row_indices"),
            metadata=data.get("metadata", {}),
        )

Functions¶

__post_init__ ¶

__post_init__()

Validate severity values after initialization.

Ensures that the severity attribute is one of the valid values. Called automatically by the dataclass decorator after initialization.

Raises:

Type	Description
`ValueError`	If severity is not one of "info", "warning", or "error".

Source code in lavendertown/models.py

def __post_init__(self) -> None:
    """Validate severity values after initialization.

    Ensures that the severity attribute is one of the valid values.
    Called automatically by the dataclass decorator after initialization.

    Raises:
        ValueError: If severity is not one of "info", "warning", or "error".
    """
    valid_severities = {"info", "warning", "error"}
    if self.severity not in valid_severities:
        raise ValueError(
            f"Severity must be one of {valid_severities}, got {self.severity}"
        )

from_dict `classmethod` ¶

from_dict(data)

Create finding from dictionary.

Deserializes a dictionary back into a GhostFinding instance. Used for loading findings from JSON files or retrieving cached findings from Streamlit's cache.

Parameters:

Name	Type	Description	Default
`data`	`dict[str, Any]`	Dictionary containing finding data. Must include the required fields: "ghost_type", "column", "severity", and "description". Optional fields "row_indices" and "metadata" will use defaults if not present.	required

Returns:

Type	Description
`'GhostFinding'`	GhostFinding instance initialized with data from the dictionary.

Raises:

Type	Description
`ValueError`	If required fields ("ghost_type", "column", "severity", "description") are missing from the dictionary.

Example

Deserialize from JSON::

import json
json_str = '{"ghost_type": "null", "column": "email", ...}'
data = json.loads(json_str)
finding = GhostFinding.from_dict(data)

Source code in lavendertown/models.py

@classmethod
def from_dict(cls, data: dict[str, Any]) -> "GhostFinding":
    """Create finding from dictionary.

    Deserializes a dictionary back into a GhostFinding instance.
    Used for loading findings from JSON files or retrieving cached
    findings from Streamlit's cache.

    Args:
        data: Dictionary containing finding data. Must include the
            required fields: "ghost_type", "column", "severity", and
            "description". Optional fields "row_indices" and "metadata"
            will use defaults if not present.

    Returns:
        GhostFinding instance initialized with data from the dictionary.

    Raises:
        ValueError: If required fields ("ghost_type", "column", "severity",
            "description") are missing from the dictionary.

    Example:
        Deserialize from JSON::

            import json
            json_str = '{"ghost_type": "null", "column": "email", ...}'
            data = json.loads(json_str)
            finding = GhostFinding.from_dict(data)
    """
    required_fields = ["ghost_type", "column", "severity", "description"]
    missing_fields = [field for field in required_fields if field not in data]
    if missing_fields:
        raise ValueError(
            f"Missing required fields in GhostFinding data: {', '.join(missing_fields)}"
        )
    return cls(
        ghost_type=data["ghost_type"],
        column=data["column"],
        severity=data["severity"],
        description=data["description"],
        row_indices=data.get("row_indices"),
        metadata=data.get("metadata", {}),
    )

to_dict ¶

to_dict()

Convert finding to dictionary for serialization.

Converts the GhostFinding instance to a dictionary representation suitable for JSON serialization or caching. All fields are included in the dictionary.

Returns:

Type	Description
`dict[str, Any]`	Dictionary containing all finding attributes. Keys match the
`dict[str, Any]`	dataclass field names: "ghost_type", "column", "severity",
`dict[str, Any]`	"description", "row_indices", and "metadata".

Example

Serialize for JSON export::

import json
finding = GhostFinding(...)
data = finding.to_dict()
json_str = json.dumps(data)

Source code in lavendertown/models.py

def to_dict(self) -> dict[str, Any]:
    """Convert finding to dictionary for serialization.

    Converts the GhostFinding instance to a dictionary representation
    suitable for JSON serialization or caching. All fields are included
    in the dictionary.

    Returns:
        Dictionary containing all finding attributes. Keys match the
        dataclass field names: "ghost_type", "column", "severity",
        "description", "row_indices", and "metadata".

    Example:
        Serialize for JSON export::

            import json
            finding = GhostFinding(...)
            data = finding.to_dict()
            json_str = json.dumps(data)
    """
    return {
        "ghost_type": self.ghost_type,
        "column": self.column,
        "severity": self.severity,
        "description": self.description,
        "row_indices": self.row_indices,
        "metadata": self.metadata,
    }

Models¶

GhostFinding¶

GhostFinding dataclass ¶

Functions¶

__post_init__ ¶

from_dict classmethod ¶

to_dict ¶

GhostFinding `dataclass` ¶

from_dict `classmethod` ¶