File Upload Component¶

Version

This feature was introduced in v0.6.0.

Enhanced file upload UI component with drag-and-drop support, animated progress indicators, and automatic encoding detection.

render_file_upload ¶

render_file_upload(
    st, accepted_types=None, help_text=None, show_file_info=True
)

Render file upload UI with dropzone and animations.

Provides a polished file upload experience with:

- Drag-and-drop dropzone interface
- Animated progress indicators (minimum ~750ms)
- Automatic encoding detection
- File validation and error handling

Parameters:

Name	Type	Description	Default
`st`	`object`	Streamlit module object	required
`accepted_types`	`list[str] \| None`	List of accepted file extensions (e.g., [".csv"]). Defaults to [".csv"] when None.	`None`
`help_text`	`str \| None`	Optional help text to display	`None`
`show_file_info`	`bool`	Whether to display file info after upload (default: True)	`True`

Returns:

Type	Description
`tuple[object \| None, object \| None, str \| None]`	Tuple of (uploaded_file, dataframe, encoding_used). uploaded_file: the uploaded file object, or None. dataframe: Pandas DataFrame if the file was successfully read, None otherwise. encoding_used: the encoding that worked, or None if the file read failed.

Source code in lavendertown/ui/upload.py

def render_file_upload(
    st: object,
    accepted_types: list[str] | None = None,
    help_text: str | None = None,
    show_file_info: bool = True,
) -> tuple[object | None, object | None, str | None]:
    """Render file upload UI with dropzone and animations.

    Provides a polished file upload experience with:
    - Drag-and-drop dropzone interface
    - Animated progress indicators (minimum ~750ms)
    - Automatic encoding detection
    - File validation and error handling

    Args:
        st: Streamlit module object
        accepted_types: List of accepted file extensions (e.g., [".csv"])
            Defaults to [".csv"]
        help_text: Optional help text to display
        show_file_info: Whether to display file info after upload (default: True)

    Returns:
        Tuple of (uploaded_file, dataframe, encoding_used)
        - uploaded_file: The uploaded file object or None
        - dataframe: Pandas DataFrame if file was successfully read, None otherwise
        - encoding_used: The encoding that worked, or None if file read failed
    """
    if accepted_types is None:
        accepted_types = [".csv"]

    if help_text is None:
        help_text = "Upload a CSV file to analyze for data quality issues"

    # Convert accepted_types to format expected by file_uploader (e.g., ".csv" -> "csv")
    accept_list = [ext.lstrip(".") for ext in accepted_types]

    # Add custom CSS for enhanced dropzone appearance
    # Streamlit's native file_uploader supports drag-and-drop in recent versions
    st.markdown(
        """
    <style>
    /* Enhanced styling for file uploader to look like a dropzone */
    .uploadedFile {
        border: 2px dashed #9e9e9e !important;
        border-radius: 10px !important;
        padding: 20px !important;
        text-align: center !important;
        background: #f5f5f5 !important;
        transition: all 0.3s ease !important;
    }
    .uploadedFile:hover {
        border-color: #6366f1 !important;
        background: #f0f0ff !important;
    }
    /* Style the file uploader container */
    div[data-testid="stFileUploader"] {
        border: 2px dashed #9e9e9e;
        border-radius: 10px;
        padding: 20px;
        background: #fafafa;
        transition: all 0.3s ease;
    }
    div[data-testid="stFileUploader"]:hover {
        border-color: #6366f1;
        background: #f0f0ff;
    }
    </style>
    """,
        unsafe_allow_html=True,
    )

    # Use Streamlit's native file_uploader with enhanced styling
    # It supports drag-and-drop in recent versions
    uploaded_file = st.file_uploader(
        "📤 Drag and drop a CSV file here, or click to browse",
        type=accept_list,
        help=help_text,
    )

    if uploaded_file is None:
        return None, None, None

    # Create upload animation container
    upload_container = st.container()

    with upload_container:
        # Show upload progress animation
        upload_status = st.status("📤 Uploading file...", expanded=True)

        with upload_status:
            # Simulate upload progress (even if instant)
            progress_bar = st.progress(0)
            status_text = st.empty()

            # Calculate file size
            file_size = len(uploaded_file.getvalue())
            file_size_mb = file_size / (1024 * 1024)

            # Multi-stage progress animation with minimum duration
            stages = [
                ("Reading file...", 0.2),
                ("Validating format...", 0.4),
                ("Processing data...", 0.6),
                ("Preparing analysis...", 0.8),
                ("Ready!", 1.0),
            ]

            for step_text, progress in stages:
                status_text.text(step_text)
                progress_bar.progress(progress)
                # Minimum delay to show animation (150ms per step = ~750ms total)
                time.sleep(0.15)

            status_text.text(
                f"✅ File uploaded: {uploaded_file.name} ({file_size_mb:.2f} MB)"
            )

        # Close status after animation
        upload_status.update(state="complete", expanded=False)

    # Display file info if requested
    if show_file_info:
        col1, col2 = st.columns(2)
        with col1:
            st.info(f"📄 **File:** `{uploaded_file.name}`")
        with col2:
            st.info(f"📊 **Size:** {file_size:,} bytes ({file_size_mb:.2f} MB)")

        # File size warning for very large files
        if file_size > 10_000_000:  # 10 MB
            st.warning(
                "⚠️ Large file detected. Processing may take longer. "
                "Consider sampling for faster analysis."
            )

    # Read CSV with encoding detection and animation
    with st.status("📖 Reading CSV file...", expanded=True) as read_status:
        with read_status:
            read_progress = st.progress(0)
            read_status_text = st.empty()

            read_status_text.text("Detecting encoding...")
            read_progress.progress(0.3)
            time.sleep(0.1)  # Small delay for visual feedback

            # Attempt multiple encodings
            encodings = ["utf-8", "latin-1", "iso-8859-1", "cp1252"]
            df = None
            encoding_used = None

            read_status_text.text("Trying encodings...")
            read_progress.progress(0.5)

            for i, encoding in enumerate(encodings):
                try:
                    uploaded_file.seek(0)  # Reset file pointer
                    df = pd.read_csv(uploaded_file, encoding=encoding)
                    encoding_used = encoding
                    read_progress.progress(0.7 + (i * 0.1))
                    break
                except (UnicodeDecodeError, pd.errors.ParserError):  # type: ignore[union-attr]
                    continue

            if df is None:
                read_status_text.text("❌ Could not read file")
                read_progress.progress(1.0)
                st.error(
                    "❌ Could not read CSV file. Please check the file format and encoding."
                )
                # Return the uploaded file but no dataframe
                return uploaded_file, None, None

            read_status_text.text(f"✅ Successfully read with {encoding_used} encoding")
            read_progress.progress(1.0)
            time.sleep(0.2)  # Brief pause to show success

    return uploaded_file, df, encoding_used

Overview¶

The render_file_upload() function provides a polished file upload experience for Streamlit applications. It includes:

Drag-and-drop interface: Enhanced styling for intuitive file uploads
Animated progress indicators: Multi-stage progress animations (minimum ~750ms) for visual feedback
Automatic encoding detection: Tries UTF-8, Latin-1, ISO-8859-1, and CP1252 in turn
File validation: Restricts uploads to the accepted extensions and shows a clear error message when the file cannot be parsed
File size display: Shows file information and warnings for large files

Basic Usage¶

import streamlit as st
from lavendertown.ui.upload import render_file_upload

# Basic usage with defaults
uploaded_file, df, encoding_used = render_file_upload(st)

if uploaded_file is not None:
    if df is not None:
        st.write(f"File loaded successfully with {encoding_used} encoding")
        st.dataframe(df)
    else:
        st.error("Could not read the uploaded file")

Customization¶

Custom Accepted File Types¶

uploaded_file, df, encoding_used = render_file_upload(
    st,
    accepted_types=[".csv", ".txt", ".tsv"]
)
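
As the source listing above shows, each entry in accepted_types has its leading dot stripped before it is passed to st.file_uploader. The same listing always parses the upload with pd.read_csv and its default comma separator, so a tab-separated file would pass the extension filter but may not split into the expected columns. The normalization step can be sketched in isolation as follows; `normalize_extensions` is a hypothetical helper name used for illustration, not part of LavenderTown.

```python
from typing import List, Optional


def normalize_extensions(accepted_types: Optional[List[str]] = None) -> List[str]:
    """Mirror the component's ".csv" -> "csv" conversion (hypothetical helper)."""
    if accepted_types is None:
        accepted_types = [".csv"]  # same default as render_file_upload
    # str.lstrip(".") removes every leading dot, so "csv" and "..csv" also map to "csv"
    return [ext.lstrip(".") for ext in accepted_types]


print(normalize_extensions([".csv", ".txt", ".tsv"]))
```

Because the dots are stripped rather than required, passing "csv" and ".csv" should behave the same way.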

Custom Help Text¶

uploaded_file, df, encoding_used = render_file_upload(
    st,
    help_text="Upload your data file here. Supported formats: CSV, TSV"
)

Hide File Info Display¶

uploaded_file, df, encoding_used = render_file_upload(
    st,
    show_file_info=False
)

Return Values¶

The function returns a tuple of three values:

uploaded_file: The uploaded file object (or None if no file has been uploaded, in which case all three values are None)
dataframe: Pandas DataFrame if the file was successfully read (or None if the read failed)
encoding_used: The encoding with which the file was successfully read (or None if the read failed)
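
The tuple can therefore be in one of three states. A minimal sketch of how a caller might tell them apart is shown below; `upload_state` and its return strings are hypothetical names chosen for this example.

```python
from typing import Any, Optional


def upload_state(uploaded_file: Optional[Any], df: Optional[Any]) -> str:
    """Classify the first two return values of render_file_upload (hypothetical helper)."""
    if uploaded_file is None:
        return "no_file"      # nothing uploaded yet: all three values are None
    if df is None:
        return "read_failed"  # a file is present, but no attempt produced a DataFrame
    return "ok"               # a file is present and was parsed


print(upload_state(None, None))
```

Checking uploaded_file before df matters: df is None both before any upload and after a failed read, and only uploaded_file distinguishes the two.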

Encoding Detection¶

The component automatically tries multiple encodings in order:

UTF-8 (most common)
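Latin-1
ISO-8859-1 (an alias of the Latin-1 codec in Python, so this attempt repeats the previous one)
CP1252 (Windows-1252)

The first encoding with which the file reads successfully is used. Because Latin-1 maps every byte value to a character, it never raises a decoding error, so the entries after it are reached only when parsing fails with a ParserError. A non-UTF-8 file will therefore usually be reported as latin-1, even if it was written as CP1252. If every attempt fails, the function shows an error message and returns None for both the dataframe and the encoding, while still returning the uploaded file object.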
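
The fallback order can be illustrated without Streamlit or pandas. The sketch below applies the same list of encodings to raw bytes with bytes.decode; the component itself calls pd.read_csv instead, so this is an approximation of the idea rather than the library's implementation, and `decode_with_fallback` is a hypothetical name.

```python
from typing import Optional, Sequence, Tuple

# Order taken from the source listing above
ENCODINGS = ("utf-8", "latin-1", "iso-8859-1", "cp1252")


def decode_with_fallback(
    raw: bytes, encodings: Sequence[str] = ENCODINGS
) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, encoding_used), or (None, None) if every codec fails."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    return None, None


print(decode_with_fallback("café".encode("utf-8")))   # decodes as utf-8
print(decode_with_fallback("café".encode("cp1252")))  # utf-8 fails, latin-1 accepts it
```

The second call shows why the reported encoding is a best guess: the bytes were written as CP1252, but Latin-1 comes first in the list and accepts any byte sequence.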

Progress Animation¶

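After a file is selected, the component plays a simulated five-stage progress animation. The progress bar is set to a fixed value at each stage:

Reading Stage (20%): "Reading file..."
Validation Stage (40%): "Validating format..."
Processing Stage (60%): "Processing data..."
Preparation Stage (80%): "Preparing analysis..."
Ready Stage (100%): "Ready!"

Each stage is followed by a 150 ms pause (about 750 ms in total), so users see feedback even when the upload itself is instant. Encoding detection runs afterwards in a separate "Reading CSV file..." status block with its own progress bar.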
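
The pattern of stepping through fixed stages with a pause after each one can be sketched independently of Streamlit. In the example below, `run_stages`, `on_update`, and the injectable `sleep` parameter are assumptions made for illustration; the stage labels, fractions, and 0.15 second delay come from the source listing above.

```python
import time
from typing import Callable, Iterable, Tuple

STAGES = [
    ("Reading file...", 0.2),
    ("Validating format...", 0.4),
    ("Processing data...", 0.6),
    ("Preparing analysis...", 0.8),
    ("Ready!", 1.0),
]


def run_stages(
    stages: Iterable[Tuple[str, float]],
    on_update: Callable[[str, float], None],
    delay: float = 0.15,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Report each stage, pause `delay` seconds after it, and return the total pause."""
    total = 0.0
    for text, fraction in stages:
        on_update(text, fraction)  # e.g. update a status label and a progress bar
        sleep(delay)               # minimum time the stage stays visible
        total += delay
    return total


run_stages(STAGES, lambda text, fraction: print(f"{fraction:.0%} {text}"))
```

Passing the sleep function as a parameter is a design choice for this sketch only: it lets tests replace the real pause with a no-op.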

File Size Warnings¶

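When show_file_info is True and the file is larger than 10,000,000 bytes, the component displays a warning suggesting data sampling for faster analysis. The size shown in the file info box uses 1 MB = 1,048,576 bytes, so the threshold appears there as roughly 9.54 MB.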
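
The source listing above compares the file size against 10_000_000 bytes but displays it in units of 1024 * 1024 bytes. The short sketch below reproduces both calculations to show how the two relate; the function and constant names are hypothetical.

```python
LARGE_FILE_BYTES = 10_000_000  # threshold used in the source listing above


def size_in_mb(n_bytes: int) -> float:
    """Convert bytes to megabytes of 1024 * 1024 bytes, as the component displays them."""
    return n_bytes / (1024 * 1024)


def is_large(n_bytes: int) -> bool:
    """True when the size exceeds the threshold (the comparison is strictly greater)."""
    return n_bytes > LARGE_FILE_BYTES


print(f"{size_in_mb(LARGE_FILE_BYTES):.2f} MB")  # prints "9.54 MB"
```

A file displayed as "9.60 MB" would therefore already be above the threshold.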

Error Handling¶

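Based on the source shown above, the component handles the following cases:

Encoding failures: Tries each encoding in turn before giving up
Invalid CSV format: A pandas ParserError is treated like an encoding failure, and the next encoding is tried
File read errors: If no attempt succeeds, shows a generic "Could not read CSV file" error and returns the file object with None for the dataframe and the encoding
Empty files: Only UnicodeDecodeError and ParserError are caught; pandas raises EmptyDataError for input with no columns, so an empty file is likely to surface as an exception rather than an error message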

Example: Complete Upload Workflow¶

import streamlit as st
from lavendertown import Inspector
from lavendertown.ui.upload import render_file_upload

st.title("Data Quality Inspector")

# Upload file with enhanced UI
uploaded_file, df, encoding_used = render_file_upload(
    st,
    accepted_types=[".csv"],
    help_text="Upload a CSV file to analyze for data quality issues",
    show_file_info=True
)

if uploaded_file is not None:
    if df is None:
        st.error("Could not read the CSV file. Please check the file format.")
        st.stop()

    # Show success message
    st.success(f"✅ File loaded successfully (encoding: {encoding_used})")

    # Display dataset preview
    st.header("Dataset Preview")
    st.dataframe(df.head(10))

    # Run inspection
    inspector = Inspector(df)
    inspector.render()
else:
    st.info("👆 Please upload a CSV file to get started.")

Integration with Inspector¶

The upload component is designed to work seamlessly with LavenderTown's Inspector:

import streamlit as st
from lavendertown import Inspector
from lavendertown.ui.upload import render_file_upload

uploaded_file, df, encoding_used = render_file_upload(st)

if df is not None:
    inspector = Inspector(df)
    inspector.render()  # Full data quality analysis

Styling¶

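The component injects a style block through st.markdown(..., unsafe_allow_html=True) to give the uploader a dropzone appearance:

Dashed grey border (2px, 10px radius) around the uploader
Hover effect that changes the border to #6366f1 and the background to #f0f0ff
Smooth 0.3s transitions between states

The rules target the .uploadedFile class and div[data-testid="stFileUploader"], which belong to Streamlit's own markup rather than a public API, so the styling may need adjusting after a Streamlit upgrade.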

Performance Considerations¶

Small files (<1MB): Parsing is near-instantaneous; the fixed animation pauses (about one second in total) dominate
Medium files (1-10MB): Fast processing with progress feedback
Large files (>10MB): Shows a warning and suggests sampling (when show_file_info is True)

For very large files, consider implementing data sampling before analysis.
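
One way to sample is sketched below using only the standard library, under the assumption that the rows are available as a sequence; `sample_rows` is a hypothetical helper, not a LavenderTown function. For a pandas DataFrame, df.sample(n=..., random_state=...) serves a similar purpose.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_rows(rows: Sequence[T], max_rows: int, seed: int = 0) -> List[T]:
    """Return at most `max_rows` rows, chosen at random but reproducibly."""
    if len(rows) <= max_rows:
        return list(rows)  # small enough: keep everything
    rng = random.Random(seed)  # fixed seed so repeated runs pick the same rows
    picked = sorted(rng.sample(range(len(rows)), max_rows))  # preserve original order
    return [rows[i] for i in picked]


print(len(sample_rows(list(range(100_000)), 1_000)))
```

A fixed seed keeps the sample stable across Streamlit reruns, so the analysis does not change every time the script re-executes.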