Deployment Guide¶

This guide covers deploying LavenderTown applications to various platforms, including Streamlit Cloud, Docker, and traditional hosting environments.

Streamlit Cloud Deployment¶

Streamlit Cloud deploys LavenderTown applications directly from a GitHub repository with minimal configuration.

Prerequisites¶

A GitHub account
Your LavenderTown app code pushed to a GitHub repository
A Streamlit Cloud account (free tier available at streamlit.io/cloud)

Quick Deployment¶

Go to share.streamlit.io
Click "New app"
Connect your GitHub account if not already connected
Select your repository and branch
Set the main file path (e.g., app.py or examples/app.py)
Click "Deploy"

Minimal App Example¶

Create a deployment-ready app:

# app.py
import pandas as pd
import streamlit as st
from lavendertown import Inspector

st.set_page_config(
    page_title="LavenderTown - Data Quality Inspector",
    page_icon="👻",
    layout="wide"
)

st.title("👻 Data Quality Inspector")

uploaded_file = st.file_uploader("Upload CSV file", type=["csv"])

if uploaded_file is not None:
    df = pd.read_csv(uploaded_file)
    inspector = Inspector(df)
    inspector.render()

Dependencies¶

Streamlit Cloud can automatically detect dependencies from pyproject.toml. Alternatively, create requirements.txt:

lavendertown>=0.7.0
pandas>=1.5.0
streamlit>=1.28.0
altair>=4.2.1

For optional features:

# Polars support
polars>=0.19.0

# ML features
scikit-learn>=1.0.0

# Time-series features
statsmodels>=0.14.0

# Ecosystem integrations
pandera>=0.18.0
great-expectations>=0.18.0
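
Import errors after deployment often trace back to a missing or outdated package. The following is a minimal sketch, using only the standard library, that compares installed versions with minimums such as those listed above. The helper names are illustrative, and the version parsing is deliberately simple (leading numeric components only, so pre-release suffixes are ignored); it is not a substitute for a full version parser.

```python
from importlib import metadata


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.28.0' into (1, 28, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch not in "0123456789":
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def version_at_least(installed: str, minimum: str) -> bool:
    """Compare two versions after padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def find_problems(minimums: dict[str, str]) -> dict[str, str]:
    """Return {package: reason} for packages that are missing or too old."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not version_at_least(installed, minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems
```

Calling `find_problems({"pandas": "1.5.0", "streamlit": "1.28.0"})` at startup returns an empty dictionary when every listed package meets its minimum.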

Configuration¶

Theme Customization¶

Create .streamlit/config.toml:

[theme]
primaryColor = "#9D4EDD"
backgroundColor = "#FFFFFF"
secondaryBackgroundColor = "#F0F0F0"
textColor = "#262730"
font = "sans serif"

Environment Variables and Secrets¶

For sensitive data, use Streamlit Cloud's secrets management (Settings → Secrets):

[database]
url = "postgresql://user:pass@host/db"

[api]
key = "your-api-key"

Access in your app:

import streamlit as st

db_url = st.secrets["database"]["url"]
api_key = st.secrets["api"]["key"]
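
Outside Streamlit Cloud (for example in Docker), the same values are often supplied as environment variables instead. The sketch below shows one way to read a setting from a secrets mapping and fall back to the environment. The function name and the `DATABASE_URL` variable are hypothetical, and catching `FileNotFoundError` assumes that Streamlit raises it (or a subclass) when no secrets file exists.

```python
import os
from collections.abc import Mapping
from typing import Optional


def get_setting(secrets: Mapping, section: str, key: str,
                env_var: str, environ: Optional[Mapping] = None) -> str:
    """Return secrets[section][key], falling back to an environment variable."""
    environ = os.environ if environ is None else environ
    try:
        return secrets[section][key]
    except (KeyError, FileNotFoundError):
        # Section or key is absent, or no secrets file is configured
        pass
    if env_var in environ:
        return environ[env_var]
    raise RuntimeError(f"Missing setting: [{section}] {key} or ${env_var}")
```

In an app this could be called as `get_setting(st.secrets, "database", "url", "DATABASE_URL")`.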

Performance Optimization for Cloud¶

For large datasets on Streamlit Cloud, cache the analysis so that script reruns do not repeat it:

import streamlit as st
from lavendertown import Inspector
import pandas as pd

@st.cache_data
def load_and_analyze(file):
    """Cache analysis results."""
    df = pd.read_csv(file)
    inspector = Inspector(df)
    findings = inspector.detect()
    return df, findings

uploaded_file = st.file_uploader("Upload CSV file", type=["csv"])

if uploaded_file is not None:
    df, findings = load_and_analyze(uploaded_file)

    # Display results
    st.write(f"Found {len(findings)} data quality issues")
    inspector = Inspector(df)
    inspector.render()

Docker Deployment¶

Deploy LavenderTown applications using Docker for more control and portability.

Dockerfile¶

Create a Dockerfile:

FROM python:3.11-slim

WORKDIR /app

# Install system dependencies (curl is required by the health check below)
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY app.py .
COPY .streamlit .streamlit

# Expose port
EXPOSE 8501

# Health check
HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health

# Run Streamlit
ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]

Docker Compose¶

To run the container with Docker Compose, create docker-compose.yml:

version: '3.8'

services:
  lavendertown:
    build: .
    ports:
      - "8501:8501"
    volumes:
      - ./data:/app/data
    environment:
      - STREAMLIT_SERVER_PORT=8501
      - STREAMLIT_SERVER_ADDRESS=0.0.0.0
    restart: unless-stopped

Build and run:

docker-compose up -d

Multi-stage Dockerfile (Production)¶

For optimized production builds:

# Build stage
FROM python:3.11-slim AS builder

WORKDIR /app

COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Runtime stage
FROM python:3.11-slim

WORKDIR /app

# Copy installed packages
COPY --from=builder /root/.local /root/.local

# Copy application
COPY app.py .
COPY .streamlit .streamlit

# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH

EXPOSE 8501

ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]

Traditional Server Deployment¶

Deploy LavenderTown on your own server or VPS.

Using systemd¶

Create a systemd service /etc/systemd/system/lavendertown.service:

[Unit]
Description=LavenderTown Data Quality Inspector
After=network.target

[Service]
Type=simple
User=lavendertown
WorkingDirectory=/opt/lavendertown
Environment="PATH=/opt/lavendertown/venv/bin:/usr/local/bin:/usr/bin:/bin"
ExecStart=/opt/lavendertown/venv/bin/streamlit run app.py --server.port=8501 --server.address=0.0.0.0
Restart=always

[Install]
WantedBy=multi-user.target

Reload systemd so it picks up the new unit file, then enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable lavendertown
sudo systemctl start lavendertown

Using Nginx Reverse Proxy¶

Configure Nginx as a reverse proxy:

server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://localhost:8501;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 86400;
    }
}

Using Gunicorn (Advanced)¶

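Gunicorn cannot serve a Streamlit app directly. Streamlit runs its own Tornado-based server and does not expose a WSGI application, and modules such as streamlit.web.server are internal rather than a supported API. To use several processes in production, start multiple Streamlit instances on different ports and put a load balancer with session affinity in front of them, since each browser session's state lives in a single process:

# One process per port (for example, one systemd unit each)
streamlit run app.py --server.port=8501 --server.address=127.0.0.1 --server.headless=true
streamlit run app.py --server.port=8502 --server.address=127.0.0.1 --server.headless=true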

Cloud Platform Deployments¶

AWS (Elastic Beanstalk or EC2)¶

Create requirements.txt with dependencies
Deploy using Elastic Beanstalk or configure EC2 instance
Use Application Load Balancer for HTTPS
Configure security groups to allow traffic to port 8501 (restrict it to the load balancer when one is used)

Google Cloud Platform (App Engine or Cloud Run)¶

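app.yaml is read by App Engine, not by Cloud Run. Streamlit relies on WebSocket connections, so confirm that the App Engine environment you choose supports them. For App Engine, create app.yaml:

runtime: python311

entrypoint: streamlit run app.py --server.port=$PORT --server.address=0.0.0.0

env_variables:
  STREAMLIT_SERVER_PORT: 8080

Deploy to App Engine:

gcloud app deploy

Cloud Run instead builds a container image from the source directory, for example from the Dockerfile shown earlier. Because that Dockerfile listens on port 8501, pass the port explicitly:

gcloud run deploy lavendertown --source . --port 8501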

Azure (App Service)¶

Create requirements.txt
Deploy via Azure CLI or portal
Configure startup command: streamlit run app.py --server.port=8000 --server.address=0.0.0.0
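
The platforms above expect the app on different ports (8501, 8000, or whatever `$PORT` contains). A small launcher can keep a single start command across them. This is a sketch under the assumption that the platform either sets a `PORT` environment variable or leaves it unset; the function name is illustrative.

```python
import os
from collections.abc import Mapping
from typing import Optional


def build_streamlit_command(app_path: str = "app.py",
                            environ: Optional[Mapping] = None,
                            default_port: int = 8501) -> list[str]:
    """Build the `streamlit run` argument list, honoring $PORT when set."""
    environ = os.environ if environ is None else environ
    port = int(environ.get("PORT", default_port))
    if not 1 <= port <= 65535:
        raise ValueError(f"Invalid port: {port}")
    return [
        "streamlit", "run", app_path,
        f"--server.port={port}",
        "--server.address=0.0.0.0",
        "--server.headless=true",
    ]
```

A start script could then run `subprocess.run(build_streamlit_command(), check=True)`.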

Environment-Specific Configuration¶

Development¶

# config.py
import os

ENV = os.getenv("ENV", "development")

if ENV == "development":
    DEBUG = True
    LOG_LEVEL = "DEBUG"
else:  # "production" and any unrecognized value, so both settings are always defined
    DEBUG = False
    LOG_LEVEL = "WARNING"
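
An alternative is to keep the per-environment values in one table and reject unknown environment names instead of guessing. The sketch below assumes the same `ENV` variable and the same two profiles; the `Settings` class and `load_settings` function are illustrative names, not part of LavenderTown.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    env: str
    debug: bool
    log_level: str


# One entry per supported environment
_PROFILES = {
    "development": {"debug": True, "log_level": "DEBUG"},
    "production": {"debug": False, "log_level": "WARNING"},
}


def load_settings(environ: Optional[Mapping] = None) -> Settings:
    """Read ENV (default: development) and return the matching settings."""
    environ = os.environ if environ is None else environ
    env = environ.get("ENV", "development")
    if env not in _PROFILES:
        raise ValueError(f"Unknown ENV {env!r}; expected one of {sorted(_PROFILES)}")
    return Settings(env=env, **_PROFILES[env])
```

Failing fast on a typo such as `ENV=prod` avoids silently running with the wrong log level.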

Production Considerations¶

Security:
Use HTTPS (configure reverse proxy)
Set STREAMLIT_SERVER_HEADLESS=true
Configure CORS if needed
Implement authentication (Streamlit Authenticator or similar)

Performance:
Enable caching with @st.cache_data
Use Polars for large datasets
Implement data sampling for very large files
Configure appropriate timeout values

Monitoring:
Set up logging
Monitor resource usage
Track error rates
Set up alerts for failures
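
Data sampling for very large files can be done in a single pass without loading the whole file. The sketch below uses reservoir sampling over CSV rows with the standard library only. The function name is illustrative, and it returns rows as lists of strings, so the sample still needs to be parsed into a DataFrame before inspection.

```python
import csv
import random
from typing import Iterable, Optional


def sample_csv_rows(lines: Iterable[str], k: int,
                    seed: Optional[int] = None) -> tuple[list, list]:
    """Return (header, rows) with at most k data rows chosen uniformly at random."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return [], []  # empty input
    reservoir = []
    for index, row in enumerate(reader):
        if index < k:
            # Fill the reservoir with the first k rows
            reservoir.append(row)
        else:
            # Keep the new row with probability k / (index + 1)
            j = rng.randint(0, index)
            if j < k:
                reservoir[j] = row
    return header, reservoir
```

Memory use stays proportional to `k` regardless of file size, at the cost of reading every row once.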

Example Production App¶

import streamlit as st
import pandas as pd
from lavendertown import Inspector
import os

# Configuration
MAX_FILE_SIZE = 100 * 1024 * 1024  # 100MB
ENABLE_CACHING = os.getenv("ENABLE_CACHING", "true").lower() == "true"

st.set_page_config(
    page_title="Data Quality Inspector",
    layout="wide",
    initial_sidebar_state="expanded"
)

@st.cache_data(show_spinner="Analyzing data quality...")
def analyze_data(df: pd.DataFrame):
    """Cache analysis results."""
    inspector = Inspector(df)
    findings = inspector.detect()
    return findings

st.title("Data Quality Inspector")

uploaded_file = st.file_uploader(
    "Upload CSV file",
    type=["csv"],
    help=f"Maximum file size: {MAX_FILE_SIZE // (1024 * 1024)} MB"
)

if uploaded_file is not None:
    # UploadedFile.size reports the size in bytes without copying the buffer
    if uploaded_file.size > MAX_FILE_SIZE:
        st.error(f"File too large. Maximum size: {MAX_FILE_SIZE // (1024 * 1024)} MB")
    else:
        try:
            df = pd.read_csv(uploaded_file)
            inspector = Inspector(df)

            # Reuse cached findings when caching is enabled
            if ENABLE_CACHING:
                findings = analyze_data(df)
            else:
                findings = inspector.detect()
            st.write(f"Found {len(findings)} data quality issues")

            inspector.render()

        except Exception as e:
            st.error(f"Error processing file: {e}")
            st.exception(e)

Troubleshooting¶

Common Issues¶

Import Errors: Ensure all dependencies are in requirements.txt
Port Conflicts: Change port with --server.port=8502
Memory Issues: Use Polars backend, implement data sampling
Timeout Errors: Increase timeout in reverse proxy configuration

Logs and Debugging¶

View logs:

# Streamlit Cloud
# Check dashboard logs

# Docker
docker logs <container_id>

# systemd
journalctl -u lavendertown -f
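
Both `docker logs` and `journalctl` capture whatever the process writes to standard output, so application logs should go there. The sketch below configures a logger accordingly; the logger name `lavendertown_app` is a hypothetical choice. Handlers are cleared on each call because Streamlit re-executes the script on every interaction, which would otherwise attach duplicate handlers.

```python
import logging
import sys
from typing import Optional, TextIO


def configure_logging(level: str = "INFO",
                      stream: Optional[TextIO] = None) -> logging.Logger:
    """Return an app logger that writes one formatted line per record to stdout."""
    logger = logging.getLogger("lavendertown_app")  # hypothetical logger name
    logger.setLevel(level.upper())  # raises ValueError for unknown level names
    logger.handlers.clear()  # avoid duplicate output after script reruns
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False  # keep records out of the root logger's handlers
    return logger
```

The level could come from configuration, for example `configure_logging(os.getenv("LOG_LEVEL", "INFO"))`.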

Health Checks¶

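Streamlit already serves a health endpoint at /_stcore/health, which container and load balancer health checks should probe. An in-app check based on query parameters only changes the rendered page, not the HTTP status code, and st.query_params requires Streamlit 1.30 or later: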

import streamlit as st

if st.query_params.get("health") == "check":
    st.write("OK")
    st.stop()
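
For monitoring scripts, or for images that do not ship curl, the `/_stcore/health` endpoint used in the Dockerfile health check can also be probed from Python. This is a minimal standard-library sketch; the function name and default URL are assumptions.

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:8501", timeout: float = 3.0) -> bool:
    """Return True if the Streamlit health endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/_stcore/health"
    # Bypass any configured HTTP proxy: the probe targets a local or internal address
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, non-2xx status, or malformed response
        return False
```

A scheduled job could call `is_healthy()` and raise an alert after several consecutive failures.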

Security Best Practices¶

Never commit secrets: Use environment variables or secrets management
Validate inputs: Sanitize file uploads
Rate limiting: Implement rate limiting for public apps
Authentication: Add authentication for sensitive data
HTTPS: Always use HTTPS in production
CORS: Configure CORS appropriately
File size limits: Enforce reasonable file size limits
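
As an illustration of input validation and file size limits, the sketch below runs a few inexpensive checks on an uploaded CSV before it is parsed. The function and exception names are hypothetical, and the checks (extension, size, no NUL bytes, valid UTF-8, non-empty header) are an assumed baseline rather than a complete sanitizer.

```python
import codecs
import csv
import io


class UploadRejected(ValueError):
    """Raised when an upload fails a basic validation check."""


def validate_csv_upload(name: str, data: bytes,
                        max_bytes: int = 100 * 1024 * 1024) -> list[str]:
    """Return the header row if the upload passes basic checks."""
    if not name.lower().endswith(".csv"):
        raise UploadRejected("Only .csv files are accepted")
    if not data:
        raise UploadRejected("File is empty")
    if len(data) > max_bytes:
        raise UploadRejected("File exceeds the size limit")
    head = data[:65536]
    if b"\x00" in head:
        raise UploadRejected("File looks binary, not text")
    try:
        # Incremental decoding tolerates a multi-byte character split at the cut
        text = codecs.getincrementaldecoder("utf-8-sig")().decode(head)
    except UnicodeDecodeError:
        raise UploadRejected("File is not valid UTF-8") from None
    header = next(csv.reader(io.StringIO(text)), None)
    if not header or all(not cell.strip() for cell in header):
        raise UploadRejected("Missing header row")
    return header
```

With Streamlit's uploader, the arguments would be `uploaded_file.name` and `uploaded_file.getvalue()`.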

Scaling Considerations¶

Horizontal scaling: Use load balancer with multiple instances
Caching: Implement Redis or similar for shared cache
Database: Store results in database for persistence
Queue system: Use message queue for batch processing
CDN: Use CDN for static assets
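
With several instances behind a load balancer, each process has its own `@st.cache_data` cache, which is why a shared cache helps. The sketch below keys results by a hash of the file content so any instance can reuse them. A plain dictionary stands in for Redis or a similar store here; `analyze` is a hypothetical callable that must return JSON-serializable findings, and the key prefix is an arbitrary choice.

```python
import hashlib
import json
from collections.abc import MutableMapping
from typing import Callable


def cache_key(data: bytes, namespace: str = "lavendertown:findings") -> str:
    """Derive a stable key from the file content, independent of the file name."""
    return f"{namespace}:{hashlib.sha256(data).hexdigest()}"


def cached_analysis(data: bytes,
                    analyze: Callable[[bytes], list],
                    store: MutableMapping) -> list:
    """Return stored findings for identical content, computing them once."""
    key = cache_key(data)
    if key in store:
        return json.loads(store[key])
    result = analyze(data)
    store[key] = json.dumps(result)  # stored as text so any instance can read it
    return result
```

Swapping the dictionary for a real client would replace the membership test and item assignment with that client's get and set calls, typically with an expiry time.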