Automated Business Document Classification System with DiT Model

Description:

This code implements a business document classification system built on DiT (Document Image Transformer), a Vision Transformer-based model fine-tuned to recognize document types. The pipeline uses Hugging Face’s AutoFeatureExtractor for image preprocessing and a pre-trained AutoModelForImageClassification to assign each image to a document category (e.g., email, form). Each input image is converted into a tensor and passed through the transformer to produce logits. The index of the highest-scoring class is then mapped to a human-readable label. Running inference on a GPU speeds up prediction. A Gradio interface lets users upload document images and receive JSON-formatted predictions in real time. It also includes built-in examples that show classification across common business document types.
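The last step described above, turning logits into a readable label through the class index, can be sketched in plain Python. This is a minimal illustration, not the author's code. The `ID2LABEL` mapping and the logit values are hypothetical stand-ins for the model's own `config.id2label` and its output.

```python
import math

# Hypothetical label mapping; a real model exposes its own via model.config.id2label.
ID2LABEL = {0: "letter", 1: "form", 2: "email", 3: "invoice"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_label(logits, id2label=ID2LABEL):
    """Return the label of the highest-scoring class and its probability."""
    probs = softmax(logits)
    best_idx = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best_idx], probs[best_idx]


# Hypothetical logits for a single image, one value per class.
label, confidence = logits_to_label([0.3, 1.2, 3.4, -0.5])
print(label, round(confidence, 3))
```

The softmax does not change which class wins, because it preserves the ordering of the logits. It only adds a confidence score that is easier to read than raw logits.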

Author: Renee Vera

Input

Upload a document image (JPG or PNG) through the Gradio interface, or select one of the provided examples.

Processing

The image is preprocessed with the DiT-specific feature extractor and then classified by the transformer model using GPU-accelerated inference. Finally, the resulting class scores are mapped to one of the predefined document categories.
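The feature extractor's work can be approximated as four steps: resize, rescale pixel values to [0, 1], normalize per channel, and reorder to channels-first with a batch dimension. The NumPy sketch below illustrates those steps under stated assumptions. The 224-pixel target size, the mean and standard deviation of 0.5, and the nearest-neighbour resizing are illustrative choices, not values taken from the original code. The real extractor reads its settings from the model's preprocessor configuration and uses its own resampling method.

```python
import numpy as np

# Assumed preprocessing settings; check the model's preprocessor config for the real values.
TARGET_SIZE = 224
IMAGE_MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
IMAGE_STD = np.array([0.5, 0.5, 0.5], dtype=np.float32)


def resize_nearest(image, size):
    """Nearest-neighbour resize of an H x W x C array to size x size."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return image[rows[:, None], cols[None, :]]


def preprocess(image):
    """Turn an H x W x 3 uint8 RGB image into a 1 x 3 x size x size float32 array."""
    resized = resize_nearest(image, TARGET_SIZE)
    scaled = resized.astype(np.float32) / 255.0  # map pixel values to [0, 1]
    normalized = (scaled - IMAGE_MEAN) / IMAGE_STD  # per-channel normalization
    chw = normalized.transpose(2, 0, 1)  # channels first, as PyTorch vision models expect
    return chw[np.newaxis, ...]  # add a batch dimension


dummy_page = np.full((1100, 850, 3), 255, dtype=np.uint8)  # blank white "page"
pixel_values = preprocess(dummy_page)
print(pixel_values.shape)
```

In the Hugging Face pipeline, the extractor performs the equivalent work in a single call and returns the result under the `pixel_values` key.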

Output

JSON result showing the predicted document type (e.g., “email”, “form”).
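A minimal sketch of how such a result might be assembled with the standard `json` module, assuming the model's class probabilities are already available as a list. The field names `predicted_type` and `scores`, the `top_k` parameter, and the label set are hypothetical. The original code's exact output schema is not shown here.

```python
import json

# Hypothetical label mapping; a real model exposes its own via model.config.id2label.
ID2LABEL = {0: "letter", 1: "form", 2: "email", 3: "invoice"}


def to_json_result(probabilities, id2label=ID2LABEL, top_k=3):
    """Build a JSON string with the predicted document type and the top-k scores."""
    # Class indices ordered from most to least probable.
    ranked = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    top = ranked[:top_k]
    result = {
        "predicted_type": id2label[top[0]],
        "scores": {id2label[i]: round(probabilities[i], 4) for i in top},
    }
    return json.dumps(result, indent=2)


print(to_json_result([0.05, 0.10, 0.80, 0.05]))
```

Including the runner-up scores alongside the top label makes uncertain predictions visible, for example when an email and a letter score almost equally.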

