Automated Business Document Classification System with DiT Model

This code classifies images of business documents (e.g., emails, forms) with DiT (Document Image Transformer), a Vision Transformer-style model available through Hugging Face. Each image is preprocessed into a tensor for GPU-accelerated inference. A Gradio interface returns real-time predictions as JSON and includes example images for quick validation.

Description:

This code implements a business document classification system built on DiT (Document Image Transformer), a Vision Transformer-based model fine-tuned to recognize document types. The pipeline uses Hugging Face’s AutoFeatureExtractor for image preprocessing and AutoModelForImageClassification to load the pre-trained model that identifies document categories (e.g., emails, forms). Each input image is converted to a tensor and passed through the transformer to produce logits, one score per class; the index of the top-scoring class is then mapped to a human-readable label. Inference runs on the GPU for speed. A Gradio interface provides real-time interaction: users upload a document image and receive a JSON-formatted prediction, and built-in examples demonstrate classification across common business document types.
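Since the original code is not reproduced on this page, the following is only a minimal sketch of the pipeline described above, not the author's implementation. The checkpoint name `microsoft/dit-base-finetuned-rvlcdip` is an assumption (the post does not name one), and the helpers `pick_label`, `load_model`, and `classify` are hypothetical names chosen for illustration.

```python
from typing import Dict, Sequence

# Assumption: the post does not name a checkpoint. This is a publicly
# available DiT model fine-tuned on RVL-CDIP document classes.
MODEL_NAME = "microsoft/dit-base-finetuned-rvlcdip"


def pick_label(logits: Sequence[float], id2label: Dict[int, str]) -> str:
    """Map the index of the highest logit to its human-readable label."""
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best_index]


def load_model(model_name: str = MODEL_NAME):
    """Load the feature extractor and model, moving the model to GPU if present."""
    # Heavy imports are kept local so the pure helper above stays importable
    # without torch or transformers installed.
    import torch
    from transformers import AutoFeatureExtractor, AutoModelForImageClassification

    device = "cuda" if torch.cuda.is_available() else "cpu"
    extractor = AutoFeatureExtractor.from_pretrained(model_name)
    model = AutoModelForImageClassification.from_pretrained(model_name)
    model = model.to(device).eval()
    return extractor, model, device


def classify(image, extractor, model, device) -> str:
    """Classify a PIL image and return the predicted document type."""
    import torch

    # Preprocess the image into a batch of tensors on the model's device.
    inputs = extractor(images=image.convert("RGB"), return_tensors="pt").to(device)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_classes)
    return pick_label(logits[0].tolist(), model.config.id2label)
```

A typical call would load the model once with `extractor, model, device = load_model()` and then run `classify(Image.open(path), extractor, model, device)` for each uploaded image, where `Image` comes from Pillow.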

Author: Renee Vera

Demo

Input

Upload a document image (JPG/PNG) via the Gradio interface or use provided examples.
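One way the upload step could be wired up in Gradio is sketched below. This is an illustrative assumption rather than the post's actual interface code: `build_demo`, `is_supported_image`, and the `predict` callback are hypothetical names, and the example file paths would need to point at real images.

```python
import os
from typing import Callable, List, Optional

# File types mentioned in the post (JPG/PNG).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def is_supported_image(filename: str) -> bool:
    """Return True if the file extension is one of the accepted image types."""
    return os.path.splitext(filename)[1].lower() in SUPPORTED_EXTENSIONS


def build_demo(predict: Callable, example_paths: Optional[List[str]] = None):
    """Build a Gradio interface: image in, JSON prediction out."""
    import gradio as gr  # imported lazily so the helper above works without Gradio

    # Keep only example files with a supported extension.
    examples = [p for p in (example_paths or []) if is_supported_image(p)]
    return gr.Interface(
        fn=predict,                                   # receives a PIL image
        inputs=gr.Image(type="pil", label="Document image"),
        outputs=gr.JSON(label="Prediction"),
        examples=examples or None,
    )
```

Calling `build_demo(predict).launch()` would start the local web app; `predict` is expected to accept a PIL image and return a JSON-serializable value.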

Processing

Image preprocessing with the DiT feature extractor. GPU-accelerated inference with the transformer model. Mapping of the model's class scores to predefined document categories.
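The last step, turning raw model scores into class probabilities, can be illustrated in plain Python. This is a sketch under the assumption that a standard softmax is applied to the logits; the post does not say so explicitly, and the function names `softmax` and `top_k` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(
    logits: Sequence[float], id2label: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]
```

Reporting the top few categories rather than a single label can help when two document types receive similar scores.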

Output

JSON result showing the predicted document type (e.g., “email”, “form”).
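The exact JSON schema is not shown in the post, so the following is only a guess at what such a result might look like. The key names `document_type` and `confidence` and the helper `to_json` are assumptions made for illustration.

```python
import json
from typing import Optional


def to_json(label: str, confidence: Optional[float] = None) -> str:
    """Serialize a prediction; the key names here are hypothetical."""
    payload = {"document_type": label}
    if confidence is not None:
        # Round so the output stays readable in the UI.
        payload["confidence"] = round(confidence, 4)
    return json.dumps(payload)


print(to_json("email"))
```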

Code:

Licensed under CC BY-NC-SA 4.0
Last updated on Aug 25, 2023 00:00 UTC