Automating product documentation processing using AWS AI solutions

This case study describes the implementation of a serverless AWS solution for automated processing of technical documents (data sheets, catalogs, certifications) for a leading Slovak distributor of electrical installation materials and industrial automation. The solution enabled a dramatic acceleration in the processing of products into the B2B catalog, improved data quality, and reduced operating costs.

July 7, 2025 ┃ 6 min read

Challenge

In the electrical installation and industrial components distribution segment, tens of thousands of product sheets and technical documents from various manufacturers are processed every year. Most companies still rely on manual data transcription, which is slow and prone to errors. Market dynamics, frequent changes in product parameters, and the need to list items quickly on B2B portals all push distributors toward automation and higher data accuracy.

Volume and heterogeneity of documents: hundreds of suppliers and multiple formats (PDF, DOCX, XLS) in different languages.
Manual data extraction: 15–20 minutes per document, high risk of errors when manually transcribing technical parameters.
Delayed product launches: longer time-to-market for new items on the B2B portal.

Project objectives

Speed up document processing by at least 80%.
Reduce manual work for the data team by 50%.
Achieve an accuracy of extracted data ≥ 98%.
Implement a scalable, pay-per-use architecture integrated with ERP and B2B systems.

Solution

The proposed solution was built on a serverless AWS architecture with a focus on AI/ML and NLP for processing unstructured documents. This approach was chosen because it best combines flexibility, reliability, and operational efficiency. The serverless model scales with document volume without the need for infrastructure management, while AWS's native AI services provide proven accuracy and integrate quickly with existing ERP and B2B systems.

Key components:

Amazon S3 – input storage for uploaded documents.
Amazon Textract – automatic extraction of text, tables, and key-value pairs.
Amazon Comprehend + custom NLP (SageMaker) – identification and classification of technical parameters (voltage, power, dimensions, standards, IP protection).
AWS Lambda – workflow orchestration and transformation into structured JSON.
Amazon DynamoDB – storage of extracted data + metadata.
API Gateway – integration with ERP systems and B2B portal.
Amazon CloudWatch – monitoring KPIs, latency, errors, and model quality.
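
As an illustration of how the key-value output of the extraction component might be consumed downstream, the following is a minimal sketch, not the project's actual code. It assumes a response dictionary in the block structure that Textract's document analysis returns (KEY_VALUE_SET blocks linked to WORD blocks through relationships); the function names and the sample data are hypothetical and written by hand for this example.

```python
from typing import Dict, List


def _block_text(block: dict, blocks_by_id: Dict[str, dict]) -> str:
    """Join the text of the WORD blocks referenced as children of a block."""
    words: List[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response: dict) -> Dict[str, str]:
    """Flatten KEY_VALUE_SET blocks into a {key text: value text} dict."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs: Dict[str, str] = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue  # VALUE blocks are reached through their KEY block
        key = _block_text(block, blocks_by_id)
        value_parts: List[str] = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_parts.append(
                        _block_text(blocks_by_id[value_id], blocks_by_id)
                    )
        pairs[key] = " ".join(part for part in value_parts if part)
    return pairs


# Hand-written sample in the assumed response shape (not real service output).
sample_response = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Rated"},
        {"Id": "w2", "BlockType": "WORD", "Text": "voltage"},
        {"Id": "w3", "BlockType": "WORD", "Text": "230"},
        {"Id": "w4", "BlockType": "WORD", "Text": "V"},
    ]
}

key_values = extract_key_values(sample_response)
print(key_values)
```

Keeping this step as a pure function over the response dictionary means it can be tested without calling any AWS service.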

Process:

Supplier file uploaded to S3 (batch or event-driven).
Textract extracts text and tables; the result is passed to a Lambda function.
The SageMaker NLP model and Comprehend identify parameters and map them to internal fields.
Lambda completes validation, creates JSON, and saves the record to DynamoDB.
API Gateway enables export to ERP and publication in the B2B catalog; manual validations are supported by audit logs.
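
The mapping and validation steps above can be sketched roughly as follows. This is an illustrative example under stated assumptions, not the deployed Lambda code: the alias table, the internal field names (voltage_v, power_w, ip_code), and the simplified validation rules are all hypothetical, and the real solution uses a trained NLP model rather than a lookup table.

```python
import json
import re
from typing import Dict, Optional

# Hypothetical mapping from supplier labels (EN/DE/SK) to internal fields.
FIELD_ALIASES = {
    "rated voltage": "voltage_v",
    "nennspannung": "voltage_v",
    "menovité napätie": "voltage_v",
    "power": "power_w",
    "ip rating": "ip_code",
    "degree of protection": "ip_code",
}

NUMBER = re.compile(r"(\d+(?:[.,]\d+)?)")
# Simplified IP code pattern: two characters, digits or X (e.g. IP65, IPX4).
IP_CODE = re.compile(r"^IP\s?([0-6X][0-9X])$", re.IGNORECASE)


def normalize(field: str, raw: str) -> Optional[object]:
    """Return a cleaned value, or None when the raw text fails validation."""
    raw = raw.strip()
    if field == "ip_code":
        match = IP_CODE.match(raw)
        return f"IP{match.group(1).upper()}" if match else None
    match = NUMBER.search(raw)
    return float(match.group(1).replace(",", ".")) if match else None


def build_record(document_id: str, key_values: Dict[str, str]) -> dict:
    """Map extracted labels to internal fields and separate invalid values."""
    record = {"document_id": document_id, "parameters": {}, "rejected": {}}
    for label, raw in key_values.items():
        field = FIELD_ALIASES.get(label.strip().lower())
        if field is None:
            continue  # unknown label: ignored in this sketch
        value = normalize(field, raw)
        if value is None:
            record["rejected"][field] = raw  # kept for manual validation
        else:
            record["parameters"][field] = value
    return record


record = build_record(
    "datasheet-001",
    {"Rated voltage": "230 V", "Degree of protection": "IP 65", "Power": "n/a"},
)
record_json = json.dumps(record, ensure_ascii=False)
print(record_json)
```

Values that fail validation are kept in a separate "rejected" section instead of being dropped, so a reviewer can see what the supplier document actually said.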

Implementation

Project phases (3–6 months):

Analysis and PoC (4 weeks): audit of input documents, definition of parameter taxonomy, PoC on a sample of 500 documents.
Development and training (6–10 weeks): deployment of Textract, training of a custom NLP model in SageMaker, development of Lambdas and integrations.
Integration and testing (4–6 weeks): connection to ERP, end-to-end testing, security reviews.
Deployment and tuning (2–4 weeks): monitoring user feedback, calibrating the models.

Key decisions:

Choosing a serverless AWS solution for scalability and pay-as-you-go pricing.
Hybrid approach: automatic processing + human validation in exceptional cases.
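
One common way to implement such a hybrid approach is to route documents by extraction confidence. The sketch below shows the idea only; the 0.90 threshold, the 0–1 confidence scale, the field names, and the route labels are assumptions made for this example and are not taken from the project.

```python
from dataclasses import dataclass
from typing import List, Tuple

REVIEW_THRESHOLD = 0.90  # assumed cut-off, not a value from the case study


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed to be normalized to the range 0.0-1.0


def route(fields: List[ExtractedField], required: List[str]) -> Tuple[str, List[str]]:
    """Decide whether a document can be published automatically.

    A document goes to human review when a required field is missing
    or when any extracted field falls below the confidence threshold.
    """
    reasons: List[str] = []
    present = {f.name for f in fields}
    for name in required:
        if name not in present:
            reasons.append(f"missing:{name}")
    for f in fields:
        if f.confidence < REVIEW_THRESHOLD:
            reasons.append(f"low_confidence:{f.name}")
    return ("human_review" if reasons else "auto_publish", reasons)


required_fields = ["voltage_v", "ip_code"]

clean = route(
    [ExtractedField("voltage_v", "230", 0.99),
     ExtractedField("ip_code", "IP65", 0.97)],
    required_fields,
)
flagged = route([ExtractedField("voltage_v", "230", 0.72)], required_fields)

print(clean)
print(flagged)
```

Returning the reasons alongside the decision gives reviewers a starting point and leaves a trace that can be written to an audit log.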

Results and benefits

The initial evaluation confirmed that the deployed solution had a measurable impact not only on operational efficiency, but also on data quality and speed to market. The combination of AI and a serverless approach proved to be key in handling high volumes of documents, with the system adapting to real-world needs and delivering reliable, scalable results even under increasing load.

KPI 1 – Processing speed

Baseline: 15-20 minutes/document.
Result: average 1:45 min/document.
Impact: ~90% reduction in processing time; products reach the catalog significantly faster.
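
The ~90% figure can be checked from the baseline and result above. The short calculation below is only a worked example; it assumes the reduction is measured as time saved per document relative to the manual baseline.

```python
baseline_minutes = (15.0, 20.0)   # manual processing, lower and upper bound
result_minutes = 1 + 45 / 60      # 1:45 min expressed as 1.75 minutes

# Percentage of time saved per document at each end of the baseline range.
reductions = [(1 - result_minutes / b) * 100 for b in baseline_minutes]

for baseline, reduction in zip(baseline_minutes, reductions):
    print(f"{baseline:.0f} min baseline: {reduction:.1f}% less time per document")
```

Under that assumption the reduction lies between roughly 88% and 91%, which is consistent with the reported ~90% and above the 80% project objective.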

KPI 2 – Data accuracy

Baseline: 93–95% (manual).
Result: 98.6% after validation.
Impact: fewer complaints; higher quality full-text search.

KPI 3 – Cost efficiency

Result: ~40% savings in data team costs.
Impact: lower OPEX, possibility to allocate resources to catalog development and UX.

Operational benefits:

Scalable processing of thousands of documents in parallel.
Better versioning and data auditing (DynamoDB + CloudWatch logs).
Support for multiple languages (SK/EN/DE) thanks to Comprehend + custom NLP.

"Automated document processing freed us from routine tasks and allowed the team to focus on catalog quality. Faster product uploads directly supported our business goals."

Conclusion

The project confirmed that automation based on AI and serverless architecture is an effective way to speed up and refine the processing of product documentation in technically demanding segments. The solution enabled the client to transform a time-consuming manual process into an agile and precisely measurable system that can be scaled as needed. The proven combination of AWS services has created a solid foundation for further expansion of automation and intelligent data management across the organization.
