Extract Order Details From Messy Logistics Documents With Waveline

Waveline
5 min read · Aug 1, 2023

Logistics and supply chain documents such as packing slips, supplier quotes, and data sheets are often messy, which makes them hard to process automatically. We show how Waveline AI can help.

In this post, we walk through real-world examples of documents from which relevant data is hard to extract automatically. Traditional methods often fail in such cases, but Waveline AI can handle these documents thanks to state-of-the-art AI such as Vision Transformers and Large Language Models.

Here are three example use cases:

Example 1

This is a very messy document. Conventional Intelligent Document Processing (IDP) often requires training a new AI model to handle a specific document like this one. With Waveline, there is no need for that; we can start immediately. As described in the documentation, we specify what we want to extract with a shape. So let’s create a shape to extract the details of the different parts we ordered, the shipping date, and the order date:

[
{
"name": "parts",
"type": "object",
"description": "List of the different parts we ordered",
"isArray": true,
"elements": [
{
"name": "Manufacturer Part Number",
"type": "string"
},
{
"name": "Mouser Part Number",
"type": "string"
},
{
"name": "Order Quantity",
"type": "number"
},
{
"name": "Date Code",
"type": "string"
},
{
"name": "Lot Code",
"type": "string",
}
]
},
{
"name": "Ship Date",
"type": "string",
"description": "When this order was shipped",
"isArray": false
},
{
"name": "Order Date",
"type": "string"
}
]

If a property is hard to find or needs an explanation, we can put that into its description. Otherwise, our system infers what you mean from the name alone. For example, we provide a description of what we mean by ship date.
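Hand-written shapes are easy to get slightly wrong, for example with a trailing comma after the last property. The following is a minimal sketch of a local check you could run before sending a shape. The function name `check_shape` is hypothetical, and the rules it enforces (every field has a "name" and a "type", nested fields live under "elements") are inferred from the example shape above, not taken from the Waveline documentation.

```python
import json


def check_shape(shape_text: str) -> list[str]:
    """Parse a shape written as JSON and return the dotted paths of its fields.

    Raises ValueError if the text is not valid JSON or a field lacks
    "name" or "type". These rules are assumptions based on the example shape.
    """
    fields = json.loads(shape_text)  # json.JSONDecodeError is a ValueError
    paths: list[str] = []

    def walk(entries, prefix=""):
        for entry in entries:
            for key in ("name", "type"):
                if key not in entry:
                    where = prefix or "top level"
                    raise ValueError(f"field at {where} is missing '{key}'")
            path = prefix + entry["name"]
            paths.append(path)
            # Only object fields carry nested elements in the example shape.
            if entry["type"] == "object":
                walk(entry.get("elements", []), path + ".")

    walk(fields)
    return paths


shape_text = """
[
  {"name": "parts", "type": "object", "isArray": true,
   "elements": [
     {"name": "Order Quantity", "type": "number"},
     {"name": "Lot Code", "type": "string"}
   ]},
  {"name": "Ship Date", "type": "string",
   "description": "When this order was shipped", "isArray": false}
]
"""
print(check_shape(shape_text))
```

Because `json.loads` follows strict JSON, a stray trailing comma is reported as an error here instead of surfacing later in the request.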

We now call the API and get the following result:

"Ship Date": "May 19 2023"
"Order Date": "May 19 2023"
"parts": [
{
"Lot Code": "388003",
"Date Code": "2206",
"Order Quantity": 12,
"Mouser Part Number": "654-AT60-202-16141",
"Manufacturer Part Number": "AT60-202-16141"
},
{
"Lot Code": null,
"Date Code": "22336",
"Order Quantity": 10,
"Mouser Part Number": "571-0413-217-1605",
"Manufacturer Part Number": "0413-217-1605"
},
{
"Lot Code": "MIXED",
"Date Code": "2245",
"Order Quantity": 1,
"Mouser Part Number": "571-HDP26-24-23SE",
"Manufacturer Part Number": "HDP26-24-23SE"
}
]

Even the Lot Code property, which is a number for one part (388003), missing for another, and MIXED for the third, was recognized correctly.
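Downstream code usually has to deal with exactly these variations. As a rough sketch (not part of Waveline), the snippet below parses the extracted dates and classifies each lot code. The helper names `parse_date` and `lot_status` are hypothetical, and the date format is assumed from the single value shown in the result above.

```python
from datetime import date, datetime

# Subset of the result above, written as a Python dict (null becomes None).
result = {
    "Ship Date": "May 19 2023",
    "Order Date": "May 19 2023",
    "parts": [
        {"Lot Code": "388003", "Date Code": "2206", "Order Quantity": 12},
        {"Lot Code": None, "Date Code": "22336", "Order Quantity": 10},
        {"Lot Code": "MIXED", "Date Code": "2245", "Order Quantity": 1},
    ],
}


def parse_date(text: str) -> date:
    # Assumed format: abbreviated month name, day, four-digit year.
    return datetime.strptime(text, "%b %d %Y").date()


def lot_status(lot_code):
    # Distinguish a missing lot code, a mixed lot, and a single lot.
    if lot_code is None:
        return "missing"
    if lot_code.upper() == "MIXED":
        return "mixed"
    return "single"


ship_date = parse_date(result["Ship Date"])
statuses = [lot_status(part["Lot Code"]) for part in result["parts"]]
print(ship_date.isoformat(), statuses)
```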

Example 2

For this example, let's extract all the information for each item in the table. We again specify a shape similar to the one in Example 1, with additional properties.

"parts": [
{
"COO": "TAIWAN",
"HTS": "8504.40.9580",
"MFG": "XP Power / JCK2024S05",
"Cust": null,
"DESC": "DC DC CONVERTER 5V 20W",
"ECCN": "EAR99",
"Part": "1470-2001-5-ND",
"Quantity": 285,
"Pack Type": "TUBE",
"Unit Price": 36.0363,
"Total Price": 6666.72,
"Minimum Release": 100,
"Package Quantity": 10,
"Current Stock Status": "AVAIL 0/4 AVAIL 49 WKS, BAL 78 WKS ARO",
"Pricing Valid Through": "10-Aug-2023",
"Standart MFG Lead Time": "EST WKS 78"
},
{
"COO": "UNITED KINGDOM",
"HTS": "8536.69.4040",
"MFG": "Harwin Inc. / M80-5111042",
"Cust": null,
"DESC": "CONN HEADER VERT 10POS 2MM",
"ECCN": "EAR99",
"Part": "952-2426-5-ND",
"Quantity": 573,
"Pack Type": "TUBE",
"Unit Price": 10.13726,
"Total Price": 5808.65,
"Minimum Release": 500,
"Package Quantity": 13,
"Current Stock Status": "423 AVAIL/BAL EST WKS 19 ARO",
"Pricing Valid Through": "10-Aug-2023",
"Standart MFG Lead Time": "EST WKS 19"
},
{
"COO": "UNITED KINGDOM",
"HTS": "8536.69.4040",
"MFG": "Harwin Inc. / M80-5100405",
"Cust": null,
"DESC": "CONN HEADER VERT 4P0S 2MM",
"ECCN": "EAR99",
"Part": "M80-5100405-ND",
"Quantity": 573,
"Pack Type": "TUBE",
"Unit Price": 9.1092,
"Total Price": 5219.57,
"Minimum Release": 500,
"Package Quantity": 18,
"Current Stock Status": "AVAIL 0/BAL 2 WKS ARO",
"Pricing Valid Through": "10-Aug-2023",
"Standart MFG Lead Time": "EST WKS 17"
},
{
"COO": "CHINA",
"HTS": "8504.31.2000",
"MFG": "Pulse Electronics / HX1294NLT",
"Cust": null,
"DESC": "MODULE XERMR LAN 10/100 SMD",
"ECCN": "EAR99",
"Part": "553-3792-2-ND",
"Quantity": 1050,
"Pack Type": "TR",
"Unit Price": 4.60681,
"Total Price": 4837.15,
"Minimum Release": 1050,
"Package Quantity": 350,
"Current Stock Status": "IN STOCK",
"Pricing Valid Through": "10-Aug-2023",
"Standart MFG Lead Time": "EST WKS 10"
}
]

Note that the property name in the shape doesn’t need to match the label on the document exactly. For example, one manufacturer may write “Total Price” for the total price to pay, while this manufacturer writes “Extended Price”; Waveline still manages to extract this property. We do this by listing the alternatives in the property’s description within the shape:

{
"name": "Total Price",
"type": "number",
"description": "Total price to pay. Sometimes written as Extended Price, Combined Price or Total",
"isArray": false
}
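Since quantity, unit price, and total price are all extracted as numbers, a simple arithmetic cross-check can flag rows worth a second look. This is a minimal sketch that is independent of Waveline; the function name `unreconciled` and the tolerance of one cent are assumptions, and the rows are copied from the result above.

```python
rows = [
    {"Part": "1470-2001-5-ND", "Quantity": 285, "Unit Price": 36.0363, "Total Price": 6666.72},
    {"Part": "952-2426-5-ND", "Quantity": 573, "Unit Price": 10.13726, "Total Price": 5808.65},
    {"Part": "M80-5100405-ND", "Quantity": 573, "Unit Price": 9.1092, "Total Price": 5219.57},
    {"Part": "553-3792-2-ND", "Quantity": 1050, "Unit Price": 4.60681, "Total Price": 4837.15},
]


def unreconciled(rows, tolerance=0.01):
    """Return part numbers where quantity * unit price differs from the total."""
    flagged = []
    for row in rows:
        expected = row["Quantity"] * row["Unit Price"]
        if abs(expected - row["Total Price"]) > tolerance:
            flagged.append(row["Part"])
    return flagged


print(unreconciled(rows))
```

With the values listed above, the check returns only the first part number, since 285 × 36.0363 does not equal 6666.72. Whether such a mismatch comes from the source document or from the extraction cannot be told from the numbers alone, so a flagged row is best compared against the original quote.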

Example 3

For this packing slip, we want to extract the following:

  • Part Number
  • Date
  • Lot Code
  • Quantity
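The shape for these four properties is not shown here, so the following is only a sketch of how it could be built, using the same keys as the shape in Example 1. The helper `field`, the wrapper name "items", and its description are assumptions made for illustration. Building the shape in Python and serializing it with `json.dumps` avoids hand-written JSON syntax errors.

```python
import json


def field(name, type_, description=None, is_array=False, elements=None):
    """Build one shape entry using the keys seen in the Example 1 shape."""
    entry = {"name": name, "type": type_, "isArray": is_array}
    if description:
        entry["description"] = description
    if elements:
        entry["elements"] = elements
    return entry


# "items" is a hypothetical wrapper name; the four inner names come from the list above.
shape = [
    field(
        "items",
        "object",
        description="One entry per line item on the packing slip",
        is_array=True,
        elements=[
            field("Part Number", "string"),
            field("Date", "string"),
            field("Lot Code", "string"),
            field("Quantity", "number"),
        ],
    )
]

shape_json = json.dumps(shape, indent=2)
print(shape_json)
```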

Again, it’s a really hard document where even we, as humans, need to think twice about what and where the part number is. The Waveline AI model also needs to understand the spatial layout of this document, for example, that the lot code is below the date in the third column. But as we can see, even such documents can be handled:

[
{
"Date": "05/25/23",
"Lot Code": "234330901",
"Quantity": 1,
"Part Number": "SOU D38999/26FJ37PA"
},
{
"Date": "05/25/23",
"Lot Code": "234331001",
"Quantity": 2,
"Part Number": "APH D38999/24FJ19PN"
},
{
"Date": "05/25/23",
"Lot Code": "234331101",
"Quantity": 2,
"Part Number": "SOU D38999/26FJ37PN"
},
{
"Date": "05/25/23",
"Lot Code": "234331201",
"Quantity": 1,
"Part Number": "SOU D38999/24FJ37SA"
},
{
"Date": "05/25/23",
"Lot Code": "234425701",
"Quantity": 5,
"Part Number": "APH D38999/26FD35PN",
}
]

Conclusion

Logistics and supply chain documents can be very messy. Traditional methods need document-specific training and sometimes still fail, often because they lack reasoning capabilities and spatial awareness. Thanks to vision transformers and LLMs, Waveline can now handle these tasks for any kind of document from the get-go.
