TrueExtract

Extract structured data from receipts quickly and accurately.


Overview

To have a receipt image transcribed, send a POST request to the receipt transcription endpoint.

Send the image of the receipt as a base64-encoded data URL in the receiptImage field of the JSON body.

POST https://trueextract.pages.dev/api/beta/transcriptions
{ "receiptImage": "" }

There are three types of response.

1. Successful transcription response. These always contain a transcription object with the transcribed receipt. The object is guaranteed to contain date, time, vendorName, paymentMethod, totalInclusive, lineItems, and rawText. It may also contain the optional fields vendorLocation, vendorLogo, and receiptId, which are extracted if they appear on the receipt.

{
  "transcription": {
    "vendorName": "Desa's Cash and Carry",
    "date": "04/04/2024",
    "time": "14:30",
    "paymentMethod": "CASH",
    "totalInclusive": 50,
    "lineItems": [
      {
        "unitPrice": 49.95,
        "description": "UTD POTATOES",
        "quantity": 1,
        "totalPrice": 49.95
      }
    ],
    "rawText": "TAX INVOICE\nVAT.REG.NO 4220134953\n\nCASH SALE\n04/04/2024 14:30\nRef:48542\nACC: 10\n\n239864 UTD POTATOES\nN 1 49.95 **\nQty : 1\nTOTAL R50.00\n\nCASH R100.00\nCHANGE R50.00\n\nCashier: TILL07\nTrollies : 1\n\nWE ARE ONLINE-VISIT US ON:\nwww.desaiscnc.co.za"
  },
  "isRejected": false
}

2. Rejected response. All receipts are screened for validity before being transcribed. Images that are not fully readable, are handwritten, are not receipts, or appear fraudulent are rejected.

{ "transcription": null, "isRejected": true, "rejectionReason": "NOT_RECEIPT" }

If there are specific fields related to the rejection, they will be listed in issues:

{ "transcription": null, "isRejected": true, "rejectionReason": "UNREADABLE_FIELDS", "issues": [ "time", "vendorName" ] }

3. Error response. If there was an error processing your request, you will get an error response.

{ "error": true, "message": "400 {\"type\":\"error\",\"error\":{\"type\":\"invalid_request_error\",\"message\":\"messages.0.content.1.image.source.base64: invalid base64 data\"}}", "seeMore": "https://trueextract.pages.dev/docs/e/generic-server-error" }
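The three response shapes above can be told apart by a few key checks. The following is a minimal Python sketch, not an official client: the function names classify_response and missing_fields are illustrative, and the check order (error first, then isRejected, then transcription) is an assumption based only on the examples shown here.

```python
import json

# Fields the overview says a successful transcription always contains.
REQUIRED_FIELDS = ("date", "time", "vendorName", "paymentMethod",
                   "totalInclusive", "lineItems", "rawText")


def classify_response(body: dict) -> str:
    """Return "error", "rejected", or "success" for a decoded response body."""
    if body.get("error"):
        return "error"
    if body.get("isRejected"):
        return "rejected"
    if body.get("transcription") is not None:
        return "success"
    raise ValueError("response matches none of the documented shapes")


def missing_fields(transcription: dict) -> list:
    """List the guaranteed fields that are absent from a transcription object."""
    return [name for name in REQUIRED_FIELDS if name not in transcription]


# Usage with the rejected example from this page.
rejected = json.loads(
    '{"transcription": null, "isRejected": true, '
    '"rejectionReason": "UNREADABLE_FIELDS", "issues": ["time", "vendorName"]}'
)
if classify_response(rejected) == "rejected":
    # "issues" is only present for some rejection reasons, so default to [].
    print(rejected["rejectionReason"], rejected.get("issues", []))
```

Checking missing_fields on a successful response is optional, since those fields are documented as guaranteed; it mainly helps catch a changed response format early.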

Base64 encoding

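To get a base64-encoded image on macOS or Linux, use the following command. On Linux, GNU base64 wraps its output at 76 characters by default, so the tr step removes the line breaks that would otherwise corrupt the JSON body:

$ cat myimage.jpeg | base64 | tr -d '\n' > myimage-base64.txt

You can then send the encoded image as a data URL:

$ curl -X POST https://trueextract.pages.dev/api/beta/transcriptions \
    -H "Content-Type: application/json" \
    -d '{"receiptImage": "data:image/jpeg;base64,'"$(cat myimage-base64.txt)"'"}'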

Or combine both steps into a single command:

$ curl -X POST https://trueextract.pages.dev/api/beta/transcriptions \
    -H "Content-Type: application/json" \
    -d '{"receiptImage": "data:image/jpeg;base64,'"$(cat myimage.jpeg | base64 | tr -d '\n')"'"}'
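The same encoding can be done without the shell. This is a minimal Python sketch using only the standard library; the helper names to_data_url and build_payload are illustrative, and guessing the MIME type from the file extension is an assumption about how you name your files.

```python
import base64
import json
import mimetypes
from pathlib import Path


def to_data_url(image_bytes: bytes, mime_type: str) -> str:
    """Encode raw image bytes as a base64 data URL."""
    # b64encode never inserts line breaks, so nothing needs stripping.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_payload(path: str) -> str:
    """Read an image file and return the JSON request body as a string."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"cannot guess the MIME type of {path!r}")
    data_url = to_data_url(Path(path).read_bytes(), mime_type)
    return json.dumps({"receiptImage": data_url})
```

For example, build_payload("myimage.jpeg") returns a string that can be sent as the body of the POST request.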

Limitations

The allowed image types are: image/jpeg, image/png, image/gif, image/webp.

Image sizes of around 1080x1920 pixels are recommended. The maximum allowed image file size is 1.5 MB.
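These limits can be checked locally before a request is sent. The sketch below is illustrative only: check_image is a hypothetical helper, and it assumes that the 1.5 MB limit means 1,500,000 bytes and applies to the original image file rather than to its base64 text. Neither assumption is confirmed here.

```python
# The image types listed above.
ALLOWED_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}

# Assumption: the size limit is read as decimal megabytes (1.5 * 1,000,000).
MAX_BYTES = 1_500_000


def check_image(image_bytes: bytes, mime_type: str) -> list:
    """Return a list of problems; an empty list means the image looks acceptable."""
    problems = []
    if mime_type not in ALLOWED_TYPES:
        problems.append(f"unsupported image type: {mime_type}")
    if len(image_bytes) > MAX_BYTES:
        problems.append(
            f"image is {len(image_bytes)} bytes, over the {MAX_BYTES}-byte limit"
        )
    return problems
```

Returning every problem at once, rather than raising on the first, lets a caller report all issues with an upload in a single message.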

Authentication

To send requests at production scale, you must authenticate your requests.

This is done with HTTP Basic Authentication.

Speak to your TrueExtract representative to obtain your HTTP Basic Authentication credentials.

The required header value is the word Basic followed by the base64 encoding of username:password. An online Basic Auth generator can produce it from your credentials.

Here is an example cURL command. Replace xxxxbasicauthxxxxx with your Basic Auth credentials and xxxxxxbase64ofimagexxxx with your base64-encoded image.

curl -X POST -H 'Authorization: Basic xxxxbasicauthxxxxx' \
    -H "Content-type: application/json" -d \
    '{ "receiptImage": "data:image/jpeg;base64,xxxxxxbase64ofimagexxxx" }' \
    'https://trueextract.pages.dev/api/beta/transcriptions'
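An authenticated request can also be built in Python. This is a minimal sketch with the standard library; basic_auth_header and build_request are illustrative names, and it assumes your credentials are a username and password pair, as in standard HTTP Basic Authentication.

```python
import base64
import urllib.request

ENDPOINT = "https://trueextract.pages.dev/api/beta/transcriptions"


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value for HTTP Basic Authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_request(payload: str, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated POST with a JSON body."""
    return urllib.request.Request(
        ENDPOINT,
        data=payload.encode("utf-8"),
        headers={
            "Authorization": basic_auth_header(username, password),
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends the request. Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx status codes, so an error response body may need to be read from the exception.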