Extracting data from bills that are scans

First, note that bills and receipts are different things, and Lightyear treats them differently. For details on how to process receipts (and the data that Lightyear extracts from them), please refer to this article.

For bills, Lightyear uses several methods of data extraction, depending on the file type/format that has been emailed or uploaded to Lightyear.

The vast majority of bills emailed/uploaded to Lightyear are system-generated pdfs, but a very small percentage are image files, which generally have extensions such as jpg, jpeg, tiff and png. In addition, a very small number of pdf files are actually image files wrapped up as a pdf. Furthermore, a very, very small percentage of system-generated pdfs have scrambled metadata.

For these jpeg, tiff and png files, image-based pdfs, and pdfs with scrambled metadata, Lightyear uses Optical Character Recognition (OCR) to extract the data.

The OCR capabilities in Lightyear allow us to create maps for you to use, which will extract data from these bills in much the same way as we do from system-generated pdfs.

Uploading a Scanned Document or Image File

You can upload image files into Lightyear via the Upload icon, which you can find on the top ribbon of the Lightyear screen. Alternatively, you can email them to your unique Lightyear email address as per the usual process.

When the document arrives in Lightyear, Lightyear will automatically detect whether the file is a system-generated pdf or whether OCR is needed to extract the data. If the file requires OCR, Lightyear will display the following image in Panel 2 while the data is being extracted. OCR typically takes between 15 and 60 seconds, but can take up to 2 minutes. You will need to manually refresh the image panel to see the result; we will be introducing automatic page refresh in the near future, which will update Panel 2 with the bill data once it is available.

The following file types are accepted for OCR: PNG, TIF, TIFF, PDF (Scans) & JPEG. You cannot upload (or email in) any other file type.
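
If you batch up scans before sending them in, a quick local check of the file extension can catch unsupported files early. The following is a minimal sketch in Python, not part of Lightyear itself: the function name and the extension set are illustrative, and treating .jpg as equivalent to JPEG is an assumption based on the suffixes mentioned earlier in this article.

```python
from pathlib import Path

# Extensions taken from the accepted list above.
# ".jpg" is an assumption: the list names JPEG, and jpg is a common suffix for it.
OCR_EXTENSIONS = {".png", ".tif", ".tiff", ".pdf", ".jpeg", ".jpg"}


def is_accepted_for_ocr(filename: str) -> bool:
    """Return True if the file extension is in the accepted list (case-insensitive)."""
    return Path(filename).suffix.lower() in OCR_EXTENSIONS
```

For example, is_accepted_for_ocr("bill.PNG") is true, while is_accepted_for_ocr("bill.docx") is false. This only looks at the file name; it does not check whether a pdf is a scan or system-generated.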

Once OCR has completed, you will see a scanner icon in Panel 1 (like the image below), which indicates that the document has been OCR'd.

Once you see the green scanner icon, you can treat the bill/credit note/statement much as you would any system-generated pdf.

If the document has been Smart Extracted, you'll see the following icon instead:

For full details on how Smart Extract differs from OCR, click here.

Mapping a bill using OCR

From your point of view, the mapping process will be the same as for a system-generated pdf. You'll locate the bill in your Processing tab, and either search for and apply an existing map, or request that a new map be created. However, there are a few important points to consider:
  • OCR will automatically run on any bill/file in which we detect no metadata, which is basically all image files. A very, very small percentage of image files do contain some metadata, and you may find that these bills remain in your Processing tab. Please do ask for a map to be created, and we will try some magic behind the scenes to see if we can create rules to handle them for you, now and in future.
  • Please note: although we can do magic, we're not wizards. Or magicians. For OCR to work, the quality of the image must be good. Scans that have caught in the document feeder (and are skewed/squished) are impossible to extract data from. If your staff have ticked off arriving stock with markers and hidden text, we'll obviously struggle. Creases and folds (when scanned) create vertical and horizontal lines, which may obstruct the data we are looking to extract. The same goes for staples and watermarks.
  • OCR works best with at least 150 DPI. 300 DPI is good too. Anything above that will create an image file that might be too large to upload to Lightyear. Anything less than 150 DPI might not be of good enough quality for us to extract data from.
  • The maximum file size for an OCR bill is 5 MB, as opposed to the standard 10 MB, so please consider that when setting your DPI level.
  • When an OCR bill is received in the Processing tab, it can take some time for the image to load. This is because the bill needs to be processed by our OCR engine before returning to the Processing tab.
  • When sending OCR bills into Lightyear, please make sure that each bill is its own separate attachment. If you have a 2-page bill, please scan it as 2 pages within 1 file. The rule is 1 file for 1 bill: if you have 5 bills, you need to send in 5 separate files. You can, of course, send those 5 files into Lightyear in the one email.
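
To see why the DPI setting matters for the file size limit, it can help to look at how the pixel count grows with resolution. The sketch below is illustrative only and is not a Lightyear tool: the function names are hypothetical, the page size in the example (8.5 x 11 inches) is an assumption, and the 5 MB limit is interpreted here as 5 x 1024 x 1024 bytes. It estimates the raw, uncompressed size of a scan; real jpeg, png or pdf files are compressed and will usually be much smaller, but the relative growth is the same.

```python
from pathlib import Path

# 5 MB limit for OCR bills, from the list above.
# Interpreting 1 MB as 1024 * 1024 bytes is an assumption.
MAX_OCR_BYTES = 5 * 1024 * 1024


def uncompressed_scan_bytes(width_in: float, height_in: float, dpi: int,
                            bytes_per_pixel: int = 3) -> int:
    """Estimate the raw size of one scanned page (3 bytes per pixel = 24-bit colour)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def within_ocr_limit(size_bytes: int) -> bool:
    """Return True if a size in bytes is at or under the assumed 5 MB limit."""
    return size_bytes <= MAX_OCR_BYTES


def file_within_ocr_limit(path: str) -> bool:
    """Check an actual file on disk against the assumed 5 MB limit."""
    return within_ocr_limit(Path(path).stat().st_size)
```

Doubling the DPI from 150 to 300 quadruples the number of pixels, so uncompressed_scan_bytes(8.5, 11, 300) is exactly four times uncompressed_scan_bytes(8.5, 11, 150). If a scan comes out over the limit, lowering the DPI (while staying at 150 or above) or scanning in greyscale are the usual ways to bring it down.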

