Firstly, we should note that bills and receipts are different things, and Lightyear treats them differently. For details on how to process receipts (and the data that Lightyear extracts from receipts), please refer to this article.
In regard to bills, Lightyear employs a number of methods of data extraction, depending on the file-type/format that has been emailed/uploaded to Lightyear.
The vast majority of bills emailed/uploaded to Lightyear are system-generated pdfs, but a very small percentage are image files, which generally have suffixes such as jpg, jpeg, tiff and png. In addition, a very small number of pdf files are actually image files wrapped up as a pdf. Furthermore, a very, very small percentage of system-generated pdfs have scrambled metadata.
For these jpeg, tiff and png files, image-based pdfs and pdfs with scrambled metadata, Lightyear uses Optical Character Recognition (OCR) to extract the data.
The OCR capabilities in Lightyear allow us to create maps for you to use, which will extract data from these bills, much in the same way as we do from system-generated pdfs.
Uploading a Scanned Document or Image File
You can upload image files into Lightyear via the Upload icon/function which you can find on the top ribbon of the Lightyear screen. Or you can email these in to your unique Lightyear email address as per the usual process.
When the document arrives in Lightyear, Lightyear will automatically detect whether the file is a system-generated pdf or whether OCR is needed to extract the data. If the file requires OCR data-extraction, Lightyear will display the following image in Panel 2 whilst the data is being extracted. OCR typically takes between 15 and 60 seconds to extract data, but can take up to 2 minutes. For now, you will need to manually refresh the image panel; we will be introducing automatic page refresh in the near future, which will update Panel 2 with the bill data once it is available.
The following file types are accepted for OCR: PNG, TIF, TIFF, PDF (Scans) & JPEG. You cannot upload (or email in) any other file type.
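If you collect scans in a folder before sending them in, a quick check of the file extension can catch unsupported files early. The snippet below is only a minimal sketch and is not part of Lightyear: the function name `is_accepted_for_ocr` is hypothetical, and including `.jpg` alongside `.jpeg` is an assumption based on the suffixes mentioned earlier in this article.

```python
from pathlib import Path

# Extensions from the accepted-types list above.
# ".jpg" is an assumption: the list names JPEG, and jpg is mentioned earlier as a common suffix.
OCR_EXTENSIONS = {".png", ".tif", ".tiff", ".pdf", ".jpeg", ".jpg"}


def is_accepted_for_ocr(filename: str) -> bool:
    """Return True if the file's extension is in the accepted list (case-insensitive)."""
    return Path(filename).suffix.lower() in OCR_EXTENSIONS
```

For example, `is_accepted_for_ocr("bill.docx")` returns `False`, so that file would need to be re-saved or re-scanned in one of the accepted formats first.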
Once OCR has completed, you will see a Scanner Icon in Panel 1 (like the image below), which indicates that the document has been OCR'd.
Once you see the green scanner icon, you can treat the bill/credit-note/statement much as you would any system-generated pdf.
If the document has been Smart Extracted, you'll see the following icon instead:
For full details on how Smart Extract differs from OCR, click here.
Mapping a bill using OCR
From your point of view, the mapping process will be the same. You'll locate the bill in your Processing tab, and either search for and apply an existing map, or request a new map be created. However, there are a couple of important points to consider:
- OCR will automatically run on any bill/file in which we detect no metadata... basically all image files. A very, very small percentage of image files actually do contain some metadata. You may find that these bills remain in your Processing tab. Please do ask for a map to be created, and we will try some magic behind the scenes to see if we can create rules to handle them now (and in future) for you.
- Please note - Although we can do magic, we're not wizards. Or magicians. For OCR to work, the quality of the image must be good. Scans that have caught in the document feeder (and are skewed/squished) are impossible to extract data from. If your staff have ticked off stock arriving with markers and have hidden text, we'll obviously struggle. Creases and folds (when scanned) create vertical and horizontal lines, which may obstruct the data we are looking to extract. Similarly staples. Or watermarks.
- OCR will work best with at least 150 DPI. 300 DPI is good too. Anything above that will create an image file of a size that might be too large to upload to Lightyear. Anything less than 150 DPI might not be of good enough quality for us to extract data from.
- The maximum file size for an OCR bill is 5 MB, as opposed to the standard 10 MB, so please consider that when setting your DPI level.
- When an OCR bill is received in the Processing tab, it can take some time for the image to load. This is because the bill needs to be processed by our OCR engine before returning to the Processing tab.
- When sending OCR bills into Lightyear, please make sure that each bill is its own separate attachment. If you have a 2-page bill, please scan it as 2 pages within 1 file. The rule is, if you have 5 bills, you need to send in 5 separate files. You can, of course, send those 5 bills into Lightyear in the one email, but it must be 1 file for 1 bill.
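The size and DPI guidance in the points above can be turned into a simple check to run before sending a scan. This is a rough sketch only and not a Lightyear tool: `check_scan` is a hypothetical helper, the DPI value is whatever you set on your scanner (the code does not read it from the file), and treating "5 MB" as 5 × 1024 × 1024 bytes is an assumption.

```python
import os
from typing import List

MAX_OCR_BYTES = 5 * 1024 * 1024  # assumption: the 5 MB OCR limit read as 5 MiB
MIN_DPI = 150                    # below this, text may be too poor to extract
MAX_DPI = 300                    # above this, files tend to become too large


def check_scan(path: str, dpi: int) -> List[str]:
    """Return a list of problems found; an empty list means the scan looks fine."""
    problems = []
    size = os.path.getsize(path)
    if size > MAX_OCR_BYTES:
        problems.append(f"File is {size} bytes, over the {MAX_OCR_BYTES}-byte OCR limit.")
    if dpi < MIN_DPI:
        problems.append(f"{dpi} DPI is below the recommended minimum of {MIN_DPI}.")
    elif dpi > MAX_DPI:
        problems.append(f"{dpi} DPI is above {MAX_DPI} and may produce an oversized file.")
    return problems
```

Calling `check_scan("supplier-bill.tiff", 200)` on a file under the limit would return an empty list. Note that this cannot check the one-bill-per-file rule or the physical quality of the scan (skew, creases, marker pen), which still need a quick look by eye.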