This shows you the differences between two versions of the page.
extraction_wizard [2022/05/26 09:31] autobook |
extraction_wizard [2022/08/25 12:40] autobook |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Extraction Wizard ====== | ====== Extraction Wizard ====== | ||
- | The Extraction Wizard allows you to create Extraction Schemes for capturing data from a source text into the columns of a database. You can capture data from emails, cloud-systems, | + | The Extraction Wizard allows you to create Extraction Schemes for capturing data from a source text into the columns of a Database. You can capture data from emails, cloud-systems, |
- | You are able to define conditions on line, sub-line, word, and/or symbol level for each database | + | You are able to define conditions on line, sub-line, word, and/or symbol level for each Database |
- | The available options cover vastly more than you will need in typical usage. In most cases, you will need only 2 or 3 controls for each database | + | The available options cover vastly more than you will need in typical usage. In most cases, you will need only 2 or 3 controls for each Database |
- | To begin, paste a test string – e.g. a purchase order from which you want to extract information – into the large **Test String** field on the right side and toy around a bit with the options, always first picking the [[#Data Source]] (unless you only want to write a [[#Fixed Text]] into that database | + | To begin, paste a test string – e.g. a purchase order from which you want to extract information – into the large **Test String** field on the right side and toy around a bit with the options, always first picking the [[#Data Source]] (unless you only want to write a [[#Fixed Text]] into that Database |
You'll probably get how it works without reading the manual, although you might miss out on some less obvious functionalities and tricks. A detailed description of each control follows below. Alternatively, | You'll probably get how it works without reading the manual, although you might miss out on some less obvious functionalities and tricks. A detailed description of each control follows below. Alternatively, | ||
Line 15: | Line 15: | ||
⯈ Right-click into the **Schemes** listbox on the **Home** tab and select **Create new** from the context menu. | ⯈ Right-click into the **Schemes** listbox on the **Home** tab and select **Create new** from the context menu. | ||
- | The Auto Book Dataviewer will open and show the available | + | The Auto Book Dataviewer will open and show the available |
If no row contains the headers you want, add a new row and type your own headers or modify the existing ones. Don't forget to click **Save** if you want to keep your changes for later use. | If no row contains the headers you want, add a new row and type your own headers or modify the existing ones. Don't forget to click **Save** if you want to keep your changes for later use. | ||
Line 138: | Line 138: | ||
[{{ : | [{{ : | ||
- | The choice of the data source is important only if you are going to transmit data to Auto Book via [[start#Direct transmission of Email Data to Auto Book|Direct Transmission]] or [[start#Processing the Email Source|Email Source Extraction]]. For [[start#Selecting | + | The choice of the data source is important only if you are going to use [[start#Parameter Extraction Mode]] or [[start# |
- | Otherwise, pick the source from which you want to extract the data for this database | + | Otherwise, pick the source from which you want to extract the data for this Database |
(1) Date: The email date as indicated by your email client.\\ | (1) Date: The email date as indicated by your email client.\\ | ||
Line 147: | Line 147: | ||
(4) Address: The email address of the sender of the email as indicated by your email client.\\ | (4) Address: The email address of the sender of the email as indicated by your email client.\\ | ||
(5) Text: The text body of the email as indicated by your email client.\\ | (5) Text: The text body of the email as indicated by your email client.\\ | ||
- | (6) Clipboard/ | + | (6) Clipboard/ |
If you're going to process the email source, these sources – Date, Sender, Subject, Address and Text (body of the email) are automatically extracted from the email source, and your Extraction Scheme settings will be applied onto these resulting sources. Thus, if you are going to extract a part of an email' | If you're going to process the email source, these sources – Date, Sender, Subject, Address and Text (body of the email) are automatically extracted from the email source, and your Extraction Scheme settings will be applied onto these resulting sources. Thus, if you are going to extract a part of an email' | ||
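As a rough illustration of what it means to split an email source into the five data sources named above, the following Python sketch uses the standard library's `email` package. It is not Auto Book's implementation; the sample message, the address and the function name `split_email_source` are made up for this example.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical raw email source; all values are invented for illustration.
RAW_SOURCE = (
    "From: Jane Doe <jane.doe@example.com>\n"
    "Subject: New job PO123456\n"
    "Date: Thu, 25 Aug 2022 12:40:00 +0000\n"
    "\n"
    "We would like to request your services.\n"
)


def split_email_source(raw: str) -> dict:
    """Split a raw email source into Date, Sender, Subject, Address and Text."""
    msg = message_from_string(raw)
    # parseaddr separates the display name from the bare email address.
    sender_name, address = parseaddr(msg.get("From", ""))
    return {
        "Date": msg.get("Date", ""),
        "Sender": sender_name,
        "Subject": msg.get("Subject", ""),
        "Address": address,
        # For a simple (non-multipart) message the payload is the text body.
        "Text": msg.get_payload(),
    }


fields = split_email_source(RAW_SOURCE)
print(fields["Sender"], "|", fields["Address"], "|", fields["Subject"])
```

An Extraction Scheme would then apply its line, word and symbol limitations to one of these resulting fields rather than to the raw source as a whole.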
Line 167: | Line 167: | ||
This is done by entering the line number or range of numbers into the **[N]** field. The initial line – the beginning of the source text or the line where the specific text occurs – is considered line 1, the next line is line 2, and so on. To define a range of line numbers, enter, for example, < | This is done by entering the line number or range of numbers into the **[N]** field. The initial line – the beginning of the source text or the line where the specific text occurs – is considered line 1, the next line is line 2, and so on. To define a range of line numbers, enter, for example, < | ||
- | When indicating a range of lines, note that at most one line will be inserted into your database. If you don't make any other limiting settings (e.g. via [[#Word Limitations]]), | + | When indicating a range of lines, note that at most one line will be inserted into your Database. If you don't make any other limiting settings (e.g. via [[#Word Limitations]]), |
To count only lines that include some content (anything other than whitespace), | To count only lines that include some content (anything other than whitespace), | ||
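The line counting described above can be sketched in Python as follows. This is only a model of the rule as stated here (the starting line is line 1, and blank lines can optionally be skipped), not Auto Book's actual code. It handles a single line number only; the function name `nth_line`, its parameters and the sample text are assumptions made for this example.

```python
from typing import Optional


def nth_line(text: str, n: int, anchor: Optional[str] = None,
             non_empty_only: bool = False) -> Optional[str]:
    """Return line n, counting from the start of the text or from the
    first line that contains `anchor`. The starting line is line 1."""
    lines = text.splitlines()
    start = 0
    if anchor is not None:
        for index, line in enumerate(lines):
            if anchor in line:
                start = index
                break
        else:
            return None  # the specific text does not occur at all
    candidates = lines[start:]
    if non_empty_only:
        # Count only lines that contain something other than whitespace.
        candidates = [line for line in candidates if line.strip()]
    if 1 <= n <= len(candidates):
        return candidates[n - 1]
    return None


# Hypothetical source text, invented for illustration.
SAMPLE = "Dear team,\n\nProject manager:\n\nJane Doe\nDeadline: Friday\n"

print(nth_line(SAMPLE, 2, anchor="Project manager", non_empty_only=True))
```

With blank lines skipped, line 2 counted from the line containing "Project manager" is the name on the following non-empty line; without that option, line 2 would be the blank line in between.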
Line 287: | Line 287: | ||
The two fields in the **Fixed Text** group allow you to add fixed text, i.e. text independent of the source text, in front of or behind the extracted text.
- | If you want to store ONLY fixed text for this database | + | If you want to store ONLY fixed text for this Database |
- | The **Fixed Text** fields also accept a few commands that make it somewhat dynamic - it's called " | + | The **Fixed Text** fields also accept a few commands that make it somewhat dynamic - it's called " |
- | |< | + | |< |
|< | |< | ||
|< | |< | ||
Line 347: | Line 347: | ||
==== - Order Email ==== | ==== - Order Email ==== | ||
+ | |||
+ | <WRAP box right prewrap 270px> | ||
+ | **See this example as a Video** | ||
+ | |||
+ | {{ Website-Sample-Complete.mp4|Example Video}} | ||
+ | </ | ||
We will show step-by-step instructions for two examples for Auto Book's primary use case, that is, extracting purchase order information from emails. In the first example that follows immediately below, we will extract this information from the email text body as it is displayed in an email client. In the second example, we will use the email source instead to get all data we want (see [[# | ||
Line 394: | Line 400: | ||
To confirm that your settings for each column are correct, paste the above sample email into the large **Test String** field on the right side. Whenever you complete the instructions below for one of the columns, the extraction result will be automatically displayed in the **Test Result** field below. | To confirm that your settings for each column are correct, paste the above sample email into the large **Test String** field on the right side. Whenever you complete the instructions below for one of the columns, the extraction result will be automatically displayed in the **Test Result** field below. | ||
- | **Date**: In this sample email, the client did not indicate an order date. We could use either [[Start#Direct transmission of Email Data to Auto Book|direct data transmission]] or [[start#Processing the Email Source|email source extraction]] to capture the email' | + | **Date**: In this sample email, the client did not indicate an order date. We could use either [[Start#Parameter Extraction Mode]] or [[start# |
⯈ Type < | ⯈ Type < | ||
Line 408: | Line 414: | ||
In the PM column, we want to capture the project manager' | In the PM column, we want to capture the project manager' | ||
- | ⯈ Click on **Clipboard/ | + | ⯈ Click on **(6) Clipboard/ |
⯈ Click on **N-th non-empty line from** and **where a specific text occurs** in the **Line Limitations** group.\\ | ⯈ Click on **N-th non-empty line from** and **where a specific text occurs** in the **Line Limitations** group.\\ | ||
⯈ Enter < | ⯈ Enter < | ||
Line 422: | Line 428: | ||
Assuming all PO numbers start with < | Assuming all PO numbers start with < | ||
- | ⯈ Click on **Clipboard/ | + | ⯈ Click on **(6) Clipboard/ |
⯈ Enter < | ⯈ Enter < | ||
⯈ Click on **only letters + digits** in the [[#Symbol Limitations]] group to get rid of the colon or any other punctuation marks the client might use. | ⯈ Click on **only letters + digits** in the [[#Symbol Limitations]] group to get rid of the colon or any other punctuation marks the client might use. | ||
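The effect of these steps can be approximated in a few lines of Python: pick the first word that starts with a known prefix, then keep only letters and digits so that a trailing colon or other punctuation disappears. This is a sketch under assumptions, not the Wizard's implementation; the prefix "PO", the sample sentence and both function names are chosen for illustration only.

```python
from typing import Optional


def first_word_starting_with(text: str, prefix: str) -> Optional[str]:
    """Return the first whitespace-separated word that begins with prefix."""
    for word in text.split():
        if word.startswith(prefix):
            return word
    return None


def only_letters_and_digits(value: str) -> str:
    """Drop every character that is neither a letter nor a digit."""
    return "".join(ch for ch in value if ch.isalnum())


# Sample sentence for illustration; the prefix "PO" is an assumption.
SENTENCE = ("We would like to request your services "
            "for the following job PO123456:")

raw = first_word_starting_with(SENTENCE, "PO")
po = only_letters_and_digits(raw)
print(raw, "->", po)
```

Filtering symbols as a separate last step means the word match itself does not need to anticipate which punctuation mark the client happens to use.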
Line 429: | Line 435: | ||
Alternatively, | Alternatively, | ||
⯈ Activate the [[#Use Regex for text fields checkbox]]. \\ | ⯈ Activate the [[#Use Regex for text fields checkbox]]. \\ | ||
- | ⯈ Click on **Clipboard/ | + | ⯈ Click on **(6) Clipboard/ |
⯈ Enter < | ⯈ Enter < | ||
⯈ Click on **only letters + digits** in the [[#Symbol Limitations]] group to get rid of the colon or any other punctuation marks the client might use. (Alternatively, | ⯈ Click on **only letters + digits** in the [[#Symbol Limitations]] group to get rid of the colon or any other punctuation marks the client might use. (Alternatively, | ||
Line 435: | Line 441: | ||
If the client always uses the same introductory sentence, <q>We would like to request your services for the following job PO123456:</ | If the client always uses the same introductory sentence, <q>We would like to request your services for the following job PO123456:</ | ||
- | ⯈ Click on **Clipboard/ | + | ⯈ Click on **(6) Clipboard/ |
⯈ Enter <q> for the following job</ | ⯈ Enter <q> for the following job</ | ||
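Anchoring on a fixed introductory phrase can be pictured like this in Python: everything after the phrase is kept, and a letters-and-digits filter then removes the space and the colon. The helper name `text_after` is hypothetical, and the sketch only mirrors the idea, not the exact behaviour of the Wizard's controls.

```python
from typing import Optional


def text_after(text: str, marker: str) -> Optional[str]:
    """Return whatever follows the first occurrence of marker, if any."""
    _before, found, after = text.partition(marker)
    return after if found else None


SENTENCE = ("We would like to request your services "
            "for the following job PO123456:")

rest = text_after(SENTENCE, " for the following job")
# Keep only letters and digits, which removes the space and the colon.
po = "".join(ch for ch in rest if ch.isalnum())
print(po)
```

This approach depends on the client always using the same wording; if the phrase is missing, `text_after` returns `None` and nothing would be extracted.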
Line 444: | Line 450: | ||
We'll use the same method for all of these, namely: | We'll use the same method for all of these, namely: | ||
- | ⯈ Click on **Clipboard/ | + | ⯈ Click on **(6) Clipboard/ |
⯈ Click on **N-th non-empty line from** and **where a specific text occurs** in the **Line Limitations** group.\\ | ⯈ Click on **N-th non-empty line from** and **where a specific text occurs** in the **Line Limitations** group.\\ | ||
⯈ Enter < | ⯈ Enter < | ||
Line 454: | Line 460: | ||
Again, there are many ways to extract the amount and currency. Let's stick with the simplest: | Again, there are many ways to extract the amount and currency. Let's stick with the simplest: | ||
- | ⯈ Click on ** (6) Clipboard/ | + | ⯈ Click on **(6) Clipboard/ |
⯈ Activate the [[#Use Standard Format checkbox]] in the [[#Sub-Line Limitations]] group. | ⯈ Activate the [[#Use Standard Format checkbox]] in the [[#Sub-Line Limitations]] group. | ||
Line 463: | Line 469: | ||
Only the 2 **Comments** columns are left. We won't extract anything into these and keep them as a reserve in order to add notes to our records. We can use them, for example, to track whether each record has already been invoiced, etc. | Only the 2 **Comments** columns are left. We won't extract anything into these and keep them as a reserve in order to add notes to our records. We can use them, for example, to track whether each record has already been invoiced, etc. | ||
- | So there' | + | So there' |
=== - Extracting from email source === | === - Extracting from email source === | ||
Line 469: | Line 475: | ||
In this example, we will use an email only slightly modified from the example above ([[# | In this example, we will use an email only slightly modified from the example above ([[# | ||
- | As this information is not available within the text body, we have to use either [[Start#Direct transmission of Email Data to Auto Book|direct data transmission]] or [[start#Processing the Email Source|email source extraction]], and in this example, we are going to use the latter method. (The Extraction Wizard settings for [[Start#Direct transmission of Email Data to Auto Book|direct data transmission]] would actually be identical – the only difference is that we wouldn' | + | As this information is not available within the text body, we have to use either [[Start#Parameter Extraction Mode]] or [[start# |
Below is our sample email source. The header is slightly shortened for space reasons. If the header of your emails contains lengthy incomprehensible data salad, don't worry about it – Auto Book will simply ignore it. | Below is our sample email source. The header is slightly shortened for space reasons. If the header of your emails contains lengthy incomprehensible data salad, don't worry about it – Auto Book will simply ignore it. | ||
Line 607: | Line 613: | ||
Again, there are many ways to extract the amount and currency. Let's stick with the simplest: | Again, there are many ways to extract the amount and currency. Let's stick with the simplest: | ||
- | ⯈ Click on **(1) Text body** in the Data Source listbox.\\ | + | ⯈ Click on **(1) Text body** in the [[#Data Source]] listbox.\\ |
⯈ Activate the [[#Use Standard Format checkbox]] in the [[#Sub-Line Limitations]] group. | ⯈ Activate the [[#Use Standard Format checkbox]] in the [[#Sub-Line Limitations]] group. | ||
Line 616: | Line 622: | ||
Only the 2 **Comments** columns are left. We won't extract anything into these and keep them as a reserve in order to add notes to our records. We can use them, for example, to track whether each record has already been invoiced, etc. | Only the 2 **Comments** columns are left. We won't extract anything into these and keep them as a reserve in order to add notes to our records. We can use them, for example, to track whether each record has already been invoiced, etc. | ||
- | So there' | + | So there' |