extraction_wizard

Differences

This shows you the differences between two versions of the page.

extraction_wizard [2022/06/27 17:40]
autobook [1.2.7 Fixed Text]
extraction_wizard [2022/08/25 12:40]
autobook
Line 1: Line 1:
 ====== Extraction Wizard ====== ====== Extraction Wizard ======
  
-The Extraction Wizard allows you to create Extraction Schemes for capturing data from a source text into the columns of a database. You can capture data from emails, cloud-systems, PDF files or any other kind of structured data.+The Extraction Wizard allows you to create Extraction Schemes for capturing data from a source text into the columns of a Database. You can capture data from emails, cloud-systems, PDF files or any other kind of structured data.
  
-You are able to define conditions on line, sub-line, word, and/or symbol level for each database column while the extraction results are shown on-the-fly in the **Test Result** field.+You are able to define conditions on line, sub-line, word, and/or symbol level for each Database column while the extraction results are shown on-the-fly in the **Test Result** field.
  
-The available options cover vastly more than you will need in typical usage. In most cases, you will need only 2 or 3 controls for each database column. In special, complex situations, you can activate the [[#Use Regex for text fields]] checkbox to inject regular expressions into your settings which open up literally unlimited possibilities.+The available options cover vastly more than you will need in typical usage. In most cases, you will need only 2 or 3 controls for each Database column. In special, complex situations, you can activate the [[#Use Regex for text fields]] checkbox to inject regular expressions into your settings which open up literally unlimited possibilities.
  
-To begin, paste a test string – e.g. a purchase order from which you want to extract information – into the large **Test String** field on the right side and toy around a bit with the options, always first picking the [[#Data Source]] (unless you only want to write a [[#Fixed Text]] into that database column), and then trimming down the text to be extracted by applying various limitations on line, sub-line, word, and/or character level until the result matches the content you want to store in a Database. To undo a selection in any of the listboxes, simply double-click it.+To begin, paste a test string – e.g. a purchase order from which you want to extract information – into the large **Test String** field on the right side and toy around a bit with the options, always first picking the [[#Data Source]] (unless you only want to write a [[#Fixed Text]] into that Database column), and then trimming down the text to be extracted by applying various limitations on line, sub-line, word, and/or character level until the result matches the content you want to store in a Database. To undo a selection in any of the listboxes, simply double-click it.
  
 You'll probably get how it works without reading the manual, although you might miss out on some less obvious functionalities and tricks. A detailed description of each control follows below. Alternatively, skip directly to the [[#Examples]] section to gain intuitive understanding. You'll probably get how it works without reading the manual, although you might miss out on some less obvious functionalities and tricks. A detailed description of each control follows below. Alternatively, skip directly to the [[#Examples]] section to gain intuitive understanding.
Line 15: Line 15:
 ⯈ Right-click into the **Schemes** listbox on the **Home** tab and select **Create new** from the context menu. ⯈ Right-click into the **Schemes** listbox on the **Home** tab and select **Create new** from the context menu.
  
-The Auto Book Dataviewer will open and show the available database column headers. Each row represents one set of column headers.+The Auto Book Dataviewer will open and show the available Database column headers. Each row represents one set of column headers.
  
 If no row contains the headers you want, add a new row and type your own headers or modify the existing ones. Don't forget to click **Save** if you want to keep your changes for later use. If no row contains the headers you want, add a new row and type your own headers or modify the existing ones. Don't forget to click **Save** if you want to keep your changes for later use.
Line 138: Line 138:
 [{{ :autobook-v1.1-extractionwizard-singleview-datasource.png?nolink|Data Source selection}}] [{{ :autobook-v1.1-extractionwizard-singleview-datasource.png?nolink|Data Source selection}}]
  
-The choice of the data source is important only if you are going to use [[start#Parameter Extraction Mode]] or [[start#Email Source Extraction Mode]]. For [[start#Normal Text Extraction]], your selection won't make a difference, because you are using only one piece of text for data extraction (the text within the clipboard); in this case, click either **(6) Clipboard/General** or any of the other options if you want to make your scheme compatible with other methods of data transmission.+The choice of the data source is important only if you are going to use [[start#Parameter Extraction Mode]] or [[start#Email Source Extraction Mode]]. For [[start#Normal Text Extraction]], your selection won't make a difference, because you are using only one piece of text for data extraction (the text within the clipboard); in this case, click either **(6) Clipboard/General** or any of the other options if you want to make your Extraction Scheme compatible with other methods of data transmission.
  
-Otherwise, pick the source from which you want to extract the data for this database column:+Otherwise, pick the source from which you want to extract the data for this Database column:
  
 (1) Date: The email date as indicated by your email client.\\ (1) Date: The email date as indicated by your email client.\\
Line 167: Line 167:
 This is done by entering the line number or range of numbers into the **[N]** field. The initial line – the beginning of the source text or the line where the specific text occurs – is considered line 1, the next line is line 2, and so on. To define a range of line numbers, enter, for example, <q>2-5</q> or <q>1<N<6</q> for lines from the 2nd to the 5th, <q><5</q> for lines up to the 4th, <q>>1</q> for all following lines, and so on. Negative line numbers are currently not supported, but might be implemented in a later Auto Book version. This is done by entering the line number or range of numbers into the **[N]** field. The initial line – the beginning of the source text or the line where the specific text occurs – is considered line 1, the next line is line 2, and so on. To define a range of line numbers, enter, for example, <q>2-5</q> or <q>1<N<6</q> for lines from the 2nd to the 5th, <q><5</q> for lines up to the 4th, <q>>1</q> for all following lines, and so on. Negative line numbers are currently not supported, but might be implemented in a later Auto Book version.
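
As a rough illustration of the range notation described above, the following Python sketch turns an **[N]** entry into a first and last line number. The function name `parse_line_spec` and its return convention are hypothetical and not part of Auto Book; the sketch only mirrors the examples given in the text (<q>2-5</q>, <q>1<N<6</q>, <q><5</q>, <q>>1</q>) and rejects anything else, including negative numbers.

```python
import re
from typing import Optional, Tuple


def parse_line_spec(spec: str) -> Tuple[int, Optional[int]]:
    """Return (first, last) as 1-based inclusive line numbers.

    last is None when the range is open-ended (e.g. ">1").
    Hypothetical helper: it only covers the notations shown in the manual text.
    """
    s = spec.replace(" ", "").upper()

    if re.fullmatch(r"\d+", s):                      # single line, e.g. "3"
        first = last = int(s)
    elif m := re.fullmatch(r"(\d+)-(\d+)", s):       # "2-5"
        first, last = int(m[1]), int(m[2])
    elif m := re.fullmatch(r"(\d+)<N<(\d+)", s):     # "1<N<6" (exclusive bounds)
        first, last = int(m[1]) + 1, int(m[2]) - 1
    elif m := re.fullmatch(r"<(\d+)", s):            # "<5" means lines 1..4
        first, last = 1, int(m[1]) - 1
    elif m := re.fullmatch(r">(\d+)", s):            # ">1" means line 2 onwards
        first, last = int(m[1]) + 1, None
    else:
        raise ValueError(f"unsupported line specification: {spec!r}")

    # Line numbering starts at 1; empty or reversed ranges are rejected.
    if first < 1 or (last is not None and last < first):
        raise ValueError(f"empty or invalid range: {spec!r}")
    return first, last
```

Under these assumptions, `parse_line_spec("2-5")` and `parse_line_spec("1<N<6")` describe the same range, which matches the equivalence stated in the text.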
  
-When indicating a range of lines, note that at most one line will be inserted into your database. If you don't make any other limiting settings (e.g. via [[#Word Limitations]]), the first line of your range will be captured. If you do make other limiting settings, the first line with content that fulfills all other conditions as well will be captured. For example, if you stipulate that captured words must include the letters <q>EUR</q>, the first line of your range where such a word appears will be captured.+When indicating a range of lines, note that at most one line will be inserted into your Database. If you don't make any other limiting settings (e.g. via [[#Word Limitations]]), the first line of your range will be captured. If you do make other limiting settings, the first line with content that fulfills all other conditions as well will be captured. For example, if you stipulate that captured words must include the letters <q>EUR</q>, the first line of your range where such a word appears will be captured.
  
 To count only lines that include some content (anything other than whitespace), click **N-th non-empty line from** in the top field. Otherwise, to count all lines, click **N-th line from**. To count only lines that include some content (anything other than whitespace), click **N-th non-empty line from** in the top field. Otherwise, to count all lines, click **N-th line from**.
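
The selection rule for line ranges can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Auto Book's actual implementation: the function `capture_line` and its parameters are hypothetical, `lines` is assumed to start at the initial line (line 1), and `condition` stands in for whatever other limiting settings are active.

```python
from typing import Callable, List, Optional


def capture_line(
    lines: List[str],
    first: int,
    last: Optional[int],
    non_empty_only: bool = False,
    condition: Optional[Callable[[str], bool]] = None,
) -> Optional[str]:
    """Return at most one line from the 1-based inclusive range first..last.

    Hypothetical helper illustrating the rule from the text:
    - non_empty_only=True counts only lines containing non-whitespace
      ("N-th non-empty line from"); otherwise all lines are counted.
    - Without a condition, the first line of the range is captured.
    - With a condition, the first line in the range that satisfies it is captured.
    """
    counted = [ln for ln in lines if ln.strip()] if non_empty_only else list(lines)
    in_range = counted[first - 1 : last]  # last=None leaves the range open-ended
    for ln in in_range:
        if condition is None or condition(ln):
            return ln
    return None  # nothing in the range fulfils the conditions


sample = ["Order 123", "", "Total: 10 USD", "Total: 12 EUR", "Thanks"]
print(capture_line(sample, 2, 5, non_empty_only=True))
print(capture_line(sample, 2, 5, condition=lambda ln: "EUR" in ln))
```

With the sample text, the first call skips the blank line when counting, and the second call returns the first line in the range that contains <q>EUR</q>.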
Line 347: Line 347:
  
 ==== - Order Email ==== ==== - Order Email ====
 +
 +<WRAP box right prewrap 270px>
 +**See this example as a Video**
 +
 +{{ Website-Sample-Complete.mp4|Example Video}}
 +</WRAP>
  
 We will show step-by-step instructions for two examples for Auto Book's primary use case, that is, extracting purchase order information from emails. In the first example that follows immediately below, we will extract this information from the email text body as it is displayed in an email client. In the second example, we will use the email source instead to get all data we want (see [[#Extracting from email source]]). We will show step-by-step instructions for two examples for Auto Book's primary use case, that is, extracting purchase order information from emails. In the first example that follows immediately below, we will extract this information from the email text body as it is displayed in an email client. In the second example, we will use the email source instead to get all data we want (see [[#Extracting from email source]]).
extraction_wizard.txt · Last modified: 2022/09/14 11:47 by autobook