extraction_wizard

Differences

This shows you the differences between two versions of the page.

extraction_wizard [2022/08/19 06:34]
autobook [3.1.1 Extracting from email text body]
extraction_wizard [2022/09/14 11:47] (current)
autobook
Line 76: Line 76:
 This functionality is intended for troubleshooting, and you probably will never need it. If you're just reading the manual to learn general usage, skip ahead to the next section for now. This functionality is intended for troubleshooting, and you probably will never need it. If you're just reading the manual to learn general usage, skip ahead to the next section for now.
  
-Auto Book can decode all common email encoding schemes such as Quoted-Printable, BASE64, etc., and also parses HTML. When you press Auto Book's [[start#Hotkeys Tab|Email Source Extraction hotkey]], the selected text or clipboard's content is assumed to be an email source and decoded internally. Your Extraction Schemes are then applied to this decoded content instead of the raw text you are seeing on your screen.+Auto Book can decode all common email encoding schemes such as Quoted-Printable, BASE64, etc., and also parses HTML. When you press Auto Book's [[manual#Hotkeys Tab|Email Source Extraction hotkey]], the selected text or clipboard's content is assumed to be an email source and decoded internally. Your Extraction Schemes are then applied to this decoded content instead of the raw text you are seeing on your screen.
  
 The decoding results can also be viewed by pasting the email source into the **Test String** field and clicking the **Process Email Source** checkbox in [[#the right-side panel]], and then selecting a [[#data source]] – the only difference is that the Email Interpreter will show the decoding result of the whole email, including both the header and the text body. The decoding results can also be viewed by pasting the email source into the **Test String** field and clicking the **Process Email Source** checkbox in [[#the right-side panel]], and then selecting a [[#data source]] – the only difference is that the Email Interpreter will show the decoding result of the whole email, including both the header and the text body.
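The decoding step described above can be illustrated with a minimal sketch. It does not show Auto Book's internal decoder, which is not documented here; it only uses Python's standard `email` package to show what decoding a Quoted-Printable body and a BASE64-encoded subject looks like. The sample message and the function name `decode_email_source` are assumptions made for illustration.

```python
from email import message_from_string
from email.policy import default

# Hypothetical email source: BASE64 encoded-word subject, Quoted-Printable body.
RAW = (
    "From: Jane Doe <jane@example.com>\n"
    "Subject: =?utf-8?B?UE8gWFkxMjM0NTY=?=\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Remuneration: 100 =E2=82=AC\n"
)


def decode_email_source(raw):
    """Return the decoded subject and text body of a raw email source."""
    msg = message_from_string(raw, policy=default)
    # Prefer the plain-text part; fall back to HTML if there is none.
    body = msg.get_body(preferencelist=("plain", "html"))
    return {
        "subject": str(msg["Subject"]),  # encoded words are decoded by the policy
        "text": body.get_content() if body is not None else "",
    }


decoded = decode_email_source(RAW)
print(decoded["subject"])
print(decoded["text"])
```

The point of the sketch is that extraction rules operate on the decoded text (`PO XY123456`, `Remuneration: 100 €`), not on the encoded characters visible in the raw source.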
Line 138: Line 138:
 [{{ :autobook-v1.1-extractionwizard-singleview-datasource.png?nolink|Data Source selection}}] [{{ :autobook-v1.1-extractionwizard-singleview-datasource.png?nolink|Data Source selection}}]
  
-The choice of the data source is important only if you are going to use [[start#Parameter Extraction Mode]] or [[start#Email Source Extraction Mode]]. For [[start#Normal Text Extraction]], your selection won't make a difference, because you are using only one piece of text for data extraction (the text within the clipboard); in this case, click either **(6) Clipboard/General** or any of the other options in case you want to make your Extraction Scheme compatible with other methods of data transmission.+The choice of the data source is important only if you are going to use [[manual#Parameter Extraction Mode]] or [[manual#Email Source Extraction Mode]]. For [[manual#Normal Text Extraction]], your selection won't make a difference, because you are using only one piece of text for data extraction (the text within the clipboard); in this case, click either **(6) Clipboard/General** or any of the other options in case you want to make your Extraction Scheme compatible with other methods of data transmission.
  
 Otherwise, pick the source from which you want to extract the data for this Database column: Otherwise, pick the source from which you want to extract the data for this Database column:
Line 147: Line 147:
 (4) Address: The email address of the sender of the email as indicated by your email client.\\ (4) Address: The email address of the sender of the email as indicated by your email client.\\
 (5) Text: The text body of the email as indicated by your email client.\\ (5) Text: The text body of the email as indicated by your email client.\\
-(6) Clipboard/General: Select this option only if //not// using [[start#Parameter Extraction Mode]] or [[start#Email Source Extraction Mode]].\\+(6) Clipboard/General: Select this option only if //not// using [[manual#Parameter Extraction Mode]] or [[manual#Email Source Extraction Mode]].\\
  
 If you're going to process the email source, these sources – Date, Sender, Subject, Address and Text (body of the email) are automatically extracted from the email source, and your Extraction Scheme settings will be applied onto these resulting sources. Thus, if you are going to extract a part of an email's subject line, for example, you don't need to worry about capturing the subject line from the email source, but only need to make the settings to define which part of this single line you need. As another example, if you select Date as your source for a certain column, you won't need to make any other settings if you're happy to capture the whole date as per email source into your Database – selecting a source without any other limiting settings means you are going to keep the whole source. If you're going to process the email source, these sources – Date, Sender, Subject, Address and Text (body of the email) are automatically extracted from the email source, and your Extraction Scheme settings will be applied onto these resulting sources. Thus, if you are going to extract a part of an email's subject line, for example, you don't need to worry about capturing the subject line from the email source, but only need to make the settings to define which part of this single line you need. As another example, if you select Date as your source for a certain column, you won't need to make any other settings if you're happy to capture the whole date as per email source into your Database – selecting a source without any other limiting settings means you are going to keep the whole source.
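As a rough illustration of how an email source splits into the five sources named above, the following sketch uses Python's standard `email` package. It is an assumption-laden stand-in, not Auto Book's implementation: the function name `split_sources` and the sample message are hypothetical, and only the source names (Date, Sender, Subject, Address, Text) come from this manual.

```python
from email import message_from_string
from email.policy import default
from email.utils import parseaddr

# Hypothetical, heavily shortened email source.
SAMPLE = (
    "Date: Wed, 14 Sep 2022 11:47:00 +0000\n"
    "From: Jane Doe <jane@example.com>\n"
    "Subject: PO XY123456\n"
    "\n"
    "Dear Mr. XXX,\n"
)


def split_sources(raw):
    """Split a raw email source into the five data sources."""
    msg = message_from_string(raw, policy=default)
    # parseaddr separates the display name from the email address.
    name, address = parseaddr(str(msg["From"]))
    body = msg.get_body(preferencelist=("plain",))
    return {
        "Date": str(msg["Date"]),
        "Sender": name,
        "Subject": str(msg["Subject"]),
        "Address": address,
        "Text": body.get_content() if body is not None else "",
    }


for source, value in split_sources(SAMPLE).items():
    print(f"{source}: {value!r}")
```

Selecting a source without any further limiting settings corresponds to taking one of these values unchanged, for example the whole `Date` string.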
Line 210: Line 210:
 ⯈ To capture <q>XY123456</q>, enter <q>:</q> or <q>PO:</q> (a trailing space is optional). By entering <q>PO</q>, you make sure that only lines including <q>PO:</q> will be captured, if you haven't defined the line via [[#Line Limitations]]. Otherwise, the first line including <q>:</q> will be used. ⯈ To capture <q>XY123456</q>, enter <q>:</q> or <q>PO:</q> (a trailing space is optional). By entering <q>PO</q>, you make sure that only lines including <q>PO:</q> will be captured, if you haven't defined the line via [[#Line Limitations]]. Otherwise, the first line including <q>:</q> will be used.
  
-(This example is identical to using the [[start#standard_schemes|Standard Format]] in case <q>PO</q> is also the column title.)+(This example is identical to using the [[manual#standard_schemes|Standard Format]] in case <q>PO</q> is also the column title.)
  
 == - Use Standard Format checkbox == == - Use Standard Format checkbox ==
  
-To extract text based on the [[start#standard_schemes|Standard Format]], activate the **Use Standard Format** checkbox. It's in this group of controls because it is, in effect, a kind of pre-defined sub-line limitation.+To extract text based on the [[manual#standard_schemes|Standard Format]], activate the **Use Standard Format** checkbox. It's in this group of controls because it is, in effect, a kind of pre-defined sub-line limitation.
  
 This means that you are going to extract the line part following the column header and a colon, such as <q>2022-12-31</q> from the line <q>Date: 2022-12-31</q> if <q>Date</q> is the column header. As this is a pre-defined complete column configuration, all other controls will be disabled, except [[#Fixed Text]], which you still can add before or after the extracted text. This means that you are going to extract the line part following the column header and a colon, such as <q>2022-12-31</q> from the line <q>Date: 2022-12-31</q> if <q>Date</q> is the column header. As this is a pre-defined complete column configuration, all other controls will be disabled, except [[#Fixed Text]], which you still can add before or after the extracted text.
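The Standard Format rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above; the function name `extract_standard_format` is hypothetical, and details such as case sensitivity, whitespace trimming, and taking the first matching line are assumptions rather than documented Auto Book behavior.

```python
def extract_standard_format(text, column_title):
    """Return the part of the first matching line that follows '<title>:'."""
    marker = column_title + ":"
    for line in text.splitlines():
        if marker in line:
            # Keep everything after the first occurrence of the marker.
            return line.split(marker, 1)[1].strip()
    return None  # no line contains the column title followed by a colon


sample = "Dear Mr. XXX,\nDate: 2022-12-31\nRemuneration: 100 EUR"
print(extract_standard_format(sample, "Date"))
print(extract_standard_format(sample, "Remuneration"))
```

A column whose title does not appear in the text yields no result in this sketch, which mirrors why a differently named column needs its own **Pick text part from** entry.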
Line 291: Line 291:
 The **Fixed Text** fields also accept a few commands that make it somewhat dynamic - it's called "fixed" because it doesn't depend on the source text. Simply enter each command including the tags <> into either one of the **Fixed Text** fields. When saving data to a Database, these commands will be automatically replaced as detailed below: The **Fixed Text** fields also accept a few commands that make it somewhat dynamic - it's called "fixed" because it doesn't depend on the source text. Simply enter each command including the tags <> into either one of the **Fixed Text** fields. When saving data to a Database, these commands will be automatically replaced as detailed below:
  
-|<AutoFolder>|Will be replaced with the folder path generated from the [[start#Auto Folder]] pattern saved with this Extraction Scheme.|+|<AutoFolder>|Will be replaced with the folder path generated from the [[manual#Auto Folder]] pattern saved with this Extraction Scheme.|
 |<Time>|Will be replaced with the current time and date in the system locale format (click the **Test** button if you're unsure what this format looks like on your computer).| |<Time>|Will be replaced with the current time and date in the system locale format (click the **Test** button if you're unsure what this format looks like on your computer).|
 |<Time.Format>|Will be replaced with the current date and/or time in a user-defined format.| |<Time.Format>|Will be replaced with the current date and/or time in a user-defined format.|
  
-The Time commands will also be replaced in the **Test Result** field, but <AutoFolder> is not because the [[start#Auto Folder]] pattern has not yet been set at this point.+The Time commands will also be replaced in the **Test Result** field, but <AutoFolder> is not because the [[manual#Auto Folder]] pattern has not yet been set at this point.
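To make the replacement of the Time commands concrete, here is a minimal Python sketch. It is not Auto Book's code: the function name `expand_time_commands` is hypothetical, and the token table is an assumption that covers only a handful of common format tokens (of these, only `yyyy`, `MM` and `dd` appear in this manual's examples). Other commands such as <AutoFolder> are deliberately left untouched.

```python
import re
from datetime import datetime

# Assumed mapping from format tokens to strftime directives.
TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}
_TOKEN_RE = re.compile("|".join(sorted(TOKENS, key=len, reverse=True)))
_COMMAND_RE = re.compile(r"<Time(?:\.([^<>]+))?>")


def expand_time_commands(fixed_text, now):
    """Replace <Time> and <Time.Format> commands in a Fixed Text value."""

    def render(match):
        pattern = match.group(1)
        if pattern is None:
            # Plain <Time>: locale-dependent date and time representation.
            return now.strftime("%c")
        # Escape literal percent signs, then translate the tokens.
        strf = _TOKEN_RE.sub(lambda m: TOKENS[m.group(0)], pattern.replace("%", "%%"))
        return now.strftime(strf)

    return _COMMAND_RE.sub(render, fixed_text)


moment = datetime(2022, 12, 31, 8, 5, 0)
print(expand_time_commands("<Time.yyyy-MM-dd>", moment))
print(expand_time_commands("<AutoFolder> <Time.HH:mm>", moment))
```

Passing the current moment in as a parameter, rather than calling `datetime.now()` inside the function, keeps the sketch easy to test with a fixed date.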
  
 == - Date/Time Formats == == - Date/Time Formats ==
Line 347: Line 347:
  
 ==== - Order Email ==== ==== - Order Email ====
 +
 +<WRAP box right prewrap 270px>
 +**See this example as a Video**
 +
 +{{ Website-Sample-Complete.mp4|Example Video}}
 +</WRAP>
  
 We will show step-by-step instructions for two examples for Auto Book's primary use case, that is, extracting purchase order information from emails. In the first example that follows immediately below, we will extract this information from the email text body as it is displayed in an email client. In the second example, we will use the email source instead to get all the data we want (see [[#Extracting from email source]]). We will show step-by-step instructions for two examples for Auto Book's primary use case, that is, extracting purchase order information from emails. In the first example that follows immediately below, we will extract this information from the email text body as it is displayed in an email client. In the second example, we will use the email source instead to get all the data we want (see [[#Extracting from email source]]).
Line 353: Line 359:
  
 Assume you are repeatedly receiving emails more or less in the following format: Assume you are repeatedly receiving emails more or less in the following format:
- 
-{{video.mp4|2022-08-19_11-28-21.mp4 |}} 
  
 >>Dear Mr. XXX, >>Dear Mr. XXX,
Line 396: Line 400:
 To confirm that your settings for each column are correct, paste the above sample email into the large **Test String** field on the right side. Whenever you complete the instructions below for one of the columns, the extraction result will be automatically displayed in the **Test Result** field below. To confirm that your settings for each column are correct, paste the above sample email into the large **Test String** field on the right side. Whenever you complete the instructions below for one of the columns, the extraction result will be automatically displayed in the **Test Result** field below.
  
-**Date**: In this sample email, the client did not indicate an order date. We could use either [[Start#Parameter Extraction Mode]] or [[start#Email Source Extraction Mode]] to capture the email's sent date. However, let's assume we want to stick with the simplest case of using [[start#Normal Extraction Mode]], whereby you will select the email text with your mouse in your email client and then press <q>CTRL+SHIFT+E</q>. In this case:+**Date**: In this sample email, the client did not indicate an order date. We could use either [[manual#Parameter Extraction Mode]] or [[manual#Email Source Extraction Mode]] to capture the email's sent date. However, let's assume we want to stick with the simplest case of using [[manual#Normal Extraction Mode]], whereby you will select the email text with your mouse in your email client and then press <q>CTRL+SHIFT+E</q>. In this case:
  
 ⯈ Type <q><Time.yyyy-MM-dd></q> into either one of the **Fixed Text** fields (it doesn't matter which one). ⯈ Type <q><Time.yyyy-MM-dd></q> into either one of the **Fixed Text** fields (it doesn't matter which one).
Line 415: Line 419:
 ⯈ Enter <q>Regards OR Wishes OR Greetings OR Sincerely</q> for **Specific Text**.\\ ⯈ Enter <q>Regards OR Wishes OR Greetings OR Sincerely</q> for **Specific Text**.\\
  
-This approach is somewhat fuzzy in so far as the 4 words used as alternatives for **Specific Text** also could appear in a different context somewhere else within an email. However, for typical order emails, this shouldn't happen often and if it does, the name can be edited in the [[start#Data Preview]].+This approach is somewhat fuzzy in so far as the 4 words used as alternatives for **Specific Text** also could appear in a different context somewhere else within an email. However, for typical order emails, this shouldn't happen often and if it does, the name can be edited in the [[manual#Data Preview]].
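The fuzziness mentioned above can be seen in a small Python sketch of matching several alternatives joined by <q>OR</q>. The function name `first_line_with_any` is hypothetical, and the matching rules (case-sensitive substring match, first matching line wins) are assumptions for illustration, not a description of how Auto Book evaluates **Specific Text**.

```python
def first_line_with_any(text, specific_text):
    """Return the index of the first line containing any OR-alternative."""
    alternatives = [alt.strip() for alt in specific_text.split(" OR ") if alt.strip()]
    for index, line in enumerate(text.splitlines()):
        if any(alt in line for alt in alternatives):
            return index
    return None  # none of the alternatives occurs in the text


SPECIFIC = "Regards OR Wishes OR Greetings OR Sincerely"

typical = "Dear Mr. XXX,\nPlease confirm the order.\nBest Regards,\nJohn"
print(first_line_with_any(typical, SPECIFIC))

# One of the words appears earlier in a different context, so the
# closing line is no longer the first match.
atypical = "Dear Mr. XXX,\nRegards to your team for the last job.\nBest Wishes,\nJohn"
print(first_line_with_any(atypical, SPECIFIC))
```

In the second sample the match lands on the wrong line, which is the kind of case where the extracted name would need to be corrected by hand.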
  
 Click **Next**. Click **Next**.
Line 459: Line 463:
 ⯈ Activate the [[#Use Standard Format checkbox]] in the [[#Sub-Line Limitations]] group. ⯈ Activate the [[#Use Standard Format checkbox]] in the [[#Sub-Line Limitations]] group.
  
-Nothing else needs to be done, because this piece of data conforms to the [[start#Standard Schemes|Standard Format]], since the column is called <q>Remuneration</q> and the text we want to extract follows <q>Remuneration</q> and a colon (<q>:</q>), as prescribed by the [[start#Standard Schemes|Standard Format]]. (If the column were named differently, we would have to enter <q>Remuneration:</q> into the **Pick text part from** field.)+Nothing else needs to be done, because this piece of data conforms to the [[manual#Standard Schemes|Standard Format]], since the column is called <q>Remuneration</q> and the text we want to extract follows <q>Remuneration</q> and a colon (<q>:</q>), as prescribed by the [[manual#Standard Schemes|Standard Format]]. (If the column were named differently, we would have to enter <q>Remuneration:</q> into the **Pick text part from** field.)
  
 **Saving the Extraction Scheme** **Saving the Extraction Scheme**
Line 465: Line 469:
 Only the 2 **Comments** columns are left. We won't extract anything into these and keep them as a reserve in order to add notes to our records. We can use them, for example, to track whether each record has already been invoiced, etc. Only the 2 **Comments** columns are left. We won't extract anything into these and keep them as a reserve in order to add notes to our records. We can use them, for example, to track whether each record has already been invoiced, etc.
  
-So there's nothing left to do. Click **Save** and enter a name for the newly created Extraction Scheme. After the Extraction Scheme has been saved, click **Exit** to close the Extraction Wizard. Then, select the sample email text above and press the Extraction Hotkey (<q>CTRL+SHIFT+E</q> by default). Select the name of the newly created Extraction Scheme in the **Schemes** panel of the [[start#Data Preview]] that appears. All desired data should pop up, ready to be added to a Database via the **Add to Database** button. If you wish, also enter an [[start#Auto Folder]] pattern based on these data and create and open a folder by activating the corresponding checkboxes.+So there's nothing left to do. Click **Save** and enter a name for the newly created Extraction Scheme. After the Extraction Scheme has been saved, click **Exit** to close the Extraction Wizard. Then, select the sample email text above and press the Extraction Hotkey (<q>CTRL+SHIFT+E</q> by default). Select the name of the newly created Extraction Scheme in the **Schemes** panel of the [[manual#Data Preview]] that appears. All desired data should pop up, ready to be added to a Database via the **Add to Database** button. If you wish, also enter an [[manual#Auto Folder]] pattern based on these data and create and open a folder by activating the corresponding checkboxes.
  
 === - Extracting from email source === === - Extracting from email source ===
Line 471: Line 475:
 In this example, we will use an email only slightly modified from the example above ([[#Extracting from email text body]]). The PO number, this time, is found only in the email's subject, but not in the text body. Furthermore, instead of using the current date, this time we want to extract the email's sent date, and we also want to extract the email's sender name instead of manually entering the client name. In this example, we will use an email only slightly modified from the example above ([[#Extracting from email text body]]). The PO number, this time, is found only in the email's subject, but not in the text body. Furthermore, instead of using the current date, this time we want to extract the email's sent date, and we also want to extract the email's sender name instead of manually entering the client name.
  
-As this information is not available within the text body, we have to use either [[Start#Parameter Extraction Mode]] or [[start#Email Source Extraction Mode]], and in this example, we are going to use the latter method. (The Extraction Wizard settings for [[Start#Parameter Extraction Mode]] would actually be identical – the only difference is that we wouldn't be using the email source.)+As this information is not available within the text body, we have to use either [[manual#Parameter Extraction Mode]] or [[manual#Email Source Extraction Mode]], and in this example, we are going to use the latter method. (The Extraction Wizard settings for [[manual#Parameter Extraction Mode]] would actually be identical – the only difference is that we wouldn't be using the email source.)
  
 Below is our sample email source. The header is slightly shortened for space reasons. If the header of your emails contains lengthy incomprehensible data salad, don't worry about it – Auto Book will simply ignore it. Below is our sample email source. The header is slightly shortened for space reasons. If the header of your emails contains lengthy incomprehensible data salad, don't worry about it – Auto Book will simply ignore it.
Line 577: Line 581:
 ⯈ Enter <q>Regards OR Wishes OR Greetings OR Sincerely</q> for **Specific Text**.\\ ⯈ Enter <q>Regards OR Wishes OR Greetings OR Sincerely</q> for **Specific Text**.\\
  
-This approach is somewhat fuzzy in so far as the 4 words used as alternatives for **Specific Text** also could appear in a different context somewhere else within an email. However, for typical order emails, this shouldn't happen often and if it does, the name can be edited in the [[start#Data Preview]].+This approach is somewhat fuzzy in so far as the 4 words used as alternatives for **Specific Text** also could appear in a different context somewhere else within an email. However, for typical order emails, this shouldn't happen often and if it does, the name can be edited in the [[manual#Data Preview]].
  
 Click **Next**. Click **Next**.
Line 612: Line 616:
 ⯈ Activate the [[#Use Standard Format checkbox]] in the [[#Sub-Line Limitations]] group. ⯈ Activate the [[#Use Standard Format checkbox]] in the [[#Sub-Line Limitations]] group.
  
-Nothing else needs to be done, because this piece of data conforms to the [[start#Standard Schemes|Standard Format]], since the column is called <q>Remuneration</q> and the text we want to extract follows <q>Remuneration</q> and a colon (<q>:</q>), as prescribed by the [[start#Standard Schemes|Standard Format]]. (If the column were named differently, we would have to enter <q>Remuneration:</q> into the **Pick text part from** field.)+Nothing else needs to be done, because this piece of data conforms to the [[manual#Standard Schemes|Standard Format]], since the column is called <q>Remuneration</q> and the text we want to extract follows <q>Remuneration</q> and a colon (<q>:</q>), as prescribed by the [[manual#Standard Schemes|Standard Format]]. (If the column were named differently, we would have to enter <q>Remuneration:</q> into the **Pick text part from** field.)
  
 **Saving the Extraction Scheme** **Saving the Extraction Scheme**
Line 618: Line 622:
 Only the 2 **Comments** columns are left. We won't extract anything into these and keep them as a reserve in order to add notes to our records. We can use them, for example, to track whether each record has already been invoiced, etc. Only the 2 **Comments** columns are left. We won't extract anything into these and keep them as a reserve in order to add notes to our records. We can use them, for example, to track whether each record has already been invoiced, etc.
  
-So there's nothing left to do. Click **Save** and enter a name for the newly created Extraction Scheme. After the Extraction Scheme has been saved, click **Exit** to close the Extraction Wizard. Then, select the sample email text above and press the Email Source Extraction Hotkey (<q>CTRL+WIN+E</q> by default). Select the name of the newly created Extraction Scheme in the **Schemes** panel of the [[start#Data Preview]] that appears. All desired data should pop up, ready to be added to a Database via the **Add to Database** button. If you wish, also enter an [[start#Auto Folder]] pattern based on these data and create and open a folder by activating the corresponding checkboxes.+So there's nothing left to do. Click **Save** and enter a name for the newly created Extraction Scheme. After the Extraction Scheme has been saved, click **Exit** to close the Extraction Wizard. Then, select the sample email text above and press the Email Source Extraction Hotkey (<q>CTRL+WIN+E</q> by default). Select the name of the newly created Extraction Scheme in the **Schemes** panel of the [[manual#Data Preview]] that appears. All desired data should pop up, ready to be added to a Database via the **Add to Database** button. If you wish, also enter an [[manual#Auto Folder]] pattern based on these data and create and open a folder by activating the corresponding checkboxes.
extraction_wizard.1660883649.txt.gz · Last modified: 2022/08/19 06:34 by autobook