Selecting Tables by Search String and Pattern
Another great feature in AWS HTML Producer 4.0 is the ability to select the HTML table to extract values from by a string contained in the table's HTML code. For instance, to extract the weather table from the CNN web site, you no longer need to figure out the table's number or find its unique attributes, as you did with AWS HTML Producer 3.5. You can just specify a text string that the table contains, or a regular expression pattern that matches this string. Because matching is carried out against the HTML representation of the table, not the text visible in the browser, you can match both displayed text and HTML markup. For example, if the word "weather" appears not only in that table but in the navigation menu as well, you can match a unique HTML comment that the weather table contains, or even a fragment of the HTML code that makes up the table.
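The effect of matching against markup rather than visible text can be illustrated with a small Python sketch. This is not AWS HTML Producer code: the sample page, the `TABLE_RE` expression, and the `tables_containing` helper are hypothetical names made up for illustration, and the naive table splitting assumes tables are not nested.

```python
import re

# Hypothetical sample page: the word "weather" occurs in both tables,
# but only the second one carries the HTML comment.
PAGE = """
<table id="nav"><tr><td><a href="/weather">weather</a></td></tr></table>
<table><!-- forecast-data --><tr><td>Local weather: 21 C</td></tr></table>
"""

# Naive split into tables; assumes tables are not nested.
TABLE_RE = re.compile(r"<table\b.*?</table>", re.IGNORECASE | re.DOTALL)

def tables_containing(page_html, needle):
    """Return the raw HTML of every table whose markup contains needle."""
    return [t for t in TABLE_RE.findall(page_html) if needle in t]

# A visible word is ambiguous; a comment in the markup is not.
by_word = tables_containing(PAGE, "weather")
by_comment = tables_containing(PAGE, "<!-- forecast-data -->")
```

Here the search for the word finds both tables, while the search for the comment finds only the weather table, which is why a string from the markup can be a more reliable selector.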
To select a table by the text it contains, you can use one of two new selection modes (criteria): SelectBySearchString and SelectByPattern.
- SelectBySearchString - specify a text string (the SearchString property) that matches HTML text within the table you want to process. Optionally, if several matches can occur in the text, you can specify which match to act on in the MatchIndex property.
- SelectByPattern - the same as the previous mode, but lets you take advantage of the power of regular expressions when matching text inside the table to be selected. In this mode you need to provide a valid regular expression pattern in the Pattern property. Optionally, if several matches can occur in the text, you can specify which match to act on in the MatchIndex property.
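The two modes can be sketched conceptually in Python as follows. This is an illustration only, not the AWS HTML Producer API: `select_table` and its parameters are hypothetical stand-ins for the SearchString, Pattern, and MatchIndex properties. The sketch assumes that the match index is zero-based and picks among the tables that match, which the text above does not state, and it uses Python's `re` module rather than the .NET engine, so pattern syntax may differ in places.

```python
import re

# Naive split into tables; assumes tables are not nested.
TABLE_RE = re.compile(r"<table\b.*?</table>", re.IGNORECASE | re.DOTALL)

def select_table(page_html, search_string=None, pattern=None, match_index=0):
    """Return the raw HTML of the selected table, or None if nothing matches.

    Exactly one of search_string (literal text) or pattern (regular
    expression) must be given. match_index is ASSUMED to be a zero-based
    index into the list of matching tables.
    """
    if (search_string is None) == (pattern is None):
        raise ValueError("give exactly one of search_string or pattern")
    # A literal search string is just a pattern with every character escaped.
    regex = re.compile(re.escape(search_string) if pattern is None else pattern)
    hits = [t for t in TABLE_RE.findall(page_html) if regex.search(t)]
    return hits[match_index] if 0 <= match_index < len(hits) else None

# Hypothetical sample page with two tables that both mention "weather".
PAGE = (
    '<table id="nav"><tr><td>weather</td></tr></table>'
    '<table id="wx"><tr><td>weather: 21 C</td></tr></table>'
)

first = select_table(PAGE, search_string="weather")
second = select_table(PAGE, search_string="weather", match_index=1)
by_pattern = select_table(PAGE, pattern=r"weather:\s*\d+\s*C")
```

In this sketch the literal string is ambiguous, so the index decides which table is returned, whereas the pattern is specific enough to match the weather table alone.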
Regular expressions are processed by the regular expression engine built into the Microsoft .NET Framework class library (System.Text.RegularExpressions.Regex), so they must be written in accordance with the .NET Framework Regular Expressions reference.
For a real example of using the SelectBySearchString criterion, see the TableProcessor tutorial.