The TableProcessor Object Tutorial

Download the source code: tableprocessordemo.zip (0.8 KB).

This tutorial shows how to build an application that uses information gathered from a web site with the TableProcessor object included in AWS HTML Producer 4.0.

Let’s extract the stock indices from the CNN web site and show them on your own web site. The example is written in VBScript for the ASP environment, so you need a working installation of Internet Information Services or MS Personal Web Server. The choice of language and host is not essential, though: you can easily adapt the code for VB 6.0 or client-side VBScript.

The first step is always to decide which page you want to process. Once you know the page, you can find out which cells hold the information you need and determine their positions (column index and row index).

So, go to http://edition.cnn.com and find the table that contains the list of indices. The table is located in the bottom right corner of the page, and its layout is well suited to the task.

The first issue you encounter is how to make TableProcessor pick this table out from the other tables on the page. The object supports 5 ways to do this: by the number of the table, by its name, by the value of any attribute of the target table's <table> tag, by a search string that matches a substring in the source HTML text, and, for advanced string matching, by a regular expression pattern. The search string option suits our task best, because it lets you select the desired table with just a sample string that resides within it.

So, let's find a text string that identifies our table exactly. As described in the Selecting tables by search string chapter, there are 2 major requirements for that string: it must be static and as unique as possible within the page. The better these requirements are met, the more reliable the data extraction is. In this example, the word MARKETS: (in upper case and with a colon) is the most suitable text for our task. On the one hand, it is not expected to change frequently. On the other hand, it is very likely to be unique: the colon is part of the search string and the comparison is case-sensitive, so only the word MARKETS in upper case followed by a colon matches. It is improbable that the same string appears in another context on the page. So the criteria for selecting the table with indices are the following:

SearchString = "MARKETS:"
CaseSensitive = True
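To make the idea of selecting a table by search string more concrete, here is a minimal Python sketch using only the standard library. It is not how HTML Producer is implemented; the names TableTextCollector and find_table and the sample page are hypothetical and serve only to illustrate a case-sensitive substring match.

```python
from html.parser import HTMLParser


class TableTextCollector(HTMLParser):
    """Collects the text of every <table> in a page (illustration only)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # text of each table, in order of its opening tag
        self._open = []    # indexes of the tables that are currently open

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append("")
            self._open.append(len(self.tables) - 1)

    def handle_endtag(self, tag):
        if tag == "table" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every table that encloses it.
        for index in self._open:
            self.tables[index] += data


def find_table(html, search_string, case_sensitive=True):
    """Return the index of the first table containing search_string, or -1."""
    collector = TableTextCollector()
    collector.feed(html)
    collector.close()
    if not case_sensitive:
        search_string = search_string.lower()
    for index, text in enumerate(collector.tables):
        haystack = text if case_sensitive else text.lower()
        if search_string in haystack:
            return index
    return -1


# Hypothetical page with two tables; only the second holds "MARKETS:".
page = (
    "<table><tr><td>Markets news</td></tr></table>"
    "<table><tr><td>MARKETS:</td><td>DJIA</td></tr></table>"
)
print(find_table(page, "MARKETS:"))                       # case-sensitive match
print(find_table(page, "markets", case_sensitive=False))  # looser match
```

With case sensitivity turned off, the shorter string "markets" already matches the first table, which is why a specific, case-sensitive string such as MARKETS: is the safer choice.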

Now we need to evaluate the query and figure out the table structure so that we can format the output of our sample application properly. Open the TableProcessor Console. It shows the layout of the table and the location of the cells that contain the information you need. To get a page directly from the Internet, check “From File or Web” and enter the web site address: http://edition.cnn.com. Note that you must enter the URL in full form (with the http:// prefix). Now select the By Search String criterion, type in the search string MARKETS: and check the Case Sensitive checkbox. Leave the “Remove formatting” checkbox turned on; this removes all tags from the cell values (we are interested in the data only). When ready, press the “Parse Table” button and wait for the file to be downloaded. Depending on the speed of your Internet connection, this can take several seconds.

If everything went well, you will see the values from the parsed table in the grid. Note that the resulting table has the same layout as the input one. Thanks to this grid you can easily figure out the position of each cell (note the column and row indexes at the top and on the left). Voila!
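The grid of column and row indexes can be pictured with a short Python sketch. It assumes a simple, non-nested table; GridParser, parse_grid and the sample values are hypothetical placeholders, not part of HTML Producer and not real quotes.

```python
from html.parser import HTMLParser


class GridParser(HTMLParser):
    """Turns the rows and cells of a simple table into a list of lists."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only text is kept, so formatting tags inside a cell disappear.
        if self._in_cell:
            self.rows[-1][-1] += data


def parse_grid(table_html):
    """Return the table as grid[row][column] with formatting removed."""
    parser = GridParser()
    parser.feed(table_html)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.rows]


# Placeholder table: the values are made up for the example.
sample = (
    "<table>"
    "<tr><td><b>MARKETS:</b></td><td>Value</td></tr>"
    "<tr><td>Index A</td><td><font color='green'>100.5</font></td></tr>"
    "</table>"
)
grid = parse_grid(sample)
for row_index, row in enumerate(grid):
    for column_index, value in enumerate(row):
        print(f"column {column_index}, row {row_index}: {value}")
```

Reading a position off such a grid is what lets the application later ask for a specific cell by its column and row index.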

Now we have gathered enough information about the table and are ready to write a real application. Create a new file in your text editor (Notepad is enough for this task, although EditPlus is much better) and start coding.

First, it is necessary to create an instance of the TableProcessor object:

Dim Cells 'Variable for resulting table
Dim Cell 'Variable for a table cell
Dim Message 'Message to show user
Dim i 'Just a loop counter
Dim myTableProcessor
Set MyTableProcessor = CreateObject("HTMLProducer.TableProcessor")

Next, you need to specify the source of the HTML. Because a web page (that is, a remote file) is processed, you must turn on the file processing mode (the ProcessFile property) and set the file URL (the FileName property):

MyTableProcessor.ProcessFile = True
MyTableProcessor.FileName = "http://edition.cnn.com"

Now the source of the HTML is defined. You only need to tell the parser how to select the table to process. This topic was discussed above, so the code should look familiar:

MyTableProcessor.SelectCriterion = 3  'Value of SelectBySearchString
MyTableProcessor.SearchString = "MARKETS:"
MyTableProcessor.CaseSensitive = True

The only thing left to do with the parser is to start processing the table by calling the ProcessTable method. It takes one argument, RemoveHTML. If it is set to True, TableProcessor removes all tags from the cell values:

Set Cells = MyTableProcessor.ProcessTable(True) 'Process table

The table is now parsed and the results are stored in Cells. Finally, let’s show them:

'Build output table
Response.Write("<b>Most recent stock quotes</b>")
Response.Write("<table width=300>")
'Row 0 is skipped; every other row contributes its columns 0 to 4
For i = 2 To Cells.CountRows
   Message = Message & "<tr><td>" & _
   Cells.GetCell(0, i - 1).Value & "</td><td>" & _
   Cells.GetCell(1, i - 1).Value & "</td><td>" & _
   Cells.GetCell(2, i - 1).Value & "</td><td>" & _
   Cells.GetCell(3, i - 1).Value & "</td><td>" & _
   Cells.GetCell(4, i - 1).Value & "</td></tr>"
Next
Response.Write(Message)
Response.Write("</table>")
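
For readers adapting the example to another language, the same output loop can be sketched in Python. The sketch assumes the parsed cells are already available as a list of rows; build_table and the sample grid are hypothetical, and the values are placeholders rather than real quotes. Unlike the VBScript loop, it also escapes the cell text before writing it into the page.

```python
from html import escape


def build_table(grid, columns=5):
    """Render every row except the first as an HTML table row."""
    parts = ["<b>Most recent stock quotes</b>", "<table width=300>"]
    for row in grid[1:]:  # skip row 0, as the VBScript loop does
        cells = "".join(f"<td>{escape(value)}</td>" for value in row[:columns])
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table>")
    return "\n".join(parts)


# Placeholder data, not real quotes.
grid = [
    ["MARKETS:", "Value"],
    ["Index A", "100.5"],
    ["Index B", "<n/a>"],
]
output = build_table(grid)
print(output)
```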
 

AWS HTML Producer 4.0 introduces an interesting new feature. If you just want to show the whole table on a web page (that is, you do not need the separate value of every single cell), you can get the HTML source of that table and display it in the page directly instead of rebuilding the initial table manually. Just take advantage of the new TableHTMLText property of the TableCells object:

'Build output table
Response.Write(Cells.TableHTMLText)

If you have done everything correctly, requesting the page returns the stock indices taken from the CNN web page. If you process the page without removing the HTML formatting:

Set Cells = MyTableProcessor.ProcessTable(False) 'Process table

you will see color markers, as shown in the picture on the left. They are loaded directly from the CNN web site. However, an application that needs only the figures clearly does not need them.
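
The difference between the two settings can be illustrated with a small Python sketch. It uses a deliberately naive regular expression (it would break on a ">" inside an attribute value) and does not reflect how TableProcessor removes tags internally; cell_value and the sample markup are hypothetical.

```python
import re

# Naive tag matcher: anything between "<" and the next ">".
TAG_PATTERN = re.compile(r"<[^>]+>")


def cell_value(cell_html, remove_html=True):
    """Return the cell content with or without its markup."""
    if remove_html:
        return TAG_PATTERN.sub("", cell_html).strip()
    return cell_html


# Made-up cell content with an image marker and a colored figure.
raw = '<img src="up.gif"> <font color="green">+12.5</font>'
print(cell_value(raw))         # figure only
print(cell_value(raw, False))  # markup preserved, marker image included
```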

The code of this sample application can be found on the TableProcessor Example page, and the working script is located in the TutorialDemo folder inside the HTMLProducer program folder.

Copyright © AW/Systems, LLC 2002-2004.