The TableProcessor Object Tutorial
Download source code tableprocessordemo.zip (55 kb).
This tutorial shows how to create an application that uses information gathered from a web site with the help of the TableProcessor object included in AWS HTML Producer 4.0.
Let's extract the stock indices from the CNN web site and show them on your web site. The example is written in Visual Basic Script for the ASP environment (you need a working Internet Information Services or MS Personal Web Server installation). In fact, it does not matter what language the code is written in or where it runs: you can easily adapt it for VB 6.0 or client-side VBScript.
Generally, the first thing you need to do is determine which page you want to process. This lets you find out which cells provide the information you need and find their positions (column index and row index).
So, go to http://edition.cnn.com and find the table that contains the list of indices. The table is located in the bottom right corner of the page, and you can see that its layout is very suitable for the task.
The first issue you encounter is how to make TableProcessor identify this table among the other tables on the page. The object supports five ways to do this: by the number of the table, by its name, by the value of any attribute of the target table's <table> tag, by a search string that matches a substring in the source HTML text, and, for advanced string matching, by a regular expression pattern. The search string option is the most suitable for our task, as it lets you select the desired table using just a sample string that resides within it.
So, let's find a text string that identifies our table exactly. As explained in the Selecting tables by search string chapter, there are two major requirements for that string: it must be static and as unique as possible within the page. The better these requirements are met, the more reliable the data extraction is. In this example, the word MARKETS: (in upper case and with a colon) is the most suitable text to match for our task. On the one hand, it is not expected to change frequently. On the other, there is a rather high probability that it is unique: we include the colon in the search string and pay attention to character case, so only the word MARKETS in upper case and with a colon at the end will match our criterion. It is improbable that another such word appears in another context on the page. So the criteria for selecting the table with indices are the following:
SearchString = "MARKETS:"
CaseSensitive = True
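
To make the selection rule concrete, here is a minimal, language-independent sketch in Python of what "select the table that contains a search string" could look like. It is not the TableProcessor implementation: the class TableTextCollector and the function find_table_index are hypothetical names invented for this illustration, and the sketch matches against the visible text of each table, whereas TableProcessor is described as matching against the source HTML.

```python
from html.parser import HTMLParser


class TableTextCollector(HTMLParser):
    """Collects the visible text of every <table>, in document order."""

    def __init__(self):
        super().__init__()
        self.tables = []   # text fragments per table, indexed by opening order
        self._open = []    # stack of indices of the currently open tables

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
            self._open.append(len(self.tables) - 1)

    def handle_endtag(self, tag):
        if tag == "table" and self._open:
            self._open.pop()

    def handle_data(self, data):
        if self._open:
            # Attribute text to the innermost open table only.
            self.tables[self._open[-1]].append(data)


def find_table_index(html, search_string, case_sensitive=True):
    """Return the index of the first table whose text contains search_string."""
    collector = TableTextCollector()
    collector.feed(html)
    collector.close()
    needle = search_string if case_sensitive else search_string.lower()
    for index, parts in enumerate(collector.tables):
        text = "".join(parts)
        if not case_sensitive:
            text = text.lower()
        if needle in text:
            return index
    return None


# Hypothetical page fragment, invented for this example.
sample = """
<table><tr><td>Markets news</td></tr></table>
<table><tr><td>MARKETS:</td><td>DJIA</td></tr></table>
"""
print(find_table_index(sample, "MARKETS:", case_sensitive=True))   # 1
print(find_table_index(sample, "markets", case_sensitive=False))   # 0
```

The two calls show why the colon and the case-sensitive comparison matter: the strict search skips the first table, while a loose, case-insensitive search for "markets" would pick the wrong one.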

Now we need to evaluate the query and figure out the table structure in order to format the output of our sample application properly. So, open the TableProcessor Console. With its help you can find out the layout of the table and locate the cells that contain the information you need. To get a page directly from the Internet, check the “From File or Web” option and fill in the web site address: http://edition.cnn.com. Note that you must enter the URL in its full form (with the http:// prefix). Now select the By Search String selection criterion, type in the search string MARKETS: and check the Case Sensitive checkbox. Leave the “Remove formatting” checkbox turned on; this removes any tags from the cell values (we are interested in the data only). When ready, press the “Parse Table” button and wait for the file to be downloaded. Depending on your Internet connection speed, this can take several seconds.
If everything went well, you will see the values from the parsed table in the grid. Note that the resulting table has the same layout as the input one. Thanks to this grid you can easily figure out the position of each cell (notice the column and row indexes on the left and at the top). Voila!
Now we have gathered enough information about the table and are ready to write a real application. You can write it in Visual Studio .NET, MS Web Matrix or an ordinary text editor. Here we will consider the case with Visual Studio. So, follow the steps below.
1. Open VS.Net, click the New Project button, select Visual Basic Project / ASP.NET Web Application in the New Project window.
2. It will take a few seconds for VS to create the new web project. When the process is finished, you will see an empty web form where you should place the following web controls (see the picture):
- TextBox named txtURL - for URL of the web page to process
- TextBox named txtSearchString - for search string to select the table with
- Button named cmdProcessTable that will start processing
- CheckBox named chkCaseSensitive - to set whether the parser must take the case of the search string into account
- and DataGrid named dgTable where the table will be shown
3. Now open the VB.Net code editor (file WebForm1.aspx.vb) and type in the following code.
Private Sub cmdProcessTable_Click( _
        ByVal sender As System.Object, _
        ByVal e As System.EventArgs) _
        Handles cmdProcessTable.Click

    'Declare and create the table processing object
    Dim Parser As HTMLProducer.TableProcessor
    Dim Table As HTMLProducer.TableCells
    Parser = New HTMLProducer.TableProcessor()

    'Declare and create data objects
    Dim dt As New DataTable()
    Dim dr As DataRow
    Dim i As Integer

    'Set main properties
    Parser.ProcessFile = True
    Parser.FileName = txtURL.Text
    Parser.CacheFile = True
    Parser.SearchString = txtSearchString.Text
    Parser.MatchIndex = 0
    Parser.CaseSensitive = chkCaseSensitive.Checked
    Parser.SelectCriterion = _
        HTMLProducer.TableProcessor.SelectCriteria.SelectBySearchString
    Table = Parser.ProcessTable(True)

    'Now we have a cell collection with the table cells.
    'Take advantage of the power of ASP.NET - create
    'a data table representation of the input table.
    'This also makes it possible to bind it to the DataGrid
    'without further machinery
    dt.Columns.Add(New DataColumn("Index", GetType(String)))
    dt.Columns.Add(New DataColumn("Absolute Change", GetType(String)))
    dt.Columns.Add(New DataColumn("Value", GetType(String)))
    dt.Columns.Add(New DataColumn("% Change", GetType(String)))

    'Add table data (indices and their values),
    'skipping row 0 of the source table
    For i = 1 To Table.CountRows - 1
        dr = dt.NewRow()
        dr(0) = Table.GetCell(0, i).Value
        dr(1) = Table.GetCell(2, i).Value
        dr(2) = Table.GetCell(3, i).Value
        dr(3) = Table.GetCell(4, i).Value
        dt.Rows.Add(dr)
    Next

    'Display data
    dgTable.DataSource = dt
    dgTable.AutoGenerateColumns = True
    dgTable.DataBind()
End Sub
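
The loop in the handler maps cells of the parsed table to the four columns of the data table: it skips row 0 and copies source columns 0, 2, 3 and 4. The same mapping can be sketched independently of HTML Producer, here in Python over a plain list of rows. The function name extract_index_rows and all sample cell values are invented placeholders for illustration, not real market data and not part of the product's API.

```python
HEADERS = ("Index", "Absolute Change", "Value", "% Change")


def extract_index_rows(grid, columns=(0, 2, 3, 4)):
    """Skip the first row and keep only the selected source columns."""
    records = []
    for row in grid[1:]:  # row 0 is skipped, as in the VB loop
        records.append({name: row[col] for name, col in zip(HEADERS, columns)})
    return records


# Placeholder grid: one heading row plus two data rows of five cells each.
# Column 1 is left empty here; the VB loop does not read it either.
grid = [
    ["MARKETS:", "", "", "", ""],
    ["INDEX A", "", "+1.00", "100.00", "+1.00%"],
    ["INDEX B", "", "-2.00", "200.00", "-1.00%"],
]

for record in extract_index_rows(grid):
    print(record)
```

Keeping the column positions in a single tuple means that a layout change on the source page only requires updating one place.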
AWS HTML Producer 4.0 introduces an interesting new feature. If you just want to show the whole table on a web page (so you don't need the separate value of every single cell), you can get the HTML source of that table and display it in the page directly instead of rebuilding the initial table manually. Just take advantage of the new TableHTMLText property of the TableCells object:
'Build output table
Response.Write(Cells.TableHTMLText)
If you have done everything correctly, requesting the page gives you a page with the stock indices taken from the CNN web page. If you try to process the page without removing HTML formatting:
Set Cells = MyTableProcessor.ProcessTable(False) 'Process table
you will see color markers, as shown in the picture on the left. They are loaded directly from the CNN web site. However, for an application that needs figures only, they are clearly unnecessary.
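
To illustrate what removing formatting from a cell value amounts to, the following Python sketch strips all tags from a fragment of cell HTML and keeps only the text. It is an assumption-laden illustration, not the way TableProcessor works internally; the function name strip_formatting and the sample cell markup are invented for this example.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Keeps character data and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_formatting(cell_html):
    """Return the text of a cell with all HTML tags removed."""
    parser = _TextOnly()
    parser.feed(cell_html)
    parser.close()
    return "".join(parser.parts).strip()


# Hypothetical cell content: an image marker followed by a bold figure.
raw = '<img src="up.gif" alt=""> <b>+10.00</b>'
print(strip_formatting(raw))  # +10.00
```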
The code of this sample application can be found on the TableProcessor Example page, and the working script is located in the TutorialDemo folder in HTMLProducer's program folder.