Processing HTML with AWS HTML Producer .Net

The following chapter describes the main concepts of HTML parsing with AWS HTML Producer. This information matters most if you are going to use the component primarily as an HTML template parser. If you only want to process ordinary web pages, you do not need to know the details of how the component parses them, which are described below; in that case it is enough to read the "Tag using conventions" topic.

Tag using conventions

Because AWS HTML Producer is a universal parsing component designed to parse any type of HTML-like text, it does not know how particular tags are used, for example, whether they require corresponding end tags or not. XML parsers face the same problem, as they know nothing about the tags they parse. The solution here is the same as in XML: the developer should adhere to a standard set of conventions that solves most such problems. Here is the list of these conventions.

End tags. End tags are handled the same way as in XML, but several features give you more flexibility in specifying when the parser must look for an end tag and when it must not.

  • By default, every tag to be processed by HTML Producer must have a corresponding end tag (<pagetitle>…</pagetitle>). Place a slash (/) before ">" in a start tag if it is not convenient to use an end tag. This is the same as in the XML standard.
  • You can change this default by setting the RequireCloseTags property of the HTMLParser object to False. In HTML Producer Console this property is reflected by the Require close tags check box.
  • If most tags require end tags and a few do not, or the other way round, you can use unconditional specifiers in the list of tags to be found to state explicitly whether a tag needs to be closed. There are two specifiers that can be placed before a tag name.
    • & - the tag requires a corresponding end tag (e.g. &DIV, &TITLE, &P, &PHONE)
    • * - the tag does not require a corresponding end tag (e.g. *IMG, *META, *CUSTOMER)

    The meaning of these specifiers does not depend on the state of the Require close tags property. However, if a tag found in the text has a slash (/) before ">" (that is, it is marked as already closed), an end tag will not be sought regardless of any specifier.
    > So, if most of your tags don't require end tags (<BR>, <IMG>, <META>) but one does (<P>), set the Require close tags property to False and put the & specifier before the P tag name in the tag name list: BR, IMG, META, &P.
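
The decision rule described above can be modelled in a few lines of Python. This is only a sketch of the convention, not HTML Producer's own code: the function names are hypothetical, and matching tag names without regard to case is an assumption.

```python
def parse_tag_names(tag_names):
    """Split a tag list such as 'BR, IMG, META, &P' into {name: specifier}."""
    rules = {}
    for item in tag_names.split(","):
        item = item.strip()
        if not item:
            continue
        if item[0] in "&*" and len(item) > 1:
            # '&' = end tag required, '*' = end tag not required
            rules[item[1:].lower()] = item[0]
        else:
            rules[item.lower()] = None  # no specifier given
    return rules


def needs_end_tag(name, self_closed, rules, require_close_tags=True):
    """Decide whether an end tag should be sought for this start tag."""
    if self_closed:
        # <tag/> is already closed; this overrides any specifier
        return False
    specifier = rules.get(name.lower())  # assumption: case-insensitive names
    if specifier == "&":
        return True
    if specifier == "*":
        return False
    # No specifier: fall back to the Require close tags setting
    return require_close_tags
```

With the tag list BR, IMG, META, &P and Require close tags switched off, only P is treated as needing an end tag, and a tag written as <P/> is never searched for one.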

Tag name. A tag name can contain letters, digits, and any other symbols except white space and the symbols defined as tag start and end (by default < and >). If you want to process all tags on a page, put an asterisk (*) into the TagNames property (in HTML Producer Console this property is reflected by the Tag Names text box).

Tag parameters. Tags can have parameters written in this style: ParamName1=ParamValue1 ParamName2=ParamValue2 … ParamNameN=ParamValueN. If a parameter value contains white space, it must be enclosed in single (') or double (") quotation marks. If a parameter with the same name is repeated in one tag, only the last instance is used as its value. Standalone specifiers such as CHECKED, NOSHADE or SELECTED, which are used in some HTML tags, are recognized by HTML Producer but not captured, so you cannot retrieve them.
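
To make the parameter rules concrete, here is a small Python sketch that reads parameters from the inside of a tag. It illustrates the convention only and is not the component's parser; the function name is hypothetical, and lower-casing parameter names is an assumption.

```python
import re

# name=value, where value is "double quoted", 'single quoted' or a bare token
_PARAM = re.compile(r"""([^\s=]+)=(?:"([^"]*)"|'([^']*)'|(\S+))""")


def parse_params(tag_body):
    """Return the parameters of a tag as a dict; the last duplicate wins."""
    params = {}
    for match in _PARAM.finditer(tag_body):
        name = match.group(1).lower()  # assumption: names are case-insensitive
        # Exactly one of the three value groups matched; it may be empty ("")
        value = next(g for g in match.groups()[1:] if g is not None)
        params[name] = value  # a repeated name overwrites the earlier value
    return params
```

A standalone specifier such as NOSHADE has no "=" and therefore never appears in the result, which mirrors the "recognized but not captured" behaviour described above.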

Fundamental tag processing

Now that you know a little about HTML Producer, you can work through the first example, which shows the features described above in action.

Imagine that there is an HTML template where a page title must be inserted dynamically when a user requests the page. One possible solution is to add a special tag that will be replaced with the page title text during parsing of the template. Let's name it <pagetitle>.

Having chosen a name for the tag, we must decide whether it requires a corresponding end tag. End tags are usually useful when the text between the start and end tags matters. In our case it does not, so we can use only the start tag. However, the parser requires tags to be closed by default, and there are several ways to deal with this. The first is to clear the Require Close Tags check box in the HTML Producer Console window; the second, and probably the best in this situation, is to put a slash before ">", that is <pagetitle/>.

   <html>
      <head>
         <title><pagetitle/></title>
      </head>
   </html>

Open HTML Producer Console, copy the text above and paste it into the field titled Insert HTML text here. Next, specify the tag that must be found in this text by typing its name into the Tag Names field. Note that you can type either the name of the tag, pagetitle, or the full tag form with "<" and ">", <pagetitle>. We could also specify that the tag does not require closure by adding an asterisk (*) before the tag name: *pagetitle.

Now press the Parse button. The <pagetitle/> tag will be highlighted, which means that the Producer found the tag; you can see its name and its text (in this example they are the same because we did not specify parameters for this tag). Now you can enter the text to replace this tag. Type it in the Covered Text field, for example: Welcome to my Home Page! To continue parsing, click More >> and your text will be inserted in place of the tag. You will see text like this:

   <html>
      <head>
         <title>Welcome to my Home Page!</title>
      </head>
   </html>

In this example we became familiar with the main function of HTML Producer: finding and replacing tags. Try working through this example again using the other ways of defining whether a tag must be closed.
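
For readers who prefer code to the Console, the same find-and-replace step can be imitated in plain Python. This sketch does not use HTML Producer at all; replace_tag is a hypothetical helper that only handles self-closed tags such as <pagetitle/>.

```python
import re


def replace_tag(template, tag_name, replacement):
    """Replace every self-closed <tag_name .../> with the replacement text."""
    pattern = re.compile(
        r"<%s(\s[^<>]*)?/>" % re.escape(tag_name),
        re.IGNORECASE,  # assumption: tag names are matched case-insensitively
    )
    # A function is used so that backslashes in the replacement stay literal
    return pattern.sub(lambda _match: replacement, template)


template = "<html><head><title><pagetitle/></title></head></html>"
page = replace_tag(template, "pagetitle", "Welcome to my Home Page!")
```

After the call, page holds the template with the title text in place of the tag, which is the same result the Console shows after you click More >>.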

Advanced tag processing

In the previous section you saw that HTML Producer is quite flexible in tag processing. In this section you will learn about other features that help you create reliable and powerful web applications.

Tag parameters

The first thing to pay attention to is the recognition of tag parameters. This feature is implemented in XML parsers, but unfortunately few server-side template parsers support it. Parameters are a very effective way to pass data to the code that processes the tag. For example, if you want an article taken from a database to be inserted when a special tag is found, you can specify the identifier of the article to show. You can also specify the location of the database server, the user name and password for access to the database, the name of the table to retrieve the article from, and so on.

RunAt parameter

The second thing to consider is the RunAt parameter, which adds more flexibility in specifying which tags should be processed and which should not. If you pass a desired value of the RunAt parameter to the parser (the RunAt property of the HTMLParser object or the RUNAT text box in HTML Producer Console), only tags that have a RunAt parameter with the same value will be processed. Other tags will not be processed even if they have the requested name. This parameter can be used in any tag to be parsed, like any other parameter. The same technique is used in Active Server Pages (ASP), where the RunAt attribute is used in different tags (<SCRIPT>, <OBJECT>, etc.) that must be processed on the server side rather than on the client. The RunAt parameter is the better criterion for selecting tags to parse if you need to process native HTML tags on the server. Its value in ordinary HTML text lets HTML Producer recognize that the tag must be processed by your application, not by a web browser.

Note that you cannot select tags by the value of the RunAt parameter alone; this method must be combined with selection by tag name, or names if you need to parse several tags. If you use closed tags, the RunAt parameter with the same value must also be present in the end tag, like this: <BODY RunAt="webapp">…bla…</BODY RunAt="webapp">. Otherwise the close tag will not be treated as the end tag for the start tag.

If during processing the parser meets unexpected end tags (end tags without a matching start tag) that have the given RunAt value, it will remove them anyway.
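
The selection rule (tag name first, then the RunAt value) can be sketched in Python as follows. This models the rule as described here rather than the component's code; both function names are hypothetical, and comparing names without regard to case is an assumption.

```python
import re

_RUNAT = re.compile(r"""\brunat=(?:"([^"]*)"|'([^']*)'|(\S+))""", re.IGNORECASE)


def runat_value(tag_body):
    """Return the value of the RunAt parameter in a tag body, or None."""
    match = _RUNAT.search(tag_body)
    if match is None:
        return None
    return next(g for g in match.groups() if g is not None)


def should_process(name, tag_body, tag_names, runat=None):
    """Apply name selection first, then the optional RunAt filter."""
    if name.lower() not in tag_names:
        return False  # RunAt alone never selects a tag
    if runat is None:
        return True   # no RunAt filter requested: the name is enough
    return runat_value(tag_body) == runat
```

For example, with tag_names = {"body"} and runat="webapp", a tag <BODY RunAt="webapp"> is selected, while <BODY bgcolor=white> and <DIV runat=webapp> are left for the browser.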

Generating debug information

The third main feature is the ability of HTML Producer to generate debug information for each processed tag. Each block of text inserted while processing a tag is wrapped in comments that mark the start and the end of the block, so a developer can see the changes made by the parser. Besides these comments, information about the processed tag that caused the insertion is generated, including the tag name, its parameters and the initial text between the start and end tags. This lets you trace the whole parsing process and makes your code easier to debug.

Recursive HTML parsing

When you parse HTML text, you can replace some tags with others. By default, these newly inserted tags are not noticed by HTML Producer in the current parsing process. However, sometimes you need to process all the tags, including the new ones, in a single parsing process. This can be done using the recursive parsing mode, which is enabled by setting the ParseRecursive property to True. This property is reflected by the Parse Recursive checkbox in HTML Producer Console. If new suitable tags (those specified in the TagNames property) are inserted during the second parsing cycle, they will be parsed as well. This loop repeats as long as suitable tags are inserted.
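
The difference between a single cycle and the recursive mode can be shown with a short Python model. It is a sketch under stated assumptions, not the component's algorithm: handlers is a hypothetical mapping from tag name to a function that returns the replacement text, only self-closed tags are handled, and the max_cycles guard is an addition that the text above does not mention.

```python
import re

_SELF_CLOSED = re.compile(r"<([A-Za-z][\w-]*)\s*/>")


def parse_once(text, handlers):
    """One parsing cycle: replace each self-closed tag that has a handler."""
    def substitute(match):
        handler = handlers.get(match.group(1).lower())
        # Tags without a handler are left exactly as they were
        return handler() if handler else match.group(0)
    return _SELF_CLOSED.sub(substitute, text)


def parse(text, handlers, recursive=False, max_cycles=20):
    """Parse once, or keep parsing while new suitable tags are inserted."""
    result = parse_once(text, handlers)
    if not recursive:
        return result
    for _ in range(max_cycles):  # guard against handlers that re-insert themselves
        next_result = parse_once(result, handlers)
        if next_result == result:
            break  # nothing was replaced, so no suitable tags remain
        result = next_result
    return result


handlers = {
    "page": lambda: "<header/>Body",    # inserts another suitable tag
    "header": lambda: "<b>Title</b>",
}
```

Calling parse("<page/>", handlers) leaves the inserted <header/> tag untouched, while the same call with recursive=True also replaces it in a further cycle.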

ASP and PHP script islands parsing

From version 2.0, AWS HTML Producer works correctly with inline ASP and PHP server-side scripts on an HTML page. If the parser meets a script block enclosed in <% ... %> (ASP) or <? ... ?> (PHP), it extracts the text of the script and puts it into the TagText property. The "%" and "?" signs that enclose the script body are removed. To enable ASP or PHP script parsing, put "%" (for ASP) or "?" (for PHP) in the TagNames property just like any tag name. Both signs are written without quotes. You can specify other tag names that you want to parse at the same time. Here are examples for Visual Basic:

Dim Parser As HTMLProducer.HTMLParser
...bla...

Parser.TagNames = "?" 'Parse only PHP script islands, or:
Parser.TagNames = "?, Meta, %, Font" 'Parse ASP and PHP script islands and some HTML tags

Needless to say, you can set all of these in HTML Producer Console as well, in the Tag Names text box.
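
To illustrate what extracting a script island means, the following Python sketch pulls the script bodies out of a page and drops the enclosing "%" or "?" signs. It is an independent illustration, not how HTML Producer is implemented; script_islands and its signs argument are hypothetical names.

```python
import re

# "<%" ... "%>" or "<?" ... "?>"; the closing sign must equal the opening one
_ISLAND = re.compile(r"<([%?])(.*?)\1>", re.DOTALL)


def script_islands(html, signs="%?"):
    """Return (sign, script body) pairs for the requested kinds of island."""
    return [
        (match.group(1), match.group(2))
        for match in _ISLAND.finditer(html)
        if match.group(1) in signs
    ]


page = "<p><% x = 1 %></p><? echo $x; ?>"
asp_and_php = script_islands(page)           # both kinds of island
php_only = script_islands(page, signs="?")   # PHP islands only
```

Passing signs="?" corresponds to putting only "?" into the tag name list, so ASP blocks are ignored.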

Parsing all tags in HTML text

You can process all the tags existing in HTML text without specifying them explicitly. To do this you need to put an asterisk (*) in the TagNames property. All tags and inline server script islands will be parsed. However, HTML comments (<!-- ... -->) will not be parsed in this mode.

Using different brackets in tags 

Sometimes you need to parse non-standard, custom tags written in a special style, such as [Product] or {Device}. HTML Producer lets you change the default tag brackets: use the TagStartSign and TagEndSign properties to specify the appropriate symbols. Note that you cannot specify several symbol sets for one property; a value like {,( will be treated as a single symbol set. An example of using these properties in Visual Basic looks like this:

Dim Parser As HTMLProducer.HTMLParser
...bla...
Parser.TagStartSign = "["
'Tag starts with [
Parser.TagEndSign = "]"
'and ends with ]
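
The effect of changing the brackets can also be pictured with a short Python sketch that finds tags for any pair of start and end signs. This is an independent model, not the component's code; find_tags and its parameters are hypothetical names chosen to echo TagStartSign and TagEndSign.

```python
import re


def find_tags(text, start_sign="<", end_sign=">"):
    """Return the names of start tags delimited by the given signs."""
    start, end = re.escape(start_sign), re.escape(end_sign)
    # A tag is: start sign, some text containing neither sign, end sign
    pattern = re.compile(rf"{start}((?:(?!{start}|{end}).)+?){end}", re.DOTALL)
    names = []
    for match in pattern.finditer(text):
        body = match.group(1).strip()
        if body and not body.startswith("/"):  # skip end tags such as [/Product]
            names.append(body.split()[0].rstrip("/"))
    return names
```

For instance, find_tags("Buy [Product] or [Device id=7] today", "[", "]") yields the names Product and Device, and the default signs behave like ordinary HTML brackets.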

Advanced tag processing: example

Let's expand the previous example so that we can see these key features in action. Now we will model the situation where we need to insert the page body between the <body> and </body> tags.

Imagine that we need to insert an article from a database into the web page, and that all the data needed to connect to the database server are variable and must be specified directly in the tag. Also imagine that this tag is processed by a custom program, which is the HTML Producer client in this situation; HTML Producer does not query the database itself. The article will not actually be inserted; the parser will only show you different characteristics of the tag and remove it.

   <html>
      <head>
         <title>Welcome to my Home Page!</title>
      </head>
      <body>
         <QueryDatabase runat=webapp datasource=mssql server=dbserver username=sa password="" sql="SELECT PageBody FROM PagesTable WHERE PageID=1">
            Page body will be inserted here
         </QueryDatabase runat="webapp">
      </body>
   </html>

Open HTML Producer Console if it is not already open. Copy this piece and paste it into the text box titled "Insert HTML text here". Next, type the name of the tag (QueryDatabase) into the "Tag names" field. As we also select by the RunAt parameter, we need to specify its value in the box called "RUNAT"; set it to webapp. We will also generate debug information for this tag, so select the "Generate debug information" check box.

Now we are ready for parsing. Press the "Parse" button and, if everything was set correctly, the whole QueryDatabase expression will be highlighted. All tag parameters (runat, datasource, server, username, password, sql) are listed with their names and values in the table. You will also see the text between the start and end tags in the text box called "Covered text". Delete this text and type "It is my article!" there; then click the "More" button and the parsing will finish. Look through the HTML text and you will see that the tags have been replaced by "It is my article!", wrapped in HTML comments that mark the beginning and the end of the inserted text. In addition, the upper comment block shows all the parameters of the tag and the initial text between the tags.

Copyright © AW/Systems, LLC 2002-2004.