Bartram's Bits

Tuesday, March 24, 2009

Converting Model-Glue Docs to HTML

I saw Ray Camden's Friday Puzzler - Helping the Model-Glue Team from March 13th today and noticed that no solution had been posted yet. I gave it some thought and at first had visions of recursive CFCs parsing out each HTML tag with regular expressions. But then I realized the task wasn't to create a tool that parses the HTML out of any web page, but to parse the HTML out of these specific pages.

Digging into the source of these RoboHelp-generated pages, I began to see a structure and a set of rules that each page appeared to follow. The navigation tree is made up of 19 web pages, each representing a folder, with links to the other folders and to the documents that folder contains. My initial code looped through each of these files and stripped out anything above <body> and below </body> using the FindNoCase() function. Next, I added code to loop through the HTML looking for links. Again, the markup stuck to the rules, so I could pick out the links to folders by looking for target="_self". Any link that didn't refer to "_self" pointed to a document whose content we wanted to "scrape".

At this point I cheated a little: I now had a list of all the documents and the folder structure they were stored in, so I manually created folders mirroring the structure on the website. I figured this was OK, as I was creating a one-time process, not a reusable application. Finally, getting back to coding, I looped through each document, stripping out anything above <h1 and below the <script that follows the content.
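The two rules described above (cut a page down to the text between a pair of markers, then sort links by whether they carry target="_self") can be sketched in a few lines. This is only an illustration in JavaScript rather than the ColdFusion used for the actual scrape; the function names sliceBetween and classifyLinks and the sample markup are made up for the example and are not taken from the RoboHelp pages.

```javascript
// Sketch only: function names and sample markup are hypothetical.

// Return the text from the first startMarker up to (not including) the
// first endMarker after it. Matching is case-insensitive, like FindNoCase().
function sliceBetween(html, startMarker, endMarker) {
  var lower = html.toLowerCase();
  var start = lower.indexOf(startMarker.toLowerCase());
  if (start === -1) return '';
  var end = lower.indexOf(endMarker.toLowerCase(), start);
  return end === -1 ? html.slice(start) : html.slice(start, end);
}

// Split every <a ...>...</a> into folder links (target="_self") and
// document links (everything else).
function classifyLinks(html) {
  var result = { folders: [], documents: [] };
  var anchor = /<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi;
  var match;
  while ((match = anchor.exec(html)) !== null) {
    // Drop nested tags such as the icon <img> to leave only the link title.
    var title = match[2].replace(/<[^>]*>/g, '').trim();
    var link = { url: match[1], title: title };
    if (/target="_self"/i.test(match[0])) {
      result.folders.push(link);
    } else {
      result.documents.push(link);
    }
  }
  return result;
}

// Made-up navigation page with one folder link and one document link.
var sampleTree =
  '<html><head><title>nav</title></head><body>' +
  '<a href="whlstt1.htm" target="_self"><img src="book.gif">Folder A</a>' +
  '<a href="http://example.com/docs/page1.htm" target="content"><img src="page.gif">Page One</a>' +
  '</body></html>';

var links = classifyLinks(sliceBetween(sampleTree, '<body', '</body>'));
```

With the sample above, links.folders holds the whlstt1.htm entry and links.documents holds the page1.htm entry. A regular expression is used here for brevity; the ColdFusion listing walks the string with FindNoCase() and Mid() instead.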

I apologize that the Blogger template I'm using doesn't handle code very well. You can download the code, as well as all the docs, from the link below:

Download docs.model-glue.com.zip

<!--- I manually reproduced the folder structure of the Model-Glue docs on my local hard drive --->
<cfset pathRemote = "http://docs.model-glue.com/whgdata/">
<cfset pathLocal = "c:/docs.model-glue.com/whgdata/">

<!--- The navigation tree for the docs in RoboHelp is made up of 19 HTML files --->
<cfloop index="ptrTree" from="0" to="18">
<cfset fileName="whlstt#ptrTree#.htm">
<cfset fileRemote="#pathRemote##fileName#">
<cfset fileLocal="#pathLocal##fileName#">

<!--- Call up each of the 19 html files that make up the navigation tree and loop through finding each document --->
<cfhttp url="#fileRemote#" method="get" resolveurl="yes" throwonerror="yes"></cfhttp>
<cfif cfhttp.statusCode is "200 OK">
<p><strong><cfoutput>#fileName#</cfoutput></strong><br /><cfflush>
<cfset treeHTML=cfhttp.FileContent>
<cfset ptrLink=1>

<!--- Loop through each link in the navigation tree looking for links to documents --->
<cfloop condition="ptrLink lt len(treeHTML)">
<cfset startLink=FindNoCase("<a href=",treeHTML,ptrLink)>
<cfif startLink gt 0>
<cfset endLink=FindNoCase("</a>",treeHTML,startLink)+3>
<cfset tmpLink=mid(treeHTML,startLink,endLink-startLink+1)>

<!--- Found a link to a document, so parse out the url and link title --->
<cfif Not(FindNoCase("_self",tmpLink))>
<cfset startURL=FindNoCase("http://",tmpLink)>
<cfset endURL=FindNoCase(".htm",tmpLink,startURL)>
<cfset startImg=FindNoCase("<img",tmpLink,endURL)>
<cfset startTitle=FindNoCase(">",tmpLink,startImg)+1>
<cfset endTitle=FindNoCase("</a>",tmpLink,startTitle)>
<cfset pageURL=mid(tmpLink,startURL,endURL-startURL+4)>
<cfset pageTitle=mid(tmpLink,startTitle,endTitle-startTitle)>
<cfoutput><a href="#pageURL#">#pageTitle#</a></cfoutput><br /><cfflush>

<!--- Call up the document and parse out just the HTML throwing out the extra code --->
<cfhttp url="#pageURL#" method="get" resolveurl="yes" throwonerror="yes"></cfhttp>
<cfif cfhttp.statusCode is "200 OK">
<cfset pageHTML=cfhttp.FileContent>
<cfset startContent=FindNoCase("<h1",pageHTML)>
<cfset endContent=FindNoCase("<script type=",pageHTML,startContent)>
<cfset pageContent=Mid(pageHTML,startContent,endContent-startContent)>

<!--- Write out the content HTML using the same folder structure --->
<cfset pageLocal=Replace(Replace(Replace(pageURL,'/whgdata/../','/'),':80',''),"http://","c:\")>
<cffile action="write" file="#pageLocal#" output="#pageContent#">
<cfelse>
<cfdump var="#cfhttp#">
<cfabort>
</cfif>
</cfif>

<!--- Update the ptr used for looping --->
<cfif endLink gt 0>
<cfset ptrLink=endLink+1>
<cfelse>
<cfset ptrLink=len(treeHTML)+1>
</cfif>
<cfelse>
<cfset ptrLink=len(treeHTML)+1>
</cfif>
</cfloop>

<!--- Parse out just HTML in the Navigation Tree file throwing out the extra code --->
<cfset startHTML=FindNoCase("<body",treeHTML)>
<cfset endHTML=FindNoCase("</body>",treeHTML,startHTML)>
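<!--- Mid() takes a character count, not an end position; +7 keeps the closing </body> tag --->
<cfset fileHTML=mid(treeHTML,startHTML,endHTML-startHTML+7)>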

<!--- Write out the Navigation Tree file --->
<cffile action="write" file="#fileLocal#" output="#fileHTML#">
<cfelse>
<cfdump var="#cfhttp#">
<cfabort>
</cfif>
</p>
</cfloop>

<h1>Done!</h1>

Monday, March 16, 2009

jQuery DatePicker w/Multiple Date Fields

I needed a replacement Datepicker tool for a legacy application, so I headed over to jQuery.com to see what all the excitement was about. I ran through a couple of tutorials, starting with How jQuery Works by John Resig, where I learned about the ready event, basic syntax, and chainability. Next, I jumped over to the jQuery UI Download section to grab a copy of the Datepicker widget.

I needed several datepickers on the same page, so I used .filter() to apply the widget to all input fields with a class of datepicker. I also wanted users to be able to jump quickly to other months and even years, so I set changeMonth and changeYear to true. Finally, the users of my legacy app expect a calendar icon to click, so I duplicated that behavior with showOn, buttonImage, and buttonImageOnly.
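Because every field shares one set of options, it can help to keep them in a single object and layer any differences on top. A minimal sketch, assuming the same option values as the demonstration below; baseOptions and datepickerOptions are hypothetical helper names, not part of jQuery UI, and 'input.datepicker' is simply a shorter selector for the same fields that $('input').filter('.datepicker') matches.

```javascript
// Sketch only: baseOptions and datepickerOptions are hypothetical names.

// Options shared by every date field on the page (same values as the demo).
var baseOptions = {
  changeMonth: true,
  changeYear: true,
  showOn: 'button',
  buttonImage: 'jquery/images/calendar.gif',
  buttonImageOnly: true
};

// Copy the shared options and apply any overrides on top,
// leaving baseOptions itself untouched.
function datepickerOptions(overrides) {
  var merged = {};
  var key;
  for (key in baseOptions) {
    if (Object.prototype.hasOwnProperty.call(baseOptions, key)) {
      merged[key] = baseOptions[key];
    }
  }
  overrides = overrides || {};
  for (key in overrides) {
    if (Object.prototype.hasOwnProperty.call(overrides, key)) {
      merged[key] = overrides[key];
    }
  }
  return merged;
}

// Only wire things up when jQuery and the Datepicker plugin are loaded.
if (typeof jQuery !== 'undefined' && jQuery.fn && jQuery.fn.datepicker) {
  jQuery(function ($) {
    // Same elements as $('input').filter('.datepicker').
    $('input.datepicker').datepicker(datepickerOptions());
  });
}
```

Calling datepickerOptions({ showOn: 'focus' }) returns a copy with that one setting changed, leaving baseOptions as it was.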

Demonstration

<html>
<head>
<!-- jQuery JS Includes -->
<script type="text/javascript" src="jquery/jquery-1.3.2.js"></script>
<script type="text/javascript" src="jquery/ui/ui.core.js"></script>
<script type="text/javascript" src="jquery/ui/ui.datepicker.js"></script>

<!-- jQuery CSS Includes -->
<link type="text/css" href="jquery/themes/base/ui.core.css" rel="stylesheet" />
<link type="text/css" href="jquery/themes/base/ui.datepicker.css" rel="stylesheet" />
<link type="text/css" href="jquery/themes/base/ui.theme.css" rel="stylesheet" />

<!-- Setup Datepicker -->
<script type="text/javascript"><!--
$(function() {
$('input').filter('.datepicker').datepicker({
changeMonth: true,
changeYear: true,
showOn: 'button',
buttonImage: 'jquery/images/calendar.gif',
buttonImageOnly: true
});
});
// --></script>
</head>
<body>

<!-- Each input field needs a unique id, but all of them need the datepicker class -->
<p>Date 1: <input id="one" class="datepicker" type="text" readonly="true"></p>
<p>Date 2: <input id="two" class="datepicker" type="text" readonly="true"></p>
<p>Date 3: <input id="three" class="datepicker" type="text" readonly="true"></p>

</body>
</html>