TCLUG Development Archive

Re: [TCLUG-DEVEL:138] Java HTML parser



Quoting Perry Hoekstra (dutchman@uswest.net):
> Is this HTML you wrote?  The reason I ask is that if the HTML conforms to
> the XHTML spec (which doesn't take a whole lot of work), you should be able
> to use XML tools to parse the code.  Depending on your needs, you could use
> DOM or SAX.  If you are grabbing HTML off the Net, I don't know of any
> tools other than Tidy to massage the code into an acceptable XHTML format.

Grabbing it off the net. I was going to use DOM otherwise, but the stuff is
not XHTML-compliant.
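
For reference, here is a rough sketch of the DOM route discussed above, and of
why it falls over on pages pulled from the net. It assumes the standard JAXP
classes (javax.xml.parsers, org.w3c.dom) are available. The class name
XhtmlDomSketch, the method names, and the sample markup are made up for
illustration. The Tidy clean-up step is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical illustration class, not part of any library.
public class XhtmlDomSketch {

    // Parse markup with a stock XML DOM parser and collect the href
    // attribute of every <a> element. Throws SAXException if the
    // markup is not well-formed XML.
    public static List<String> extractLinks(String markup) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(markup)));

        NodeList anchors = doc.getElementsByTagName("a");
        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i);
            hrefs.add(anchor.getAttribute("href"));
        }
        return hrefs;
    }

    // True if the XML parser accepts the markup, false if it rejects it.
    public static boolean isWellFormed(String markup) {
        try {
            extractLinks(markup);
            return true;
        } catch (SAXException e) {
            return false; // parser rejected the input
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample: every tag closed, attribute values quoted.
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<p><a href=\"http://www.real-time.com\">Real Time</a><br/></p>"
                + "</body></html>";

        // Assumed sample of typical "off the net" HTML: unclosed <p> and
        // <br>, unquoted attribute value.
        String tagSoup = "<html><body><p>unclosed<br><a href=foo>link</body></html>";

        System.out.println("links: " + extractLinks(xhtml));
        System.out.println("tag soup well-formed? " + isWellFormed(tagSoup));
    }
}
```

The parser's default error handler may also print a message to stderr when it
rejects the second sample. The same well-formedness requirement applies to SAX,
which is why non-compliant pages need cleaning up (e.g. with Tidy) before either
API is usable.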

-- 
Bob Tanner <tanner@real-time.com>       | Phone : (612)943-8700
http://www.real-time.com                | Fax   : (612)943-8500
Key fingerprint =  6C E9 51 4F D5 3E 4C 66 62 A9 10 E5 35 85 39 D9