TCLUG Development Archive

Re: [TCLUG-DEVEL:138] Java HTML parser



Bob Tanner wrote:

> To my dismay, Sun no longer offers the HotJava HTML Components.
>
> Does anyone know where I can get some good Java components for parsing HTML?
> Rendering is not needed, just parsing for right now.
> --
> Bob Tanner <tanner@real-time.com>       | Phone : (612)943-8700
> http://www.real-time.com                | Fax   : (612)943-8500
> Key fingerprint =  6C E9 51 4F D5 3E 4C 66 62 A9 10 E5 35 85 39 D9

Is this HTML that you wrote yourself?  I ask because if the HTML conforms to
the XHTML spec (which doesn't take a whole lot of work), it is well-formed
XML, and you should be able to parse it with ordinary XML tools.  Depending
on your needs, you could use DOM or SAX.  If you are grabbing HTML off the
Net, the only tool I know of for massaging it into acceptable XHTML is Tidy.
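
As a rough sketch of the SAX route, something like the following should
work, assuming the input is already well-formed XHTML with no DOCTYPE (a
DOCTYPE may make the parser try to fetch the DTD) and no HTML-only entities
such as &nbsp;.  The class name XhtmlLinks and the method extractLinks are
made up for illustration; only the javax.xml.parsers and org.xml.sax calls
come from the standard Java XML API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects link targets from well-formed XHTML.
public class XhtmlLinks {

    // Returns the href value of every <a> element, in document order.
    // Assumes the input is well-formed XML; malformed HTML will throw.
    public static List<String> extractLinks(String xhtml) throws Exception {
        final List<String> links = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();

        parser.parse(new InputSource(new StringReader(xhtml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // The default factory is not namespace-aware, so the
                // element name arrives in qName.
                if ("a".equalsIgnoreCase(qName)) {
                    String href = attrs.getValue("href");
                    if (href != null) {
                        links.add(href);
                    }
                }
            }
        });
        return links;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><p>"
                + "<a href=\"http://www.real-time.com\">Real Time</a>"
                + "</p></body></html>";
        System.out.println(extractLinks(page));
    }
}
```

SAX suits this kind of one-pass extraction because it never builds the
whole tree in memory; if you need to walk or modify the document
afterwards, DOM is probably the better fit.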

--
Perry Hoekstra

---
"I don't see much sense in that," said Rabbit.
"No," said Pooh humbly, "there isn't. But there was going to be when I
began it. It's just that something happened to it along the way."