Alternate web fetching tools or libraries
1 point by thaddeus 5406 days ago | 4 comments
The Arc HTTP libraries on Anarki work well enough for me most of the time, but there's an HTTPS page causin' me a headache. The page contains a PDF download that I parse out and re-use.

I tried using curl to navigate the page, but curl doesn't handle it. I was looking into screen-scraper, but that seems like overkill.

Anyone know of good libraries that can navigate tricky pages?

[edit] For example's sake: https://kowari.ogc.gov.bc.ca/reports/rwservlet?ogcr220w



1 point by aw 5405 days ago | link

By "navigate", do you mean looking at an HTML page and finding links in it?

-----

1 point by thaddeus 5405 days ago | link

Yup + manage https well (not a login, just an https page).

-----

2 points by aw 5405 days ago | link

By "manage", do you mean to download a page? Has curl been giving you trouble downloading https files?

For downloading http or https files or pages, both curl and wget work well, and I'm surprised to hear that curl would be giving you trouble. What exactly is the problem that you're seeing?

For looking at a downloaded HTML page and finding links that match a particular pattern, I often find that regexes work well.
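
As a rough sketch of the regex approach, assuming the page is already downloaded into a string: the Python snippet below pulls `href` values out of anchor tags and keeps the ones matching a pattern. The sample HTML, the function name `find_links`, and the `.pdf` pattern are all made up for illustration. A regex like this is fine for simple, predictable pages, but it is not a general HTML parser.

```python
import re

# Matches href="..." or href='...' inside an <a ...> tag (case-insensitive).
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

def find_links(html, pattern):
    """Return href values found in html whose URL matches the regex pattern."""
    wanted = re.compile(pattern, re.IGNORECASE)
    return [href for href in HREF_RE.findall(html) if wanted.search(href)]

# Hypothetical sample page, not the real site.
sample = '''
<a href="/reports/jan.pdf">January</a>
<a class="nav" href="/about.html">About</a>
<A HREF='/reports/feb.PDF'>February</A>
'''

pdf_links = find_links(sample, r'\.pdf$')
print(pdf_links)
```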

-----

1 point by thaddeus 5405 days ago | link

I've tried a dozen options in curl, but I couldn't get it working:

curl: (35) error:140773F2:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert unexpected message

However, wget worked for me!!! I should be able to write code to follow the links from there. Thanks.
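
One way the "download with wget, then follow the links" plan might look, as a sketch only: it shells out to wget (assumed to be installed and on the PATH) and resolves relative links against the page URL. The helper names `fetch_with_wget` and `absolute_links`, the simple href regex, and the example.com URLs are assumptions for illustration, not anything from the Arc libraries.

```python
import re
import subprocess
from urllib.parse import urljoin

# Simple href matcher; good enough for predictable pages, not a full HTML parser.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

def fetch_with_wget(url):
    """Download url with wget and return the body as text.

    -q silences progress output; -O - writes the document to stdout.
    """
    result = subprocess.run(
        ["wget", "-q", "-O", "-", url],
        capture_output=True,
        check=True,  # raise CalledProcessError if wget exits non-zero
    )
    return result.stdout.decode("utf-8", errors="replace")

def absolute_links(base_url, html):
    """Resolve every href found in html against base_url."""
    return [urljoin(base_url, href) for href in HREF_RE.findall(html)]

# Hypothetical page content, used here so the demo needs no network access.
demo_html = '<a href="jan.pdf">Jan</a> <a href="/files/feb.pdf">Feb</a>'
demo_links = absolute_links("https://example.com/reports/index.html", demo_html)
print(demo_links)
```

For a live page, `absolute_links(url, fetch_with_wget(url))` would tie the two helpers together, and each resulting link could then be passed back to `fetch_with_wget`.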

-----