dotnetnoob
New member
- Joined
- Jan 26, 2003
- Messages
- 3
This is for learning purposes and does not have to work perfectly... What I would like to do is create a simplified web spider, but I'm running into a few problems/questions.
-I can't seem to read an HTML page into a string. My program does nothing, then an error message appears saying the connection was closed and it could not connect to the remote server. Is this my PC, or am I forgetting something? Does anyone have working code for this?
-If I ever manage to successfully read in an HTML file, what would be the best way to collect all the <a> tags from it? I want to use each of these links as the next URL to visit...
-If I collect all of the links from a page and then do the same for those pages, chances are my program ends up in a never-ending loop. For example, four pages that all share the same links (a menu) would make my program keep visiting the same pages forever. What would be a good and fast way to check which URLs I have already visited?
It's for a school assignment, so it doesn't have to work really well... just well enough.
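To make the three questions concrete, here is roughly the structure I'm imagining, sketched in Python (the function names, the page limit, and the example URL handling are all my own invention, not working code I already have). The visited check uses a set, since that gives fast membership tests; a common cause of "connection was closed" errors is the server rejecting requests, so the fetch is wrapped in a try/except rather than assumed to succeed:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Records the href attribute of every <a> tag seen while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html, base_url):
    # Parse the page and resolve relative hrefs against the page's own URL.
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]

def crawl(start_url, max_pages=10):
    visited = set()            # fast membership test prevents the endless loop
    to_visit = [start_url]
    while to_visit and len(visited) < max_pages:
        url = to_visit.pop()
        if url in visited:     # already seen this page: skip it
            continue
        visited.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except OSError:
            continue           # dead link or refused connection: move on
        to_visit.extend(extract_links(html, url))
    return visited
```

The idea would be that `extract_links` answers the `<a>`-tag question, and the `visited` set answers the loop question, but I have no idea if this is the normal way to do it.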