View Full Version : [GRUB] Yet another project for Team Ninja


Monkeymia
1st April 2003, 09:52 PM
Look here (http://grub.org/)

http://grub.org

Grub uses the power of distributed computing to build the best search on the Web. It automatically crawls the Web in the background, borrowing your computer's spare clock cycles, so you won't even notice it's there. The download is quick, you control how much you crawl, and the cool screensaver shows you the real-time progress your computer is making. You can even compare your stats to other Grubsters in the project!

Help perfect the search engine. Join the Grub project today!

Shookums
1st April 2003, 09:57 PM
I thought we already had this . . . . called Code-Red Worm or something like that. <huge grin>

Monkeymia
1st April 2003, 10:03 PM
I thought it was a bit of a joke at first (1st April and all) but it appears deadly serious. Can't seem to find an option for teams tho' - anyway might be worth a laugh :D just to see how it works.

[Edit]
Team options will be available shortly according to their forums

Shookums
1st April 2003, 10:14 PM
I couldn't resist. Might be interesting. I wonder what engine it feeds.

Monkeymia
1st April 2003, 10:27 PM
LookSmart

http://www.grub.org/html/news.php?op=show&id=38

Fireblade
2nd April 2003, 12:56 PM
You're just dyin' t' add another project t' yer sig is all MM :p

Meadmaker
2nd April 2003, 01:48 PM
Ok - downloaded and installed the client. Configured it for one of the Fleet.

It just keeps responding "cannot connect to server - will retry in xxx seconds".

Shut it down - I might try it again later on. :(

Monkeymia
2nd April 2003, 04:36 PM
This little app is great - the maximum I have seen it use is 6% of my resources, so it works fine alongside other DC projects, e.g. G@H.

Working fine for me Meady - did you have to set up a proxy / firewall to get it to run?

Here is a screen shot of the app in progress - yes, I am on dial-up (only until the end of the week, and then Broadband :drool:)

Shookums
2nd April 2003, 04:50 PM
I'm cautious about this one.

If this has your machine acting as a web crawler, I could see the potential of some sites locking out your home IP address.

To see an example of this, run Teleport Pro in non-anonymous mode. If you do it enough, you'll get blocked by most sites.

Also does this conform to the Robot Exclusion Standard?

Sorry, just thinking out loud here and playing devil's advocate.
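For anyone wondering what honoring the Robot Exclusion Standard actually looks like in code, here's a minimal sketch using Python's standard-library robots.txt parser (the robots.txt content and the "GrubCrawler" agent name are made up for illustration - this isn't Grub's actual code):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that disallows one directory for all agents
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A well-behaved crawler checks every URL against the rules before fetching it
print(rp.can_fetch("GrubCrawler", "http://example.com/index.html"))  # True
print(rp.can_fetch("GrubCrawler", "http://example.com/private/x"))   # False
```

A crawler that skips this check is exactly the kind that gets home IP addresses blocked.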

Monkeymia
3rd April 2003, 07:26 AM
I've stopped this project as I cannot send any results back, and Shookums' thoughts concern me somewhat.

Also, looking through the logs of the URLs that I have crawled, they seem to be dominated by porn - a sad fact of the web, I know, but not an aspect that I would like to promote :mad:

Shookums
3rd April 2003, 12:47 PM
I was afraid of that. It just sounded a little "iffy" to me. Thanks for checking into it further.

Ozzer
5th April 2003, 11:06 AM
Originally posted by Shookums
I'm cautious about this one.

If this has your machine acting as a web crawler, I could see the potential of some sites locking out your home IP address.

To see an example of this, run Teleport Pro in non-anonymous mode. If you do it enough, you'll get blocked by most sites.

Also does this conform to the Robot Exclusion Standard?

Sorry, just thinking out loud here and playing devil's advocate.

I am cautious about this one only because my broadband account has a 3 GB limit (such is life in Australia with Telstra); beyond that I pay per megabyte. I can't really afford to have my computer surfing the net in its idle time, chewing up my bandwidth.

Ramokk
12th October 2003, 06:57 PM
Grub has teams now. The client runs a little better, and doesn't time-out from their servers as much. It does honor robots.txt files, but it can use a lot of bandwidth.

You can set a cap so it doesn't use a ton of bandwidth, though; I have mine set to run only two crawling processes and use a max of 2 KB/sec of bandwidth.

I ran it at home this summer at max speed on my cable modem, and it was flying through the URL blocks (but probably using a ton of bandwidth). I just started it up again today and turned its bandwidth level way down so I don't get in trouble with my college for bandwidth usage.
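A bandwidth cap like the one described above is typically implemented with something like a token bucket - here's a rough Python sketch of the idea (my own illustration, not how the Grub client actually does it):

```python
import time

class TokenBucket:
    """Rate limiter: allows 'rate' bytes/sec on average, bursts up to 'capacity'."""

    def __init__(self, rate, capacity):
        self.rate = rate            # refill rate, bytes per second
        self.capacity = capacity    # maximum burst size, bytes
        self.tokens = capacity
        self.last = time.monotonic()

    def consume(self, nbytes):
        """Block until 'nbytes' worth of tokens are available, then spend them."""
        while True:
            now = time.monotonic()
            # Refill tokens for the time elapsed since the last call
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            # Not enough tokens yet: sleep roughly long enough for the refill
            time.sleep((nbytes - self.tokens) / self.rate)

# A cap of roughly 2 KB/sec, like the setting mentioned in the post
bucket = TokenBucket(rate=2048, capacity=4096)
```

The crawler would call `bucket.consume(len(chunk))` before sending or receiving each chunk, which smooths its traffic down to the configured rate.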

I created a team for us if anyone wants to join:
http://www.grub.org/html/teams.php?op=stats&team_id=172

I'm not sure if it will carry over the blocks I crawled before I created the team or not... If it did, we should move up in the rankings. Not sure how often the stats database updates; I think it's every hour or two.

Monkeymia
12th October 2003, 08:50 PM
Thanks for setting up the team Ramokk :)

Installing the client on WinXP Pro was a pain - you have to install version 1.4.3 then let it update to 1.5.3. Installing the later client first results in all sorts of errors.

Once registered and running, click on the 'Profile' link and select 'Team Ninja' from the drop down and update.

I'll run it for awhile to see how it goes.

Thanks again :)

jema
12th October 2003, 09:07 PM
Originally posted by Monkeymia
I've stopped this project as I cannot send any results back, and Shookums' thoughts concern me somewhat.

Also, looking through the logs of the URLs that I have crawled, they seem to be dominated by porn - a sad fact of the web, I know, but not an aspect that I would like to promote :mad:

I wonder if that is absolutely inevitable with a crawler - not simply because there is so much porn, but because porn sites purposely contain links designed to send spiders round in circles, all in the aim of high search engine rankings.

I don't think I would stop the project simply on this basis, as I would hope that the actual engine could have filters to keep the crap out. I must say, though, that I would be concerned about repercussions - not just through being blocked, but the potential for the police to come knocking on the door asking why you have been browsing 10,000 child porn sites or whatever :( I am not convinced that "It wasn't me, it was my DC project" would wash.

jema

Ramokk
12th October 2003, 09:28 PM
I don't think the crawler is actually loading the page content - it just checks to see if the page has been updated (there's an HTTP method for that: HEAD, or a conditional GET). It certainly doesn't download any images, so it's not grabbing child porn.
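That kind of check can be done with an HTTP HEAD request, which returns the same headers as a GET (including Last-Modified) but no body. A rough sketch of building such a request by hand (the "grub-client" user-agent string is made up; the real client presumably uses a proper HTTP library):

```python
from email.utils import formatdate

def build_head_request(path, host, last_checked=None):
    """Build a raw HEAD request asking only whether a page has changed.

    If 'last_checked' (a Unix timestamp) is given, an If-Modified-Since
    header is added so a conforming server can answer 304 Not Modified.
    """
    lines = [
        f"HEAD {path} HTTP/1.1",
        f"Host: {host}",
        "User-Agent: grub-client",  # hypothetical agent string
    ]
    if last_checked is not None:
        lines.append("If-Modified-Since: " + formatdate(last_checked, usegmt=True))
    lines.append("Connection: close")
    return "\r\n".join(lines) + "\r\n\r\n"

print(build_head_request("/index.html", "example.com", last_checked=0))
```

Either way, no page body (and certainly no images) ever comes down the wire.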

Still, it would be nice if they could filter that stuff out. The project is still in beta; it will get better with time...