Dom2Dom

UPDATE (25 July 2013) :
I closed the project because it was consuming a lot of resources without producing any particularly useful result. It was merely a test of Cassandra. BTW, Cassandra had no problem handling the load generated by this project. It went very smoothly.

UPDATE (07 Nov 2011) :
I reworked the same project with a totally different architecture. I used Java/Servlet/Glassfish + Cassandra. It’s just a test project to see how I could apply this kind of NoSQL DB to other projects, so it’s really simple.

UPDATE (02 Oct 2010) :
Service is available here. Here is an example with Apple.

This is a small experimental project of mine. The goal is to be able to tell which domains are hosted on the same hosts as another domain. Some services already offer this, but they do a very poor job. This service will do only that.

The program simply follows links from page to page to find new domain names. It stays as briefly as possible on each domain, and it doesn’t store any information other than the domain name.

The only address I put in the program is the address of this blog. All the others were discovered.
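To illustrate the link-following idea, here is a minimal sketch in Python of how hostnames could be pulled out of a fetched page while keeping nothing else. It is not the code the project runs: the names `LinkHostParser` and `extract_hosts` are made up for this example, and it only looks at `<a href>` links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkHostParser(HTMLParser):
    """Collects the hostname of every <a href> link found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or not value:
                continue
            try:
                # Resolve relative links against the URL of the page itself.
                parsed = urlparse(urljoin(self.base_url, value))
                hostname = parsed.hostname
            except ValueError:
                continue  # malformed link, skip it
            # Keep web links only (no mailto:, javascript:, ...).
            if parsed.scheme in ("http", "https") and hostname:
                self.hosts.add(hostname)


def extract_hosts(base_url, html):
    """Return the set of hostnames linked from one page; nothing else is kept."""
    parser = LinkHostParser(base_url)
    parser.feed(html)
    return parser.hosts
```

Starting from a single seed page, each newly discovered hostname would then be queued as a page to visit, which is how one starting address can fan out to the rest of the web.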

Please tell me if you would also be interested in this service.

06/07/10 :
In less than 1 week, the program has already collected a little more than 400 000 hosts (and, I think, a little more than 350 000 domains), and there really are a lot of porn sites.
I think I will start the DNS requesting part in two weeks and the little web interface two weeks later. I could do it sooner, but the results would be very poor anyway.

10/07/10 :
We’ve now reached 1 million indexed hosts.

12/07/10 :
I’ve added the DNS requesting code. It’s working fine (it’s much easier to write and maintain). It has indexed 60 000 hosts. I’ve made a little web interface, but I’m not releasing it right away because the SQL queries aren’t optimized yet (but no worries, the database is).
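For readers wondering what "IP linking" amounts to, here is a rough Python sketch, under the assumption that each host is resolved to its IPv4 addresses and then indexed by address. The real program stores this in an SQL database. The in-memory dictionary and the function names `resolve_ipv4`, `group_by_ip` and `neighbours` below are only illustrative.

```python
import socket
from collections import defaultdict


def resolve_ipv4(host):
    """Return the set of IPv4 addresses for host, or an empty set on failure."""
    try:
        _, _, addresses = socket.gethostbyname_ex(host)
    except (OSError, UnicodeError):
        return set()  # unknown host, DNS error, or invalid name
    return set(addresses)


def group_by_ip(hosts, resolver=resolve_ipv4):
    """Build the IP -> hosts index used to answer 'who shares this server?'."""
    index = defaultdict(set)
    for host in hosts:
        for ip in resolver(host):
            index[ip].add(host)
    return index


def neighbours(host, index, resolver=resolve_ipv4):
    """Hosts sharing at least one IP address with host (host itself excluded)."""
    found = set()
    for ip in resolver(host):
        found |= index.get(ip, set())
    found.discard(host)
    return found
```

The `resolver` parameter is there so the lookup can be swapped for a cached or fake one. With millions of hosts, real DNS queries would need to be spread out over time.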

15/07/10 :
1.650 M hosts indexed
420 k have been IP linked
136 k different IP addresses have been found

21/07/10 :
2.5 M hosts indexed
1.0 M hosts IP linked
340 k IP addresses

02/08/10 :
I fixed the program. It now identifies itself as “Dom2Dom/0.1.3865.36392_2010-08-01_20:13:04”. It can no longer make too many requests to the same server (I limited it explicitly) and it should crawl the web more efficiently.
Statistics are :
3.7 M hosts indexed
(almost) all hosts IP linked
900 k IP addresses
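The per-server request limit mentioned in this entry could look something like the following Python sketch. It is only an assumption about the general approach (a minimum delay between two requests to the same host). The class name `HostThrottle` and the 5-second default are invented for the example, not the values the crawler uses.

```python
import time


class HostThrottle:
    """Enforces a minimum delay between two requests to the same host."""

    def __init__(self, min_interval=5.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds, hypothetical default
        self.clock = clock
        self.next_allowed = {}  # host -> earliest time of the next request

    def wait_time(self, host):
        """Seconds to wait before host may be contacted again (0.0 if none)."""
        allowed_at = self.next_allowed.get(host, float("-inf"))
        return max(0.0, allowed_at - self.clock())

    def try_acquire(self, host):
        """Return True and book the slot if host may be contacted now."""
        now = self.clock()
        if now < self.next_allowed.get(host, float("-inf")):
            return False
        self.next_allowed[host] = now + self.min_interval
        return True
```

A crawler loop would call `try_acquire(host)` before each fetch and put the URL back in its queue when the answer is `False`, so a slow host never blocks work on the others.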

04/08/10 :
4.1 M hosts indexed
all hosts IP linked (and it’s starting to relink old hosts to their potentially changed IP addresses)
957 k IP addresses

20/08/10 :
5.8 M hosts indexed
all hosts IP linked
1.2 M IP addresses

28/08/10 :
I still can’t find time to do the web interface, but the program keeps running and the great comments keep coming.
I took ten minutes to make a little stats page, if you’re interested:
http://dom2dom.webingenia.com/stats

76 thoughts on “Dom2Dom”

  1. Found your referrer in my log files and I just was curious about what’s behind it :-) Don’t think it’s very useful to me.

    Don’t see either why a French hoster wants to work in the Netherlands

  2. Dear Webmaster, with a name like Florent and the fact that you like crawling around people’s backends, I assume you are some sort of pervert. Would you mind crawling back under your rock and stop snooping around our servers.

  3. looks to me like a tool hackers could use to find (and subsequently exploit) other sites hosted on the same server as the target site they are looking to hack… :(

  4. Man… some people get all grumpy about crawlers.. It makes me wonder how Google ever made it big-time.

    Does your crawler acknowledge robots.txt? It should.. just my advice if it doesn’t.

  5. Hi,

    sounds great..if your project pops up one day and if it still remains free as you wrote it.

    As mentioned before, you got the attention of “many webmasters”… could be useful for other projects ;-)

    Be careful about what you are doing and how you will display the information. Depending on the country, you may have some legal issues…(not all webhosting companies may like your project).

    Good luck.
