virus: Anyone else seen this behavior on Google

From: Walter Watts (wlwatts@cox.net)
Date: Fri Mar 19 2004 - 19:37:16 MST


    Google is using some new procedures in its caching. (Picking their
    cached page instead of the real one was always my default, safe
    choice.) Now you can't always trust it: clicking on the cached page
    might take you to the "real" site, and all the nasty behavior that can
    entail. I'm sure they're doing this to try to trim the cost of indexing
    what is, by their own figures, around 400 terabytes of data at the
    lowest estimate and as much as 700 terabytes (depending on exactly
    what they can "deep crawl")... see below.

    Damn them!

    Anyone else seen this behavior on Google?

    Walter
    ---------------------------------------------------------------------------

    ...
    So when, in the middle of this last ???, Google claimed to be "caching"
    my pages and yet the "cached" versions were able to show my images
    being loaded, I realised they had made an important change in the way
    they handle data.
    They are currently indexing, by their own figures, around 400
    terabytes of data at the lowest estimate and as much as 700 terabytes
    (depending on exactly what they can "deep crawl").
    Processing this data with whatever routines they run (for example via
    msql or whatever) is not too complex, even if they are running hugely
    discriminatory algorithms. However, it is very, very costly in
    processing power, and up to now the data was indexed, stored and ranked
    "off net", with the dance reflecting the reintroduction of the treated
    data into the "publicly accessible index", i.e. what we call "Google".
    (This, I know, horribly simplifies what happens, but otherwise it
    would get too esoteric for this forum.)
    From a practical point of view it would be simpler to at least store
    the data to be treated in situ (where it already is, on your website
    server), i.e. why make a copy on Google's hard drives when it can (by
    spidering much more intensively and more frequently, and by using more
    spiders, each with its own functions) simply treat all spidered sites
    as "in RAM"? (Again, I'm simplifying horribly.)
    This would require less outlay by Google and would actually result in
    very much faster updates, as it is effectively now "ranking" on "the
    fly".
    Sites with purely HTML would not notice that their "cached" page was
    now "hotlinked" into their server, and neither would standard Java
    etc. Side routines (php, msql, etc.) wouldn't be affected either, as
    it's not "writing to disc" when it comes by.
    However, where this gets really interesting is that up until now you
    couldn't build pages in "Flash" etc., because the "bot" couldn't see
    them and would just skate blindly over the top of them and probably
    not index the page at all.

    If it's doing what I think it is, it may not now care whether you
    coded in "Flash", as long as there is the basic minimum of HTML to get
    you a position.

    I don't have a page currently running .swf. If anyone does: when you
    click on your "cached" page in Google, do you see your movie? If so,
    it must be "hotlinked" to your page in real time and using the movie
    player "you" have installed on your machine to show the movie. OK.
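    (Another way to check, without clicking around: fetch the cached copy
    and see which images or .swf files in it still point back at your own
    server. The Python sketch below is only a rough illustration; the
    cache URL format and the example page address are assumptions.)

    # Sketch: fetch Google's cached copy of a page and list which <img> and
    # <embed>/<object> resources still point back at the original site, i.e.
    # would be served from your own server when someone views the cache.
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
    from urllib.request import Request, urlopen

    SITE = "http://www.example.com/index.html"  # assumption: your own page
    CACHE = "http://www.google.com/search?q=cache:" + SITE.split("://", 1)[1]

    class ResourceFinder(HTMLParser):
        def __init__(self):
            super().__init__()
            self.resources = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "img" and attrs.get("src"):
                self.resources.append(attrs["src"])
            elif tag in ("embed", "object"):
                url = attrs.get("src") or attrs.get("data")
                if url:
                    self.resources.append(url)

    request = Request(CACHE, headers={"User-Agent": "Mozilla/5.0"})
    page = urlopen(request).read().decode("utf-8", errors="replace")

    finder = ResourceFinder()
    finder.feed(page)

    my_host = urlparse(SITE).netloc
    for res in finder.resources:
        absolute = urljoin(SITE, res)
        if urlparse(absolute).netloc == my_host:
            print("served from your server:", absolute)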

    If this is the case, people searching will shift relatively quickly to
    the pages which are "interactive", and eventually Google and the other
    engines will notice the diversion in traffic and rerank accordingly.
    Maybe those of us with "picture" or "multimedia" sites will see the
    difference?

    from Google's group: google.public.support.general


