Fwd: Critical issue on forum.openoffice.org and Google Search


Fwd: Critical issue on forum.openoffice.org and Google Search

Peter Kovacs-3
Hi all,

I have received the following mail, probably because I am listed on the
Google Analytics page.

Does this carry any action items? What can we answer Mr. John Mueller?


All the Best

Peter



-------- Forwarded Message --------
Subject: Critical issue on forum.openoffice.org and Google Search
Date: Mon, 11 May 2020 13:37:27 +0200
From: John Mueller <[hidden email]>
To: [hidden email], [hidden email], [hidden email]



Dear webmaster of forum.openoffice.org <http://forum.openoffice.org>

I'm an analyst at Google in Switzerland. We wanted to bring your
attention to a critical issue with your website, and how it's available
for Google's web search.

In particular, Googlebot has been unable to crawl URLs from
https://forum.openoffice.org/ . This will cause those pages to drop out
of Google's search results, and will prevent new pages from being picked
up for Search. If you're not aware of this issue, you may be
accidentally blocking these pages from Google Search due to a server
issue. If you need to block Googlebot from crawling pages on your
website, we'd recommend using the robots.txt file instead.

Should you need to recognize IP addresses of Googlebot requests, you can
use a reverse IP lookup to do so:
https://support.google.com/webmasters/answer/80553

Should you have any questions, feel free to contact me directly. For
verification purposes, we are sending a copy of this message to your
site's Search Console account.

Thank you,
John Mueller ([hidden email])
Webmaster Trends Analyst




--

John Mueller, He/Him, Search Relations Team - go/search-rel
<https://goto.google.com/search-rel>
WTA is now Search-Rel (info
<https://sites.google.com/corp/google.com/search-rel/Home/reorg-2020-01>)

*Time-critical? Resend with "URGENT" in the subject.*

Google Switzerland GmbH
Gustav-Gull-Platz 1, 3. Stock
8004 Zurich, Switzerland

Identifikationsnummer:
CH-020.4.028.116-1
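
The reverse lookup mentioned in the forwarded mail can be sketched roughly as follows. This is only a minimal Python sketch under assumptions: the function name is_googlebot and its injectable lookup parameters are made up for illustration, and the two domain suffixes are what the linked help page is understood to name for Googlebot, so they should be checked against the current page.

```python
import socket

# Assumed suffixes for Googlebot reverse DNS names; verify against Google's help page.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google host that resolves back to ip."""
    # The lookups are injectable so the logic can be exercised without network access.
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    # Step 1: reverse DNS lookup of the address seen in the access log.
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False

    # Step 2: the host name must belong to one of the expected domains.
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    # Step 3: forward lookup of that host name must give back the original address.
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Called with only an address taken from the web server's access log, the function uses the system resolver. The forward check in step 3 matters because anyone who controls reverse DNS for an address range can make it claim a googlebot.com name.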

Re: Fwd: Critical issue on forum.openoffice.org and Google Search

Kay Schenk-2
Hi Peter...

Since I am a Google Search admin for www.openoffice.org and
openoffice.apache.org, I got this too. Disclaimer: I have not done ANY
work with the Google Search APIs on these sites in quite some time.

I actually was NOT aware forum.openoffice.org was set up to use Google
Search until I saw this.

One of the Google Search admins for forum.openoffice.org could check the
Google Search APIs currently in use on that site. Changes are
occasionally made to the calls, and maybe that is the issue, or a
robots.txt for that site is causing this. I don't think it requires a
response, but maybe some investigation.
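
To test the robots.txt theory, something like the following sketch could be used. It relies on Python's standard urllib.robotparser module. The helper name googlebot_allowed and both robots.txt bodies are hypothetical examples, not the actual file served by forum.openoffice.org.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt that blocks Googlebot from the whole site.
BLOCKING = """\
User-agent: Googlebot
Disallow: /
"""

# Hypothetical robots.txt that only hides an admin area from all crawlers.
PERMISSIVE = """\
User-agent: *
Disallow: /adm/
"""

print(googlebot_allowed(BLOCKING, "https://forum.openoffice.org/"))
print(googlebot_allowed(PERMISSIVE, "https://forum.openoffice.org/"))
```

Pasting the real robots.txt of the site into the first argument would show whether a rule there keeps Googlebot out of the forum's front page.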

Just some ideas...

Regards,

Kay



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Fwd: Critical issue on forum.openoffice.org and Google Search

Matthias Seidel
Hi Kay,

On 11.05.20 at 21:23, Kay Schenk wrote:
> Hi Peter...
>
> Since I am a Google Search admin for www.openoffice.org, and
> openoffice.apache.org, I got this also. Disclaimer: I have not done
> ANY work with the Google Search apis on these sites in quite some time.
>
> I actually was NOT aware forum.openoffice.org was set up to use Google
> Search until I saw this.

I think I added it to the list when we had a discussion about outdated
information regarding SourceForge found by Google Search.

But I don't have access to forum.openoffice.org, so I could never
complete the step.

Regards,

   Matthias


Re: Fwd: Critical issue on forum.openoffice.org and Google Search

Hagar Delest-2
Hi,

Maybe Andrea can help.
Forum admins don't have the karma for that.

Hagar


Re: Fwd: Critical issue on forum.openoffice.org and Google Search

Kay Schenk-2
In reply to this post by Matthias Seidel

On 5/11/20 12:33 PM, Matthias Seidel wrote:

> I think, I added it to the list when we had a discussion about outdated
> information regarding SourceForge found by Google Search.
>
> But I don't have access to forum.openoffice.org, so I could never
> complete the step.
>
> Regards,
>
>     Matthias

OK. In the top level of the website source, there is a file called
"skeleton.html" which references the following bit of code --

<!--#include virtual="/scripts/google-analytics.js" -->

I didn't dig far enough to find how "skeleton.html" is used (I forgot),
but this is an example of the google-analytics code snippet that is
used. Basically, it needs to be included in the site you want analytics
for by putting it in the (header) files that generate the site. You
might also take a look at the recent instructions from Google. Things
change.

https://support.google.com/analytics/answer/1008080

Regards,

Kay


Re: Fwd: Critical issue on forum.openoffice.org and Google Search

Matthias Seidel
Hi Kay,

On 12.05.20 at 01:21, Kay Schenk wrote:

> OK. In the top level of the website source, there is a file called
> "skeleton.html" which references the following bit of code --
>
> <!--#include virtual="/scripts/google-analytics.js" -->
>
> I didn't dig far enough to find how "skeleton.html" is used (I
> forgot), but this is an example of the google-analytics code snippet
> that is used. Basically, it needs to be included in the site you
> want analytics for by putting it in the (header) files that
> generate the site. You might also take a look at the recent
> instructions from Google. Things change.
>
> https://support.google.com/analytics/answer/1008080
Yes, but this is for Google Analytics. I wouldn't want to "analyze" the
forum...
The procedure for the Google Search Console is the same: it needs access
to the root directory.

Maybe Andrea can help if he is available again?

Regards,

   Matthias


Re: Fwd: Critical issue on forum.openoffice.org and Google Search

Peter Kovacs-4
Hello all,


What I figured out is that the Google Search tool reports the URL
forum.openoffice.org as not reachable.

So I checked with DuckDuckGo (my preferred search engine). They don't use
a crawler and point at the infrastructure of Google, Bing and Yandex.

I then checked with Bing, but could not figure out how to check a bot's
feedback on a URL, so I moved on.

I checked with Yandex. They have a search URL test page, where I entered
forum.openoffice.org.

The response is:

------------------------------------------------------------------------

  * Date: Tue, 12 May 2020 10:37:47 GMT
  * Server: Apache/2.4.18 (Ubuntu)
  * Location: https://forum.openoffice.org/
  * Content-Length: 237
  * Keep-Alive: timeout=15, max=100
  * Connection: Keep-Alive
  * Content-Type: text/html; charset=iso-8859-1

------------------------------------------------------------------------


HTTP status code: 301 Moved Permanently
Server response time: 133 ms
IP address: 54.84.201.130
Encoding: UTF-8 (unicode-1-1-utf-8, UTF8)
Page size: 237 B


I am not sure what that means. The HTTP status code "Moved Permanently"
reads wrong. I just don't know whether this is the return code from our
web server or a response code from the crawler.
I'll try to get someone from Infra, or I'll open a ticket.
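
One possible reading, offered as an assumption: a 301 whose Location header points at the https:// address usually just means the web server redirects plain HTTP to HTTPS, and the "Server: Apache" header suggests the answer came from the forum's server rather than from the crawler. The test page does not say which scheme it requested, so this would need checking by hand. A minimal Python sketch for that check follows. The function names classify_redirect and fetch_status are hypothetical.

```python
import http.client
from urllib.parse import urlsplit


def classify_redirect(requested_url, status, location):
    """Describe what a response status and Location header mean for requested_url."""
    if status not in (301, 302, 307, 308):
        return "not a redirect"
    if not location:
        return "redirect without Location header"
    src, dst = urlsplit(requested_url), urlsplit(location)
    # Same host, only the scheme changes: the usual "force HTTPS" setup.
    if src.scheme == "http" and dst.scheme == "https" and src.hostname == dst.hostname:
        return "http-to-https upgrade on the same host"
    return "redirect to a different location"


def fetch_status(host, path="/"):
    """Send one plain-HTTP HEAD request and return (status, Location) without following it."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


# Values copied from the Yandex test page output above; the http:// scheme is assumed.
print(classify_redirect("http://forum.openoffice.org/", 301,
                        "https://forum.openoffice.org/"))
```

Running fetch_status("forum.openoffice.org") from a normal machine and comparing the result with what Googlebot reports would show whether the server treats the crawler differently from other clients.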


All the best
Peter


Re: Fwd: Critical issue on forum.openoffice.org and Google Search

Kay Schenk-2
In reply to this post by Matthias Seidel
Oops! My misunderstanding. Sorry for the noise.

Regards,
Kay



Re: Fwd: Critical issue on forum.openoffice.org and Google Search

Matthias Seidel
Hi Kay,

On 12.05.20 at 16:13, Kay Schenk wrote:
> Oops! My misunderstanding. Sorry for the noise.

No noise (and no problem)! Google Analytics and Google Search are really
close together.

BTW: You (and Peter) also have access to the Google Search Console:

https://search.google.com/search-console

Regards,

   Matthias


Re: Critical issue on forum.openoffice.org and Google Search

Dave Fisher-2
In reply to this post by Peter Kovacs-4
Are you sure you weren’t using forums.openoffice.org instead of forum.openoffice.org?

 curl -D headers https://forum.openoffice.org/ does return the correct page.

The robots.txt is this:

curl -D headers https://forum.openoffice.org/robots.txt
User-agent: *
Crawl-delay: 1
Disallow: /en/forum/common.php
Disallow: /en/forum/config.php
Disallow: /en/forum/con.php
Disallow: /en/forum/faq.php
Disallow: /en/forum/mcp.php
Disallow: /en/forum/memberlist.php
Disallow: /en/forum/posting.php
Disallow: /en/forum/report.php
Disallow: /en/forum/search.php
Disallow: /en/forum/style.php
Disallow: /en/forum/ucp.php
Disallow: /en/forum/viewonline.php
Disallow: /en/forum/adm
Disallow: /en/forum/cache
Disallow: /en/forum/docs
Disallow: /en/forum/files
Disallow: /en/forum/images
Disallow: /en/forum/includes
Disallow: /en/forum/language
Disallow: /en/forum/store
Disallow: /en/forum/styles
Disallow: /es/forum/common.php
Disallow: /es/forum/config.php
Disallow: /es/forum/con.php
Disallow: /es/forum/faq.php
Disallow: /es/forum/mcp.php
Disallow: /es/forum/memberlist.php
Disallow: /es/forum/posting.php
Disallow: /es/forum/report.php
Disallow: /es/forum/search.php
Disallow: /es/forum/style.php
Disallow: /es/forum/ucp.php
Disallow: /es/forum/viewonline.php
Disallow: /es/forum/adm
Disallow: /es/forum/cache
Disallow: /es/forum/docs
Disallow: /es/forum/files
Disallow: /es/forum/images
Disallow: /es/forum/includes
Disallow: /es/forum/language
Disallow: /es/forum/store
Disallow: /es/forum/styles
Disallow: /fr/forum/common.php
Disallow: /fr/forum/config.php
Disallow: /fr/forum/con.php
Disallow: /fr/forum/faq.php
Disallow: /fr/forum/mcp.php
Disallow: /fr/forum/memberlist.php
Disallow: /fr/forum/posting.php
Disallow: /fr/forum/report.php
Disallow: /fr/forum/search.php
Disallow: /fr/forum/style.php
Disallow: /fr/forum/ucp.php
Disallow: /fr/forum/viewonline.php
Disallow: /fr/forum/adm
Disallow: /fr/forum/cache
Disallow: /fr/forum/docs
Disallow: /fr/forum/files
Disallow: /fr/forum/images
Disallow: /fr/forum/includes
Disallow: /fr/forum/language
Disallow: /fr/forum/store
Disallow: /fr/forum/styles
Disallow: /fr/ci-joint
Disallow: /hu/forum/common.php
Disallow: /hu/forum/config.php
Disallow: /hu/forum/con.php
Disallow: /hu/forum/faq.php
Disallow: /hu/forum/mcp.php
Disallow: /hu/forum/memberlist.php
Disallow: /hu/forum/posting.php
Disallow: /hu/forum/report.php
Disallow: /hu/forum/search.php
Disallow: /hu/forum/style.php
Disallow: /hu/forum/ucp.php
Disallow: /hu/forum/viewonline.php
Disallow: /hu/forum/adm
Disallow: /hu/forum/cache
Disallow: /hu/forum/docs
Disallow: /hu/forum/files
Disallow: /hu/forum/images
Disallow: /hu/forum/includes
Disallow: /hu/forum/language
Disallow: /hu/forum/store
Disallow: /hu/forum/styles
Disallow: /ja/forum/common.php
Disallow: /ja/forum/config.php
Disallow: /ja/forum/con.php
Disallow: /ja/forum/faq.php
Disallow: /ja/forum/mcp.php
Disallow: /ja/forum/memberlist.php
Disallow: /ja/forum/posting.php
Disallow: /ja/forum/report.php
Disallow: /ja/forum/search.php
Disallow: /ja/forum/style.php
Disallow: /ja/forum/ucp.php
Disallow: /ja/forum/viewonline.php
Disallow: /ja/forum/adm
Disallow: /ja/forum/cache
Disallow: /ja/forum/docs
Disallow: /ja/forum/files
Disallow: /ja/forum/images
Disallow: /ja/forum/includes
Disallow: /ja/forum/language
Disallow: /ja/forum/store
Disallow: /ja/forum/styles
Disallow: /test
Disallow: /nl/forum/common.php
Disallow: /nl/forum/config.php
Disallow: /nl/forum/con.php
Disallow: /nl/forum/faq.php
Disallow: /nl/forum/mcp.php
Disallow: /nl/forum/memberlist.php
Disallow: /nl/forum/posting.php
Disallow: /nl/forum/report.php
Disallow: /nl/forum/search.php
Disallow: /nl/forum/style.php
Disallow: /nl/forum/ucp.php
Disallow: /nl/forum/viewonline.php
Disallow: /nl/forum/adm
Disallow: /nl/forum/cache
Disallow: /nl/forum/docs
Disallow: /nl/forum/files
Disallow: /nl/forum/images
Disallow: /nl/forum/includes
Disallow: /nl/forum/language
Disallow: /nl/forum/store
Disallow: /nl/forum/styles
Disallow: /vi/forum/common.php
Disallow: /vi/forum/config.php
Disallow: /vi/forum/con.php
Disallow: /vi/forum/faq.php
Disallow: /vi/forum/mcp.php
Disallow: /vi/forum/memberlist.php
Disallow: /vi/forum/posting.php
Disallow: /vi/forum/report.php
Disallow: /vi/forum/search.php
Disallow: /vi/forum/style.php
Disallow: /vi/forum/ucp.php
Disallow: /vi/forum/viewonline.php
Disallow: /vi/forum/adm
Disallow: /vi/forum/cache
Disallow: /vi/forum/docs
Disallow: /vi/forum/files
Disallow: /vi/forum/images
Disallow: /vi/forum/includes
Disallow: /vi/forum/language
Disallow: /vi/forum/store
Disallow: /vi/forum/styles
Disallow: /zh/forum/common.php
Disallow: /zh/forum/config.php
Disallow: /zh/forum/con.php
Disallow: /zh/forum/faq.php
Disallow: /zh/forum/mcp.php
Disallow: /zh/forum/memberlist.php
Disallow: /zh/forum/posting.php
Disallow: /zh/forum/report.php
Disallow: /zh/forum/search.php
Disallow: /zh/forum/style.php
Disallow: /zh/forum/ucp.php
Disallow: /zh/forum/viewonline.php
Disallow: /zh/forum/adm
Disallow: /zh/forum/cache
Disallow: /zh/forum/docs
Disallow: /zh/forum/files
Disallow: /zh/forum/images
Disallow: /zh/forum/includes
Disallow: /zh/forum/language
Disallow: /zh/forum/store
Disallow: /zh/forum/styles

This has been the robots.txt file since: Last-Modified: Sat, 06 Jun 2009 23:40:14 GMT
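
To see what these rules mean for a crawler, here is a minimal sketch using Python's standard urllib.robotparser. It parses a short excerpt of the file above and asks whether a crawler may fetch a few paths. The excerpt, the user-agent name "ExampleBot", and the sample paths (including viewtopic.php) are illustrative assumptions, not output from the live server.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the robots.txt shown above (assumption: a small subset is
# enough to illustrate the matching; the real file has many more lines).
ROBOTS_EXCERPT = """\
User-agent: *
Crawl-delay: 1
Disallow: /en/forum/search.php
Disallow: /en/forum/adm
"""


def build_parser(robots_text):
    """Parse robots.txt content from a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(parser, path, user_agent="ExampleBot"):
    """Return True if the rules permit user_agent to fetch path."""
    return parser.can_fetch(user_agent, path)


parser = build_parser(ROBOTS_EXCERPT)
# Hypothetical sample paths, chosen only to exercise the rules.
for path in ("/", "/en/forum/viewtopic.php",
             "/en/forum/search.php", "/en/forum/adm/index.php"):
    verdict = "allowed" if is_allowed(parser, path) else "disallowed"
    print(f"{path}: {verdict}")
```

Disallow entries are matched as path prefixes, so under this excerpt a path is refused only if it starts with one of the listed prefixes.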

Forum search uses phpBB

We haven’t allowed search engines to crawl forum.openoffice.org since before the Oracle donation to the ASF.

Crawlers' IP addresses might be blocked by ASF Infra if their use is excessive. That could explain the 301.

Regards,
Dave

> On May 12, 2020, at 3:55 AM, Peter Kovacs <[hidden email]> wrote:
>
> Hello all,
>
>
> What I found is that the URL forum.openoffice.org is not reachable from the Google search tool.
>
> So I checked with DuckDuckGo (my preferred search engine); they don't use a crawler and point at the infrastructure of Google, Bing and Yandex.
>
> I then checked with Bing, but could not figure out how to check a bot's feedback on a URL, so I moved on.
>
> I checked with Yandex. They have a search URL test page, where I entered forum.openoffice.org.
>
> The response is:
>
> ------------------------------------------------------------------------
>
> * Date: Tue, 12 May 2020 10:37:47 GMT
> * Server: Apache/2.4.18 (Ubuntu)
> * Location: https://forum.openoffice.org/
> * Content-Length: 237
> * Keep-Alive: timeout=15, max=100
> * Connection: Keep-Alive
> * Content-Type: text/html; charset=iso-8859-1
>
> ------------------------------------------------------------------------
>
>
> HTTP status code 301 Moved Permanently
> Server response time 133 ms
> IP address 54.84.201.130
> Encoding UTF-8(unicode-1-1-utf-8, UTF8)
> Page size 237 B
>
>
> I am not sure what that means. The HTTP status code "Moved Permanently" looks wrong. I just don't know whether this is the return code from our web server or a response code from the crawler.
> I'll try to reach someone from Infra, or I'll open a ticket.
>
>
> All the best
> Peter



Re: Critical issue on forum.openoffice.org and Google Search

Dave Fisher-2
Information about Infra IP Bans is here: https://infra.apache.org/infra-ban.html

Please direct the Google engineer to that resource.
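
For telling a genuine Googlebot request from an impostor before banning or allowing an address, the notice from Google at the top of this thread suggests a reverse IP lookup. A minimal sketch of that check in Python follows, assuming the verified hostnames end in googlebot.com or google.com as Google's help page describes. The function names are hypothetical, the IP and hostname in the demo are illustrative values, and the resolver functions are injectable so the logic can be tried without network access.

```python
import socket

# Assumption: genuine Googlebot hosts resolve to names under these domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_hostname(hostname):
    """True if hostname is a subdomain of one of the accepted Google domains."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_SUFFIXES)


def verify_googlebot(ip, reverse=socket.gethostbyaddr,
                     forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the domain, then forward-resolve to confirm."""
    try:
        hostname = reverse(ip)[0]
        if not is_google_hostname(hostname):
            return False
        # The forward lookup must lead back to the same address.
        return ip in forward(hostname)[2]
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False


# Offline demo with stub resolvers (illustrative values, not real lookups).
def stub_reverse(ip):
    return ("crawl-66-249-66-1.googlebot.com", [], [ip])


def stub_forward(hostname):
    return (hostname, [], ["66.249.66.1"])


print(verify_googlebot("66.249.66.1", reverse=stub_reverse, forward=stub_forward))
```

Called without the stub arguments, verify_googlebot uses the system resolver; gethostbyname_ex only returns IPv4 addresses, so an IPv6 check would need a different forward lookup.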

Regards,
Dave





Re: Critical issue on forum.openoffice.org and Google Search

Dave Fisher-2
It’s not an IP Ban. Infra tells me that would not be a 301.

Ah-ha - here is the 301:

% curl -D headers http://forum.openoffice.org/ 
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://forum.openoffice.org/">here</a>.</p>
</body></html>

Surprising that they cannot shift from HTTP to HTTPS via a 301!
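
For reference, a 301 on the plain-HTTP URL is normally harmless: a client reads the Location header and repeats the request at that address. A minimal sketch of that step in Python, using the status and Location value reported earlier in this thread. The function name and the set of handled status codes are assumptions, and header lookup is simplified to an exact-case dict key.

```python
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch (an assumption).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(request_url, status, headers):
    """Return the absolute URL a client should request next, or None."""
    if status not in REDIRECT_CODES:
        return None
    location = headers.get("Location")
    if location is None:
        return None
    # Location may be relative; resolve it against the requested URL.
    return urljoin(request_url, location)


# Values taken from the responses quoted in this thread.
target = redirect_target(
    "http://forum.openoffice.org/",
    301,
    {"Location": "https://forum.openoffice.org/"},
)
print(target)  # https://forum.openoffice.org/
```

A crawler that handles redirects this way ends up on the HTTPS page, which returns the forum normally.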

Regards,
Dave

>> Disallow: /zh/forum/posting.php
>> Disallow: /zh/forum/report.php
>> Disallow: /zh/forum/search.php
>> Disallow: /zh/forum/style.php
>> Disallow: /zh/forum/ucp.php
>> Disallow: /zh/forum/viewonline.php
>> Disallow: /zh/forum/adm
>> Disallow: /zh/forum/cache
>> Disallow: /zh/forum/docs
>> Disallow: /zh/forum/files
>> Disallow: /zh/forum/images
>> Disallow: /zh/forum/includes
>> Disallow: /zh/forum/language
>> Disallow: /zh/forum/store
>> Disallow: /zh/forum/styles
>>
>> This has been the robots.txt file since: Last-Modified: Sat, 06 Jun 2009 23:40:14 GMT
>>
>> Forum search uses phpBB
>>
>> We haven’t allowed search engines to crawl forum.openoffice.org since before the Oracle donation to the ASF.
>>
>> Crawlers IP addresses might be blocked by ASF Infra if their use is excessive. That could give the 301.
>>
>> Regards,
>> Dave
>>

Re: Critical issue on forum.openoffice.org and Google Search

Peter Kovacs-3
Okay, I had a short debug session with Dave and Humbedooh.

We are now sure that the crawlers are not blocked. The 301 response
comes from the fact that Yandex still defaults to http and not https.

After I added https to the URL, everything worked fine.
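
A minimal sketch of the check described above, assuming only the values
reported in this thread (status 301 and the Location header from the Yandex
test). The helper name is_https_upgrade is hypothetical; it only tests whether
a redirect changes nothing but the scheme, which is what separates a normal
http-to-https upgrade from a block or an unrelated redirect:

```python
from urllib.parse import urlsplit


def is_https_upgrade(status: int, request_url: str, location: str) -> bool:
    """Return True when a redirect only switches the scheme from http to https."""
    # Only redirect status codes are of interest here.
    if status not in (301, 302, 307, 308):
        return False
    src, dst = urlsplit(request_url), urlsplit(location)
    same_host = src.hostname == dst.hostname
    # Treat an empty path and "/" as the same resource.
    same_path = (src.path or "/") == (dst.path or "/")
    same_query = src.query == dst.query
    return (
        src.scheme == "http"
        and dst.scheme == "https"
        and same_host
        and same_path
        and same_query
    )


# Values reported in this thread for the scheme-less test of forum.openoffice.org.
print(is_https_upgrade(301, "http://forum.openoffice.org/", "https://forum.openoffice.org/"))
```

A redirect to a different host, such as forums.openoffice.org, would not count
as a plain upgrade under this check.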

Dave also did a curl request, which worked fine as well.


We have now agreed that I will hand this back to Google, with the
feedback that this looks like a Google-internal issue.

The robots.txt has not been changed for 11 years. Yandex can crawl the
URL and we can curl the web page. So we think it is a Google issue.
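
The robots.txt point can also be checked offline. A minimal sketch, assuming a
shortened excerpt of the file quoted earlier in this thread and using Python's
urllib.robotparser. That module is a simple prefix matcher and not Googlebot's
own parser, so the result is indicative only. The topic URL below follows the
form used by the forum; its ids are only an example:

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the robots.txt quoted in this thread. The full file
# repeats the same kind of Disallow entries for every language forum.
ROBOTS_EXCERPT = """\
User-agent: *
Crawl-delay: 1
Disallow: /en/forum/posting.php
Disallow: /en/forum/search.php
Disallow: /en/forum/ucp.php
Disallow: /en/forum/adm
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())

base = "https://forum.openoffice.org"
# A topic page: not matched by any Disallow prefix in the excerpt.
topic_allowed = parser.can_fetch("Googlebot", base + "/en/forum/viewtopic.php?f=50&t=102021")
# The phpBB search page: explicitly disallowed.
search_allowed = parser.can_fetch("Googlebot", base + "/en/forum/search.php")

print(topic_allowed, search_allowed)
```

With this excerpt the topic URL is reported as fetchable and search.php as
disallowed, which fits the view that the robots.txt is not what keeps
Googlebot away from the forum pages.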


I very much appreciated the quick session. Thanks.


all the Best

Peter

On 12.05.20 at 17:24, Dave Fisher wrote:

> It’s not an IP Ban. Infra tells me that would not be a 301.
>
> Ah-ha - here is the 301:
>
> % curl -D headers http://forum.openoffice.org/
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><head>
> <title>301 Moved Permanently</title>
> </head><body>
> <h1>Moved Permanently</h1>
> <p>The document has moved <a href="https://forum.openoffice.org/">here</a>.</p>
> </body></html>
>
> Surprising that they cannot shift from HTTP to HTTPS via a 301!
>
> Regards,
> Dave
>


Re: Critical issue on forum.openoffice.org and Google Search

Rory O'Farrell
On Tue, 12 May 2020 17:41:09 +0200
Peter Kovacs <[hidden email]> wrote:

> Okay, I had a short debug session with Dave and Humbedooh.
>
> We are now sure that the crawlers are not blocked. The 301 Response
> comes from the fact that Yandex still defaults to http and not https.


This post on the User Forum might be relevant:
https://forum.openoffice.org/en/forum/viewtopic.php?f=50&t=102021#p492756

Rory

>
> After I added https to the URL, everything worked fine.
>
> Dave also did a curl request, which worked fine as well.
>
>
> We have now agreed that I will hand this back to Google, with the
> feedback that this looks like a Google-internal issue.
>
> The robots.txt has not been changed for 11 years. Yandex can crawl the
> URL and we can curl the web page. So we think it is a Google issue.
>
>
> I very much appreciated the quick session. Thanks.
>
>
> all the Best
>
> Peter
>
> Am 12.05.20 um 17:24 schrieb Dave Fisher:
> > It’s not an IP Ban. Infra tells me that would not be a 301.
> >
> > Ah-ha - here is the 301:
> >
> > % curl -D headers http://forum.openoffice.org/
> > <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> > <html><head>
> > <title>301 Moved Permanently</title>
> > </head><body>
> > <h1>Moved Permanently</h1>
> > <p>The document has moved <a href="https://forum.openoffice.org/">here</a>.</p>
> > </body></html>
> >
> > Surprising that they cannot shift from HTTP to HTTPS via a 301!
> >
> > Regards,
> > Dave
> >
> >> On May 12, 2020, at 8:04 AM, Dave Fisher <[hidden email]> wrote:
> >>
> >> Information about Infra IP Bans is here: https://infra.apache.org/infra-ban.html
> >>
> >> Please direct the Google engineer to that resource.
> >>
> >> Regards,
> >> Dave
> >>
> >>> On May 12, 2020, at 7:55 AM, Dave Fisher <[hidden email]> wrote:
> >>>
> >>> Are you sure you weren’t using forums.openoffice.org instead of forum.openoffice.org?
> >>>
> >>> curl -D headers https://forum.openoffice.org/ does return the correct page.
> >>>
> >>> The robots.txt is this:
> >>>
> >>> curl -D headers https://forum.openoffice.org/robots.txt
> >>> User-agent: *
> >>> Crawl-delay: 1
> >>> Disallow: /en/forum/common.php
> >>> Disallow: /en/forum/config.php
> >>> Disallow: /en/forum/con.php
> >>> Disallow: /en/forum/faq.php
> >>> Disallow: /en/forum/mcp.php
> >>> Disallow: /en/forum/memberlist.php
> >>> Disallow: /en/forum/posting.php
> >>> Disallow: /en/forum/report.php
> >>> Disallow: /en/forum/search.php
> >>> Disallow: /en/forum/style.php
> >>> Disallow: /en/forum/ucp.php
> >>> Disallow: /en/forum/viewonline.php
> >>> Disallow: /en/forum/adm
> >>> Disallow: /en/forum/cache
> >>> Disallow: /en/forum/docs
> >>> Disallow: /en/forum/files
> >>> Disallow: /en/forum/images
> >>> Disallow: /en/forum/includes
> >>> Disallow: /en/forum/language
> >>> Disallow: /en/forum/store
> >>> Disallow: /en/forum/styles
> >>> Disallow: /es/forum/common.php
> >>> Disallow: /es/forum/config.php
> >>> Disallow: /es/forum/con.php
> >>> Disallow: /es/forum/faq.php
> >>> Disallow: /es/forum/mcp.php
> >>> Disallow: /es/forum/memberlist.php
> >>> Disallow: /es/forum/posting.php
> >>> Disallow: /es/forum/report.php
> >>> Disallow: /es/forum/search.php
> >>> Disallow: /es/forum/style.php
> >>> Disallow: /es/forum/ucp.php
> >>> Disallow: /es/forum/viewonline.php
> >>> Disallow: /es/forum/adm
> >>> Disallow: /es/forum/cache
> >>> Disallow: /es/forum/docs
> >>> Disallow: /es/forum/files
> >>> Disallow: /es/forum/images
> >>> Disallow: /es/forum/includes
> >>> Disallow: /es/forum/language
> >>> Disallow: /es/forum/store
> >>> Disallow: /es/forum/styles
> >>> Disallow: /fr/forum/common.php
> >>> Disallow: /fr/forum/config.php
> >>> Disallow: /fr/forum/con.php
> >>> Disallow: /fr/forum/faq.php
> >>> Disallow: /fr/forum/mcp.php
> >>> Disallow: /fr/forum/memberlist.php
> >>> Disallow: /fr/forum/posting.php
> >>> Disallow: /fr/forum/report.php
> >>> Disallow: /fr/forum/search.php
> >>> Disallow: /fr/forum/style.php
> >>> Disallow: /fr/forum/ucp.php
> >>> Disallow: /fr/forum/viewonline.php
> >>> Disallow: /fr/forum/adm
> >>> Disallow: /fr/forum/cache
> >>> Disallow: /fr/forum/docs
> >>> Disallow: /fr/forum/files
> >>> Disallow: /fr/forum/images
> >>> Disallow: /fr/forum/includes
> >>> Disallow: /fr/forum/language
> >>> Disallow: /fr/forum/store
> >>> Disallow: /fr/forum/styles
> >>> Disallow: /fr/ci-joint
> >>> Disallow: /hu/forum/common.php
> >>> Disallow: /hu/forum/config.php
> >>> Disallow: /hu/forum/con.php
> >>> Disallow: /hu/forum/faq.php
> >>> Disallow: /hu/forum/mcp.php
> >>> Disallow: /hu/forum/memberlist.php
> >>> Disallow: /hu/forum/posting.php
> >>> Disallow: /hu/forum/report.php
> >>> Disallow: /hu/forum/search.php
> >>> Disallow: /hu/forum/style.php
> >>> Disallow: /hu/forum/ucp.php
> >>> Disallow: /hu/forum/viewonline.php
> >>> Disallow: /hu/forum/adm
> >>> Disallow: /hu/forum/cache
> >>> Disallow: /hu/forum/docs
> >>> Disallow: /hu/forum/files
> >>> Disallow: /hu/forum/images
> >>> Disallow: /hu/forum/includes
> >>> Disallow: /hu/forum/language
> >>> Disallow: /hu/forum/store
> >>> Disallow: /hu/forum/styles
> >>> Disallow: /ja/forum/common.php
> >>> Disallow: /ja/forum/config.php
> >>> Disallow: /ja/forum/con.php
> >>> Disallow: /ja/forum/faq.php
> >>> Disallow: /ja/forum/mcp.php
> >>> Disallow: /ja/forum/memberlist.php
> >>> Disallow: /ja/forum/posting.php
> >>> Disallow: /ja/forum/report.php
> >>> Disallow: /ja/forum/search.php
> >>> Disallow: /ja/forum/style.php
> >>> Disallow: /ja/forum/ucp.php
> >>> Disallow: /ja/forum/viewonline.php
> >>> Disallow: /ja/forum/adm
> >>> Disallow: /ja/forum/cache
> >>> Disallow: /ja/forum/docs
> >>> Disallow: /ja/forum/files
> >>> Disallow: /ja/forum/images
> >>> Disallow: /ja/forum/includes
> >>> Disallow: /ja/forum/language
> >>> Disallow: /ja/forum/store
> >>> Disallow: /ja/forum/styles
> >>> Disallow: /test
> >>> Disallow: /nl/forum/common.php
> >>> Disallow: /nl/forum/config.php
> >>> Disallow: /nl/forum/con.php
> >>> Disallow: /nl/forum/faq.php
> >>> Disallow: /nl/forum/mcp.php
> >>> Disallow: /nl/forum/memberlist.php
> >>> Disallow: /nl/forum/posting.php
> >>> Disallow: /nl/forum/report.php
> >>> Disallow: /nl/forum/search.php
> >>> Disallow: /nl/forum/style.php
> >>> Disallow: /nl/forum/ucp.php
> >>> Disallow: /nl/forum/viewonline.php
> >>> Disallow: /nl/forum/adm
> >>> Disallow: /nl/forum/cache
> >>> Disallow: /nl/forum/docs
> >>> Disallow: /nl/forum/files
> >>> Disallow: /nl/forum/images
> >>> Disallow: /nl/forum/includes
> >>> Disallow: /nl/forum/language
> >>> Disallow: /nl/forum/store
> >>> Disallow: /nl/forum/styles
> >>> Disallow: /vi/forum/common.php
> >>> Disallow: /vi/forum/config.php
> >>> Disallow: /vi/forum/con.php
> >>> Disallow: /vi/forum/faq.php
> >>> Disallow: /vi/forum/mcp.php
> >>> Disallow: /vi/forum/memberlist.php
> >>> Disallow: /vi/forum/posting.php
> >>> Disallow: /vi/forum/report.php
> >>> Disallow: /vi/forum/search.php
> >>> Disallow: /vi/forum/style.php
> >>> Disallow: /vi/forum/ucp.php
> >>> Disallow: /vi/forum/viewonline.php
> >>> Disallow: /vi/forum/adm
> >>> Disallow: /vi/forum/cache
> >>> Disallow: /vi/forum/docs
> >>> Disallow: /vi/forum/files
> >>> Disallow: /vi/forum/images
> >>> Disallow: /vi/forum/includes
> >>> Disallow: /vi/forum/language
> >>> Disallow: /vi/forum/store
> >>> Disallow: /vi/forum/styles
> >>> Disallow: /zh/forum/common.php
> >>> Disallow: /zh/forum/config.php
> >>> Disallow: /zh/forum/con.php
> >>> Disallow: /zh/forum/faq.php
> >>> Disallow: /zh/forum/mcp.php
> >>> Disallow: /zh/forum/memberlist.php
> >>> Disallow: /zh/forum/posting.php
> >>> Disallow: /zh/forum/report.php
> >>> Disallow: /zh/forum/search.php
> >>> Disallow: /zh/forum/style.php
> >>> Disallow: /zh/forum/ucp.php
> >>> Disallow: /zh/forum/viewonline.php
> >>> Disallow: /zh/forum/adm
> >>> Disallow: /zh/forum/cache
> >>> Disallow: /zh/forum/docs
> >>> Disallow: /zh/forum/files
> >>> Disallow: /zh/forum/images
> >>> Disallow: /zh/forum/includes
> >>> Disallow: /zh/forum/language
> >>> Disallow: /zh/forum/store
> >>> Disallow: /zh/forum/styles
> >>>
> >>> This has been the robots.txt file since: Last-Modified: Sat, 06 Jun 2009 23:40:14 GMT
> >>>
> >>> Forum search uses phpBB
> >>>
> >>> We haven’t allowed search engines to crawl forum.openoffice.org since before the Oracle donation to the ASF.
> >>>
> >>> Crawlers' IP addresses might be blocked by ASF Infra if their use is excessive. That could explain the 301.
> >>>
> >>> Regards,
> >>> Dave
> >>>
> >>>> On May 12, 2020, at 3:55 AM, Peter Kovacs <[hidden email]> wrote:
> >>>>
> >>>> Hello all,
> >>>>
> >>>>
> >>>> What I found is that the URL forum.openoffice.org is not reachable from the Google search tool.
> >>>>
> >>>> So I checked with DuckDuckGo (my preferred search engine). They don't use a crawler of their own and point to the infrastructure of Google, Bing and Yandex.
> >>>>
> >>>> I then checked with Bing, but could not figure out how to check the bot's feedback on a URL, so I moved on.
> >>>>
> >>>> I checked with Yandex. They have a search URL test page. I entered forum.openoffice.org there.
> >>>>
> >>>> The Response is:
> >>>>
> >>>> ------------------------------------------------------------------------
> >>>>
> >>>> * Date: Tue, 12 May 2020 10:37:47 GMT
> >>>> * Server: Apache/2.4.18 (Ubuntu)
> >>>> * Location: https://forum.openoffice.org/
> >>>> * Content-Length: 237
> >>>> * Keep-Alive: timeout=15, max=100
> >>>> * Connection: Keep-Alive
> >>>> * Content-Type: text/html; charset=iso-8859-1
> >>>>
> >>>> ------------------------------------------------------------------------
> >>>>
> >>>>
> >>>> HTTP status code 301 Moved Permanently
> >>>> Server response time 133 ms
> >>>> IP address 54.84.201.130
> >>>> Encoding UTF-8(unicode-1-1-utf-8, UTF8)
> >>>> Page size 237 B
> >>>>
> >>>>
> >>>> I am not sure what that means. The HTTP status code 301 Moved Permanently looks wrong. I just don't know whether this is the return code from our web server or a response code from the crawler.
> >>>> I will try to get hold of someone from Infra, or I'll open a ticket.
> >>>>
> >>>>
> >>>> All the best
> >>>> Peter
> >>>>
> >>>> On 12.05.20 at 10:39, Matthias Seidel wrote:
> >>>>> Hi Kay,
> >>>>>
> >>>>> On 12.05.20 at 01:21, Kay Schenk wrote:
> >>>>>> On 5/11/20 12:33 PM, Matthias Seidel wrote:
> >>>>>>> Hi Kay,
> >>>>>>>
> >>>>>>> On 11.05.20 at 21:23, Kay Schenk wrote:
> >>>>>>>> Hi Peter...
> >>>>>>>>
> >>>>>>>> Since I am a Google Search admin for www.openoffice.org, and
> >>>>>>>> openoffice.apache.org, I got this also. Disclaimer: I have not done
> >>>>>>>> ANY work with the Google Search apis on these sites in quite some time.
> >>>>>>>>
> >>>>>>>> I actually was NOT aware forum.openoffice.org was set up to use Google
> >>>>>>>> Search until I saw this.
> >>>>>>> I think, I added it to the list when we had a discussion about outdated
> >>>>>>> information regarding SourceForge found by Google Search.
> >>>>>>>
> >>>>>>> But I don't have access to forum.openoffice.org, so I could never
> >>>>>>> complete the step.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>>    Matthias
> >>>>>> OK. In the top level of the website source, there is a file called
> >>>>>> "skeleton.html" which references the following bit of code --
> >>>>>>
> >>>>>> <!--#include virtual="/scripts/google-analytics.js" -->
> >>>>>>
> >>>>>> I didn't dig far enough to find how "skeleton.html" is used (I
> >>>>>> forgot), but this is an example of the google-analytics code snippet
> >>>>>> that is used. Basically, this needs to be included in the site you
> >>>>>> want analytics to be used on by putting it in the (header) files that
> >>>>>> generate the site. And you might take a look at recent instructions
> >>>>>> from Google. Things change.
> >>>>>>
> >>>>>> https://support.google.com/analytics/answer/1008080
> >>>>> Yes, but this is for Google Analytics. I wouldn't want to "analyze" the
> >>>>> forum...
> >>>>> The procedure for the Google Search Console is the same, it needs access
> >>>>> to the root directory.
> >>>>>
> >>>>> Maybe Andrea can help if he is available again?
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>>   Matthias
> >>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Kay
> >>>>>>
> >>>>>>>> One of the Google Search admins for forum.openoffice.org could check
> >>>>>>>> the current Google search apis that are in use on that site. Changes
> >>>>>>>> are occasionally made to the calls, and maybe that is the issue, or a
> >>>>>>>> robots.txt for that site is causing this. I don't think it requires a
> >>>>>>>> response, but maybe some investigation.
> >>>>>>>>
> >>>>>>>> Just some ideas...
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>>
> >>>>>>>> Kay
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 5/11/20 6:02 AM, Peter Kovacs wrote:
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I have received the following mail, probably because I am listed on the
> >>>>>>>>> Google Analytics page.
> >>>>>>>>>
> >>>>>>>>> Does this imply any action items? What can we answer Mr John Mueller?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> All the Best
> >>>>>>>>>
> >>>>>>>>> Peter
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -------- Forwarded Message --------
> >>>>>>>>> Subject:     Critical issue on forum.openoffice.org and Google Search
> >>>>>>>>> Date:     Mon, 11 May 2020 13:37:27 +0200
> >>>>>>>>> From:     John Mueller <[hidden email]>
> >>>>>>>>> To:     [hidden email], [hidden email], [hidden email]
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Dear webmaster of forum.openoffice.org <http://forum.openoffice.org>
> >>>>>>>>>
> >>>>>>>>> I'm an analyst at Google in Switzerland. We wanted to bring your
> >>>>>>>>> attention to a critical issue with your website, and how it's
> >>>>>>>>> available for Google's web search.
> >>>>>>>>>
> >>>>>>>>> In particular, Googlebot has been unable to crawl URLs from
> >>>>>>>>> https://forum.openoffice.org/ . This will cause those pages to drop
> >>>>>>>>> out of Google's search results, and will prevent new pages from being
> >>>>>>>>> picked up for Search. If you're not aware of this issue, you may be
> >>>>>>>>> accidentally blocking these pages from Google Search due to a server
> >>>>>>>>> issue. If you need to block Googlebot from crawling pages on your
> >>>>>>>>> website, we'd recommend using the robots.txt file instead.
> >>>>>>>>>
> >>>>>>>>> Should you need to recognize IP addresses of Googlebot requests, you
> >>>>>>>>> can use a reverse IP lookup to do so:
> >>>>>>>>> https://support.google.com/webmasters/answer/80553
> >>>>>>>>>
> >>>>>>>>> Should you have any questions, feel free to contact me directly. For
> >>>>>>>>> verification purposes, we are sending a copy of this message to your
> >>>>>>>>> site's Search Console account.
> >>>>>>>>>
> >>>>>>>>> Thank you,
> >>>>>>>>> John Mueller ([hidden email] <mailto:[hidden email]>)
> >>>>>>>>> Webmaster Trends Analyst
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>


--
Rory O'Farrell <[hidden email]>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
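
Note on the robots.txt quoted above: its rules list only specific phpBB scripts and directories, so a standard parser should treat ordinary topic pages as crawlable. A minimal sketch in Python, using the standard library's urllib.robotparser on a short excerpt of the quoted rules. The excerpt, the helper name is_allowed and the choice of "Googlebot" as user agent are assumptions made for illustration; Google's own matching may differ in detail.

```python
from urllib.robotparser import RobotFileParser

# Short excerpt of the robots.txt quoted in this thread (English forum only).
ROBOTS_EXCERPT = """\
User-agent: *
Crawl-delay: 1
Disallow: /en/forum/search.php
Disallow: /en/forum/posting.php
Disallow: /en/forum/adm
"""


def is_allowed(path, agent="Googlebot"):
    """Hypothetical helper: check a path against the excerpt above."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(ROBOTS_EXCERPT.splitlines())
    return parser.can_fetch(agent, "https://forum.openoffice.org" + path)


# A topic page is not covered by any Disallow rule in the excerpt.
print(is_allowed("/en/forum/viewtopic.php?f=50&t=102021"))
# The search script is explicitly disallowed.
print(is_allowed("/en/forum/search.php"))
```

If this matches how a crawler reads the full file, robots.txt alone would not explain topic pages dropping out of search results.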


Re: Critical issue on forum.openoffice.org and Google Search

Rory O'Farrell
On Mon, 18 May 2020 15:44:42 +0100
Rory O'Farrell <[hidden email]> wrote:

> On Tue, 12 May 2020 17:41:09 +0200
> Peter Kovacs <[hidden email]> wrote:
>
> > Okay, I had a short debug session with Dave and Humbedooh.
> >
> > We are now sure that the crawlers are not blocked. The 301 Response
> > comes from the fact that Yandex still defaults to http and not https.
>
>
> This post on User Forum might be relevant
> https://forum.openoffice.org/en/forum/viewtopic.php?f=50&t=102021#p492756
>
> Rory

More detailed examination today shows that
Google search seems to have dropped out in French six days ago, in Italian five days ago, and in English about 23rd April. Try a search for openoffice together with the site specifier.

See the above URL for details.

Rory


> >
> > > After I added https to the URL, everything worked fine.
> > >
> > > Dave also did a curl request, which worked fine as well.
> > >
> > >
> > > We have now agreed that I will pass the ball back to Google, with the
> > > feedback that this looks like a Google-internal issue.
> > >
> > > The robots.txt has not been changed for 11 years. Yandex can crawl the
> > > URL and we can curl the web page. So we think it is a Google issue.
> >
> >
> > I very much appreciated the quick session. Thanks.
> >
> >
> > all the Best
> >
> > Peter
> >
> > > On 12.05.20 at 17:24, Dave Fisher wrote:
> > > It’s not an IP Ban. Infra tells me that would not be a 301.
> > >
> > > Ah-ha - here is the 301:
> > >
> > > % curl -D headers http://forum.openoffice.org/
> > > <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> > > <html><head>
> > > <title>301 Moved Permanently</title>
> > > </head><body>
> > > <h1>Moved Permanently</h1>
> > > <p>The document has moved <a href="https://forum.openoffice.org/">here</a>.</p>
> > > </body></html>
> > >
> > > Surprising that they cannot shift from HTTP to HTTPS via a 301!
> > >
> > > Regards,
> > > Dave
> > >
> > >> On May 12, 2020, at 8:04 AM, Dave Fisher <[hidden email]> wrote:
> > >>
> > >> Information about Infra IP Bans is here: https://infra.apache.org/infra-ban.html
> > >>
> > >> Please direct the Google engineer to that resource.
> > >>
> > >> Regards,
> > >> Dave
> > >>
> > >>> On May 12, 2020, at 7:55 AM, Dave Fisher <[hidden email]> wrote:
> > >>>
> > >>> Are you sure you weren’t using forums.openoffice.org instead of forum.openoffice.org?
> > >>>
> > >>> curl -D headers https://forum.openoffice.org/ does return the correct page.
> > >>>
> > >>> The robots.txt is this:
> > >>>
> > >>> curl -D headers https://forum.openoffice.org/robots.txt
> > >>> User-agent: *
> > >>> Crawl-delay: 1
> > >>> Disallow: /en/forum/common.php
> > >>> Disallow: /en/forum/config.php
> > >>> Disallow: /en/forum/con.php
> > >>> Disallow: /en/forum/faq.php
> > >>> Disallow: /en/forum/mcp.php
> > >>> Disallow: /en/forum/memberlist.php
> > >>> Disallow: /en/forum/posting.php
> > >>> Disallow: /en/forum/report.php
> > >>> Disallow: /en/forum/search.php
> > >>> Disallow: /en/forum/style.php
> > >>> Disallow: /en/forum/ucp.php
> > >>> Disallow: /en/forum/viewonline.php
> > >>> Disallow: /en/forum/adm
> > >>> Disallow: /en/forum/cache
> > >>> Disallow: /en/forum/docs
> > >>> Disallow: /en/forum/files
> > >>> Disallow: /en/forum/images
> > >>> Disallow: /en/forum/includes
> > >>> Disallow: /en/forum/language
> > >>> Disallow: /en/forum/store
> > >>> Disallow: /en/forum/styles
> > >>> Disallow: /es/forum/common.php
> > >>> Disallow: /es/forum/config.php
> > >>> Disallow: /es/forum/con.php
> > >>> Disallow: /es/forum/faq.php
> > >>> Disallow: /es/forum/mcp.php
> > >>> Disallow: /es/forum/memberlist.php
> > >>> Disallow: /es/forum/posting.php
> > >>> Disallow: /es/forum/report.php
> > >>> Disallow: /es/forum/search.php
> > >>> Disallow: /es/forum/style.php
> > >>> Disallow: /es/forum/ucp.php
> > >>> Disallow: /es/forum/viewonline.php
> > >>> Disallow: /es/forum/adm
> > >>> Disallow: /es/forum/cache
> > >>> Disallow: /es/forum/docs
> > >>> Disallow: /es/forum/files
> > >>> Disallow: /es/forum/images
> > >>> Disallow: /es/forum/includes
> > >>> Disallow: /es/forum/language
> > >>> Disallow: /es/forum/store
> > >>> Disallow: /es/forum/styles
> > >>> Disallow: /fr/forum/common.php
> > >>> Disallow: /fr/forum/config.php
> > >>> Disallow: /fr/forum/con.php
> > >>> Disallow: /fr/forum/faq.php
> > >>> Disallow: /fr/forum/mcp.php
> > >>> Disallow: /fr/forum/memberlist.php
> > >>> Disallow: /fr/forum/posting.php
> > >>> Disallow: /fr/forum/report.php
> > >>> Disallow: /fr/forum/search.php
> > >>> Disallow: /fr/forum/style.php
> > >>> Disallow: /fr/forum/ucp.php
> > >>> Disallow: /fr/forum/viewonline.php
> > >>> Disallow: /fr/forum/adm
> > >>> Disallow: /fr/forum/cache
> > >>> Disallow: /fr/forum/docs
> > >>> Disallow: /fr/forum/files
> > >>> Disallow: /fr/forum/images
> > >>> Disallow: /fr/forum/includes
> > >>> Disallow: /fr/forum/language
> > >>> Disallow: /fr/forum/store
> > >>> Disallow: /fr/forum/styles
> > >>> Disallow: /fr/ci-joint
> > >>> Disallow: /hu/forum/common.php
> > >>> Disallow: /hu/forum/config.php
> > >>> Disallow: /hu/forum/con.php
> > >>> Disallow: /hu/forum/faq.php
> > >>> Disallow: /hu/forum/mcp.php
> > >>> Disallow: /hu/forum/memberlist.php
> > >>> Disallow: /hu/forum/posting.php
> > >>> Disallow: /hu/forum/report.php
> > >>> Disallow: /hu/forum/search.php
> > >>> Disallow: /hu/forum/style.php
> > >>> Disallow: /hu/forum/ucp.php
> > >>> Disallow: /hu/forum/viewonline.php
> > >>> Disallow: /hu/forum/adm
> > >>> Disallow: /hu/forum/cache
> > >>> Disallow: /hu/forum/docs
> > >>> Disallow: /hu/forum/files
> > >>> Disallow: /hu/forum/images
> > >>> Disallow: /hu/forum/includes
> > >>> Disallow: /hu/forum/language
> > >>> Disallow: /hu/forum/store
> > >>> Disallow: /hu/forum/styles
> > >>> Disallow: /ja/forum/common.php
> > >>> Disallow: /ja/forum/config.php
> > >>> Disallow: /ja/forum/con.php
> > >>> Disallow: /ja/forum/faq.php
> > >>> Disallow: /ja/forum/mcp.php
> > >>> Disallow: /ja/forum/memberlist.php
> > >>> Disallow: /ja/forum/posting.php
> > >>> Disallow: /ja/forum/report.php
> > >>> Disallow: /ja/forum/search.php
> > >>> Disallow: /ja/forum/style.php
> > >>> Disallow: /ja/forum/ucp.php
> > >>> Disallow: /ja/forum/viewonline.php
> > >>> Disallow: /ja/forum/adm
> > >>> Disallow: /ja/forum/cache
> > >>> Disallow: /ja/forum/docs
> > >>> Disallow: /ja/forum/files
> > >>> Disallow: /ja/forum/images
> > >>> Disallow: /ja/forum/includes
> > >>> Disallow: /ja/forum/language
> > >>> Disallow: /ja/forum/store
> > >>> Disallow: /ja/forum/styles
> > >>> Disallow: /test
> > >>> Disallow: /nl/forum/common.php
> > >>> Disallow: /nl/forum/config.php
> > >>> Disallow: /nl/forum/con.php
> > >>> Disallow: /nl/forum/faq.php
> > >>> Disallow: /nl/forum/mcp.php
> > >>> Disallow: /nl/forum/memberlist.php
> > >>> Disallow: /nl/forum/posting.php
> > >>> Disallow: /nl/forum/report.php
> > >>> Disallow: /nl/forum/search.php
> > >>> Disallow: /nl/forum/style.php
> > >>> Disallow: /nl/forum/ucp.php
> > >>> Disallow: /nl/forum/viewonline.php
> > >>> Disallow: /nl/forum/adm
> > >>> Disallow: /nl/forum/cache
> > >>> Disallow: /nl/forum/docs
> > >>> Disallow: /nl/forum/files
> > >>> Disallow: /nl/forum/images
> > >>> Disallow: /nl/forum/includes
> > >>> Disallow: /nl/forum/language
> > >>> Disallow: /nl/forum/store
> > >>> Disallow: /nl/forum/styles
> > >>> Disallow: /vi/forum/common.php
> > >>> Disallow: /vi/forum/config.php
> > >>> Disallow: /vi/forum/con.php
> > >>> Disallow: /vi/forum/faq.php
> > >>> Disallow: /vi/forum/mcp.php
> > >>> Disallow: /vi/forum/memberlist.php
> > >>> Disallow: /vi/forum/posting.php
> > >>> Disallow: /vi/forum/report.php
> > >>> Disallow: /vi/forum/search.php
> > >>> Disallow: /vi/forum/style.php
> > >>> Disallow: /vi/forum/ucp.php
> > >>> Disallow: /vi/forum/viewonline.php
> > >>> Disallow: /vi/forum/adm
> > >>> Disallow: /vi/forum/cache
> > >>> Disallow: /vi/forum/docs
> > >>> Disallow: /vi/forum/files
> > >>> Disallow: /vi/forum/images
> > >>> Disallow: /vi/forum/includes
> > >>> Disallow: /vi/forum/language
> > >>> Disallow: /vi/forum/store
> > >>> Disallow: /vi/forum/styles
> > >>> Disallow: /zh/forum/common.php
> > >>> Disallow: /zh/forum/config.php
> > >>> Disallow: /zh/forum/con.php
> > >>> Disallow: /zh/forum/faq.php
> > >>> Disallow: /zh/forum/mcp.php
> > >>> Disallow: /zh/forum/memberlist.php
> > >>> Disallow: /zh/forum/posting.php
> > >>> Disallow: /zh/forum/report.php
> > >>> Disallow: /zh/forum/search.php
> > >>> Disallow: /zh/forum/style.php
> > >>> Disallow: /zh/forum/ucp.php
> > >>> Disallow: /zh/forum/viewonline.php
> > >>> Disallow: /zh/forum/adm
> > >>> Disallow: /zh/forum/cache
> > >>> Disallow: /zh/forum/docs
> > >>> Disallow: /zh/forum/files
> > >>> Disallow: /zh/forum/images
> > >>> Disallow: /zh/forum/includes
> > >>> Disallow: /zh/forum/language
> > >>> Disallow: /zh/forum/store
> > >>> Disallow: /zh/forum/styles
> > >>>
> > >>> This has been the robots.txt file since: Last-Modified: Sat, 06 Jun 2009 23:40:14 GMT
> > >>>
> > >>> Forum search uses phpBB
> > >>>
> > >>> We haven’t allowed search engines to crawl forum.openoffice.org since before the Oracle donation to the ASF.
> > >>>
> > >>> Crawlers IP addresses might be blocked by ASF Infra if their use is excessive. That could give the 301.
> > >>>
> > >>> Regards,
> > >>> Dave
> > >>>
> > >>>> On May 12, 2020, at 3:55 AM, Peter Kovacs <[hidden email]> wrote:
> > >>>>
> > >>>> Hello all,
> > >>>>
> > >>>>
> > >>>> What I figured is that from the Google search tool the URL forum.openoffice.org is not reachable.
> > >>>>
> > >>>> So I checked with Duckduckgo (my prefered Search engine), they don't use crawler and point at the infra of Google, Bing and Yandex.
> > >>>>
> > >>>> I checked then with Bing, but could not figure out to check bots feedback on an URL so I moved on
> > >>>>
> > >>>> I checked with Yandex. They have a search URL test page. I have entered there forum.openoffice.org
> > >>>>
> > >>>> The Response is:
> > >>>>
> > >>>> ------------------------------------------------------------------------
> > >>>>
> > >>>> * Date: Tue, 12 May 2020 10:37:47 GMT
> > >>>> * Server: Apache/2.4.18 (Ubuntu)
> > >>>> * Location: https://forum.openoffice.org/
> > >>>> * Content-Length: 237
> > >>>> * Keep-Alive: timeout=15, max=100
> > >>>> * Connection: Keep-Alive
> > >>>> * Content-Type: text/html; charset=iso-8859-1
> > >>>>
> > >>>> ------------------------------------------------------------------------
> > >>>>
> > >>>>
> > >>>> HTTP status code 301 Moved Permanently
> > >>>> Server response time 133 ms
> > >>>> IP address 54.84.201.130
> > >>>> Encoding UTF-8(unicode-1-1-utf-8, UTF8)
> > >>>> Page size 237 B
> > >>>>
> > >>>>
> > >>>> I am not sure, what that means. HTTP Status Code moved Permanently reads wrong. I just dont know if this is the return code from our webservcer or a response code from the crawler.
> > >>>> I try to get someone from Infra. Or I'll open a ticket.
> > >>>>
> > >>>>
> > >>>> All the best
> > >>>> Peter
> > >>>>
> > >>>> Am 12.05.20 um 10:39 schrieb Matthias Seidel:
> > >>>>> Hi Kay,
> > >>>>>
> > >>>>> Am 12.05.20 um 01:21 schrieb Kay Schenk:
> > >>>>>> On 5/11/20 12:33 PM, Matthias Seidel wrote:
> > >>>>>>> Hi Kay,
> > >>>>>>>
> > >>>>>>> Am 11.05.20 um 21:23 schrieb Kay Schenk:
> > >>>>>>>> Hi Peter...
> > >>>>>>>>
> > >>>>>>>> Since I am a Google Search admin for www.openoffice.org, and
> > >>>>>>>> openoffice.apache.org, I got this also. Disclaimer: I have not done
> > >>>>>>>> ANY work with the Google Search apis on these sites in quite some time.
> > >>>>>>>>
> > >>>>>>>> I actually was NOT aware forum.openoffice.org was set up to use Google
> > >>>>>>>> Search until I saw this.
> > >>>>>>> I think, I added it to the list when we had a discussion about outdated
> > >>>>>>> information regarding SourceForge found by Google Search.
> > >>>>>>>
> > >>>>>>> But I don't have access to forum.openoffice.org, so I could never
> > >>>>>>> complete the step.
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>>
> > >>>>>>>    Matthias
> > >>>>>> OK. In the top level of the website source, there is a file called
> > >>>>>> "skeleton.html" which references the following bit of code --
> > >>>>>>
> > >>>>>> <!--#include virtual="/scripts/google-analytics.js" -->
> > >>>>>>
> > >>>>>> I didn't dig far enough to find how "skeleton.html" is used ( I
> > >>>>>> forgot) but this this is example for the google-analytics code snippet
> > >>>>>> that is used. Basically, this needs to be included in the site you
> > >>>>>> want analytics to be used on by putting it in the (header) files that
> > >>>>>> generate the site. And, you might  take a look at recent instructions
> > >>>>>> from Google. Things change.
> > >>>>>>
> > >>>>>> https://support.google.com/analytics/answer/1008080
> > >>>>> Yes, but this is for Google Analytics. I wouldn't want to "analyze" the
> > >>>>> forum...
> > >>>>> The procedure for the Google Search Console is the same, it needs access
> > >>>>> to the root directory.
> > >>>>>
> > >>>>> Maybe Andrea can help if he is available again?
> > >>>>>
> > >>>>> Regards,
> > >>>>>
> > >>>>>   Matthias
> > >>>>>
> > >>>>>> Regards,
> > >>>>>>
> > >>>>>> Kay
> > >>>>>>
> > >>>>>>>> One of the Google Search admins for forum.openoffice.org could check
> > >>>>>>>> the current Google search apis that are in use on that site. Changes
> > >>>>>>>> are occasionally made to the calls, and maybe that is the issue, or a
> > >>>>>>>> robots.txt for that site is causing this. I don't think it requires a
> > >>>>>>>> response, but maybe some investigation.
> > >>>>>>>>
> > >>>>>>>> Just some ideas...
> > >>>>>>>>
> > >>>>>>>> Regards,
> > >>>>>>>>
> > >>>>>>>> Kay
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On 5/11/20 6:02 AM, Peter Kovacs wrote:
> > >>>>>>>>> Hi all,
> > >>>>>>>>>
> > >>>>>>>>> I have received following mail. Probably because I am listed in the
> > >>>>>>>>> google-Analytics page.
> > >>>>>>>>>
> > >>>>>>>>> Does this has some action items? What can we answer Mr John Mueller?
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> All the Best
> > >>>>>>>>>
> > >>>>>>>>> Peter
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> -------- Weitergeleitete Nachricht --------
> > >>>>>>>>> Betreff:     Critical issue on forum.openoffice.org and Google Search
> > >>>>>>>>> Datum:     Mon, 11 May 2020 13:37:27 +0200
> > >>>>>>>>> Von:     John Mueller <[hidden email]>
> > >>>>>>>>> An:     [hidden email], [hidden email], [hidden email]
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Dear webmaster of forum.openoffice.org <http://forum.openoffice.org>
> > >>>>>>>>>
> > >>>>>>>>> I'm an analyst at Google in Switzerland. We wanted to bring your
> > >>>>>>>>> attention to a critical issue with your website, and how it's
> > >>>>>>>>> available for Google's web search.
> > >>>>>>>>>
> > >>>>>>>>> In particular, Googlebot has been unable to crawl URLs from
> > >>>>>>>>> https://forum.openoffice.org/ . This will cause those pages to drop
> > >>>>>>>>> out of Google's search results, and will prevent new pages from being
> > >>>>>>>>> picked up for Search. If you're not aware of this issue, you may be
> > >>>>>>>>> accidentally blocking these pages from Google Search due to a server
> > >>>>>>>>> issue. If you need to block Googlebot from crawling pages on your
> > >>>>>>>>> website, we'd recommend using the robots.txt file instead.
> > >>>>>>>>>
> > >>>>>>>>> Should you need to recognize IP addresses of Googlebot requests, you
> > >>>>>>>>> can use a reverse IP lookup to do so:
> > >>>>>>>>> https://support.google.com/webmasters/answer/80553
> > >>>>>>>>>
> > >>>>>>>>> Should you have any questions, feel free to contact me directly. For
> > >>>>>>>>> verification purposes, we are sending a copy of this message to your
> > >>>>>>>>> site's Search Console account.
> > >>>>>>>>>
> > >>>>>>>>> Thank you,
> > >>>>>>>>> John Mueller ([hidden email] <mailto:[hidden email]>)
> > >>>>>>>>> Webmaster Trends Analyst
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>> ---------------------------------------------------------------------
> > >>>>>>>> To unsubscribe, e-mail: [hidden email]
> > >>>>>>>> For additional commands, e-mail: [hidden email]
> > >>>>>>>>
> > >>>>>> ---------------------------------------------------------------------
> > >>>>>> To unsubscribe, e-mail: [hidden email]
> > >>>>>> For additional commands, e-mail: [hidden email]
> > >>>>>>
> > >>>
> > >>> ---------------------------------------------------------------------
> > >>> To unsubscribe, e-mail: [hidden email]
> > >>> For additional commands, e-mail: [hidden email]
> > >>>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: [hidden email]
> > >> For additional commands, e-mail: [hidden email]
> > >>
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [hidden email]
> > > For additional commands, e-mail: [hidden email]
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
>
>
> --
> Rory O'Farrell <[hidden email]>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


--
Rory O'Farrell <[hidden email]>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
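
Note on the 301 discussed above: the status code and Location header quoted in the thread are what a plain HTTP-to-HTTPS redirect looks like. A small sketch in Python of how such a response could be told apart from a block or an error. The helper name is_https_upgrade and the set of redirect codes are assumptions for illustration, not something used by Infra or by any crawler.

```python
from urllib.parse import urlsplit

# Redirect status codes considered here (an assumption for this sketch).
REDIRECT_CODES = {301, 302, 307, 308}


def is_https_upgrade(request_url, status, location):
    """Return True if the response only moves an http:// URL
    to https:// on the same host and path."""
    if status not in REDIRECT_CODES or not location:
        return False
    src = urlsplit(request_url)
    dst = urlsplit(location)
    return (
        src.scheme == "http"
        and dst.scheme == "https"
        and src.hostname == dst.hostname
        and (src.path or "/") == (dst.path or "/")
    )


# Values reported in the thread: a request over http answered with
# status 301 and Location: https://forum.openoffice.org/
print(is_https_upgrade("http://forum.openoffice.org/", 301,
                       "https://forum.openoffice.org/"))
```

A client that follows the Location header ends up on the https page, which is consistent with the curl result over https quoted in the thread.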


Re: Critical issue on forum.openoffice.org and Google Search

Peter Kovacs-3
I am already on it. It has worked for me so far: I get search results. Maybe
it has to do with the cache.

Not sure.

On 18.05.20 at 18:22, Rory O'Farrell wrote:

> On Mon, 18 May 2020 15:44:42 +0100
> Rory O'Farrell <[hidden email]> wrote:
>
>> On Tue, 12 May 2020 17:41:09 +0200
>> Peter Kovacs <[hidden email]> wrote:
>>
>>> Okay, I had a short debug session with Dave and Humbedooh.
>>>
>>> We are now sure that the crawlers are not blocked. The 301 Response
>>> comes from the fact that Yandex still defaults to http and not https.
>>
>> This post on User Forum might be relevant
>> https://forum.openoffice.org/en/forum/viewtopic.php?f=50&t=102021#p492756
>>
>> Rory
> More detailed examination today shows that
> Google search seems to have dropped out in French six days ago, in Italian five days ago, and in English about 23rd April. Try a search for openoffice together with the site specifier.
>
> See the above URL for details.
>
> Rory
>
>
>>> After I added https to the URL, everything worked fine.
>>>
>>> Dave also did a curl request, which worked fine as well.
>>>
>>>
>>> We have now agreed that I will pass the ball back to Google, with the
>>> feedback that this looks like a Google-internal issue.
>>>
>>> The robots.txt has not been changed for 11 years. Yandex can crawl the
>>> URL and we can curl the web page. So we think it is a Google issue.
>>>
>>>
>>> I very much appreciated the quick session. Thanks.
>>>
>>>
>>> all the Best
>>>
>>> Peter
>>>
>>> On 12.05.20 at 17:24, Dave Fisher wrote:
>>>> It’s not an IP Ban. Infra tells me that would not be a 301.
>>>>
>>>> Ah-ha - here is the 301:
>>>>
>>>> % curl -D headers http://forum.openoffice.org/
>>>> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
>>>> <html><head>
>>>> <title>301 Moved Permanently</title>
>>>> </head><body>
>>>> <h1>Moved Permanently</h1>
>>>> <p>The document has moved <a href="https://forum.openoffice.org/">here</a>.</p>
>>>> </body></html>
>>>>
>>>> Surprising that they cannot shift from HTTP to HTTPS via a 301!
>>>>
>>>> Regards,
>>>> Dave
>>>>
>>>>> On May 12, 2020, at 8:04 AM, Dave Fisher <[hidden email]> wrote:
>>>>>
>>>>> Information about Infra IP Bans is here: https://infra.apache.org/infra-ban.html
>>>>>
>>>>> Please direct the Google engineer to that resource.
>>>>>
>>>>> Regards,
>>>>> Dave
>>>>>
>>>>>> On May 12, 2020, at 7:55 AM, Dave Fisher <[hidden email]> wrote:
>>>>>>
>>>>>> Are you sure you weren’t using forums.openoffice.org instead of forum.openoffice.org?
>>>>>>
>>>>>> curl -D headers https://forum.openoffice.org/ does return the correct page.
>>>>>>
>>>>>> The robots.txt is this:
>>>>>>
>>>>>> curl -D headers https://forum.openoffice.org/robots.txt
>>>>>> User-agent: *
>>>>>> Crawl-delay: 1
>>>>>> Disallow: /en/forum/common.php
>>>>>> Disallow: /en/forum/config.php
>>>>>> Disallow: /en/forum/con.php
>>>>>> Disallow: /en/forum/faq.php
>>>>>> Disallow: /en/forum/mcp.php
>>>>>> Disallow: /en/forum/memberlist.php
>>>>>> Disallow: /en/forum/posting.php
>>>>>> Disallow: /en/forum/report.php
>>>>>> Disallow: /en/forum/search.php
>>>>>> Disallow: /en/forum/style.php
>>>>>> Disallow: /en/forum/ucp.php
>>>>>> Disallow: /en/forum/viewonline.php
>>>>>> Disallow: /en/forum/adm
>>>>>> Disallow: /en/forum/cache
>>>>>> Disallow: /en/forum/docs
>>>>>> Disallow: /en/forum/files
>>>>>> Disallow: /en/forum/images
>>>>>> Disallow: /en/forum/includes
>>>>>> Disallow: /en/forum/language
>>>>>> Disallow: /en/forum/store
>>>>>> Disallow: /en/forum/styles
>>>>>> Disallow: /es/forum/common.php
>>>>>> Disallow: /es/forum/config.php
>>>>>> Disallow: /es/forum/con.php
>>>>>> Disallow: /es/forum/faq.php
>>>>>> Disallow: /es/forum/mcp.php
>>>>>> Disallow: /es/forum/memberlist.php
>>>>>> Disallow: /es/forum/posting.php
>>>>>> Disallow: /es/forum/report.php
>>>>>> Disallow: /es/forum/search.php
>>>>>> Disallow: /es/forum/style.php
>>>>>> Disallow: /es/forum/ucp.php
>>>>>> Disallow: /es/forum/viewonline.php
>>>>>> Disallow: /es/forum/adm
>>>>>> Disallow: /es/forum/cache
>>>>>> Disallow: /es/forum/docs
>>>>>> Disallow: /es/forum/files
>>>>>> Disallow: /es/forum/images
>>>>>> Disallow: /es/forum/includes
>>>>>> Disallow: /es/forum/language
>>>>>> Disallow: /es/forum/store
>>>>>> Disallow: /es/forum/styles
>>>>>> Disallow: /fr/forum/common.php
>>>>>> Disallow: /fr/forum/config.php
>>>>>> Disallow: /fr/forum/con.php
>>>>>> Disallow: /fr/forum/faq.php
>>>>>> Disallow: /fr/forum/mcp.php
>>>>>> Disallow: /fr/forum/memberlist.php
>>>>>> Disallow: /fr/forum/posting.php
>>>>>> Disallow: /fr/forum/report.php
>>>>>> Disallow: /fr/forum/search.php
>>>>>> Disallow: /fr/forum/style.php
>>>>>> Disallow: /fr/forum/ucp.php
>>>>>> Disallow: /fr/forum/viewonline.php
>>>>>> Disallow: /fr/forum/adm
>>>>>> Disallow: /fr/forum/cache
>>>>>> Disallow: /fr/forum/docs
>>>>>> Disallow: /fr/forum/files
>>>>>> Disallow: /fr/forum/images
>>>>>> Disallow: /fr/forum/includes
>>>>>> Disallow: /fr/forum/language
>>>>>> Disallow: /fr/forum/store
>>>>>> Disallow: /fr/forum/styles
>>>>>> Disallow: /fr/ci-joint
>>>>>> Disallow: /hu/forum/common.php
>>>>>> Disallow: /hu/forum/config.php
>>>>>> Disallow: /hu/forum/con.php
>>>>>> Disallow: /hu/forum/faq.php
>>>>>> Disallow: /hu/forum/mcp.php
>>>>>> Disallow: /hu/forum/memberlist.php
>>>>>> Disallow: /hu/forum/posting.php
>>>>>> Disallow: /hu/forum/report.php
>>>>>> Disallow: /hu/forum/search.php
>>>>>> Disallow: /hu/forum/style.php
>>>>>> Disallow: /hu/forum/ucp.php
>>>>>> Disallow: /hu/forum/viewonline.php
>>>>>> Disallow: /hu/forum/adm
>>>>>> Disallow: /hu/forum/cache
>>>>>> Disallow: /hu/forum/docs
>>>>>> Disallow: /hu/forum/files
>>>>>> Disallow: /hu/forum/images
>>>>>> Disallow: /hu/forum/includes
>>>>>> Disallow: /hu/forum/language
>>>>>> Disallow: /hu/forum/store
>>>>>> Disallow: /hu/forum/styles
>>>>>> Disallow: /ja/forum/common.php
>>>>>> Disallow: /ja/forum/config.php
>>>>>> Disallow: /ja/forum/con.php
>>>>>> Disallow: /ja/forum/faq.php
>>>>>> Disallow: /ja/forum/mcp.php
>>>>>> Disallow: /ja/forum/memberlist.php
>>>>>> Disallow: /ja/forum/posting.php
>>>>>> Disallow: /ja/forum/report.php
>>>>>> Disallow: /ja/forum/search.php
>>>>>> Disallow: /ja/forum/style.php
>>>>>> Disallow: /ja/forum/ucp.php
>>>>>> Disallow: /ja/forum/viewonline.php
>>>>>> Disallow: /ja/forum/adm
>>>>>> Disallow: /ja/forum/cache
>>>>>> Disallow: /ja/forum/docs
>>>>>> Disallow: /ja/forum/files
>>>>>> Disallow: /ja/forum/images
>>>>>> Disallow: /ja/forum/includes
>>>>>> Disallow: /ja/forum/language
>>>>>> Disallow: /ja/forum/store
>>>>>> Disallow: /ja/forum/styles
>>>>>> Disallow: /test
>>>>>> Disallow: /nl/forum/common.php
>>>>>> Disallow: /nl/forum/config.php
>>>>>> Disallow: /nl/forum/con.php
>>>>>> Disallow: /nl/forum/faq.php
>>>>>> Disallow: /nl/forum/mcp.php
>>>>>> Disallow: /nl/forum/memberlist.php
>>>>>> Disallow: /nl/forum/posting.php
>>>>>> Disallow: /nl/forum/report.php
>>>>>> Disallow: /nl/forum/search.php
>>>>>> Disallow: /nl/forum/style.php
>>>>>> Disallow: /nl/forum/ucp.php
>>>>>> Disallow: /nl/forum/viewonline.php
>>>>>> Disallow: /nl/forum/adm
>>>>>> Disallow: /nl/forum/cache
>>>>>> Disallow: /nl/forum/docs
>>>>>> Disallow: /nl/forum/files
>>>>>> Disallow: /nl/forum/images
>>>>>> Disallow: /nl/forum/includes
>>>>>> Disallow: /nl/forum/language
>>>>>> Disallow: /nl/forum/store
>>>>>> Disallow: /nl/forum/styles
>>>>>> Disallow: /vi/forum/common.php
>>>>>> Disallow: /vi/forum/config.php
>>>>>> Disallow: /vi/forum/con.php
>>>>>> Disallow: /vi/forum/faq.php
>>>>>> Disallow: /vi/forum/mcp.php
>>>>>> Disallow: /vi/forum/memberlist.php
>>>>>> Disallow: /vi/forum/posting.php
>>>>>> Disallow: /vi/forum/report.php
>>>>>> Disallow: /vi/forum/search.php
>>>>>> Disallow: /vi/forum/style.php
>>>>>> Disallow: /vi/forum/ucp.php
>>>>>> Disallow: /vi/forum/viewonline.php
>>>>>> Disallow: /vi/forum/adm
>>>>>> Disallow: /vi/forum/cache
>>>>>> Disallow: /vi/forum/docs
>>>>>> Disallow: /vi/forum/files
>>>>>> Disallow: /vi/forum/images
>>>>>> Disallow: /vi/forum/includes
>>>>>> Disallow: /vi/forum/language
>>>>>> Disallow: /vi/forum/store
>>>>>> Disallow: /vi/forum/styles
>>>>>> Disallow: /zh/forum/common.php
>>>>>> Disallow: /zh/forum/config.php
>>>>>> Disallow: /zh/forum/con.php
>>>>>> Disallow: /zh/forum/faq.php
>>>>>> Disallow: /zh/forum/mcp.php
>>>>>> Disallow: /zh/forum/memberlist.php
>>>>>> Disallow: /zh/forum/posting.php
>>>>>> Disallow: /zh/forum/report.php
>>>>>> Disallow: /zh/forum/search.php
>>>>>> Disallow: /zh/forum/style.php
>>>>>> Disallow: /zh/forum/ucp.php
>>>>>> Disallow: /zh/forum/viewonline.php
>>>>>> Disallow: /zh/forum/adm
>>>>>> Disallow: /zh/forum/cache
>>>>>> Disallow: /zh/forum/docs
>>>>>> Disallow: /zh/forum/files
>>>>>> Disallow: /zh/forum/images
>>>>>> Disallow: /zh/forum/includes
>>>>>> Disallow: /zh/forum/language
>>>>>> Disallow: /zh/forum/store
>>>>>> Disallow: /zh/forum/styles
>>>>>>
>>>>>> This has been the robots.txt file since: Last-Modified: Sat, 06 Jun 2009 23:40:14 GMT
>>>>>>
>>>>>> Forum search uses phpBB
>>>>>>
>>>>>> We haven’t allowed search engines to crawl forum.openoffice.org since before the Oracle donation to the ASF.
>>>>>>
>>>>>> Crawlers' IP addresses might be blocked by ASF Infra if their usage is excessive. That could explain the 301.
>>>>>>
>>>>>> Regards,
>>>>>> Dave
>>>>>>
>>>>>>> On May 12, 2020, at 3:55 AM, Peter Kovacs <[hidden email]> wrote:
>>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>>
>>>>>>> What I found is that the Google search tool reports the URL forum.openoffice.org as not reachable.
>>>>>>>
>>>>>>> So I checked with DuckDuckGo (my preferred search engine). They don't use a crawler of their own and point to the infrastructure of Google, Bing and Yandex.
>>>>>>>
>>>>>>> I then checked with Bing, but could not figure out how to check a bot's feedback on a URL, so I moved on.
>>>>>>>
>>>>>>> I checked with Yandex. They have a search URL test page, where I entered forum.openoffice.org.
>>>>>>>
>>>>>>> The response is:
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>>
>>>>>>> * Date: Tue, 12 May 2020 10:37:47 GMT
>>>>>>> * Server: Apache/2.4.18 (Ubuntu)
>>>>>>> * Location: https://forum.openoffice.org/
>>>>>>> * Content-Length: 237
>>>>>>> * Keep-Alive: timeout=15, max=100
>>>>>>> * Connection: Keep-Alive
>>>>>>> * Content-Type: text/html; charset=iso-8859-1
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>> HTTP status code: 301 Moved Permanently
>>>>>>> Server response time: 133 ms
>>>>>>> IP address: 54.84.201.130
>>>>>>> Encoding: UTF-8 (unicode-1-1-utf-8, UTF8)
>>>>>>> Page size: 237 B
>>>>>>>
>>>>>>>
>>>>>>> I am not sure what that means. The HTTP status code "Moved Permanently" looks wrong. I just don't know whether it is the return code from our web server or a response code from the crawler.
>>>>>>> I will try to get hold of someone from Infra, or I'll open a ticket.
>>>>>>>
>>>>>>>
>>>>>>> All the best
>>>>>>> Peter
>>>>>>>
>>>>>>> On 12.05.20 at 10:39, Matthias Seidel wrote:
>>>>>>>> Hi Kay,
>>>>>>>>
>>>>>>>> On 12.05.20 at 01:21, Kay Schenk wrote:
>>>>>>>>> On 5/11/20 12:33 PM, Matthias Seidel wrote:
>>>>>>>>>> Hi Kay,
>>>>>>>>>>
>>>>>>>>>> On 11.05.20 at 21:23, Kay Schenk wrote:
>>>>>>>>>>> Hi Peter...
>>>>>>>>>>>
>>>>>>>>>>> Since I am a Google Search admin for www.openoffice.org, and
>>>>>>>>>>> openoffice.apache.org, I got this also. Disclaimer: I have not done
>>>>>>>>>>> ANY work with the Google Search APIs on these sites in quite some time.
>>>>>>>>>>>
>>>>>>>>>>> I actually was NOT aware forum.openoffice.org was set up to use Google
>>>>>>>>>>> Search until I saw this.
>>>>>>>>>> I think I added it to the list when we had a discussion about outdated
>>>>>>>>>> information regarding SourceForge found by Google Search.
>>>>>>>>>>
>>>>>>>>>> But I don't have access to forum.openoffice.org, so I could never
>>>>>>>>>> complete the step.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>>     Matthias
>>>>>>>>> OK. In the top level of the website source, there is a file called
>>>>>>>>> "skeleton.html" which references the following bit of code --
>>>>>>>>>
>>>>>>>>> <!--#include virtual="/scripts/google-analytics.js" -->
>>>>>>>>>
>>>>>>>>> I didn't dig far enough to find out how "skeleton.html" is used (I
>>>>>>>>> forgot), but this is an example of the google-analytics code snippet
>>>>>>>>> that is used. Basically, this needs to be included in the site you
>>>>>>>>> want analytics to be used on by putting it in the (header) files that
>>>>>>>>> generate the site. And you might take a look at recent instructions
>>>>>>>>> from Google. Things change.
>>>>>>>>>
>>>>>>>>> https://support.google.com/analytics/answer/1008080
>>>>>>>> Yes, but this is for Google Analytics. I wouldn't want to "analyze" the
>>>>>>>> forum...
>>>>>>>> The procedure for the Google Search Console is the same, it needs access
>>>>>>>> to the root directory.
>>>>>>>>
>>>>>>>> Maybe Andrea can help if he is available again?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>>    Matthias
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Kay
>>>>>>>>>
>>>>>>>>>>> One of the Google Search admins for forum.openoffice.org could check
>>>>>>>>>>> the current Google Search APIs that are in use on that site. Changes
>>>>>>>>>>> are occasionally made to the calls, and maybe that is the issue, or a
>>>>>>>>>>> robots.txt for that site is causing this. I don't think it requires a
>>>>>>>>>>> response, but maybe some investigation.
>>>>>>>>>>>
>>>>>>>>>>> Just some ideas...
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Kay
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 5/11/20 6:02 AM, Peter Kovacs wrote:
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I have received the following mail, probably because I am listed on the
>>>>>>>>>>>> Google Analytics page.
>>>>>>>>>>>>
>>>>>>>>>>>> Does this have any action items? What can we answer Mr. John Mueller?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> All the Best
>>>>>>>>>>>>
>>>>>>>>>>>> Peter
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>> Subject:     Critical issue on forum.openoffice.org and Google Search
>>>>>>>>>>>> Date:     Mon, 11 May 2020 13:37:27 +0200
>>>>>>>>>>>> From:     John Mueller <[hidden email]>
>>>>>>>>>>>> To:     [hidden email], [hidden email], [hidden email]
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Dear webmaster of forum.openoffice.org
>>>>>>>>>>>>
>>>>>>>>>>>> I'm an analyst at Google in Switzerland. We wanted to bring your
>>>>>>>>>>>> attention to a critical issue with your website, and how it's
>>>>>>>>>>>> available for Google's web search.
>>>>>>>>>>>>
>>>>>>>>>>>> In particular, Googlebot has been unable to crawl URLs from
>>>>>>>>>>>> https://forum.openoffice.org/ . This will cause those pages to drop
>>>>>>>>>>>> out of Google's search results, and will prevent new pages from being
>>>>>>>>>>>> picked up for Search. If you're not aware of this issue, you may be
>>>>>>>>>>>> accidentally blocking these pages from Google Search due to a server
>>>>>>>>>>>> issue. If you need to block Googlebot from crawling pages on your
>>>>>>>>>>>> website, we'd recommend using the robots.txt file instead.
>>>>>>>>>>>>
>>>>>>>>>>>> Should you need to recognize IP addresses of Googlebot requests, you
>>>>>>>>>>>> can use a reverse IP lookup to do so:
>>>>>>>>>>>> https://support.google.com/webmasters/answer/80553
>>>>>>>>>>>>
>>>>>>>>>>>> Should you have any questions, feel free to contact me directly. For
>>>>>>>>>>>> verification purposes, we are sending a copy of this message to your
>>>>>>>>>>>> site's Search Console account.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>> John Mueller ([hidden email])
>>>>>>>>>>>> Webmaster Trends Analyst
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>
>> --
>> Rory O'Farrell <[hidden email]>
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
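
The robots.txt quoted in this thread can also be checked offline. The following is a minimal sketch, assuming Python's standard urllib.robotparser module and only a short excerpt of the quoted rules; the excerpt, the helper name build_parser and the two sample URLs are chosen purely for illustration. It suggests that ordinary topic pages are not disallowed for a crawler identifying as Googlebot, while phpBB's search page is.

```python
from urllib.robotparser import RobotFileParser

# Short excerpt of the rules quoted in this thread (English forum only).
# The full file repeats the same pattern for each language directory.
ROBOTS_EXCERPT = """\
User-agent: *
Crawl-delay: 1
Disallow: /en/forum/posting.php
Disallow: /en/forum/search.php
Disallow: /en/forum/ucp.php
Disallow: /en/forum/adm
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text without any network access (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_EXCERPT)
BASE = "https://forum.openoffice.org"

# A normal topic page: no Disallow rule is a prefix of this path.
topic_allowed = parser.can_fetch(
    "Googlebot", BASE + "/en/forum/viewtopic.php?f=50&t=102021"
)
# The phpBB search page: explicitly disallowed for all user agents.
search_allowed = parser.can_fetch("Googlebot", BASE + "/en/forum/search.php")

print("topic page allowed: ", topic_allowed)
print("search page allowed:", search_allowed)
```

To check the live file instead, the same parser can be pointed at the real robots.txt with set_url() and read(), which does need network access.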


Re: Critical issue on forum.openoffice.org and Google Search

Rory O'Farrell
On Mon, 18 May 2020 18:48:07 +0200
Peter Kovacs <[hidden email]> wrote:

> I am already on it. It has worked for me so far: I get search results. Maybe
> it has to do with the cache.
>
> Not sure.


We were testing on recent results; the figures I gave were for finding "openoffice" which would be used daily in many postings.

Rory

>
> On 18.05.20 at 18:22, Rory O'Farrell wrote:
> > On Mon, 18 May 2020 15:44:42 +0100
> > Rory O'Farrell <[hidden email]> wrote:
> >
> >> On Tue, 12 May 2020 17:41:09 +0200
> >> Peter Kovacs <[hidden email]> wrote:
> >>
> >>> Okay, I had a short debug session with Dave and Humbedooh.
> >>>
> >>> We are now sure that the crawlers are not blocked. The 301 Response
> >>> comes from the fact that Yandex still defaults to http and not https.
> >>
> >> This post on User Forum might be relevant
> >> https://forum.openoffice.org/en/forum/viewtopic.php?f=50&t=102021#p492756
> >>
> >> Rory
> > More detailed examination today shows that
> > Google search in French seems to have dropped out six days ago, in Italian five days ago, and in English about 23rd April - try a search for openoffice and the site specifier.
> >
> > See the above URL for details.
> >
> > Rory
> >
> >
> >>> After I added https to the URL, everything worked fine.
> >>>
> >>> Dave also did a curl request, which worked fine too.
> >>>
> >>>
> >>> We have now agreed that I pass the ball back to Google, with the
> >>> feedback that this looks like a Google-internal issue.
> >>>
> >>> The robots.txt has not been changed for 11 years. Yandex can crawl the
> >>> URL and we can curl the web page. So we think it is a Google issue.
> >>>
> >>>
> >>> I very much appreciated the quick session. Thanks.
> >>>
> >>>
> >>> all the Best
> >>>
> >>> Peter
> >>>
> >>> On 12.05.20 at 17:24, Dave Fisher wrote:
> >>>> It’s not an IP Ban. Infra tells me that would not be a 301.
> >>>>
> >>>> Ah-ha - here is the 301:
> >>>>
> >>>> % curl -D headers http://forum.openoffice.org/
> >>>> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> >>>> <html><head>
> >>>> <title>301 Moved Permanently</title>
> >>>> </head><body>
> >>>> <h1>Moved Permanently</h1>
> >>>> <p>The document has moved <a href="https://forum.openoffice.org/">here</a>.</p>
> >>>> </body></html>
> >>>>
> >>>> Surprising that they cannot shift from HTTP to HTTPS via a 301!
> >>>>
> >>>> Regards,
> >>>> Dave


--
Rory O'Farrell <[hidden email]>
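
The 301 discussed above can be classified mechanically. This is a small sketch, assuming the values reported by the Yandex test page in this thread (a plain http request, status 301, and Location: https://forum.openoffice.org/). The function name is_https_upgrade is hypothetical and not part of any tool mentioned here.

```python
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 307, 308)


def is_https_upgrade(requested_url: str, status: int, location: str) -> bool:
    """Return True if the response merely redirects http:// to https://
    on the same host and path, i.e. it is not a block or an error."""
    if status not in REDIRECT_CODES:
        return False
    src = urlsplit(requested_url)
    dst = urlsplit(location)
    return (
        src.scheme == "http"
        and dst.scheme == "https"
        and src.hostname == dst.hostname
        and (src.path or "/") == (dst.path or "/")
    )


# Values taken from the Yandex test output quoted in this thread.
print(is_https_upgrade(
    "http://forum.openoffice.org/", 301, "https://forum.openoffice.org/"
))
```

A crawler that follows redirects should end up on the https page, so a result of True here points at the redirect being harmless rather than at an IP ban.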



Re: Critical issue on forum.openoffice.org and Google Search

Hagar Delest-2
In reply to this post by Peter Kovacs-3
Hi Peter,

I noticed that Google returns hits nevertheless, but the first line
of the results says that there are no hits for the specified string.

Hagar

On 18/05/2020 at 18:48, Peter Kovacs wrote:

> I am already on it. It has worked for me so far: I get search
> results. Maybe it has to do with the cache.
>
> Not sure.
>
> On 18.05.20 at 18:22, Rory O'Farrell wrote:
>> On Mon, 18 May 2020 15:44:42 +0100
>> Rory O'Farrell <[hidden email]> wrote:
>>
>>> On Tue, 12 May 2020 17:41:09 +0200
>>> Peter Kovacs <[hidden email]> wrote:
>>>
>>>> Okay, I had a short debug session with Dave and Humbedooh.
>>>>
>>>> We are now sure that the crawlers are not blocked. The 301 Response
>>>> comes from the fact that Yandex still defaults to http and not https.
>>>
>>> This post on User Forum might be relevant
>>> https://forum.openoffice.org/en/forum/viewtopic.php?f=50&t=102021#p492756 
>>>
>>>
>>> Rory
>> More detailed examination today shows that
>> Google search in French seems to have dropped out six days ago, in Italian
>> five days ago, and in English about 23rd April - try a search for
>> openoffice and the site specifier.
>>
>> See the above URL for details.
>>
>> Rory
>>
>>
>>>> After I added https to the URL, everything worked fine.
>>>>
>>>> Dave also did a curl request, which worked fine too.
>>>>
>>>>
>>>> We have now agreed that I pass the ball back to Google, with the
>>>> feedback that this looks like a Google-internal issue.
>>>>
>>>> The robots.txt has not been changed for 11 years. Yandex can crawl the
>>>> URL and we can curl the web page. So we think it is a Google issue.
>>>>
>>>>
>>>> I very much appreciated the quick session. Thanks.
>>>>
>>>>
>>>> all the Best
>>>>
>>>> Peter
>>>>
>>>> On 12.05.20 at 17:24, Dave Fisher wrote:
>>>>> It’s not an IP Ban. Infra tells me that would not be a 301.
>>>>>
>>>>> Ah-ha - here is the 301:
>>>>>
>>>>> % curl -D headers http://forum.openoffice.org/
>>>>> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
>>>>> <html><head>
>>>>> <title>301 Moved Permanently</title>
>>>>> </head><body>
>>>>> <h1>Moved Permanently</h1>
>>>>> <p>The document has moved <a
>>>>> href="https://forum.openoffice.org/">here</a>.</p>
>>>>> </body></html>
>>>>>
>>>>> Surprising that they cannot shift from HTTP to HTTPS via a 301!
>>>>>
>>>>> Regards,
>>>>> Dave
>>>>>
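
The 301 in the curl output above is the usual permanent redirect from http:// to https:// on the same host. A minimal Python sketch of that check follows. The function name is_https_upgrade is hypothetical, and the status code and Location value are taken from the responses quoted in this thread rather than fetched live.

```python
from urllib.parse import urlsplit


def is_https_upgrade(requested_url, status, location):
    """Return True if the response is a permanent redirect from http://
    to https:// on the same host (hypothetical helper for illustration)."""
    # 301 and 308 are the permanent redirect status codes.
    if status not in (301, 308) or not location:
        return False
    src = urlsplit(requested_url)
    dst = urlsplit(location)
    return (
        src.scheme == "http"
        and dst.scheme == "https"
        and src.hostname == dst.hostname
    )


# Values as quoted in the thread: a request to the http:// URL answered
# with "301 Moved Permanently" and a Location pointing at https://.
print(is_https_upgrade(
    "http://forum.openoffice.org/", 301, "https://forum.openoffice.org/"
))
```

A crawler that follows redirects should land on the https:// page in this case, so a 301 of this shape is not in itself a sign of a block.
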
>>>>>> On May 12, 2020, at 8:04 AM, Dave Fisher <[hidden email]> wrote:
>>>>>>
>>>>>> Information about Infra IP Bans is here:
>>>>>> https://infra.apache.org/infra-ban.html
>>>>>>
>>>>>> Please direct the Google engineer to that resource.
>>>>>>
>>>>>> Regards,
>>>>>> Dave
>>>>>>
>>>>>>> On May 12, 2020, at 7:55 AM, Dave Fisher <[hidden email]> wrote:
>>>>>>>
>>>>>>> Are you sure you weren’t using forums.openoffice.org instead of
>>>>>>> forum.openoffice.org?
>>>>>>>
>>>>>>> curl -D headers https://forum.openoffice.org/ does return the
>>>>>>> correct page.
>>>>>>>
>>>>>>> The robots.txt is this:
>>>>>>>
>>>>>>> curl -D headers https://forum.openoffice.org/robots.txt
>>>>>>> User-agent: *
>>>>>>> Crawl-delay: 1
>>>>>>> Disallow: /en/forum/common.php
>>>>>>> Disallow: /en/forum/config.php
>>>>>>> Disallow: /en/forum/con.php
>>>>>>> Disallow: /en/forum/faq.php
>>>>>>> Disallow: /en/forum/mcp.php
>>>>>>> Disallow: /en/forum/memberlist.php
>>>>>>> Disallow: /en/forum/posting.php
>>>>>>> Disallow: /en/forum/report.php
>>>>>>> Disallow: /en/forum/search.php
>>>>>>> Disallow: /en/forum/style.php
>>>>>>> Disallow: /en/forum/ucp.php
>>>>>>> Disallow: /en/forum/viewonline.php
>>>>>>> Disallow: /en/forum/adm
>>>>>>> Disallow: /en/forum/cache
>>>>>>> Disallow: /en/forum/docs
>>>>>>> Disallow: /en/forum/files
>>>>>>> Disallow: /en/forum/images
>>>>>>> Disallow: /en/forum/includes
>>>>>>> Disallow: /en/forum/language
>>>>>>> Disallow: /en/forum/store
>>>>>>> Disallow: /en/forum/styles
>>>>>>> Disallow: /es/forum/common.php
>>>>>>> Disallow: /es/forum/config.php
>>>>>>> Disallow: /es/forum/con.php
>>>>>>> Disallow: /es/forum/faq.php
>>>>>>> Disallow: /es/forum/mcp.php
>>>>>>> Disallow: /es/forum/memberlist.php
>>>>>>> Disallow: /es/forum/posting.php
>>>>>>> Disallow: /es/forum/report.php
>>>>>>> Disallow: /es/forum/search.php
>>>>>>> Disallow: /es/forum/style.php
>>>>>>> Disallow: /es/forum/ucp.php
>>>>>>> Disallow: /es/forum/viewonline.php
>>>>>>> Disallow: /es/forum/adm
>>>>>>> Disallow: /es/forum/cache
>>>>>>> Disallow: /es/forum/docs
>>>>>>> Disallow: /es/forum/files
>>>>>>> Disallow: /es/forum/images
>>>>>>> Disallow: /es/forum/includes
>>>>>>> Disallow: /es/forum/language
>>>>>>> Disallow: /es/forum/store
>>>>>>> Disallow: /es/forum/styles
>>>>>>> Disallow: /fr/forum/common.php
>>>>>>> Disallow: /fr/forum/config.php
>>>>>>> Disallow: /fr/forum/con.php
>>>>>>> Disallow: /fr/forum/faq.php
>>>>>>> Disallow: /fr/forum/mcp.php
>>>>>>> Disallow: /fr/forum/memberlist.php
>>>>>>> Disallow: /fr/forum/posting.php
>>>>>>> Disallow: /fr/forum/report.php
>>>>>>> Disallow: /fr/forum/search.php
>>>>>>> Disallow: /fr/forum/style.php
>>>>>>> Disallow: /fr/forum/ucp.php
>>>>>>> Disallow: /fr/forum/viewonline.php
>>>>>>> Disallow: /fr/forum/adm
>>>>>>> Disallow: /fr/forum/cache
>>>>>>> Disallow: /fr/forum/docs
>>>>>>> Disallow: /fr/forum/files
>>>>>>> Disallow: /fr/forum/images
>>>>>>> Disallow: /fr/forum/includes
>>>>>>> Disallow: /fr/forum/language
>>>>>>> Disallow: /fr/forum/store
>>>>>>> Disallow: /fr/forum/styles
>>>>>>> Disallow: /fr/ci-joint
>>>>>>> Disallow: /hu/forum/common.php
>>>>>>> Disallow: /hu/forum/config.php
>>>>>>> Disallow: /hu/forum/con.php
>>>>>>> Disallow: /hu/forum/faq.php
>>>>>>> Disallow: /hu/forum/mcp.php
>>>>>>> Disallow: /hu/forum/memberlist.php
>>>>>>> Disallow: /hu/forum/posting.php
>>>>>>> Disallow: /hu/forum/report.php
>>>>>>> Disallow: /hu/forum/search.php
>>>>>>> Disallow: /hu/forum/style.php
>>>>>>> Disallow: /hu/forum/ucp.php
>>>>>>> Disallow: /hu/forum/viewonline.php
>>>>>>> Disallow: /hu/forum/adm
>>>>>>> Disallow: /hu/forum/cache
>>>>>>> Disallow: /hu/forum/docs
>>>>>>> Disallow: /hu/forum/files
>>>>>>> Disallow: /hu/forum/images
>>>>>>> Disallow: /hu/forum/includes
>>>>>>> Disallow: /hu/forum/language
>>>>>>> Disallow: /hu/forum/store
>>>>>>> Disallow: /hu/forum/styles
>>>>>>> Disallow: /ja/forum/common.php
>>>>>>> Disallow: /ja/forum/config.php
>>>>>>> Disallow: /ja/forum/con.php
>>>>>>> Disallow: /ja/forum/faq.php
>>>>>>> Disallow: /ja/forum/mcp.php
>>>>>>> Disallow: /ja/forum/memberlist.php
>>>>>>> Disallow: /ja/forum/posting.php
>>>>>>> Disallow: /ja/forum/report.php
>>>>>>> Disallow: /ja/forum/search.php
>>>>>>> Disallow: /ja/forum/style.php
>>>>>>> Disallow: /ja/forum/ucp.php
>>>>>>> Disallow: /ja/forum/viewonline.php
>>>>>>> Disallow: /ja/forum/adm
>>>>>>> Disallow: /ja/forum/cache
>>>>>>> Disallow: /ja/forum/docs
>>>>>>> Disallow: /ja/forum/files
>>>>>>> Disallow: /ja/forum/images
>>>>>>> Disallow: /ja/forum/includes
>>>>>>> Disallow: /ja/forum/language
>>>>>>> Disallow: /ja/forum/store
>>>>>>> Disallow: /ja/forum/styles
>>>>>>> Disallow: /test
>>>>>>> Disallow: /nl/forum/common.php
>>>>>>> Disallow: /nl/forum/config.php
>>>>>>> Disallow: /nl/forum/con.php
>>>>>>> Disallow: /nl/forum/faq.php
>>>>>>> Disallow: /nl/forum/mcp.php
>>>>>>> Disallow: /nl/forum/memberlist.php
>>>>>>> Disallow: /nl/forum/posting.php
>>>>>>> Disallow: /nl/forum/report.php
>>>>>>> Disallow: /nl/forum/search.php
>>>>>>> Disallow: /nl/forum/style.php
>>>>>>> Disallow: /nl/forum/ucp.php
>>>>>>> Disallow: /nl/forum/viewonline.php
>>>>>>> Disallow: /nl/forum/adm
>>>>>>> Disallow: /nl/forum/cache
>>>>>>> Disallow: /nl/forum/docs
>>>>>>> Disallow: /nl/forum/files
>>>>>>> Disallow: /nl/forum/images
>>>>>>> Disallow: /nl/forum/includes
>>>>>>> Disallow: /nl/forum/language
>>>>>>> Disallow: /nl/forum/store
>>>>>>> Disallow: /nl/forum/styles
>>>>>>> Disallow: /vi/forum/common.php
>>>>>>> Disallow: /vi/forum/config.php
>>>>>>> Disallow: /vi/forum/con.php
>>>>>>> Disallow: /vi/forum/faq.php
>>>>>>> Disallow: /vi/forum/mcp.php
>>>>>>> Disallow: /vi/forum/memberlist.php
>>>>>>> Disallow: /vi/forum/posting.php
>>>>>>> Disallow: /vi/forum/report.php
>>>>>>> Disallow: /vi/forum/search.php
>>>>>>> Disallow: /vi/forum/style.php
>>>>>>> Disallow: /vi/forum/ucp.php
>>>>>>> Disallow: /vi/forum/viewonline.php
>>>>>>> Disallow: /vi/forum/adm
>>>>>>> Disallow: /vi/forum/cache
>>>>>>> Disallow: /vi/forum/docs
>>>>>>> Disallow: /vi/forum/files
>>>>>>> Disallow: /vi/forum/images
>>>>>>> Disallow: /vi/forum/includes
>>>>>>> Disallow: /vi/forum/language
>>>>>>> Disallow: /vi/forum/store
>>>>>>> Disallow: /vi/forum/styles
>>>>>>> Disallow: /zh/forum/common.php
>>>>>>> Disallow: /zh/forum/config.php
>>>>>>> Disallow: /zh/forum/con.php
>>>>>>> Disallow: /zh/forum/faq.php
>>>>>>> Disallow: /zh/forum/mcp.php
>>>>>>> Disallow: /zh/forum/memberlist.php
>>>>>>> Disallow: /zh/forum/posting.php
>>>>>>> Disallow: /zh/forum/report.php
>>>>>>> Disallow: /zh/forum/search.php
>>>>>>> Disallow: /zh/forum/style.php
>>>>>>> Disallow: /zh/forum/ucp.php
>>>>>>> Disallow: /zh/forum/viewonline.php
>>>>>>> Disallow: /zh/forum/adm
>>>>>>> Disallow: /zh/forum/cache
>>>>>>> Disallow: /zh/forum/docs
>>>>>>> Disallow: /zh/forum/files
>>>>>>> Disallow: /zh/forum/images
>>>>>>> Disallow: /zh/forum/includes
>>>>>>> Disallow: /zh/forum/language
>>>>>>> Disallow: /zh/forum/store
>>>>>>> Disallow: /zh/forum/styles
>>>>>>>
>>>>>>> This has been the robots.txt file since: Last-Modified: Sat, 06
>>>>>>> Jun 2009 23:40:14 GMT
>>>>>>>
>>>>>>> Forum search uses phpBB
>>>>>>>
>>>>>>> We haven’t allowed search engines to crawl forum.openoffice.org
>>>>>>> since before the Oracle donation to the ASF.
>>>>>>>
>>>>>>> Crawlers' IP addresses might be blocked by ASF Infra if their use
>>>>>>> is excessive. That could give the 301.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Dave
>>>>>>>
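
One way to check what the quoted robots.txt actually permits is Python's standard urllib.robotparser. The sketch below parses a short excerpt of the rules listed above (not the full file) and asks whether a crawler identifying as Googlebot may fetch two paths. The excerpt and the helper name build_parser are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Short excerpt of the rules quoted in this thread, not the complete file.
ROBOTS_EXCERPT = """\
User-agent: *
Crawl-delay: 1
Disallow: /en/forum/search.php
Disallow: /en/forum/adm
Disallow: /test
"""


def build_parser(robots_text):
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_EXCERPT)
base = "https://forum.openoffice.org"

# A topic page is not covered by any Disallow rule in the excerpt.
print(parser.can_fetch("Googlebot", base + "/en/forum/viewtopic.php"))
# The search script is explicitly disallowed.
print(parser.can_fetch("Googlebot", base + "/en/forum/search.php"))
```

Under these rules, ordinary topic pages are allowed for every user agent and only the listed scripts and directories are disallowed.
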
>>>>>>>> On May 12, 2020, at 3:55 AM, Peter Kovacs <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>>
>>>>>>>> What I found is that the URL forum.openoffice.org is not reachable
>>>>>>>> from the Google search tool.
>>>>>>>>
>>>>>>>> So I checked with DuckDuckGo (my preferred search engine). They
>>>>>>>> don't use a crawler and point to the infra of Google, Bing and
>>>>>>>> Yandex.
>>>>>>>>
>>>>>>>> I then checked with Bing, but could not figure out how to check the
>>>>>>>> bot's feedback on a URL, so I moved on.
>>>>>>>>
>>>>>>>> I checked with Yandex. They have a search URL test page, where I
>>>>>>>> entered forum.openoffice.org.
>>>>>>>>
>>>>>>>> The Response is:
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>> * Date: Tue, 12 May 2020 10:37:47 GMT
>>>>>>>> * Server: Apache/2.4.18 (Ubuntu)
>>>>>>>> * Location: https://forum.openoffice.org/
>>>>>>>> * Content-Length: 237
>>>>>>>> * Keep-Alive: timeout=15, max=100
>>>>>>>> * Connection: Keep-Alive
>>>>>>>> * Content-Type: text/html; charset=iso-8859-1
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> HTTP status code     301 Moved Permanently
>>>>>>>> Server response time     133 ms
>>>>>>>> IP address     54.84.201.130
>>>>>>>> Encoding     UTF-8(unicode-1-1-utf-8, UTF8)
>>>>>>>> Page size     237 B
>>>>>>>>
>>>>>>>>
>>>>>>>> I am not sure what that means. The HTTP status code "301 Moved
>>>>>>>> Permanently" looks wrong. I just don't know if this is the return
>>>>>>>> code from our web server or a response code from the crawler.
>>>>>>>> I will try to get someone from Infra, or I'll open a ticket.
>>>>>>>>
>>>>>>>>
>>>>>>>> All the best
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> On 12.05.20 at 10:39, Matthias Seidel wrote:
>>>>>>>>> Hi Kay,
>>>>>>>>>
>>>>>>>>> On 12.05.20 at 01:21, Kay Schenk wrote:
>>>>>>>>>> On 5/11/20 12:33 PM, Matthias Seidel wrote:
>>>>>>>>>>> Hi Kay,
>>>>>>>>>>>
>>>>>>>>>>> On 11.05.20 at 21:23, Kay Schenk wrote:
>>>>>>>>>>>> Hi Peter...
>>>>>>>>>>>>
>>>>>>>>>>>> Since I am a Google Search admin for www.openoffice.org, and
>>>>>>>>>>>> openoffice.apache.org, I got this also. Disclaimer: I have
>>>>>>>>>>>> not done
>>>>>>>>>>>> ANY work with the Google Search apis on these sites in
>>>>>>>>>>>> quite some time.
>>>>>>>>>>>>
>>>>>>>>>>>> I actually was NOT aware forum.openoffice.org was set up to
>>>>>>>>>>>> use Google
>>>>>>>>>>>> Search until I saw this.
>>>>>>>>>>> I think, I added it to the list when we had a discussion
>>>>>>>>>>> about outdated
>>>>>>>>>>> information regarding SourceForge found by Google Search.
>>>>>>>>>>>
>>>>>>>>>>> But I don't have access to forum.openoffice.org, so I could
>>>>>>>>>>> never
>>>>>>>>>>> complete the step.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>>     Matthias
>>>>>>>>>> OK. In the top level of the website source, there is a file
>>>>>>>>>> called
>>>>>>>>>> "skeleton.html" which references the following bit of code --
>>>>>>>>>>
>>>>>>>>>> <!--#include virtual="/scripts/google-analytics.js" -->
>>>>>>>>>>
>>>>>>>>>> I didn't dig far enough to find how "skeleton.html" is used (I
>>>>>>>>>> forgot), but this is an example of the google-analytics code
>>>>>>>>>> snippet that is used. Basically, this needs to be included in the
>>>>>>>>>> site you want analytics to be used on by putting it in the (header)
>>>>>>>>>> files that generate the site. And you might take a look at recent
>>>>>>>>>> instructions from Google. Things change.
>>>>>>>>>>
>>>>>>>>>> https://support.google.com/analytics/answer/1008080
>>>>>>>>> Yes, but this is for Google Analytics. I wouldn't want to
>>>>>>>>> "analyze" the
>>>>>>>>> forum...
>>>>>>>>> The procedure for the Google Search Console is the same, it
>>>>>>>>> needs access
>>>>>>>>> to the root directory.
>>>>>>>>>
>>>>>>>>> Maybe Andrea can help if he is available again?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>>    Matthias
>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Kay
>>>>>>>>>>
>>>>>>>>>>>> One of the Google Search admins for forum.openoffice.org
>>>>>>>>>>>> could check
>>>>>>>>>>>> the current Google search apis that are in use on that
>>>>>>>>>>>> site. Changes
>>>>>>>>>>>> are occasionally made to the calls, and maybe that is the
>>>>>>>>>>>> issue, or a
>>>>>>>>>>>> robots.txt for that site is causing this. I don't think it
>>>>>>>>>>>> requires a
>>>>>>>>>>>> response, but maybe some investigation.
>>>>>>>>>>>>
>>>>>>>>>>>> Just some ideas...
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Kay
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 5/11/20 6:02 AM, Peter Kovacs wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have received the following mail, probably because I am
>>>>>>>>>>>>> listed in the
>>>>>>>>>>>>> google-Analytics page.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does this have any action items? What can we answer Mr
>>>>>>>>>>>>> John Mueller?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> All the Best
>>>>>>>>>>>>>
>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>> Subject:     Critical issue on forum.openoffice.org and
>>>>>>>>>>>>> Google Search
>>>>>>>>>>>>> Date:     Mon, 11 May 2020 13:37:27 +0200
>>>>>>>>>>>>> From:     John Mueller <[hidden email]>
>>>>>>>>>>>>> To:     [hidden email], [hidden email],
>>>>>>>>>>>>> [hidden email]
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dear webmaster of forum.openoffice.org
>>>>>>>>>>>>> <http://forum.openoffice.org>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm an analyst at Google in Switzerland. We wanted to
>>>>>>>>>>>>> bring your
>>>>>>>>>>>>> attention to a critical issue with your website, and how it's
>>>>>>>>>>>>> available for Google's web search.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In particular, Googlebot has been unable to crawl URLs from
>>>>>>>>>>>>> https://forum.openoffice.org/ . This will cause those
>>>>>>>>>>>>> pages to drop
>>>>>>>>>>>>> out of Google's search results, and will prevent new pages
>>>>>>>>>>>>> from being
>>>>>>>>>>>>> picked up for Search. If you're not aware of this issue,
>>>>>>>>>>>>> you may be
>>>>>>>>>>>>> accidentally blocking these pages from Google Search due
>>>>>>>>>>>>> to a server
>>>>>>>>>>>>> issue. If you need to block Googlebot from crawling pages
>>>>>>>>>>>>> on your
>>>>>>>>>>>>> website, we'd recommend using the robots.txt file instead.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Should you need to recognize IP addresses of Googlebot
>>>>>>>>>>>>> requests, you
>>>>>>>>>>>>> can use a reverse IP lookup to do so:
>>>>>>>>>>>>> https://support.google.com/webmasters/answer/80553
>>>>>>>>>>>>>
>>>>>>>>>>>>> Should you have any questions, feel free to contact me
>>>>>>>>>>>>> directly. For
>>>>>>>>>>>>> verification purposes, we are sending a copy of this
>>>>>>>>>>>>> message to your
>>>>>>>>>>>>> site's Search Console account.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>> John Mueller ([hidden email] <mailto:[hidden email]>)
>>>>>>>>>>>>> Webmaster Trends Analyst
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
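
The reverse IP lookup that the Google mail recommends can be sketched in Python as below. The accepted host suffixes (googlebot.com and google.com) are my reading of Google's published guidance and should be checked against the linked support page. Both function names are hypothetical, and verify_googlebot needs working DNS to return anything useful.

```python
import socket

# Assumed suffixes for Google crawler hostnames; confirm against Google's docs.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def hostname_looks_like_googlebot(hostname):
    """Pure string check on the name returned by the reverse lookup."""
    return hostname.lower().rstrip(".").endswith(GOOGLE_CRAWLER_SUFFIXES)


def verify_googlebot(ip_address):
    """Reverse lookup, suffix check, then forward lookup of the name.

    Returns True only if the forward lookup maps back to the same IP.
    """
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip_address)
    except OSError:
        return False
    if not hostname_looks_like_googlebot(hostname):
        return False
    try:
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip_address in forward_ips
```

The forward lookup matters because anyone who controls the reverse DNS for their own address range can make an IP claim to be a googlebot.com host. Only the round trip back to the same IP confirms it.
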




Re: Critical issue on forum.openoffice.org and Google Search

Peter Kovacs-3
What is your search string? I do not get the line saying that Google has no hits.

On 18.05.20 at 22:20, Hagar Delest wrote:

> Hi Peter,
>
> I noticed that Google returns hits nevertheless, but the first line of
> the results says that there are no hits for the specified string.
>
> Hagar
>
> On 18/05/2020 at 18:48, Peter Kovacs wrote:
>> I am already on it. It has worked for me so far: I get search
>> results. Maybe it has to do with the cache.
>>
>> Not sure.
>>
>> On 18.05.20 at 18:22, Rory O'Farrell wrote:
>>> On Mon, 18 May 2020 15:44:42 +0100
>>> Rory O'Farrell <[hidden email]> wrote:
>>>
>>>> On Tue, 12 May 2020 17:41:09 +0200
>>>> Peter Kovacs <[hidden email]> wrote:
>>>>
>>>>> Okay, I had a short debug session with Dave and Humbedooh.
>>>>>
>>>>> We are now sure that the crawlers are not blocked. The 301 Response
>>>>> comes from the fact that Yandex still defaults to http and not https.
>>>>
>>>> This post on User Forum might be relevant
>>>> https://forum.openoffice.org/en/forum/viewtopic.php?f=50&t=102021#p492756 
>>>>
>>>>
>>>> Rory
>>> More detailed examination today shows that
>>> Google search in French seems to have dropped out six days ago, in Italian
>>> five days ago, and in English about 23rd April - try a search for
>>> openoffice and the site specifier
>>>
>>> See the above URL for details.
>>>
>>> Rory
>>>
>>>
>>>>> After I added https to the URL, everything worked fine.
>>>>>
>>>>> Dave also did a curl request, which worked fine as well.
>>>>>
>>>>>
>>>>> We have now agreed that I will pass the ball back to Google, with the
>>>>> feedback that this looks like a Google-internal issue.
>>>>>
>>>>> The robots.txt has not been changed for 11 years. Yandex can crawl the
>>>>> URL and we can curl the web page, so we think it is a Google issue.
>>>>>
>>>>>
>>>>> I very much appreciated the quick session. Thanks.
>>>>>
>>>>>
>>>>> all the Best
>>>>>
>>>>> Peter
>>>>>
>>>>> On 12.05.20 at 17:24, Dave Fisher wrote:
>>>>>> It’s not an IP Ban. Infra tells me that would not be a 301.
>>>>>>
>>>>>> Ah-ha - here is the 301:
>>>>>>
>>>>>> % curl -D headers http://forum.openoffice.org/
>>>>>> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
>>>>>> <html><head>
>>>>>> <title>301 Moved Permanently</title>
>>>>>> </head><body>
>>>>>> <h1>Moved Permanently</h1>
>>>>>> <p>The document has moved <a
>>>>>> href="https://forum.openoffice.org/">here</a>.</p>
>>>>>> </body></html>
>>>>>>
>>>>>> Surprising that they cannot shift from HTTP to HTTPS via a 301!
>>>>>>
>>>>>> Regards,
>>>>>> Dave
>>>>>>
>>>>>>> On May 12, 2020, at 8:04 AM, Dave Fisher <[hidden email]> wrote:
>>>>>>>
>>>>>>> Information about Infra IP Bans is here:
>>>>>>> https://infra.apache.org/infra-ban.html
>>>>>>>
>>>>>>> Please direct the Google engineer to that resource.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Dave
>>>>>>>
>>>>>>>> On May 12, 2020, at 7:55 AM, Dave Fisher <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> Are you sure you weren’t using forums.openoffice.org instead of
>>>>>>>> forum.openoffice.org?
>>>>>>>>
>>>>>>>> curl -D headers https://forum.openoffice.org/ does return the
>>>>>>>> correct page.
>>>>>>>>
>>>>>>>> The robots.txt is this:
>>>>>>>>
>>>>>>>> curl -D headers https://forum.openoffice.org/robots.txt
>>>>>>>> User-agent: *
>>>>>>>> Crawl-delay: 1
>>>>>>>> Disallow: /en/forum/common.php
>>>>>>>> Disallow: /en/forum/config.php
>>>>>>>> Disallow: /en/forum/con.php
>>>>>>>> Disallow: /en/forum/faq.php
>>>>>>>> Disallow: /en/forum/mcp.php
>>>>>>>> Disallow: /en/forum/memberlist.php
>>>>>>>> Disallow: /en/forum/posting.php
>>>>>>>> Disallow: /en/forum/report.php
>>>>>>>> Disallow: /en/forum/search.php
>>>>>>>> Disallow: /en/forum/style.php
>>>>>>>> Disallow: /en/forum/ucp.php
>>>>>>>> Disallow: /en/forum/viewonline.php
>>>>>>>> Disallow: /en/forum/adm
>>>>>>>> Disallow: /en/forum/cache
>>>>>>>> Disallow: /en/forum/docs
>>>>>>>> Disallow: /en/forum/files
>>>>>>>> Disallow: /en/forum/images
>>>>>>>> Disallow: /en/forum/includes
>>>>>>>> Disallow: /en/forum/language
>>>>>>>> Disallow: /en/forum/store
>>>>>>>> Disallow: /en/forum/styles
>>>>>>>> Disallow: /es/forum/common.php
>>>>>>>> Disallow: /es/forum/config.php
>>>>>>>> Disallow: /es/forum/con.php
>>>>>>>> Disallow: /es/forum/faq.php
>>>>>>>> Disallow: /es/forum/mcp.php
>>>>>>>> Disallow: /es/forum/memberlist.php
>>>>>>>> Disallow: /es/forum/posting.php
>>>>>>>> Disallow: /es/forum/report.php
>>>>>>>> Disallow: /es/forum/search.php
>>>>>>>> Disallow: /es/forum/style.php
>>>>>>>> Disallow: /es/forum/ucp.php
>>>>>>>> Disallow: /es/forum/viewonline.php
>>>>>>>> Disallow: /es/forum/adm
>>>>>>>> Disallow: /es/forum/cache
>>>>>>>> Disallow: /es/forum/docs
>>>>>>>> Disallow: /es/forum/files
>>>>>>>> Disallow: /es/forum/images
>>>>>>>> Disallow: /es/forum/includes
>>>>>>>> Disallow: /es/forum/language
>>>>>>>> Disallow: /es/forum/store
>>>>>>>> Disallow: /es/forum/styles
>>>>>>>> Disallow: /fr/forum/common.php
>>>>>>>> Disallow: /fr/forum/config.php
>>>>>>>> Disallow: /fr/forum/con.php
>>>>>>>> Disallow: /fr/forum/faq.php
>>>>>>>> Disallow: /fr/forum/mcp.php
>>>>>>>> Disallow: /fr/forum/memberlist.php
>>>>>>>> Disallow: /fr/forum/posting.php
>>>>>>>> Disallow: /fr/forum/report.php
>>>>>>>> Disallow: /fr/forum/search.php
>>>>>>>> Disallow: /fr/forum/style.php
>>>>>>>> Disallow: /fr/forum/ucp.php
>>>>>>>> Disallow: /fr/forum/viewonline.php
>>>>>>>> Disallow: /fr/forum/adm
>>>>>>>> Disallow: /fr/forum/cache
>>>>>>>> Disallow: /fr/forum/docs
>>>>>>>> Disallow: /fr/forum/files
>>>>>>>> Disallow: /fr/forum/images
>>>>>>>> Disallow: /fr/forum/includes
>>>>>>>> Disallow: /fr/forum/language
>>>>>>>> Disallow: /fr/forum/store
>>>>>>>> Disallow: /fr/forum/styles
>>>>>>>> Disallow: /fr/ci-joint
>>>>>>>> Disallow: /hu/forum/common.php
>>>>>>>> Disallow: /hu/forum/config.php
>>>>>>>> Disallow: /hu/forum/con.php
>>>>>>>> Disallow: /hu/forum/faq.php
>>>>>>>> Disallow: /hu/forum/mcp.php
>>>>>>>> Disallow: /hu/forum/memberlist.php
>>>>>>>> Disallow: /hu/forum/posting.php
>>>>>>>> Disallow: /hu/forum/report.php
>>>>>>>> Disallow: /hu/forum/search.php
>>>>>>>> Disallow: /hu/forum/style.php
>>>>>>>> Disallow: /hu/forum/ucp.php
>>>>>>>> Disallow: /hu/forum/viewonline.php
>>>>>>>> Disallow: /hu/forum/adm
>>>>>>>> Disallow: /hu/forum/cache
>>>>>>>> Disallow: /hu/forum/docs
>>>>>>>> Disallow: /hu/forum/files
>>>>>>>> Disallow: /hu/forum/images
>>>>>>>> Disallow: /hu/forum/includes
>>>>>>>> Disallow: /hu/forum/language
>>>>>>>> Disallow: /hu/forum/store
>>>>>>>> Disallow: /hu/forum/styles
>>>>>>>> Disallow: /ja/forum/common.php
>>>>>>>> Disallow: /ja/forum/config.php
>>>>>>>> Disallow: /ja/forum/con.php
>>>>>>>> Disallow: /ja/forum/faq.php
>>>>>>>> Disallow: /ja/forum/mcp.php
>>>>>>>> Disallow: /ja/forum/memberlist.php
>>>>>>>> Disallow: /ja/forum/posting.php
>>>>>>>> Disallow: /ja/forum/report.php
>>>>>>>> Disallow: /ja/forum/search.php
>>>>>>>> Disallow: /ja/forum/style.php
>>>>>>>> Disallow: /ja/forum/ucp.php
>>>>>>>> Disallow: /ja/forum/viewonline.php
>>>>>>>> Disallow: /ja/forum/adm
>>>>>>>> Disallow: /ja/forum/cache
>>>>>>>> Disallow: /ja/forum/docs
>>>>>>>> Disallow: /ja/forum/files
>>>>>>>> Disallow: /ja/forum/images
>>>>>>>> Disallow: /ja/forum/includes
>>>>>>>> Disallow: /ja/forum/language
>>>>>>>> Disallow: /ja/forum/store
>>>>>>>> Disallow: /ja/forum/styles
>>>>>>>> Disallow: /test
>>>>>>>> Disallow: /nl/forum/common.php
>>>>>>>> Disallow: /nl/forum/config.php
>>>>>>>> Disallow: /nl/forum/con.php
>>>>>>>> Disallow: /nl/forum/faq.php
>>>>>>>> Disallow: /nl/forum/mcp.php
>>>>>>>> Disallow: /nl/forum/memberlist.php
>>>>>>>> Disallow: /nl/forum/posting.php
>>>>>>>> Disallow: /nl/forum/report.php
>>>>>>>> Disallow: /nl/forum/search.php
>>>>>>>> Disallow: /nl/forum/style.php
>>>>>>>> Disallow: /nl/forum/ucp.php
>>>>>>>> Disallow: /nl/forum/viewonline.php
>>>>>>>> Disallow: /nl/forum/adm
>>>>>>>> Disallow: /nl/forum/cache
>>>>>>>> Disallow: /nl/forum/docs
>>>>>>>> Disallow: /nl/forum/files
>>>>>>>> Disallow: /nl/forum/images
>>>>>>>> Disallow: /nl/forum/includes
>>>>>>>> Disallow: /nl/forum/language
>>>>>>>> Disallow: /nl/forum/store
>>>>>>>> Disallow: /nl/forum/styles
>>>>>>>> Disallow: /vi/forum/common.php
>>>>>>>> Disallow: /vi/forum/config.php
>>>>>>>> Disallow: /vi/forum/con.php
>>>>>>>> Disallow: /vi/forum/faq.php
>>>>>>>> Disallow: /vi/forum/mcp.php
>>>>>>>> Disallow: /vi/forum/memberlist.php
>>>>>>>> Disallow: /vi/forum/posting.php
>>>>>>>> Disallow: /vi/forum/report.php
>>>>>>>> Disallow: /vi/forum/search.php
>>>>>>>> Disallow: /vi/forum/style.php
>>>>>>>> Disallow: /vi/forum/ucp.php
>>>>>>>> Disallow: /vi/forum/viewonline.php
>>>>>>>> Disallow: /vi/forum/adm
>>>>>>>> Disallow: /vi/forum/cache
>>>>>>>> Disallow: /vi/forum/docs
>>>>>>>> Disallow: /vi/forum/files
>>>>>>>> Disallow: /vi/forum/images
>>>>>>>> Disallow: /vi/forum/includes
>>>>>>>> Disallow: /vi/forum/language
>>>>>>>> Disallow: /vi/forum/store
>>>>>>>> Disallow: /vi/forum/styles
>>>>>>>> Disallow: /zh/forum/common.php
>>>>>>>> Disallow: /zh/forum/config.php
>>>>>>>> Disallow: /zh/forum/con.php
>>>>>>>> Disallow: /zh/forum/faq.php
>>>>>>>> Disallow: /zh/forum/mcp.php
>>>>>>>> Disallow: /zh/forum/memberlist.php
>>>>>>>> Disallow: /zh/forum/posting.php
>>>>>>>> Disallow: /zh/forum/report.php
>>>>>>>> Disallow: /zh/forum/search.php
>>>>>>>> Disallow: /zh/forum/style.php
>>>>>>>> Disallow: /zh/forum/ucp.php
>>>>>>>> Disallow: /zh/forum/viewonline.php
>>>>>>>> Disallow: /zh/forum/adm
>>>>>>>> Disallow: /zh/forum/cache
>>>>>>>> Disallow: /zh/forum/docs
>>>>>>>> Disallow: /zh/forum/files
>>>>>>>> Disallow: /zh/forum/images
>>>>>>>> Disallow: /zh/forum/includes
>>>>>>>> Disallow: /zh/forum/language
>>>>>>>> Disallow: /zh/forum/store
>>>>>>>> Disallow: /zh/forum/styles
>>>>>>>>
>>>>>>>> This has been the robots.txt file since: Last-Modified: Sat, 06
>>>>>>>> Jun 2009 23:40:14 GMT
>>>>>>>>
>>>>>>>> Forum search uses phpBB
>>>>>>>>
>>>>>>>> We haven’t allowed search engines to crawl forum.openoffice.org
>>>>>>>> since before the Oracle donation to the ASF.
>>>>>>>>
>>>>>>>> Crawlers' IP addresses might be blocked by ASF Infra if their
>>>>>>>> use is excessive. That could give the 301.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Dave
>>>>>>>>
>>>>>>>>> On May 12, 2020, at 3:55 AM, Peter Kovacs <[hidden email]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hello all,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What I found is that the URL forum.openoffice.org is not reachable
>>>>>>>>> from the Google search tool.
>>>>>>>>>
>>>>>>>>> So I checked with DuckDuckGo (my preferred search engine). They
>>>>>>>>> don't use a crawler and point to the infra of Google, Bing and
>>>>>>>>> Yandex.
>>>>>>>>>
>>>>>>>>> I then checked with Bing, but could not figure out how to check the
>>>>>>>>> bot's feedback on a URL, so I moved on.
>>>>>>>>>
>>>>>>>>> I checked with Yandex. They have a search URL test page, where I
>>>>>>>>> entered forum.openoffice.org.
>>>>>>>>>
>>>>>>>>> The Response is:
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> * Date: Tue, 12 May 2020 10:37:47 GMT
>>>>>>>>> * Server: Apache/2.4.18 (Ubuntu)
>>>>>>>>> * Location: https://forum.openoffice.org/
>>>>>>>>> * Content-Length: 237
>>>>>>>>> * Keep-Alive: timeout=15, max=100
>>>>>>>>> * Connection: Keep-Alive
>>>>>>>>> * Content-Type: text/html; charset=iso-8859-1
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> HTTP status code     301 Moved Permanently
>>>>>>>>> Server response time     133 ms
>>>>>>>>> IP address     54.84.201.130
>>>>>>>>> Encoding     UTF-8(unicode-1-1-utf-8, UTF8)
>>>>>>>>> Page size     237 B
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I am not sure what that means. The HTTP status code "301 Moved
>>>>>>>>> Permanently" looks wrong. I just don't know if this is the
>>>>>>>>> return code from our web server or a response code from the
>>>>>>>>> crawler.
>>>>>>>>> I will try to get someone from Infra, or I'll open a ticket.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> All the best
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> On 12.05.20 at 10:39, Matthias Seidel wrote:
>>>>>>>>>> Hi Kay,
>>>>>>>>>>
>>>>>>>>>> On 12.05.20 at 01:21, Kay Schenk wrote:
>>>>>>>>>>> On 5/11/20 12:33 PM, Matthias Seidel wrote:
>>>>>>>>>>>> Hi Kay,
>>>>>>>>>>>>
>>>>>>>>>>>> On 11.05.20 at 21:23, Kay Schenk wrote:
>>>>>>>>>>>>> Hi Peter...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Since I am a Google Search admin for www.openoffice.org, and
>>>>>>>>>>>>> openoffice.apache.org, I got this also. Disclaimer: I have
>>>>>>>>>>>>> not done
>>>>>>>>>>>>> ANY work with the Google Search apis on these sites in
>>>>>>>>>>>>> quite some time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I actually was NOT aware forum.openoffice.org was set up
>>>>>>>>>>>>> to use Google
>>>>>>>>>>>>> Search until I saw this.
>>>>>>>>>>>> I think, I added it to the list when we had a discussion
>>>>>>>>>>>> about outdated
>>>>>>>>>>>> information regarding SourceForge found by Google Search.
>>>>>>>>>>>>
>>>>>>>>>>>> But I don't have access to forum.openoffice.org, so I could
>>>>>>>>>>>> never
>>>>>>>>>>>> complete the step.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>>     Matthias
>>>>>>>>>>> OK. In the top level of the website source, there is a file
>>>>>>>>>>> called
>>>>>>>>>>> "skeleton.html" which references the following bit of code --
>>>>>>>>>>>
>>>>>>>>>>> <!--#include virtual="/scripts/google-analytics.js" -->
>>>>>>>>>>>
>>>>>>>>>>> I didn't dig far enough to find how "skeleton.html" is used (I
>>>>>>>>>>> forgot), but this is an example of the google-analytics code
>>>>>>>>>>> snippet that is used. Basically, this needs to be included in the
>>>>>>>>>>> site you want analytics to be used on by putting it in the (header)
>>>>>>>>>>> files that generate the site. And you might take a look at recent
>>>>>>>>>>> instructions from Google. Things change.
>>>>>>>>>>>
>>>>>>>>>>> https://support.google.com/analytics/answer/1008080
>>>>>>>>>> Yes, but this is for Google Analytics. I wouldn't want to
>>>>>>>>>> "analyze" the
>>>>>>>>>> forum...
>>>>>>>>>> The procedure for the Google Search Console is the same, it
>>>>>>>>>> needs access
>>>>>>>>>> to the root directory.
>>>>>>>>>>
>>>>>>>>>> Maybe Andrea can help if he is available again?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>>    Matthias
>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Kay
>>>>>>>>>>>
>>>>>>>>>>>>> One of the Google Search admins for forum.openoffice.org
>>>>>>>>>>>>> could check
>>>>>>>>>>>>> the current Google search apis that are in use on that
>>>>>>>>>>>>> site. Changes
>>>>>>>>>>>>> are occasionally made to the calls, and maybe that is the
>>>>>>>>>>>>> issue, or a
>>>>>>>>>>>>> robots.txt for that site is causing this. I don't think it
>>>>>>>>>>>>> requires a
>>>>>>>>>>>>> response, but maybe some investigation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just some ideas...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kay
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 5/11/20 6:02 AM, Peter Kovacs wrote:
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have received the following mail, probably because I am
>>>>>>>>>>>>>> listed on the Google Analytics page.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does this have any action items? What can we answer Mr
>>>>>>>>>>>>>> John Mueller?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> All the Best
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>>> Subject:     Critical issue on forum.openoffice.org and
>>>>>>>>>>>>>> Google Search
>>>>>>>>>>>>>> Date:     Mon, 11 May 2020 13:37:27 +0200
>>>>>>>>>>>>>> From:     John Mueller <[hidden email]>
>>>>>>>>>>>>>> To:     [hidden email], [hidden email],
>>>>>>>>>>>>>> [hidden email]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dear webmaster of forum.openoffice.org
>>>>>>>>>>>>>> <http://forum.openoffice.org>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm an analyst at Google in Switzerland. We wanted to
>>>>>>>>>>>>>> bring your
>>>>>>>>>>>>>> attention to a critical issue with your website, and how
>>>>>>>>>>>>>> it's
>>>>>>>>>>>>>> available for Google's web search.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In particular, Googlebot has been unable to crawl URLs from
>>>>>>>>>>>>>> https://forum.openoffice.org/ . This will cause those
>>>>>>>>>>>>>> pages to drop
>>>>>>>>>>>>>> out of Google's search results, and will prevent new
>>>>>>>>>>>>>> pages from being
>>>>>>>>>>>>>> picked up for Search. If you're not aware of this issue,
>>>>>>>>>>>>>> you may be
>>>>>>>>>>>>>> accidentally blocking these pages from Google Search due
>>>>>>>>>>>>>> to a server
>>>>>>>>>>>>>> issue. If you need to block Googlebot from crawling pages
>>>>>>>>>>>>>> on your
>>>>>>>>>>>>>> website, we'd recommend using the robots.txt file instead.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Should you need to recognize IP addresses of Googlebot
>>>>>>>>>>>>>> requests, you
>>>>>>>>>>>>>> can use a reverse IP lookup to do so:
>>>>>>>>>>>>>> https://support.google.com/webmasters/answer/80553
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Should you have any questions, feel free to contact me
>>>>>>>>>>>>>> directly. For
>>>>>>>>>>>>>> verification purposes, we are sending a copy of this
>>>>>>>>>>>>>> message to your
>>>>>>>>>>>>>> site's Search Console account.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>> John Mueller ([hidden email] <mailto:[hidden email]>)
>>>>>>>>>>>>>> Webmaster Trends Analyst
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>
>>>> --
>>>> Rory O'Farrell <[hidden email]>
>>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Critical issue on forum.openoffice.org and Google Search

Hagar Delest-2
On 19/05/2020 at 08:35, Peter Kovacs wrote:
> What is your search string? I do not get the line saying that Google has no hits.
The string is the one in the thread in the forum:
"text lines are overwriting margins" site:forum.openoffice.org

The results page says (in French):
No results found for...
Then it says:
Results for... (without quotes)
followed by a list of topics from the forum, none of which matches the exact
string, of course.

I've posted a screenshot in the forum:
https://forum.openoffice.org/en/forum/viewtopic.php?f=50&t=102021&p=492807#p492807

Hagar

>
> On 18.05.20 at 22:20, Hagar Delest wrote:
>> Hi Peter,
>>
>> I noticed that Google provides hits nevertheless. But the first line
>> does say that there are no hits for the specified string.
>>
>> Hagar
>>
>> On 18/05/2020 at 18:48, Peter Kovacs wrote:
>>> I am already on it. It has worked for me so far: I get search
>>> results. Maybe it has to do with the cache.
>>>
>>> Not sure.
>>>
>>> On 18.05.20 at 18:22, Rory O'Farrell wrote:
>>>> On Mon, 18 May 2020 15:44:42 +0100
>>>> Rory O'Farrell <[hidden email]> wrote:
>>>>
>>>>> On Tue, 12 May 2020 17:41:09 +0200
>>>>> Peter Kovacs <[hidden email]> wrote:
>>>>>
>>>>>> Okay, I had a short debug session with Dave and Humbedooh.
>>>>>>
>>>>>> We are now sure that the crawlers are not blocked. The 301 Response
>>>>>> comes from the fact that Yandex still defaults to http and not
>>>>>> https.
>>>>>
>>>>> This post on User Forum might be relevant
>>>>> https://forum.openoffice.org/en/forum/viewtopic.php?f=50&t=102021#p492756 
>>>>>
>>>>>
>>>>> Rory
>>>> More detailed examination today shows that
>>>> Google search in French seems to have dropped out six days ago, in Italian
>>>> five days ago, and in English around 23rd April. Try a search for
>>>> openoffice with the site specifier.
>>>>
>>>> See the above URL for details.
>>>>
>>>> Rory
>>>>
>>>>
>>>>>> After I added https to the URL, everything worked fine.
>>>>>>
>>>>>> Dave also did a curl request, which worked fine as well.
>>>>>>
>>>>>>
>>>>>> We have now agreed that I will pass the ball back to Google, with the
>>>>>> feedback that this looks like a Google-internal issue.
>>>>>>
>>>>>> The robots.txt has not been changed for 11 years. Yandex can crawl
>>>>>> the URL and we can curl the web page. So we think it is a Google issue.
>>>>>>
>>>>>>
>>>>>> I very much appreciated the quick session. Thanks.
>>>>>>
>>>>>>
>>>>>> all the Best
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>> On 12.05.20 at 17:24, Dave Fisher wrote:
>>>>>>> It’s not an IP Ban. Infra tells me that would not be a 301.
>>>>>>>
>>>>>>> Ah-ha - here is the 301:
>>>>>>>
>>>>>>> % curl -D headers http://forum.openoffice.org/
>>>>>>> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
>>>>>>> <html><head>
>>>>>>> <title>301 Moved Permanently</title>
>>>>>>> </head><body>
>>>>>>> <h1>Moved Permanently</h1>
>>>>>>> <p>The document has moved <a
>>>>>>> href="https://forum.openoffice.org/">here</a>.</p>
>>>>>>> </body></html>
>>>>>>>
>>>>>>> Surprising that they cannot shift from HTTP to HTTPS via a 301!
>>>>>>>
>>>>>>> Regards,
>>>>>>> Dave
>>>>>>>
>>>>>>>> On May 12, 2020, at 8:04 AM, Dave Fisher <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> Information about Infra IP Bans is here:
>>>>>>>> https://infra.apache.org/infra-ban.html
>>>>>>>>
>>>>>>>> Please direct the Google engineer to that resource.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Dave
>>>>>>>>
>>>>>>>>> On May 12, 2020, at 7:55 AM, Dave Fisher <[hidden email]> wrote:
>>>>>>>>>
>>>>>>>>> Are you sure you weren’t using forums.openoffice.org instead
>>>>>>>>> of forum.openoffice.org?
>>>>>>>>>
>>>>>>>>> curl -D headers https://forum.openoffice.org/ does return the
>>>>>>>>> correct page.
>>>>>>>>>
>>>>>>>>> The robots.txt is this:
>>>>>>>>>
>>>>>>>>> curl -D headers https://forum.openoffice.org/robots.txt
>>>>>>>>> User-agent: *
>>>>>>>>> Crawl-delay: 1
>>>>>>>>> Disallow: /en/forum/common.php
>>>>>>>>> Disallow: /en/forum/config.php
>>>>>>>>> Disallow: /en/forum/con.php
>>>>>>>>> Disallow: /en/forum/faq.php
>>>>>>>>> Disallow: /en/forum/mcp.php
>>>>>>>>> Disallow: /en/forum/memberlist.php
>>>>>>>>> Disallow: /en/forum/posting.php
>>>>>>>>> Disallow: /en/forum/report.php
>>>>>>>>> Disallow: /en/forum/search.php
>>>>>>>>> Disallow: /en/forum/style.php
>>>>>>>>> Disallow: /en/forum/ucp.php
>>>>>>>>> Disallow: /en/forum/viewonline.php
>>>>>>>>> Disallow: /en/forum/adm
>>>>>>>>> Disallow: /en/forum/cache
>>>>>>>>> Disallow: /en/forum/docs
>>>>>>>>> Disallow: /en/forum/files
>>>>>>>>> Disallow: /en/forum/images
>>>>>>>>> Disallow: /en/forum/includes
>>>>>>>>> Disallow: /en/forum/language
>>>>>>>>> Disallow: /en/forum/store
>>>>>>>>> Disallow: /en/forum/styles
>>>>>>>>> Disallow: /es/forum/common.php
>>>>>>>>> Disallow: /es/forum/config.php
>>>>>>>>> Disallow: /es/forum/con.php
>>>>>>>>> Disallow: /es/forum/faq.php
>>>>>>>>> Disallow: /es/forum/mcp.php
>>>>>>>>> Disallow: /es/forum/memberlist.php
>>>>>>>>> Disallow: /es/forum/posting.php
>>>>>>>>> Disallow: /es/forum/report.php
>>>>>>>>> Disallow: /es/forum/search.php
>>>>>>>>> Disallow: /es/forum/style.php
>>>>>>>>> Disallow: /es/forum/ucp.php
>>>>>>>>> Disallow: /es/forum/viewonline.php
>>>>>>>>> Disallow: /es/forum/adm
>>>>>>>>> Disallow: /es/forum/cache
>>>>>>>>> Disallow: /es/forum/docs
>>>>>>>>> Disallow: /es/forum/files
>>>>>>>>> Disallow: /es/forum/images
>>>>>>>>> Disallow: /es/forum/includes
>>>>>>>>> Disallow: /es/forum/language
>>>>>>>>> Disallow: /es/forum/store
>>>>>>>>> Disallow: /es/forum/styles
>>>>>>>>> Disallow: /fr/forum/common.php
>>>>>>>>> Disallow: /fr/forum/config.php
>>>>>>>>> Disallow: /fr/forum/con.php
>>>>>>>>> Disallow: /fr/forum/faq.php
>>>>>>>>> Disallow: /fr/forum/mcp.php
>>>>>>>>> Disallow: /fr/forum/memberlist.php
>>>>>>>>> Disallow: /fr/forum/posting.php
>>>>>>>>> Disallow: /fr/forum/report.php
>>>>>>>>> Disallow: /fr/forum/search.php
>>>>>>>>> Disallow: /fr/forum/style.php
>>>>>>>>> Disallow: /fr/forum/ucp.php
>>>>>>>>> Disallow: /fr/forum/viewonline.php
>>>>>>>>> Disallow: /fr/forum/adm
>>>>>>>>> Disallow: /fr/forum/cache
>>>>>>>>> Disallow: /fr/forum/docs
>>>>>>>>> Disallow: /fr/forum/files
>>>>>>>>> Disallow: /fr/forum/images
>>>>>>>>> Disallow: /fr/forum/includes
>>>>>>>>> Disallow: /fr/forum/language
>>>>>>>>> Disallow: /fr/forum/store
>>>>>>>>> Disallow: /fr/forum/styles
>>>>>>>>> Disallow: /fr/ci-joint
>>>>>>>>> Disallow: /hu/forum/common.php
>>>>>>>>> Disallow: /hu/forum/config.php
>>>>>>>>> Disallow: /hu/forum/con.php
>>>>>>>>> Disallow: /hu/forum/faq.php
>>>>>>>>> Disallow: /hu/forum/mcp.php
>>>>>>>>> Disallow: /hu/forum/memberlist.php
>>>>>>>>> Disallow: /hu/forum/posting.php
>>>>>>>>> Disallow: /hu/forum/report.php
>>>>>>>>> Disallow: /hu/forum/search.php
>>>>>>>>> Disallow: /hu/forum/style.php
>>>>>>>>> Disallow: /hu/forum/ucp.php
>>>>>>>>> Disallow: /hu/forum/viewonline.php
>>>>>>>>> Disallow: /hu/forum/adm
>>>>>>>>> Disallow: /hu/forum/cache
>>>>>>>>> Disallow: /hu/forum/docs
>>>>>>>>> Disallow: /hu/forum/files
>>>>>>>>> Disallow: /hu/forum/images
>>>>>>>>> Disallow: /hu/forum/includes
>>>>>>>>> Disallow: /hu/forum/language
>>>>>>>>> Disallow: /hu/forum/store
>>>>>>>>> Disallow: /hu/forum/styles
>>>>>>>>> Disallow: /ja/forum/common.php
>>>>>>>>> Disallow: /ja/forum/config.php
>>>>>>>>> Disallow: /ja/forum/con.php
>>>>>>>>> Disallow: /ja/forum/faq.php
>>>>>>>>> Disallow: /ja/forum/mcp.php
>>>>>>>>> Disallow: /ja/forum/memberlist.php
>>>>>>>>> Disallow: /ja/forum/posting.php
>>>>>>>>> Disallow: /ja/forum/report.php
>>>>>>>>> Disallow: /ja/forum/search.php
>>>>>>>>> Disallow: /ja/forum/style.php
>>>>>>>>> Disallow: /ja/forum/ucp.php
>>>>>>>>> Disallow: /ja/forum/viewonline.php
>>>>>>>>> Disallow: /ja/forum/adm
>>>>>>>>> Disallow: /ja/forum/cache
>>>>>>>>> Disallow: /ja/forum/docs
>>>>>>>>> Disallow: /ja/forum/files
>>>>>>>>> Disallow: /ja/forum/images
>>>>>>>>> Disallow: /ja/forum/includes
>>>>>>>>> Disallow: /ja/forum/language
>>>>>>>>> Disallow: /ja/forum/store
>>>>>>>>> Disallow: /ja/forum/styles
>>>>>>>>> Disallow: /test
>>>>>>>>> Disallow: /nl/forum/common.php
>>>>>>>>> Disallow: /nl/forum/config.php
>>>>>>>>> Disallow: /nl/forum/con.php
>>>>>>>>> Disallow: /nl/forum/faq.php
>>>>>>>>> Disallow: /nl/forum/mcp.php
>>>>>>>>> Disallow: /nl/forum/memberlist.php
>>>>>>>>> Disallow: /nl/forum/posting.php
>>>>>>>>> Disallow: /nl/forum/report.php
>>>>>>>>> Disallow: /nl/forum/search.php
>>>>>>>>> Disallow: /nl/forum/style.php
>>>>>>>>> Disallow: /nl/forum/ucp.php
>>>>>>>>> Disallow: /nl/forum/viewonline.php
>>>>>>>>> Disallow: /nl/forum/adm
>>>>>>>>> Disallow: /nl/forum/cache
>>>>>>>>> Disallow: /nl/forum/docs
>>>>>>>>> Disallow: /nl/forum/files
>>>>>>>>> Disallow: /nl/forum/images
>>>>>>>>> Disallow: /nl/forum/includes
>>>>>>>>> Disallow: /nl/forum/language
>>>>>>>>> Disallow: /nl/forum/store
>>>>>>>>> Disallow: /nl/forum/styles
>>>>>>>>> Disallow: /vi/forum/common.php
>>>>>>>>> Disallow: /vi/forum/config.php
>>>>>>>>> Disallow: /vi/forum/con.php
>>>>>>>>> Disallow: /vi/forum/faq.php
>>>>>>>>> Disallow: /vi/forum/mcp.php
>>>>>>>>> Disallow: /vi/forum/memberlist.php
>>>>>>>>> Disallow: /vi/forum/posting.php
>>>>>>>>> Disallow: /vi/forum/report.php
>>>>>>>>> Disallow: /vi/forum/search.php
>>>>>>>>> Disallow: /vi/forum/style.php
>>>>>>>>> Disallow: /vi/forum/ucp.php
>>>>>>>>> Disallow: /vi/forum/viewonline.php
>>>>>>>>> Disallow: /vi/forum/adm
>>>>>>>>> Disallow: /vi/forum/cache
>>>>>>>>> Disallow: /vi/forum/docs
>>>>>>>>> Disallow: /vi/forum/files
>>>>>>>>> Disallow: /vi/forum/images
>>>>>>>>> Disallow: /vi/forum/includes
>>>>>>>>> Disallow: /vi/forum/language
>>>>>>>>> Disallow: /vi/forum/store
>>>>>>>>> Disallow: /vi/forum/styles
>>>>>>>>> Disallow: /zh/forum/common.php
>>>>>>>>> Disallow: /zh/forum/config.php
>>>>>>>>> Disallow: /zh/forum/con.php
>>>>>>>>> Disallow: /zh/forum/faq.php
>>>>>>>>> Disallow: /zh/forum/mcp.php
>>>>>>>>> Disallow: /zh/forum/memberlist.php
>>>>>>>>> Disallow: /zh/forum/posting.php
>>>>>>>>> Disallow: /zh/forum/report.php
>>>>>>>>> Disallow: /zh/forum/search.php
>>>>>>>>> Disallow: /zh/forum/style.php
>>>>>>>>> Disallow: /zh/forum/ucp.php
>>>>>>>>> Disallow: /zh/forum/viewonline.php
>>>>>>>>> Disallow: /zh/forum/adm
>>>>>>>>> Disallow: /zh/forum/cache
>>>>>>>>> Disallow: /zh/forum/docs
>>>>>>>>> Disallow: /zh/forum/files
>>>>>>>>> Disallow: /zh/forum/images
>>>>>>>>> Disallow: /zh/forum/includes
>>>>>>>>> Disallow: /zh/forum/language
>>>>>>>>> Disallow: /zh/forum/store
>>>>>>>>> Disallow: /zh/forum/styles
>>>>>>>>>
>>>>>>>>> This has been the robots.txt file since: Last-Modified: Sat,
>>>>>>>>> 06 Jun 2009 23:40:14 GMT
>>>>>>>>>
>>>>>>>>> Forum search uses phpBB
>>>>>>>>>
>>>>>>>>> We haven’t allowed search engines to crawl
>>>>>>>>> forum.openoffice.org since before the Oracle donation to the ASF.
>>>>>>>>>
>>>>>>>>> Crawlers' IP addresses might be blocked by ASF Infra if their
>>>>>>>>> use is excessive. That could give the 301.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Dave
>>>>>>>>>
>>>>>>>>>> On May 12, 2020, at 3:55 AM, Peter Kovacs <[hidden email]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hello all,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> What I found is that, according to the Google search tool, the URL
>>>>>>>>>> forum.openoffice.org is not reachable.
>>>>>>>>>>
>>>>>>>>>> So I checked with DuckDuckGo (my preferred search engine);
>>>>>>>>>> they don't use a crawler and point at the infra of Google, Bing
>>>>>>>>>> and Yandex.
>>>>>>>>>>
>>>>>>>>>> I then checked with Bing, but could not figure out how to check
>>>>>>>>>> bot feedback on a URL, so I moved on.
>>>>>>>>>>
>>>>>>>>>> I checked with Yandex. They have a search URL test page, where I
>>>>>>>>>> entered forum.openoffice.org.
>>>>>>>>>>
>>>>>>>>>> The Response is:
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> * Date: Tue, 12 May 2020 10:37:47 GMT
>>>>>>>>>> * Server: Apache/2.4.18 (Ubuntu)
>>>>>>>>>> * Location: https://forum.openoffice.org/
>>>>>>>>>> * Content-Length: 237
>>>>>>>>>> * Keep-Alive: timeout=15, max=100
>>>>>>>>>> * Connection: Keep-Alive
>>>>>>>>>> * Content-Type: text/html; charset=iso-8859-1
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> HTTP status code     301 Moved Permanently
>>>>>>>>>> Server response time     133 ms
>>>>>>>>>> IP address     54.84.201.130
>>>>>>>>>> Encoding     UTF-8(unicode-1-1-utf-8, UTF8)
>>>>>>>>>> Page size     237 B
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I am not sure what that means. The HTTP status code "Moved
>>>>>>>>>> Permanently" looks wrong. I just don't know if this is the
>>>>>>>>>> return code from our web server or a response code from the
>>>>>>>>>> crawler.
>>>>>>>>>> I will try to get hold of someone from Infra, or I'll open a ticket.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> All the best
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>> On 12.05.20 at 10:39, Matthias Seidel wrote:
>>>>>>>>>>> Hi Kay,
>>>>>>>>>>>
>>>>>>>>>>> On 12.05.20 at 01:21, Kay Schenk wrote:
>>>>>>>>>>>> On 5/11/20 12:33 PM, Matthias Seidel wrote:
>>>>>>>>>>>>> Hi Kay,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 11.05.20 at 21:23, Kay Schenk wrote:
>>>>>>>>>>>>>> Hi Peter...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since I am a Google Search admin for www.openoffice.org, and
>>>>>>>>>>>>>> openoffice.apache.org, I got this also. Disclaimer: I
>>>>>>>>>>>>>> have not done
>>>>>>>>>>>>>> ANY work with the Google Search apis on these sites in
>>>>>>>>>>>>>> quite some time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I actually was NOT aware forum.openoffice.org was set up
>>>>>>>>>>>>>> to use Google
>>>>>>>>>>>>>> Search until I saw this.
>>>>>>>>>>>>> I think, I added it to the list when we had a discussion
>>>>>>>>>>>>> about outdated
>>>>>>>>>>>>> information regarding SourceForge found by Google Search.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But I don't have access to forum.openoffice.org, so I
>>>>>>>>>>>>> could never
>>>>>>>>>>>>> complete the step.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>>     Matthias
>>>>>>>>>>>> OK. In the top level of the website source, there is a file
>>>>>>>>>>>> called
>>>>>>>>>>>> "skeleton.html" which references the following bit of code --
>>>>>>>>>>>>
>>>>>>>>>>>> <!--#include virtual="/scripts/google-analytics.js" -->
>>>>>>>>>>>>
>>>>>>>>>>>> I didn't dig far enough to find how "skeleton.html" is used (I
>>>>>>>>>>>> forgot), but this is an example of the google-analytics code
>>>>>>>>>>>> snippet that is used. Basically, this needs to be included in the
>>>>>>>>>>>> site you want analytics on, by putting it in the (header) files
>>>>>>>>>>>> that generate the site. You might also take a look at recent
>>>>>>>>>>>> instructions from Google. Things change.
>>>>>>>>>>>>
>>>>>>>>>>>> https://support.google.com/analytics/answer/1008080
>>>>>>>>>>> Yes, but this is for Google Analytics. I wouldn't want to
>>>>>>>>>>> "analyze" the
>>>>>>>>>>> forum...
>>>>>>>>>>> The procedure for the Google Search Console is the same, it
>>>>>>>>>>> needs access
>>>>>>>>>>> to the root directory.
>>>>>>>>>>>
>>>>>>>>>>> Maybe Andrea can help if he is available again?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>>    Matthias
>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Kay
>>>>>>>>>>>>
>>>>>>>>>>>>>> One of the Google Search admins for forum.openoffice.org
>>>>>>>>>>>>>> could check
>>>>>>>>>>>>>> the current Google search apis that are in use on that
>>>>>>>>>>>>>> site. Changes
>>>>>>>>>>>>>> are occasionally made to the calls, and maybe that is the
>>>>>>>>>>>>>> issue, or a
>>>>>>>>>>>>>> robots.txt for that site is causing this. I don't think
>>>>>>>>>>>>>> it requires a
>>>>>>>>>>>>>> response, but maybe some investigation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Just some ideas...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Kay
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 5/11/20 6:02 AM, Peter Kovacs wrote:
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have received the following mail, probably because I am
>>>>>>>>>>>>>>> listed on the Google Analytics page.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does this have any action items? What can we answer Mr
>>>>>>>>>>>>>>> John Mueller?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> All the Best
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>>>>>>>> Subject:     Critical issue on forum.openoffice.org and
>>>>>>>>>>>>>>> Google Search
>>>>>>>>>>>>>>> Date:     Mon, 11 May 2020 13:37:27 +0200
>>>>>>>>>>>>>>> From:     John Mueller <[hidden email]>
>>>>>>>>>>>>>>> To:     [hidden email], [hidden email],
>>>>>>>>>>>>>>> [hidden email]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dear webmaster of forum.openoffice.org
>>>>>>>>>>>>>>> <http://forum.openoffice.org>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm an analyst at Google in Switzerland. We wanted to
>>>>>>>>>>>>>>> bring your
>>>>>>>>>>>>>>> attention to a critical issue with your website, and how
>>>>>>>>>>>>>>> it's
>>>>>>>>>>>>>>> available for Google's web search.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In particular, Googlebot has been unable to crawl URLs from
>>>>>>>>>>>>>>> https://forum.openoffice.org/ . This will cause those
>>>>>>>>>>>>>>> pages to drop
>>>>>>>>>>>>>>> out of Google's search results, and will prevent new
>>>>>>>>>>>>>>> pages from being
>>>>>>>>>>>>>>> picked up for Search. If you're not aware of this issue,
>>>>>>>>>>>>>>> you may be
>>>>>>>>>>>>>>> accidentally blocking these pages from Google Search due
>>>>>>>>>>>>>>> to a server
>>>>>>>>>>>>>>> issue. If you need to block Googlebot from crawling
>>>>>>>>>>>>>>> pages on your
>>>>>>>>>>>>>>> website, we'd recommend using the robots.txt file instead.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Should you need to recognize IP addresses of Googlebot
>>>>>>>>>>>>>>> requests, you
>>>>>>>>>>>>>>> can use a reverse IP lookup to do so:
>>>>>>>>>>>>>>> https://support.google.com/webmasters/answer/80553
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Should you have any questions, feel free to contact me
>>>>>>>>>>>>>>> directly. For
>>>>>>>>>>>>>>> verification purposes, we are sending a copy of this
>>>>>>>>>>>>>>> message to your
>>>>>>>>>>>>>>> site's Search Console account.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>>> John Mueller ([hidden email] <mailto:[hidden email]>)
>>>>>>>>>>>>>>> Webmaster Trends Analyst
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>

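A side note on the robots.txt quoted above: the following is only a minimal sketch, assuming Python 3 and a hand-picked subset of the rules (not the full file), of how one could check offline which paths the rules permit. The name ROBOTS_SUBSET and the sample URLs are chosen for illustration; the parser is the standard-library urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hand-picked subset of the robots.txt quoted in this thread (not the full file).
ROBOTS_SUBSET = """\
User-agent: *
Crawl-delay: 1
Disallow: /en/forum/search.php
Disallow: /en/forum/posting.php
Disallow: /en/forum/adm
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_SUBSET)
BASE = "https://forum.openoffice.org"

# Topic pages are not listed under Disallow, so these rules permit fetching them.
topic_allowed = parser.can_fetch(
    "Googlebot", BASE + "/en/forum/viewtopic.php?f=50&t=102021"
)
# The phpBB search script is explicitly disallowed.
search_allowed = parser.can_fetch("Googlebot", BASE + "/en/forum/search.php")

print("viewtopic.php allowed:", topic_allowed)
print("search.php allowed:", search_allowed)
print("crawl delay:", parser.crawl_delay("Googlebot"))
```

With these rules, only the listed scripts and directories are excluded; ordinary topic URLs remain fetchable as far as robots.txt is concerned.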

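On the 301 discussed above: the curl output and the Yandex report both show a redirect from the http URL to the https URL on the same host. As a rough sketch, assuming Python 3, a response like that can be told apart from other redirects as follows. The function name is_https_upgrade is hypothetical and not part of any tool mentioned in this thread.

```python
from typing import Optional
from urllib.parse import urlsplit

# HTTP status codes that indicate a redirect with a Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def is_https_upgrade(request_url: str, status: int, location: Optional[str]) -> bool:
    """Return True if the response only redirects http://host/ to https://host/."""
    if status not in REDIRECT_CODES or not location:
        return False
    src, dst = urlsplit(request_url), urlsplit(location)
    return (
        src.scheme == "http"
        and dst.scheme == "https"
        and src.hostname == dst.hostname
    )


# Values taken from the responses quoted in this thread.
print(is_https_upgrade(
    "http://forum.openoffice.org/", 301, "https://forum.openoffice.org/"
))
```

A redirect of this kind only changes the scheme, so it does not by itself indicate that a crawler is being blocked.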