Issues accessing apache services from campus network after mining of bug reports

Juan Florez
Hello, my name is Juan Florez and I am a PhD student at the University
of Texas at Dallas.

I'm writing to report a problem that surfaced after trying to download a
relatively big number of bug reports from this project's bugzilla. In
short, any domain at apache.org started rejecting connections from our
networks, and we suspect we were blacklisted at the top level. This is
strange because we had already downloaded similar amounts of bug reports
from other Apache bug trackers and never ran into any issues.

We already tried contacting the webmaster but got no answer. The
original email is below, and it explains the problem in more detail.

I would appreciate your help in sorting out this issue, since our
research routinely depends on data of this nature.

Sincerely,



-------- Forwarded Message --------
Subject: Issues accessing apache services from campus network
Date: Tue, 23 Jan 2018 15:01:24 -0600
From: Juan Florez <[hidden email]>
To: [hidden email]
CC: [hidden email], Oscar Chaparro <[hidden email]>



My name is Juan Florez and I write on behalf of the SEERS group at the
University of Texas at Dallas. I'm writing this email to report
difficulties accessing this domain from some of our networks after
attempting to collect data for research purposes.

The problems started on Sunday, January 21, 2018, after we tried to
programmatically download around 31k bug reports from the website
https://bz.apache.org/ooo/ , accessing each one as XML through bugzilla
(for example https://bz.apache.org/ooo/show_bug.cgi?ctype=xml&id=84969
). After a portion of the bug reports was downloaded, further
connections to the website started timing out, and we realized that any
connection to an apache.org subdomain would also time out. The problem
surfaced again after retrying from two other networks with different IP
addresses.
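
As an illustration of the access pattern described above, the per-report XML URL can be built as follows. This is only a minimal sketch: the base URL and the query parameters are taken from the example link above, while the helper name bug_xml_url is hypothetical and not part of any script mentioned in this thread.

```python
from urllib.parse import urlencode

# Base URL taken from the example link in the message above.
BASE = "https://bz.apache.org/ooo/show_bug.cgi"


def bug_xml_url(bug_id: int) -> str:
    """Build the XML export URL for one bug report (hypothetical helper)."""
    # urlencode keeps the dict's insertion order: ctype first, then id.
    return f"{BASE}?{urlencode({'ctype': 'xml', 'id': bug_id})}"


print(bug_xml_url(84969))
```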

This came as a surprise since we have performed this procedure before,
including on other Apache bug trackers (for example
https://issues.apache.org/jira/browse/CASSANDRA-7657 ), and never
encountered any problems while downloading thousands of bug reports.

We would appreciate your help in solving this issue, since we routinely
require access to many Apache services for our research. The IPs
affected by this problem are:
  - 129.110.93.16 (on-campus)
  - 129.110.241.5 (on-campus)
  - 66.253.176.84 (off-campus, used after the two on-campus options
stopped working)

We suspect this to be an issue related to rate limits, and we are CCing
the office of tech support of our department so that they can set a rate
limit on the campus networks to prevent this situation from happening
again. However, we could not determine this rate limit ourselves, so
we would appreciate it if you could include it in the reply to this email.
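
One way such a client-side limit could look is sketched below: a fixed pause between requests plus exponential backoff when a request fails. The actual limit enforced on apache.org is not known in this thread, so delay_s and max_retries are placeholder values, and the names fetch and throttled_download are hypothetical.

```python
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download one URL and return the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def throttled_download(
    urls: Iterable[str],
    delay_s: float = 1.0,      # placeholder pause; the real limit is unknown
    max_retries: int = 3,      # placeholder retry budget
    fetcher: Callable[[str], bytes] = fetch,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (url, body) pairs, pausing between requests.

    Failed requests (timeouts, refused connections) are retried with an
    exponentially growing wait; the error is re-raised once the retry
    budget is used up.
    """
    for url in urls:
        for attempt in range(max_retries + 1):
            try:
                body = fetcher(url)
                break
            except OSError:  # URLError and socket timeouts derive from OSError
                if attempt == max_retries:
                    raise
                sleep(delay_s * 2 ** (attempt + 1))
        yield url, body
        sleep(delay_s)  # fixed pause before the next request
```

The fetcher and sleep parameters exist only so that the loop can be exercised without touching the network; with the defaults it performs real HTTP requests one at a time.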

We apologize for any inconvenience caused. Please don't hesitate to
write back if you require more details.


Sincerely,

--
Juan Manuel Florez
Software Engineering PhD Student


Re: Issues accessing apache services from campus network after mining of bug reports

Andrea Pescetti-2
Juan Florez wrote:
> I'm writing to report a problem that surfaced after trying to download a
> relatively big number of bug reports from this project's bugzilla. In
> short, any domain at apache.org started rejecting connections

You have likely been blacklisted. You need to contact ASF Infra. See
here: http://www.apache.org/dev/infra-contact

Usually they don't like mass downloads from bug tracking systems, but
unless you are a known offender they will be ready to help.

Regards,
   Andrea.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Issues accessing apache services from campus network after mining of bug reports

Juan Florez
Thank you, I will try writing to them.


On 02/02/2018 02:50 AM, Andrea Pescetti wrote:

> Juan Florez wrote:
>> I'm writing to report a problem that surfaced after trying to
>> download a relatively big number of bug reports from this project's
>> bugzilla. In short, any domain at apache.org started rejecting
>> connections
>
> You have likely been blacklisted. You need to contact ASF Infra. See
> here: http://www.apache.org/dev/infra-contact
>
> Usually they don't like mass downloads from bug tracking systems, but
> unless you are a known offender they will be ready to help.
>
> Regards,
>   Andrea.

--
Juan Manuel Florez
Software Engineering PhD Student

