Downloading AH data fails a lot...

91 Night Elf Hunter
5820
I am collecting AH data hourly and storing it so that I can eventually analyze it in some meaningful way. However, my data has a lot of holes. Looking at just this morning's log, I see the following:

20120824-06:50:01 fetched meta data: 1345801572 304 Not Modified
20120824-07:00:01 fetched meta data: 1345801572 200 OK
20120824-07:00:02 fetched auction house data: http://us.battle.net/auction-data/17da6c3d752336c5d3b5144e4ad2a0a2/auctions.json 404 Not Found

20120824-08:00:02 fetched meta data: 1345805472 200 OK
20120824-08:00:04 fetched auction house data: http://us.battle.net/auction-data/17da6c3d752336c5d3b5144e4ad2a0a2/auctions.json 200 OK

20120824-09:00:02 fetched meta data: 1345809372 304 Not Modified
20120824-09:10:01 fetched meta data: 1345809372 200 OK
20120824-09:10:02 fetched auction house data: http://us.battle.net/auction-data/17da6c3d752336c5d3b5144e4ad2a0a2/auctions.json 404 Not Found

20120824-10:10:01 fetched meta data: 1345813272 200 OK
20120824-10:10:01 fetched auction house data: http://us.battle.net/auction-data/17da6c3d752336c5d3b5144e4ad2a0a2/auctions.json 404 Not Found

Over the last several hours, my only successful download was at 8am. I am trying to be a good API citizen by checking whether an update has been posted before attempting a download. Once a change is detected, I try to download and get a 404 error. Why is that? I am nowhere close to hitting any API quota limits. I have tried this URL from multiple places on the Internet and they all return the same 404 error (even as of right now), so it doesn't appear to be tied to my IP or anything.
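For anyone following along, the check-then-download decision described above might look roughly like this. It is only a sketch and not the actual script: the function name `should_download` and its two arguments are made up for illustration, and it assumes the metadata timestamp (for example 1345801572 in the log) only ever increases.

```python
def should_download(meta_timestamp, last_downloaded_timestamp):
    """Return True when the metadata reports a snapshot newer than the last one saved.

    Both arguments are the numeric timestamps seen in the log lines above;
    last_downloaded_timestamp is None when nothing has been saved yet.
    """
    if last_downloaded_timestamp is None:
        return True
    return meta_timestamp > last_downloaded_timestamp


# The 08:00 log entry reported 1345805472 after 1345801572 had been seen earlier.
print(should_download(1345805472, 1345801572))  # True
print(should_download(1345801572, 1345801572))  # False
```

Recording the timestamp as "downloaded" only after the auction file itself returns 200 OK means a 404 leaves the snapshot eligible for another try.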

This has been going on for as long as I have been collecting data (which is only since May), but I haven't worried much about it because I was just laying the groundwork for the future. Now I have a much stronger desire to not have holes in my data, so I am asking Blizzard and the community (particularly those who already do this kind of thing on a regular basis). I realize I could detect the 404 condition and keep checking until I get a non-404 response, but from what I am seeing, that won't solve the issue... the 404 errors persist long-term.

Thanks!
Edited by Zucanthor on 8/24/2012 7:37 AM PDT
91 Night Elf Hunter
5820
I just tried again and finally got a successful download. Interestingly enough, I hit it from Firefox and it came up. I reran my script and it failed. I ran wget on it and it failed. I ran it a few more times with wget and each of those runs was successful. I ran my script again, and this time it was successful. It is hard to pin down why it is failing. I would like to think it is a bug in my script that can be fixed, but then multiple wget runs should be consistent too.

I just looked at the logs. Going back to May 15th, I have attempted 2020 direct downloads of the AH data: 2 failed due to DNS issues (blame the Internet), 413 failed with 404 errors, and 1605 succeeded. That works out to roughly 20% of my attempted downloads resulting in 404 errors, which I find dramatically high.
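The arithmetic behind that figure is easy to check. The snippet below only restates the counts quoted above; the variable names are illustrative.

```python
attempts = 2020
dns_failures = 2
not_found = 413
successes = 1605

# The three outcomes account for every attempt.
assert dns_failures + not_found + successes == attempts

not_found_rate = not_found / attempts
print(f"{not_found_rate:.1%}")  # 20.4%
```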

Does anyone else have the same experience? By the way, I am downloading the US-Lothar data. I have not tried to download any other realms, since I am not really interested in them.

Thanks.
Edited by Zucanthor on 8/24/2012 7:56 AM PDT
1 Troll Rogue
0
Curious.

The next time wget fails, please rerun it with the --debug flag and paste the output here. I'd like to see what it says.

I've been pulling Lothar (and all other realms) without trouble. I rarely get a 404, and certainly not nearly as often as you; when I do, I just sleep 5 and try again. I do use HTTPS, which I suggest you do as well (in case your ISP is doing some caching). However, you said you tried from other locations and had similar failures, so I don't know.
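A "sleep and try again" loop along those lines could be sketched as below. This is not the code either poster runs: `fetch_with_retry`, the five-attempt cap, and reading "sleep 5" as five seconds are all assumptions, and `fetch` stands in for whatever function performs the real HTTP request and returns a (status, body) pair.

```python
import time


def fetch_with_retry(fetch, max_attempts=5, delay_seconds=5, sleep=time.sleep):
    """Call fetch() until it returns a non-404 status or attempts run out.

    fetch: callable returning (status_code, body); hypothetical stand-in
           for the real download function.
    Returns (status_code, body, attempts_used). max_attempts must be >= 1.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status != 404:
            return status, body, attempt
        # Only pause when another attempt is still coming.
        if attempt < max_attempts:
            sleep(delay_seconds)
    return status, body, attempt


# Demonstration with a fake fetch that fails twice, then succeeds.
responses = iter([(404, None), (404, None), (200, "auction data")])
result = fetch_with_retry(lambda: next(responses), sleep=lambda seconds: None)
print(result)  # (200, 'auction data', 3)
```

Passing `sleep` in as a parameter keeps the loop testable without actually waiting; in real use the default `time.sleep` applies.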

Here are my pulls from Lothar for today. All times are UTC.

Fetch Date (UTC)     Blizz Time (epoch)  Blizz Time (UTC)     Alliance Aucs
2012-08-24 00:02:18  1345766472          2012-08-24 00:01:12  32548
2012-08-24 01:06:30  1345770372          2012-08-24 01:06:12  32553
2012-08-24 02:11:49  1345774272          2012-08-24 02:11:12  32227
2012-08-24 03:17:41  1345778172          2012-08-24 03:16:12  31651
2012-08-24 04:21:59  1345782072          2012-08-24 04:21:12  31604
2012-08-24 05:27:01  1345785972          2012-08-24 05:26:12  31727
2012-08-24 06:31:27  1345789872          2012-08-24 06:31:12  31905
2012-08-24 07:37:36  1345793772          2012-08-24 07:36:12  32016
2012-08-24 08:42:02  1345797672          2012-08-24 08:41:12  31936
2012-08-24 09:46:26  1345801572          2012-08-24 09:46:12  32253
2012-08-24 10:52:42  1345805472          2012-08-24 10:51:12  31912
2012-08-24 11:56:59  1345809372          2012-08-24 11:56:12  31683
2012-08-24 13:01:34  1345813272          2012-08-24 13:01:12  31868
2012-08-24 14:07:17  1345817172          2012-08-24 14:06:12  32193
2012-08-24 15:11:57  1345821072          2012-08-24 15:11:12  32402
2012-08-24 16:16:51  1345824972          2012-08-24 16:16:12  31924
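The numeric Blizz Time values in the table are Unix timestamps in seconds, and the second Blizz Time column is the same instant written out in UTC. A quick way to check that, and to see the spacing between snapshots, using the first three rows:

```python
from datetime import datetime, timezone

# First three numeric Blizz Time values from the table above.
blizz_times = [1345766472, 1345770372, 1345774272]

for ts in blizz_times:
    as_utc = datetime.fromtimestamp(ts, tz=timezone.utc)
    print(ts, as_utc.strftime("%Y-%m-%d %H:%M:%S"))

# Seconds between consecutive snapshots.
intervals = [later - earlier for earlier, later in zip(blizz_times, blizz_times[1:])]
print(intervals)  # [3900, 3900]
```

Every pair of consecutive rows in the table is 3900 seconds (65 minutes) apart, so a checker that runs exactly once an hour will not stay aligned with the updates.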
91 Night Elf Hunter
5820
I just wanted to post a quick update to this thread... I tweaked my code a little so that it would automatically retry the download if it failed on the previous attempt. This appears to work just fine and I feel that I am getting a complete set of data each day (as much as is possible, at least). Thank you all for the advice I received.

I am still recording lots of 404 errors. Here is my log since this past Monday:


Mon Sep 24 11:08 - 404 Not Found
Mon Sep 24 12:13 - 404 Not Found
Mon Sep 24 12:14 - 404 Not Found (attempt 2)
Mon Sep 24 14:23 - 404 Not Found
Mon Sep 24 15:28 - 404 Not Found
Tue Sep 25 15:31 - 404 Not Found (attempt 2)
Tue Sep 25 15:32 - 404 Not Found (attempt 3)
Tue Sep 25 15:33 - 404 Not Found (attempt 4)
Tue Sep 25 15:34 - 404 Not Found (attempt 5)
Tue Sep 25 17:31 - 404 Not Found
Wed Sep 26 00:31 - 404 Not Found
Wed Sep 26 00:32 - 404 Not Found (attempt 2)
Wed Sep 26 00:33 - 404 Not Found (attempt 3)
Wed Sep 26 00:34 - 404 Not Found (attempt 4)
Wed Sep 26 01:31 - 404 Not Found
Wed Sep 26 05:31 - 404 Not Found
Wed Sep 26 05:32 - 404 Not Found (attempt 2)
Wed Sep 26 13:31 - 404 Not Found
Wed Sep 26 16:04 - 404 Not Found
Wed Sep 26 17:31 - 404 Not Found
Wed Sep 26 17:32 - 404 Not Found (attempt 2)
Wed Sep 26 18:31 - 404 Not Found
Wed Sep 26 18:32 - 404 Not Found (attempt 2)
Wed Sep 26 18:33 - 404 Not Found (attempt 3)
Wed Sep 26 19:31 - 404 Not Found
Wed Sep 26 20:31 - 404 Not Found
Wed Sep 26 22:31 - 404 Not Found
Thu Sep 27 01:31 - 404 Not Found
Thu Sep 27 01:32 - 404 Not Found (attempt 2)
Thu Sep 27 03:31 - 404 Not Found
Thu Sep 27 07:31 - 404 Not Found
Thu Sep 27 09:31 - 404 Not Found


Somebody suggested that it may be because I am downloading the data before it is fully available on the other side. I don't know if that is true or not... there was one case (shown above) in the last week where the API indicated the data had been updated, but it was a full 5 minutes before I could download it. Since I don't attempt another download until an hour past the last update, I can't tell whether the 404s occur only right after the data is updated, or whether there would be more 404s if I downloaded sometime in the middle of an hour...
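One way to get a feel for how long the 404s last is to tally the log by attempt number. The sketch below assumes the log format shown above and treats any line without an "(attempt N)" suffix as a first attempt; the helper names are made up for illustration.

```python
import re
from collections import Counter

# Matches a trailing "(attempt N)" marker as it appears in the log above.
ATTEMPT_RE = re.compile(r"\(attempt (\d+)\)\s*$")


def attempt_number(line):
    """Return the attempt number for a log line; unmarked lines count as 1."""
    match = ATTEMPT_RE.search(line)
    return int(match.group(1)) if match else 1


def tally_attempts(lines):
    """Count 404 log lines per attempt number."""
    return Counter(attempt_number(line) for line in lines if "404 Not Found" in line)


sample = [
    "Mon Sep 24 11:08 - 404 Not Found",
    "Mon Sep 24 12:13 - 404 Not Found",
    "Mon Sep 24 12:14 - 404 Not Found (attempt 2)",
]
print(tally_attempts(sample))  # Counter({1: 2, 2: 1})
```

If most 404s clear by the second or third attempt, that would be consistent with the file simply not being ready when the metadata changes, though the tally alone cannot prove it.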

Anyways, I am happy with my collection and wanted to thank everyone for their help.