It's no secret that Mafia Wars is riddled with bugs. In fact, when it first came out on MySpace, I left it in large part because the game never worked. If it doesn't work, there's no point in playing it. When I came to play it on Facebook, it seemed to work. Those days have ceased.
Now the latest news is that they added a new batch of servers. There are some contradictions in what they say, and if you catch them, you can figure out that someone isn't telling you the true story of what's going on. If you know something about writing code, you can see through this quite readily.
The latest issue is people having their mafia just vanish, and they call that a page load issue. No, that's called select users' accounts being loused up due to some data access or data integrity issue. Adding new servers doesn't put back data that's missing. If you noticed, there was also a message before that saying those accounts would be restored. That means there is data missing.
This is like saying you lost all of your money. So while you have a way to get it back through some legitimate means, your solution is to buy a new pair of pants. That might help hold your wallet, but if you keep losing money through no defect of your pants, buying a new pair doesn't help you get back the money you lost.
Here is one of my favorites: the so-called "performance upgrade". They rolled it out, and then they had to roll it back. Then they fixed it, and they had many of the same issues. If this performance upgrade worked, why then is someone rolling out new servers? If this software upgrade worked, then it should have made their software more efficient. That's not the case.
The reality is that I have had the same problems as others from around the globe. The pages don't load. Gifts won't send without an error. The pages don't fully render, and you're forced to start a new session on their server to get around that. If you want to post a comment (which should be a simple API call or a one-line write to a database), it doesn't work at least half the time.
I have noticed two things about Mafia Wars. Based on its behavior, it appears to be a request-based cluster. That should help performance. So should a multi-threaded application, if written correctly. This appears to be multi-threaded, but it's not working correctly. It is one of many suspect examples of unmanaged code that can bring any network to its knees.
Here is an example of what could happen, and has, after the so-called performance upgrade. You click on some button to play the game in some manner. I noticed this with the businesses in the game. You click, and nothing happens. So you click again. Suddenly you see the frame containing the game refresh twice. It's as if you clicked twice when you didn't intend the game to behave that way.
Why does this happen? It's simple. You have one thread with a request. It backs up. You see nothing while you're waiting. You click again. Either you get a new thread on a new box, or the request is enqueued, or you have a new thread that is waiting for the first one to complete. In the last case, it's a pooling issue, and requests get backed up. As a result, you get a bottleneck.
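One common way to keep an impatient second click from running the action twice is to tag each click with a token and have the server ignore repeats. Here is a minimal sketch of that idea in Python. The `ActionHandler` class, the token string, and the `collect` action are all made up for illustration; I have no idea how Zynga actually handles requests.

```python
import threading


class ActionHandler:
    """Hypothetical server-side handler that ignores duplicate clicks."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seen = {}       # request token -> result of the first execution
        self.executions = 0   # how many times an action really ran

    def handle(self, token, action):
        # Only the first request carrying a given token runs the action.
        # A repeated click with the same token gets the stored result back.
        # (Holding the lock while the action runs keeps the sketch simple,
        # but it also serializes every request.)
        with self._lock:
            if token in self._seen:
                return self._seen[token]
            self.executions += 1
            result = action()
            self._seen[token] = result
            return result


handler = ActionHandler()
collect = lambda: "collected"

first = handler.handle("click-1", collect)
second = handler.handle("click-1", collect)  # the impatient second click
```

Both calls return the same result, but the action itself only runs once, so the frame has no reason to refresh twice.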
Here is another example of this malformed processing. I've seen this game work such that I perform a task. I get that box that shows it's busy and has a "Try Refreshing" button. Suddenly, the page refreshes, but the busy box is still visible and won't go away until I do something that forces a refresh.
Here's what's happening. You have a refresh request taking too long. So you have a thread there, enqueued. Then you have another one that executes the busy box and isn't in sync with the app. It hangs, as there is no compartment to contain both threads. It's as if they're both out of process with respect to each other, and random at times. The page refreshes, and the other thread executing the busy box never gets notified. So as a result, that box still stays. Then when request #3 rolls around, it finally forces a refresh.
That may look like it solved the problem on the surface, or that the problem rectified itself. It's quite possible that neither thread was destroyed. If so, it's a resource that's still allocated. If you never properly destroy these things, that will bring a server to its knees over time. It takes up processor time and memory. If you're using threads and don't manage them correctly, the application performs worse instead of better.
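For what it's worth, cleaning up threads doesn't have to be hard. Here is a small Python sketch of a worker pool that is guaranteed to be torn down when the work is done. The `refresh_page` function is just a stand-in I made up for whatever a request really does.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def refresh_page(request_id):
    # Stand-in for whatever work a real request does.
    return f"refreshed {request_id}"


before = threading.active_count()

# Leaving the with-block calls shutdown(wait=True) on the pool, so every
# worker thread is joined before the program moves on.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(refresh_page, range(3)))

after = threading.active_count()
```

The thread count after the block matches the count before it. Nothing is left allocated and nothing is left waiting on a notification that never comes.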
There is also the suspect concurrency issue. After an obvious database call is made to write data, the page can be busy for a brief period of time. It's quite likely that an update is being run, and that can lock certain data against being read. So if you're trying to update something, you lock it from being read. So while someone is trying to read data, there is a connection that is being held up. As a result, you wait. You get a bottleneck. I don't care how many boxes you have in that case. If there is a lock, you will still have a bottleneck.
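To show why extra boxes don't help here, this is a small Python sketch of one slow writer holding a lock while several readers pile up behind it. The lock stands in for a database lock on one piece of data. The names are mine, and it only illustrates the principle, not how their back end works.

```python
import threading

row_lock = threading.Lock()          # stands in for a lock on one row
events = []                          # the order in which things happened
writer_has_lock = threading.Event()
release_writer = threading.Event()


def writer():
    with row_lock:
        events.append("write start")
        writer_has_lock.set()
        release_writer.wait()        # simulate a slow update
        events.append("write end")


def reader(name):
    with row_lock:                   # blocks while the writer holds the lock
        events.append(f"read {name}")


w = threading.Thread(target=writer)
w.start()
writer_has_lock.wait()               # make sure the writer got there first

readers = [threading.Thread(target=reader, args=(i,)) for i in range(5)]
for r in readers:
    r.start()

# Any reader that has reached the lock is stuck until the writer lets go.
release_writer.set()
w.join()
for r in readers:
    r.join()
```

No read lands in `events` until after "write end". Starting fifty readers instead of five changes nothing, because they all wait on the same lock.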
I have seen cases in which you perform a job or buy some item like a vehicle, armor, etc. You get a message that it was bought. There is a lag between that action and the time it takes to update how much money you have. How can that be?
This is a case of too many network calls hindering things. This is perhaps attributable to a back end that doesn't support something like a stored procedure and/or the absence of a real data access or business tier. You make the purchase. That's a write. You have an inventory update. That's another write. It appears that after that part, a message gets sent, and then there is yet another call to read and update the most current data. That's 3 different data access tasks that may or may not be in a single network call. From observation, there appear to be at least 2 calls there.
Realistically, there should be some transactional management on that. Not all back ends support that, which is another issue. I doubt they're using Oracle, which supports these things, but something that supports transactions and stored procedures could streamline this.
Here is what the logic should be:
SELECT @cash_on_hand = SUM(amount) FROM transactions WHERE user_id = @user_id  -- table and column names are assumed
If they did that, you could get everything in one single call. If you had a data access tier, this could have been retrieved, placed into whatever structure you like, and then formatted by server-side script. To get around the space issue, if there was a session end event to track the correct user ID, you could consolidate all the data there into one single row where applicable and then delete the other stuff. That way you reduce the likelihood of a concurrency issue and recover space while still maintaining performance.
OK...there are other places to manage that transaction in a business tier, but if you know code, you get the basic idea. It would save 1-2 calls. Multiply that by however many users are on at a given time, and that would help performance a great deal.
Here is another issue that may or may not be resolved. I call this the golden glitch. At least prior to this so-called upgrade, the amount of cash on hand might take data from another city, stick it into the presentation, and then read from that instead of from the data tier. Here is why I called it the golden glitch and what happened.
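Here is a rough sketch of the whole idea in Python with SQLite, just so it can be run anywhere. The `ledger` and `inventory` tables, the column names, and the `buy` and `consolidate` functions are all my own inventions for illustration. SQLite runs in-process, so there is no real network call here. On a client-server database, the body of `buy` is what you would put in one stored procedure so it costs a single round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (user_id INTEGER, city TEXT, amount INTEGER)")
conn.execute("CREATE TABLE inventory (user_id INTEGER, item TEXT, qty INTEGER)")
conn.execute("INSERT INTO ledger VALUES (1, 'Cuba', 5000)")  # starting cash
conn.commit()


def buy(conn, user_id, city, item, price):
    """Record the purchase, add the item, and return the new balance,
    all inside one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("INSERT INTO ledger VALUES (?, ?, ?)", (user_id, city, -price))
        conn.execute("INSERT INTO inventory VALUES (?, ?, 1)", (user_id, item))
        (cash_on_hand,) = conn.execute(
            "SELECT SUM(amount) FROM ledger WHERE user_id = ? AND city = ?",
            (user_id, city),
        ).fetchone()
    return cash_on_hand


def consolidate(conn, user_id, city):
    """Collapse a user's ledger rows for one city into a single row,
    the kind of cleanup a session end event could trigger."""
    with conn:
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM ledger "
            "WHERE user_id = ? AND city = ?",
            (user_id, city),
        ).fetchone()
        conn.execute(
            "DELETE FROM ledger WHERE user_id = ? AND city = ?", (user_id, city)
        )
        conn.execute("INSERT INTO ledger VALUES (?, ?, ?)", (user_id, city, total))
    return total


balance = buy(conn, 1, "Cuba", "armored car", 1200)
total_after_cleanup = consolidate(conn, 1, "Cuba")
rows = conn.execute("SELECT COUNT(*) FROM ledger").fetchone()[0]
```

The purchase, the inventory update, and the fresh balance all come back from one unit of work, and the cleanup leaves one row per user per city without changing the total.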
1. Collect money from Moscow.
2. Wait for some thread to execute that never does and hangs.
3. Go to Cuba. A new thread starts (?).
4. Collect money.
5. Oops! Moscow woke up and collected 10 million rubles!
6. The number gets stuck into the current city by another thread, since it is acting like a shared piece of data, and lands in your current locale (Cuba in this case).
7. The newly retrieved data gets written to the current city, as it's somehow or other assumed to be correct.
8. Suddenly you have 10 million pesos and are short 10 million rubles.
The data is taken from the presentation tier and assumed to be correct, which it may not be. They do that as it saves a network call for another read. However, the data written via JavaScript into the DOM element containing the amount of money you have was from another city, and there was no way to guarantee where it came from. I suspect that there were out-of-sync sessions in the cluster, and whatever came down the pipe was just stuck in. That would explain the sharing, the result of a race condition created by async threads somewhere (either in the cluster, the code, or both).
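One way to close that hole is to never keep the amount in a single shared slot. Every response says which city it belongs to, and it gets filed under that city no matter what is on screen when it arrives. Here is a small Python sketch of that. The `CashDisplay` class and the amounts are hypothetical and only there to replay the steps above.

```python
from dataclasses import dataclass, field


@dataclass
class CashDisplay:
    """Hypothetical client-side state: one balance per city, no shared slot."""
    current_city: str
    balances: dict = field(default_factory=dict)

    def apply_response(self, city, new_balance):
        # File the amount under the city the response says it came from,
        # regardless of which city is on screen when it arrives.
        self.balances[city] = new_balance

    def shown_amount(self):
        return self.balances.get(self.current_city, 0)


display = CashDisplay(current_city="Moscow")

# You collect in Moscow and the response hangs. You travel to Cuba.
display.current_city = "Cuba"
display.apply_response("Cuba", 2_000)          # the Cuba collection returns
display.apply_response("Moscow", 10_000_000)   # Moscow finally wakes up
```

The late Moscow response lands under Moscow, and the Cuba screen keeps showing the Cuba amount. Nothing read back from the display can put rubles into the peso column.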
There are signs of memory leaks in this application, and certain messages about "maintenance" seem to support that. If you play the game, they may reboot something or take it offline for a few minutes. The game comes back on. It hops along. As time goes on, it slows to a crawl. It just gets slower and slower. After a time, they have to do the same thing again.
When you have a memory leak, either due to something in the OS or to unmanaged code not properly deallocating resources, your performance will start to take a dive. Your pages might not render correctly, etc. The typical network admin response is to add something like memory or another box. That just distributes the flood and prolongs it. The unmanaged code is still there, and it will only waste whatever is added. If you've ever seen something that had a memory leak, you'd recognize it. This application hints at that.
Under good ole Vista, I ran IIS and SQL Server 2005 on a laptop. I wrote a multi-threaded service that checked 10 different email accounts simultaneously over a cable connection. I ran it with all 3 tiers (data, business, and presentation) on the same laptop. I also had some exotic XML stuff going on that was a resource hog for what I was doing. I still benchmarked 1800 emails a minute that were popped and archived to SQL. I had 20-some threads going at any given time. I would have half a gig of RAM in active use at any given second until everything was done. Then it was deallocated quickly, with no memory leaks.
If Zynga can't get a game right using what is likely better-performing equipment than what I had, and I know the processing for a given task in the game can't be nearly as intense as what I had going, this isn't a hardware issue. I won't say there isn't a configuration issue, but adding boxes doesn't always resolve inefficiencies in code. If you have unmanaged resources, more boxes do nothing.
So when you see these messages about how they responded with all these boxes, take them with a grain of salt. If your data vanished, adding a new box doesn't get that data back. When you can actually load the page for the game, look at your news feed and read what these messages say and don't say. It doesn't add up.