Running Nginx and PHP-FPM is a pretty great setup for the most part. One thing we’ve struggled with on occasion, however, is unclear error messages with non-obvious solutions. One such instance involved the following generic error message in the nginx logs:
500 Internal Server Error,readv() failed (104: Connection reset by peer) while reading upstream
That’s a pretty broad error message that can indicate all sorts of problems, so usually you look elsewhere for a clue, like the PHP logs. Unfortunately, none of the other system logs had any information related to the error.
We started seeing this message immediately after a distribution upgrade from Debian 7 Wheezy to Debian 8 Jessie, which made the scope of the problem very large. The strangest symptom was that PHP applications would work intermittently. One request would execute fine; the next page load would fail with the 500 error. Restarting nginx and PHP might allow the first few requests to work, and then it would start failing again.
We tried a lot of things to fix this, poring over the usual culprits like listener pool exhaustion, file permissions and ownership, etc., before the real problem was discovered.
When the server was running Debian 7 we had XCache configured as the opcode cache on PHP 5.4 (the stable version for Debian 7). With the upgrade to Debian 8, PHP was upgraded to 5.6 and, along with it, Zend OPcache was installed and enabled by default. The two opcode caches went to war behind the scenes, wreaking havoc on PHP applications.
In case you’re not familiar with what an opcode cache does, here’s the short version. Since PHP is an interpreted language, the engine reads PHP source on demand and compiles it down to an intermediate bytecode (opcodes) on every request before executing it. An opcode cache optimizes this process by storing the compiled opcodes in memory, where they can be reused on the next request without recompiling. This greatly improves the performance of PHP applications by making them behave more like compiled applications.
Based on the above description you can imagine why having two opcode caches running simultaneously would cause problems. It’s a crapshoot as to which engine has cached the compiled code and which will serve up the result for a request. The effect is a broken or partially compiled application being delivered, which causes the 500 server error.
To identify the issue, simply execute the following command at the terminal and read the output:
If you see more than one cache system listed, you’re going to have a bad time.
When the problem was happening, XCache was also listed here alongside the Zend OPcache.
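A minimal sketch of such a check, assuming the `php` CLI is on the PATH and that its `php -v` banner names the loaded caches (lines such as "with Zend OPcache ..." and "with XCache ..."). The helper name `count_opcode_caches` and the short list of cache names it looks for are illustrative assumptions, not an exhaustive list.

```shell
#!/bin/sh
# Hypothetical helper: count how many distinct known opcode caches are
# named in a block of text such as the output of `php -v`.
# The names searched for (Zend OPcache, XCache, eAccelerator) are an
# assumption for illustration; extend the pattern as needed.
count_opcode_caches() {
    printf '%s\n' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | grep -o -E 'zend opcache|xcache|eaccelerator' \
        | sort -u \
        | wc -l \
        | tr -d '[:space:]'
}

# Only query the real interpreter when the php CLI is available.
if command -v php >/dev/null 2>&1; then
    found=$(count_opcode_caches "$(php -v 2>&1)")
    if [ "$found" -gt 1 ]; then
        echo "Warning: $found opcode caches are loaded; keep only one."
    else
        echo "Opcode caches detected: $found"
    fi
fi
```

Keep in mind that on Debian the CLI and FPM can load different configuration directories, so the CLI banner is a strong hint rather than proof of what PHP-FPM has loaded.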
The solution is simple enough: uninstall one! I chose to remove my trusty XCache in favor of Zend OPcache since it now comes bundled with the PHP package in Debian 8.
apt-get remove php5-xcache --purge
And whamo, everything was instantly back to normal.