Log on to a server and get the PID of the application
$ ps aux | grep -i <application>
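Piping ps into grep also matches the grep process itself. Below is a minimal sketch of an alternative. It assumes the procps (or BSD) pgrep is installed, and find_pids is a hypothetical helper name:

```shell
# Print the PIDs of processes whose full command line matches a pattern,
# ignoring case like grep -i. pgrep never reports its own process.
find_pids() {
    pgrep -f -i -- "$1"
}
```

Usage: find_pids '<application>'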
Count of Files (includes one line of lsof header output)
$ sudo lsof -a -p <PID> | wc -l
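The lsof count also includes entries that are not file descriptors, such as cwd, rtd, txt and memory-mapped libraries. That makes it run higher than what the per-process "Max open files" limit actually counts. Below is a minimal Linux-only sketch that reads /proc instead. fd_usage is a hypothetical helper name, and it has to run as root or as the process owner:

```shell
# Print "<open fds>/<soft limit>" for a PID (Linux /proc only).
# Must run as root or as the process owner to read /proc/<pid>/fd.
fd_usage() {
    local pid=$1 used limit
    # One directory entry per open descriptor; unlike lsof there is no header line.
    used=$(ls "/proc/$pid/fd" | wc -l)
    # Row format is "Max open files  <soft>  <hard>  files": field 4 is the soft limit.
    limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "$used/$limit"
}
```

Usage: fd_usage <PID> prints the current descriptor count and the soft limit, separated by a slash.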
List of Files
$ sudo lsof -a -p <PID> | more
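When the list is long, a per-type summary quickly shows whether regular files, pipes or sockets (IPv4/IPv6) dominate. Below is a sketch using lsof's field output. -Ft selects the TYPE field, which lsof emits on lines starting with "t", so the sketch does not depend on column positions. open_file_types is a hypothetical helper name:

```shell
# Count a process's open files by lsof TYPE (REG, DIR, FIFO, IPv4, IPv6, ...),
# most frequent first. Needs root for another user's process.
open_file_types() {
    # -Ft prints "t<TYPE>" lines; the always-present p/f lines are dropped by sed.
    lsof -a -p "$1" -Ft | sed -n 's/^t//p' | sort | uniq -c | sort -rn
}
```

Usage: sudo bash -c "$(declare -f open_file_types); open_file_types <PID>"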
From: "Mamidi, Sundeep (Capgemini)" <SMamidi@hollandamerica.com>
Subject: RE: Splunk Alert: MLR - Secondary Flow API 500 Errors Watch
Date: August 15, 2017 at 10:38:16 AM PDT
It’s the same on p013 as well (4096).
Here are the commands:
lsof -a -p <pid> | wc -l → This gives the number of open files
lsof -a -p <pid> → This gives the list of open files
As of now, both servers show around 700-750. My suspicion is that file connections are going stale on one of the servers until it reaches the max connections limit, and a restart clears them. As I said, I could increase max.connections, but that's not the best way to handle this issue.
If we increase the max connections, the application could consume all of them (even 10000) and give us the same error again.
There is likely a bigger issue somewhere on the backend that's causing the stale connections. Looking at AppD, I also see errors on hal-porta at the same time.
Thanks,
Sundeep
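Instead of restarting, one could test the stale-connection theory in two ways. First, log the descriptor count over time. Second, break the process's TCP sockets down by state. A count that only grows, or sockets piling up in a state such as CLOSE_WAIT, would point to connections that are never closed. Below is a minimal sketch that assumes Linux /proc and lsof are available. sample_fd_count and tcp_state_counts are hypothetical helper names:

```shell
# Print "<date> <time> <open fd count>" for a PID: <samples> readings
# <interval> seconds apart (defaults: 10 readings, 60 s). Linux /proc only.
sample_fd_count() {
    local pid=$1 interval=${2:-60} samples=${3:-10} i=0
    while [ "$i" -lt "$samples" ] && [ -d "/proc/$pid/fd" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(ls "/proc/$pid/fd" | wc -l)"
        i=$((i + 1))
        # No pause after the last reading.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}

# Count a PID's TCP sockets by connection state (ESTABLISHED, CLOSE_WAIT, ...).
# -FT prints TCP info as lines such as "TST=ESTABLISHED"; sed keeps only the state.
tcp_state_counts() {
    lsof -a -p "$1" -iTCP -FT | sed -n 's/^TST=//p' | sort | uniq -c | sort -rn
}
```

For example, sample_fd_count <PID> 300 12 takes 12 readings five minutes apart, and tcp_state_counts <PID> can be run alongside it when the count starts climbing.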
