Re: Solaris 11 memory usage
From: Martin Pala
Subject: Re: Solaris 11 memory usage
Date: Thu, 23 Oct 2014 20:41:00 +0200
Thanks for the data.
We use kstat to get the freemem statistic, which includes both the freelist
and the cachelist. In your case the output of mdb ::memstat shows that the
memory usage was ca. 85% (7+6+10+1+61 = 85), which matches the monit test
limit (> 80%). The memory usage is real; check the "sr" column (page scanner
activity) in vmstat to see whether it is a problem for the system.
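The arithmetic above can be reproduced with a short script. This is only a sketch: the function name used_percent and the parsing approach are assumptions made for illustration, not how Monit computes the value (Monit reads freemem via kstat). It sums the %Tot column of every ::memstat row that is not a "Free" row, using the figures from the report quoted further down.

```python
import re

# ::memstat rows from the 23 Oct report quoted in this thread.
MEMSTAT = """\
Kernel                     588196              2297    7%
ZFS File Data              480826              1878    6%
Anon                       820560              3205   10%
Exec and libs               49531               193    1%
Page cache                5125284             20020   61%
Free (cachelist)          1009189              3942   12%
Free (freelist)            314893              1230
"""

# Row layout: name, pages, MB, and an optional percentage.
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<pages>\d+)\s+(?P<mb>\d+)(?:\s+(?P<pct>\d+)%)?\s*$"
)


def used_percent(memstat_text):
    """Sum %Tot over all rows whose name does not start with 'Free'."""
    total = 0
    for line in memstat_text.splitlines():
        m = ROW.match(line)
        if m is None or m.group("name").startswith("Free"):
            continue  # skip unparsable lines and the free lists
        total += int(m.group("pct") or 0)
    return total


print(used_percent(MEMSTAT))  # 85
```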
If the high memory usage is normal, you can raise the test limit to suppress
the alerts. You can also use the "for X cycles" option to alert only if the
memory usage remains high for a long time, for example:
if memory usage > 90% for 20 cycles then alert
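As a rough illustration of what "for X cycles" does, the sketch below raises an alert only once the limit has been exceeded in X consecutive polling cycles. The function first_alert_cycle and the sample readings are hypothetical and only model the behaviour described above; this is not Monit's code.

```python
def first_alert_cycle(samples, limit, cycles):
    """Return the 1-based cycle at which an alert would fire, or None.

    An alert fires once `limit` has been exceeded in `cycles` consecutive
    samples; any sample at or below the limit resets the count.
    """
    streak = 0
    for cycle, usage in enumerate(samples, start=1):
        streak = streak + 1 if usage > limit else 0
        if streak >= cycles:
            return cycle
    return None


# Hypothetical readings (percent): a short spike, then sustained high usage.
readings = [85, 92, 88] + [93] * 20
print(first_alert_cycle(readings, limit=90, cycles=20))   # 23
print(first_alert_cycle([92] * 19, limit=90, cycles=20))  # None
```

With these made-up readings the single 92% spike does not trigger anything; the alert only fires after 20 consecutive cycles above the limit.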
Regards,
Martin
> On 23 Oct 2014, at 15:36, Nestor Urquiza <address@hidden> wrote:
>
> Thanks a lot for this Martin.
>
> Here is what I got (using version 5.7)
> ___________ Thu Oct 23 01:15:10 EDT 2014 ___________
>
> Page Summary                Pages                MB  %Tot
> ------------     ----------------  ----------------  ----
> Kernel                     588196              2297    7%
> ZFS File Data              480826              1878    6%
> Anon                       820560              3205   10%
> Exec and libs               49531               193    1%
> Page cache                5125284             20020   61%
> Free (cachelist)          1009189              3942   12%
> Free (freelist)            314893              1230
>
> The main culprit is a process used by a vendor product:
>
>    PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
>  10592 geneva     37G   20G cpu5    20    0   0:10:32  11% newaga/1
>
> This machine has 32GB RAM, so at first glance one would say we should either
> increase memory or ask the vendor for guidance on how to limit memory usage
> by that process.
>
> However, I am wondering whether "page cache" should really be alarming.
> According to Oracle https://blogs.oracle.com/rmc/entry/the_vm_system_formally_known
> "The cachelist operates as part of the freelist. When the freelist is
> depleted, allocations are made from the oldest pages in the cachelist. This
> allows the file system page cache to grow to consume all available memory and
> to dynamically shrink as memory is required for other purposes."
>
> In this case the newaga command is part of a replication script which copies
> an in-memory database from a remote server to the local machine. This
> in-memory database works with memory segments that are replicated on disk and
> loaded as needed. This system can even work with 16GB RAM. We increased it
> because we were getting too many alerts from monit. On Solaris 10 (with the
> previous version of the same software) we had no memory alerts from monit
> with 16GB RAM and the same database, or nearly the same, because of course we
> changed both the OS and the version of the app.
>
> Bottom line: I am now trying to understand whether monit should report memory
> usage differently on Solaris 11, whether the vendor should use memory
> differently, or whether Solaris should be tuned to avoid the alerts.
>
>
>
> Under normal operation BTW this is what we get:
>
> > ::memstat
> Page Summary                Pages                MB  %Tot
> ------------     ----------------  ----------------  ----
> Kernel                     585743              2288    7%
> ZFS File Data              861077              3363   10%
> Anon                       793486              3099    9%
> Exec and libs               45752               178    1%
> Page cache                 259302              1012    3%
> Free (cachelist)          4301112             16801   51%
> Free (freelist)           1542007              6023   18%
>
> Total                     8388479             32767
>
> Thanks again for your help with this!
>
> - Nestor
>
>
> On Thu, Oct 23, 2014 at 5:24 AM, Martin Pala <address@hidden> wrote:
> You can use the prstat exec action too, just remove the "-s rss" option to
> let it sort the output by CPU usage (default)
>
> Regards,
> Martin
>
>
>> On 22 Oct 2014, at 18:58, Nestor Urquiza <address@hidden> wrote:
>>
>> Thanks for this Martin,
>>
>> I will keep you posted now that I installed 5.7 and put the command in
>> monitrc as recommended.
>>
>> We are also getting some alerts for CPU usage spikes. Do you have a
>> recommendation for the command to run when getting those as well?
>>
>> Thanks!
>> - Nestor
>>
>> On Wed, Oct 22, 2014 at 3:33 AM, Martin Pala <address@hidden> wrote:
>> Hi Nestor,
>>
>> you can use something like this to get the distribution (will record the
>> memstat output + user space distribution ... processes by RSS):
>>
>> if memory usage > 80% then exec "/bin/sh -c 'exec >> /tmp/memstat.$$; echo ___________ `date` ___________; echo ::memstat | sudo mdb -k; prstat -c -s rss 1 10'"
>>
>>
>> There was a fix for the memory usage report on Solaris in Monit 5.7 ... can
>> you please upgrade to Monit 5.9? If the problem persists: is the system
>> where Monit is running 32-bit or 64-bit? Is it a Solaris zone?
>>
>>
>> Regards,
>> Martin
>>
>>
>> > On 20 Oct 2014, at 22:04, Nestor Urquiza <address@hidden> wrote:
>> >
>> > Hi Martin,
>> >
>> > Is there a way to put monit in debug mode so we get more information about
>> > the memory distribution at the moment of the alert?
>> >
>> > One thing we have noticed is that, regardless of how many cycles we wait
>> > before alerting, the "succeeded" message arrives in the next cycle after
>> > the alert, which is really weird.
>> >
>> > Thanks,
>> >
>> > - Nestor
>> >
>> > On Sun, Oct 19, 2014 at 12:32 PM, Nestor Urquiza <address@hidden> wrote:
>> > I am sorry about the examples but yes we do get memory utilization spikes:
>> >
>> > "mem usage of 82.6% matches resource limit [mem usage>80.0%],"
>> >
>> > It is difficult to get that information at the time of the alert though.
>> > Is there a way to put monit in debug mode or something similar to get the
>> > exact memory utilization distribution?
>> >
>> > Right now everything is alright:
>> >
>> > $ sudo monit status
>> > ...
>> > System 'server'
>> >   status                            Running
>> >   monitoring status                 Monitored
>> >   load average                      [0.13] [0.12] [0.11]
>> >   cpu                               0.3%us 1.4%sy 0.0%wa
>> >   memory usage                      11822268 kB [35.2%]
>> >   swap usage                        0 kB [0.0%]
>> >   data collected                    Sun, 19 Oct 2014 12:23:47
>> > ...
>> >
>> > $ echo ::memstat | sudo mdb -k
>> > Page Summary                Pages                MB  %Tot
>> > ------------     ----------------  ----------------  ----
>> > Kernel                     591587              2310    7%
>> > ZFS File Data             1089502              4255   13%
>> > Anon                       999345              3903   12%
>> > Exec and libs               50239               196    1%
>> > Page cache                 249081               972    3%
>> > Free (cachelist)          3821104             14926   46%
>> > Free (freelist)           1587621              6201   19%
>> >
>> > Total                     8388479             32767
>> >
>> > Thanks,
>> >
>> > - Nestor
>> >
>> > On Sat, Oct 18, 2014 at 4:22 PM, Martin Pala <address@hidden> wrote:
>> > Hi,
>> >
>> > the attached error message ("cpu system usage ...") is from the CPU test
>> > and is not related to memory usage. High "cpu system" usage may, for
>> > example, be a sign of heavy disk I/O activity and/or swapping (memory
>> > shortage); check the vmstat output for details.
>> >
>> > If the memory usage report is the problem, can you please provide the
>> > output of "echo ::memstat | mdb -k" and "monit status" (just the System
>> > service part is sufficient).
>> >
>> >
>> > Regards,
>> > Martin
>> >
>> >
>> >
>> > > On 16 Oct 2014, at 16:41, Nestor Urquiza <address@hidden> wrote:
>> > >
>> > > Hi guys,
>> > >
>> > > Since we went from Solaris 10 to 11 we have seen an increase in monit
>> > > alerts related to memory resource utilization. We used to get no alerts
>> > > even when we set the memory threshold really low, for example:
>> > >
>> > > "...cpu system usage of 45.8% matches resource limit [cpu system
>> > > usage>40.0%]"
>> > >
>> > >
>> > > We have raised the threshold to 90% but we still get alerts.
>> > >
>> > > Could it be that the way monit decides what is free memory on Solaris is
>> > > incorrect when using ZFS? See
>> > > http://serverfault.com/questions/378392/how-should-i-monitor-memory-usage-performance-in-sunos-solaris
>> > >
>> > > We are running monit version 5.5 BTW which has been working fine for
>> > > ages.
>> > >
>> > > Perhaps version 5.9 has changed something in that regard? I read in the
>> > > release notes ( http://mmonit.com/monit/changes/ ) that it allows
>> > > monitoring generic device strings (not really related, but worth asking).
>> > >
>> > > Thanks!
>> > >
>> > > - Nestor
>> > >
>> > > --
>> > > To unsubscribe:
>> > > https://lists.nongnu.org/mailman/listinfo/monit-general