日志优化

发表于 2013-07-09 | 分类于 java

一些日志优化相关的东东.

1. 关掉没用的appender

现在我们所有的系统都会把日志输出到指定的文件，为了本地开发测试方便，我们也会把日志输出到控制台。在生产环境输出到控制台是完全没有必要的，既浪费了存储空间，又浪费了性能，请在生产环境去掉ConsoleAppender

logback去掉ch.qos.logback.core.ConsoleAppender
log4j去掉org.apache.log4j.ConsoleAppender

使用logback的同学也可以如下操作：

1.添加依赖

<dependency>
        <groupId>org.codehaus.janino</groupId>
        <artifactId>janino</artifactId>
        <version>2.5.16</version>
</dependency>

2.修改logback.xml中原ConsoleAppender配置如下：

<if condition='property("os.name").toUpperCase().contains("WINDOWS")'>
    <then>
        <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
            <encoder>
                <pattern>%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36}:%L-%msg%n%n</pattern>
            </encoder>
        </appender>
        <root>
            <appender-ref ref="STDOUT" />
        </root>
    </then>
</if>

上面的配置表示在windows中运行时日志输出到控制台，在其他系统中日志不输出到控制台

2. 禁用日志收集栈信息

在写日志时，打印行号能方便我们快速定位出问题的代码。此功能通过生成栈信息，然后定位到合适的栈帧来实现。禁用此功能会提高日志性能。

异步日志实现默认关闭了收集栈信息，见ch.qos.logback.classic.AsyncAppender#includeCallerData

对于同步日志，只需要把pattern中的%L去掉就可以禁用栈信息。

3. 使用异步日志

3.1 logback

logback只需要用异步appender包裹原来的appender就可以了

<appender name="ASYNC_ROLLING-cs" class="com.yjf.common.log.LogbackAsyncAppender">
    <appender-ref ref="ROLLING-cs" />
</appender>

目前我们使用了用于输出日志到文件的ch.qos.logback.core.rolling.RollingFileAppender和输出日志到rabbitmq的com.yjf.monitor.client.integration.log.appender.jms.LogbackMsgRollingAppenderAdapter，请都用上面的LogbackAsyncAppender包裹住。

此配置下，对象先放入一个队列，然后由另外一个线程来处理具体的日志输出，实现异步化，实现流程如下：

queue大小为1024，如果queue剩余空间小于100时,会做丢弃判断
如果queue剩余空间小于100时，会丢弃TRACE、DEBUG日志，也会丢弃LoggerName不以com.yjf开始的INFO日志
如果queue没有剩余空间为0时，会阻塞业务线程，直到queue有剩余空间
输出日志到文件的appender，由于会用锁来控制多线程的写入，这种用单线程来写文件的方式很完美。
输出日志到rabbitmq的appender，LogbackMsgRollingAppenderAdapter在并发下会有性能问题，因为LogbackMsgRollingAppenderAdapter内部只用到一个channel，如果遇到高并发场景，请给CachingConnectionFactory设置合理的channelCacheSize值

3.2 log4j

写日志到文件的appender，用org.apache.log4j.AsyncAppender代替org.apache.log4j.DailyRollingFileAppender
写日志到rabbitmq的appender，建议用com.yjf.common.log.Log4jAsynAmqpAppender，内部也用queue来实现异步化

4. logger api增强

下面是常用的使用方式：

logger.info("金额无效,手续费大于或者等于提现总金额.charge:{},amount:{}", new Object[] { 1, 2, 3 });
logger.info("金额无效,手续费大于或者等于提现总金额.charge:{},amount:{},test:{}", 1, 2, 3); 
LoggerFormat.info(String.format("金额无效,手续费大于或者等于提现总金额.charge:%s,amount:%s,test:%s", 1, 2,3), logger);

第一种是slf4j提供的原生api，第二种是yjf-common-utils提供的支持可变的参数的logger，第三种是yjf-common-util提供的日志工具类。这三种使用方式在日志级别为error的情况下，跑一百万次测试出的性能数据如下：

LoggerFactoryTest.testLogOri: [measured 1000000 out of 1020000 rounds, threads: 4 (all cores)]
 round: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 2, GC.time: 0.62, time.total: 2.67, time.warmup: 0.04, time.bench: 2.63

LoggerFactoryTest.testYjfLogger: [measured 1000000 out of 1020000 rounds, threads: 4 (all cores)]
 round: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 1, GC.time: 0.72, time.total: 3.43, time.warmup: 0.06, time.bench: 3.37

LoggerFactoryTest.testLogUtils: [measured 1000000 out of 1020000 rounds, threads: 4 (all cores)]
 round: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 14, GC.time: 1.78, time.total: 11.62, time.warmup: 0.22, time.bench: 11.40

从这些数据可以看出，第三种使用方式由于在方法调用时就有字符串拼接的开销，在error级别下，性能肯定不是很好。

在日志级别为info情况下，这种差距不是很大，但是如果我们使用异步日志，字符串的拼接在独立的线程中完成，不会在调用线程中完成，业务整体执行时间会减少。

第三种方式还有一个问题，行号永远显示的是LoggerFormat.info方法的行号，不便于我们定位问题。“懒”的同学就用第二种方式吧，支持变长参数，例子如下：

import com.yjf.common.log.Logger;
import com.yjf.common.log.LoggerFactory;
...        
Logger logger = LoggerFactory.getLogger(LoggerFactoryTest.class
    .getName());
...
logger.info("xx:{} xx:{} xx:{}","a","b","c");
logger.info("xx:{} xx:{} xx:[]", "a","b","c", new RuntimeException("挂了"));

请不要再使用com.yjf.common.log.LoggerFormat和com.yjf.common.lang.util.PrintLogTool

5. tomcat日志优化

tomcat embed默认日志使用的是org.apache.juli.logging.DirectJDKLog，此日志使用了java.util.logging.Logger，此logger使用了jre下面的配置文件$JAVA_HOME\jre\lib\logging.properties，此配置文件配置了一个处理器java.util.logging.ConsoleHandler，它只做了一个事情，使用System.err来作为默认的日志输出，也就是说tomcat embed遇到不能处理的异常后，用System.err来打印日志。(见：org.apache.catalina.core.StandardWrapperValve#invoke方法)

我们使用的的tomcat和tomcat embed有点差别，tomcat 中使用了org.apache.juli.FileHandler，此输出位置为$CATALINA_HOME/logs/localhost.yyyy-MM-dd.log，也就是说tomcat在遇到他搞不定的异常时，把日志输出到了此位置。
tomcat把输出和错误输出都打印到catalina.out文件了，见catalina.sh

org.apache.catalina.startup.Bootstrap "$@" start \
  >> "$CATALINA_OUT" 2>&1 "&"

这也解释了上次易交易产品遇到的那个问题，spring mvc action递归调用自己，catalina.out文件中没有异常日志。但是我在本地使用tomcat embed测试，会在控制台输出StackOverflowError。

tomcat日志优化点：

用org.apache.juli.AsyncFileHandler代替org.apache.juli.FileHandler
现在我们应用的日志目录都改为/var/log/webapps/{project_name}/。为了便于管理，tomcat 所有日志文件也会输出到/var/log/webapps/{project_name}/路径

以上优化已提交到定制化tomcat模板。

6.优化J.U.C

J.U.L代表java内置的日志机制，上面部分提到tomcat在遇到他搞不定的异常时，会把日志输出到localhost.yyyy-MM-dd.log。这样很不爽，很多同学在检查应用异常时，下意识的不会去看此日志文件，为了避免此问题，请使用下面的解决方案：

使用com.yjf.common.log.LogbackConfigListener,使用方法见doc

请使用yjf-common-util版本大于等于1.6.9.20130708的包

logback增加配置

<contextListener class="ch.qos.logback.classic.jul.LevelChangePropagator">
    <resetJUL>true</resetJUL>
  </contextListener>

从此以后，所有的异常日志都会在应用日志文件中找到。

7. 性能测试

测试四种情况，普通日志输出，普通日志禁用线程栈信息，异步日志，异步日志禁用线程栈信息，测试结果如下：

LoggerPerfTest.testCommonLogger: [measured 100000 out of 101000 rounds, threads: 16 (physical processors: 4)]
 round: 0.00 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 2, GC.time: 0.12, time.total: 10.73, time.warmup: 0.11, time.bench: 10.62

LoggerPerfTest.testCommonLoggerWithoutStackInfo: [measured 100000 out of 101000 rounds, threads: 16 (physical processors: 4)]
 round: 0.00 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 2, GC.time: 0.04, time.total: 3.93, time.warmup: 0.14, time.bench: 3.79

LoggerPerfTest.testAsyncLogger: [measured 100000 out of 101000 rounds, threads: 16 (physical processors: 4)]
 round: 0.00 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 3, GC.time: 0.15, time.total: 2.77, time.warmup: 0.06, time.bench: 2.71

LoggerPerfTest.testAsyncLoggerWithoutStackInfo: [measured 100000 out of 101000 rounds, threads: 16 (physical processors: 4)]
 round: 0.00 [+- 0.00], round.block: 0.00 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 1.86, time.warmup: 0.00, time.bench: 1.85

可见异步日志性能会提升不少，不收集栈信息性能也会提升不少。

使用ssh tunneling和JMX远程监控java程序

发表于 2013-05-18 | 分类于 java

很多时候，我们需要连到远程应用服务器上去观察java进程的运行情况。由于邪恶的防火墙限制我们很难直接连到应用服务器。所以，也有了本文…

阅读全文 »

使用jdk7新增异常构造器提高异常性能

发表于 2013-05-09 | 分类于 java

JDK7在Throwable对象中增加了一个新的构造器

protected Throwable(String message, Throwable cause,
                    boolean enableSuppression,
                    boolean writableStackTrace)

第三个参数表示是否启用suppressedExceptions(try代码快中抛出异常，在finally中又抛出异常，导致try中的异常丢失)
。第四个参数表示是否填充异常栈，如果为false，异常在初始化的时候不会调用本地方法fillInStackTrace。

阅读全文 »

Thinking Clearly about Performance笔记

发表于 2013-04-29 | 分类于 java

Thinking Clearly about Performance笔记

原文链接:Thinking Clearly about Performance

RESPONSE TIME VERSUS THROUGHPUT

响应时间和吞吐量没有太多关系.你要了解两个值需要测试两个值.下面两个例子说明为什么两者之间没有太多关系.

应用吞吐量为1000笔/s,用户的平均响应时间是多少?

如果应用下面是1000个服务提供者在提供服务,每一笔的响应时间最大可以为1s.所以,只能得出的结论是平均响应时间为0-1s
客户对某应用的需求为在单cpu的服务器上吞吐量为100笔/s.现在你写的应用每次执行耗时1ms,你的程序满足客户需求吗?

如果请求串行发过来,每次执行一个,一个执行完在执行下一个,这种情况应该还是可以满足需求的.但是如果这100个请求在1s内随机的发送过来,CPU调度器(比如线程上下文切换)和串行资源(比如CAS导致的重试)可能让你不能满足客户需求.

PERCENTILE SPECIFICATIONS

平均并不能精确的定义响应时间.假如你能容忍的响应时间是1s,对于不同的应用,他们的平均响应时间都是1s.但是应用A90%的请求响应时间都小于1s和应用B60%的请求响应时间都小于1s,这两个应用提供的服务性能是不一样的.我们一般可以如下的形式来定义响应时间:the “Track Shipment” task must complete in less than .5 second in at least 99.9 percent of executions.

PROBLEM DIAGNOSIS

明确用户的需求,用户不会精确的定义他对性能的需求.大多数时候,他只是说”系统太慢了,我们没办法使用”,可以引导用户提出他的需求Response time of X is more than 20 seconds in many cases. We’ll be happy when response time is one second or less in at least 95 percent of executions.

THE SEQUENCE DIAGRAM

The sequence diagram is a good tool for conceptualizing flow of control and the corresponding flow of time. To think clearly about response time, however, you need something else.

THE PROFILE

A profile shows where your code has spent your time and—sometimes even more importantly—where it has not. There is tremendous value in not having to guess about these things.

With a profile, you can begin to formulate the answer to the question, “How long should this task run?” which, by now, you know is an important question in the first step of any good problem diagnosis.

AMDAHL’S LAW

Performance improvement is proportional to how much a program uses the thing you improved. If the thing you’re trying to improve contributes only 5 percent to your task’s total response time, then the maximum impact you’ll be able to make is 5 percent of your total response time. This means that the closer to the top of a profile that you work (assuming that the profile is sorted in descending response-time order), the bigger the benefit potential for your overall response time.

MINIMIZING RISK

when everyone is happy except for you, make sure your local stuff is in order before you go messing around with the global stuff that affects everyone else, too.

LOAD

One measure of load is utilization, which is resource usage divided by resource capacity for a specified time interval.

There are two reasons that systems get slower as load increases: queuing delay and coherency delay.

QUEUING DELAY

Response time (R), in the perfect scalability M/M/m model, consists of two components: service time (S) and queuing delay (Q), or R = S + Q.

当谈到性能时,你期望一个系统满足下面两个目标:

最佳的响应时间(不用等太久就能获得结果)
最佳吞吐量(能服务更多的人)

但是这两个目标是互相矛盾的,优化第一个目标,需要你较少系统的负载.优化第二个目标,又需要你提高系统使用率,增加负载.你不能同时满足这两个目标,只能权衡取舍.

COHERENCY DELAY

Your system doesn’t have theoretically perfect scalability. Coherency delay is the factor that you can use to model the imperfection. It is the duration that a task spends communicating and coordinating access to a shared resource.

The utilization value at which this optimal balance occurs is called the knee. This is the point at which throughput is maximized with minimal negative impact to response times.

java序列化

发表于 2012-04-02 | 分类于 java

什么是序列化

serialization is the process of converting a data structure or object state into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and “resurrected” later in the same or another computer environment.

阅读全文 »