起因
线上出现了一段空指针异常,根据异常栈和经验,第一时间的反应是消费者段在发序列化的时候报错了,觉得应该是返回值在反序列化时出问题。但是前前后后检查了好几次返回值,都没有发现有什么有问题的地方。下面是异常堆栈:
java.lang.NullPointerException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.alibaba.com.caucho.hessian.io.JavaDeserializer.instantiate(JavaDeserializer.java:271)
at com.alibaba.com.caucho.hessian.io.JavaDeserializer.readObject(JavaDeserializer.java:155)
at com.alibaba.com.caucho.hessian.io.SerializerFactory.readObject(SerializerFactory.java:397)
at com.alibaba.com.caucho.hessian.io.Hessian2Input.readObjectInstance(Hessian2Input.java:2070)
at com.alibaba.com.caucho.hessian.io.Hessian2Input.readObject(Hessian2Input.java:2005)
at com.alibaba.com.caucho.hessian.io.Hessian2Input.readObject(Hessian2Input.java:1990)
at com.alibaba.dubbo.common.serialize.support.hessian.Hessian2ObjectInput.readObject(Hessian2ObjectInput.java:88)
at com.alibaba.dubbo.rpc.protocol.dubbo.DecodeableRpcResult.decode(DecodeableRpcResult.java:92)
at com.alibaba.dubbo.rpc.protocol.dubbo.DecodeableRpcResult.decode(DecodeableRpcResult.java:109)
at com.alibaba.dubbo.rpc.protocol.dubbo.DubboCodec.decodeBody(DubboCodec.java:97)
at com.alibaba.dubbo.remoting.exchange.codec.ExchangeCodec.decode(ExchangeCodec.java:129)
at com.alibaba.dubbo.remoting.exchange.codec.ExchangeCodec.decode(ExchangeCodec.java:90)
at com.alibaba.dubbo.rpc.protocol.dubbo.DubboCountCodec.decode(DubboCountCodec.java:46)
at com.alibaba.dubbo.remoting.transport.netty.NettyCodecAdapter$InternalDecoder.messageReceived(NettyCodecAdapter.java:134)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at com.alibaba.ttl.TtlRunnable.run(TtlRunnable.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
于是,在dubbo的github项目中,我发现了一个类似的issue,https://github.com/apache/dubbo/issues/619,与我遇到的问题完全一致。
原因
详细地异常追踪和源码分析请参见:https://www.yuque.com/zhaixiaoxiang/fmga67/sr041v
下面简要理下为什么会有这样的异常输出,输出的堆栈为什么是这样的。
provider抛出没有堆栈的异常
频繁地打印堆栈信息,会造成大量的IO资源浪费。出于性能优化的角度考虑,如果抛出了大量的内置的”cold”的异常,jvm就会重新编译,不再打印异常堆栈,只会抛出一个没有堆栈的预分配的异常。 https://www.oracle.com/technetwork/java/javase/relnotes-139183.html
The compiler in the server VM now provides correct stack backtraces for all "cold" built-in exceptions. For performance purposes, when such an exception is thrown a few times, the method may be recompiled. After recompilation, the compiler may choose a faster tactic using preallocated exceptions that do not provide a stack trace. To disable completely the use of preallocated exceptions, use this new flag: -XX:-OmitStackTraceInFastThrow.
所以provider返回给consumer的并不是一个正常的result输出,而是一个没有堆栈的异常。
consumer输出堆栈信息时,输出了默认的当前堆栈
consumer在InvokerInvocationHandler动态代理返回结果给业务时,拿到了服务端的异常,throw了出去。(从服务端拿到的异常,所以是没有堆栈的),但是在我们打印日志输出异常堆栈的时候,会调用getOurStackTrace()
方法获取堆栈信息以供输出。
这个方法可以会在栈为空的情况下(UNASSIGNED_STACK= new StackTraceElement[0]),为其赋上当前栈的相关信息。所以,provider虽然返回的是一个空的异常,consumer拿到的也是一个空的异常,但是输出的时候,带上了consumer当前栈的相关信息。
private synchronized StackTraceElement[] getOurStackTrace() {
// Initialize stack trace field with information from
// backtrace if this is the first call to this method
if (stackTrace == UNASSIGNED_STACK ||
(stackTrace == null && backtrace != null) /* Out of protocol state */) {
int depth = getStackTraceDepth();
stackTrace = new StackTraceElement[depth];
for (int i=0; i < depth; i++)
stackTrace[i] = getStackTraceElement(i);
} else if (stackTrace == null) {
return UNASSIGNED_STACK;
}
return stackTrace;
}
下面是关于异常的一段测试代码,
public static void main(String[] args) throws NoSuchFieldException, IllegalAccessException {
Throwable throwable = new Throwable();
throwable.printStackTrace();
System.err.println("========================================");
Field stackTrace = throwable.getClass().getDeclaredField("stackTrace");
stackTrace.setAccessible(true);
stackTrace.set(throwable, null);
try {
throw throwable;
} catch (Throwable ex) {
ex.printStackTrace();
}
}
java.lang.Throwable
at com.xzy.test.TestMain2.getThrowable(TestMain2.java:31)
at com.xzy.test.TestMain2.main(TestMain2.java:13)
========================================
java.lang.Throwable
at com.xzy.test.TestMain2.getThrowable(TestMain2.java:31)
at com.xzy.test.TestMain2.main(TestMain2.java:13)
debug可以看到catch到的ex,它的stackTrace是null的,但是输出依然是有内容。因为在调用getOurStackTrace()
方法时又为异常赋上了当前栈的信息。
资料总结
- 相关github issue:https://github.com/apache/dubbo/issues/619
- 关于发序列化的内容分析:http://pfmiles.github.io/blog/recently-hessian-deserialize-problem-and-thread-pool-executor-experience/
- 日志不打印异常栈:https://github.com/edwardlee03/issue-case/blob/master/java/exception/10.日志未打印Java异常堆栈跟踪stackTrace.md
- 问题分析:https://www.yuque.com/fa902k/id5z6r/sr041v