首頁 > 軟體

一次線上mongo慢查詢問題排查處理記錄

2022-03-18 13:00:23

一、簡單介紹

mongo語句查詢條件中出現null或空串,可能會導致索引失效,查詢優化器無法選擇正確的查詢計劃,出現慢查詢引起服務異常

mongo查詢的執行計劃使用了LRU快取,在很多種情況下會失效,導致重新選擇執行計劃並快取,供後續同類查詢直接使用;

服務開始時一直執行正常,當執行計劃失效後,恰好出現null值的查詢導致選擇了錯誤的執行計劃並快取,後續正常的查詢也會出現異常。

二、事件脈絡

1、起因

使用者中心收到業務方反饋,第三方登入註冊出現頻繁dubbo呼叫超時。

org.apache.dubbo.rpc.RpcException: Failed to invoke the method loginWithThird in the service weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService. Tried 1 times of the providers [10.65.5.0:11090] (1/4) from the registry node1.zk.all.platform.wtc.hwhosts.com:2181 on the consumer 10.65.1.81 using the dubbo version 2.7.3-SNAPSHOT. Last error is: loginWithThird_2 failed and fallback failed.
  at org.apache.dubbo.rpc.cluster.support.FailoverClusterInvoker.doInvoke(FailoverClusterInvoker.java:113)
  at org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker.invoke(AbstractClusterInvoker.java:248)
  at org.apache.dubbo.rpc.cluster.support.wrapper.MockClusterInvoker.invoke(MockClusterInvoker.java:78)
  at org.apache.dubbo.rpc.proxy.InvokerInvocationHandler.invoke(InvokerInvocationHandler.java:55)
  at org.apache.dubbo.common.bytecode.proxy9.loginWithThird(proxy9.java)
  at weli.peanut.user.service.LoginService.loginAndRegister(LoginService.java:684)
  at weli.peanut.user.service.LoginService.loginByThirdPartyV2(LoginService.java:629)
  at weli.peanut.web.controller.api.UserLoginController.lambda$loginByThirdPartyV2$0(UserLoginController.java:113)
  at weli.peanut.common.controller.BaseController.resultForActionWithCat(BaseController.java:71)
  at weli.peanut.web.controller.api.UserLoginController.loginByThirdPartyV2(UserLoginController.java:113)
  at sun.reflect.GeneratedMethodAccessor2017.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
  at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:133)
  at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:116)
  at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)
  at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)
  at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
  at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:963)
  at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:897)
  at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
  at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:872)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
  at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
  at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
  at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:197)
  at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
  at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
  at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1721)
  at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1679)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: com.netflix.hystrix.exception.HystrixRuntimeException: loginWithThird_2 failed and fallback failed.
  at com.netflix.hystrix.AbstractCommand$22.call(AbstractCommand.java:818)
  at com.netflix.hystrix.AbstractCommand$22.call(AbstractCommand.java:793)
  at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$4.onError(OperatorOnErrorResumeNextViaFunction.java:140)
  at rx.internal.operators.OnSubscribeDoOnEach$DoOnEachSubscriber.onError(OnSubscribeDoOnEach.java:87)
  at rx.internal.operators.OnSubscribeDoOnEach$DoOnEachSubscriber.onError(OnSubscribeDoOnEach.java:87)
  at com.netflix.hystrix.AbstractCommand$DeprecatedOnFallbackHookApplication$1.onError(AbstractCommand.java:1454)
  at com.netflix.hystrix.AbstractCommand$FallbackHookApplication$1.onError(AbstractCommand.java:1379)
  at rx.internal.operators.OnSubscribeDoOnEach$DoOnEachSubscriber.onError(OnSubscribeDoOnEach.java:87)
  at rx.observers.Subscribers$5.onError(Subscribers.java:230)
  at rx.internal.operators.OnSubscribeThrow.call(OnSubscribeThrow.java:44)
  at rx.internal.operators.OnSubscribeThrow.call(OnSubscribeThrow.java:28)
  at rx.Observable.unsafeSubscribe(Observable.java:10151)
  at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:51)
  at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35)
  at rx.Observable.unsafeSubscribe(Observable.java:10151)
  at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:41)
  at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:30)
  at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
  at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
  at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
  at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
  at rx.Observable.unsafeSubscribe(Observable.java:10151)
  at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:41)
  at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:30)
  at rx.Observable.unsafeSubscribe(Observable.java:10151)
  at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:41)
  at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:30)
  at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
  at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
  at rx.Observable.unsafeSubscribe(Observable.java:10151)
  at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:41)
  at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:30)
  at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
  at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
  at rx.Observable.unsafeSubscribe(Observable.java:10151)
  at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$4.onError(OperatorOnErrorResumeNextViaFunction.java:142)
  at rx.internal.operators.OnSubscribeDoOnEach$DoOnEachSubscriber.onError(OnSubscribeDoOnEach.java:87)
  at rx.internal.operators.OnSubscribeDoOnEach$DoOnEachSubscriber.onError(OnSubscribeDoOnEach.java:87)
  at rx.internal.operators.OperatorSubscribeOn$1$1.onError(OperatorSubscribeOn.java:59)
  at rx.observers.Subscribers$5.onError(Subscribers.java:230)
  at rx.internal.operators.OnSubscribeDoOnEach$DoOnEachSubscriber.onError(OnSubscribeDoOnEach.java:87)
  at rx.observers.Subscribers$5.onError(Subscribers.java:230)
  at com.netflix.hystrix.AbstractCommand$DeprecatedOnRunHookApplication$1.onError(AbstractCommand.java:1413)
  at com.netflix.hystrix.AbstractCommand$ExecutionHookApplication$1.onError(AbstractCommand.java:1344)
  at rx.observers.Subscribers$5.onError(Subscribers.java:230)
  at rx.observers.Subscribers$5.onError(Subscribers.java:230)
  at rx.internal.operators.OnSubscribeThrow.call(OnSubscribeThrow.java:44)
  at rx.internal.operators.OnSubscribeThrow.call(OnSubscribeThrow.java:28)
  at rx.Observable.unsafeSubscribe(Observable.java:10151)
  at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:51)
  at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35)
  at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
  at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
  at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
  at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
  at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
  at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
  at rx.Observable.unsafeSubscribe(Observable.java:10151)
  at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:51)
  at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35)
  at rx.Observable.unsafeSubscribe(Observable.java:10151)
  at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:41)
  at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:30)
  at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
  at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
  at rx.Observable.unsafeSubscribe(Observable.java:10151)
  at rx.internal.operators.OperatorSubscribeOn$1.call(OperatorSubscribeOn.java:94)
  at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction$1.call(HystrixContexSchedulerAction.java:56)
  at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction$1.call(HystrixContexSchedulerAction.java:47)
  at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction.call(HystrixContexSchedulerAction.java:69)
  at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  ... 3 more
Caused by: java.lang.RuntimeException: org.apache.dubbo.rpc.RpcException: Invoke remote method timeout. method: loginWithThird, provider: dubbo://10.65.5.0:11090/weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService?anyhost=true&application=peanut-admin&bean.name=weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService&check=false&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&interface=weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService&lazy=false&loadbalance=leastactive&methods=logout,sendSms,loginWithVisitor,loginWithThird,loginWithPhone,loginWithThirdBindPhone&pid=1&register=true&register.ip=10.65.1.81&release=2.7.3-SNAPSHOT&remote.application=wormhole-api&retries=0&revision=1.1.11-SNAPSHOT&side=consumer&sticky=false&timeout=5000&timestamp=1647332146309, cause: org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. start time: 2022-03-15 20:33:31.937, end time: 2022-03-15 20:33:36.968, client elapsed: 0 ms, server elapsed: 5031 ms, timeout: 5000 ms, request: Request [id=690506, version=2.0.2, twoway=true, event=false, broken=false, data=RpcInvocation [methodName=loginWithThird, parameterTypes=[class weli.wormhole.rpc.user.center.model.LoginWithThirdRequest, class weli.wormhole.rpc.user.center.model.LoginCommonParam], arguments=[LoginWithThirdRequest(accessToken=54_y3n-YsttdKOklr_X-mbjRgKAKROLDKfx3PV23zAQHmBvmT3zVliFp9mWodHSUEcKgNnJGC4AP_bAOX-2ZSSsfCHfNM5wfjbJDyiQh-sCjFQ, openId=ofrsY6xraPjv9augA9L2kUIB0-Fk, unionId=null, code=null, type=WX, nickName=null, headImgurl=null, sex=null, skipPhoneBind=1), LoginCommonParam(channel=vivo, ip=null, verName=null, platform=null, deviceId=null, imei=null, app=peanut, subApp=peanut, dfId=DUE_59hD2L9BBgukNR3Jv2DjnW00iRtU8z0eRFVFXzU5aEQyTDlCQmd1a05SM0p2MkRqblcwMGlSdFU4ejBlc2h1, idfa=null, mac=null, oaid=null, androidId=null)], attachments={path=weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService, interface=weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService, version=0.0.0, timeout=5000}]], channel: /10.65.1.81:35450 -> /10.65.5.0:11090
  at suishen.esb.hystrix.dubbo.DubboHystrixCommand.run(DubboHystrixCommand.java:93)
  at suishen.esb.hystrix.dubbo.DubboHystrixCommand.run(DubboHystrixCommand.java:12)
  at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:301)
  at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:297)
  at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:46)
  ... 26 more
Caused by: org.apache.dubbo.rpc.RpcException: Invoke remote method timeout. method: loginWithThird, provider: dubbo://10.65.5.0:11090/weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService?anyhost=true&application=peanut-admin&bean.name=weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService&check=false&deprecated=false&dubbo=2.0.2&dynamic=true&generic=false&interface=weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService&lazy=false&loadbalance=leastactive&methods=logout,sendSms,loginWithVisitor,loginWithThird,loginWithPhone,loginWithThirdBindPhone&pid=1&register=true&register.ip=10.65.1.81&release=2.7.3-SNAPSHOT&remote.application=wormhole-api&retries=0&revision=1.1.11-SNAPSHOT&side=consumer&sticky=false&timeout=5000&timestamp=1647332146309, cause: org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. start time: 2022-03-15 20:33:31.937, end time: 2022-03-15 20:33:36.968, client elapsed: 0 ms, server elapsed: 5031 ms, timeout: 5000 ms, request: Request [id=690506, version=2.0.2, twoway=true, event=false, broken=false, data=RpcInvocation [methodName=loginWithThird, parameterTypes=[class weli.wormhole.rpc.user.center.model.LoginWithThirdRequest, class weli.wormhole.rpc.user.center.model.LoginCommonParam], arguments=[LoginWithThirdRequest(accessToken=54_y3n-YsttdKOklr_X-mbjRgKAKROLDKfx3PV23zAQHmBvmT3zVliFp9mWodHSUEcKgNnJGC4AP_bAOX-2ZSSsfCHfNM5wfjbJDyiQh-sCjFQ, openId=ofrsY6xraPjv9augA9L2kUIB0-Fk, unionId=null, code=null, type=WX, nickName=null, headImgurl=null, sex=null, skipPhoneBind=1), LoginCommonParam(channel=vivo, ip=null, verName=null, platform=null, deviceId=null, imei=null, app=peanut, subApp=peanut, dfId=DUE_59hD2L9BBgukNR3Jv2DjnW00iRtU8z0eRFVFXzU5aEQyTDlCQmd1a05SM0p2MkRqblcwMGlSdFU4ejBlc2h1, idfa=null, mac=null, oaid=null, androidId=null)], attachments={path=weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService, interface=weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService, version=0.0.0, timeout=5000}]], channel: /10.65.1.81:35450 -> /10.65.5.0:11090
  at org.apache.dubbo.rpc.protocol.AsyncToSyncInvoker.invoke(AsyncToSyncInvoker.java:63)
  at org.apache.dubbo.rpc.listener.ListenerInvokerWrapper.invoke(ListenerInvokerWrapper.java:78)
  at com.alibaba.dubbo.rpc.Invoker$CompatibleInvoker.invoke(Invoker.java:55)
  at suishen.esb.hystrix.dubbo.DubboHystrixCommand.run(DubboHystrixCommand.java:84)
  ... 30 more
Caused by: java.util.concurrent.ExecutionException: org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. start time: 2022-03-15 20:33:31.937, end time: 2022-03-15 20:33:36.968, client elapsed: 0 ms, server elapsed: 5031 ms, timeout: 5000 ms, request: Request [id=690506, version=2.0.2, twoway=true, event=false, broken=false, data=RpcInvocation [methodName=loginWithThird, parameterTypes=[class weli.wormhole.rpc.user.center.model.LoginWithThirdRequest, class weli.wormhole.rpc.user.center.model.LoginCommonParam], arguments=[LoginWithThirdRequest(accessToken=54_y3n-YsttdKOklr_X-mbjRgKAKROLDKfx3PV23zAQHmBvmT3zVliFp9mWodHSUEcKgNnJGC4AP_bAOX-2ZSSsfCHfNM5wfjbJDyiQh-sCjFQ, openId=ofrsY6xraPjv9augA9L2kUIB0-Fk, unionId=null, code=null, type=WX, nickName=null, headImgurl=null, sex=null, skipPhoneBind=1), LoginCommonParam(channel=vivo, ip=null, verName=null, platform=null, deviceId=null, imei=null, app=peanut, subApp=peanut, dfId=DUE_59hD2L9BBgukNR3Jv2DjnW00iRtU8z0eRFVFXzU5aEQyTDlCQmd1a05SM0p2MkRqblcwMGlSdFU4ejBlc2h1, idfa=null, mac=null, oaid=null, androidId=null)], attachments={path=weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService, interface=weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService, version=0.0.0, timeout=5000}]], channel: /10.65.1.81:35450 -> /10.65.5.0:11090
  at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
  at org.apache.dubbo.rpc.protocol.AsyncToSyncInvoker.invoke(AsyncToSyncInvoker.java:56)
  ... 33 more
Caused by: org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. start time: 2022-03-15 20:33:31.937, end time: 2022-03-15 20:33:36.968, client elapsed: 0 ms, server elapsed: 5031 ms, timeout: 5000 ms, request: Request [id=690506, version=2.0.2, twoway=true, event=false, broken=false, data=RpcInvocation [methodName=loginWithThird, parameterTypes=[class weli.wormhole.rpc.user.center.model.LoginWithThirdRequest, class weli.wormhole.rpc.user.center.model.LoginCommonParam], arguments=[LoginWithThirdRequest(accessToken=54_y3n-YsttdKOklr_X-mbjRgKAKROLDKfx3PV23zAQHmBvmT3zVliFp9mWodHSUEcKgNnJGC4AP_bAOX-2ZSSsfCHfNM5wfjbJDyiQh-sCjFQ, openId=ofrsY6xraPjv9augA9L2kUIB0-Fk, unionId=null, code=null, type=WX, nickName=null, headImgurl=null, sex=null, skipPhoneBind=1), LoginCommonParam(channel=vivo, ip=null, verName=null, platform=null, deviceId=null, imei=null, app=peanut, subApp=peanut, dfId=DUE_59hD2L9BBgukNR3Jv2DjnW00iRtU8z0eRFVFXzU5aEQyTDlCQmd1a05SM0p2MkRqblcwMGlSdFU4ejBlc2h1, idfa=null, mac=null, oaid=null, androidId=null)], attachments={path=weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService, interface=weli.wormhole.rpc.user.center.api.ILoginNeedPhoneService, version=0.0.0, timeout=5000}]], channel: /10.65.1.81:35450 -> /10.65.5.0:11090
  at org.apache.dubbo.remoting.exchange.support.DefaultFuture.doReceived(DefaultFuture.java:189)
  at org.apache.dubbo.remoting.exchange.support.DefaultFuture.received(DefaultFuture.java:153)
  at org.apache.dubbo.remoting.exchange.support.DefaultFuture$TimeoutCheckTask.run(DefaultFuture.java:252)
  at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:648)
  at org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:727)
  at org.apache.dubbo.common.timer.HashedWheelTimer$Worker.run(HashedWheelTimer.java:449)
  ... 1 more

2、排查處理

a、查詢使用者中心cat監控,確認使用者中心是否收到請求且正常響應。

當前狀態:

正常狀態:

對比結果:

使用者中心已收到正常請求,排除使用者端呼叫問題;

使用者中心當前響應時間異常高出平常的響應時間,判斷為使用者中心內部業務處理出現問題。

b、檢視kibana紀錄檔,發現無異常紀錄檔;

c、使用Arthas診斷工具,監控業務呼叫鏈各部分耗時情況;

定位原因為mongo查詢慢導致整體業務處理超時。

d、分析mongo查詢語句,檢視索引情況

public User getUserByDfId(String dfId, String app) {
        Criteria criteria = Criteria.where("userBindInfo.dfId").is(dfId).and("app").is(app).and("status").is(1);
        return this.mongoTemplate.findOne(Query.query(criteria), User.class);
}
db.getCollection('wormhole_user').getIndexes();
[
    {
        "v" : 1,
        "key" : {
            "app" : 1
        },
        "name" : "app",
        "ns" : "wormhole.wormhole_user",
        "background" : true
    },
    {
        "v" : 1,
        "key" : {
            "userBindInfo.dfId" : 1
        },
        "name" : "userBindInfo.dfId",
        "ns" : "wormhole.wormhole_user",
        "background" : true
    },
    ... (其餘不關注索引已刪除)
]

“userBindInfo.dfId”欄位存在索引,查詢語句也無明顯異常。

e、檢視華為雲後臺mongo慢紀錄檔

{
    "op": "query",
    "ns": "wormhole.wormhole_user",
    "command": {
        "find": "wormhole_user",
        "filter": {
            "userBindInfo.dfId": "DUQmN3QLpLRwC8PNe1joT_9SDRrzazT4cUc5RFVRbU4zUUxwTFJ3QzhQTmUxam9UXzlTRFJyemF6VDRjVWM1c2h1",
            "app": "maybe",
            "status": 1
        },
        "ntoreturn": -1
    },
    "keysExamined": 1870039,
    "docsExamined": 1870039,
    "cursorExhausted": true,
    "numYield": 14836,
    "nreturned": 0,
    "locks": {
        "Global": {
            "acquireCount": {
                "r": 14837
            }
        },
        "Database": {
            "acquireCount": {
                "r": 14837
            }
        },
        "Collection": {
            "acquireCount": {
                "r": 14837
            }
        }
    },
    "responseLength": 36,
    "millis": 14545,
    "planSummary": "IXSCAN { app: 1 }",
    "execStats": {
        "stage": "CACHED_PLAN",
        "nReturned": 0,
        "executionTimeMillisEstimate": 14552,
        "works": 1,
        "advanced": 0,
        "needTime": 0,
        "needYield": 0,
        "saveState": 14836,
        "restoreState": 14836,
        "isEOF": 1,
        "invalidates": 0,
        "inputStage": {
            "stage": "LIMIT",
            "nReturned": 0,
            "executionTimeMillisEstimate": 14145,
            "works": 1870040,
            "advanced": 0,
            "needTime": 1870039,
            "needYield": 0,
            "saveState": 14836,
            "restoreState": 14836,
            "isEOF": 1,
            "invalidates": 0,
            "limitAmount": 1,
            "inputStage": {
                "stage": "FETCH",
                "filter": {
                    "$and": [
                        {
                            "status": {
                                "$eq": 1
                            }
                        },
                        {
                            "userBindInfo.dfId": {
                                "$eq": "DUQmN3QLpLRwC8PNe1joT_9SDRrzazT4cUc5RFVRbU4zUUxwTFJ3QzhQTmUxam9UXzlTRFJyemF6VDRjVWM1c2h1"
                            }
                        }
                    ]
                },
                "nReturned": 0,
                "executionTimeMillisEstimate": 14117,
                "works": 1870040,
                "advanced": 0,
                "needTime": 1870039,
                "needYield": 0,
                "saveState": 14836,
                "restoreState": 14836,
                "isEOF": 1,
                "invalidates": 0,
                "docsExamined": 1870039,
                "alreadyHasObj": 0,
                "inputStage": {
                    "stage": "IXSCAN",
                    "nReturned": 1870039,
                    "executionTimeMillisEstimate": 931,
                    "works": 1870040,
                    "advanced": 1870039,
                    "needTime": 0,
                    "needYield": 0,
                    "saveState": 14836,
                    "restoreState": 14836,
                    "isEOF": 1,
                    "invalidates": 0,
                    "keyPattern": {
                        "app": 1
                    },
                    "indexName": "app",
                    "isMultiKey": false,
                    "multiKeyPaths": {
                        "app": []
                    },
                    "isUnique": false,
                    "isSparse": false,
                    "isPartial": false,
                    "indexVersion": 1,
                    "direction": "forward",
                    "indexBounds": {
                        "app": [
                            "["maybe", "maybe"]"
                        ]
                    },
                    "keysExamined": 1870039,
                    "seeks": 1,
                    "dupsTested": 0,
                    "dupsDropped": 0,
                    "seenInvalidated": 0
                }
            }
        }
    },
    "ts": {
        "$date": 1647359086553
    },
    "client": "10.65.5.0",
    "allUsers": [
        {
            "user": "mongosiud",
            "db": "wormhole"
        }
    ],
    "user": "mongosiud@wormhole"
}

通過慢紀錄檔發現,mongo並未使用“userBindInfo.dfId”作為索引查詢條件,而是使用了“app”作為索引查詢條件。

初步認為索引匹配度不夠,mongo沒正確使用索引。

f、新增聯合索引,提高索引匹配度,查詢時間恢復正常,介面恢復正常。

db.getCollection("wormhole_user").createIndex({
    "userBindInfo.dfId": 1,
    "app": 1,
    "status": 1 
}, {
    name: "idx_user_app_userBindInfo.dfId",
    background: true
});

3、問題分析

a、explain()語句分析

db.getCollection('wormhole_user').find({"userBindInfo.dfId": "DUqTmV8lQcInjzWtuyJuAntkbRi5_YZ3yHd9RFVxVG1WOGxRY0luanpXdHV5SnVBbnRrYlJpNV9ZWjN5SGQ5c2h1", "app": "maybe", "status": 1}).explain();
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "wormhole.wormhole_user",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "$and" : [ 
                {
                    "app" : {
                        "$eq" : "maybe"
                    }
                }, 
                {
                    "status" : {
                        "$eq" : 1.0
                    }
                }, 
                {
                    "userBindInfo.dfId" : {
                        "$eq" : "DUqTmV8lQcInjzWtuyJuAntkbRi5_YZ3yHd9RFVxVG1WOGxRY0luanpXdHV5SnVBbnRrYlJpNV9ZWjN5SGQ5c2h1"
                    }
                }
            ]
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "filter" : {
                "$and" : [ 
                    {
                        "app" : {
                            "$eq" : "maybe"
                        }
                    }, 
                    {
                        "status" : {
                            "$eq" : 1.0
                        }
                    }
                ]
            },
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "userBindInfo.dfId" : 1
                },
                "indexName" : "userBindInfo.dfId",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "userBindInfo.dfId" : []
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 1,
                "direction" : "forward",
                "indexBounds" : {
                    "userBindInfo.dfId" : [ 
                        "["DUqTmV8lQcInjzWtuyJuAntkbRi5_YZ3yHd9RFVxVG1WOGxRY0luanpXdHV5SnVBbnRrYlJpNV9ZWjN5SGQ5c2h1", "DUqTmV8lQcInjzWtuyJuAntkbRi5_YZ3yHd9RFVxVG1WOGxRY0luanpXdHV5SnVBbnRrYlJpNV9ZWjN5SGQ5c2h1"]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ 
            {
                "stage" : "FETCH",
                "filter" : {
                    "$and" : [ 
                        {
                            "status" : {
                                "$eq" : 1.0
                            }
                        }, 
                        {
                            "userBindInfo.dfId" : {
                                "$eq" : "DUqTmV8lQcInjzWtuyJuAntkbRi5_YZ3yHd9RFVxVG1WOGxRY0luanpXdHV5SnVBbnRrYlJpNV9ZWjN5SGQ5c2h1"
                            }
                        }
                    ]
                },
                "inputStage" : {
                    "stage" : "IXSCAN",
                    "keyPattern" : {
                        "app" : 1
                    },
                    "indexName" : "app",
                    "isMultiKey" : false,
                    "multiKeyPaths" : {
                        "app" : []
                    },
                    "isUnique" : false,
                    "isSparse" : false,
                    "isPartial" : false,
                    "indexVersion" : 1,
                    "direction" : "forward",
                    "indexBounds" : {
                        "app" : [ 
                            "["maybe", "maybe"]"
                        ]
                    }
                }
            }, 
            {
                "stage" : "FETCH",
                "filter" : {
                    "$and" : [ 
                        {
                            "app" : {
                                "$eq" : "maybe"
                            }
                        }, 
                        {
                            "userBindInfo.dfId" : {
                                "$eq" : "DUqTmV8lQcInjzWtuyJuAntkbRi5_YZ3yHd9RFVxVG1WOGxRY0luanpXdHV5SnVBbnRrYlJpNV9ZWjN5SGQ5c2h1"
                            }
                        }, 
                        {
                            "status" : {
                                "$eq" : 1.0
                            }
                        }
                    ]
                },
                "inputStage" : {
                    "stage" : "AND_SORTED",
                    "inputStages" : [ 
                        {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                "app" : 1
                            },
                            "indexName" : "app",
                            "isMultiKey" : false,
                            "multiKeyPaths" : {
                                "app" : []
                            },
                            "isUnique" : false,
                            "isSparse" : false,
                            "isPartial" : false,
                            "indexVersion" : 1,
                            "direction" : "forward",
                            "indexBounds" : {
                                "app" : [ 
                                    "["maybe", "maybe"]"
                                ]
                            }
                        }, 
                        {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                "userBindInfo.dfId" : 1
                            },
                            "indexName" : "userBindInfo.dfId",
                            "isMultiKey" : false,
                            "multiKeyPaths" : {
                                "userBindInfo.dfId" : []
                            },
                            "isUnique" : false,
                            "isSparse" : false,
                            "isPartial" : false,
                            "indexVersion" : 1,
                            "direction" : "forward",
                            "indexBounds" : {
                                "userBindInfo.dfId" : [ 
                                    "["DUqTmV8lQcInjzWtuyJuAntkbRi5_YZ3yHd9RFVxVG1WOGxRY0luanpXdHV5SnVBbnRrYlJpNV9ZWjN5SGQ5c2h1", "DUqTmV8lQcInjzWtuyJuAntkbRi5_YZ3yHd9RFVxVG1WOGxRY0luanpXdHV5SnVBbnRrYlJpNV9ZWjN5SGQ5c2h1"]"
                                ]
                            }
                        }
                    ]
                }
            }
        ]
    },
    "serverInfo" : {
        "host" : "host-192-168-10-163",
        "port" : 27017,
        "version" : "4.0.3",
        "gitVersion" : "0ead8608f9d151a199b05117bcc79ccb8d5f44a0"
    },
    "ok" : 1.0,
    "operationTime" : Timestamp(1647486920, 2),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1647486920, 2),
        "signature" : {
            "hash" : { "$binary" : "ylTAtLK6mVpNwDt7PTGwNST/9z4=", "$type" : "00" },
            "keyId" : NumberLong(7031512438860152835)
        }
    }
}
  • 發現mongo執行計劃選擇正確,這正好解釋了服務在之前很長的一段時間內都是正常的原因;
  • mongo慢紀錄檔中的執行計劃卻選擇錯誤,表明有什麼原因導致了mongo執行計劃的選擇發生了變更。

b、檔案查詢

  • MongoDB 查詢優化器會快取最有效的查詢計劃,關聯的計劃快取條目會用於具有相同查詢形狀的後續查詢;
  • 計劃快取會出現重新整理:
    • MongoDB重啟;
    • 索引或者集合的刪除新增更新等操作;
    • 最近最少使用 (LRU) 快取替換機制清除最近最少存取的快取條目。

結論:之前使用了計劃快取服務正常,到了某一時刻,計劃快取失效,MongoDB重新選擇計劃,此時選擇了錯誤的計劃並快取,導致後續查詢全部出現問題

c、查詢首次出現的慢紀錄檔

{
    "op":"query",
    "ns":"wormhole.wormhole_user",
    "command":{
        "find":"wormhole_user",
        "filter":{
            "userBindInfo.dfId":null,
            "app":"cybercat",
            "status":1
        },
        "ntoreturn":-1
    },
    "keysExamined":872550,
    "docsExamined":872550,
    "fromMultiPlanner":true,
    "replanned":true,
    "cursorExhausted":true,
    "numYield":13785,
    "nreturned":0,
    "locks":{
        "Global":{
            "acquireCount":{
                "r":13786
            }
        },
        "Database":{
            "acquireCount":{
                "r":13786
            }
        },
        "Collection":{
            "acquireCount":{
                "r":13786
            }
        }
    },
    "responseLength":36,
    "millis":27864,
    "planSummary":"IXSCAN { app: 1 }",
    "execStats":{
        "stage":"LIMIT",
        "nReturned":0,
        "executionTimeMillisEstimate":26216,
        "works":872551,
        "advanced":0,
        "needTime":872550,
        "needYield":0,
        "saveState":13785,
        "restoreState":13785,
        "isEOF":1,
        "invalidates":0,
        "limitAmount":1,
        "inputStage":{
            "stage":"FETCH",
            "filter":{
                "$and":[
                    {
                        "status":{
                            "$eq":1
                        }
                    },
                    {
                        "userBindInfo.dfId":{
                            "$eq":null
                        }
                    }
                ]
            },
            "nReturned":0,
            "executionTimeMillisEstimate":26164,
            "works":872551,
            "advanced":0,
            "needTime":872550,
            "needYield":0,
            "saveState":13785,
            "restoreState":13785,
            "isEOF":1,
            "invalidates":0,
            "docsExamined":872550,
            "alreadyHasObj":0,
            "inputStage":{
                "stage":"IXSCAN",
                "nReturned":872550,
                "executionTimeMillisEstimate":249,
                "works":872551,
                "advanced":872550,
                "needTime":0,
                "needYield":0,
                "saveState":13785,
                "restoreState":13785,
                "isEOF":1,
                "invalidates":0,
                "keyPattern":{
                    "app":1
                },
                "indexName":"app",
                "isMultiKey":false,
                "multiKeyPaths":{
                    "app":[

                    ]
                },
                "isUnique":false,
                "isSparse":false,
                "isPartial":false,
                "indexVersion":1,
                "direction":"forward",
                "indexBounds":{
                    "app":[
						"["cybercat", "cybercat"]"
                    ]
                },
                "keysExamined":872550,
                "seeks":1,
                "dupsTested":0,
                "dupsDropped":0,
                "seenInvalidated":0
            }
        }
    },
    "ts":{
        "$date":1647309521700
    },
    "client":"10.65.1.25",
    "allUsers":[
        {
            "user":"mongosiud",
            "db":"wormhole"
        }
    ],
    "user":"mongosiud@wormhole"
}

發現查詢條件中出現了null值,本地再次分析:

db.getCollection('wormhole_user').find({"userBindInfo.dfId": null, "app": "maybe", "status": 1}).explain();
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "wormhole.wormhole_user",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "$and" : [ 
                {
                    "app" : {
                        "$eq" : "maybe"
                    }
                }, 
                {
                    "status" : {
                        "$eq" : 1.0
                    }
                }, 
                {
                    "userBindInfo.dfId" : {
                        "$eq" : null
                    }
                }
            ]
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "filter" : {
                "$and" : [ 
                    {
                        "status" : {
                            "$eq" : 1.0
                        }
                    }, 
                    {
                        "userBindInfo.dfId" : {
                            "$eq" : null
                        }
                    }
                ]
            },
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "app" : 1
                },
                "indexName" : "app",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "app" : []
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 1,
                "direction" : "forward",
                "indexBounds" : {
                    "app" : [ 
                        "["maybe", "maybe"]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ 
            {
                "stage" : "FETCH",
                "filter" : {
                    "$and" : [ 
                        {
                            "userBindInfo.dfId" : {
                                "$eq" : null
                            }
                        }, 
                        {
                            "app" : {
                                "$eq" : "maybe"
                            }
                        }, 
                        {
                            "status" : {
                                "$eq" : 1.0
                            }
                        }
                    ]
                },
                "inputStage" : {
                    "stage" : "IXSCAN",
                    "keyPattern" : {
                        "userBindInfo.dfId" : 1
                    },
                    "indexName" : "userBindInfo.dfId",
                    "isMultiKey" : false,
                    "multiKeyPaths" : {
                        "userBindInfo.dfId" : []
                    },
                    "isUnique" : false,
                    "isSparse" : false,
                    "isPartial" : false,
                    "indexVersion" : 1,
                    "direction" : "forward",
                    "indexBounds" : {
                        "userBindInfo.dfId" : [ 
                            "[undefined, undefined]", 
                            "[null, null]"
                        ]
                    }
                }
            }
        ]
    },
    "serverInfo" : {
        "host" : "host-192-168-10-163",
        "port" : 27017,
        "version" : "4.0.3",
        "gitVersion" : "0ead8608f9d151a199b05117bcc79ccb8d5f44a0"
    },
    "ok" : 1.0,
    "operationTime" : Timestamp(1647489666, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1647489666, 1),
        "signature" : {
            "hash" : { "$binary" : "1KJI3aoz2QbOwTKlbkNl9SmWLzw=", "$type" : "00" },
            "keyId" : NumberLong(7031512438860152835)
        }
    }
}

結果與線上表現一致

4、回顧

緊急處理時,直接建立了一個新的索引,導致了計劃快取失效,重新選擇計劃並快取,所以服務恢復了正常。

三、總結

  • 避免空值的查詢;
  • 儘可能的使用explain()分析各種不同的查詢情況;
  • 可以使用mongo提供的PlanCache相關功能,檢視計劃快取情況;
  • 可使用hint()推薦查詢索引。

到此這篇關於一次線上mongo慢查詢問題排查處理的文章就介紹到這了,更多相關mongo慢查詢問題排查內容請搜尋it145.com以前的文章或繼續瀏覽下面的相關文章希望大家以後多多支援it145.com!


IT145.com E-mail:sddin#qq.com