Background:
Our deployment in the business environment is as follows:
Server | Hardware | Components deployed on the machine |
---|---|---|
ddx001 | 32C/128G/5.5T SSD | Zookeeper/Bookkeeper/Pulsar Broker |
ddx002 | 32C/128G/5.5T SSD | Zookeeper/Bookkeeper/Pulsar Broker |
ddx003 | 32C/128G/5.5T SSD | Zookeeper/Bookkeeper/Pulsar Broker |
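Debezium is deployed on these machines to sync data from MongoDB.
Debezium writes the MongoDB data into the mongo/hamster namespace, and a Flink job copies the data from that namespace into bigdata/dwd. During operation, a single broker hung: its ports 6650 and 8080 both stayed open, but Debezium could no longer write data to that broker, and the downstream Flink job received no data.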
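After we restarted the broker on ddx001 at 22:00, Debezium's sync to Pulsar returned to normal.
In the morning, Flink consumed data for a short while and then could no longer consume.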
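Because ports 6650 and 8080 stayed open while the broker was hung, a plain port check would not have flagged this state. The following is a minimal sketch of a probe that instead sends a request with a timeout to the broker's admin health endpoint (`/admin/v2/brokers/health` on the web service port). The class name `BrokerProbe`, the `Status` values, and the timeout are illustrative assumptions, and whether this endpoint would actually have failed during the hang described here is not confirmed.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

// Hypothetical helper: classifies a broker by how its HTTP health endpoint responds.
class BrokerProbe {
    enum Status { HEALTHY, UNHEALTHY, UNRESPONSIVE, UNREACHABLE }

    // baseUrl is the broker web service address, e.g. "http://ddx001:8080" (assumed host naming).
    static Status probe(String baseUrl, Duration timeout) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        HttpRequest request = HttpRequest
                .newBuilder(URI.create(baseUrl + "/admin/v2/brokers/health"))
                .timeout(timeout)
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            // Any non-200 answer means the broker replied but reported a problem.
            return response.statusCode() == 200 ? Status.HEALTHY : Status.UNHEALTHY;
        } catch (HttpTimeoutException e) {
            // Port accepted the connection (or stalled) but no reply arrived in time.
            return Status.UNRESPONSIVE;
        } catch (IOException e) {
            // Connection refused, reset, or host not reachable.
            return Status.UNREACHABLE;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Status.UNRESPONSIVE;
        }
    }
}
```

Calling `BrokerProbe.probe("http://ddx001:8080", Duration.ofSeconds(5))` for each of the three brokers would separate "port open but not answering" (`UNRESPONSIVE`) from "process down" (`UNREACHABLE`).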
The Debezium log shows the following errors:
2023-03-30T17:17:22,907+0800 [pulsar-client-io-1-10] WARN org.apache.pulsar.client.impl.ClientCnx - [id: 0x21142b91, L:/10.1.62.3:56732 - R:10.1.62.1/10.1.62.1:6650] request timeout {'durationMs': '30000', 'reqId':'2994521255180757909', 'remote':'10.1.62.1/10.1.62.1:6650', 'local':'/10.1.62.3:56732'}
2023-03-30T17:17:22,907+0800 [pulsar-client-io-1-10] ERROR org.apache.pulsar.client.impl.ProducerImpl - [persistent://mongo/hamster/dbservermongo.hamster.ClassSummary10] [pulsar-cluster-ddx-73-29] Failed to create producer: request timeout {'durationMs': '30000', 'reqId':'2994521255180757911', 'remote':'10.1.62.1/10.1.62.1:6650', 'local':'/10.1.62.3:56732'}
2023-03-30T17:17:22,907+0800 [pulsar-client-io-1-10] WARN org.apache.pulsar.client.impl.ConnectionHandler - [persistent://mongo/hamster/dbservermongo.hamster.ClassSummary10] [pulsar-cluster-ddx-73-29] Could not get connection to broker: request timeout {'durationMs': '30000', 'reqId':'2994521255180757911', 'remote':'10.1.62.1/10.1.62.1:6650', 'local':'/10.1.62.3:56732'} -- Will try again in 56.137 s
2023-03-30T17:17:22,907+0800 [pulsar-client-io-1-10] WARN org.apache.pulsar.client.impl.ClientCnx - [id: 0x21142b91, L:/10.1.62.3:56732 - R:10.1.62.1/10.1.62.1:6650] request timeout {'durationMs': '30000', 'reqId':'2994521255180757911', 'remote':'10.1.62.1/10.1.62.1:6650', 'local':'/10.1.62.3:56732'}
2023-03-30T17:17:22,907+0800 [pulsar-client-io-1-10] ERROR org.apache.pulsar.client.impl.ProducerImpl - [persistent://mongo/hamster/dbservermongo.hamster.ClassRating1] [pulsar-cluster-ddx-73-66] Failed to create producer: request timeout {'durationMs': '30000', 'reqId':'2994521255180757913', 'remote':'10.1.62.1/10.1.62.1:6650', 'local':'/10.1.62.3:56732'}
2023-03-30T17:17:22,907+0800 [pulsar-client-io-1-10] WARN org.apache.pulsar.client.impl.ConnectionHandler - [persistent://mongo/hamster/dbservermongo.hamster.ClassRating1] [pulsar-cluster-ddx-73-66] Could not get connection to broker: request timeout {'durationMs': '30000', 'reqId':'2994521255180757913', 'remote':'10.1.62.1/10.1.62.1:6650', 'local':'/10.1.62.3:56732'} -- Will try again in 57.424 s
2023-03-30T17:17:22,907+0800 [pulsar-client-io-1-10] WARN org.apache.pulsar.client.impl.ClientCnx - [id: 0x21142b91, L:/10.1.62.3:56732 - R:10.1.62.1/10.1.62.1:6650] request timeout {'durationMs': '30000', 'reqId':'2994521255180757913', 'remote':'10.1.62.1/10.1.62.1:6650', 'local':'/10.1.62.3:56732'}
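Every timeout in the excerpt above names the same remote address, 10.1.62.1:6650. For a longer log, a small tally of `request timeout` lines per `remote` field can show whether the failures concentrate on one broker. This is only a sketch: the class and method names are made up for illustration, and the pattern assumes the `'remote':'...'` layout seen in these lines (it accepts both straight and typographic quotes, since pasted logs often contain the latter).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts "request timeout" log lines per remote broker address.
class TimeoutLogSummary {
    // Matches 'remote':'<host>/<ip>:<port>' with straight or typographic quotes.
    private static final Pattern REMOTE = Pattern.compile(
            "['\u2018\u2019]remote['\u2018\u2019]\\s*:\\s*"
            + "['\u2018\u2019]([^'\u2018\u2019]+)['\u2018\u2019]");

    static Map<String, Integer> countTimeoutsByRemote(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (!line.contains("request timeout")) {
                continue; // ignore lines that are not timeout reports
            }
            Matcher m = REMOTE.matcher(line);
            if (m.find()) {
                // Key is the raw remote field, e.g. "10.1.62.1/10.1.62.1:6650".
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Feeding it the lines of the Debezium log (for example via `java.nio.file.Files.readAllLines`) yields one entry per broker address with the number of timeouts seen against it.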
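In our tests, data could not be sent through ddx001, while ddx002 and ddx003 worked normally. After we restarted the broker on ddx001, Debezium could write data to the broker again.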
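Q1: Why can Debezium only write to one fixed broker?
Q2: Why does the broker hang after Flink is started, even though its ports are still open?
Q3: During routine maintenance, Debezium cannot switch to another node automatically, and the connector keeps restarting.
Q4: Debezium had a connector on ddx001. The ddx001 machine went down at 10:00, and the connector was switched to another node, ddx003, at 11:00. Debezium did not pick up the MySQL data from the 10:00 to 11:00 outage, which left a gap in the messages.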