WebSphere MQ: Automating the Channel Sequence Number Mismatch Issue (AMQ9526) via Shell Scripting

Mohammed Faizan
5 min read · Apr 5, 2021

“I always choose a lazy person to do a difficult job… because he’ll find an easy way to do it”. We all know who said this famous quote, and he’s one of the richest and most successful people ever. Right?

I’m thrilled to share some of my best work as an MQ systems administrator: an automation built with shell scripting.

Here’s my story:

I was deployed in a migrations team where we had to migrate MQ queue managers from Solaris to Linux servers. The migration was in its UAT phase and the production migration was about to commence, so there was very limited time and a lot of pressure. During that time this was my side project, just like Alan Turing building his Christopher to decipher the Enigma and win the war!

Issue:

You have a channel from MQ queue manager A (QM_A) to queue manager B (QM_B). While you are sending messages from QM_A to QM_B, a network error occurs and causes the channel to disconnect. When the channel starts back up, you get the following message on the receiver side (QM_B):

AMQ9526 Message sequence number error for channel QM_A-to-QM_B.

Explanation:

The local and remote queue managers do not agree on the next message sequence number. A message with sequence number 15 has been sent when sequence number 13 was expected.

How do we fix it manually?

Going by the example above, if the channel is a sender, the first number (15) should be reset, and if it is a receiver, the second number (13) should be reset. The logic and resolution for this error are pretty simple, but what if the error occurs in a critical environment and we don’t reset the channel immediately? The messages will overflow from the transmission queue (XMITQ) to the dead-letter queue (SYSTEM.DEAD.LETTER.QUEUE), resulting in high-priority incidents. We need to log in to the server, check the queue manager error logs to find the right sequence number, and then restart the channel. The steps are given below:

STOP CHANNEL(<CHANNEL_NAME>) MODE(FORCE)

RESET CHANNEL(<CHANNEL_NAME>) SEQNUM(15)

START CHANNEL(<CHANNEL_NAME>)
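
As a rough sketch, the three commands above could be fed to runmqsc from a small shell function instead of being typed by hand. The function names and the channel name QM_A.TO.QM_B below are hypothetical and are not taken from the actual script; the only thing assumed about MQ is that runmqsc reads MQSC commands from standard input.

```shell
#!/bin/sh
# Print the three MQSC commands of the manual fix for one channel.
# $1 = channel name, $2 = sequence number to reset to.
# Note: MQSC folds unquoted names to uppercase, so a channel name with
# lowercase letters would need single quotes around it.
build_reset_mqsc() {
    printf 'STOP CHANNEL(%s) MODE(FORCE)\n' "$1"
    printf 'RESET CHANNEL(%s) SEQNUM(%s)\n' "$1" "$2"
    printf 'START CHANNEL(%s)\n' "$1"
}

# Send the commands to a queue manager through runmqsc.
# $1 = queue manager name, $2 = channel name, $3 = sequence number.
reset_channel() {
    build_reset_mqsc "$2" "$3" | runmqsc "$1"
}

# Dry run: only show what would be sent (hypothetical channel name).
build_reset_mqsc 'QM_A.TO.QM_B' 15
```

Keeping the command text in its own function makes it possible to review what would be executed before anything touches a live queue manager.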

Pretty simple, right? Well, if a queue manager has a large number of channels, and a migration is under way where this error is inevitable, resolving each and every channel manually is very hectic. So I thought: hey, why don’t I just automate the whole thing and not have to worry about it, since I have a lot of other things to take care of? Necessity is the mother of invention, right? That is when I decided to start this solo journey, which later became instrumental in speeding up the migration and achieving our almost unrealistic target of migrating over 70 queue managers in a very limited time.

How did I automate this?

I was just a beginner in shell scripting at that time, so I started searching sites like GeeksforGeeks, Stack Overflow and Stack Exchange. I was able to write a script that collects the channel name and sequence number and automatically executes the reset operation. But still, what was I missing? Why were the users still reporting issues? What I had not taken care of was that every time the script ran, it reset the channels it had already fixed along with the newly affected ones. So the script was not ready to be deployed in production. Rejected! I thought this little side project was over and that maybe I should just give up and focus on my regular work, which I did. I left this thing alone for a while.

But then I thought: hey, what if we had a record-tracking system, something that lets the script use information from past runs to tell which channels to act on and which ones to ignore? It was just a combination of the simple commands diff, mv and rm that did wonders for me.
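
To give an idea of the collection step, here is a minimal sketch that pulls the channel name and the two sequence numbers out of log text. It assumes the log wording matches the excerpt quoted earlier, with the message on one line and the sentence carrying the numbers on another; real error logs may wrap or format these lines differently. The function name parse_amq9526 is hypothetical and is not the one used in the actual script.

```shell
#!/bin/sh
# Read error-log text on stdin and print "CHANNEL SENT EXPECTED"
# for every AMQ9526 entry found.
parse_amq9526() {
    awk '
        /AMQ9526/ {
            # The channel name is the last word of the message line.
            chan = $NF
            sub(/\.$/, "", chan)                      # drop the trailing full stop
            gsub("[^A-Za-z0-9._%/-]", "", chan)       # drop quotes or other decoration
            next
        }
        chan != "" && /has been sent when sequence number/ {
            # Collect every numeric word that directly follows "number".
            n = 0
            for (i = 1; i < NF; i++)
                if ($i == "number" && $(i + 1) ~ /^[0-9]+$/)
                    num[++n] = $(i + 1)
            if (n == 2)
                print chan, num[1], num[2]
            chan = ""
        }
    '
}

# Demo with the wording from the example above.
parse_amq9526 <<'EOF'
AMQ9526 Message sequence number error for channel QM_A-to-QM_B.
The local and remote queue managers do not agree on the next message sequence number. A message with sequence number 15 has been sent when sequence number 13 was expected.
EOF
```

For the sample text this prints the channel name followed by 15 and 13, which is all the reset step needs.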

Now, every time the script runs, the information is collected and written to a file, which is then compared with the file from the previous run using the diff command. The diff output shows which channels need to be reset and which ones were already handled and can be ignored. The previous file is then deleted with rm, and the newly generated file takes its place via mv. This way I was able to avoid the repetition issue. Eureka! Working as expected! The script was ready to be re-presented.

My lead suggested setting up an alerting system: if a channel does not come up even after the reset, we should get an alert mail. This is the back-out mechanism in case the automation fails, and then we do it old-school! I thought, OK, we’ve come this far, let’s achieve perfection! After further googling, I was able to set up the alerting using the mailx command, and we were all set!
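
The record-tracking idea can be sketched roughly as follows. The function names, file names and sample entries are hypothetical, and sorting the snapshots before comparing them is my own assumption to keep the diff output stable; the actual script may do this differently.

```shell
#!/bin/sh
# Print the lines that are in the current snapshot ($2)
# but were not in the previous one ($1).
new_entries() {
    # On the very first run there is no previous snapshot yet.
    [ -f "$1" ] || : > "$1"
    # diff prefixes lines that exist only in the second file with "> ".
    diff "$1" "$2" | sed -n 's/^> //p'
}

# Make the current snapshot the "previous" one for the next run.
rotate_snapshot() {
    rm -f "$1"
    mv "$2" "$1"
}

# Demo in a throwaway directory with made-up channel entries.
workdir=$(mktemp -d)
prev="$workdir/previous.txt"
curr="$workdir/current.txt"
printf '%s\n' 'CHL.ONE 15' | sort > "$prev"
printf '%s\n' 'CHL.ONE 15' 'CHL.TWO 42' | sort > "$curr"

new_entries "$prev" "$curr"      # lists only the entry not seen last time
rotate_snapshot "$prev" "$curr"
rm -rf "$workdir"
```

Only the lines returned by new_entries would be passed on to the reset step, so a channel that was already handled in an earlier run is left alone.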

Impact:

What would’ve happened without it?

Support engineers like me would have to keep monitoring the active error log for errors, and any delay in remediation would result in major outages. Trust me, I’ve been there! Any manual change in a production environment would have to go through a specific change request, including all the necessary business approvals. Each and every action would have to be reviewed, and any minor mistake would result in critical outages. This would make the work more complicated and difficult.

What’s happened with it?

We decided to keep this invention a secret, as it would minimize the users’ participation in testing during the migration process. This again relates to the movie “The Imitation Game”, where they had to keep the invention a secret because otherwise the Germans would find out and change the Enigma settings again. Dear users, if you want to raise a change request, please do. And if you can’t, hey, let me tell you a secret: it’s not a problem, because “The Dark Knight has got your back!”

So, when the migration process begins, I’d simply reroute the server DNS to the new Linux server, head over to the crontab and unleash the script to run every 5 minutes. Engage autopilot! My script keeps scanning the logs while in camouflage and attacks immediately when it finds the predator.
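
For illustration, a crontab line for a 5-minute schedule could be generated like this. The paths are hypothetical, and redirecting the output to a log file is my own assumption; the README in the repository describes the actual setup.

```shell
#!/bin/sh
# Print a crontab line that runs a script every 5 minutes
# and appends its output to a log file.
# $1 = absolute path to the script, $2 = log file.
cron_entry() {
    printf '*/5 * * * * %s >> %s 2>&1\n' "$1" "$2"
}

# Hypothetical paths; adjust to where the script and its log actually live.
cron_entry /opt/scripts/seq_mis_reset.sh /var/tmp/seq_mis_reset.log
```

The printed line can be pasted into the crontab of the MQ administration user with crontab -e.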

This way we were able to reduce manual effort, enhance security, reduce the ticket count and speed up the entire migration process. Thus, we “shortened the war by more than two years, saving over 14 million lives”.

While I was on this journey of self-learning and creating, I picked up various other shell commands and discovered the endless possibilities of bash, and it was a real gift! I was able to automate other regular tasks and contribute to other areas of work where automation was required. My work became more fun once I could finish efficiently and with ease things that would otherwise have been difficult and lengthy. I became famous within the team!

Please click on the link below to my GitHub account to view the script. An in-depth explanation of how the script works, along with usage instructions, is available in the README file, so please do not use the script directly without reading it. 😊

https://www.github.com/githubmofaizan/Websphere_MQ_Scripts

Script name: seq_mis_reset.sh

Tested Environment: Linux, MQ v9
