Let me start by saying that StorageCraft ShadowProtect is not in the slightest way at fault for the story I'm about to relay. The customer, who should know better, is entirely at fault. Let this be a warning to users of ShadowProtect: there are things you may not understand, and you should not mess with them at all.
We've got a client at Correct Solutions who has a number of servers in a datacentre (DC). We set this installation up earlier this year, and ShadowProtect backs up the virtual servers running on Hyper-V to local storage at the DC. We use Continuous Incrementals because they make it easy to replicate the small incremental files offsite to the client's DR location. Continuous Incrementals are cool technology: they take one base image and then only smaller incrementals from that point on. ImageManager then rolls those small incrementals up into consolidated daily, weekly and monthly images. It's a cool system really, and one I've used for a number of years in many client environments.
Back to this client: ImageManager at the DC handles the image consolidation and the replication to the ShadowStream server at the remote DR location, and it's been working pretty well for a number of months. The files at the DC are effectively replicated to the remote DR site so that we can bring things up quickly should a major event happen. The client has their own IT Manager who handles things day to day, so we're called in to help when things aren't working. Yesterday he called: apparently the ShadowProtect backups on the servers at the DC were failing.
We took a look and found that a stack of files, including the base images, were missing from the DC location. We then checked and found a similar (but not identical) number of files missing from the remote location. This is pretty strange, as we only ever replicate the small incremental files from the DC to the DR location, and a file deleted at the DC is never automatically deleted at the DR location unless ImageManager tells it to be… in which case the files at the DC and DR locations would have been deleted identically… which they were not. OK, so that means someone or something else deleted the files… We did some further checking and could see that the client logged onto the DC servers AFTER the 9am backup and BEFORE the 10am backup… and the 10am backup was the first one to fail. The client of course says he did not delete any files, but in passing mentions cleaning up old files… CLICK… Yup: what he did was look at the files in the backup folder. Below is a screenshot from one of my own servers that has been running ShadowProtect since mid 2010.
You can see the file ending in .spf: it's the first full backup of this volume, and note that it's 168GB in size. The .spi files are the incrementals, or in this case the consolidated incrementals, i.e. the rolled-up incrementals for later backups. To restore any file from the latest backup, you need the SPF plus every SPI file from that point forward until today. Now notice the time stamps on the files. The base SPF file was created on 25th July 2010 and has not been changed since then. The SPI files have been changed after that to incorporate the consolidation of the incrementals, in accordance with the ImageManager plans that I configured. Take the -i109-cd-cm.spi file: it hasn't been touched since 6th October 2010, again because of the consolidation and retention settings I've configured. This is exactly as it should be, and everything works fine.
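To make the dependency concrete, here is a minimal sketch of why "old" files can't be cleaned up. It uses a deliberately simplified, hypothetical model: each image file simply records which file it builds on (its parent), and the base .spf has no parent. The `ImageFile` class, the `restore_chain` function and the file names are all illustrative assumptions, not ShadowProtect's real file format or naming; the point is only that a restore walks back through every link to the base, so removing any untouched-looking file breaks every restore that depends on it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ImageFile:
    """One backup image in a simplified, hypothetical model of an image chain."""
    name: str
    parent: Optional[str]  # image this file builds on; None for the base (.spf)


def restore_chain(files: List[ImageFile], target: str) -> List[str]:
    """Return every file needed to restore `target`, base image first.

    Walks parent links from the target back to the base image and raises
    FileNotFoundError as soon as a link in the chain is missing.
    """
    by_name = {f.name: f for f in files}
    chain: List[str] = []
    current: Optional[str] = target
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle detected at {current}")
        if current not in by_name:
            raise FileNotFoundError(f"missing image in chain: {current}")
        chain.append(current)
        current = by_name[current].parent
    chain.reverse()  # base first, newest last
    return chain


# Hypothetical file names, loosely modelled on the screenshot described above.
files = [
    ImageFile("vol-b001.spf", None),                               # base, untouched for years
    ImageFile("vol-b001-i109-cd-cm.spi", "vol-b001.spf"),          # old consolidated file
    ImageFile("vol-b001-i350-cd.spi", "vol-b001-i109-cd-cm.spi"),  # recent consolidated file
]

print(restore_chain(files, "vol-b001-i350-cd.spi"))

# "Cleaning up" the old-looking files breaks every later restore:
try:
    restore_chain(files[2:], "vol-b001-i350-cd.spi")
except FileNotFoundError as err:
    print(err)
```

In this model the newest file is useless on its own: the second call fails because the old consolidated file it depends on has been removed, which is essentially what happened to this client's chain.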
Back to this customer… I believe he saw a heap of files that hadn't been touched for some time and deleted them all… ouch… that is how he broke ShadowProtect. To make it worse, he ALSO deleted them from the DR site. Naturally the client denies this entirely, but I'm pretty sure this is what happened: ImageManager was NOT configured to delete things like the base files, and I've used ImageManager enough to know it won't mess up like that.
How do we fix it for this client? Luckily the solution is simple… delete EVERYTHING and start again with new base images (SPF). Then allow those to replicate to the DR site and everything will be good. ImageManager will need to be configured to replicate the base files as well, but that is a minor config change we can make. The faster option would be to copy the base images (over 300GB in this case) to a USB hard drive and take that to the DR site to seed the DR server. We're waiting on the client's decision on this.
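Why is the USB drive likely faster? A quick back-of-the-envelope calculation helps. The post doesn't state the bandwidth between the DC and the DR site, so the 20 Mbit/s link and 80% usable throughput below are purely assumed figures for illustration; `transfer_hours` is a hypothetical helper, and sizes use decimal gigabytes.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours to push size_gb (decimal GB) over a link of link_mbps megabits per second.

    efficiency < 1.0 allows for protocol overhead and other traffic sharing the link.
    """
    bits = size_gb * 1e9 * 8
    usable_bps = link_mbps * 1e6 * efficiency
    return bits / usable_bps / 3600


# Assumed: ~300 GB of base images over a hypothetical 20 Mbit/s link, ~80% usable.
print(f"{transfer_hours(300, 20, 0.8):.1f} hours")
```

Under those assumptions the seed would take well over a day and a half of saturated link time, whereas copying to USB and driving the disk to the DR site is bounded by local disk speed and travel time, and it leaves the link free for the normal incremental replication.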
Long story short: don't mess with ShadowProtect SPF and SPI files unless you know EXACTLY what you are doing. This client has lost his entire backup history for the past six months as a result of what he did. ShadowProtect and ImageManager will handle things just fine for the most part if you leave them alone. I hope this story helps others understand WHY some files look as though they haven't been touched, and how those files depend on one another.