Now that our new PS6000 is initialized and ready to go, we started to look at replication and what we wanted to achieve in terms of backup/recovery and failover/failback. The idea was to replicate our Hyper-V virtual machines, and all of our users' data, to this second PS Series group. The second PS Series group will be onsite, in a separate building at the furthest point from our primary PS Series group. We also needed to look at our SAN network topology and how to extend it to give us the high availability and bandwidth we required.
EqualLogic comes with SAN-based replication built into the inclusive licensing model, and with tools such as SANHQ and Dell's Manual Transfer Utility (MTU for short), configuring replication is once again a nice and easy process.
Firstly, let's take a quick look at what EqualLogic replication is and what it involves.
Replication involves copying volume data from one PS Series group to another PS Series group. A replica contains the data on the volume at the time the replica was created; replicated volumes have replica sets, each containing the replicas created over time. As mentioned above, replication is between two PS Series groups. This is a must: it is not possible to replicate volumes within the same PS Series group. When replicating volume data you configure replication partners. A PS Series group can have multiple partners, but a single volume can only replicate to one of them. Each EqualLogic group can have up to 16 replication partners and up to 128 volumes configured for replication, and can be actively replicating these volumes at any time. In a group of only PS4000 arrays, the limits are two members per group, two replication partners, and 32 volumes configured for replication.
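The limits above are easy to check against a planned layout before you start configuring partners. The following is only a minimal sketch: the function name, the dictionary layout, and the volume and group names are hypothetical and not part of any Dell tool, and the limits are simply the ones quoted above, so confirm them for your firmware version.

```python
# Limits as quoted in the text above; confirm them for your firmware version.
STANDARD_LIMITS = {"partners": 16, "replicated_volumes": 128}
PS4000_ONLY_LIMITS = {"partners": 2, "replicated_volumes": 32}


def check_replication_plan(volume_to_partner, ps4000_only=False):
    """Return a list of problems found in a planned volume -> partner mapping.

    volume_to_partner is a hypothetical dict such as {"hyperv-vol01": "GroupB"}.
    Using a dict means each volume can only name one partner, which mirrors
    the one-partner-per-volume rule.
    """
    limits = PS4000_ONLY_LIMITS if ps4000_only else STANDARD_LIMITS
    problems = []

    # Count the distinct partner groups the plan relies on.
    partners = set(volume_to_partner.values())
    if len(partners) > limits["partners"]:
        problems.append(
            f"{len(partners)} partners exceeds the limit of {limits['partners']}"
        )

    # Count the volumes configured for replication.
    if len(volume_to_partner) > limits["replicated_volumes"]:
        problems.append(
            f"{len(volume_to_partner)} replicated volumes exceeds the limit "
            f"of {limits['replicated_volumes']}"
        )

    return problems


# Hypothetical plan: two volumes replicating to one partner group.
plan = {"hyperv-vol01": "GroupB", "userdata-vol01": "GroupB"}
print(check_replication_plan(plan))  # prints []
```

An empty list means the plan fits within the quoted limits; anything else is a reason to rethink the layout before touching the arrays.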
EqualLogic operates a point-in-time snapshot SAN replication model. Pointer-based snapshots are replicated to the secondary PS Series group, which means that large amounts of data can be replicated efficiently over lower-bandwidth links.
Two groups are involved in replication:
- Primary PS Series group. This is the group that stores the live volume. The primary group configures the secondary group as a replication partner, with the use of a mutual authentication password. Replication from this group is outbound as the volumes are being replicated to another group.
- Secondary PS Series group. This is the group that stores the replicas. Space is delegated on this group for storing replicas and replica sets. Delegated space can be increased as long as free space is available, or decreased as long as the space is not in use. The secondary group configures the primary group as a replication partner, again with the use of a mutual authentication password for security. Replication to this group is inbound, from the primary group.
Once again it is all about capacity planning. Replication can take up more space than you expect, especially on your primary PS Series group. Plan the replication configuration for your volumes to determine the amount of space required on both the primary and secondary groups. This will firstly tell you whether there is enough local space available for replication, and secondly help you plan for the amount of space required on your secondary PS Series group.
Local space, called the 'local replication reserve', is required to track changes to the volume while it is being replicated. It can also store a 'failback snapshot' if you choose this option when configuring replication.
The main points to consider when planning how much space you will require are:
- The size of the volumes you want to replicate - The larger the volume, the more delegated space and local replication reserve is needed.
- How often does the data change? - This drives how often you replicate your volumes; if data changes frequently you may want to replicate as often as every 5 minutes.
- How many replicas do I want to keep? - The more replicas you keep, the more delegated space you will require.
- What are the recovery needs of your organisation? - Answering this question greatly helps in planning your replication strategy, and can be broken down into two categories.
RTO or Recovery Time Objective.
The RTO defines the amount of time within which a business process or application must be restored after a disruption or disaster. In replication terms, the RTO will determine whether or not to keep a failback snapshot. If you keep one, then after failing over to a replica and promoting it to a volume, the primary PS Series group still holds a snapshot of the previously replicated data when you fail back, so only the changes made since that replica need to be copied back to the primary group.
RPO or Recovery Point Objective.
The RPO defines the acceptable amount of data loss over a specific amount of time, and therefore how frequently you replicate your data. Basically, how much data loss is acceptable?
With this in mind, Dell suggests using the following formula to determine the number of replicas to keep on your replication partner.
Frequency of replications x time period to retain = number of replicas to keep
When working out how many daily replicas of our Hyper-V volume to keep I used:
1 replica a day x 5 days = 5 Replicas
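If you want to try the same formula against a few different schedules, it is a one-liner in code. This is just an illustrative sketch; the function name is my own, and the hourly schedule in the second call is a made-up example rather than something we run.

```python
def replicas_to_keep(replications_per_day, days_to_retain):
    """Frequency of replications x time period to retain."""
    return replications_per_day * days_to_retain


# The daily Hyper-V schedule from above: 1 replica a day kept for 5 days.
print(replicas_to_keep(1, 5))   # prints 5

# Hypothetical example: hourly replication kept for 2 days.
print(replicas_to_keep(24, 2))  # prints 48
```

The second figure shows how quickly a tighter schedule multiplies the number of replicas, and with it the delegated space, you need on the partner.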
All of this can of course be scheduled and performed automatically to suit your organisation's requirements.
Failover and Failback
In the event of a disaster you can host a volume on your secondary PS Series group and keep an application online, but when your primary PS Series group is back up and running you will want to fail back your volume. This is where a failback snapshot comes in. If you decide to keep a failback snapshot, this process will be a lot quicker, because it reduces the amount of data that needs to be copied back to the primary PS Series group. You can still fail back to the primary group without a failback snapshot, but all of the data will need to be replicated back, which increases the time the failback takes. Whether that is acceptable will be defined by the RTO of the application using the volume. A failback snapshot uses more space in the local replication reserve, so keeping one increases the amount of space required locally.
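To get a feel for the difference a failback snapshot makes, you can do a rough back-of-the-envelope estimate of the copy time in each case. The sketch below is only that: the function name, the volume size, the amount of changed data, the link speed and the efficiency factor are all assumptions for illustration, not measurements from our SAN, and real transfer times depend on many other factors.

```python
def transfer_hours(data_gb, link_mbps, efficiency=0.7):
    """Roughly estimate the hours needed to copy data_gb over a link.

    link_mbps is the link speed in megabits per second. efficiency is an
    assumed fraction of that speed actually available to replication.
    """
    megabits = data_gb * 8 * 1000  # decimal GB converted to megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical figures: a 500 GB volume with 20 GB changed since failover,
# copied back over a 1 Gbit/s link.
full_copy = transfer_hours(500, 1000)  # no failback snapshot: copy everything
changes_only = transfer_hours(20, 1000)  # failback snapshot: copy the changes

print(f"Without failback snapshot: {full_copy:.2f} hours")
print(f"With failback snapshot:    {changes_only:.2f} hours")
```

Comparing the two figures against the RTO of the application is a quick way to decide whether the extra local replication reserve for a failback snapshot is worth it.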
With this in mind you can begin to plan your replication strategy. Decide which volumes need to be replicated based on RPO and RTO, or whether your existing backup strategy is enough. Plan how much space is required on both your secondary PS Series group and your primary group, how many replicas you keep, how frequently you replicate, and whether or not your organisation's RTO dictates that you need to keep a failback snapshot.
In the next part I will take a quick look at our SAN backend network and how we extended this to accommodate our secondary PS Series group.