I noticed several members here have plenty of IT experience. Is there any particular hardware you'd recommend for petabyte-scale storage? Or any existing build lists we could draw ideas from? We're thinking of building everything from scratch instead of using a turn-key expansion enclosure.
My last job before trading was VP of a high-end storage company.
It is really impossible to answer your question without knowing a whole lot more about the purpose.
What level of redundancy is needed? Active/active, active/passive, none?
Is it primarily random or sequential IO?
What level of throughput/iops in both scenarios above is required?
How are you going to access the data (FC SAN, iSCSI, etc)?
How many hosts are going to connect to it at once?
What type of file system requirements do you have? One huge file system? 100 small file systems?
How do you want to handle file sharing and LUN provisioning?
In the end, you can just build your own bare 4U chassis loaded with as many 3TB spindles as will fit, add dozens of SAS cards, and attach them to hosts running Linux. That is the poor man's solution, and also the one that requires the most experience to handle.
If you are looking at Tier 1 storage and a seven figure budget, then the above would be laughed out of the park.
BTW, I haven't even talked about disaster recovery, snapshots, etc. Depending on your requirements, you may be able to do it at the software level, or may need a hardware solution to do it.
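If the software route makes sense, ZFS snapshots with send/receive replication are one common option. A minimal sketch, assuming a hypothetical pool `tank` and a reachable DR host (all names are placeholders, not anything from this thread):

```shell
# Point-in-time snapshot of the dataset; snapshots are nearly free in ZFS.
zfs snapshot tank/data@snap2

# Ship only the blocks changed since the previous snapshot to the DR box.
# "dr-host" and "backup/data" are placeholder names for your own setup.
zfs send -i tank/data@snap1 tank/data@snap2 | ssh dr-host zfs receive backup/data
```

Run from cron, that gives you rolling off-site copies without any dedicated replication hardware.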
Thanks @Big Mike, excellent hardware advice as always. I'll answer to the best of my ability, but I have to confess this is not an area of expertise for me.
In the end, you can just build your own bare 4U chassis loaded with as many 3TB spindles as will fit, add dozens of SAS cards, and attach them to hosts running Linux. That is the poor man's solution, and also the one that requires the most experience to handle.
If you are looking at Tier 1 storage and a seven figure budget, then the above would be laughed out of the park.
Right. Currently our data storage is completely outsourced. There's a submarine cable that runs directly across the river to a data center in Boston. As one might guess, this is getting expensive and we're looking to cut costs. The current arrangement will be kept, since it works well as operational redundancy (disaster recovery, as you put it), but the plan is to set up a storage server in-house. We've thought of hiring a dedicated database engineer to solve the problem - I'm surprised how far we've gotten without one.
Our first-impression plan was indeed as you've described: the poor man's solution - bare 4U chassis with plenty of 2 TB spindles, several stacks of these. It probably nets the highest capacity/cost ratio and is very achievable on a six-figure budget. What are the drawbacks?
I'm asking for general ideas not because I'll be doing most of the assembly myself, but because I'm ultimately responsible for allocating the budget, and it would be good to have an informed opinion before I green-light a build. Is there any reading material you'd recommend in this area?
What level of redundancy is needed? Active/active, active/passive, none? How do you want to handle file sharing and LUN provisioning?
Active/active. Never thought too far. Opinions?
Is it primarily random or sequential IO?
Random.
What level of throughput/iops in both scenarios above is required? How are you going to access the data (FC SAN, iSCSI, etc)?
Haven't decided on the former, revamping a lot of our software layer lately. 16 Gb FC SAN. Possibly IB-based SAN.
How many hosts are going to connect to it at once?
<=15.
What type of file system requirements do you have? One huge file system? 100 small file systems?
One huge.
Well, keep in mind you're on a trading site, and I left the industry six years ago.
That said, the first trouble area is wanting a single enormous file system. Technically you can do it, but there are risks.
Random IO is also a bit of an issue, but you didn't give an IOPS number, so it's unclear how demanding the application is. If it's not that demanding, just enormous, then you can save a lot of money.
Basically, demanding random IO (high IOPS) means you need lots of SSDs. Otherwise you could get away with 3TB spindles for the most part, plus a few SSDs (say 10% of the spindle count) if you use the right kind of file system, like ZFS, which can cache hot data on SSD and push everything else to the slower spindles.
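A sketch of that hybrid-pool idea, assuming ZFS and placeholder device names (a real pool would repeat the mirror pairs across all the spindles):

```shell
# Hypothetical layout: mirrored 3TB spindles for bulk capacity...
# (sdb, sdc, ... are placeholders for your actual drives)
zpool create bigpool \
    mirror /dev/sdb /dev/sdc \
    mirror /dev/sdd /dev/sde    # ...and so on for the remaining HDD pairs

# ...with the 512GB SSDs added as L2ARC read cache for the hot data.
zpool add bigpool cache /dev/sdf /dev/sdg
```

ZFS then keeps frequently read blocks on the SSD cache devices automatically; nothing in the application has to know which tier the data lives on.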
If you want full redundancy, you should invest in at least one hardware RAID controller head unit - and at this capacity, several of them. Each head unit can manage 100-200TB, depending on the level of performance you need. So each head unit would have, say, (100) 3TB spindles and (10) 512GB SSDs hanging off it.
At this point you have the option of presenting the spindles raw to the host, which might be preferred in some situations with ZFS, or letting the head unit manage the RAID directly. For example: (5) RAID 10 arrays of 20 spindles each on the HDD side, and (1) RAID 10 array on the SSD side. RAID 10 is best for high random IO, but at a higher cost (more spindles) than RAID 5 or RAID 6. Each box would give you roughly 50 × 2.6TB usable, so about 130TB of usable capacity, plus roughly 2.2TB of usable SSD cache for ZFS.
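The capacity arithmetic above, spelled out (the per-drive usable figures are the rough numbers from the post, not measurements):

```shell
#!/bin/sh
# One head unit: (100) 3TB HDDs in five 20-spindle RAID 10 arrays,
# plus (10) 512GB SSDs in one RAID 10 array.
hdd_count=100
hdd_usable_tb=2.6    # rough usable TB per marketing-3TB spindle
ssd_count=10
ssd_usable_tb=0.45   # rough usable TB per 512GB SSD

# RAID 10 mirrors every spindle, so usable capacity is half the raw total.
hdd_capacity=$(awk "BEGIN { print $hdd_count / 2 * $hdd_usable_tb }")
ssd_capacity=$(awk "BEGIN { print $ssd_count / 2 * $ssd_usable_tb }")
echo "HDD usable: ${hdd_capacity}TB per head unit"
echo "SSD cache:  ${ssd_capacity}TB per head unit"
```

So roughly eight of these boxes gets you to a petabyte of usable space before cache.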
The hardware RAID head units have two controllers each, and each drive bay should be dual-ported so there are two paths to every drive. This provides the redundancy. Each of these head units would connect to your FC fabric.
You might check names like Infortrend and Promise for some example 1U active/active hardware raid head units, and they'll just use SFP to connect from the head units to the SAS JBOD chassis.
The next level up from there would be to buy a Tier 1 name-brand solution, at many times the price, but with guaranteed service levels and such - probably 5x the cost of buying the head units and JBOD chassis from Infortrend or similar, buying your own drives, and managing all of it yourself.
As for where to turn for help, sorry, but I am out of the industry, and the company I was VP of went out of business shortly after I quit. I would start with the Infortrend or Promise websites, see if any of their solutions make sense, and find a reseller. You can then issue some RFQs.
You should also check hardforum.com and ask for advice there.
Great tips. We're starting work on this project. More complicated than initially expected and we might have to expand our office space because of the servers humming away. Will let you know how it goes.
Confirmed! We're breaking the contract on our current office lease to move. The new office needs to be rewired for the considerable power delivery - three-phase, etc. I wonder how other people cope with their servers. Hope you're enjoying your holiday.