The advent of cloud services is undoubtedly one of the most significant shifts in the data storage and access paradigm in history. Ease of deployment, scalability and economic efficiency proved to be more than adequate competitive advantages to cause a mass migration of businesses from the standard CAPEX model to the cloud. In the worldwide rush to embrace the newfangled business practices and join the others in reducing capital expenditures and achieving higher levels of agility and elasticity, the crucial question of preserving privacy and controlling access to in-house data has somewhat fallen by the wayside. Ironically, the recent spy scandals ignited by the efforts of Edward Snowden and the Wikileaks project make it clear that this very question has never been as acute as it is today. Once your data crosses the border of your company network, it becomes a sweet piece of cake for all sorts of folks hanging around, and (at least) knowing who those folks are and what they can do with your data can literally save your business one day.
Who is spying on me?
The 2013 revelations in the information security area led us to a clear understanding that the world of information is much harsher than it used to appear. The NSA is not sitting on its hands; the Agency is developing wide-scale global surveillance projects and is already capable of silently intercepting and intruding into personal and private business Internet communications. At the same time, the CIA, the FBI and the intelligence bureaus of other countries are working on establishing backstage contacts with first-string online service providers, such as Microsoft, Google and Amazon. Despite the providers' efforts to refute the collaboration by stating that no mass disclosures of private data had taken place, and a bloom of PR campaigns aiming to repair the shattered trust, the disclosure-related topics have received a wide public outcry and media coverage, and the precedent has been established.
Sidebar. Tens of thousands of user account records are handed over by Microsoft, Google, Facebook and Yahoo to US government bodies every six months on the basis of secret court orders, according to statements distributed by the companies' officials at the beginning of February. The numbers shown in the evidence published by Edward Snowden are an order of magnitude greater. One of the revealed internal reports indicates that the NSA gained access to the address books of more than 680,000 active user accounts of the Yahoo, Hotmail, Facebook and Gmail services during just one day in 2012.
Apart from global surveillance threats, we should keep in mind that the datacentres of major cloud providers fall into a set of the most attractive targets for hacker gangs, since by conducting a single successful break-in they gain access to a really huge amount of private and potentially sensitive information. We are therefore expecting a sharp rise in the number of professional attacks targeting cloud services, both public and private, in the near future, and we believe that a small percentage of them will prove to be successful. The technological side of the infrastructure is not the only one that is going to be the target and the instrument; a bribed or blackmailed service provider's employee might be as helpful in gaining access to the customers' data as a successful software attack, but at a much lower cost for the attackers.
What is much worse is that it is really tempting for cloud service providers to hush up customer data leaks. Making this information public damages the provider's reputation. Privacy regulators may apply tangible financial sanctions for privacy legislation violations. Finally, there is usually little or no direct evidence of a leak, and there is always a chance it will never come to light. That's why service providers have no real incentive to shout about a leak until the fact becomes evident from other, indirect sources.
One should not underestimate the risks of data theft from cloud service providers' facilities due to technical or human imperfection. Life shows that even the largest and most respected companies make childish mistakes in implementing information security measures. Sometimes the consequences of such mistakes are really wide-scale. The need to provide data reliability and accessibility plays another trick on the service providers. Customers' data are usually stored in highly redundant form, across a variety of sites, with synchronization links between them. Often just one of those sites or links has to contain a flaw for an attack to be successful. Sometimes there is no need to attack the working infrastructure at all. Hard drives are known to eventually fail. Can you be entirely sure that none of the failed hard drives, written off and sent to a landfill by the service provider, ends up in the wrong hands at some stage?
Sidebar. On 19 June 2011, following a "code update that introduced a bug affecting the authentication mechanism", Dropbox users were able to authenticate to the server using any password of their choice. Many users of the service reported that their accounts had been accessed and their private information collected by unauthorized parties.
The Orient Express
Global cloud service providers offer their users a choice of storing their data in storage facilities in a number of different countries. International cloud legislation is still quite immature, yet it is important to remember that data centres, and all the data they contain, are normally subject to the legislation of the country in which they reside. The service provider's country of incorporation may impose its own obligations with regard to the information it stores. Note that the privacy and data protection legislation of third countries may differ significantly from the laws of your own, and it might be much easier for outsiders to access your data there. Natural disasters, revolutions and local wars may cause uncontrollable chaos in a country, so the chances of their occurrence should always be considered when deciding on a country to keep your data in.
Sidebar. Former Microsoft UK managing director Gordon Frazer said that he could not guarantee that data stored on Microsoft servers, wherever located, would not end up in the hands of the US government, because Microsoft, a company based in the United States, is subject to US laws, including the USA PATRIOT Act.
What I was trying to point out above is that storing your sensitive data in the cloud poses a large number of risks. "So what", you ask, "show me a business with no risks at all". You are perfectly correct. As a matter of fact, the presence of risks, their quantity and the impact they might cause are of little importance to a business. The only thing that matters is the business's ability to adequately mitigate those risks.
Imagine that you walked into a bank in the morning and deposited $1000 in cash into your current account. Early in the afternoon, two huge lads in balaclavas broke into the bank and forced the cashier to hand over all the cash they had, including your $1000. Does it mean that the bank will now withdraw that $1000 from your account? No, and here's why: the bank is responsible for the money you give them. They take on the responsibility of storing your money safely, and take on all the risks associated with this commitment. The exact methods by which they deal with those risks (building underground bunkers, hiring gorilla-looking security staff, or transferring robbery-related risks to insurance companies) are not something you should care about. You only know that as long as you handed your money over at the till, the bank will give it back to you at any time you request it. And they will.
Cloud services share a lot with banks, with the only difference being that it's not your money that you hand to them but your megabytes. Just as banks do, they take on the responsibility of providing the maximum levels of data availability and protection. Now, what happens if your data is stolen or destroyed? The scope of the cloud provider's responsibility is written down in the Service Usage Agreement, and I'll bet you won't be happy when you finally open and read it:
You are responsible for properly configuring and using the Service Offerings and taking your own steps to maintain appropriate security, protection and backup of Your Content, which may include the use of encryption technology to protect Your Content from unauthorized access and routine archiving Your Content.
FURTHER, NEITHER WE NOR ANY OF OUR AFFILIATES OR LICENSORS WILL BE RESPONSIBLE FOR ANY COMPENSATION, REIMBURSEMENT, OR DAMAGES ARISING IN CONNECTION WITH: ... (D) ANY UNAUTHORIZED ACCESS TO, ALTERATION OF, OR THE DELETION, DESTRUCTION, DAMAGE, LOSS OR FAILURE TO STORE ANY OF YOUR CONTENT OR OTHER DATA. IN ANY CASE, OUR AND OUR AFFILIATES' AND LICENSORS' AGGREGATE LIABILITY UNDER THIS AGREEMENT WILL BE LIMITED TO THE AMOUNT YOU ACTUALLY PAY US UNDER THIS AGREEMENT FOR THE SERVICE THAT GAVE RISE TO THE CLAIM DURING THE 12 MONTHS PRECEDING THE CLAIM.
Basically, cloud service providers disclaim any responsibility for disclosing your data to any other party, as well as for losing or damaging it. They also suggest that you 'take your own steps to maintain appropriate security and protection of your data'. As no risks to the data are covered by the service providers at all, mitigating them becomes our responsibility. A good solution here, if not the best, would be to transfer the risks to specialized insurance companies. Unfortunately, due to the novelty of the cloud concept and the apparent difficulties in assessing the financial equivalents of such events as data loss or disclosure, insurance companies struggle to offer any product suitable for cloud realities these days. I have high expectations for the role of insurance companies in the mission of introducing proper security quality standards for cloud services. It is difficult to imagine a better incentive for a service provider to improve the quality of their product than the cost of the insurance premiums paid by their customers.
And while insurance companies are working hard on their new cloud-specific products (we hope), the only resort for us, the users of cloud services, is to take care of the protection of our data ourselves.
Death the Hack
I believe it doesn't make too much sense to write about the consequences of the disclosure of some abstract data belonging to an abstract business. Of course, the effect of a leak depends heavily on the amount and importance of the data leaked. It's not a big issue if the disclosed records contain a school progress database, yet we can only guess at the consequences of the exposure of drawings of the latest Lockheed-Martin developments or the customer database of a large retailer like Amazon. Note that data can not only be stolen, but also actively modified.
Your company might not be dealing with high-tech things like the F-35, yet you know your business far better than I do, and I am sure you can outline the consequences of the disclosure of your company's data yourself. Just imagine what happens if your data gets into the hands of your principal competitor. Your users? The tax authorities? It is always useful to imagine the worst possible outcome, as an attacker will most probably try to exploit the worst-for-you scenario, by blackmailing you or selling the data to your competitor.
What's in the box?
The majority of cloud service providers have their own security measures in place. Let's summarize what is on offer and what kind of protection those measures actually provide. The majority of providers offer the following security instruments: secure TLS-driven data transfer, user authorization and, sometimes, server-side encryption (SSE). The scheme below shows a typical route of data between the customer's environment and the cloud.
Picture 1. Data flow between the customer's premises and the cloud storage
As we all know, the TLS protocol secures data transfers between two peers, or, in our case, between the customer's computer and the cloud front end. Provided that the customer's computer is configured correctly, the use of TLS gives us the ability to establish the genuineness of the cloud service endpoint and to protect the data exchanged by the peers from passive or active interception by an external eavesdropper. Each side of the protocol works like a black box which takes plain (unencrypted) data and produces encrypted output. When receiving data, the black box reverses the transformation and passes the decrypted data up the stack.
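As a minimal sketch, here is how a client application might set up such a TLS endpoint check in Python. The host name is a placeholder, not a real service; the point is that certificate verification and hostname checking, which establish the genuineness of the cloud endpoint, are enabled by default.

```python
import socket
import ssl

# A client-side TLS context with certificate verification and hostname
# checking enabled (these are the defaults for create_default_context).
context = ssl.create_default_context()
assert context.verify_mode == ssl.CERT_REQUIRED
assert context.check_hostname is True

# The actual connection would look like this (hostname is a placeholder):
# with socket.create_connection(("storage.example.com", 443)) as sock:
#     with context.wrap_socket(sock, server_hostname="storage.example.com") as tls:
#         tls.sendall(b"GET / HTTP/1.0\r\nHost: storage.example.com\r\n\r\n")
```

If verification fails, `wrap_socket` raises `ssl.SSLCertVerificationError` instead of silently talking to an impostor endpoint.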
With the help of user authorization the service identifies a particular account holder and confirms their identity. Authorization is normally based on establishing the fact of ownership of an authorization token. The most widely used types of authorization tokens are access keys, digital certificates and passwords. It is important to keep in mind that any user who is in possession of an authorization token can send authorized requests to the service, and the service will treat them as the legitimate account holder. It is therefore really important to keep authorization tokens safe. Note that authorization tokens are an easy target for spyware.
Server-side encryption (SSE) is an additional layer of security adopted by certain cloud service providers. Before the customer's data is sent over to the provider's data centres for storage, it is encrypted with a strong symmetric cipher. If an attacker ever gets access to the data centre, they will be unable to decrypt the data without the encryption key. The most critical point of this scheme is that the key and the data it encrypts reside in the same environment, the cloud provider's network. Even though they may be stored in different, completely isolated subnets with no access to each other while the data remains static in the data centres, the key and the data still meet at point B, where encryption and decryption are performed. Besides, the data in unencrypted form is fed to the input of the TLS server endpoint at point A, which is normally a front-end web server subject to the relevant attack risks. If an adversary gains access to one of those points, they will be able to read the customers' data in the clear without any need for the encryption key.
It should be noted that if your authorization token leaks, no server-side encryption will be able to prevent data theft. Having the token, the adversary can easily impersonate a legitimate user and access the data via the standard REST or SOAP interface provided by the service.
It goes without saying that server-side encryption won't protect the data from the eyes of the intelligence services, as they will simply ask the service provider for the actual data rather than its internal representation in the provider's data centres.
Ready. Steady. Go.
I hope I have persuaded you that data of any significance sent to the cloud must always be covered with an extra layer of protection in addition to the security measures provided by the service providers. One of the simplest yet most effective methods of building such an extra layer is to adopt client-side encryption (CSE), in contrast to the server-side encryption offered by the service providers. The main idea behind CSE is that the customer encrypts their data before sending it over to the service provider, and decrypts it after downloading it back. With CSE, the customer keeps the encryption key securely in their own environment, rather than entrusting this task to the service provider. This way, the customer's data passes through points A and B in customer-encrypted form and can't be recovered by a passive adversary who controls those points. Even if a lucky attacker, or intelligence personnel, gains full control over the entire provider's computational infrastructure, they will be unable to recover the data without obtaining the encryption key first.
Picture 2. A modified data flow with CSE in force
Even if an adversary manages to steal the victim customer's authorization token and impersonates them to the service, the data returned by the service will still be in encrypted form, and they will be unable to recover it without obtaining the CSE encryption key.
A variety of forms of encryption keys can be used with the CSE scheme, from generic symmetric encryption keys and asymmetric key pairs up to passwords and customer's biometric information. This flexibility is achieved by separating customers' encryption keys ('user keys') from object encryption keys ('session keys'). First, the data are encrypted with a randomly chosen session key SK. Next, the session key is, in turn, encrypted with a customer's user key, a public RSA key for example. The encrypted data together with the encrypted session key is then sent to the cloud service provider.
When reading the data back, the customer first decrypts the session key with their user key. Having recovered the session key, they are able to go on and decrypt the object itself.
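The two-step scheme described above, encrypting the object with a random session key and then wrapping that session key with the user key, can be sketched as follows. This is a structural illustration only: the SHA-256 counter-mode keystream is a toy stand-in for a real cipher such as AES-GCM, and a password-derived key stands in for the RSA user key mentioned above.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy stream cipher: XOR the data with a SHA-256 counter-mode
    # keystream. A production system would use AES-GCM or similar.
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

def protect(plaintext: bytes, user_key: bytes):
    session_key = secrets.token_bytes(32)            # random per-object SK
    blob = keystream_xor(session_key, plaintext)     # data encrypted with SK
    wrapped = keystream_xor(user_key, session_key)   # SK encrypted with user key
    return blob, wrapped                             # both go to the cloud

def unprotect(blob: bytes, wrapped: bytes, user_key: bytes) -> bytes:
    session_key = keystream_xor(user_key, wrapped)   # recover SK first
    return keystream_xor(session_key, blob)          # then decrypt the object

# A user key derived from a password (an RSA key pair would work equally well).
user_key = hashlib.pbkdf2_hmac("sha256", b"correct horse", b"per-user-salt", 100_000)

blob, wrapped = protect(b"quarterly report", user_key)
assert unprotect(blob, wrapped, user_key) == b"quarterly report"
```

Note that the provider only ever sees `blob` and `wrapped`; neither is of any use without the user key, which never leaves the customer's environment.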
The session-key-based scheme has a number of advantages, the most substantial of which are its ability to accommodate user keys of virtually any nature and the uniqueness of per-object encryption keys. The latter makes it impossible for an attacker who manages to recover a single session key to decrypt the rest of the encrypted objects, as they are encrypted with different session keys.
Picture 3. The CSE from the inside
Apart from encryption, the customer may accompany protected objects with MDP (modification detection and prevention) records. An MDP record is basically a message digest computed by the writer over the unencrypted data and encrypted together with it before being sent over to the cloud service. When reading the data back, the customer computes a message digest over the decrypted data and compares it to the message digest attached to the encrypted object. If an MDP-ed object was altered while residing in the cloud or in transit, the message digests won't match.
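A minimal sketch of such an MDP record, under the simplifying assumption that the digest is simply appended to the plaintext before the whole buffer is encrypted:

```python
import hashlib

DIGEST_LEN = 32  # SHA-256 produces a 32-byte digest

def add_mdp(plaintext: bytes) -> bytes:
    # Append a SHA-256 digest of the plaintext; the combined buffer is
    # what then gets encrypted and uploaded to the cloud.
    return plaintext + hashlib.sha256(plaintext).digest()

def check_mdp(buffer: bytes) -> bytes:
    # After decryption, split off the digest and verify it.
    data, digest = buffer[:-DIGEST_LEN], buffer[-DIGEST_LEN:]
    if hashlib.sha256(data).digest() != digest:
        raise ValueError("MDP check failed: object was modified")
    return data

protected = add_mdp(b"design document v3")
assert check_mdp(protected) == b"design document v3"

# Even a single flipped bit is detected after the round trip:
tampered = bytes([protected[0] ^ 1]) + protected[1:]
try:
    check_mdp(tampered)
except ValueError:
    print("tampering detected")
```

Since the digest travels inside the encrypted envelope, an attacker who cannot decrypt the object cannot recompute a matching digest for modified data.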
In certain scenarios it makes sense to use strong digital signatures instead of basic MDP records to provide for a higher level of modification detection in multi-user cloud environments. In particular, strong signatures might be useful for addressing proof of authorship and non-repudiation tasks.
Another attractive side of the CSE is that one can actually encrypt data with several different user keys at the same time. Objects encrypted in such way can be decrypted with any of the keys used for encrypting them. This feature can effectively be used to set up flexible access rights schemes or establish secure cloud-driven document flow within the organization.
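The multi-key feature described above falls out of the session-key design naturally: the object is encrypted once, and the single session key is wrapped separately for each user key. A sketch, again using a toy XOR-based wrap as a stand-in for RSA or AES key wrap (the user names and keys are illustrative):

```python
import hashlib
import secrets

def wrap(user_key: bytes, session_key: bytes) -> bytes:
    # Toy key wrap: XOR the 32-byte session key with a digest-derived
    # pad. A real system would use RSA-OAEP or AES key wrap.
    pad = hashlib.sha256(user_key).digest()
    return bytes(a ^ b for a, b in zip(session_key, pad))

unwrap = wrap  # XOR wrapping is its own inverse

session_key = secrets.token_bytes(32)
alice_key, bob_key = b"alice-user-key", b"bob-user-key"

# One encrypted object, two wrapped copies of the same session key:
header = {
    "alice": wrap(alice_key, session_key),
    "bob": wrap(bob_key, session_key),
}

# Either key holder can recover the session key from their own entry.
assert unwrap(alice_key, header["alice"]) == session_key
assert unwrap(bob_key, header["bob"]) == session_key
```

Granting a new user access to an object therefore means adding one more wrapped copy of its session key; the object body itself never needs re-encrypting.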
"Wait a minute," an attentive reader would say, "does it mean that the protection of my client side encrypted data wholly depends on the encryption key?" Correct. Encryption key is a single point of access of the CSE scheme, just like it is in any other properly designed encryption scheme. That's why the task of the highest importance is adequate protection of encryption keys.
Per-object session keys are not an easy target for attack. In fact, the only way for an attacker to recover a session key is to guess it by trying different keys one by one. This task can be really tough; provided that the session key is of sufficient length and a cryptographically strong RNG is used to generate it, guessing the session key is an infeasible task. That's why an attacker is most likely to concentrate on gaining access to the user key, which might be much easier to achieve.
Operating systems and third party vendors provide a variety of mechanisms for securing user keys, from protected operating system areas to dedicated cryptographic hardware. Some only give the key away after authenticating the user who requests it; the others do not export keys at all, instead performing the decryption on board on behalf of the requesting user. In either case it is the responsibility of the customer environment administrator to define and adopt proper key usage and storage policies in order to minimize the risk of revealing the keys to unauthorized parties.
Now that you understand the basics of client-side encryption, we can consider the task of integrating the mechanism into existing cloud-driven infrastructures. As I described above, this is mainly a task of adopting appropriate key management policies, techniques and procedures.
The majority of cloud-driven environments have the following types of logical links between the local customer infrastructure and the cloud storage service:
- One-to-one: requests to the cloud storage are made from one, or a fixed small number, of the customer's internal accounts;
- Many-to-one: requests to the cloud storage are made from a large number of customer's internal accounts, the set of which is highly volatile.
Even if your particular cloud-driven infrastructure utilizes links of different types, it is very likely that they can be decomposed into a set of smaller one-to-one and many-to-one primitives. Even if they can't, the main idea here is to treat encryption keys just like any other corporate resource, and define the corresponding policies and access rights accordingly. For example, if there is just one user account that is expected to make requests to the cloud storage, only that user account should have access to the encryption keys.
Links of the one-to-one type are typical for scenarios where the cloud is used for storing large amounts of data in a centralized way, mainly as a replacement for a classic relational database. In particular, this might be a cloud database used within an outer, larger computational cloud. Another typical example is a personal backup or synchronization tool.
Where the number of links is small, or where there is no need for an extra database abstraction layer, the encryption keys can be stored directly on the computers that access the storage, either in a protected operating system area (such as Enterprise SSO) or on attached security devices. This particularly concerns the 'worker' computational roles residing in the cloud. Otherwise, access to the cloud storage can be organized through a small set of gateway computers that act as a standard interface (e.g. ODBC) to the outer data storage and store the needed encryption keys on board.
Picture 4. Classic and cloud-driven infrastructures
Links of the many-to-one type are typical for environments where the cloud storage is used for sharing or synchronizing data between different company users or departments. As the sets of users approved to access different areas of the cloud storage are constantly changing over time, the encryption keys require a higher level of protection and manageability. This is where enterprise-level hardware security modules (HSMs) can be of great help. What we are going to do is use the HSM to store all the encryption keys for us, and define individual access rights to them for each user who needs to access the cloud. Whenever a user needs to decrypt an object downloaded from the cloud, they ask the HSM to decrypt the session key for them. The HSM checks whether the policies currently in force allow the user to access that particular encryption key, and decrypts the session key if they do. As the keys never leave the HSM, the user only gains access to the particular session key requested, and will be unable to decrypt other objects encrypted with the same encryption key after losing access to that key due to, for example, a change in their work role. Access to cloud objects can therefore easily be managed in real time by tuning access rights to particular encryption keys.
Besides addressing the session key decryption and access restriction tasks, an HSM can be used to audit access to the cloud storage, as all read operations will pass through the device.
Picture 5. Cloud-driven infrastructure with advanced key management rules in force
The CSE scheme may effectively be used for managing user access rights in environments with complex data access rules, by adopting multiple-encryption-key procedures. Basically, we encrypt objects that are to be accessed by several different categories of users with several keys, each of which corresponds to an individual category. Imagine that there are three types of users in our environment with different access to objects stored in the cloud: lieutenants, colonels and generals. We can encrypt objects that can only be accessed by generals with a key KG, objects that can be accessed by colonels with a key KC, and objects that can be accessed by lieutenants with a key KL. Obviously, our storage will contain a lot of objects encrypted with all three keys, a smaller set of objects encrypted with the KG and KC keys, and probably only a few encrypted with the single KG key. We will then configure our HSM to allow access to KL for all lieutenants, access to KC for all colonels, and access to KG for all generals. To reflect the promotion or demotion of an employee, we only need to adjust the employee's access records on the HSM so that the keys they have access to match their new rank.
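The rank-based scheme above can be sketched as a policy check in front of the key-unwrapping operation. Everything here is a hypothetical in-memory stand-in for a real HSM: the user names are invented, and the XOR-based wrap merely stands in for decryption under an HSM-resident key that never leaves the device.

```python
import hashlib
import secrets

def toy_wrap(key_id: str, session_key: bytes) -> bytes:
    # Stand-in for encryption under the HSM-resident key named key_id.
    pad = hashlib.sha256(key_id.encode()).digest()
    return bytes(a ^ b for a, b in zip(session_key, pad))

toy_unwrap = toy_wrap  # XOR wrapping is its own inverse

# Which user may use which object-encryption key (KL, KC, KG above).
ACCESS = {
    "lt_jones":  {"KL"},
    "col_smith": {"KL", "KC"},
    "gen_brown": {"KL", "KC", "KG"},
}

def hsm_decrypt_session_key(user: str, key_id: str, wrapped: bytes) -> bytes:
    # The HSM releases a decrypted session key only if policy allows;
    # the long-term key itself is never exported.
    if key_id not in ACCESS.get(user, set()):
        raise PermissionError(f"{user} may not use key {key_id}")
    return toy_unwrap(key_id, wrapped)

sk = secrets.token_bytes(32)
wrapped = toy_wrap("KC", sk)  # an object readable by colonels and generals

assert hsm_decrypt_session_key("col_smith", "KC", wrapped) == sk
try:
    hsm_decrypt_session_key("lt_jones", "KC", wrapped)
except PermissionError:
    print("access denied")

# Promotion: simply extend the lieutenant's key access on the HSM.
ACCESS["lt_jones"].add("KC")
assert hsm_decrypt_session_key("lt_jones", "KC", wrapped) == sk
```

Note that revocation is equally cheap: removing a key id from a user's entry immediately cuts off every object encrypted under that key, with no re-encryption of the stored data.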
This way, deployment of client-side encryption into an existing or planned infrastructure should start with deep analysis of links between the elements of the company infrastructure and the cloud storage, followed by development and adoption of proper key management procedures.
In Dollars and Cents
How much will it cost to deploy the CSE scheme in a live environment? The final bill will depend heavily on the size of your infrastructure and the chosen key management strategy. Let's try to break down the costs.
Extra traffic and cloud storage capacity costs hover somewhere around zero. A protected object is only up to a kilobyte larger than the matching unprotected object, so from the point of view of traffic and space economy it makes virtually no difference whether you work with protected or unprotected objects.
Compensation for the productivity decrease caused by the extra computational burden on the customer's resources. In practice, the burden is imperceptible. Modern processors provide encryption speeds of up to 8-10 MB/sec, much faster than the average speed at which cloud storage providers are capable of accepting or giving away data. This means that a customer-side computer that encrypts outgoing data on the fly will actually spend a certain amount of time idling, waiting for the preceding portion of encrypted data to be delivered to the cloud front end.
Expenses related to key management can make up a significant part of the migration budget, especially if many-to-one links are present and HSMs are adopted to manage the keys. The costs in this case directly depend on the size and complexity of the infrastructure, the type of encryption keys involved, and the expected frequency and number of object read/write operations. The HSM(s) should be chosen based on the key management requirements, and should be capable of handling the estimated operation load.
The cost of implementing the CSE protocol may pose another unpleasant surprise. The current labour rate of qualified software developers, especially those with a good knowledge of cryptography, is not among the cheapest on the software market, and a proper implementation of CSE takes a significant amount of time. Happily, the market offers some out-of-the-box software products that can help adopt CSE in a faster, cheaper and friendlier way.
The popular SecureBlackbox middleware from EldoS offers cloud storage access components that come with built-in support for CSE. The product supports the majority of popular cloud services, including S3, Azure, SkyDrive, Dropbox and Google Drive. Besides the encryption components themselves, SecureBlackbox offers out-of-the-box support for a variety of encryption key types, from easy-to-use password-based keys to cryptographically strong symmetric and asymmetric keys residing on hardware security devices. The one-off price starts at about $440 for the cheapest license, roughly the equivalent of 3-4 hours of work of a qualified security expert. A good bargain, taking into account that integrating SecureBlackbox-driven CSE into an existing infrastructure is an easy task that only requires a basic knowledge of cryptography and average programming skills.
* * *
That was everything I wanted to tell you about the threats to the information you entrust to cloud service providers and the ways of resisting them. As someone who knows this area inside out, I undoubtedly ran the risk of passing over questions of great importance that seemed to me too evident or unworthy of attention, or that simply slipped my mind. To restore the order of things, just before handing this article to the editor I gave it to my IT friends who work with the cloud daily as integrators and users, and asked them to forward me any questions they still had after reading it. Below I answer the most typical of them. Before we continue, a small notice. There are at least 217 different methods of painting a fence. I can only tell you about our own experience in cloud security. There is nothing to prevent anyone else from implementing encryption or key management procedures in their own, totally different way, and ending up with an encryption scheme with a totally different set of features and properties. If you are interested in our approach and would like to get a deeper insight, drop me a line at Ivanov@eldos.com, and I'll be happy to provide further details about the scheme.
Which cloud services support client-side encryption?
Client-side encryption does not require any explicit support on the service side and integrates transparently into virtually all existing cloud services.
What happens if someone corrupts my encrypted data in the cloud? Will I be able to recover at least some fragments of it?
The untouched fragments, generally, can be restored. The corrupted fragments will, obviously, be lost. If you use MDP, you can detect the fact of data corruption.
If I make a raw copy of a protected object in the cloud, will I be able to decrypt it with the key that was used to encrypt the original object (variant: if I restore a protected object from a service-side backup, will I be able to decrypt it)?
Yes. The encrypted session key travels together with the encrypted object, so any byte-exact copy of the object, however it was produced, can be decrypted with the same user key.
I lost my encryption key (forgot the password). Can I still recover the data encrypted with it?
No, sorry. The scheme is honest and relies solely on the encryption key to provide adequate security. It would be silly to leave backdoors in protection just to cater for recovery after key loss. Please be careful and store your keys responsibly.
Is it easy to change a compromised encryption key?
The answer heavily depends on the specifics of the cloud storage. In certain cases the only way of changing the key is downloading all the objects to a local computer, re-encrypting them and uploading them back. This can be automated with an in-cloud application. In some cases the key can be changed by updating the protected objects' headers.
Can client-side encryption cause any side effects when used with server-side encryption?
No -- your files will just be encrypted twice. It's just like sending a password-protected ZIP file over a secure TLS channel.
Are there any problems with accessing protected objects from mobile devices?
You will need to install cloud access software that supports client-side encryption on those mobile devices from which you wish to access protected objects. The product I referenced above (SecureBlackbox) supports all popular mobile platforms.
Are there any limitations on the size of objects to be protected? Do encrypted objects support random access?
There are no size limitations. Random access is supported, provided that it is supported by the underlying cloud API.
I have a large cloud of unprotected objects. Will it be easy for me to switch to client-side encrypted cloud?
You will need to download all your objects to your organization's computer, encrypt them and upload them back to the cloud. This task can be automated with an in-cloud application.
Is client side encryption a panacea? Can I be 100% sure that my data are safe after employing it?
You need to understand the scope of the tasks that are solved by client-side encryption and the tasks for which you, as a user, are responsible. Client-side encryption makes your data unreadable to everyone who gains unauthorized access to it. However, it is your responsibility to store the encryption keys securely in your local environment and to prevent them from being accessed by people who have no right to them. Looking after one, a dozen, or even a hundred keys in your own network is much easier than looking after gigabytes of data residing no one knows where, isn't it?
* * *
Thank you for the time you spent reading this. I hope you enjoyed the article and found something useful in it, if not of practical then at least of general value. I will be happy to answer any questions that you feel are unanswered. Please feel free to contact me at Ivanov@eldos.com for more information.
About the author
Ken Ivanov, PhD, is a director and chief security expert at EldoS Corporation (UK), a leading provider of security and file system solutions for software vendors. For more than 12 years Ken has been proud to help businesses, individuals and governmental agencies all over the world adopt information security measures. Being really passionate about security and privacy, as well as liberal values, Ken is concerned about the increasing influence of international intelligence services. That's why the question of privacy for data residing in the cloud is of particular importance to him.