Source IP Based Conditional Access for SaaS – Wait, You Put Donuts On Your Ferrari?

No one adopts SaaS applications intending them to deliver a slower, less productive user experience than their previous on-premises instantiations. Yet, in the age of cloud-based SaaS applications and mobility (aka Work From Anywhere), performance-impacting Source IP Based Conditional Access controls are somehow still a thing that keeps popping up in otherwise modern enterprise network and security architectures.

A common example I have often seen is an enterprise requiring that end user access to Office 365 be limited to users coming from a known corporate source IP address. This has several unfortunate side effects, the largest being the unnecessary, performance-impacting latency introduced by forcing both on- and off-network end user traffic back through the corporate data center, further straining any centralized outbound security hardware appliances in the process. Another is a poor end user experience: users are made to jump through hoops such as turning on a remote access VPN just to reach a public cloud hosted SaaS application.

When we have open discussions with enterprises and start to break down why this is still a thing we often land on a common set of requirements that center around reduction of risk:

1.) User identity alone isn’t good enough; we need to know that the user is coming from one of our managed devices with our AV/EDR installed

2.) We want to ensure that the user is passing through our outbound security controls and that we have visibility into this key SaaS application traffic so we force them to come back on-premises

3.) We want to prevent intentional or accidental data loss 

While these are absolutely important risk factors that we need to account for in our SaaS adoption strategy, we really need to rethink the mechanisms by which we employ our controls so as not to defeat the original goals of migrating to SaaS applications that no longer live on our network in the first place.

Let’s look at some of the more modern alternative methods of implementing SaaS access risk reduction while not adversely impacting performance and end user experience.

Access from managed vs unmanaged devices

Ideally this can be controlled by leveraging a combination of your SAML identity provider’s conditional access criteria and an inline cloud security solution to assess the posture of the device that the end user is requesting SaaS access from. Alternatively, we may want to grant ‘reduced’ access in the event that the end user is not coming from a managed device such that they can still reach the SaaS application, yet don’t get risky full unfettered access.
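
To make this concrete, here is a minimal sketch of that kind of conditional access decision. The posture attributes, thresholds, and the full/reduced/deny outcomes are purely illustrative assumptions, not any particular identity provider's or vendor's policy schema:

```python
from dataclasses import dataclass

# Hypothetical posture attributes an endpoint/EDR agent might report.
# Names and thresholds are illustrative only, not a specific vendor's API.
@dataclass
class DevicePosture:
    managed: bool           # enrolled in MDM / carries a corporate certificate
    edr_running: bool       # AV/EDR agent present and healthy
    disk_encrypted: bool
    os_patch_age_days: int

def saas_access_decision(user_authenticated: bool, posture: DevicePosture) -> str:
    """Return 'full', 'reduced', or 'deny' for a SaaS access request."""
    if not user_authenticated:
        return "deny"
    healthy = (
        posture.managed
        and posture.edr_running
        and posture.disk_encrypted
        and posture.os_patch_age_days <= 30
    )
    if healthy:
        return "full"      # normal SSO flow into the SaaS tenant
    # Unmanaged or unhealthy device: still allow access, but restricted
    # (e.g. browser-only, no downloads) rather than blocking outright.
    return "reduced"

if __name__ == "__main__":
    byod = DevicePosture(managed=False, edr_running=False,
                         disk_encrypted=True, os_patch_age_days=3)
    print(saas_access_decision(True, byod))   # -> reduced
```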

Inline Security and Visibility

For WFA scenarios this doesn’t have to mean bringing the user back onto the corporate network in order to securely access SaaS. The reality here is that we really should be assessing a combination of the current device posture and a real-time end user risk profile based on recently observed, potentially risky behavior. Another potential attribute is where the user is coming from, not based solely on their source IP address, but through the lens of detecting ‘impossible travel’, where the access request originates from a completely different geography than the user’s most recent internet/SaaS-bound traffic.
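
As a rough illustration of the ‘impossible travel’ idea, here is a minimal sketch (not any vendor's actual detection logic) that flags a request whose implied travel speed from the last observed location is physically implausible:

```python
import math
from datetime import datetime

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def impossible_travel(prev_geo, prev_time, cur_geo, cur_time, max_kmh=900):
    """Flag a request whose implied speed from the last observed geo exceeds
    max_kmh (roughly airliner speed); such a jump can't be physical travel."""
    hours = (cur_time - prev_time).total_seconds() / 3600
    if hours <= 0:
        return True
    speed_kmh = haversine_km(*prev_geo, *cur_geo) / hours
    return speed_kmh > max_kmh

# Last traffic seen from New York at 09:00 UTC, new request from Singapore at 10:30 UTC.
prev = ((40.71, -74.01), datetime(2021, 6, 1, 9, 0))
cur = ((1.35, 103.82), datetime(2021, 6, 1, 10, 30))
print(impossible_travel(prev[0], prev[1], cur[0], cur[1]))  # -> True
```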

Preventing Data Loss

There are multiple potential avenues when it comes to preventing data loss from a SaaS application. An inline DLP solution (preferably cloud based), combined with an API based cloud access security broker (CASB) and endpoint DLP, would all help to curtail the risk of data loss without forcing SaaS access back through a centralized set of on-network security controls.

These of course work fine for our managed endpoints, but what about unmanaged ones? A combination of Identity Proxy and cloud browser isolation can be a powerful tandem. With an Identity Proxy in place, an end user coming from an unmanaged device can be redirected by your SAML IDP to an inline cloud security solution, which in turn can send this unmanaged user’s SaaS access request into an isolated cloud browser. This isolated browser allows the user to still get access to the SaaS application, but their potentially ‘dirty’ unmanaged device doesn’t interact directly with it. From a data protection perspective, read-only access controls can be enforced to prevent file uploads/downloads and inputting of any data into the SaaS application.
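
Here is a minimal sketch of the routing decision described above; the function and field names are hypothetical and simply illustrate sending unmanaged-device sessions through an isolated browser with read-only controls:

```python
# Minimal sketch of the decision an inline cloud security service might make after
# the SAML IdP redirects an access request to it. Function and field names are
# hypothetical; real identity-proxy and isolation products expose this differently.

def route_saas_request(user: str, device_managed: bool, app: str) -> dict:
    if device_managed:
        # Managed, postured device: pass the session through to the SaaS app directly.
        return {"user": user, "app": app, "path": "direct"}
    # Unmanaged device: render the session in a cloud-isolated browser so the
    # endpoint never touches the app, and enforce read-only data controls.
    return {
        "user": user,
        "app": app,
        "path": "isolated-browser",
        "controls": {"upload": False, "download": False, "clipboard_paste": False},
    }

print(route_saas_request("alice@example.com", device_managed=False, app="office365"))
```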


Stepping Back On the Gas for SaaS

In summary, there are far more thorough and effective methods of injecting risk-reducing controls for SaaS access into your enterprise network and security architecture than legacy source IP based restrictions.

So let’s put the Michelin Pilot Sport tires back on our new Ferrari and migrate away from those legacy fixed location based controls. This will empower our users to be more productive wherever and whenever without sacrificing the security and visibility of our key SaaS applications.

Disclaimer: The views expressed here are my own and do not necessarily reflect the views of my employer Zscaler, Inc.

How hard is the Zero Trust journey?

Just how hard is it to get started with Zero Trust? If it’s truly a journey, then when should an enterprise expect to start seeing the benefits?

What prompted me to write this post was a recent review of NIST Special Publication 800-207 “Zero Trust Architecture”, published in August 2020. It nicely lays out the fundamental principles of Zero Trust, which I will quickly summarize here.

  • Zero Trust is a model which aims to reduce the exposure of resources to attackers and minimize or prevent lateral movement within an enterprise should a host asset be compromised.
  • Trust is never granted implicitly and should be continuously verified via authentication and authorization of identity and security posture upon each access request.
  • There is no complete end to end Zero Trust product offering. It is instead an ecosystem of component systems that when properly integrated allow for creation of a Zero Trust Architecture.
  • Implementing a Zero Trust Architecture is a journey which is all about continuing to reduce risk.

110%, could not agree more with their explanation of the tenets of a Zero Trust model

There is also a really good explanation of all the vital ecosystem components required to interact with each other in order to facilitate translation of Zero Trust principles into an implementation of a Zero Trust Architecture.

However, Section 7, “Migrating to a Zero Trust Architecture”, was a little discouraging for the reader. Reading this section makes it seem like an arduous and daunting task to move towards a Zero Trust Architecture in order to start reducing risk. After some poking around and seeing comments on various public forums, I’m apparently not the only individual who had this as a takeaway. Is it really this hard to get started?

Section 7 assumes that, in order to make progress on the zero trust journey, an enterprise must first understand where all of its existing resources are and who needs access to them, and that if this isn’t done up front, any attempt at an initial implementation will prevent access to key resources…in other words, you will break things and prevent users from getting their work done.

Fortunately, since NIST 800-207 was published in August 2020 there have been significant gains in the maturity of the Zero Trust ecosystem, ranging from enhanced functionality of Identity Providers, to integrations with EDR vendors for device context/posturing, to advances in automation of access policy. Thanks in large part to the COVID-19 pandemic, a lot of operational insight has also been gained into how to transition an enterprise towards Zero Trust.

Most importantly for getting started with Zero Trust, there are commercially available alternatives to traditional VPN, themselves pieces of the Zero Trust Architecture ecosystem, whose first implementation step is actually to facilitate this discovery of application-to-user (and user device) access patterns. This can be done without risk of inadvertently removing any previously granted access to a key application resource, while providing additional risk reduction benefits that are worth mentioning. I will quickly summarize some of these below.

Potential benefits of the initial phase of a Zero Trust Architecture rollout

  • This one bears repeating and expanding on slightly – Immediate granular visibility into all of the applications users are requesting access to, at what time, from which device, from where and for how long, which can then be fed into your SIEM. Discover exactly which private resource assets exist and where they actually physically reside. Yes, you will inevitably discover Shadow IT and realize that you have way more applications than you had originally thought 😉
  • Kick the remote users off the internal private network – Once all users are off the network there is no longer a network-centric implicit trust. Determining trust for whether an individual application access request is approved is now based on a continuously assessed combination of user identity and contextual attributes. Application access, not network access, also reduces the risk of lateral propagation of malware.
  • Removal of a public-facing inbound VPN listener which can be DDoS’d or compromised – This is a huge risk reduction given all of the reported CVEs in 2020/2021 for RCE vulnerabilities.

What’s Next?

So where does one go next after Phase 1?  Phase 1+ is about assessing the discovered user to application resource workflows and then selectively removing more and more risk by locking down access via policy to key applications to only required groups and individuals. Think of these as ‘Crown Jewels’ applications and internal infrastructure components where compromise and potential data exfiltration will be the most costly.

Implement device posture profiles which further provide device context and take advantage of any potential endpoint integrations that provide additional risk assessment scoring for the device that can be used in access policy. An enterprise should also immediately start to look to restrict 3rd party access to only the resource(s) that are required. This is really all about continuing to move towards more identity and contextual least privilege access around the things that are most vulnerable in order to continue to reduce risk.

The maybe not so obvious benefits of migrating towards a Zero Trust Architecture

  • Improved performance – For applications served out of an IaaS cloud like Azure or AWS, an authorized user on a postured device can now connect more directly to that private resource, as opposed to being backhauled to a centralized location and then connected out over private links to an IaaS provider whose data center is most likely closer to the remote user than the centralized interconnect point. A user can also connect directly to private apps in multiple different locations simultaneously.
  • Improved user experience – “Always-on identity and contextual based least privilege access”. There is no longer a concept of having to be on or off-VPN; it’s just-in-time connectivity for any authorized user on an appropriately postured device to any private application anywhere, without any change to the way the user would go about accessing the application.
  • Zero Trust isn’t just for remote access – Since Zero Trust is focused on not implicitly trusting the user device’s network location, the ability to extend zero trust policy to on-prem users who are already resident on the internal corporate network is a huge plus. To do this, the vendor technology must support intelligent interception of client application resource access requests and forward those to an on-prem policy enforcement point, as opposed to allowing traditional direct network-level access to the requested target resource simply because network reachability exists.
Excellent summary of how to get started with ZTA adoption – be sure to check out the full Zero Trust Adoption Best Practices video here

Hopefully the reader finds this helpful, and if you are interested in a tailored, phased plan for how to get started on your Zero Trust journey, feel free to reach out to your local Zscaler Solutions Engineer or attend one of our user group events where you can connect with other enterprise customers who have already embarked on their Zero Trust journey.

For additional insights into operationalizing Zero Trust check out this timely podcast “Maturing zero trust via an operational mindset” featured on our CXO REvolutionaries site

Disclaimer: The views expressed here are my own and do not necessarily reflect the views of my employer Zscaler, Inc.

Never Waste A Good Crisis

While I was listening to a recent podcast featuring one of my colleagues, a former Enterprise IT executive, she mentioned the phrase “never waste a good crisis”. The last time I had heard that phrase was during the 2008 financial crisis.

“You never want a serious crisis to go to waste. And what I mean by that is an opportunity to do things that you think you could not do before.”

― Rahm Emanuel

The Crisis

Over the last 16 months I’ve listened to a similar story from IT leaders all over the globe across almost every industry vertical who sent employees home on a Friday only to have to figure out on Monday how to get them access to the Enterprise applications they needed to do their job. 

The seemingly ‘easy’ route was to lean on traditional remote access VPN. However, that VPN hardware was originally scoped to support ~35-40% of an Enterprise’s employees working remotely at any given time. Those VPN concentrators became immediately saturated, and it was nearly impossible to quickly instantiate additional capacity in a traditional appliance based model. In order to alleviate some of the capacity strain and the end user performance impact of being backhauled to the corporate WAN via VPN, Enterprises were forced to implement split-tunneling policies allowing higher-bandwidth, performance-sensitive applications like O365, Zoom, WebEx and Teams to be split off from the VPN and go direct to the internet, creating risk via a security and visibility gap.

The Opportunity

While some forward thinking IT leaders had envisioned and already shifted employee remote access away from a traditional VPN paradigm to an elastically scalable cloud delivered zero trust architecture, others had been considering for some time how they might embrace more of a zero trust paradigm and kick their users off the corporate network. Well, here is where the opportunity emerged amid crisis….COVID-19 sent almost everyone home and kicked the users off the corporate network. The conundrum facing IT leaders was now no longer how do I go about kicking the users off the corporate network, but how do I go about securely providing them just enough access to a limited set of things that they need in order to do their jobs.

Tackling Legacy Technical Debt

Some legacy enterprise IT applications were designed around a fundamental principle of having direct network access to user endpoints.

In response to this challenge, many Enterprises investigated how to either adapt these legacy network application communication flows from a push model (server-to-client) to a pull model (client-to-server), or abandon on-prem hosting entirely in favor of cloud delivered instances of the same function that could serve off-network users without having to bring them back onto the internal private network. In my many discussions with global Enterprises I had the benefit of learning what their IT teams had done to adjust to this crisis, and below I list some typical examples of migrations away from traditional on-prem hosted, server-to-client communication models to client-to-server, remote-work-friendly, cloud hosted models.

It is very important to note that this is not a recommendation or endorsement on my part of any of the specific products below, simply just a recap of what I have seen come up in conversations with prospects and existing customers.

Patch Management – Pulling down software updates over traditional VPNs leads to several problems. First, if a large number of off-net users attempt to pull patches, that can put significant strain on your existing VPN concentrators and internal network bandwidth. Secondly, if the patch management system is only available to remote users over VPN, then they need to know to turn on the VPN in order for the patch update to even happen, which can lead to lag time where some systems remain unpatched and vulnerable. Instead of forcing users to come back onto the corporate network over VPN for patch management, some enterprises moved to completely cloud hosted (SaaS) implementations of patch management (cloud management gateways) that allowed users access to updates directly over the public internet without care or concern around internal private access or whether their VPN was turned on. Others who had already replaced their traditional VPN solution with a Zero Trust, no-network-access offering simply flipped their patch management to leverage a pull model whereby the client device would ‘check in’ at regular intervals looking for updates. In either event, both approaches prevent any significant patch update lag that would otherwise lead to unpatched systems and increased risk.
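
As a rough sketch of what such a pull-model check-in could look like (the update service URL and response format are hypothetical placeholders, not a specific product's API):

```python
# Minimal sketch of a pull-model patch check-in: the endpoint phones home to a
# cloud-hosted update service at a regular interval instead of waiting to be
# reached over a VPN. URL and response format are hypothetical placeholders.
import json
import time
import urllib.request

UPDATE_SERVICE = "https://patch.example.com/api/v1/checkin"  # hypothetical endpoint
CHECKIN_INTERVAL_SECONDS = 4 * 3600  # check every 4 hours

def check_for_patches(device_id: str) -> list:
    req = urllib.request.Request(
        UPDATE_SERVICE,
        data=json.dumps({"device_id": device_id}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp).get("pending_patches", [])

def run_agent(device_id: str):
    while True:
        for patch in check_for_patches(device_id):
            # A real agent would verify signatures, download and install here.
            print(f"downloading and applying {patch}")
        time.sleep(CHECKIN_INTERVAL_SECONDS)
```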

Remote Desktop Support Management – When users need support personnel to access their machine in order to help with troubleshooting an issue, it was not uncommon to leverage things like Remote Desktop from the support engineer’s PC to the end user’s endpoint. This type of connectivity model clearly only works when there is direct network connectivity from IT support to the afflicted end user’s machine. Several implementations exist which allow for a ‘meet in the middle’ type of remote support access, where a cloud delivered solution can securely enable IT support personnel to remotely access an end user device regardless of what network it resides on. While clearly not an exhaustive list, some popular examples I have heard mentioned are Microsoft Quick Assist, BeyondTrust Remote Support (formerly Bomgar), ConnectWise and TeamViewer.

Legacy VOIP – For VOIP Softphone usage and for Call Center VOIP implementations it was common to see offerings like Avaya and Cisco Call Manager that leveraged traditional direct network access in order to function properly. What was pretty common across all enterprises was that the end user base dependent on these systems represented a small percentage of the total overall end user count.  Some enterprises simply maintained a much smaller deployment of traditional remote access VPN technology to address this user base while adjusting their IT planning and budgets towards future deployments of UCaaS (Unified Communications As A Service) implementations of these types of systems with the eventual goal being to retire traditional remote access VPN entirely. Others expedited existing plans and completely migrated users toward UCaaS implementations like Teams, WebEx and Zoom. For Call Center applications some enterprises are looking to adopt CCaaS (Contact Center As a Service) solutions like Genesys Cloud, Amazon Connect Contact Center, Five9 or Nice’s InContact.

Vulnerability Scanning – Having to have a user’s device brought onto the corporate network in order to scan it for vulnerabilities runs counter to the goal of a zero trust model. If we do vulnerability scanning because we can’t trust that the end user’s device isn’t compromised, then why would we want to risk bringing a potentially compromised endpoint onto the network where that compromise can potentially spread laterally? If we look at what happened with the SolarWinds supply chain compromise as an example, then it’s probably not the best of ideas to give a tool complete unfettered access to everything on the entire internal private network. Having a good modern EDR tool like CrowdStrike, CarbonBlack, SentinelOne or Windows Defender, combined with an agent providing asset vulnerability data that can phone home to the cloud from an off-network remote user device, is more in line with the goals of a zero trust model.

In summary, those Enterprises that had seized the opportunity to accelerate shifting away from legacy network-centric applications like the ones covered above towards cloud delivered models didn’t need to bring their end users back onto the corporate network. They were able to migrate away from the traditional remote access VPN model to a more identity and context based least privilege access model where users only get access to the applications they need to do their job, not access to the internal private network. By ‘keeping the users kicked off the network’ not only are these organizations inherently more secure, but they can now start to evaluate further security and network transformation, starting with “if my users can do their jobs off of the internal corporate network, then do we even need to operate a traditional internal private WAN anymore?”.

Curious to hear thoughts and experiences from others on the challenges they faced in addressing the shift to remote work. Feel free to leave a comment!

Disclaimer: The views expressed here are my own and do not necessarily reflect the views of my employer Zscaler, Inc.

Making The Case For SSL Inspecting Corporate Traffic

Almost every stakeholder, from Enterprise Security Architect to CISO that I speak with these days wants to be able to inspect their organization’s encrypted traffic and data flowing between the internet and the corporate devices and end users that they are chartered to safeguard.

When asked what their primary drivers are for wanting to enable SSL/TLS inspection, the top-of-mind concerns are as follows:

  • Lack of visibility – Upwards of 75-80% of our traffic headed to the internet and SaaS is SSL/TLS encrypted
  • We know that bad actors are leveraging SSL/TLS to mimic legitimate sites to carry out phishing attacks as well as hide malware downloads and Command and Control (C&C) activities
  • I need to know where our data resides – We know bad actors are using SSL/TLS encrypted channels to attempt to circumvent Data Loss Prevention (DLP) controls and exfiltrate sensitive data. Our own employees may intentionally or unintentionally post sensitive data externally

With a pretty clear understanding of the risks of not inspecting SSL/TLS encrypted traffic, one would assume that every enterprise has already taken steps to enable this, right? Well…not necessarily. There are two main hurdles to overcome in order to implement this initiative: one technical, the other political.

The technical hurdle is essentially ensuring that your enterprise network and security architecture supports a traffic forwarding flow, for both on-prem and off-net roaming users, that traverses an active inline SSL/TLS inspection device capable of scaling to the processing load imposed when 75-80% of your internet and SaaS bound traffic is encrypted. In an enterprise network and security architecture where all end user traffic, even from remote users, flows through one or more egress security gateway stack choke points comprised of traditional hardware appliances, the processing load imposed by SSL/TLS interception dramatically reduces the forwarding and processing capacity of those appliances, as evidenced in recent testing by NSS Labs.

This is critical in that most enterprises would need to augment their existing security appliance processing and throughput capacity by at least 3x to enable comprehensive SSL/TLS inspection. This constitutes a significant re-investment in legacy security appliance technology that doesn’t align with a more modern direct-to-cloud shift in their enterprise network and security architecture design.

The second concern, and the primary topic of a recent whitepaper issued by Zscaler, is balancing the user privacy concerns of SSL/TLS inspection against the threat risks of not inspecting an enterprise’s corporate device internet traffic.

Some of the key things to consider in the privacy vs risk assessment and subsequent move to proceed with an SSL/TLS inspection policy are as follows:

  • An organization cannot effectively protect the end user and the corporate device from advanced threats without SSL/TLS interception in place
  • An organization will also struggle to prevent sensitive data exfiltration without SSL/TLS interception
  • Organizations should take the time to educate their end users that instituting an SSL/TLS inspection policy is a security safeguard and not a ‘big brother’ control
  • Organizations should inform employees as to the extent of what will and will not be inspected. This should be defined as part of an acceptable usage policy for internet use on corporate issued assets and this policy should be incorporated into their terms of employment agreements
  • Organizations should review this policy with in house legal counsel, external experts and any associated worker’s councils or unions as well as paying careful consideration to regional data safeguard compliance frameworks like GDPR
  • Organizations should take the necessary steps to ensure appropriate safeguards are put in place for the processing and storing of the logs associated with decrypted transactions, such as obfuscating usernames (a minimal sketch of one such approach follows below)
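
On that last point, here is a minimal sketch of one common approach, pseudonymizing usernames in decrypted-transaction logs with a keyed hash so records still correlate for investigations; the key handling shown is illustrative only:

```python
# Replace the username with a keyed hash (HMAC) so records for the same user
# still correlate across log entries, but the raw identity isn't stored
# alongside the transaction. Key handling below is illustrative only.
import hmac
import hashlib

PSEUDONYM_KEY = b"rotate-and-store-this-in-a-secrets-manager"  # illustrative placeholder

def pseudonymize(username: str) -> str:
    return hmac.new(PSEUDONYM_KEY, username.lower().encode(), hashlib.sha256).hexdigest()[:16]

log_record = {
    "user": pseudonymize("jdoe@example.com"),
    "url_category": "webmail",
    "action": "allowed",
    "bytes_out": 18234,
}
print(log_record)
```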

For a more comprehensive review of how to navigate the security vs privacy concerns and implement a successful SSL/TLS inspection campaign take a look at the recent whitepaper that Zscaler has authored – https://www.zscaler.com/resources/white-papers/encryption-privacy-data-protection.pdf

Disclaimer: The views expressed here are my own and do not necessarily reflect the views of my employer Zscaler, Inc.

Adapting to evolving Ransomware extortion tactics

Effective ransomware controls will now have to go past well maintained backup programs and SSL/TLS inspection backed zero-day threat detection to include comprehensive Data Loss Prevention programs.

In the beginning, cybercriminals launching ransomware campaigns simply demanded that infected organizations pay a ransom in cryptocurrency in order to get their encrypted files back.

As part of a defense strategy against the impacts of a potential ransomware outbreak, organizations began backing up critical assets in order to be able to more quickly mitigate the impact and resume business critical operations in the event that they were compromised by such an attack. In addition to the obvious benefit of protecting business continuity this also effectively helps mitigate the need to pay the campaign’s ransom.

This tightening of business continuity/disaster recovery plans to lessen the impact of ransomware infections has in turn prompted  ransomware campaign originators to counter by adapting their extortion plans to include new impact elements.

The first shift was noted in mid-December of 2019 via a ‘naming and shaming’ campaign whereby the authors of the Maze ransomware strain began posting a list of the companies who fell victim to their ransomware, yet refused to pay the actual ransom.

Publicly shaming victims was apparently just the beginning. Within less than a month, the Maze ransomware campaign began threatening that the organization’s actual data (which they had successfully exfiltrated) would be exposed publicly. The most recent example is US cable and wire manufacturer Southwire, which was threatened with public release of its data if it did not pay a $6 million ransom.

In some cases, this exfiltration of potentially sensitive corporate data may be more costly and have longer-lasting effects than the short-term interruption to critical business functions posed by the temporary lack of access to the ransomware-encrypted data itself.

To combat and help mitigate this latest round of extortion tactics from ransomware campaigns an enterprise should consider looking at:

  • This should go without saying, but as with any cyber security initiative end user education around not clicking on suspicious links and exhibiting more caution with email attachments is critical
  • Well maintained backup programs of business critical systems and data
  • SSL/TLS decryption to aid zero day threat detection controls like active inline Sandbox solutions applied to both on-prem and roaming user device traffic
  • Implementing caution or coaching pages within your web proxy service that inform an end user that they are about to download a certain file type from a site that falls into a category deemed risky by their organization (see the short sketch after this list)
  • Consider replacing legacy VPN technology with a more secure zero trust approach (https://www.zscaler.com/blogs/research/remote-access-vpns-have-ransomware-their-hands?utm_source=linkedin&utm_medium=social&utm_campaign=linkedin-remote-access-vpns-have-ransomware-their-hands-blog-2019)
  • A comprehensive Data Loss Prevention program that covers both on-net and off-net users while inspecting SSL/TLS encrypted outbound data 
  • Since no set of security controls is ever infallible, an appropriate amount of cyber security insurance coverage may prove to be a helpful additional compensating control
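
As referenced above, here is an illustrative sketch of the kind of coaching-page decision a web proxy policy might make; the categories, file types and actions are placeholders rather than any specific product's policy schema:

```python
# Illustrative proxy policy decision that coaches (warns) users before downloading
# risky file types from risky categories. Category and extension lists are placeholders.
RISKY_CATEGORIES = {"newly-registered-domains", "file-sharing", "uncategorized"}
RISKY_EXTENSIONS = {".exe", ".js", ".vbs", ".iso", ".docm", ".xlsm"}

def download_action(url_category: str, filename: str) -> str:
    risky_file = any(filename.lower().endswith(ext) for ext in RISKY_EXTENSIONS)
    if url_category in RISKY_CATEGORIES and risky_file:
        return "coach"               # show a caution page; user must acknowledge to proceed
    if risky_file:
        return "allow-with-sandbox"  # e.g. hold the file for an inline sandbox verdict
    return "allow"

print(download_action("file-sharing", "invoice_scan.exe"))  # -> coach
```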

Disclaimer: The views expressed here are my own and do not necessarily reflect the views of my employer Zscaler, Inc.

Visualizing A Zero Trust Architecture

It’s more than just re-branding VPNs and NGFWs


Enterprise Network and Security Architects are faced with sifting through the myriad of Cyber Security Vendors all espousing their ‘Zero Trust’ offerings. Before we get into how to break down each vendor’s offering, let’s first start by identifying some of the key principles and benefits of a Zero Trust architecture.

  • Establish user identity and authorization prior to access
  • Access to private applications, not access to the network – (no need for VPN)
  • Since no network access is granted, the focus can shift to application level segmentation as opposed to network level segmentation
  • No inbound listeners means applications are invisible to unauthorized users; you can’t attempt to hack or brute force what you cannot even see

So how should one go about visualizing what a security vendor offering actually looks like in order to see if a vendor solution really walks the zero trust walk? I’m going to introduce two scenarios which should help draw the distinction between a re-branded VPN solution and a real zero trust offering.

Traditional VPN

Let’s picture a scenario where your Security Vendor Sales Rep comes to visit you. He or she checks in at the front reception desk, is given a badge and then escorted to a conference room. On the way to the conference room they can easily survey how many floors are in the building, where there are individual offices, media/printing rooms, open floor plan seating areas, telecom equipment closets and maybe even where the corporate Data Center server room is. If your vendor rep leaves the conference room they could hypothetically walk up and down the hall where they can jiggle the door handles of any office door they see, scan the visible content on whiteboards or on top of desks in the open floor plan seating areas for sensitive information and strike up casual conversations with anyone in any area they can manage to roam through. This is akin to the level of trust provided when giving network-level access to a user via a traditional VPN. Instead of the fictitious Sales Rep, imagine that this was a malware infected endpoint brought onto the network by one of your remote employees, a contractor or other 3rd party.

Zero Trust

In this model the same Security Vendor Sales Rep visits and checks in at the front desk to get their badge. This time the Rep only sees one door, the door to the conference room. There are no floors, no visible office doors, media/printing rooms, open seating areas or telecom equipment closet doors. Only the door to the conference room appears as this is the only thing that your Rep is authorized to see or access. There is no hallway to walk down, no office doors to attempt to pry open and no visibility of the internal environment whatsoever. This is more like what access via a zero trust solution should look like.

To take this a bit further, a security vendor might still say that they can support the objectives of the Zero Trust scenario described above. What are some key red flags to look out for to ensure that this isn’t just a rebranded VPN or NGFW solution?

If a prospective security vendor says they meet the objectives of a Zero Trust implementation, but uses language like ‘perimeter’, ‘micro-perimeter’, ‘use your existing NGFW as a network segmentation gateway’, ‘verify and never trust anything ON your network’, or ‘there is no need to rip and replace your existing network appliances’ be very wary that this is likely just a perpetuation of a previous remote access model and not truly architecting for Zero Trust.

Disclaimer: The views expressed here are my own and do not necessarily reflect the views of my employer Zscaler, Inc.

TLS 1.3 – The end of passive-mode packet capture?

After 4 years and 28 draft versions, TLS 1.3 is here and it will force a change to the way we do forensic investigations in Cyber Security. In order to fully understand the impact of TLS 1.3 on Security Incident Response we should first look at the role that packet capture plays in providing forensic data and how it has historically been implemented.

How is packet capture implemented?

Packet capture can be thought of as a virtual security camera constantly watching what enters and exits the network. By that definition it is necessary to establish a choke point and funnel all of the enterprise’s ingress and egress traffic through such a system. This tends to enforce a network architecture model known as hub and spoke, where traffic from remote locations (spokes) is backhauled to a centralized location (hub) where the packet capture function is employed.

In these Hub locations packet capture is commonly facilitated by utilizing smart inline taps, commonly referred to as packet brokers, which send copies of the intercepted traffic to the various monitoring devices that might need to analyze the data. These packet brokers provide several functions like load balancing the traffic across destination monitoring systems, removing VLAN or MPLS headers, and pre-stage filtering to parse out specific protocol traffic to send to a particular device. The addition of metadata like timestamps and geolocation info to the captured packets is also provided. Most importantly, in the context of this particular blog post, is the ability of the packet broker system to be preloaded with private encryption keys to decrypt SSL encrypted traffic prior to sending the data to monitoring devices. In this model, SSL decryption is performed passively, out of band and after the fact, rather than by having the packet broker system act as a man-in-the-middle proxy for the purpose of SSL decryption.

The role of packet capture in incident response

Packet capture provides the Security Incident Response Team the ability to go back in time and look into a potential security incursion that has occurred. This includes examining the behavior of a specific piece of malware, such as determining what propagation techniques it uses and what additional files it attempts to download, as well as determining the C&C domains and IPs in use that would need to be blocked by inline security controls to prevent additional attempts to download more malware or exfiltrate sensitive data from the impacted Enterprise. All of this data can be used to write detection signatures to prevent future occurrences. Another benefit to incident response is the ability to replay the originally captured traffic through those newly written signatures to test proper detection of a specific threat. It is also very valuable in helping to determine the impact of a breach, such as what accounts and systems were accessed and what sensitive data was actually exfiltrated.

However, with more and more enterprise traffic destined to the open internet and SaaS applications this backhauling model to do centralized packet capture comes at a cost.  There is considerable impact to remote branch office and remote user performance due to the latency incurred in this traffic backhauling to a centralized “Hub” location model. Extending packet capture functionality to remote users that are off the corporate network requires enforcing a full tunnel VPN solution to bring them back onto the network where packet capture can happen at the network ingress/egress boundary.  While it can also be debated that packet capture is primarily used forensically in a way that is analogous to solving crimes rather than preventing them, nevertheless it’s been an important tool in the Enterprise Security Incident Response Team’s toolbox for quite some time now.

So what is changing with TLS 1.3? 

TLS 1.3 (RFC 8446) brings about some important changes which deliver security improvement and performance enhancements over version 1.2. Some of the more salient changes are:

  • Faster session setup times as session completion is done in 1 round-trip versus the 2 round-trips required in TLS 1.2
  • Zero Round Trip (0-RTT) which essentially lets you immediately send data to a server that you’ve previously communicated with also increases performance over TLS 1.2
  • Removal of previously used insecure protocols, ciphers and algorithms
  • Perfect Forward Secrecy (PFS) uses Ephemeral Diffie-Hellman key exchange protocol for generating dynamic one-time per session keys rather than a single static private key for every session
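
As a quick way to see these changes in practice, the short standard-library sketch below requires TLS 1.3 and prints what a given server actually negotiates (example.com is just a placeholder host):

```python
# Quick check of what a server negotiates when the client insists on TLS 1.3.
# Uses only the Python standard library (3.7+); example.com is a placeholder host.
import socket
import ssl

host = "example.com"
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # refuse anything older than TLS 1.3

with socket.create_connection((host, 443), timeout=10) as sock:
    with ctx.wrap_socket(sock, server_hostname=host) as tls:
        print(tls.version())   # e.g. 'TLSv1.3'
        print(tls.cipher())    # e.g. ('TLS_AES_256_GCM_SHA384', 'TLSv1.3', 256)
```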

Why is TLS 1.3 going to impact our ability to effectively do packet capture?

This mandate of perfect forward secrecy (PFS) is what is going to force a change to the way we implement packet capture.  PFS will prevent us from going back and doing passive after the fact decryption on traffic as there is no longer a single private key that can be used to decrypt prior sessions.  Prior to the actual packet capture, TLS 1.3 decryption is going to require the use of an active “man-in-the-middle” (MITM) proxy which terminates each unique SSL (TLS 1.3) session from the client and opens a new TLS 1.3 session onward towards the origin content server that the client seeks to communicate with.

This will have both a resource and financial impact on the enterprise in that it will require either the purchase and deployment of dedicated MITM SSL decryption devices, or enabling MITM SSL decryption on previously in-use web proxy appliances. Hopefully these previously purchased proxy appliances can actually support TLS 1.3 interception without requiring a hardware upgrade of the crypto chipset used under the hood. Even if a hardware upgrade/refresh isn’t required to support TLS 1.3, existing proxies will likely struggle to keep up with the performance impact of SSL inspecting all of this traffic and require additional capacity augmentation to be purchased. The net effect here is that continuing to do packet capture with TLS 1.3 in play will require a significant re-investment in the current “hub-and-spoke” centralized outbound security gateway stack model.

Of course full proliferation of TLS 1.3 onto both the client and server side is going to take some time. Major browsers like Chrome and Firefox already support it as of October 2018 and on the server side some large providers like Facebook, Google, Twitter, Microsoft and Cloudflare’s CDN have already started to run TLS 1.3 as well. Despite some early adoption, as of August 2018, Sandvine reports that only half a percent of all the encrypted traffic it sees is TLS 1.3.

Since it will take a while before TLS 1.3 is mainstream, perhaps now is an opportune time to really rethink our overall longer term network and security architecture strategy and consider whether continued re-investment in centralized backhaul and security appliance refresh is the best approach in the era of increasing cloud application usage and end user mobility. With applications and users leaving the traditional enterprise network perimeter, does it really make sense to force users back onto the corporate network and continue to spend time and resources on a legacy hardware appliance based approach?

At Zscaler, we certainly believe that there is a better way to deliver a modern network and security architecture with full visibility into all of your enterprise’s encrypted traffic without compromises.

Disclaimer: The views expressed here are my own and do not necessarily reflect the views of my employer Zscaler, Inc.

References:

TLS 1.3 is moving forward: what you need to know today to get ready

TLS 1.3 – Impact on Network Based Security – draft-camwinget-tls-use-cases-00

An Overview of TLS 1.3 and Q&A

Why TLS 1.3 isn’t in all browsers yet

A Tale Of 2 Meetings

Fine Tuning The Sales Rep/Sales Engineer Dynamic

If you have been a Sales Rep or Sales Engineer for more than 5 minutes the following semi-fictional scenario should be all too familiar to you in some ways.

You are in the lobby of Acme Corp, a potential prospect who you’re meeting with for the first time. You are all signed in and waiting for your contact to bring you up to the conference room for the meeting. They are of course running late. Then once ushered into the conference room you need to wrestle with the media connector for projecting to the folks in the room and of course with the guest WiFi so that folks who are remote can join the web conference session.  Now comes the obligatory round of introductions and making sure that the folks who  are remote on the bridge can hear you and see your slide presentation.

You had originally requested a 1 hour meeting time slot to cover intros, 30 mins of content and 10-15 mins of QA, but are now already running 20-25 mins late. You and your counterpart feel the pressure as you now need to scramble in order to still pull off a successful meeting.

The introductions are over and now Sam the Sales Rep takes over, and what ensues is a litany of buzzwords, current customer name dropping and bold claims of solving very specific problems. One of these immediately strikes a chord with the prospect, who then asks for specifics on how your solution can deliver on that. Sam the Sales Rep then turns to Steve the SE and asks him to explain exactly how they can do it. What happens next will define whether this meeting has a chance of being successful or not.

Steve the SE panics a bit as he thought he would have 30 mins to give the customer a really good idea of what the product/solution does and how it could benefit Acme Corp. Instead Steve feels compelled to get down into the weeds right out of the gate. What Steve really wants to say, but clearly thinks better of it, is “Sam, you’ve been doing a great job explaining this already…I think you’ve got this…go ahead and explain how we do XYZ”. Wanting to be generally helpful and show that he is knowledgeable about the solution, Steve starts to explain to the customer how they can potentially solve problem XYZ. This doesn’t go well at all, as Steve has to keep zooming out to explain different aspects of how the product works and what it does in order for the customer to even begin to understand how it will solve for XYZ. Each customer question takes him further down a rathole and farther away from the high level overview and value proposition he had wanted to start with. He is clearly and visibly off his game plan and uncomfortable with the way things are going, yet as an Engineer he feels compelled to keep answering the technical questions.

Suddenly there is only 8-10 mins left in the meeting. Sam the Sales Rep interrupts with a time check and asks if there are any other key questions and what the next steps should be. Meanwhile, the prospect never did get the basic understanding of what the product does as a whole and what problems it was designed to help them solve. They only have bits and pieces based on what questions they asked and perhaps will leave the room thinking that the vendor’s solution is very niche and only really does the one thing they’ve spent the last 20 or so mins talking about.

The prospect responds with “we will need to discuss this internally afterwards and see if this is something that we would be interested in pursuing”… the Sales Rep/Sales Engineer tandem completely missed the mark with this meeting.

Sales Rep’s perspective of the meeting:

Knowing we didn’t have a lot of time left I managed it by getting right to the point and telling them how great we are and how we can save them tons of money on XYZ. I needed to find something that would stick so that we could get them to agree to a next step of some kind. Then my SE droned on and on about how we actually do it. They kept asking a lot of questions, but didn’t really seem to get the big picture. I don’t think we should have spent that much time talking about the technical bits as they seemed to indicate that they weren’t all that interested.

Sales Engineer’s perspective of the meeting:

We should have just stuck to the plan and given the basic technical overview and then let them ask us their questions. Sam’s buzzword bingo and bold claims of how we can help them got us off track right away, and we ran out of time as I answered all their technical questions. I still don’t know what product/service they are using today and why they would even entertain pursuing something else.

Reality of the meeting:

They were both wrong and stuck to their preconceived notion of what their role in the meeting should be.  Neither did anything to really keep the meeting focused and on track and headed towards a successful outcome. This is not a Sales Rep/Sales Engineer tandem that is going to be very successful.

On the Sales Rep side Sam could have just briefly introduced the company and what the product/solution was designed to do at a high level while perhaps mentioning previous success with some of Acme’s industry peers in using the product/solution.  It would be good if Sam actually asked them what they were utilizing today, potential problems they were having with the current model or approach and where they see themselves trying to get to in order to be successful.  At this point Sam should be indicating that Steve would then walk them through the details of how it’s done, talking specifically to their desired end state and goals while answering any questions they have.

On the Sales Engineer side, Steve could have gently pushed back and said “before we get too deep into how we do the specifics around XYZ, lets make sure that you have a firm understanding of what we do as a whole and why we feel it’s different….I’d also like to make sure we understand the problems you face so that we can focus more on the aspects of the solution that could potentially benefit Acme Corp”.  This is the difference between an Engineer who understands all the “nerd knobs” of how a product works and a Sales Engineer who can articulate how the product can solve real business problems for the prospect.

Some key attributes of a successful Sales Rep/SE tandem are knowing your respective role in this partnership, how to play off of each other and above all trust.  To a degree you also have to be able to do a little bit of each other’s job if and when needed.  If your SE isn’t able to understand how the problems your product/solution solves matters to the prospect’s business and how to work that into the presentation then they are an Engineer not a Sales Engineer.  If your Sales Rep can’t set the table and explain at high level what your product does and why it matters in 5 mins or less then they shouldn’t be in Sales.  Sales Reps will often encounter prospects without their SE around to provide cover and need to know just enough about the product to quickly cut to the chase on whether there is a real opportunity there or not to pursue.  Although rare, Sales Engineers should be able to move a meeting with a prospect along without their Sales Rep counterpart. This means being able to drive a meeting towards a meaningful next step, not just explaining how the tech works.

So what could have happened here that might have saved this meeting?

Hint: All of this could have been prevented before they even arrived at Acme Corp

1.) Define A Successful Outcome

In a very brief pre-meeting, define what a successful outcome for the Acme Corp meeting should look like; that way the Sales Rep/Sales Engineer tandem can be more flexible in adjusting on the fly to accommodate the desired outcome. Are you trying to get another meeting at Acme with a different team/group or decision maker? Are you trying to move them towards a trial or pilot? Are you just trying to see if there is even an opportunity here at all?

2.) Set The Table

Define what should be covered, and to what depth, in the initial introduction of your company and its product/solution, and who is going to actually do it. Strike a balance right out of the gate between a very high level intro of what you do and what problems you can help solve to ensure there is interest, and then indicate that you will walk them through the specifics as the meeting progresses. Avoid bold claims and assumptions about what the customer needs as you just don’t know yet. Everyone says their router/switch is faster and scales better than every other competitor’s. Every security vendor says their product is more secure than the other guy’s. If you don’t know what they are doing and using today and why, then you just sound like every other vendor that walked through the door.

3.) Do Discovery

What is the customer doing today, how are they doing it, why are they doing it that way and what are the big challenges they are trying to solve? Take mental (as well as physical) notes on this, as you can then work these in as truly relevant examples when you explain what your product/solution does and how it can potentially help Acme Corp.

4.) Play Off Of Each Other

If you’ve done #2 and #3 right you can refer back to each other’s previous statements and relate the tech back to the business or the business back to the tech. Not only will you look and sound more like a Sales tandem, but you will clearly now be able to reinforce the value proposition of your product while pleasing the technical and operational stakeholders too. “Like Sam said in the beginning about being able to potentially save you as much as 40% on your costs of XYZ while streamlining and simplifying the way you run things today, this is the way we would go about helping you achieve that” or “This particular feature set Steve is explaining is how Big Global Corp saved N% on their annual cost of XYZ”.

5.) Manage Your Time

Meetings rarely ever start or end on time or go off without a prospect being called out of the meeting for something critical.  Both Sales Rep and SE have an equal role to play in making sure the pre-agreed upon requirements are being hit to lead to the desired successful outcome.  This really goes back to #2 in making sure that you set the table well enough that the prospect leaves the meeting knowing exactly what you can do and to #3  in that based on what they do today and where they want to be you are armed with clear examples of how you can potentially help them. This is how you are going to get to a successful outcome when you ask them for that next step of another meeting or to kick-off a trial/pilot of your product.

Spend some time with your Sales Rep or Sales Engineer and discuss the roles each of you will play in scripting out a successful meeting.  Build a story of who your company is and what you do that doesn’t require any slides.  This way you can easily adapt on the fly and quickly get your message across when the projector or virtual meeting system doesn’t work properly.  Practice, execute and refine it as much as needed.

Disclaimer: The views expressed here are my own and do not necessarily reflect the views of my employer Zscaler, Inc.

How will security look when the entire web is encrypted?


Will the entire internet be SSL/TLS encrypted soon?

There were some pretty simple drivers for securing the web that led to the adoption of SSL/TLS, or HTTPS as it’s commonly referred to. Principally, confidentiality and integrity were desirable before we would ever trust transmitting a credit card number to purchase something in our online shopping cart. We wanted to ensure that we were sending our sensitive data to the entity that we actually intended to, and that this sensitive data was not being transmitted in the clear where it’s at risk of potential interception.

SSL/TLS encryption has found its way into the mainstream of almost every popular website, cloud application and mobile App these days. In fact, as of the time of this writing, 81 of the top 100 web sites default to HTTPS.

The proliferation of free SSL certificates via entities like LetsEncrypt have certainly made securing sites via SSL even easier.

So what’s next? Google, which has led the charge in helping push for a more secure web, has just announced that in July of 2018 its Chrome browser will start to actively warn end users when they are accessing a site that is not HTTPS encrypted. This will no doubt cause a scramble by site owners to ensure that their web sites are encrypted, greatly increasing the number of sites on the web served via HTTPS.

So what does this mean for the traditional enterprise internet security architecture model?

First and foremost, a further increase in HTTPS traffic is going to further reduce the effectiveness of security stacks that are attempting to do web content filtering, cloud application visibility and control, advanced threat prevention, sandboxing of zero day threats and Data Loss Prevention (DLP). This is because malicious actors are ironically using the very same protocol that was meant to keep us safe on the web as a way of obscuring their activities like phishing and the distribution of malware like ransomware.

The typical enterprise already experiences HTTPS encryption of somewhere between 50-70% of the traffic that passes through their security gateway stack of appliances.  If they are not currently doing SSL inspection of their traffic then that translates to an effectiveness of only scanning 30-50% of their traffic using their existing security controls. What does that effectiveness rate look like in the wake of more and more of the web becoming encrypted as a result of Google’s upcoming “not secure” notification intentions?

It’s time for enterprises to enable SSL inspection in their security controls, or else those tools are going to be blind to the overwhelming majority of the traffic traversing the web and cloud applications. This will need to be done in a highly scalable and cost effective way which, as I’ve written about before, isn’t attainable via conventional enterprise security stack deployment models. The cloud is going to have to be the delivery model for implementing this in a way that is always on regardless of end user location and flexibly scales to meet the enterprise’s demands in a way that is affordable.

For more information on the current threat landscape that is leveraging and hiding inside of SSL/TLS and how Zscaler can help, check out this Zscaler Threatlabz webcast on “The Latest In SSL Security Attacks”.

Disclaimer: The views expressed here are my own and do not necessarily reflect the views of my employer Zscaler, Inc.

14.4 Terabits in a single rack unit ?

Have switching ASICs gotten too fast?

Looking back at the last few years, it certainly appears that ethernet switching ASICs and front panel interface bandwidth are moving at different paces: a faster switching ASIC arrives just ahead of the ethernet interface speed and optic form factor needed to drive the full bandwidth the ASIC actually provides while still fitting into a 1RU top-of-rack ethernet switch or line card profile.

Current 6.4+ Tbps system-on-a-chip (SOC) ASIC based switching solutions have moved past the available front panel interface bandwidth inside of a single rack unit (RU). The QSFP28 (Quad-SFP) form factor currently occupies the entire front panel real estate of a 1RU switch at 32x100G QSFP28 ports, prompting switching vendors to release 2RU platforms in order to cram in 64x100G ports and fully drive the newest switching ASICs. With higher bandwidth switching ASICs on the near horizon the industry clearly needs a higher ethernet interface speed and new form factors to address the physical real estate restrictions.

So where do we go from here?

First, let’s look at the three dimensions at our disposal for scaling up interface bandwidth.

1.)  Increase the symbol rate per lane.

This means we need an advance in the actual optical components and thermal management used to deliver the needed increase in bandwidth in a power efficient manner. Put more simply, in the words of a certain Evil Scientist who wakes up after being frozen for 30 years: “I’m going to need a better laser okay”.

2.)  Increase the number of parallel lanes that the optical interface supports

As an example in the case of the 40Gbps QSFP form factor this meant running 4 parallel lanes of 10Gbps to achieve 40Gbps of bandwidth

3.)  Stuff (encode) more bits into each symbol per lane by using a different modulation scheme.

For example PAM4 encodes 2 bits per signal which effectively doubles the bit rate per lane and is the basis for delivering 50Gbps per lane and 200Gbps aggregate across 4 lanes.
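
Putting those three dimensions together, per-port bandwidth is roughly lanes x signaling rate x bits per symbol (ignoring FEC and encoding overhead). A tiny sketch of the arithmetic:

```python
# Back-of-the-envelope per-port bandwidth: lanes x signaling rate x bits per symbol.
# This ignores FEC and encoding overhead, so figures are nominal rather than exact.
def port_bandwidth_gbps(lanes: int, gbaud_per_lane: float, bits_per_symbol: int) -> float:
    return lanes * gbaud_per_lane * bits_per_symbol

print(port_bandwidth_gbps(4, 25, 1))   # QSFP28-style: 4 lanes of ~25G NRZ  -> 100.0 Gbps
print(port_bandwidth_gbps(4, 25, 2))   # 4 lanes of PAM4 at ~25 Gbaud       -> 200.0 Gbps
print(port_bandwidth_gbps(8, 25, 2))   # 8 lanes of PAM4 (QSFP56-DD class)  -> 400.0 Gbps

# Front-panel aggregate: 36 ports x 400 Gbps = 14.4 Tbps in a single RU.
print(36 * port_bandwidth_gbps(8, 25, 2) / 1000, "Tbps")
```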

Looking Beyond QSFP28

Next, let’s look at what is potentially coming down the pike for better interface bandwidth (greater than 100Gbps) and front panel port density.

Smaller form factor 100G

One approach is to simply use a more compact form factor, and this is exactly what the micro QSFP (uQSFP) is being designed to do.  uQSFP is the same width as an SFP form factor optic yet uses the same 4-lane design as QSFP28. This translates into a 33% increase in the front panel density of a 1RU switch when compared with the existing QSFP28 form factor. The uQSFP also draws the same 3.5W of power as the larger form factor QSFP28.  It’s now going to be possible to fit up to 72 ports of uQSFP (72x100G) into a 1RU platform or line card, allowing for the support of switching ASICs operating at 7.2Tbps when the uQSFP runs at 25Gbps per channel (4 lanes of 25Gbps).  If broken out into 4x25G ports, a single 1RU could terminate up to 288 x 25G ports.  uQSFP is also expected to support PAM4, enabling 50Gbps per channel for an effective bandwidth of 200Gbps in a single port, paving the way for enough front panel bandwidth to drive 14+Tbps of switching ASIC capacity in a 1RU switching device form factor.  There may however be technical challenges in engineering a product with 3 rows of optics on the front panel.

Image courtesy of http://www.microqsfp.com/

Double-Density Form Factors

Another approach is the QSFP-DD (double density) form factor.

QSFP28-DD is the same height and width as QSFP28, but slightly longer, allowing for a second row of electrical contacts.  This second row provides for 8 signal lanes operating at 25Gbps for a total of 200Gbps in the same amount of space as the previous QSFP28 operating at 100Gbps.  This provides enough interface bandwidth and front panel density for 36 x 200Gbps and a 7.2Tbps switching ASIC.  There are break-out solutions coming that will allow for breaking out into 2x100Gbps QSFP28 connections with QSFP-DD optics on the 100G end.   What is not yet clear is whether a product will emerge which would allow for 8x25G breakouts of a QSFP28-DD into server cabinets.

400G

CFP8 is going to be the first new form factor to arrive for achieving 400G, but is going to be too large a form factor to fit into the more traditional model of 32 front panel ports in 1RU of space.  CFP8 dimensions are roughly 40mm W x 102mm L x 9.5mm H, which should max out at around 18 ports per 1RU of space.  At 15-18W (3x the power of QSFP28), power consumption is another challenge in designing a line card that can accommodate it.  CFP8 is more likely to be used by service providers for router-to-router and router-to-transport longer haul transmissions rather than traditional ethernet switching devices found in the Data Center rack.

QSFP56-DD consists of 8 lanes of 50Gbps with PAM4 modulation for 400Gbps operation.  It’s the same size form factor as QSFP/QSFP28, allowing for up to 36 ports in 1RU of space and flexible product designs where QSFP, QSFP28 or QSFP56-DD modules could alternatively be used in the same port.  These 36 ports of 400Gbps would support ASICs with 14.4Tbps in a single RU of space.  QSFP56-DD should also support short reach 4x100Gbps breakout into 4x SFP-DD, which is the same size as SFP+/SFP28, making it eventually ideal for server connectivity.

Octal SFP (OSFP) is another new form factor with 8 lanes of 50Gbps for an effective bandwidth of 400G.  It’s slightly wider than QSFP, but should still be capable of supporting up to 32 ports of 400G, a total of 12.8Tbps in 1RU of front panel space.  The challenge for OSFP adoption will be that it’s a completely different size form factor than the previous QSFP/QSFP28, which will require a completely new design for 1RU switches and line cards.  In other words there will be no backwards compatibility where a QSFP/QSFP28 could alternatively be plugged into the same port on a line card or fixed switch. An adapter for allowing a QSFP28 optic to be inserted into the OSFP form factor is apparently under discussion.

So, in conclusion, just as ASICs seemed to be quickly outpacing interface bandwidth and front panel real estate, there are viable options coming soon that will be able to take us to the 12.8 to 14.4Tbps level in a single RU.

Disclaimer: The views expressed here are my own and do not necessarily reflect the views of my employer Juniper Networks