Google's New Book: Building Secure & Reliable Systems

riusksk, 漏洞战争 (Vulnerability Wars), 2023-03-05


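Google's security team recently released a new book, Building Secure & Reliable Systems, published by O'Reilly. Readers can buy a print copy or download the ebook for free. It is another sign of how much Google contributes to knowledge sharing and foundational security work: it shares a great deal of hard-won experience with the security industry and actively pushes the field forward.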


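To give its billions of users more stable and reliable services, Google long ago set up a dedicated team for the job: Site Reliability Engineers (SREs), practitioners of DevOps whose main responsibilities include building, deploying, monitoring, and maintaining software systems. The book comes out of this SRE organization together with Google's security team.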


01

SREs, Security Engineers, and Software Engineers


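Unlike traditional software engineers:

  • SREs and security engineers tend to break and fix systems as well as build them;

  • their work covers operations in addition to development;

  • they are often seen as roadblocks to the business rather than enablers;

  • they are often siloed rather than integrated into product teams.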


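This time they fold security into SRE, which is essentially the now-popular DevSecOps methodology. If DevSecOps interests you, the book is well worth reading.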



02


Key Topics


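For building secure and reliable systems, the book mainly covers:

  • Design strategies

  • Practical advice on coding, testing, and debugging

  • Guidance on preventing, responding to, and recovering from incidents

  • Cultural best practices for cross-team collaboration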


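On the security side, the main material is Chapter 12 on secure coding: defending against common web vulnerabilities (for example, TrustedSqlString against SQL injection and SafeHtml against XSS), using security frameworks, building with compiler sanitizers, and more. It also introduces the “You Aren’t Gonna Need It” (YAGNI) principle: implement only the functionality you need now, and don't build features you merely think you might need later.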
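To make the framework idea concrete, here is a minimal Python sketch in the spirit of TrustedSqlString: the query API accepts only a dedicated type for SQL text, and untrusted values travel separately as bound parameters. `TrustedSql` and `run_query` are hypothetical names for illustration; only the `sqlite3` calls come from Python's standard library, whose `?` placeholders bind values as data. The book's version also restricts how such strings can be built in the first place; nothing in this sketch stops a caller from wrapping a concatenated string, so it shows the shape of the API rather than its enforcement.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class TrustedSql:
    """Hypothetical marker type for SQL text written by the developer.

    Only fixed query text with ? placeholders should be wrapped here;
    user-controlled values must never be concatenated into `text`.
    """
    text: str


def run_query(conn: sqlite3.Connection, query: TrustedSql,
              params: Tuple[Any, ...] = ()) -> List[tuple]:
    # Reject plain str so ad hoc string-built queries fail loudly at the call site
    if not isinstance(query, TrustedSql):
        raise TypeError("run_query only accepts TrustedSql, not str")
    # sqlite3 binds each ? placeholder to a value, so params are data, never SQL
    return conn.execute(query.text, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_query(conn, TrustedSql("CREATE TABLE users (name TEXT)"))
    run_query(conn, TrustedSql("INSERT INTO users VALUES (?)"), ("alice",))
    attack = "alice' OR '1'='1"
    # The injection attempt is compared as a literal string and matches nothing
    print(run_query(conn, TrustedSql("SELECT name FROM users WHERE name = ?"), (attack,)))   # prints []
    print(run_query(conn, TrustedSql("SELECT name FROM users WHERE name = ?"), ("alice",)))  # prints [('alice',)]
```

The design choice is to make the safe path the only path the API offers: code that builds a query with string formatting and passes a plain `str` is rejected immediately instead of silently becoming an injection point.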

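Chapter 13 covers fuzzing and unit testing. It introduces mainstream fuzzing tools such as AFL, libFuzzer, and Honggfuzz, walks through an example fuzzer built with libFuzzer, and shares ideas for building fuzzing infrastructure and running fuzzing continuously, with a focus on OSS-Fuzz and ClusterFuzz.

The chapter closes with static analysis, focusing on clang-tidy, a Clang-based static analysis framework for C, C++, and Objective-C, although it appears geared more toward code quality than security. The underlying theme is integrating all of this testing and analysis into the CI/CD pipeline so that it runs automatically and continuously, which is also the core idea behind today's popular DevSecOps approach.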
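libFuzzer's entry point is a C/C++ function, `LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size)`, which the engine calls over and over with generated inputs. The Python sketch below mimics that driver shape with a deliberately naive, purely random fuzzer so the whole loop fits on one screen. It is not coverage-guided: engines such as libFuzzer, AFL, and Honggfuzz steer mutations using code-coverage feedback, which this sketch omits. `parse_record`, `fuzz_one_input`, and `random_fuzz` are hypothetical names, and `parse_record` contains a planted bounds bug for the fuzzer to find.

```python
import random
from typing import Callable, List


def parse_record(data: bytes) -> int:
    """Hypothetical parser: byte 0 is a length n; returns the byte at index n."""
    if not data:
        return 0
    n = data[0]
    # Planted bug: no check that n < len(data), so short inputs raise IndexError
    return data[n]


def fuzz_one_input(data: bytes) -> None:
    """Driver in the spirit of LLVMFuzzerTestOneInput: feed one input to the code under test."""
    parse_record(data)


def random_fuzz(driver: Callable[[bytes], None], iterations: int,
                seed: int = 0, max_len: int = 16) -> List[bytes]:
    """Call `driver` with random byte strings and collect inputs that raise exceptions."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    crashes: List[bytes] = []
    for _ in range(iterations):
        size = rng.randint(0, max_len)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            driver(data)
        except Exception:  # treat any uncaught exception as a "crash"
            crashes.append(data)
    return crashes


if __name__ == "__main__":
    found = random_fuzz(fuzz_one_input, iterations=1000)
    print(f"{len(found)} crashing inputs; first: {found[0]!r}" if found else "no crashes")
```

Each recorded input is a reproducer: every crash here has a length byte that points past the end of the input. Adding the missing bounds check to the parser and rerunning the loop is the same fix-and-verify cycle that continuous fuzzing automates in CI.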


03


Table of Contents


At 557 pages, it is a hefty volume.


Part I. Introductory Material


1. The Intersection of Security and Reliability

    On Passwords and Power Drills

    Reliability Versus Security: Design Considerations

    Confidentiality, Integrity, Availability

        Confidentiality

        Integrity

        Availability

    Reliability and Security: Commonalities

        Invisibility

        Assessment

        Simplicity

        Evolution

        Resilience

        From Design to Production

        Investigating Systems and Logging

        Crisis Response

        Recovery

    Conclusion


2. Understanding Adversaries

    Attacker Motivations

    Attacker Profiles

        Hobbyists

        Vulnerability Researchers

        Governments and Law Enforcement

        Activists

        Criminal Actors

        Automation and Artificial Intelligence

        Insiders

    Attacker Methods

        Threat Intelligence

        Cyber Kill Chains™ 

        Tactics, Techniques, and Procedures

    Risk Assessment Considerations

    Conclusion


Part II. Designing Systems


3. Case Study: Safe Proxies

    Safe Proxies in Production Environments

    Google Tool Proxy

    Conclusion


4. Design Tradeoffs

    Design Objectives and Requirements

        Feature Requirements

        Nonfunctional Requirements

        Features Versus Emergent Properties

        Example: Google Design Document

    Balancing Requirements

        Example: Payment Processing

    Managing Tensions and Aligning Goals

        Example: Microservices and the Google Web Application Framework

        Aligning Emergent-Property Requirements

    Initial Velocity Versus Sustained Velocity

    Conclusion


5. Design for Least Privilege

    Concepts and Terminology

        Least Privilege

        Zero Trust Networking

        Zero Touch

    Classifying Access Based on Risk

    Best Practices

        Small Functional APIs

        Breakglass

        Auditing

        Testing and Least Privilege

        Diagnosing Access Denials

        Graceful Failure and Breakglass Mechanisms

    Worked Example: Configuration Distribution

        POSIX API via OpenSSH

        Software Update API

        Custom OpenSSH ForceCommand

        Custom HTTP Receiver (Sidecar)

        Custom HTTP Receiver (In-Process)

        Tradeoffs

    A Policy Framework for Authentication and Authorization Decisions

        Using Advanced Authorization Controls

        Investing in a Widely Used Authorization Framework

        Avoiding Potential Pitfalls

    Advanced Controls

        Multi-Party Authorization (MPA)

        Three-Factor Authorization (3FA)

        Business Justifications

        Temporary Access

        Proxies

    Tradeoffs and Tensions

        Increased Security Complexity

        Impact on Collaboration and Company Culture

        Quality Data and Systems That Impact Security

        Impact on User Productivity

        Impact on Developer Complexity

    Conclusion


6. Design for Understandability

    Why Is Understandability Important?

        System Invariants

        Analyzing Invariants

        Mental Models

    Designing Understandable Systems

        Complexity Versus Understandability

        Breaking Down Complexity

        Centralized Responsibility for Security and Reliability Requirements

    System Architecture

        Understandable Interface Specifications

        Understandable Identities, Authentication, and Access Control

        Security Boundaries

    Software Design

        Using Application Frameworks for Service-Wide Requirements

        Understanding Complex Data Flows

        Considering API Usability

    Conclusion


7. Design for a Changing Landscape

    Types of Security Changes

    Designing Your Change

    Architecture Decisions to Make Changes Easier

        Keep Dependencies Up to Date and Rebuild Frequently

        Release Frequently Using Automated Testing

        Use Containers

        Use Microservices

    Different Changes: Different Speeds, Different Timelines

        Short-Term Change: Zero-Day Vulnerability

        Medium-Term Change: Improvement to Security Posture

        Long-Term Change: External Demand

    Complications: When Plans Change

    Example: Growing Scope—Heartbleed

    Conclusion


8. Design for Resilience

    Design Principles for Resilience

    Defense in Depth

        The Trojan Horse

        Google App Engine Analysis

    Controlling Degradation

        Differentiate Costs of Failures

        Deploy Response Mechanisms

    Automate Responsibly

    Controlling the Blast Radius

        Role Separation

        Location Separation

        Time Separation

    Failure Domains and Redundancies

        Failure Domains

        Component Types

        Controlling Redundancies

    Continuous Validation

        Validation Focus Areas

        Validation in Practice

    Practical Advice: Where to Begin

    Conclusion


9. Design for Recovery

    What Are We Recovering From?

        Random Errors

        Accidental Errors

        Software Errors

        Malicious Actions

    Design Principles for Recovery

        Design to Go as Quickly as Possible (Guarded by Policy)

        Limit Your Dependencies on External Notions of Time

        Rollbacks Represent a Tradeoff Between Security and Reliability

        Use an Explicit Revocation Mechanism

        Know Your Intended State, Down to the Bytes

        Design for Testing and Continuous Validation

    Emergency Access

        Access Controls

        Communications

        Responder Habits

    Unexpected Benefits

    Conclusion


10. Mitigating Denial-of-Service Attacks

    Strategies for Attack and Defense

        Attacker’s Strategy

        Defender’s Strategy

    Designing for Defense

        Defendable Architecture

        Defendable Services

    Mitigating Attacks

        Monitoring and Alerting

        Graceful Degradation

        A DoS Mitigation System

        Strategic Response

    Dealing with Self-Inflicted Attacks

        User Behavior

        Client Retry Behavior

    Conclusion


Part III. Implementing Systems


11. Case Study: Designing, Implementing, and Maintaining a Publicly Trusted CA

    Background on Publicly Trusted Certificate Authorities

    Why Did We Need a Publicly Trusted CA?

    The Build or Buy Decision

    Design, Implementation, and Maintenance Considerations

        Programming Language Choice

        Complexity Versus Understandability

        Securing Third-Party and Open Source Components

        Testing

        Resiliency for the CA Key Material

        Data Validation

    Conclusion


12. Writing Code

    Frameworks to Enforce Security and Reliability

        Benefits of Using Frameworks

        Example: Framework for RPC Backends

    Common Security Vulnerabilities

        SQL Injection Vulnerabilities: TrustedSqlString

        Preventing XSS: SafeHtml

    Lessons for Evaluating and Building Frameworks

        Simple, Safe, Reliable Libraries for Common Tasks

        Rollout Strategy

    Simplicity Leads to Secure and Reliable Code

        Avoid Multilevel Nesting

        Eliminate YAGNI Smells

        Repay Technical Debt

        Refactoring

    Security and Reliability by Default

        Choose the Right Tools

        Use Strong Types

        Sanitize Your Code

    Conclusion


13. Testing Code

    Unit Testing

        Writing Effective Unit Tests

        When to Write Unit Tests

        How Unit Testing Affects Code

    Integration Testing

        Writing Effective Integration Tests

    Dynamic Program Analysis

    Fuzz Testing

        How Fuzz Engines Work

        Writing Effective Fuzz Drivers

        An Example Fuzzer

        Continuous Fuzzing

    Static Program Analysis

        Automated Code Inspection Tools

        Integration of Static Analysis in the Developer Workflow

        Abstract Interpretation

        Formal Methods

    Conclusion


14. Deploying Code

    Concepts and Terminology

    Threat Model

    Best Practices

        Require Code Reviews

        Rely on Automation

        Verify Artifacts, Not Just People

        Treat Configuration as Code

    Securing Against the Threat Model

    Advanced Mitigation Strategies

        Binary Provenance

        Provenance-Based Deployment Policies

        Verifiable Builds

        Deployment Choke Points

        Post-Deployment Verification

    Practical Advice

        Take It One Step at a Time

        Provide Actionable Error Messages

        Ensure Unambiguous Provenance

        Create Unambiguous Policies

        Include a Deployment Breakglass

    Securing Against the Threat Model, Revisited

    Conclusion


15. Investigating Systems

    From Debugging to Investigation

        Example: Temporary Files

        Debugging Techniques

        What to Do When You’re Stuck

        Collaborative Debugging: A Way to Teach

        How Security Investigations and Debugging Differ

    Collect Appropriate and Useful Logs

        Design Your Logging to Be Immutable

        Take Privacy into Consideration

        Determine Which Security Logs to Retain

        Budget for Logging

    Robust, Secure Debugging Access

        Reliability

        Security

    Conclusion


Part IV. Maintaining Systems


16. Disaster Planning

    Defining “Disaster”

    Dynamic Disaster Response Strategies

    Disaster Risk Analysis

    Setting Up an Incident Response Team

        Identify Team Members and Roles

        Establish a Team Charter

        Establish Severity and Priority Models

        Define Operating Parameters for Engaging the IR Team

        Develop Response Plans

        Create Detailed Playbooks

        Ensure Access and Update Mechanisms Are in Place

    Prestaging Systems and People Before an Incident

        Configuring Systems

        Training

        Processes and Procedures

    Testing Systems and Response Plans

        Auditing Automated Systems

        Conducting Nonintrusive Tabletops

        Testing Response in Production Environments

        Red Team Testing

        Evaluating Responses

    Google Examples

        Test with Global Impact

        DiRT Exercise Testing Emergency Access

        Industry-Wide Vulnerabilities

    Conclusion


17. Crisis Management

    Is It a Crisis or Not?

        Triaging the Incident

        Compromises Versus Bugs

    Taking Command of Your Incident

        The First Step: Don’t Panic!

        Beginning Your Response

        Establishing Your Incident Team

        Operational Security

        Trading Good OpSec for the Greater Good

        The Investigative Process

    Keeping Control of the Incident

        Parallelizing the Incident

        Handovers

        Morale

    Communications

        Misunderstandings

        Hedging

        Meetings

        Keeping the Right People Informed with the Right Levels of Detail

    Putting It All Together

        Triage

        Declaring an Incident

        Communications and Operational Security

        Beginning the Incident

        Handover

        Handing Back the Incident

        Preparing Communications and Remediation

        Closure

    Conclusion


18. Recovery and Aftermath

    Recovery Logistics

    Recovery Timeline

    Planning the Recovery

        Scoping the Recovery

        Recovery Considerations

        Recovery Checklists

    Initiating the Recovery

        Isolating Assets (Quarantine)

        System Rebuilds and Software Upgrades

        Data Sanitization

        Recovery Data

        Credential and Secret Rotation

    After the Recovery

        Postmortems

    Examples

        Compromised Cloud Instances

        Large-Scale Phishing Attack

        Targeted Attack Requiring Complex Recovery

    Conclusion


Part V. Organization and Culture


19. Case Study: Chrome Security Team

    Background and Team Evolution

    Security Is a Team Responsibility

    Help Users Safely Navigate the Web

    Speed Matters

    Design for Defense in Depth

    Be Transparent and Engage the Community

    Conclusion


20. Understanding Roles and Responsibilities

    Who Is Responsible for Security and Reliability?

        The Roles of Specialists

        Understanding Security Expertise

        Certifications and Academia

    Integrating Security into the Organization

        Embedding Security Specialists and Security Teams

        Example: Embedding Security at Google

        Special Teams: Blue and Red Teams

        External Researchers

    Conclusion


21. Building a Culture of Security and Reliability

    Defining a Healthy Security and Reliability Culture

        Culture of Security and Reliability by Default

        Culture of Review

        Culture of Awareness

        Culture of Yes

        Culture of Inevitably

        Culture of Sustainability

    Changing Culture Through Good Practice

        Align Project Goals and Participant Incentives

        Reduce Fear with Risk-Reduction Mechanisms

        Make Safety Nets the Norm

        Increase Productivity and Usability

        Overcommunicate and Be Transparent

        Build Empathy

    Convincing Leadership

        Understand the Decision-Making Process

        Build a Case for Change

        Pick Your Battles

        Escalations and Problem Resolution

    Conclusion


Appendix. A Disaster Risk Assessment Matrix

