CAPTCHA anti-cheat: ideas for detecting fake trajectories

Curve-fitting method

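When we generate a trajectory, we usually do it with a curve function, so a generated trajectory can be detected by analyzing its features. Generated trajectories typically have the following characteristics:

Smoothness: the trajectory is usually very smooth, with no visible jitter or irregularity.
Speed consistency: the speed changes gradually, with no sudden acceleration or deceleration.
Curvature change: the curvature changes gradually, with no sharp turns.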

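We can therefore apply a mathematical idea: if the verifier fits the trajectory on the intervals between its extremum points, differentiates the fit, and finds that the second or third derivative is 0, a Bezier-generated trajectory is exposed. A Bezier curve of degree n is a polynomial of degree n in its parameter, so every derivative above order n vanishes. The verifier can also keep raising the Bezier degree it tests for.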
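The derivative test can be sketched as a polynomial fit: find the lowest degree whose least-squares fit reproduces the sampled trajectory almost exactly. This is only an illustration under assumptions that are not in the original: the points are sampled at uniform parameter values, and the names `bezier`, `lowest_polynomial_degree`, `max_degree` and `tol` are hypothetical.

```python
import numpy as np

def bezier(control_points, n_samples=50):
    """Sample a Bezier curve at uniform t using De Casteljau's algorithm."""
    pts = np.asarray(control_points, dtype=float)
    samples = []
    for t in np.linspace(0.0, 1.0, n_samples):
        p = pts.copy()
        # Repeated linear interpolation between neighbouring control points
        while len(p) > 1:
            p = (1 - t) * p[:-1] + t * p[1:]
        samples.append(p[0])
    return np.array(samples)

def lowest_polynomial_degree(points, max_degree=5, tol=1e-6):
    """Return the smallest degree whose fit matches both coordinates
    within tol, or None if no degree up to max_degree does."""
    points = np.asarray(points, dtype=float)
    t = np.linspace(0.0, 1.0, len(points))
    for degree in range(1, max_degree + 1):
        worst = 0.0
        for coord in points.T:  # fit x(t) and y(t) separately
            coeffs = np.polyfit(t, coord, degree)
            residual = np.max(np.abs(np.polyval(coeffs, t) - coord))
            worst = max(worst, residual)
        if worst < tol:
            return degree
    return None

# A cubic Bezier curve, and the same curve with hand-like jitter added
curve = bezier([(0, 0), (30, 80), (70, -20), (100, 50)])
rng = np.random.default_rng(0)
noisy = curve + rng.normal(scale=0.5, size=curve.shape)

print("clean curve degree:", lowest_polynomial_degree(curve))
print("noisy curve degree:", lowest_polynomial_degree(noisy))
```

A trajectory that a low-degree polynomial reproduces almost exactly is suspicious, while a jittery one should fail the fit at every tested degree. Raising `max_degree` corresponds to the verifier testing for higher Bezier degrees.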

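import numpy as np
from scipy.signal import savgol_filter
import matplotlib.pyplot as plt

def smoothed_speed_profile(x, y, window_length=11, polyorder=2):
    # Speed per step: distance between consecutive points
    # (assumes the points are sampled at a fixed interval)
    speed = np.hypot(np.diff(x), np.diff(y))

    # savgol_filter rejects a window longer than the data, so shrink the
    # window for short trajectories and keep it odd
    window = min(window_length, len(speed))
    if window % 2 == 0:
        window -= 1
    if window <= polyorder:
        raise ValueError("trajectory is too short to smooth")

    # Smooth the speed with a Savitzky-Golay filter
    smoothed_speed = savgol_filter(speed, window_length=window, polyorder=polyorder)
    return speed, smoothed_speed

def detect_fake_mouse_trajectory(x, y):
    _, smoothed_speed = smoothed_speed_profile(x, y)

    # Rate of change of the smoothed speed
    speed_change = np.abs(np.diff(smoothed_speed))

    # Smoothness check
    smoothness_threshold = 0.1
    is_smooth = np.all(speed_change < smoothness_threshold)

    # Speed consistency check
    speed_consistency_threshold = 0.5
    is_consistent = np.std(smoothed_speed) < speed_consistency_threshold

    # True means the trajectory looks generated
    return bool(is_smooth and is_consistent)

# Example mouse trajectory
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([0, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5])

# Check whether the trajectory looks Bezier-generated
is_fake = detect_fake_mouse_trajectory(x, y)
print("Is fake mouse trajectory:", is_fake)

# Plot the raw and smoothed speed
speed, smoothed_speed = smoothed_speed_profile(x, y)

plt.plot(speed, label='Original Speed')
plt.plot(smoothed_speed, label='Smoothed Speed')
plt.legend()
plt.show()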

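With the code above, we can do a first-pass filter of fake trajectories using purely mathematical features:

Compute the speed: take the distance between adjacent points.
Smooth the speed: apply a Savitzky-Golay filter to reduce noise.
Compute the speed change rate: take the change in the smoothed speed to measure how smooth the trajectory is.
Check smoothness and speed consistency: compare both against thresholds.

The method above is only a simple example. Real applications will likely need more sophisticated algorithms and more features to detect fake mouse trajectories, and other features (such as curvature change and acceleration) can be combined to improve detection accuracy.

This approach gives a preliminary check of whether a mouse trajectory was generated with a Bezier curve.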
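The extra features suggested above (curvature change, acceleration) could be computed roughly as follows. This is a minimal sketch that assumes a fixed sampling interval `dt`. The function name `trajectory_features` and the definition of `jitter_frequency` as the fraction of steps where the acceleration changes sign are illustrative choices, not something the original specifies.

```python
import numpy as np

def trajectory_features(x, y, dt=1.0):
    """Summarise one trajectory as a dict of scalar features."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # First and second derivatives by finite differences
    vx, vy = np.gradient(x, dt), np.gradient(y, dt)
    ax, ay = np.gradient(vx, dt), np.gradient(vy, dt)

    speed = np.hypot(vx, vy)
    accel = np.gradient(speed, dt)  # tangential acceleration

    # Signed curvature: (x'y'' - y'x'') / |v|^3, guarded against zero speed
    curvature = (vx * ay - vy * ax) / np.maximum(speed, 1e-9) ** 3

    # Assumed jitter measure: share of steps where acceleration flips sign
    sign_flips = np.diff(np.sign(accel)) != 0

    return {
        "avg_speed": float(speed.mean()),
        "speed_variation": float(speed.std()),
        "avg_acceleration": float(np.abs(accel).mean()),
        "acceleration_variation": float(accel.std()),
        "curvature_variation": float(curvature.std()),
        "jitter_frequency": float(sign_flips.mean()),
    }

# A perfectly straight, constant-speed line: every variation feature is 0
line_x = np.arange(20.0)
line_y = 0.5 * line_x
print(trajectory_features(line_x, line_y))
```

Each feature can be thresholded on its own, as in the speed check earlier, or the whole dict can be passed to a classifier.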

AI method

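After this quick first-pass check, we can take a large amount of trajectory data and apply traditional machine learning, for example training a random forest classifier with the scikit-learn library.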
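A random forest version might look like the sketch below. The data is synthetic and exists only to make the example runnable: the two feature columns, their distributions, and the helper name `train_random_forest` are assumptions, and a real system would train on labeled trajectories instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def train_random_forest(X, y, seed=42):
    """Fit a random forest and return it with its held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    return clf, accuracy_score(y_test, clf.predict(X_test))

# Synthetic placeholder features: [speed_variation, jitter_frequency].
# Assumption for illustration: humans (label 1) show more variation
# and jitter than simulated trajectories (label 0).
rng = np.random.default_rng(0)
n = 300
human = np.column_stack([rng.normal(0.6, 0.2, n), rng.normal(0.5, 0.2, n)])
bot = np.column_stack([rng.normal(0.1, 0.05, n), rng.normal(0.05, 0.03, n)])
X = np.vstack([human, bot])
y = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])

clf, acc = train_random_forest(X, y)
print(f"Held-out accuracy: {acc:.3f}")
print(dict(zip(["speed_variation", "jitter_frequency"], clf.feature_importances_)))
```

The `feature_importances_` attribute gives a rough view of which features the forest relies on, which helps when deciding which features are worth collecting.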

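Alternatively, we can use the deep learning methods that are now common, such as an LSTM (long short-term memory network), to process the time-series data.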

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense

# Example data
# Assume a DataFrame of mouse trajectory data where each row is one
# trajectory, holding features and a label (1 = human, 0 = simulated).
# The values here are random placeholders, so the accuracy is not meaningful.
data = pd.DataFrame({
    'avg_speed': np.random.rand(100),
    'speed_variation': np.random.rand(100),
    'avg_acceleration': np.random.rand(100),
    'acceleration_variation': np.random.rand(100),
    'curvature_variation': np.random.rand(100),
    'jitter_frequency': np.random.rand(100),
    'label': np.random.randint(2, size=100)
})

# Separate features and labels
X = data.drop('label', axis=1).values
y = data['label'].values

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Reshape to the LSTM input layout: (samples, timesteps=1, features)
X_train = X_train.reshape((X_train.shape[0], 1, X_train.shape[1]))
X_test = X_test.reshape((X_test.shape[0], 1, X_test.shape[1]))

# Build the LSTM model
model = Sequential()
model.add(Input(shape=(X_train.shape[1], X_train.shape[2])))
model.add(LSTM(50))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model (the test split doubles as validation data here)
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))

# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print(f'Accuracy: {accuracy}')

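Compared with hand-written mathematical features, machine learning or deep learning can detect more effectively whether a mouse trajectory was produced by a real person, and this is the common practice among anti-crawler vendors today. The key is to collect enough data, extract useful features, and choose a suitable model to train and evaluate.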
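The LSTM example feeds each trajectory as a single timestep of aggregate features. To let the network see the trajectory as an actual time series, one option is to resample every trajectory to a fixed number of steps. The sketch below assumes each trajectory comes as `(x, y, t)` arrays with strictly increasing timestamps. The names `to_sequence`, `build_lstm_input` and `n_steps` are hypothetical.

```python
import numpy as np

def to_sequence(x, y, t, n_steps=32):
    """Resample one trajectory into n_steps rows of (dx, dy, dt)."""
    x, y, t = (np.asarray(a, dtype=float) for a in (x, y, t))
    # Uniform time grid so that every trajectory ends up the same length
    grid = np.linspace(t[0], t[-1], n_steps + 1)
    xs = np.interp(grid, t, x)  # requires t to be increasing
    ys = np.interp(grid, t, y)
    return np.column_stack([np.diff(xs), np.diff(ys), np.diff(grid)])

def build_lstm_input(trajectories, n_steps=32):
    """Stack trajectories into an array of shape (samples, n_steps, 3)."""
    return np.stack([to_sequence(*traj, n_steps=n_steps) for traj in trajectories])

# Two trajectories of different lengths
traj_a = (np.arange(10.0), np.arange(10.0) ** 2, np.linspace(0.0, 0.9, 10))
traj_b = (np.linspace(0, 50, 75), np.sin(np.linspace(0, 3, 75)), np.linspace(0.0, 1.2, 75))

batch = build_lstm_input([traj_a, traj_b])
print(batch.shape)
```

An array shaped like this can be passed to the model in place of the reshaped feature table, with the input shape changed to `(n_steps, 3)`.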