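# Vulkan Device-Lost Recovery

## Problem

When the system wakes from sleep or hibernation, Vulkan rendering fails with the following error:

```
Failed to submit draw command buffer! Error code = -4
```

Error code `-4` is `VK_ERROR_DEVICE_LOST`, which means the Vulkan logical device has been lost. The sequence of events is:

1. While the system sleeps, the GPU driver is suspended or reset.
2. On wake, the physical GPU is reinitialized.
3. The previously created Vulkan logical device and all resources created from it become invalid.
4. Subsequent calls on that device that can report the condition, such as queue submission, return `VK_ERROR_DEVICE_LOST`.
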
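A bare `-4` in a log line is easy to misread, so it can help to translate result codes into their names before logging. The following is a minimal sketch under stated assumptions: `Result` and `resultName` are hypothetical names that do not exist in the project, and the enum is a stand-in so the example compiles without the Vulkan SDK. The numeric values are the ones defined for `VkResult` in the Vulkan headers.

```cpp
#include <cstdint>
#include <string>

// Stand-in for a few VkResult values (assumption: real code would include
// <vulkan/vulkan.h> and use VkResult directly). Values match the headers.
enum class Result : std::int32_t {
    Success = 0,                    // VK_SUCCESS
    ErrorOutOfHostMemory = -1,      // VK_ERROR_OUT_OF_HOST_MEMORY
    ErrorOutOfDeviceMemory = -2,    // VK_ERROR_OUT_OF_DEVICE_MEMORY
    ErrorInitializationFailed = -3, // VK_ERROR_INITIALIZATION_FAILED
    ErrorDeviceLost = -4,           // VK_ERROR_DEVICE_LOST
    ErrorSurfaceLost = -1000000000, // VK_ERROR_SURFACE_LOST_KHR
    ErrorOutOfDate = -1000001004,   // VK_ERROR_OUT_OF_DATE_KHR
};

// Hypothetical helper: map a raw result code to a readable name for logging.
std::string resultName(std::int32_t code)
{
    switch (static_cast<Result>(code)) {
    case Result::Success:                   return "VK_SUCCESS";
    case Result::ErrorOutOfHostMemory:      return "VK_ERROR_OUT_OF_HOST_MEMORY";
    case Result::ErrorOutOfDeviceMemory:    return "VK_ERROR_OUT_OF_DEVICE_MEMORY";
    case Result::ErrorInitializationFailed: return "VK_ERROR_INITIALIZATION_FAILED";
    case Result::ErrorDeviceLost:           return "VK_ERROR_DEVICE_LOST";
    case Result::ErrorSurfaceLost:          return "VK_ERROR_SURFACE_LOST_KHR";
    case Result::ErrorOutOfDate:            return "VK_ERROR_OUT_OF_DATE_KHR";
    }
    // Codes not listed above fall through to a generic description.
    return "unknown VkResult (" + std::to_string(code) + ")";
}
```

With such a helper, the failing submit could be logged as `VK_ERROR_DEVICE_LOST` instead of `-4`.
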
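## Solution

The fix is a complete device-lost detection and recovery mechanism, built from the following steps.

### 1. Add a device-lost state flag

A boolean member, `m_deviceLost`, was added to the `VulkanWidget` class to track the device state:

```cpp
bool m_deviceLost; // true while the device is lost (e.g. after waking from sleep)
```
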
### 2. Detect device loss at the key Vulkan call sites

Checks were added at the three call sites that can return `VK_ERROR_DEVICE_LOST`:

#### a) `vkAcquireNextImageKHR` - acquire a swapchain image
```cpp
VkResult result = vkAcquireNextImageKHR(...);
// VK_ERROR_DEVICE_LOST has the numeric value -4, so one comparison is enough.
if (result == VK_ERROR_DEVICE_LOST) {
    qDebug() << "VK_ERROR_DEVICE_LOST detected in vkAcquireNextImageKHR!";
    m_deviceLost = true;
    if (handleDeviceLost()) {
        qDebug() << "Device recovery successful";
    } else {
        qDebug() << "Device recovery failed!";
        setError("Failed to recover from device lost error");
    }
    return; // skip this frame either way
}
```

#### b) `vkQueueSubmit` - submit the command buffer
```cpp
result = vkQueueSubmit(m_queue, 1, &submitInfo, m_inFlightFences[m_currentFrame]);
if (result != VK_SUCCESS) {
    if (result == VK_ERROR_DEVICE_LOST) {
        qDebug() << "VK_ERROR_DEVICE_LOST detected! Attempting to recover device...";
        m_deviceLost = true;
        if (handleDeviceLost()) {
            qDebug() << "Device recovery successful";
        } else {
            qDebug() << "Device recovery failed!";
            setError("Failed to recover from device lost error");
        }
    }
    return; // any failed submit ends this frame
}
```

#### c) `vkQueuePresentKHR` - present the image
```cpp
result = vkQueuePresentKHR(m_queue, &presentInfo);
if (result == VK_ERROR_DEVICE_LOST) {
    qDebug() << "VK_ERROR_DEVICE_LOST detected in vkQueuePresentKHR!";
    m_deviceLost = true;
    if (handleDeviceLost()) {
        qDebug() << "Device recovery successful";
    } else {
        qDebug() << "Device recovery failed!";
        setError("Failed to recover from device lost error");
    }
}
```

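The three checks above repeat the same detect, flag, recover, and report logic. One way to avoid the duplication is to route every result code through a single helper. The sketch below assumes this design rather than describing the project's code: `DeviceLostGuard`, `Outcome`, and the recovery callback are hypothetical names, and plain `int` constants stand in for `VkResult` so the example compiles without the Vulkan SDK or Qt.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

constexpr int kDeviceLost = -4; // numeric value of VK_ERROR_DEVICE_LOST

enum class Outcome { NotLost, Recovered, RecoveryFailed };

// Hypothetical helper that centralizes device-lost handling for all call sites.
class DeviceLostGuard {
public:
    // `recover` stands in for a function such as handleDeviceLost().
    explicit DeviceLostGuard(std::function<bool()> recover)
        : m_recover(std::move(recover)) {}

    // Inspect one result code. Codes other than device-lost are left for the
    // caller to handle (for example out-of-date swapchains).
    Outcome check(int result, const std::string& where)
    {
        if (result != kDeviceLost) {
            return Outcome::NotLost;
        }
        m_deviceLost = true;
        m_log.push_back("VK_ERROR_DEVICE_LOST detected in " + where);
        if (m_recover()) {
            m_deviceLost = false;
            return Outcome::Recovered;
        }
        return Outcome::RecoveryFailed;
    }

    bool deviceLost() const { return m_deviceLost; }
    const std::vector<std::string>& log() const { return m_log; }

private:
    std::function<bool()> m_recover;
    bool m_deviceLost = false;
    std::vector<std::string> m_log;
};
```

A call site would then shrink to something like `if (guard.check(result, "vkQueueSubmit") != Outcome::NotLost) return;`, with the error reporting done once inside the recovery callback.
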
### 3. Implement the recovery functions

#### `handleDeviceLost()` - device-lost handler

```cpp
bool VulkanWidget::handleDeviceLost()
{
    qDebug() << "=== Handling device lost error ===";

    // 1. Stop the render timer so no frame is rendered during recovery.
    if (m_renderTimer && m_renderTimer->isActive()) {
        m_renderTimer->stop();
    }

    // 2. Wait for the device to become idle. This may fail on a lost device;
    //    the result is deliberately ignored and recovery continues.
    if (m_device != VK_NULL_HANDLE) {
        vkDeviceWaitIdle(m_device);
    }

    // 3. Recreate the device and all dependent resources.
    bool success = recreateDevice();

    if (success) {
        m_deviceLost = false;

        // 4. Restart the timer if rendering is enabled.
        if (m_renderingEnabled && m_renderTimer && !m_renderTimer->isActive()) {
            m_renderTimer->start(16); // ~60 FPS
        }
    }

    return success;
}
```

#### `recreateDevice()` - device rebuild

The full sequence tears resources down in the reverse of their creation order, then rebuilds them:

```cpp
bool VulkanWidget::recreateDevice()
{
    // 1. Destroy the VulkanRenderer.
    if (m_renderer) {
        delete m_renderer;
        m_renderer = nullptr;
    }

    // 2. Destroy synchronization objects (semaphores and fences).
    for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
        vkDestroySemaphore(m_device, m_renderFinishedSemaphores[i], nullptr);
        vkDestroySemaphore(m_device, m_imageAvailableSemaphores[i], nullptr);
        vkDestroyFence(m_device, m_inFlightFences[i], nullptr);
    }

    // 3. Destroy command objects (command buffers and command pool).
    vkFreeCommandBuffers(m_device, m_commandPool, ...);
    vkDestroyCommandPool(m_device, m_commandPool, nullptr);

    // 4. Destroy the swapchain and its related resources.
    cleanupSwapchain();

    // 5. Destroy the logical device. Reset the handle so a failed rebuild
    //    cannot leave a dangling device for a later recovery attempt.
    vkDestroyDevice(m_device, nullptr);
    m_device = VK_NULL_HANDLE;

    // 6. Destroy the surface.
    vkDestroySurfaceKHR(m_instance, m_surface, nullptr);
    m_surface = VK_NULL_HANDLE;

    // === Rebuild phase ===

    // 7. Recreate the surface.
    if (!createSurface()) return false;

    // 8. Recreate the logical device.
    if (!createDevice()) return false;

    // 9. Recreate the swapchain.
    if (!createSwapchain()) return false;

    // 10. Recreate command objects.
    if (!createCommandObjects()) return false;

    // 11. Recreate synchronization objects.
    if (!createSyncObjects()) return false;

    // 12. Recreate the VulkanRenderer.
    m_renderer = new VulkanRenderer();
    if (!m_renderer->initialize(...)) {
        delete m_renderer;
        m_renderer = nullptr;
        return false;
    }

    // 13. Reset the frame counter.
    m_currentFrame = 0;

    return true;
}
```

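The teardown order in `recreateDevice()` is the exact reverse of the creation order (surface, device, swapchain, command objects, sync objects, renderer). Rather than maintaining the two orders by hand, one option is to register a cleanup callback each time a resource is created and unwind them last-in, first-out. The sketch below only illustrates that idea: `TeardownStack` is a hypothetical class, and the callbacks stand in for the real `vkDestroy*` calls.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: remembers cleanup callbacks in creation order and
// runs them in reverse, so dependents are destroyed before their parents.
class TeardownStack {
public:
    // Register the cleanup for a resource right after it has been created.
    void push(std::string name, std::function<void()> destroy)
    {
        m_entries.push_back({std::move(name), std::move(destroy)});
    }

    // Run all cleanups in reverse creation order. Returns the resource names
    // in the order they were destroyed, which is handy for logging.
    std::vector<std::string> unwind()
    {
        std::vector<std::string> order;
        while (!m_entries.empty()) {
            Entry entry = std::move(m_entries.back());
            m_entries.pop_back();
            entry.destroy();
            order.push_back(std::move(entry.name));
        }
        return order;
    }

private:
    struct Entry {
        std::string name;
        std::function<void()> destroy;
    };
    std::vector<Entry> m_entries;
};
```

Because the stack is empty after `unwind()`, a second call is harmless, which also protects against destroying the same handle twice if recovery is attempted again after a failed rebuild.
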
### 4. Check proactively when rendering is enabled

`setRenderingEnabled(true)` checks whether the device has been lost before it restarts rendering:

```cpp
void VulkanWidget::setRenderingEnabled(bool enabled)
{
    m_renderingEnabled = enabled; // assumed: the original excerpt omits this assignment

    if (m_renderingEnabled) {
        // Check whether the device was lost (e.g. after waking from sleep).
        if (m_deviceLost) {
            qDebug() << "Device lost detected on resume, attempting recovery...";
            if (!handleDeviceLost()) {
                qDebug() << "Failed to recover device, rendering cannot resume";
                m_renderingEnabled = false;
                return;
            }
        }

        // Start the render timer.
        if (!m_renderTimer->isActive()) {
            m_renderTimer->start(16);
        }
    }
}
```

### 5. Guard check before rendering

`renderFrame()` starts with a guard check:

```cpp
void VulkanWidget::renderFrame()
{
    // Do not render while the device is lost.
    if (m_deviceLost) {
        return;
    }

    // Normal rendering path...
}
```

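## Workflows

### Normal flow
1. The user locks the screen → animations stop and one lock-screen frame is rendered.
2. The system goes to sleep or hibernates.
3. The system wakes → the `aboutToWakeUp` signal fires.
4. `setRenderingEnabled(true)` is called.
5. `m_deviceLost == false`, so rendering resumes normally.

### Device-lost recovery flow
1. The user locks the screen → animations stop.
2. The system goes to sleep or hibernates → **the GPU driver resets and the device is lost**.
3. The system wakes → the `aboutToWakeUp` signal fires.
4. `setRenderingEnabled(true)` is called.
5. `m_deviceLost == true` is detected (or the loss is detected on the first rendered frame).
6. `handleDeviceLost()` is called.
7. `recreateDevice()` rebuilds all Vulkan resources.
8. `m_deviceLost` is set to `false`.
9. Normal rendering resumes.

### Device lost while rendering
1. Rendering is in progress.
2. `vkQueueSubmit` returns `VK_ERROR_DEVICE_LOST`.
3. The error is detected and `m_deviceLost` is set to `true`.
4. `handleDeviceLost()` performs the recovery.
5. The next frame renders normally.
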
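The flows above amount to a small state machine around `m_deviceLost`. A simplified simulation makes the transitions easy to check without a GPU. This is only a sketch under stated assumptions: `FakeWidget` and its members are hypothetical, scripted integers replace real `vkQueueSubmit` results, and recovery is reduced to a flag.

```cpp
#include <cstddef>
#include <vector>

constexpr int kSuccess = 0;     // numeric value of VK_SUCCESS
constexpr int kDeviceLost = -4; // numeric value of VK_ERROR_DEVICE_LOST

// Hypothetical stand-in for VulkanWidget that models only the lost/recover logic.
struct FakeWidget {
    std::vector<int> submitResults; // scripted results for successive submits
    std::size_t nextSubmit = 0;
    bool recoverySucceeds = true;   // outcome of the simulated recreateDevice()
    bool deviceLost = false;
    int framesRendered = 0;
    int recoveries = 0;

    // Returns the next scripted result, or success once the script runs out.
    int submit()
    {
        return nextSubmit < submitResults.size() ? submitResults[nextSubmit++]
                                                 : kSuccess;
    }

    bool handleDeviceLost()
    {
        if (!recoverySucceeds) {
            return false; // flag stays set, so later frames are skipped
        }
        deviceLost = false;
        ++recoveries;
        return true;
    }

    void renderFrame()
    {
        if (deviceLost) {
            return; // guard check: never touch a lost device
        }
        const int result = submit();
        if (result == kDeviceLost) {
            deviceLost = true;
            handleDeviceLost();
            return; // the failing frame is dropped
        }
        if (result == kSuccess) {
            ++framesRendered;
        }
    }
};
```

Scripting the results `{0, -4, 0}` and calling `renderFrame()` three times renders two frames with one recovery in between, matching the "device lost while rendering" flow. With `recoverySucceeds` set to `false`, the flag stays set and every later frame is skipped.
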
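## Key points

### Why destroy the Surface?

In the Vulkan specification a surface is an instance-level object and does not depend directly on the logical device. On some platforms (Windows in particular), however:
- A GPU driver reset can affect the surface's underlying connection to the window system.
- The old surface may be incompatible with the new device.
- Recreating the surface guarantees full compatibility with the new device.

### Which objects are kept and which are recreated?

- `VkInstance`: a loader-level object that is not affected by device loss, so it is kept.
- `VkPhysicalDevice`: a handle that represents the GPU hardware. It is normally unaffected by the loss of a logical device, so it is kept. The specification notes that the physical device may also be lost in some cases; creating a new logical device then fails with `VK_ERROR_DEVICE_LOST`, which surfaces here as a failed recovery.
- `VkDevice`: the logical device, which **must** be recreated after device loss.
- `VkSurfaceKHR`: could be kept in theory, but is recreated for cross-platform compatibility.

### Synchronization

Calling `vkDeviceWaitIdle()` before cleanup is intended to ensure that:
- all submitted commands have completed,
- no resource is still in use by the GPU,
- destruction does not trigger validation-layer errors.

Cleanup continues even if `vkDeviceWaitIdle()` fails because the device is lost, because:
- the device is lost, so all commands have already stopped,
- resource objects must still be released properly to avoid memory leaks.
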
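## Test scenarios

1. **Normal sleep/wake**
   - Lock the screen → sleep → wake → unlock.
   - Rendering should resume normally with no errors.

2. **Long sleep**
   - Sleep for several hours or overnight.
   - The GPU driver is more likely to be fully reset.
   - Device loss should be detected and recovered from successfully.

3. **Repeated sleep/wake cycles**
   - Sleep and wake several times in a row.
   - Recovery should succeed every time.
   - No memory leaks.

4. **Device lost mid-render**
   - Trigger device loss while rendering is active.
   - The error should be caught and recovered from.
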
## Log output

Example log from a successful recovery:

```
📤 系统已从睡眠中唤醒
MainWindow: Screen unlocked event received
Vulkan rendering ENABLED - Resuming animations
Device lost detected on resume, attempting recovery...
=== Handling device lost error ===
Render timer stopped for device recovery
=== Recreating Vulkan device ===
Renderer cleaned up
Sync objects cleaned up
Command objects cleaned up
Logical device destroyed
Surface destroyed
Surface recreated
Logical device recreated
Swapchain recreated
Command objects recreated
Sync objects recreated
VulkanRenderer recreated successfully!
=== Device recreation complete ===
Device recovery successful!
Render timer restarted after recovery
```

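## Notes

1. **Frame counter reset**: after recovery, `m_currentFrame` is reset to 0 so that the synchronization objects are indexed correctly.

2. **Renderer state**: the VulkanRenderer is rebuilt from scratch, so all of its internal state is reset.

3. **User experience**: recovery usually completes within 100-500 ms; the user may notice a brief black screen or pause.

4. **Error handling**: if recovery fails, rendering is disabled to prevent a crash, and the rest of the application remains usable.

5. **Cross-platform**: this approach should work on Windows, Linux, and macOS, but the exact behavior may vary by driver.
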
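The 100-500 ms figure in note 3 can be checked on a given machine by timing the recovery call. The sketch below is one possible way to do that, not code from the project: `timeRecovery` and `RecoveryTiming` are hypothetical names, and the callback stands in for a call to `handleDeviceLost()`.

```cpp
#include <chrono>
#include <functional>

// Result of one timed recovery attempt.
struct RecoveryTiming {
    bool success;
    std::chrono::milliseconds elapsed;
};

// Hypothetical helper: run a recovery function and measure how long it took.
// steady_clock is used because it is not affected by wall-clock adjustments,
// which can happen right after a system wakes up.
RecoveryTiming timeRecovery(const std::function<bool()>& recover)
{
    const auto start = std::chrono::steady_clock::now();
    const bool ok = recover();
    const auto end = std::chrono::steady_clock::now();
    return {ok, std::chrono::duration_cast<std::chrono::milliseconds>(end - start)};
}
```

Logging `elapsed.count()` next to the existing "Device recovery successful" message would show whether a particular driver stays within the expected range.
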
## Related files

- `src/vulkanwidget.h` - adds the `m_deviceLost` flag and declares the recovery functions
- `src/vulkanwidget.cpp` - implements the device-lost detection and recovery logic
- `src/powermonitor.cpp` - emits the `aboutToWakeUp` signal
- `src/mainwindow.cpp` - connects the signal and calls `setRenderingEnabled(true)`