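# Vulkan Device-Lost Recovery Mechanism

## Problem Description

When the system wakes from sleep or hibernation, Vulkan rendering fails with the following error:

```
Failed to submit draw command buffer! Error code = -4
```

Error code `-4` is `VK_ERROR_DEVICE_LOST`, which means the Vulkan logical device has been lost. This happens because:

1. The GPU driver is suspended or reset when the system goes to sleep.
2. The physical GPU is reinitialized on wake-up.
3. The previously created Vulkan logical device and its resources become invalid.
4. From then on, Vulkan commands on that device that can report errors (queue submission, presentation, fence waits, and so on) return `VK_ERROR_DEVICE_LOST`.
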
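The raw `-4` in the log is easier to diagnose when the message carries the symbolic name. The following is a minimal sketch, assuming a hypothetical helper `resultName` that is not part of the original code. The numeric values mirror the `VkResult` values in the Vulkan headers and are spelled out here only so the snippet builds without the SDK; real code would switch on the `VkResult` enumerators directly.

```cpp
#include <string>

// Hypothetical helper (not in the original code): maps a raw VkResult value
// to its symbolic name for log messages. The numeric values mirror the
// Vulkan headers; real code would switch on the VkResult enumerators.
inline std::string resultName(int code)
{
    switch (code) {
    case 0:           return "VK_SUCCESS";
    case -1:          return "VK_ERROR_OUT_OF_HOST_MEMORY";
    case -2:          return "VK_ERROR_OUT_OF_DEVICE_MEMORY";
    case -3:          return "VK_ERROR_INITIALIZATION_FAILED";
    case -4:          return "VK_ERROR_DEVICE_LOST";
    case -1000000000: return "VK_ERROR_SURFACE_LOST_KHR";
    case -1000001004: return "VK_ERROR_OUT_OF_DATE_KHR";
    case 1000001003:  return "VK_SUBOPTIMAL_KHR";
    default:          return "VkResult(" + std::to_string(code) + ")";
    }
}
```

With such a helper, the failing submit could log `VK_ERROR_DEVICE_LOST` next to the numeric code.
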
## Solution

The fix is a complete device-lost detection and recovery mechanism, built from the following steps.

### 1. Add a device-lost status flag

A boolean flag, `m_deviceLost`, was added to the `VulkanWidget` class to track the device state:

```cpp
bool m_deviceLost; // True while the device is lost (e.g. after waking from sleep)
```

### 2. Detect device loss at the key Vulkan call sites

Checks were added at three key call sites that can return `VK_ERROR_DEVICE_LOST`:

#### a) `vkAcquireNextImageKHR` - acquiring a swapchain image

```cpp
VkResult result = vkAcquireNextImageKHR(...);
// VK_ERROR_DEVICE_LOST is defined as -4, so no separate numeric comparison is needed.
if (result == VK_ERROR_DEVICE_LOST) {
    qDebug() << "VK_ERROR_DEVICE_LOST detected in vkAcquireNextImageKHR!";
    m_deviceLost = true;
    if (handleDeviceLost()) {
        qDebug() << "Device recovery successful";
    } else {
        qDebug() << "Device recovery failed!";
        setError("Failed to recover from device lost error");
    }
    return; // Abandon this frame
}
```

#### b) `vkQueueSubmit` - submitting the command buffer

```cpp
result = vkQueueSubmit(m_queue, 1, &submitInfo, m_inFlightFences[m_currentFrame]);
if (result != VK_SUCCESS) {
    // VK_ERROR_DEVICE_LOST is defined as -4, so no separate numeric comparison is needed.
    if (result == VK_ERROR_DEVICE_LOST) {
        qDebug() << "VK_ERROR_DEVICE_LOST detected! Attempting to recover device...";
        m_deviceLost = true;
        if (handleDeviceLost()) {
            qDebug() << "Device recovery successful";
        } else {
            qDebug() << "Device recovery failed!";
            setError("Failed to recover from device lost error");
        }
    }
    return; // Any failed submit abandons this frame
}
```

#### c) `vkQueuePresentKHR` - presenting the image

```cpp
result = vkQueuePresentKHR(m_queue, &presentInfo);
// VK_ERROR_DEVICE_LOST is defined as -4, so no separate numeric comparison is needed.
if (result == VK_ERROR_DEVICE_LOST) {
    qDebug() << "VK_ERROR_DEVICE_LOST detected in vkQueuePresentKHR!";
    m_deviceLost = true;
    if (handleDeviceLost()) {
        qDebug() << "Device recovery successful";
    } else {
        qDebug() << "Device recovery failed!";
        setError("Failed to recover from device lost error");
    }
}
```

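The three checks above repeat the same recovery block. One way to keep them consistent is a single helper that each call site invokes. The sketch below is hypothetical: `DeviceLossGuard`, `check`, and the `recover` callback are illustrative names, and `kDeviceLost` stands in for `VK_ERROR_DEVICE_LOST` so the snippet builds without Vulkan or Qt.

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <utility>

// Stand-in for VK_ERROR_DEVICE_LOST (same numeric value as in the Vulkan headers).
constexpr int kDeviceLost = -4;

// Hypothetical helper that centralizes the check done after each Vulkan call.
class DeviceLossGuard {
public:
    explicit DeviceLossGuard(std::function<bool()> recover)
        : m_recover(std::move(recover)) {}

    // Returns true when the caller should abandon the current frame.
    bool check(int result, const std::string &where)
    {
        if (result != kDeviceLost)
            return result < 0; // negative VkResult values are errors
        std::cerr << "VK_ERROR_DEVICE_LOST detected in " << where << '\n';
        m_deviceLost = true;
        if (m_recover()) {
            m_deviceLost = false; // recovery succeeded
        } else {
            m_lastError = "Failed to recover from device lost error";
        }
        return true;
    }

    bool deviceLost() const { return m_deviceLost; }
    const std::string &lastError() const { return m_lastError; }

private:
    std::function<bool()> m_recover; // e.g. a lambda calling handleDeviceLost()
    bool m_deviceLost = false;
    std::string m_lastError;
};
```

Each call site then reduces to something like `if (guard.check(result, "vkQueueSubmit")) return;`. Codes such as `VK_ERROR_OUT_OF_DATE_KHR` would still need their own swapchain handling before this check.
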
### 3. Implement the device recovery functions

#### `handleDeviceLost()` - device-lost handler

```cpp
bool VulkanWidget::handleDeviceLost()
{
    qDebug() << "=== Handling device lost error ===";

    // 1. Stop the render timer so no frames are rendered during recovery
    if (m_renderTimer && m_renderTimer->isActive()) {
        m_renderTimer->stop();
    }

    // 2. Wait for the device to go idle (this may fail on a lost device,
    //    so the return value is deliberately ignored)
    if (m_device != VK_NULL_HANDLE) {
        vkDeviceWaitIdle(m_device);
    }

    // 3. Recreate the device and all of its resources
    bool success = recreateDevice();

    if (success) {
        m_deviceLost = false;

        // 4. If rendering is enabled, restart the timer
        if (m_renderingEnabled && m_renderTimer && !m_renderTimer->isActive()) {
            m_renderTimer->start(16); // ~60 FPS
        }
    }

    return success;
}
```

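Because `handleDeviceLost()` can be reached from several call sites, it may be worth guarding against re-entry while a recovery is already running. The following is a minimal sketch, assuming a hypothetical `RecoveryScope` RAII helper and a boolean member such as `m_recovering`, neither of which exists in the original class.

```cpp
// Hypothetical RAII guard (not in the original code): marks a recovery as
// in progress for the lifetime of the object and refuses nested entries.
class RecoveryScope {
public:
    explicit RecoveryScope(bool &inProgress)
        : m_flag(inProgress), m_owner(!inProgress)
    {
        if (m_owner)
            m_flag = true;  // this is the outermost recovery attempt
    }
    ~RecoveryScope()
    {
        if (m_owner)
            m_flag = false; // only the owner clears the flag
    }
    RecoveryScope(const RecoveryScope &) = delete;
    RecoveryScope &operator=(const RecoveryScope &) = delete;

    // False when another recovery was already running at construction time.
    bool entered() const { return m_owner; }

private:
    bool &m_flag;
    bool m_owner;
};
```

Inside `handleDeviceLost()` this would read `RecoveryScope scope(m_recovering); if (!scope.entered()) return false;`. The sketch assumes all calls happen on the GUI thread; it is not a thread-safety mechanism.
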
#### `recreateDevice()` - device rebuild function

The complete resource rebuild sequence:

```cpp
bool VulkanWidget::recreateDevice()
{
    // 1. Tear down the VulkanRenderer
    if (m_renderer) {
        delete m_renderer;
        m_renderer = nullptr;
    }

    // 2. Destroy the synchronization objects (semaphores and fences)
    for (size_t i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
        vkDestroySemaphore(m_device, m_renderFinishedSemaphores[i], nullptr);
        vkDestroySemaphore(m_device, m_imageAvailableSemaphores[i], nullptr);
        vkDestroyFence(m_device, m_inFlightFences[i], nullptr);
    }

    // 3. Destroy the command objects (command pool and command buffers)
    vkFreeCommandBuffers(m_device, m_commandPool, ...);
    vkDestroyCommandPool(m_device, m_commandPool, nullptr);

    // 4. Destroy the swapchain and its related resources
    cleanupSwapchain();

    // 5. Destroy the logical device
    vkDestroyDevice(m_device, nullptr);
    m_device = VK_NULL_HANDLE; // so later checks never touch the destroyed handle

    // 6. Destroy the surface
    vkDestroySurfaceKHR(m_instance, m_surface, nullptr);
    m_surface = VK_NULL_HANDLE;

    // === Rebuild phase ===

    // 7. Recreate the surface
    if (!createSurface()) return false;

    // 8. Recreate the logical device
    if (!createDevice()) return false;

    // 9. Recreate the swapchain
    if (!createSwapchain()) return false;

    // 10. Recreate the command objects
    if (!createCommandObjects()) return false;

    // 11. Recreate the synchronization objects
    if (!createSyncObjects()) return false;

    // 12. Recreate the VulkanRenderer
    m_renderer = new VulkanRenderer();
    if (!m_renderer->initialize(...)) {
        delete m_renderer;
        m_renderer = nullptr;
        return false;
    }

    // 13. Reset the frame counter
    m_currentFrame = 0;

    return true;
}
```

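In `recreateDevice()`, an early `return false` during the rebuild phase leaves the objects created by earlier steps alive. One way to handle that is to pair each creation step with its cleanup and unwind in reverse order on failure. The sketch below illustrates the idea under that assumption; `Stage` and `runStages` are hypothetical names, and the callbacks stand in for calls such as `createSurface()` or `cleanupSwapchain()`.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one rebuild step and how to undo it.
struct Stage {
    std::string name;
    std::function<bool()> create;  // returns false on failure
    std::function<void()> destroy; // undoes a successful create
};

// Runs the stages in order. If one fails, the stages that already succeeded
// are destroyed in reverse order and the function returns false.
inline bool runStages(const std::vector<Stage> &stages)
{
    for (std::size_t i = 0; i < stages.size(); ++i) {
        if (stages[i].create())
            continue;
        for (std::size_t j = i; j-- > 0;)
            stages[j].destroy();
        return false;
    }
    return true;
}
```

The stage list would follow the rebuild order used above: surface, logical device, swapchain, command objects, synchronization objects, renderer.
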
### 4. Check proactively when rendering is enabled

`setRenderingEnabled(true)` proactively checks whether the device has been lost:

```cpp
void VulkanWidget::setRenderingEnabled(bool enabled)
{
    m_renderingEnabled = enabled; // record the requested state

    if (m_renderingEnabled) {
        // Check whether the device was lost (e.g. after waking from sleep)
        if (m_deviceLost) {
            qDebug() << "Device lost detected on resume, attempting recovery...";
            if (!handleDeviceLost()) {
                qDebug() << "Failed to recover device, rendering cannot resume";
                m_renderingEnabled = false;
                return;
            }
        }

        // Start the render timer
        if (!m_renderTimer->isActive()) {
            m_renderTimer->start(16);
        }
    }
}
```

### 5. Guard check before rendering

A guard was added at the start of `renderFrame()`:

```cpp
void VulkanWidget::renderFrame()
{
    // Do not render while the device is lost
    if (m_deviceLost) {
        return;
    }

    // Normal rendering path...
}
```

## Workflow

### Normal flow
1. The user locks the screen → animation stops and one lock-screen frame is rendered
2. The system goes to sleep or hibernates
3. The system wakes up → the `aboutToWakeUp` signal fires
4. `setRenderingEnabled(true)` is called
5. `m_deviceLost == false` is observed, and rendering resumes normally

### Device-lost recovery flow
1. The user locks the screen → animation stops
2. The system goes to sleep or hibernates → **the GPU driver resets and the device is lost**
3. The system wakes up → the `aboutToWakeUp` signal fires
4. `setRenderingEnabled(true)` is called
5. `m_deviceLost == true` is observed (or the loss is detected on the first rendered frame)
6. `handleDeviceLost()` is called
7. `recreateDevice()` rebuilds all Vulkan resources
8. `m_deviceLost` is set to `false`
9. Normal rendering resumes

### Device loss during rendering
1. Rendering is in progress
2. A call to `vkQueueSubmit` returns `VK_ERROR_DEVICE_LOST`
3. The error is detected and `m_deviceLost` is set to `true`
4. `handleDeviceLost()` is called to recover
5. Rendering continues normally on the next frame

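The three flows above can be read as one small state machine. The sketch below is a hypothetical model for reasoning about the transitions and for unit testing, not code from the project; `RenderState`, `Event`, and `nextState` are illustrative names.

```cpp
// Hypothetical model of the flows described above.
enum class RenderState { Rendering, Paused, Lost, Disabled };
enum class Event {
    ScreenLocked,
    WokeUp,
    DeviceLostDetected,
    RecoverySucceeded,
    RecoveryFailed
};

// Returns the state that follows `s` when `e` occurs.
// Pairs that are not listed leave the state unchanged.
inline RenderState nextState(RenderState s, Event e)
{
    switch (s) {
    case RenderState::Rendering:
        if (e == Event::ScreenLocked)       return RenderState::Paused;
        if (e == Event::DeviceLostDetected) return RenderState::Lost;
        break;
    case RenderState::Paused:
        if (e == Event::WokeUp)             return RenderState::Rendering;
        if (e == Event::DeviceLostDetected) return RenderState::Lost;
        break;
    case RenderState::Lost:
        if (e == Event::RecoverySucceeded)  return RenderState::Rendering;
        if (e == Event::RecoveryFailed)     return RenderState::Disabled;
        break;
    case RenderState::Disabled:
        break; // rendering stays off once recovery has failed
    }
    return s;
}
```
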
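## Key Points

### Why destroy the Surface?

Although the Vulkan specification does not make a surface directly dependent on the logical device, on some platforms (Windows in particular):
- a GPU driver reset may affect the surface's underlying connection to the window system
- the old surface may be incompatible with the new device
- recreating the surface ensures full compatibility with the new device

### Why keep the Instance and PhysicalDevice?

- `VkInstance`: a loader-level object that is not affected by device loss, so it can be kept
- `VkPhysicalDevice`: a handle that represents the GPU hardware; it is also unaffected by the loss of a logical device, so it can be kept
- `VkDevice`: the logical device, which **must** be recreated after a device loss
- `VkSurfaceKHR`: could in theory be kept, but is recreated for cross-platform compatibility
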
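### Synchronization

`vkDeviceWaitIdle()` is called before resources are cleaned up in order to ensure that:
- all submitted commands have completed
- no resource is still in use by the GPU
- destroying the objects does not trigger validation-layer errors

Even if `vkDeviceWaitIdle()` fails because the device is lost, cleanup continues, because:
- the device is lost, so all command execution has already stopped
- the resource objects still have to be released properly to avoid memory leaks
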
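The reasoning above treats a device-lost result from `vkDeviceWaitIdle()` as expected. A small sketch of how that decision could be made explicit instead of silently dropping the return value follows. `WaitOutcome` and `classifyWaitIdle` are hypothetical names, and the integer constants mirror `VK_SUCCESS` and `VK_ERROR_DEVICE_LOST` so the snippet builds without the Vulkan headers.

```cpp
// Hypothetical classification of the value returned by vkDeviceWaitIdle().
enum class WaitOutcome {
    Idle,           // VK_SUCCESS: all submitted work has finished
    DeviceLost,     // expected during recovery; continue with cleanup
    UnexpectedError // e.g. out of memory; worth logging before cleanup
};

// `result` is the raw VkResult value (0 = VK_SUCCESS, -4 = VK_ERROR_DEVICE_LOST).
inline WaitOutcome classifyWaitIdle(int result)
{
    if (result == 0)
        return WaitOutcome::Idle;
    if (result == -4)
        return WaitOutcome::DeviceLost;
    return WaitOutcome::UnexpectedError;
}
```

In every case the cleanup still runs; the classification only decides what gets logged.
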
## Test Scenarios

1. **Normal sleep/wake**
   - Lock the screen → sleep → wake → unlock
   - Rendering should resume normally with no errors

2. **Long sleep**
   - Sleep for several hours or overnight
   - The GPU driver is more likely to be fully reset
   - The device loss should be detected and recovered from successfully

3. **Repeated sleep/wake cycles**
   - Sleep and wake several times in a row
   - Every cycle should recover correctly
   - No memory leaks

4. **Device loss in the middle of rendering**
   - Trigger a device loss while rendering is active
   - The error should be caught and rendering should recover

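Scenario 4 is hard to trigger on demand. One option is a test-only hook that makes a wrapped call report device loss after a chosen number of successful calls. The sketch below assumes such a hook exists; `FaultInjector`, `failAfter`, and `filter` are hypothetical names, and `kInjectedDeviceLost` mirrors the value of `VK_ERROR_DEVICE_LOST`.

```cpp
// Mirrors VK_ERROR_DEVICE_LOST so the sketch builds without the Vulkan headers.
constexpr int kInjectedDeviceLost = -4;

// Hypothetical test-only helper: passes results through until it is armed,
// then reports device loss once after the configured number of calls.
class FaultInjector {
public:
    // Arm the injector: after `successfulCalls` more pass-through calls,
    // the next filtered call reports device loss.
    void failAfter(int successfulCalls)
    {
        m_remaining = successfulCalls;
        m_armed = true;
    }

    // Wraps the result of a real call, e.g. filter(vkQueueSubmit(...)).
    int filter(int realResult)
    {
        if (!m_armed)
            return realResult;
        if (m_remaining-- > 0)
            return realResult;
        m_armed = false; // fire once, then behave normally again
        return kInjectedDeviceLost;
    }

private:
    int m_remaining = 0;
    bool m_armed = false;
};
```

An injected failure only exercises the detection and rebuild path. The real device is still healthy, so this does not replace testing on actual sleep/wake cycles.
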
## Log Output

Example log from a successful recovery:

```
📤 系统已从睡眠中唤醒
MainWindow: Screen unlocked event received
Vulkan rendering ENABLED - Resuming animations
Device lost detected on resume, attempting recovery...
=== Handling device lost error ===
Render timer stopped for device recovery
=== Recreating Vulkan device ===
Renderer cleaned up
Sync objects cleaned up
Command objects cleaned up
Logical device destroyed
Surface destroyed
Surface recreated
Logical device recreated
Swapchain recreated
Command objects recreated
Sync objects recreated
VulkanRenderer recreated successfully!
=== Device recreation complete ===
Device recovery successful!
Render timer restarted after recovery
```

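## Notes

1. **Frame counter reset**: after recovery, `m_currentFrame` is reset to 0 so that the indices into the synchronization objects are correct.

2. **Renderer state**: `VulkanRenderer` is rebuilt from scratch, so all of its internal state is reset.

3. **User experience**: recovery usually completes within 100-500 ms; the user may notice a brief black screen or pause.

4. **Error handling**: if recovery fails, rendering is disabled to prevent a crash, and the user can still use the rest of the application.

5. **Cross-platform**: the approach should work on Windows, Linux, and macOS, though the exact behavior may vary by driver.
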
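The 100-500 ms figure in note 3 is worth confirming on the target hardware. The following is a minimal sketch for timing a recovery attempt, assuming a hypothetical `timeRecovery` helper that wraps whatever callable performs the recovery (for example a lambda that calls `handleDeviceLost()`).

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical helper: runs `recover` and returns its result together with
// the elapsed wall-clock time in milliseconds.
inline std::pair<bool, double> timeRecovery(const std::function<bool()> &recover)
{
    const auto start = std::chrono::steady_clock::now();
    const bool ok = recover();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return {ok, elapsed.count()};
}
```

`steady_clock` is used because it is monotonic, which matters around sleep/wake transitions, when the system clock may be adjusted.
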
## Related Files

- `src/vulkanwidget.h` - adds the `m_deviceLost` flag and the recovery function declarations
- `src/vulkanwidget.cpp` - implements the complete device-lost detection and recovery logic
- `src/powermonitor.cpp` - emits the `aboutToWakeUp` signal
- `src/mainwindow.cpp` - connects the signal and calls `setRenderingEnabled(true)`