'컴퓨터과학/언리얼' 카테고리의 글 목록 (4 Page)

컴퓨터과학/언리얼

HandDesktop15 - 손 관절 가지고 놀기3 손이동 구현 2024.02.01
HandDesktop14 - 손 관절 가지고 놀기2 컴포넌트공간 관절회전 2024.02.01
HandDesktop13 - 손 관절 가지고 놀기 2024.02.01
HandDesktop12 - 손 관절 회전 방향 구하기(fail?) 2024.02.01
HandDesktop11 - 손 메시 제어하기 2024.01.31
HandDesktop10 - blazehand c++, unreal 적용 2024.01.29
HandDesktop09 - 기존 문제들 원인 찾음, blaze 갈아엎기, 블라즈팜 문제해결 2024.01.26
HandDesktop08 - blaze hand 사용하기 2024.01.26
HandDesktop07 - blaze hand 사용 준비하기 2024.01.26
HandDesktop06 - blaze palm 사용하기 2024.01.25

HandDesktop15 - 손 관절 가지고 놀기3 손이동 구현

ggamji 2024. 2. 1. 17:56

2024. 2. 1. 17:56

일단 컴포넌트, 월드 공간으로 어떻게 회전하는건지 이해했다.

이제 손 제어하기 전에

각 포인트들의 RPY값들을 좀 보려고함.

MapBoneLeft, MapBoneRight에는

원래 키로 2~16 사용하는 손관절들 뿐이었는데

나머지 관절 이름을 저장한다고

MapBoneLeft.Add(16, FString("b_l_wrist"));
MapBoneLeft.Add(17, FString("b_l_thumb0"));
MapBoneLeft.Add(18, FString("b_l_thumb1"));
MapBoneLeft.Add(19, FString("b_l_pinky0"));

로 추가한상태

하지만 MapRoll/Pitch/Yaw에는 2~16 관절 rpy값만 들어있다.

그런데 이전에 작성한 rpy 코드가 맞는질 몰라서 값좀 출력해보도록 수정

관절보다가 이거보니 이게왜이렇게됫는지 생각이안난다.

내가 구현한 피치요대신 찾아넨 calc rotation 썻는대

/*
void ADesktopGameModeBase::get_pitch_yaw(cv::Point3f pt_start, cv::Point3f pt_end, float& pitch, float& yaw) {
	float dx = pt_end.x - pt_start.x;
	float dy = pt_end.y - pt_start.y;
	dy *= -1;
	yaw = std::atan2(dy, dx) * 180 / CV_PI;
	yaw = yaw - 90;

	float dz = pt_end.z - pt_start.z;
	float xy_norm = std::sqrt(dx * dx + dy * dy);
	pitch = std::atan2(dz, xy_norm) * 180 / CV_PI;
	pitch *= -1;
}
*/

void ADesktopGameModeBase::calculateRotation(const cv::Point3f& pt1, const cv::Point3f& pt2, float& roll, float& pitch, float& yaw) {
	cv::Point3f vec = pt2 - pt1;  // 두 벡터를 연결하는 벡터 계산

	roll = atan2(vec.y, vec.x) * 180 / CV_PI;  // 롤(Roll) 회전 계산
	pitch = asin(vec.z / cv::norm(vec)) * 180 / CV_PI;  // 피치(Pitch) 회전 계산

	cv::Point2f vecXY(vec.x, vec.y);  // xy 평면으로 투영
	yaw = (atan2(vec.y, vec.x) - atan2(vec.z, cv::norm(vecXY))) * 180 / CV_PI;  // 요(Yaw) 회전 계산
}

어제 하던 파이썬 코드 돌아와서 다시보자

손 편상태 랜드마크를 npy로 저장한걸 다시 로드하고 반정규화 처리한걸 3d 플롯시켰었는데

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


%matplotlib widget

# 3차원 산점도 그리기
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

x = denormalize_landmark[:, 0]
y = denormalize_landmark[:, 1]
z = denormalize_landmark[:, 2]

# 색상 지정
colors = ['black'] + ['red'] * 4 + ['orange'] * 4 + ['yellow'] * 4 + ['green'] * 4 + ['blue'] * 4
ax.scatter(x, y, z, c=colors)


#손가락
colors = ['red', 'orange', 'yellow', 'green', 'blue']
groups = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16], [17, 18, 19, 20]]
for i, group in enumerate(groups):
    for j in range(len(group)-1):
        plt.plot([x[group[j]], x[group[j+1]]], [y[group[j]], y[group[j+1]]], [z[group[j]], z[group[j+1]]], color=colors[i])
#손등
lines = [[0, 1], [0, 5], [0, 17], [5, 9], [9, 13], [13, 17]] 
for line in lines:
    ax.plot([x[line[0]], x[line[1]]], [y[line[0]], y[line[1]]], [z[line[0]], z[line[1]]], color='gray')



ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')


plt.gca().invert_xaxis()
plt.show()

calcRotation을 python으로 만들어

검지 계산을 위해 점 5, 6을 넣어서 rpy를 구해보면

import numpy as np
import math

def calculateRotation(pt1, pt2):
    vec = pt2 - pt1

    roll = math.atan2(vec[1], vec[0]) * 180 / math.pi
    pitch = math.asin(vec[2] / np.linalg.norm(vec)) * 180 / math.pi

    vecXY = np.array([vec[0], vec[1]])
    yaw = (math.atan2(vec[1], vec[0]) - math.atan2(vec[2], np.linalg.norm(vecXY))) * 180 / math.pi

    return roll, pitch, yaw


roll, pitch, yaw = calculateRotation(denormalize_landmark[15], denormalize_landmark[16])
print(roll, pitch, yaw)

-102, -7, -95가 나왔다.

yz 평면을 보면 -7이 맞을것같고

영 이해가 안되서 다시 그림그려봄

15, 16 pt 포인트가 저러므로

z축 회전각 yaw는 그렇게 크진 않아야한다.

근데 위에서 나온 yaw는 -95씩이나 된다.

좀 이상해서 보니 1을 더붙여서 엉뚱한 손가락보고있었다.

검지 시작하는 5,6 을보면

xy 평면으로 보면 위 그림과 x축 기준으로 반대가 되어있는데

위 그림은 아랫방향에서 내려가며 커지기 때문

그리고 위의 calc Rot가 잘못된게

x,y 좌표값으로 얻은건 z축 회전 yaw이므로 좀 수정해서봐야할듯

필요한건 저각도이므로

atan2(dy/dx) -> deg

y 방향이 반대이므로 -1, 그리고 보라색 축 기준 각도가 필요하므로 90 - 계산결과

    vec = pt2 - pt1

    yaw = 90 - (math.atan2(vec[1], vec[0]) * 180 / math.pi * -1)

피치는 zy평면의 저 각도이므로

atan2( dz/ norm(d xy)) -> deg

    pitch = math.atan2(vec[2], np.linalg.norm(vec[0:2])) * 180 / math.pi

롤은 xz 평면에서 봐야하는데

저 각일듯 하니

atan2(dz, dx) ->deg 쯤 되지않을까.

x가 오른쪽으로 갈수록 줄어드는데 주의하자

xz 평면에 놓고 생각하면 이런식으로 될듯

계산된 RPY는 -16, -5, 18이 되었다.

스켈레톤 에디터에서 해봤는데

롤은 전역(컴포넌트) 공간 90-16 = 74

피치는 -5이므로 0 - 5 = -5

요는 0 + 18 = 18하려했는데 방향이 이상해서 -18로 하니비슷하게됨

다음 손가락도 해보면

74 - 31 = 43

-5 -12 = -17

-18 -20 = -38

뭔가 비슷한 느낌이나긴한데 방향은 이상하다

43 - 33 = 10

-17 - 11 = -28

-38 - 17 = 55

방향은 이상하지만 뭔가 좀 비슷하게 된거같다.

손이 처음부터 90도 회전되있어서 전역계 설정이 너무 어려운데

roll은 제외하고 피치, 요만 다루는게 맞긴할듯

아까 계산한 피치는

-5, -12, - 11이었는데

이것도 -방향해줘야햇음

12,

요가

18, 20, 17 이었는데

- 방향으로 회전해줘야 맞게감

-18,

-20,

-17로 설정하면될듯

새끼손까락을보면

피치 -10, -11, -9 -> 10, 11, 9

요 -41, -43, -36 -> 41, 43, 36

엄지0은 90, 17, -47일때 정상적인 손모양

썸1은 131, 0, -44

이때 모든 좌표계 방향이 비슷함

근대 썸이 약간 기울어진게아쉽긴한데 이대로하고

어제에서 진행된건 없지만 조금 이해했으니

그냥 어제 만든 코드로 돌아가고 손 이동 좌표 구상부터

1920 / 640 = 3 이므로

9번점(중지 시작) x 좌표 * 3한걸 왼손 액터의 Y축 위치 값 설정

1020 / 480 = 2.25 이므로

int(9번점 y좌표 * 2.25)로 왼속액터 z축 위치값 설정

이 아니라

-700 ~ 700 = 1400

0 ~ 600이 적당해보임

x는 2.2 곱

y는 1.25

bP 왼손에 좌표가져오기위한 변수추가

// Fill out your copyright notice in the Description page of Project Settings.

#pragma once


#include "Blaze.h"

#include "Windows/AllowWindowsPlatformTypes.h"
#include <Windows.h>
#include "Windows/HideWindowsPlatformTypes.h"

#include "PreOpenCVHeaders.h"
#include <opencv2/opencv.hpp>
#include "PostOpenCVHeaders.h"

#include "CoreMinimal.h"
#include "GameFramework/GameModeBase.h"
#include "DesktopGameModeBase.generated.h"

/**
 * 
 */
UCLASS()
class HANDDESKTOP_API ADesktopGameModeBase : public AGameModeBase
{
	GENERATED_BODY()

protected:
	// Called when the game starts or when spawned
	virtual void BeginPlay() override;

public:
	UFUNCTION(BlueprintCallable)
	void ReadFrame();




	int monitorWidth = 1920;
	int monitorHeight = 1080;
	UPROPERTY(EditAnywhere, BlueprintReadWrite)
	UTexture2D* imageTextureScreen1;
	UPROPERTY(EditAnywhere, BlueprintReadWrite)
	UTexture2D* imageTextureScreen2;
	UPROPERTY(EditAnywhere, BlueprintReadWrite)
	UTexture2D* imageTextureScreen3;
	cv::Mat imageScreen1;
	cv::Mat imageScreen2;
	cv::Mat imageScreen3;

	void ScreensToCVMats();
	void CVMatsToTextures();



	int webcamWidth = 640;
	int webcamHeight = 480;

	cv::VideoCapture capture;
	cv::Mat webcamImage;
	UPROPERTY(EditAnywhere, BlueprintReadWrite)
	UTexture2D* webcamTexture;
	void MatToTexture2D(const cv::Mat InMat);


	//var and functions with blaze
	Blaze blaze;
	cv::Mat img256;
	cv::Mat img128;
	float scale;
	cv::Scalar pad;





	// vars and funcs for rotator
	int hand_conns_indexes[14] = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
	void get_pitch_yaw(cv::Point3f pt_start, cv::Point3f pt_end, float& pitch, float& yaw);
	//void calculateRotation(const cv::Point3f& pt1, const cv::Point3f& pt2, float& roll, float& pitch, float& yaw);

	void make_map_for_rotators(std::vector<cv::Mat> denorm_imgs_landmarks);
	void make_map_bone();

	UPROPERTY(BlueprintReadWrite, Category = "RotatorMap")
	TMap<int32, float> MapRoll;
	UPROPERTY(BlueprintReadWrite, Category="RotatorMap")
	TMap<int32, float> MapPitch;
	UPROPERTY(BlueprintReadWrite, Category = "RotatorMap")
	TMap<int32, float> MapYaw;

	UPROPERTY(BlueprintReadWrite, Category = "RotatorMap")
	TMap<int32, FString> MapBoneLeft;
	UPROPERTY(BlueprintReadWrite, Category = "RotatorMap")
	TMap<int32, FString> MapBoneRight;

	UPROPERTY(BlueprintReadWrite, Category = "HandCoord")
	float HandLeftX;
	UPROPERTY(BlueprintReadWrite, Category = "HandCoord")
	float HandLeftY;
};

// Fill out your copyright notice in the Description page of Project Settings.

#include "DesktopGameModeBase.h"


void ADesktopGameModeBase::BeginPlay()
{
	Super::BeginPlay();
	blaze = Blaze();

	capture = cv::VideoCapture(0);
	if (!capture.isOpened())
	{
		UE_LOG(LogTemp, Log, TEXT("Open Webcam failed"));
		return;
	}
	else
	{
		UE_LOG(LogTemp, Log, TEXT("Open Webcam Success"));
	}
	capture.set(cv::CAP_PROP_FRAME_WIDTH, webcamWidth);
	capture.set(cv::CAP_PROP_FRAME_HEIGHT, webcamHeight);


	webcamTexture = UTexture2D::CreateTransient(monitorWidth, monitorHeight, PF_B8G8R8A8);


	imageScreen1 = cv::Mat(monitorHeight, monitorWidth, CV_8UC4);
	imageScreen2 = cv::Mat(monitorHeight, monitorWidth, CV_8UC4);
	imageScreen3 = cv::Mat(monitorHeight, monitorWidth, CV_8UC4);
	imageTextureScreen1 = UTexture2D::CreateTransient(monitorWidth, monitorHeight, PF_B8G8R8A8);
	imageTextureScreen2 = UTexture2D::CreateTransient(monitorWidth, monitorHeight, PF_B8G8R8A8);
	imageTextureScreen3 = UTexture2D::CreateTransient(monitorWidth, monitorHeight, PF_B8G8R8A8);
	
	make_map_bone();
}


void ADesktopGameModeBase::ReadFrame()
{
	if (!capture.isOpened())
	{
		return;
	}
	capture.read(webcamImage);

	/*
	get filtered detections
	*/
	blaze.ResizeAndPad(webcamImage, img256, img128, scale, pad);
	//UE_LOG(LogTemp, Log, TEXT("scale value: %f, pad value: (%f, %f)"), scale, pad[0], pad[1]);
	std::vector<Blaze::PalmDetection> normDets = blaze.PredictPalmDetections(img128);
	std::vector<Blaze::PalmDetection> denormDets = blaze.DenormalizePalmDetections(normDets, webcamWidth, webcamHeight, pad);
	std::vector<Blaze::PalmDetection> filteredDets = blaze.FilteringDets(denormDets, webcamWidth, webcamHeight);





	std::vector<cv::Rect> handRects = blaze.convertHandRects(filteredDets);
	std::vector<cv::Mat> handImgs;
	blaze.GetHandImages(webcamImage, handRects, handImgs);
	std::vector<cv::Mat> imgs_landmarks = blaze.PredictHandDetections(handImgs);
	std::vector<cv::Mat> denorm_imgs_landmarks = blaze.DenormalizeHandLandmarks(imgs_landmarks, handRects);

	make_map_for_rotators(denorm_imgs_landmarks);



	//draw hand rects/ plam detection/ dets info/ hand detection
	blaze.DrawRects(webcamImage, handRects);
	blaze.DrawPalmDetections(webcamImage, filteredDets);
	blaze.DrawDetsInfo(webcamImage, filteredDets, normDets, denormDets);
	blaze.DrawHandDetections(webcamImage, denorm_imgs_landmarks);

	//cv::mat to utexture2d
	MatToTexture2D(webcamImage);


	/*
	모니터 시각화
	*/
	ScreensToCVMats();
	CVMatsToTextures();
}


void ADesktopGameModeBase::MatToTexture2D(const cv::Mat InMat)
{
	if (InMat.type() == CV_8UC3)//example for pre-conversion of Mat
	{
		cv::Mat resizedImage;
		cv::resize(InMat, resizedImage, cv::Size(monitorWidth, monitorHeight));
		cv::Mat bgraImage;
		//if the Mat is in BGR space, convert it to BGRA. There is no three channel texture in UE (at least with eight bit)
		cv::cvtColor(resizedImage, bgraImage, cv::COLOR_BGR2BGRA);

		//actually copy the data to the new texture
		FTexture2DMipMap& Mip = webcamTexture->GetPlatformData()->Mips[0];
		void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
		FMemory::Memcpy(Data, bgraImage.data, bgraImage.total() * bgraImage.elemSize());//copy the data
		Mip.BulkData.Unlock();
		webcamTexture->PostEditChange();
		webcamTexture->UpdateResource();
	}
	else if (InMat.type() == CV_8UC4)
	{
		//actually copy the data to the new texture
		FTexture2DMipMap& Mip = webcamTexture->GetPlatformData()->Mips[0];
		void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
		FMemory::Memcpy(Data, InMat.data, InMat.total() * InMat.elemSize());//copy the data
		Mip.BulkData.Unlock();
		webcamTexture->PostEditChange();
		webcamTexture->UpdateResource();
	}
	//if the texture hasnt the right pixel format, abort.
	webcamTexture->PostEditChange();
	webcamTexture->UpdateResource();
}



void ADesktopGameModeBase::ScreensToCVMats()
{
	HDC hScreenDC = GetDC(NULL);
	HDC hMemoryDC = CreateCompatibleDC(hScreenDC);
	int screenWidth = GetDeviceCaps(hScreenDC, HORZRES);
	int screenHeight = GetDeviceCaps(hScreenDC, VERTRES);

	HBITMAP hBitmap = CreateCompatibleBitmap(hScreenDC, screenWidth, screenHeight);
	HBITMAP hOldBitmap = (HBITMAP)SelectObject(hMemoryDC, hBitmap);

	//screen 1
	BitBlt(hMemoryDC, 0, 0, screenWidth, screenHeight, hScreenDC, 0, 0, SRCCOPY);
	GetBitmapBits(hBitmap, imageScreen1.total() * imageScreen1.elemSize(), imageScreen1.data);

	//screen 2
	BitBlt(hMemoryDC, 0, 0, screenWidth, screenHeight, hScreenDC, 1920, 0, SRCCOPY);
	GetBitmapBits(hBitmap, imageScreen2.total() * imageScreen2.elemSize(), imageScreen2.data);

	//screen 3
	BitBlt(hMemoryDC, 0, 0, screenWidth, screenHeight, hScreenDC, 3840, 0, SRCCOPY);
	GetBitmapBits(hBitmap, imageScreen3.total() * imageScreen3.elemSize(), imageScreen3.data);
	SelectObject(hMemoryDC, hOldBitmap);


	DeleteDC(hScreenDC);
	DeleteDC(hMemoryDC);

	DeleteObject(hBitmap);
	DeleteObject(hOldBitmap);

}


void ADesktopGameModeBase::CVMatsToTextures()
{
	for (int i = 0; i < 3; i++)
	{
		if (i == 0)
		{
			FTexture2DMipMap& Mip = imageTextureScreen1->GetPlatformData()->Mips[0];
			void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
			FMemory::Memcpy(Data, imageScreen1.data, imageScreen1.total() * imageScreen1.elemSize());//copy the data
			Mip.BulkData.Unlock();

			imageTextureScreen1->PostEditChange();
			imageTextureScreen1->UpdateResource();
		}
		else if (i == 1)
		{
			FTexture2DMipMap& Mip = imageTextureScreen2->GetPlatformData()->Mips[0];
			void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
			FMemory::Memcpy(Data, imageScreen2.data, imageScreen2.total() * imageScreen2.elemSize());//copy the data
			Mip.BulkData.Unlock();

			imageTextureScreen2->PostEditChange();
			imageTextureScreen2->UpdateResource();
		}
		else if (i == 2)
		{
			FTexture2DMipMap& Mip = imageTextureScreen3->GetPlatformData()->Mips[0];
			void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
			FMemory::Memcpy(Data, imageScreen3.data, imageScreen3.total() * imageScreen3.elemSize());//copy the data
			Mip.BulkData.Unlock();

			imageTextureScreen3->PostEditChange();
			imageTextureScreen3->UpdateResource();
		}


	}
}





void ADesktopGameModeBase::get_pitch_yaw(cv::Point3f pt_start, cv::Point3f pt_end, float& pitch, float& yaw) {
	float dx = pt_end.x - pt_start.x;
	float dy = pt_end.y - pt_start.y;
	dy *= -1;
	yaw = std::atan2(dy, dx) * 180 / CV_PI;
	yaw = yaw - 90;

	float dz = pt_end.z - pt_start.z;
	float xy_norm = std::sqrt(dx * dx + dy * dy);
	pitch = std::atan2(dz, xy_norm) * 180 / CV_PI;
	pitch *= -1;
}

/*
void ADesktopGameModeBase::calculateRotation(const cv::Point3f& pt1, const cv::Point3f& pt2, float& roll, float& pitch, float& yaw) {
	cv::Point3f vec = pt2 - pt1;  // 두 벡터를 연결하는 벡터 계산

	roll = atan2(vec.y, vec.x) * 180 / CV_PI;  // 롤(Roll) 회전 계산
	pitch = asin(vec.z / cv::norm(vec)) * 180 / CV_PI;  // 피치(Pitch) 회전 계산

	cv::Point2f vecXY(vec.x, vec.y);  // xy 평면으로 투영
	yaw = (atan2(vec.y, vec.x) - atan2(vec.z, cv::norm(vecXY))) * 180 / CV_PI;  // 요(Yaw) 회전 계산
}
*/



void ADesktopGameModeBase::make_map_for_rotators(std::vector<cv::Mat> denorm_imgs_landmarks)
{
	for (auto& denorm_landmarks : denorm_imgs_landmarks)
	{
		std::vector<std::array<int, 2>> HAND_CONNECTIONS = blaze.HAND_CONNECTIONS;
		for (auto& hand_conns_index : hand_conns_indexes)
		{
			std::array<int, 2> hand_conns = HAND_CONNECTIONS.at(hand_conns_index);
			float roll, pitch, yaw;
			cv::Point3f pt_start, pt_end;

			pt_start.x = denorm_landmarks.at<float>(hand_conns.at(0), 0);
			pt_start.y = denorm_landmarks.at<float>(hand_conns.at(0), 1);
			pt_start.z = denorm_landmarks.at<float>(hand_conns.at(0), 2);

			pt_end.x = denorm_landmarks.at<float>(hand_conns.at(1), 0);
			pt_end.y = denorm_landmarks.at<float>(hand_conns.at(1), 1);
			pt_end.z = denorm_landmarks.at<float>(hand_conns.at(1), 2);


			//calculateRotation(pt_start, pt_end, roll, pitch, yaw);

			get_pitch_yaw(pt_start, pt_end, pitch, yaw);
			MapRoll.Add(hand_conns_index, roll);
			MapPitch.Add(hand_conns_index, pitch);
			MapYaw.Add(hand_conns_index, yaw);

		}
		HandLeftX = denorm_landmarks.at<float>(9, 0);
		HandLeftY = denorm_landmarks.at<float>(9, 1);

	}
}


/*
	std::vector<std::array<int, 2>> HAND_CONNECTIONS = {
	{0, 1}, {1, 2}, {2, 3}, {3, 4},
	{5, 6}, {6, 7}, {7, 8},
	{9, 10}, {10, 11}, {11, 12},
	{13, 14}, {14, 15}, {15, 16},
	{17, 18}, {18, 19}, {19, 20},
	{0, 5}, {5, 9}, {9, 13}, {13, 17}, {0, 17}
	};
*/


void ADesktopGameModeBase::make_map_bone()
{
	MapBoneLeft.Add(2, FString("b_l_thumb2"));
	MapBoneLeft.Add(3, FString("b_l_thumb3"));
	MapBoneLeft.Add(4, FString("b_l_index1"));
	MapBoneLeft.Add(5, FString("b_l_index2"));
	MapBoneLeft.Add(6, FString("b_l_index3"));
	MapBoneLeft.Add(7, FString("b_l_middle1"));
	MapBoneLeft.Add(8, FString("b_l_middle2"));
	MapBoneLeft.Add(9, FString("b_l_middle3"));
	MapBoneLeft.Add(10, FString("b_l_ring1"));
	MapBoneLeft.Add(11, FString("b_l_ring2"));
	MapBoneLeft.Add(12, FString("b_l_ring3"));
	MapBoneLeft.Add(13, FString("b_l_pinky1"));
	MapBoneLeft.Add(14, FString("b_l_pinky2"));
	MapBoneLeft.Add(15, FString("b_l_pinky3"));


	MapBoneRight.Add(2, FString("b_r_thumb2"));
	MapBoneRight.Add(3, FString("b_r_thumb3"));
	MapBoneRight.Add(4, FString("b_r_index1"));
	MapBoneRight.Add(5, FString("b_r_index2"));
	MapBoneRight.Add(6, FString("b_r_index3"));
	MapBoneRight.Add(7, FString("b_r_middle1"));
	MapBoneRight.Add(8, FString("b_r_middle2"));
	MapBoneRight.Add(9, FString("b_r_middle3"));
	MapBoneRight.Add(10, FString("b_r_ring1"));
	MapBoneRight.Add(11, FString("b_r_ring2"));
	MapBoneRight.Add(12, FString("b_r_ring3"));
	MapBoneRight.Add(13, FString("b_r_pinky1"));
	MapBoneRight.Add(14, FString("b_r_pinky2"));
	MapBoneRight.Add(15, FString("b_r_pinky3"));



	MapBoneLeft.Add(16, FString("b_l_wrist"));
	MapBoneLeft.Add(17, FString("b_l_thumb0"));
	MapBoneLeft.Add(18, FString("b_l_thumb1"));
	MapBoneLeft.Add(19, FString("b_l_pinky0"));
	MapBoneRight.Add(16, FString("b_r_wrist"));
	MapBoneRight.Add(17, FString("b_r_thumb0"));
	MapBoneRight.Add(18, FString("b_r_thumb1"));
	MapBoneRight.Add(19, FString("b_r_pinky0"));
}

대충 손 움직이기 구현은 했다.

왜 관절이 이렇게 되나 해매느라 구현 상 진전은 크진않음.

저작자표시

'컴퓨터과학 > 언리얼' 카테고리의 다른 글

HandDesktop17 - 손 관절 가지고 놀기5 좌표로 다루기 구현 (0)	2024.02.02
HandDesktop16 - 손 관절 가지고 놀기4 rpy 대신 좌표 다루기 구상 (0)	2024.02.02
HandDesktop14 - 손 관절 가지고 놀기2 컴포넌트공간 관절회전 (0)	2024.02.01
HandDesktop13 - 손 관절 가지고 놀기 (0)	2024.02.01
HandDesktop12 - 손 관절 회전 방향 구하기(fail?) (0)	2024.02.01

HandDesktop14 - 손 관절 가지고 놀기2 컴포넌트공간 관절회전

ggamji 2024. 2. 1. 15:07

2024. 2. 1. 15:07

일단 이전 글에서 삽질하면서

그동안 component space를 로컬 좌표계라 오해하고

이상하게 회전했다는 사실을 알았다.

오큘러스 손은 90도 롤 회전이 되어있어

roll 회전을 풀면 이런식이 된다.

요 축을 회전해서 검지 손가락을 접도록 구상해보자


왼손	오른손

검지, 중지 약지 관절들을 로컬 rpy 0,0,0으로 설정했는데

설정안한 약지까지도 기준좌표계랑 동일한 형태를 갖고 있지만

엄지 손가락의 경우 좌표계가 회전을 했는지 뒤집어진걸 볼수있따.

롤로 회전을 좀했는지 y축이 아래를 향하고 있음.

보니 60도 회전해있다고 하는데

이를 0으로 고쳐서 기준 좌표계랑 맞추면 엄지손가락이 어그러질것

로컬로 x축 60도 회전한 탓에 y축이 아래를 향하고 있었지만

x축을 0도 회전으로 고치면 y축이 올라와 원래 공간 좌표계와 비슷한 방향을 향하게 된다.

어그러질줄 알았는데 방향이 이쪽이 아니라그런가

엄지손쪽살만 좀 이동하고만다.

그래도 하위 뼈들이 뒤집혀지지 않은채 공간 좌표랑 비슷한 방향으로 나온다.

손을 쥐는 형태로 만들려 했는데 잠깐 좌표계보느라 밀렸네

그리고 나도모르는세 오른손으로 보고있엇음

왼손으로 다시보면

x롤을 0으로 만들어서 회전을 먼저 풀자

x를 풀어 0으로 만드는것 컴포넌트 공간에서 rotator를 0,0,0으로 설정하는거라 생각하면될듯

일단 기존 손을 지우고

왼손 다시 만든뒤, 회전 0,0,0 주면 월드 좌표계랑 비슷하게 나온다.

하지만 컴포넌트 좌표계에서 보면

롤 90도 회전이 되어있는 상태

손을 들어올린다고하면

월드 좌표계 기준 y축 90도 회전하면 되므로, xy 90,90하면 손이 그대로 들어올려진다.

근대 생각해보니 컴포넌트 좌표계에서도 y축으로 90도 올리면되네?

지금보니 월드 공간에서 손을 회전안주니 컴포넌트 공간이랑 똑같이 나온다

차이를 보기위해

손을 z축 90도 회전시켜놓고 돌려보면될듯

월드 공간 xy 90, 90하면 위 이미지처럼 될것이고

컴포넌트 공간 xy 90,90하면 이상태에서만 손만들어질듯

생각한 대로 되었다.

이제 좀 월드 공간, 컴포넌트 공간 구분이 될랑말랑한다.


컴포넌트 공간에서 할때	월드 공간 기준 회전할때

이재 쬐까 이해됬으니

손을 손등이 보이게 놓고 다뤄보자.

결국 스켈레톤에디터에서 월드 공간 = 컴포넌트 공간 이더라

검지 50, 50, 50도씩 접는다고하면

검지1의 피치가 4이므로

검지1 피치 54, 검지2 피치 104, 검지3 피치 154하면될듯

마참내 생각한데로 되었따.

엄지를 제외한 나머지 손가락도 이런식으로 접어보자

편의를 위해서

1은 90, 54, 0

2는 90, 104, 0

3은 90, 154, 0 으로 설정

엄지를 제외한 나머지 손가락이 원하는 형태로 접혀졌다.

저작자표시

'컴퓨터과학 > 언리얼' 카테고리의 다른 글

HandDesktop16 - 손 관절 가지고 놀기4 rpy 대신 좌표 다루기 구상 (0)	2024.02.02
HandDesktop15 - 손 관절 가지고 놀기3 손이동 구현 (0)	2024.02.01
HandDesktop13 - 손 관절 가지고 놀기 (0)	2024.02.01
HandDesktop12 - 손 관절 회전 방향 구하기(fail?) (0)	2024.02.01
HandDesktop11 - 손 메시 제어하기 (0)	2024.01.31

HandDesktop13 - 손 관절 가지고 놀기

ggamji 2024. 2. 1. 12:40

2024. 2. 1. 12:40

손의 스켈레탈 매시가 좀 이해안되서 블랜더로 내뺴서 보려고함

uasset 스켈레탈매시 내보내는 방법은 여기참고

https://forums.unrealengine.com/t/how-can-i-open-a-uasset-file-in-blender/309306

How can I open a .uasset file in Blender?

I would like to open the SM_HatTemplateRibbon.uasset file in Blender but I can’t figure out how to import it.

forums.unrealengine.com

좀 이상하게 나온다

b_l_index2 rot 0, 0,0 설정시

b_l_index2의 디폴트 rot -2.9, 0.8, -0.39

하지만 월드 좌표계기준 90, 5, 3

r이 90인것은 루트본의 r부터가 90이기때문.

루트본인 b_l_wrist와 연결된

b_l_index1의 컴포넌트 좌표계 회전은 3.42, 2.31, -4.89

루트본 b_l_wrist에서 r로 90도 회전한 상태이어선지

b_l_index1의 r은 90(루트본r) + 3.4(컴포넌트r) = 93.6 인가?

일단 손목 회전부터 본다고 하는데

월드나 로컬 좌표계 회전이나 똑같이 나온다.

로컬좌표계 요를 줬는데, 로컬좌표계 뷰로해도 요가 안오르고 피치가 오른다.

글로벌 좌표계에서 피치가 오르는게 맞긴한데

왜 동일하게 나오는지 모르겟음

롤 90도 회전하여

전역 좌표계가 지금의 로컬 좌표계로 변환

전체 관절의 월드좌표계 회전 0,0,0 으로 했을때 왼손

검지 중지 약지는 맞아보이긴한데

엄지, 소지가 손등에 붙어버린다.

실제 인게임에도 이렇게 해보자

전체 뼈를 보려고하는데 all bone 대신 all socket 네임즈가 있길래

socket이 bone과 비슷한거같긴한데 뭔진 잘몰라서 출력해보니

뼈로 알고있던 것들 전체가 나온다.

생각해보니

어제 사용하는 뼈이름목록을 맵으로 만들어둔게 있었으니 이거쓰면될듯

어제 만들어둔 뼈이름목록맵으로 월드공간 0,0,0 설정하는 bp 작성

아까 스켈레탈메시를 했던거랑 다르게되는데

생각해보니

이 맵에 전체 뼈 이름을 넣은게 아니엇음.

블라즈핸드로 랜드마크가 21개가 나오지만

뼈랑 매칭이 안되고 몇개를 뺏기때문

void ADesktopGameModeBase::make_map_bone()
{
	MapBoneLeft.Add(2, FString("b_l_thumb2"));
	MapBoneLeft.Add(3, FString("b_l_thumb3"));
	MapBoneLeft.Add(4, FString("b_l_index1"));
	MapBoneLeft.Add(5, FString("b_l_index2"));
	MapBoneLeft.Add(6, FString("b_l_index3"));
	MapBoneLeft.Add(7, FString("b_l_middle1"));
	MapBoneLeft.Add(8, FString("b_l_middle2"));
	MapBoneLeft.Add(9, FString("b_l_middle3"));
	MapBoneLeft.Add(10, FString("b_l_ring1"));
	MapBoneLeft.Add(11, FString("b_l_ring2"));
	MapBoneLeft.Add(12, FString("b_l_ring3"));
	MapBoneLeft.Add(13, FString("b_l_pinky1"));
	MapBoneLeft.Add(14, FString("b_l_pinky2"));
	MapBoneLeft.Add(15, FString("b_l_pinky3"));


	MapBoneRight.Add(2, FString("b_r_thumb2"));
	MapBoneRight.Add(3, FString("b_r_thumb3"));
	MapBoneRight.Add(4, FString("b_r_index1"));
	MapBoneRight.Add(5, FString("b_r_index2"));
	MapBoneRight.Add(6, FString("b_r_index3"));
	MapBoneRight.Add(7, FString("b_r_middle1"));
	MapBoneRight.Add(8, FString("b_r_middle2"));
	MapBoneRight.Add(9, FString("b_r_middle3"));
	MapBoneRight.Add(10, FString("b_r_ring1"));
	MapBoneRight.Add(11, FString("b_r_ring2"));
	MapBoneRight.Add(12, FString("b_r_ring3"));
	MapBoneRight.Add(13, FString("b_r_pinky1"));
	MapBoneRight.Add(14, FString("b_r_pinky2"));
	MapBoneRight.Add(15, FString("b_r_pinky3"));



	MapBoneLeft.Add(16, FString("b_l_wrist"));
	MapBoneLeft.Add(17, FString("b_l_thumb0"));
	MapBoneLeft.Add(18, FString("b_l_thumb1"));
	MapBoneLeft.Add(19, FString("b_l_pinky0"));
	MapBoneRight.Add(16, FString("b_r_wrist"));
	MapBoneRight.Add(17, FString("b_r_thumb0"));
	MapBoneRight.Add(18, FString("b_r_thumb1"));
	MapBoneRight.Add(19, FString("b_r_pinky0"));
}

나머지 뼈 이름들도 추가하면 이제 좀 제대로 나온다.

그런데 이해가 안되는게 시작전 손이 이렇게 되있었지만 전부 같은 방향을 향하도록 바뀌엇다.

월드 스페이스 기준 0,0,0 해서인듯한데

컴포넌트 스페이스로 바꿔서 하면

액터의 rot이 반영된 상태로 0,0,0 했을때 비슷하게 됫다

엄지가 손 중간으로 파고들어간게

b_l_thumb0를 0,0,0으로 했기때문이었는데

rpy를 디폴트대로 60, -44, -25 로 고정시켜놔야할듯

썸0는 맵 키로 17을 햇으므로

* string eqaul이 안되길래 int인 key사용

MapBoneLeft.Add(17, FString("b_l_thumb0"));

이러면 손비슷하게 나오긴할듯 싶었지만

아까 실행할때보다 낫긴한데, 바로 스켈레탈매시 편집기에서 했을때랑 좀 다르게 나온다.

rot이 다른가 싶어 프린트 추가

했더니 RPY가 여러가지가 나온다

같은게 4개뜰줄알아슨ㄴ데

뭔가 이상하니 이름도 출력하도록 수정

이름은 다같지만 rpy가다르다

다시보니 겟 로테이션에서 월드 로했기때문 컴포넌트로 바꿔서 하면

이제 다 똑같이 나온다.

근데 왜 스켈레톤에디터랑 다르게나오는지 모르겟다.

그래도 대충 원래 손모양이랑 비슷하니

썸0 빼고 rpy 조절하면 되지않을까싶다.

싶다가

손이 자꾸 롤 90도 회전된게 풀려버리니 b_l_wrist 16도 빼줘야할듯

원래 배치한 방향대로 나오긴한데

썸1부분이 문제인지 손가락이 사라져보인다.

썸1도 rpy 0,0,0 안하게 하면

뭔진 모르겟지만 엄지가 계속이상하게나온다.

일단 넘어가서 기존에 제어하던 모든 손가락을 30도씩 접도록 해보자

키 인트값이 15이하이므로 이 조건으로 하면될거고

	MapBoneLeft.Add(2, FString("b_l_thumb2"));
	MapBoneLeft.Add(3, FString("b_l_thumb3"));
	MapBoneLeft.Add(4, FString("b_l_index1"));
	MapBoneLeft.Add(5, FString("b_l_index2"));
	MapBoneLeft.Add(6, FString("b_l_index3"));
	MapBoneLeft.Add(7, FString("b_l_middle1"));
	MapBoneLeft.Add(8, FString("b_l_middle2"));
	MapBoneLeft.Add(9, FString("b_l_middle3"));
	MapBoneLeft.Add(10, FString("b_l_ring1"));
	MapBoneLeft.Add(11, FString("b_l_ring2"));
	MapBoneLeft.Add(12, FString("b_l_ring3"));
	MapBoneLeft.Add(13, FString("b_l_pinky1"));
	MapBoneLeft.Add(14, FString("b_l_pinky2"));
	MapBoneLeft.Add(15, FString("b_l_pinky3"));

롤로 90도 회전했기때문에

x는 그대로지만 z축이 오른쪽 y축이 아래를 향하고 있다.

손가락을 접으려면 y축으로 회전해야하므로 피치 30을 주면될듯하다.

컴포넌트 공간 피치 30도 2 ~ 15번 뼈 회전 처리하는 bp

접힌듯하지만 좀 아쉽다

70도로 수정하면

주먹쥔듯 될줄 알았는데

1번 관절들만 접힌것 같이 됨

이름이랑 rpy 출력하도록 고쳐서보면

정상적으로 나오긴한다.

각 관절 피치 70주면 손이 접혀야 될텐데

싶다가 지금보니 파란색 z축이니 요인걸 잊엇다.

또 이해안가는건 오른손법칙 생각하면 + 70으로 가야 접힐탠대 여기선 -70으로 가야한다.

컴포넌트 공간 요로 70도 회전시켯더니 이번엔 이상한 방향으로 간다.

뭐가 문젤까.

16미만은 0,0,90

16이상은 0,0,0 을 줫더니 이런식으로 된다.

하다보니 이해된게 나는 set bone rotation by name의

bone space에서

component space를 그냥 컴포넌트 공간에서 묻는건데

로컬 좌표계 기준으로 회전한다는걸로 오해하고 있었다.

월드 스페이스야 월드좌표계 기준으로 간다는거고

일단 뭐가 문젠지 이해했으니 다음글에서 다시정리

저작자표시

'컴퓨터과학 > 언리얼' 카테고리의 다른 글

HandDesktop15 - 손 관절 가지고 놀기3 손이동 구현 (0)	2024.02.01
HandDesktop14 - 손 관절 가지고 놀기2 컴포넌트공간 관절회전 (0)	2024.02.01
HandDesktop12 - 손 관절 회전 방향 구하기(fail?) (0)	2024.02.01
HandDesktop11 - 손 메시 제어하기 (0)	2024.01.31
HandDesktop10 - blazehand c++, unreal 적용 (0)	2024.01.29

HandDesktop12 - 손 관절 회전 방향 구하기(fail?)

ggamji 2024. 2. 1. 09:45

2024. 2. 1. 09:45

오큘러스에서 제공하는 손 스켈레탈 메시를

포저블 매시로 받아서 관절별로 회전하는걸 확인했는데

이제 블라즈 핸드 추론 결과로 어떻게 관절회전방향을 구하는가가 문제이다

일단 기존 mediapipepytorch 데모 코드에

정규화된 랜드마크를 따로 저장해서 다뤄보려고함

이 넘파이 파일을 한번띄워보면 과학표기법으로 나와서 알아보기힘들다.

np.set_printoptions(supress=True) 주면 과학 표기법이 꺼지고 알아보기 쉽게나옴

졍규화된 랜드마크 인줄알았는데

xy값은 정규화가 되어있지않다?

데모 코드에서 찍어보니

정규화된게 맞는데

세이브 되기전에 어디서 수정된듯하다.

세이브 시점을 저장눌렀을때가 아니라

추론 직후로 옮기고

    flags2, handed2, normalized_landmarks2 = hand_regressor(img.to(gpu))
    filename = 'normalized_landmarks2.npy'
    np.save(filename, normalized_landmarks2)
    
    landmarks2 = hand_regressor.denormalize_landmarks(normalized_landmarks2.cpu(), affine2)

이렇게 했을때 정상적으로 정규화된 값이 나온다.

일단 z축은 -256

xy축은 256으로만 반정규화시켜보면

이제 대충 원하던 형태로 값이 나온다.

이 포인트들을 맷플롭립으로 띄워보려는데 이상하게 나온다.

차원이 하나더 있어서였으므로 스퀴즈

대충 잘 나오긴한거같은데 알아보긴 힘들다

import numpy as np
import matplotlib.pyplot as plt

# 데이터 분리
x = denormalize_landmark[:, 0]
y = denormalize_landmark[:, 1]
z = denormalize_landmark[:, 2]

# 3차원 시각화
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z)

# 축 레이블 설정
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')

# 그래프 출력
plt.show()

화면을 돌리면서 보기위해

%matplotlib widget하는데

ipympl이없단다.

!pip install ipympl

핍으로 설치뒤에

다시 %matplotlib widget 하면 ok

1~4는 엄지

5~6 검지

9~12 중지

13~16 약지

17~20 소지

이므로 색을 따로 줘서 플로팅하면

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

%matplotlib widget


# 색상 지정
colors = ['black'] + ['red'] * 4 + ['orange'] * 4 + ['yellow'] * 4 + ['green'] * 4 + ['blue'] * 4

# 3차원 산점도 그리기
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

x = denormalize_landmark[:, 0]
y = denormalize_landmark[:, 1]
z = denormalize_landmark[:, 2]

ax.scatter(x, y, z, c=colors)

ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')

plt.show()

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

%matplotlib widget

# 3차원 산점도 그리기
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

x = denormalize_landmark[:, 0]
y = denormalize_landmark[:, 1]
z = denormalize_landmark[:, 2]

# 색상 지정
colors = ['black'] + ['red'] * 4 + ['orange'] * 4 + ['yellow'] * 4 + ['green'] * 4 + ['blue'] * 4
ax.scatter(x, y, z, c=colors)


#손가락
colors = ['red', 'orange', 'yellow', 'green', 'blue']
groups = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16], [17, 18, 19, 20]]
for i, group in enumerate(groups):
    for j in range(len(group)-1):
        plt.plot([x[group[j]], x[group[j+1]]], [y[group[j]], y[group[j+1]]], [z[group[j]], z[group[j+1]]], color=colors[i])
#손등
lines = [[0, 1], [0, 5], [0, 17], [5, 9], [9, 13], [13, 17]] 
for line in lines:
    ax.plot([x[line[0]], x[line[1]]], [y[line[0]], y[line[1]]], [z[line[0]], z[line[1]]], color='gray')



ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')

plt.show()

하면서보니 모양이 조금이상한것 같아서

z에 -1해줫는데

손목 z가 0을 기준으로 시작하는듯하다.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

%matplotlib widget

# 3차원 산점도 그리기
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

x = denormalize_landmark[:, 0]
y = denormalize_landmark[:, 1]
z = denormalize_landmark[:, 2] * -1

# 색상 지정
colors = ['black'] + ['red'] * 4 + ['orange'] * 4 + ['yellow'] * 4 + ['green'] * 4 + ['blue'] * 4
ax.scatter(x, y, z, c=colors)


#손가락
colors = ['red', 'orange', 'yellow', 'green', 'blue']
groups = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16], [17, 18, 19, 20]]
for i, group in enumerate(groups):
    for j in range(len(group)-1):
        plt.plot([x[group[j]], x[group[j+1]]], [y[group[j]], y[group[j+1]]], [z[group[j]], z[group[j+1]]], color=colors[i])
#손등
lines = [[0, 1], [0, 5], [0, 17], [5, 9], [9, 13], [13, 17]] 
for line in lines:
    ax.plot([x[line[0]], x[line[1]]], [y[line[0]], y[line[1]]], [z[line[0]], z[line[1]]], color='gray')



ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')

plt.show()

이제 이점들로 오큘러스 핸드의 관절 회전각을 구해야한다.

일단 반정규화 랜드마크 z축을 -1 곱해놓긴하자

일단 검지 첫번째 관절부터 보자.

회전 상태를보면 롤피치요 다 거의 0,0,0

요 50도 정도 주면 이런식이된다.

근데 위의건 로컬 좌표계 기준이라 전역이랑 좀 다르다.

랜드마크로 방향을 구하면 전역 좌표계 기준 방향이 나올듯한데

전역계기준 저 포인트의 rpy를 계산해내야할듯

시작점과 끝점을 출력해보면 이런식인데

시작점으로 -시켜주면

시작점이 원점이 되고

끝점은 원점 기준으로 이동된다.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

%matplotlib widget

pts = np.array([pt_start, pt_end])

# 3차원 산점도 그리기
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

x = pts[:, 0]
y = pts[:, 1]
z = pts[:, 2]

ax.scatter(x, y, z)



ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')

plt.show()

어캐할까 싶다가

먼저 xy 평면에서 각도를 계산해보면

-71도 정도가나온다.

이번에는 xy 놈이랑 z좌표로 각을 구하면 -5도 정도가 나오는데 이것도 대충 맞겟지 싶다.

대충 각도구하긴 했는데, 언리얼에선 롤피치요를 쓰지만

여기선 각도 2개를 구했다.

z방향 회전 각도는 y축으로 회전한것이니 피치?

xy 평면 회전 각도는 z축으로 회전한것이니 요? 쯤 되지않나싶다.

언리얼에서 관절을 돌려보긴하는데 뭔가 좀 이상하다.

위에서 보니 -70도라기보단 20도쯤 될거같은데

방향이 좀 달라서 그런듯싶다.

잠깐

기본손의 다음 뼈들의 RPY를 다 0으로 설정했을때 모습

보는데 matplotlib에서 x축 방향이 반대로되어있으니 햇갈려서 다시띄움

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


%matplotlib widget

# 3차원 산점도 그리기
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

x = denormalize_landmark[:, 0]
y = denormalize_landmark[:, 1]
z = denormalize_landmark[:, 2]

# 색상 지정
colors = ['black'] + ['red'] * 4 + ['orange'] * 4 + ['yellow'] * 4 + ['green'] * 4 + ['blue'] * 4
ax.scatter(x, y, z, c=colors)


#손가락
colors = ['red', 'orange', 'yellow', 'green', 'blue']
groups = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16], [17, 18, 19, 20]]
for i, group in enumerate(groups):
    for j in range(len(group)-1):
        plt.plot([x[group[j]], x[group[j+1]]], [y[group[j]], y[group[j+1]]], [z[group[j]], z[group[j+1]]], color=colors[i])
#손등
lines = [[0, 1], [0, 5], [0, 17], [5, 9], [9, 13], [13, 17]] 
for line in lines:
    ax.plot([x[line[0]], x[line[1]]], [y[line[0]], y[line[1]]], [z[line[0]], z[line[1]]], color='gray')



ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')


plt.gca().invert_xaxis()
plt.show()

y 방향이 아래로 가므로 -1해주면 70도로 맞긴한데

문제는 관절 rpy들을 0으로 했을때랑 다르다

저런 식으로 맞추려면

관절에선 전역계 요를 -20도로 찍어줘야함.

z방향 각도 그러니까 y축 회전 피치는

-5도 나오는데

언리얼 본에선 +방향으로 회전시켜줘야 손바닥 안쪽으로 들어온다.

아래 그림은 전역 피치 80줬을때 상황

정리해보면 일단

요는 계산결과 - 90

피치는 * -1

를 해주면 원하는 방향이 나올듯한데

소지의 경우도 봐야할듯하다.

소지(17, 18)는 계산 결과 131도가 나오는데

17번점 기준으로 xy평면에서 131도가 맞긴 하다.

90 - 131 = -41이 되는데

-41을 전역계 요

z축 회전각은 -10도 이므로

-10 * -1 = 10을 전역계 피치

를주면

피치는 맞는데 요가 좀잘못됨

-41이 아닌 41을 주면 원하는 각이나오긴하다.

전역계 요각 계산할때

보라색 y축 기준 각도로 해야될듯하다.

두점 사이 피치, 요 계산하는 코드는 이런식으로 정리

c++로 고친결과

void ADesktopGameModeBase::get_pitch_yaw(cv::Point3f pt_start, cv::Point3f pt_end, float& pitch, float& yaw) {
	float dx = pt_end.x - pt_start.x;
	float dy = pt_end.y - pt_start.y;
	dy *= -1;
	yaw = std::atan2(dy, dx) * 180 / CV_PI;
	yaw = yaw - 90;

	float dz = pt_end.z - pt_start.z;
	float xy_norm = std::sqrt(dx * dx + dy * dy);
	pitch = std::atan2(dz, xy_norm) * 180 / CV_PI;
	pitch *= -1;
}

c++ 롤피치요 를 블루프린트 가져와서 쓸수있도록 코드수정

// Fill out your copyright notice in the Description page of Project Settings.

#pragma once


#include "Blaze.h"

#include "Windows/AllowWindowsPlatformTypes.h"
#include <Windows.h>
#include "Windows/HideWindowsPlatformTypes.h"

#include "PreOpenCVHeaders.h"
#include <opencv2/opencv.hpp>
#include "PostOpenCVHeaders.h"

#include "CoreMinimal.h"
#include "GameFramework/GameModeBase.h"
#include "DesktopGameModeBase.generated.h"

/**
 * 
 */
UCLASS()
class HANDDESKTOP_API ADesktopGameModeBase : public AGameModeBase
{
	GENERATED_BODY()

protected:
	// Called when the game starts or when spawned
	virtual void BeginPlay() override;

public:
	UFUNCTION(BlueprintCallable)
	void ReadFrame();




	int monitorWidth = 1920;
	int monitorHeight = 1080;
	UPROPERTY(EditAnywhere, BlueprintReadWrite)
	UTexture2D* imageTextureScreen1;
	UPROPERTY(EditAnywhere, BlueprintReadWrite)
	UTexture2D* imageTextureScreen2;
	UPROPERTY(EditAnywhere, BlueprintReadWrite)
	UTexture2D* imageTextureScreen3;
	cv::Mat imageScreen1;
	cv::Mat imageScreen2;
	cv::Mat imageScreen3;

	void ScreensToCVMats();
	void CVMatsToTextures();



	int webcamWidth = 640;
	int webcamHeight = 480;

	cv::VideoCapture capture;
	cv::Mat webcamImage;
	UPROPERTY(EditAnywhere, BlueprintReadWrite)
	UTexture2D* webcamTexture;
	void MatToTexture2D(const cv::Mat InMat);


	//var and functions with blaze
	Blaze blaze;
	cv::Mat img256;
	cv::Mat img128;
	float scale;
	cv::Scalar pad;





	// vars and funcs for rotator
	int hand_conns_indexes[14] = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
	void get_pitch_yaw(cv::Point3f pt_start, cv::Point3f pt_end, float& pitch, float& yaw);
	void make_map_for_rotators(std::vector<cv::Mat> denorm_imgs_landmarks);
	void make_map_bone();

	UPROPERTY(BlueprintReadWrite, Category="RotatorMap")
	TMap<int32, float> MapPitch;
	UPROPERTY(BlueprintReadWrite, Category = "RotatorMap")
	TMap<int32, float> MapYaw;
	UPROPERTY(BlueprintReadWrite, Category = "RotatorMap")
	TMap<int32, FString> MapBoneLeft;
	UPROPERTY(BlueprintReadWrite, Category = "RotatorMap")
	TMap<int32, FString> MapBoneRight;


};

// Fill out your copyright notice in the Description page of Project Settings.

#include "DesktopGameModeBase.h"


void ADesktopGameModeBase::BeginPlay()
{
	Super::BeginPlay();
	blaze = Blaze();

	capture = cv::VideoCapture(0);
	if (!capture.isOpened())
	{
		UE_LOG(LogTemp, Log, TEXT("Open Webcam failed"));
		return;
	}
	else
	{
		UE_LOG(LogTemp, Log, TEXT("Open Webcam Success"));
	}
	capture.set(cv::CAP_PROP_FRAME_WIDTH, webcamWidth);
	capture.set(cv::CAP_PROP_FRAME_HEIGHT, webcamHeight);


	webcamTexture = UTexture2D::CreateTransient(monitorWidth, monitorHeight, PF_B8G8R8A8);


	imageScreen1 = cv::Mat(monitorHeight, monitorWidth, CV_8UC4);
	imageScreen2 = cv::Mat(monitorHeight, monitorWidth, CV_8UC4);
	imageScreen3 = cv::Mat(monitorHeight, monitorWidth, CV_8UC4);
	imageTextureScreen1 = UTexture2D::CreateTransient(monitorWidth, monitorHeight, PF_B8G8R8A8);
	imageTextureScreen2 = UTexture2D::CreateTransient(monitorWidth, monitorHeight, PF_B8G8R8A8);
	imageTextureScreen3 = UTexture2D::CreateTransient(monitorWidth, monitorHeight, PF_B8G8R8A8);
	
	make_map_bone();
}


void ADesktopGameModeBase::ReadFrame()
{
	if (!capture.isOpened())
	{
		return;
	}
	capture.read(webcamImage);

	/*
	get filtered detections
	*/
	blaze.ResizeAndPad(webcamImage, img256, img128, scale, pad);
	//UE_LOG(LogTemp, Log, TEXT("scale value: %f, pad value: (%f, %f)"), scale, pad[0], pad[1]);
	std::vector<Blaze::PalmDetection> normDets = blaze.PredictPalmDetections(img128);
	std::vector<Blaze::PalmDetection> denormDets = blaze.DenormalizePalmDetections(normDets, webcamWidth, webcamHeight, pad);
	std::vector<Blaze::PalmDetection> filteredDets = blaze.FilteringDets(denormDets, webcamWidth, webcamHeight);





	std::vector<cv::Rect> handRects = blaze.convertHandRects(filteredDets);
	std::vector<cv::Mat> handImgs;
	blaze.GetHandImages(webcamImage, handRects, handImgs);
	std::vector<cv::Mat> imgs_landmarks = blaze.PredictHandDetections(handImgs);
	std::vector<cv::Mat> denorm_imgs_landmarks = blaze.DenormalizeHandLandmarks(imgs_landmarks, handRects);

	make_map_for_rotators(denorm_imgs_landmarks);



	//draw hand rects/ plam detection/ dets info/ hand detection
	blaze.DrawRects(webcamImage, handRects);
	blaze.DrawPalmDetections(webcamImage, filteredDets);
	blaze.DrawDetsInfo(webcamImage, filteredDets, normDets, denormDets);
	blaze.DrawHandDetections(webcamImage, denorm_imgs_landmarks);

	//cv::mat to utexture2d
	MatToTexture2D(webcamImage);


	/*
	모니터 시각화
	*/
	ScreensToCVMats();
	CVMatsToTextures();
}


void ADesktopGameModeBase::MatToTexture2D(const cv::Mat InMat)
{
	if (InMat.type() == CV_8UC3)//example for pre-conversion of Mat
	{
		cv::Mat resizedImage;
		cv::resize(InMat, resizedImage, cv::Size(monitorWidth, monitorHeight));
		cv::Mat bgraImage;
		//if the Mat is in BGR space, convert it to BGRA. There is no three channel texture in UE (at least with eight bit)
		cv::cvtColor(resizedImage, bgraImage, cv::COLOR_BGR2BGRA);

		//actually copy the data to the new texture
		FTexture2DMipMap& Mip = webcamTexture->GetPlatformData()->Mips[0];
		void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
		FMemory::Memcpy(Data, bgraImage.data, bgraImage.total() * bgraImage.elemSize());//copy the data
		Mip.BulkData.Unlock();
		webcamTexture->PostEditChange();
		webcamTexture->UpdateResource();
	}
	else if (InMat.type() == CV_8UC4)
	{
		//actually copy the data to the new texture
		FTexture2DMipMap& Mip = webcamTexture->GetPlatformData()->Mips[0];
		void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
		FMemory::Memcpy(Data, InMat.data, InMat.total() * InMat.elemSize());//copy the data
		Mip.BulkData.Unlock();
		webcamTexture->PostEditChange();
		webcamTexture->UpdateResource();
	}
	//if the texture hasnt the right pixel format, abort.
	webcamTexture->PostEditChange();
	webcamTexture->UpdateResource();
}



void ADesktopGameModeBase::ScreensToCVMats()
{
	HDC hScreenDC = GetDC(NULL);
	HDC hMemoryDC = CreateCompatibleDC(hScreenDC);
	int screenWidth = GetDeviceCaps(hScreenDC, HORZRES);
	int screenHeight = GetDeviceCaps(hScreenDC, VERTRES);

	HBITMAP hBitmap = CreateCompatibleBitmap(hScreenDC, screenWidth, screenHeight);
	HBITMAP hOldBitmap = (HBITMAP)SelectObject(hMemoryDC, hBitmap);

	//screen 1
	BitBlt(hMemoryDC, 0, 0, screenWidth, screenHeight, hScreenDC, 0, 0, SRCCOPY);
	GetBitmapBits(hBitmap, imageScreen1.total() * imageScreen1.elemSize(), imageScreen1.data);

	//screen 2
	BitBlt(hMemoryDC, 0, 0, screenWidth, screenHeight, hScreenDC, 1920, 0, SRCCOPY);
	GetBitmapBits(hBitmap, imageScreen2.total() * imageScreen2.elemSize(), imageScreen2.data);

	//screen 3
	BitBlt(hMemoryDC, 0, 0, screenWidth, screenHeight, hScreenDC, 3840, 0, SRCCOPY);
	GetBitmapBits(hBitmap, imageScreen3.total() * imageScreen3.elemSize(), imageScreen3.data);
	SelectObject(hMemoryDC, hOldBitmap);


	DeleteDC(hScreenDC);
	DeleteDC(hMemoryDC);

	DeleteObject(hBitmap);
	DeleteObject(hOldBitmap);

}


void ADesktopGameModeBase::CVMatsToTextures()
{
	for (int i = 0; i < 3; i++)
	{
		if (i == 0)
		{
			FTexture2DMipMap& Mip = imageTextureScreen1->GetPlatformData()->Mips[0];
			void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
			FMemory::Memcpy(Data, imageScreen1.data, imageScreen1.total() * imageScreen1.elemSize());//copy the data
			Mip.BulkData.Unlock();

			imageTextureScreen1->PostEditChange();
			imageTextureScreen1->UpdateResource();
		}
		else if (i == 1)
		{
			FTexture2DMipMap& Mip = imageTextureScreen2->GetPlatformData()->Mips[0];
			void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
			FMemory::Memcpy(Data, imageScreen2.data, imageScreen2.total() * imageScreen2.elemSize());//copy the data
			Mip.BulkData.Unlock();

			imageTextureScreen2->PostEditChange();
			imageTextureScreen2->UpdateResource();
		}
		else if (i == 2)
		{
			FTexture2DMipMap& Mip = imageTextureScreen3->GetPlatformData()->Mips[0];
			void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
			FMemory::Memcpy(Data, imageScreen3.data, imageScreen3.total() * imageScreen3.elemSize());//copy the data
			Mip.BulkData.Unlock();

			imageTextureScreen3->PostEditChange();
			imageTextureScreen3->UpdateResource();
		}


	}
}





void ADesktopGameModeBase::get_pitch_yaw(cv::Point3f pt_start, cv::Point3f pt_end, float& pitch, float& yaw) {
	float dx = pt_end.x - pt_start.x;
	float dy = pt_end.y - pt_start.y;
	dy *= -1;
	yaw = std::atan2(dy, dx) * 180 / CV_PI;
	yaw = yaw - 90;

	float dz = pt_end.z - pt_start.z;
	float xy_norm = std::sqrt(dx * dx + dy * dy);
	pitch = std::atan2(dz, xy_norm) * 180 / CV_PI;
	pitch *= -1;
}

void ADesktopGameModeBase::make_map_for_rotators(std::vector<cv::Mat> denorm_imgs_landmarks)
{
	for (auto& denorm_landmarks : denorm_imgs_landmarks)
	{
		std::vector<std::array<int, 2>> HAND_CONNECTIONS = blaze.HAND_CONNECTIONS;
		for (auto& hand_conns_index : hand_conns_indexes)
		{
			std::array<int, 2> hand_conns = HAND_CONNECTIONS.at(hand_conns_index);
			float pitch, yaw;
			cv::Point3f pt_start, pt_end;

			pt_start.x = denorm_landmarks.at<float>(hand_conns.at(0), 0);
			pt_start.y = denorm_landmarks.at<float>(hand_conns.at(0), 1);
			pt_start.z = denorm_landmarks.at<float>(hand_conns.at(0), 2);

			pt_end.x = denorm_landmarks.at<float>(hand_conns.at(1), 0);
			pt_end.y = denorm_landmarks.at<float>(hand_conns.at(1), 1);
			pt_end.z = denorm_landmarks.at<float>(hand_conns.at(1), 2);


			get_pitch_yaw(pt_start, pt_end, pitch, yaw);
			MapPitch.Add(hand_conns_index, pitch);
			MapYaw.Add(hand_conns_index, yaw);
		}
	}
}


/*
	std::vector<std::array<int, 2>> HAND_CONNECTIONS = {
	{0, 1}, {1, 2}, {2, 3}, {3, 4},
	{5, 6}, {6, 7}, {7, 8},
	{9, 10}, {10, 11}, {11, 12},
	{13, 14}, {14, 15}, {15, 16},
	{17, 18}, {18, 19}, {19, 20},
	{0, 5}, {5, 9}, {9, 13}, {13, 17}, {0, 17}
	};
*/


void ADesktopGameModeBase::make_map_bone()
{
	MapBoneLeft.Add(2, FString("b_l_thumb2"));
	MapBoneLeft.Add(3, FString("b_l_thumb3"));
	MapBoneLeft.Add(4, FString("b_l_index1"));
	MapBoneLeft.Add(5, FString("b_l_index2"));
	MapBoneLeft.Add(6, FString("b_l_index3"));
	MapBoneLeft.Add(7, FString("b_l_middle1"));
	MapBoneLeft.Add(8, FString("b_l_middle2"));
	MapBoneLeft.Add(9, FString("b_l_middle3"));
	MapBoneLeft.Add(10, FString("b_l_ring1"));
	MapBoneLeft.Add(11, FString("b_l_ring2"));
	MapBoneLeft.Add(12, FString("b_l_ring3"));
	MapBoneLeft.Add(13, FString("b_l_pinky1"));
	MapBoneLeft.Add(14, FString("b_l_pinky2"));
	MapBoneLeft.Add(15, FString("b_l_pinky3"));


	MapBoneRight.Add(2, FString("b_r_thumb2"));
	MapBoneRight.Add(3, FString("b_r_thumb3"));
	MapBoneRight.Add(4, FString("b_r_index1"));
	MapBoneRight.Add(5, FString("b_r_index2"));
	MapBoneRight.Add(6, FString("b_r_index3"));
	MapBoneRight.Add(7, FString("b_r_middle1"));
	MapBoneRight.Add(8, FString("b_r_middle2"));
	MapBoneRight.Add(9, FString("b_r_middle3"));
	MapBoneRight.Add(10, FString("b_r_ring1"));
	MapBoneRight.Add(11, FString("b_r_ring2"));
	MapBoneRight.Add(12, FString("b_r_ring3"));
	MapBoneRight.Add(13, FString("b_r_pinky1"));
	MapBoneRight.Add(14, FString("b_r_pinky2"));
	MapBoneRight.Add(15, FString("b_r_pinky3"));


}

출력 시켜봣는데 yaw는 좀 안맞는거같은데 피치는 대강 맞는듯하다

일단 본 회전하는걸 해봤는데

이유를 모르겠지만 되긴한다.

하지만 월드 공간 기준으로 회전해서 되지

컴포넌트 공간 기준 회전으론 잘 동작되진 않는다.

https://www.youtube.com/watch?v=ZmiZ6VkSJBE

https://www.youtube.com/watch?v=9dZjcFW3BRY

대충 손 모델 가져와서 스켈레탈 메시로 만들어 시도해보려 했으나 일단망함

저작자표시

'컴퓨터과학 > 언리얼' 카테고리의 다른 글

HandDesktop14 - 손 관절 가지고 놀기2 컴포넌트공간 관절회전 (0)	2024.02.01
HandDesktop13 - 손 관절 가지고 놀기 (0)	2024.02.01
HandDesktop11 - 손 메시 제어하기 (0)	2024.01.31
HandDesktop10 - blazehand c++, unreal 적용 (0)	2024.01.29
HandDesktop09 - 기존 문제들 원인 찾음, blaze 갈아엎기, 블라즈팜 문제해결 (0)	2024.01.26

HandDesktop11 - 손 메시 제어하기

ggamji 2024. 1. 31. 10:04

2024. 1. 31. 10:04

언리얼에서 블라즈핸드 돌렸고

이제 블라즈 핸드로 손제어하는걸 만들려고 한다.

손 모델을 오큘러스에 있는 손을 구하려고 찾아보다가 본 영상

oculus vr 깃헙에서 손을 받을수 있어보였다.

https://www.youtube.com/watch?v=CKIdogPFZg8

위 영상에 나온데로 깃헙에 들어가려고하면 404가 나오는데

에픽 게임즈에 깃헙 계정을 등록안했기 때문

https://developer.oculus.com/documentation/unreal/unreal-compatibility-matrix/

Oculus Source Distribution for Unreal Engine | Oculus Developers

developer.oculus.com

여기서 시키는대로 에픽게임즈에 깃헙아이디 등록하고

깃헙 주소로 바로들어가려고해도 안될텐데

깃헙 이메일 주소로 에픽게임즈 연결 메일을 확인까지 해줘야 들어가진다.

들어와진건 좋은데

아직 VR을 다룰 능력은 안되고

손모델만 찾으러 왔는데

여기로 들어오면 언리얼 핸드샘플들을 구할수 있음

https://github.com/oculus-samples/Unreal-HandSample

BP대신 여기있는 모델들을 가져와서 사용

https://github.com/oculus-samples/Unreal-HandSample/tree/main/Content/Hands/Models

OculusHand_L.uasset

를 열면 이런식으로 나오는데

어두운 뼈들은 왜 있는질 잘 모르겠음.

보기 지저분하니

Hand_L 스켈레탈 메시 복사로 따로빼서

회색 본들을 지우면 그래도 알아볼만큼 된다.

대충 보면 블라즈 핸드의 스캘래톤 형태와 비슷한것 같지만

관절수가 조금 맞진 않는다.

그럼 이 손 스켈레탈 메시를 게임에 어떻게 가져와서 쓰나 싶었는데

스켈레탈 메시 컴포넌트로 위 메시를 가져와도

애니메이션을 스켈레탈메시로 동작시킨것 처럼

게임에서는 원하는데로 회전시킬수 없었다.

대신 포저블 매시 컴포넌트로 쓰면 된다고 하는데

일단

HandLeft 액터 c++ 생성하고

Hand_L 스켈레탈 메시를 컨스트럭터 헬퍼로 로드하고

포저블매시컴포넌트 HandLeft에 SetSkeletalMesh를 해주면

스켈레탈 매시를 포저블 매시 컴포넌트로 사용가능하다.

// Fill out your copyright notice in the Description page of Project Settings.

#pragma once

#include "Components/PoseableMeshComponent.h"
#include "CoreMinimal.h"
#include "GameFramework/Actor.h"
#include "HandLeft.generated.h"

UCLASS()
class HANDDESKTOP_API AHandLeft : public AActor
{
	GENERATED_BODY()
	
public:	
	// Sets default values for this actor's properties
	AHandLeft();

protected:
	// Called when the game starts or when spawned
	virtual void BeginPlay() override;

public:	
	// Called every frame
	virtual void Tick(float DeltaTime) override;


	UPROPERTY(EditAnywhere, BlueprintReadWrite, Category=PoseableMesh)
	UPoseableMeshComponent* HandLeft;

};

// Fill out your copyright notice in the Description page of Project Settings.


#include "HandLeft.h"

// Sets default values
AHandLeft::AHandLeft()
{
 	// Set this actor to call Tick() every frame.  You can turn this off to improve performance if you don't need it.
	PrimaryActorTick.bCanEverTick = true;



	// 스켈레탈 메시 컴포넌트 생성 및 부착
	ConstructorHelpers::FObjectFinder<USkeletalMesh> TempHandMesh(
		TEXT("SkeletalMesh'/Game/MyContent/Hand_L.Hand_L'")
	);
	HandLeft = CreateDefaultSubobject<UPoseableMeshComponent>(TEXT("HandLeft"));

	HandLeft->SetSkeletalMesh(TempHandMesh.Object);
	HandLeft->SetRelativeLocation(FVector(0, 0, 0));
	//HandLeft->SetupAttachment(RootComponent);




}

// Called when the game starts or when spawned
void AHandLeft::BeginPlay()
{
	Super::BeginPlay();
	
}

// Called every frame
void AHandLeft::Tick(float DeltaTime)
{
	Super::Tick(DeltaTime);

}

이렇게 작성한

HandLeft.cpp로 블루프린트 생성,

뷰포트에 추가해서 회전이랑 스케일링해주면 이런식으로 오큘러스 손이 나온다.

포저블 매시 본을 어떻게 회전시키나 찾아보면

이런식으로 제공되는 함수들이 보임

https://docs.unrealengine.com/4.26/en-US/BlueprintAPI/Components/PoseableMesh/

Poseable Mesh

docs.unrealengine.com

매 틱마다 y축으로 +1도씩

새끼 손가락 1번관절을 회전시켜주는 블루프린트 작성

저작자표시

'컴퓨터과학 > 언리얼' 카테고리의 다른 글

HandDesktop13 - 손 관절 가지고 놀기 (0)	2024.02.01
HandDesktop12 - 손 관절 회전 방향 구하기(fail?) (0)	2024.02.01
HandDesktop10 - blazehand c++, unreal 적용 (0)	2024.01.29
HandDesktop09 - 기존 문제들 원인 찾음, blaze 갈아엎기, 블라즈팜 문제해결 (0)	2024.01.26
HandDesktop08 - blaze hand 사용하기 (0)	2024.01.26

HandDesktop10 - blazehand c++, unreal 적용

ggamji 2024. 1. 29. 09:53

2024. 1. 29. 09:53

이전에 파이썬으로 블라즈핸드 돌렸는데

이번에는 C++로 적용

아래는 블라즈팜, 블라즈핸즈 돌려보는 단일 코드

#include <opencv2/opencv.hpp>


#include <vector>

std::vector<std::array<int, 2>> HAND_CONNECTIONS = {
    {0, 1}, {1, 2}, {2, 3}, {3, 4},
    {5, 6}, {6, 7}, {7, 8},
    {9, 10}, {10, 11}, {11, 12},
    {13, 14}, {14, 15}, {15, 16},
    {17, 18}, {18, 19}, {19, 20},
    {0, 5}, {5, 9}, {9, 13}, {13, 17}, {0, 17}
};


// var and funcs for blazepalm
struct PalmDetection {
    float ymin;
    float xmin;
    float ymax;
    float xmax;
    cv::Point2d kp_arr[7];
    float score;
};

int blazeHandSize = 256;
int blazePalmSize = 128;
float palmMinScoreThresh = 0.7;
float palmMinNMSThresh = 0.4;
int palmMinNumKeyPoints = 7;

void ResizeAndPad(
    cv::Mat& srcimg, cv::Mat& img256,
    cv::Mat& img128, float& scale, cv::Scalar& pad
);



std::vector<PalmDetection> PredictPalmDetections(cv::Mat& img, cv::Mat& frame);
PalmDetection GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
    int stride, int anchor_count, int column, int row, int anchor, int offset);
float sigmoid(float x);
std::vector<PalmDetection> DenormalizePalmDetections(std::vector<PalmDetection> detections, int width, int height, cv::Scalar pad);
void DrawPalmDetections(cv::Mat& img, std::vector<PalmDetection> denormDets);
std::vector<PalmDetection> FilteringDets(std::vector<PalmDetection> detections, int width, int height);




std::vector<cv::Rect> convertHandRects(std::vector<PalmDetection> filteredDets);
void DrawRects(cv::Mat& img, std::vector<cv::Rect> rects);
void GetHandImages(cv::Mat& img, std::vector<cv::Rect> rects, std::vector<cv::Mat>& handImgs);
std::vector<cv::Mat> PredictHandDetections(std::vector<cv::Mat>& imgs);
std::vector<cv::Mat> DenormalizeHandLandmarks(std::vector<cv::Mat> imgs_landmarks, std::vector<cv::Rect> rects);
void DrawHandDetections(cv::Mat img, std::vector<cv::Mat> denorm_imgs_landmarks);


cv::dnn::Net blazePalm = cv::dnn::readNetFromONNX("c:/blazepalm_old.onnx");

cv::dnn::Net blazeHand = cv::dnn::readNetFromONNX("c:/blazehand.onnx");
float blazeHandDScale = 3;

int webcamWidth = 640;
int webcamHeight = 480;



cv::Mat img256;
cv::Mat img128;
float scale;
cv::Scalar pad;



int main() {
    cv::Mat frame;

    cv::VideoCapture cap(0);
    if (!cap.isOpened()) {
        std::cout << "Failed to open camera" << std::endl;
        return -1;
    }


    while (true) {
        bool ret = cap.read(frame);
        if (!ret)
        {
            break;
        }



        ResizeAndPad(frame, img256, img128, scale, pad);
        //UE_LOG(LogTemp, Log, TEXT("scale value: %f, pad value: (%f, %f)"), scale, pad[0], pad[1]);
        std::vector<PalmDetection> normDets = PredictPalmDetections(img128, frame);
        std::vector<PalmDetection> denormDets = DenormalizePalmDetections(normDets, webcamWidth, webcamHeight, pad);
        std::vector<PalmDetection> filteredDets = FilteringDets(denormDets, webcamWidth, webcamHeight);



        std::vector<cv::Rect> handRects = convertHandRects(filteredDets);
        std::vector<cv::Mat> handImgs;
        GetHandImages(frame, handRects, handImgs);
        std::vector<cv::Mat> imgs_landmarks = PredictHandDetections(handImgs);
        std::vector<cv::Mat> denorm_imgs_landmarks = DenormalizeHandLandmarks(imgs_landmarks, handRects);





        std::string dets_size_str = "filtered dets : " + std::to_string(filteredDets.size()) + ", norm dets : " + std::to_string(normDets.size()) + ", denorm dets : " + std::to_string(denormDets.size());
        cv::putText(frame, dets_size_str, cv::Point(30, 30), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(255, 0, 0), 2);

        DrawRects(frame, handRects);
        DrawPalmDetections(frame, filteredDets);
        DrawHandDetections(frame, denorm_imgs_landmarks);


        cv::imshow("Camera Streaming", frame);
        cv::imshow("img256", img256);
        cv::imshow("img128", img128);

        if (cv::waitKey(1) == 'q') {
            break;
        }
    }

    cap.release();

    cv::destroyAllWindows();

    return 0;
}






void ResizeAndPad(
    cv::Mat& srcimg, cv::Mat& img256,
    cv::Mat& img128, float& scale, cv::Scalar& pad
)
{
    float h1, w1;
    int padw, padh;

    cv::Size size0 = srcimg.size();
    if (size0.height >= size0.width) {
        h1 = 256;
        w1 = 256 * size0.width / size0.height;
        padh = 0;
        padw = static_cast<int>(256 - w1);
        scale = size0.width / static_cast<float>(w1);
    }
    else {
        h1 = 256 * size0.height / size0.width;
        w1 = 256;
        padh = static_cast<int>(256 - h1);
        padw = 0;
        scale = size0.height / static_cast<float>(h1);
    }

    //UE_LOG(LogTemp, Log, TEXT("scale value: %f, size0.height: %d, size0.width : %d, h1 : %f"), scale, size0.height, size0.width, h1);

    int padh1 = static_cast<int>(padh / 2);
    int padh2 = static_cast<int>(padh / 2) + static_cast<int>(padh % 2);
    int padw1 = static_cast<int>(padw / 2);
    int padw2 = static_cast<int>(padw / 2) + static_cast<int>(padw % 2);
    pad = cv::Scalar(static_cast<float>(padh1 * scale), static_cast<float>(padw1 * scale));


    cv::resize(srcimg, img256, cv::Size(w1, h1));
    cv::copyMakeBorder(img256, img256, padh1, padh2, padw1, padw2, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
    cv::resize(img256, img128, cv::Size(128, 128));
    cv::cvtColor(img256, img256, cv::COLOR_BGR2RGB);
    cv::cvtColor(img128, img128, cv::COLOR_BGR2RGB);
}






std::vector<PalmDetection> PredictPalmDetections(cv::Mat& img, cv::Mat& frame)
{
    std::vector<PalmDetection> beforeNMSResults;
    std::vector<PalmDetection> afterNMSResults;
    std::vector<float> scores;
    std::vector<int> indices;
    std::vector<cv::Rect> boundingBoxes;


    cv::Mat tensor;
    img.convertTo(tensor, CV_32F);
    tensor = tensor / 127.5 - 1;
    cv::Mat blob = cv::dnn::blobFromImage(tensor, 1.0, tensor.size(), 0, false, false, CV_32F);
    std::vector<cv::String> outNames(2);
    outNames[0] = "regressors";
    outNames[1] = "classificators";

    blazePalm.setInput(blob);
    std::vector<cv::Mat> outputs;
    blazePalm.forward(outputs, outNames);

    cv::Mat classificator = outputs[0];
    cv::Mat regressor = outputs[1];


    for (int y = 0; y < 16; ++y) {
        for (int x = 0; x < 16; ++x) {
            for (int a = 0; a < 2; ++a) {
                PalmDetection res = GetPalmDetection(regressor, classificator, 8, 2, x, y, a, 0);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);

                    cv::Point2d startPt = cv::Point2d(res.xmin * 128, res.ymin * 128);
                    cv::Point2d endPt = cv::Point2d(res.xmax * 128, res.ymax * 128);

                    boundingBoxes.push_back(cv::Rect(startPt, endPt));
                    scores.push_back(res.score);
                }
            }
        }
    }

    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x) {
            for (int a = 0; a < 6; ++a) {
                PalmDetection res = GetPalmDetection(regressor, classificator, 16, 6, x, y, a, 512);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);
                    cv::Point2d startPt = cv::Point2d(res.xmin * 128, res.ymin * 128);
                    cv::Point2d endPt = cv::Point2d(res.xmax * 128, res.ymax * 128);

                    boundingBoxes.push_back(cv::Rect(startPt, endPt));
                    scores.push_back(res.score);
                }
            }
        }
    }






    std::vector<PalmDetection> denormBeforeNMSResults = DenormalizePalmDetections(beforeNMSResults, 640, 480, cv::Scalar(80, 0));


    for (auto& denormBeforeNMSResult : denormBeforeNMSResults)
    {


        cv::Point2d startPt = cv::Point2d(denormBeforeNMSResult.xmin, denormBeforeNMSResult.ymin);
        cv::Point2d endPt = cv::Point2d(denormBeforeNMSResult.xmax, denormBeforeNMSResult.ymax);
        //cv::rectangle(frame, cv::Rect(startPt, endPt), cv::Scalar(0, 0, 255), 1);


        std::string score_str = std::to_string(static_cast<int>(denormBeforeNMSResult.score * 100));
        //cv::putText(frame, score_str, cv::Point(startPt.x, startPt.y - 10), cv::FONT_HERSHEY_SIMPLEX, 0.4, cv::Scalar(0, 0, 255), 2);


        /*
        for (int i = 0; i < palmMinNumKeyPoints; i++)
            cv::circle(frame, denormBeforeNMSResult.kp_arr[i], 3, cv::Scalar(0, 0, 255), -1);
        */
    }


    cv::dnn::NMSBoxes(boundingBoxes, scores, palmMinScoreThresh, palmMinNMSThresh, indices);



    for (int i = 0; i < indices.size(); i++) {
        int idx = indices[i];
        afterNMSResults.push_back(beforeNMSResults[idx]);
    }










    std::vector<PalmDetection> denormAfterNMSResults = DenormalizePalmDetections(afterNMSResults, 640, 480, cv::Scalar(80, 0));


    for (auto& denormAfterNMSResult : denormAfterNMSResults)
    {


        cv::Point2d startPt = cv::Point2d(denormAfterNMSResult.xmin, denormAfterNMSResult.ymin);
        cv::Point2d endPt = cv::Point2d(denormAfterNMSResult.xmax, denormAfterNMSResult.ymax);
        cv::rectangle(frame, cv::Rect(startPt, endPt), cv::Scalar(0, 255, 255), 2);


        for (int i = 0; i < palmMinNumKeyPoints; i++)
            cv::circle(frame, denormAfterNMSResult.kp_arr[i], 6, cv::Scalar(0, 255, 255), -1);
    }












    return afterNMSResults;
}

PalmDetection GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
    int stride, int anchor_count, int column, int row, int anchor, int offset) {

    PalmDetection res;

    int index = (int(row * 128 / stride) + column) * anchor_count + anchor + offset;
    float origin_score = regressor.at<float>(0, index, 0);
    float score = sigmoid(origin_score);
    if (score < palmMinScoreThresh) return res;

    float x = classificator.at<float>(0, index, 0);
    float y = classificator.at<float>(0, index, 1);
    float w = classificator.at<float>(0, index, 2);
    float h = classificator.at<float>(0, index, 3);

    x += (column + 0.5) * stride - w / 2;
    y += (row + 0.5) * stride - h / 2;

    res.ymin = (y) / blazePalmSize;
    res.xmin = (x) / blazePalmSize;
    res.ymax = (y + h) / blazePalmSize;
    res.xmax = (x + w) / blazePalmSize;

    //std::cout << "score : " << score << ", coord : " << res.ymin << ", " << res.xmin << ", " << res.ymax << ", " << res.xmax << std::endl;
    //if ((res.ymin < 0) || (res.xmin < 0) || (res.xmax > 1) || (res.ymax > 1)) return res;

    res.score = score;

    std::vector<cv::Point2d> kpts;
    for (int key_id = 0; key_id < palmMinNumKeyPoints; key_id++)
    {
        float kpt_x = classificator.at<float>(0, index, 4 + key_id * 2);
        float kpt_y = classificator.at<float>(0, index, 5 + key_id * 2);
        kpt_x += (column + 0.5) * stride;
        kpt_x = kpt_x / blazePalmSize;
        kpt_y += (row + 0.5) * stride;
        kpt_y = kpt_y / blazePalmSize;
        //UE_LOG(LogTemp, Log, TEXT("kpt id(%d) : (%f, %f)"), key_id, kpt_x, kpt_y);
        res.kp_arr[key_id] = cv::Point2d(kpt_x, kpt_y);

    }
    return res;
}

float sigmoid(float x) {
    return 1 / (1 + exp(-x));
}


std::vector<PalmDetection> DenormalizePalmDetections(std::vector<PalmDetection> detections, int width, int height, cv::Scalar pad)
{

    std::vector<PalmDetection> denormDets;

    int scale = 0;
    if (width > height)
        scale = width;
    else
        scale = height;

    for (auto& det : detections)
    {
        PalmDetection denormDet;
        denormDet.ymin = det.ymin * scale - pad[0];
        denormDet.xmin = det.xmin * scale - pad[1];
        denormDet.ymax = det.ymax * scale - pad[0];
        denormDet.xmax = det.xmax * scale - pad[1];
        denormDet.score = det.score;

        for (int i = 0; i < palmMinNumKeyPoints; i++)
        {
            cv::Point2d pt_new = cv::Point2d(det.kp_arr[i].x * scale - pad[1], det.kp_arr[i].y * scale - pad[0]);
            //UE_LOG(LogTemp, Log, TEXT("denorm kpt id(%d) : (%f, %f)"), i, pt_new.x, pt_new.y);
            denormDet.kp_arr[i] = pt_new;
        }
        denormDets.push_back(denormDet);
    }
    return denormDets;
}

void DrawPalmDetections(cv::Mat& img, std::vector<PalmDetection> denormDets)
{
    for (auto& denormDet : denormDets)
    {
        cv::Point2d startPt = cv::Point2d(denormDet.xmin, denormDet.ymin);
        cv::Point2d endPt = cv::Point2d(denormDet.xmax, denormDet.ymax);
        cv::rectangle(img, cv::Rect(startPt, endPt), cv::Scalar(255, 0, 0), 1);

        for (int i = 0; i < palmMinNumKeyPoints; i++)
            cv::circle(img, denormDet.kp_arr[i], 5, cv::Scalar(255, 0, 0), -1);

        std::string score_str = std::to_string(static_cast<int>(denormDet.score * 100)) + "%";
        cv::putText(img, score_str, cv::Point(startPt.x, startPt.y - 20), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 255), 2);

    }
}


std::vector<PalmDetection> FilteringDets(std::vector<PalmDetection> detections, int width, int height)
{
    std::vector<PalmDetection> filteredDets;

    for (auto& denormDet : detections)
    {
        cv::Point2d startPt = cv::Point2d(denormDet.xmin, denormDet.ymin);
        if (startPt.x < 10 || startPt.y < 10)
            continue;
        if (startPt.x > width || startPt.y > height)
            continue;
        int w = denormDet.xmax - denormDet.xmin;
        int y = denormDet.ymax - denormDet.ymin;
        if ((w * y < 40 * 40) || (w * y > (width * 0.7) * (height * 0.7)))
            continue;
        filteredDets.push_back(denormDet);
    }
    return filteredDets;
}








std::vector<cv::Rect> convertHandRects(std::vector<PalmDetection> filteredDets)
{
    std::vector<cv::Rect> handRects;
    for (auto& det : filteredDets)
    {
        int width = det.xmax - det.xmin; 
        int height = det.ymax - det.ymin;
        int large_length = 0;

        int center_x = static_cast<int>((det.xmax + det.xmin) / 2);
        int center_y = static_cast<int>((det.ymax + det.ymin) / 2);

        if (width > height)
            large_length = height * blazeHandDScale;
        else
            large_length = height * blazeHandDScale;

        int start_x = center_x - static_cast<int>(large_length / 2);
        int start_y = center_y - static_cast<int>(large_length / 2) - large_length * 0.1;

        int end_x = center_x + static_cast<int>(large_length / 2);
        int end_y = center_y + static_cast<int>(large_length / 2) - large_length * 0.1;


        start_x = std::max(start_x, 0);
        start_y = std::max(start_y, 0);
        end_x = std::min(end_x, webcamWidth);
        end_y = std::min(end_y, webcamHeight);

        cv::Point2d startPt = cv::Point2d(start_x, start_y);
        cv::Point2d endPt = cv::Point2d(end_x, end_y);

        cv::Rect handRect(startPt, endPt);
        handRects.push_back(handRect);
    }

    return handRects;
}

void DrawRects(cv::Mat& img, std::vector<cv::Rect> rects)
{
    for (auto& rect : rects)
    {
        cv::rectangle(img, rect, cv::Scalar(255, 255, 255), 1);
    }
}

void GetHandImages(cv::Mat& img, std::vector<cv::Rect> rects, std::vector<cv::Mat>& handImgs)
{
    for (auto& rect : rects)
    {

        //std::cout << "img size : " << img.size() << ", rect : " << rect << std::endl;

        //cv::Mat roi(cv::Size(rect.width, rect.height), img.type(), cv::Scalar(0, 0, 0))

        cv::Mat roi = img(rect).clone();
        cv::cvtColor(roi, roi, cv::COLOR_BGR2RGB);
        cv::resize(roi, roi, cv::Size(blazeHandSize, blazeHandSize));
        handImgs.push_back(roi);
    }
}



std::vector<cv::Mat> PredictHandDetections(std::vector<cv::Mat>& imgs)
{
    std::vector<cv::Mat> imgs_landmarks;


    for (auto& img : imgs)
    {
        cv::Mat tensor;
        img.convertTo(tensor, CV_32F);
        tensor = tensor / blazeHandSize;
        cv::Mat blob = cv::dnn::blobFromImage(tensor, 1.0, tensor.size(), 0, false, false, CV_32F);
        std::vector<cv::String> outNames(3);
        outNames[0] = "hand_flag";
        outNames[1] = "handedness";
        outNames[2] = "landmarks";



        blazeHand.setInput(blob);
        std::vector<cv::Mat> outputs;
        blazeHand.forward(outputs, outNames);

        cv::Mat handflag = outputs[0];
        cv::Mat handedness = outputs[1];
        cv::Mat landmarks = outputs[2];


        imgs_landmarks.push_back(landmarks);
    }
    return imgs_landmarks;
}

std::vector<cv::Mat> DenormalizeHandLandmarks(std::vector<cv::Mat> imgs_landmarks, std::vector<cv::Rect> rects)
{
    std::vector<cv::Mat> denorm_imgs_landmarks;

    for (int i = 0; i < imgs_landmarks.size(); i++)
    {
        cv::Mat landmarks = imgs_landmarks.at(i);
        cv::Rect rect = rects.at(i);
        cv::Mat squeezed = landmarks.reshape(0, landmarks.size[1]);

        //std::cout << "squeezed size: " << squeezed.size[0] << "," << squeezed.size[1] << std::endl;

        for (int j = 0; j < squeezed.size[0]; j++)
        {
            squeezed.at<float>(j, 0) = squeezed.at<float>(j, 0) * rect.width + rect.x;
            squeezed.at<float>(j, 1) = squeezed.at<float>(j, 1) * rect.height + rect.y;
            squeezed.at<float>(j, 2) = squeezed.at<float>(j, 2) * rect.height * -1;


        }
        denorm_imgs_landmarks.push_back(squeezed);
    }

    //std::cout << std::endl;
    return denorm_imgs_landmarks;
}

void DrawHandDetections(cv::Mat img, std::vector<cv::Mat> denorm_imgs_landmarks)
{
    for (auto& denorm_landmarks : denorm_imgs_landmarks)
    {
        for (int i = 0; i < denorm_landmarks.size[0]; i++)
        {
            float x = denorm_landmarks.at<float>(i, 0);
            float y = denorm_landmarks.at<float>(i, 1);

            cv::Point pt(x, y);

            cv::circle(img, pt, 5, cv::Scalar(0, 255, 0), -1);

        }

        for (auto& connection : HAND_CONNECTIONS)
        {
            int startPtIdx = connection[0];
            int endPtIdx = connection[1];

            float startPtX = denorm_landmarks.at<float>(startPtIdx, 0);
            float startPtY = denorm_landmarks.at<float>(startPtIdx, 1);
            float endPtX = denorm_landmarks.at<float>(endPtIdx, 0);
            float endPtY = denorm_landmarks.at<float>(endPtIdx, 1);
            cv::line(img, cv::Point(startPtX, startPtY), cv::Point(endPtX, endPtY), cv::Scalar(255, 255, 255), 3);


        }
    }
}

언리얼에 적용 결과

블라즈 해더 cpp

// Fill out your copyright notice in the Description page of Project Settings.

#pragma once
#include "PreOpenCVHeaders.h"
#include <opencv2/opencv.hpp>
#include "PostOpenCVHeaders.h"

#include "CoreMinimal.h"

/**
 * 
 */
class HANDDESKTOP_API Blaze
{
public:
	Blaze();
	~Blaze();

	cv::dnn::Net blazePalm;
	cv::dnn::Net blazeHand;



	// var and funcs for blazepalm
	struct PalmDetection {
		float ymin;
		float xmin;
		float ymax;
		float xmax;
		cv::Point2d kp_arr[7];
		float score;
	};


	int blazeHandSize = 256;
	int blazePalmSize = 128;
	float palmMinScoreThresh = 0.7;
	float palmMinNMSThresh = 0.4;
	int palmMinNumKeyPoints = 7;

	void ResizeAndPad(
		cv::Mat& srcimg, cv::Mat& img256,
		cv::Mat& img128, float& scale, cv::Scalar& pad
	);

	// funcs for blaze palm
	std::vector<PalmDetection> PredictPalmDetections(cv::Mat& img);
	PalmDetection GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
		int stride, int anchor_count, int column, int row, int anchor, int offset);
	float sigmoid(float x);
	std::vector<PalmDetection> DenormalizePalmDetections(std::vector<PalmDetection> detections, int width, int height, cv::Scalar pad);
	void DrawPalmDetections(cv::Mat& img, std::vector<Blaze::PalmDetection> denormDets);
	void DrawDetsInfo(cv::Mat& img, std::vector<Blaze::PalmDetection> filteredDets, std::vector<Blaze::PalmDetection> normDets, std::vector<Blaze::PalmDetection> denormDets);
	std::vector<PalmDetection> FilteringDets(std::vector<PalmDetection> detections, int width, int height);




	// vars and funcs for blaze hand
	std::vector<std::array<int, 2>> HAND_CONNECTIONS = {
	{0, 1}, {1, 2}, {2, 3}, {3, 4},
	{5, 6}, {6, 7}, {7, 8},
	{9, 10}, {10, 11}, {11, 12},
	{13, 14}, {14, 15}, {15, 16},
	{17, 18}, {18, 19}, {19, 20},
	{0, 5}, {5, 9}, {9, 13}, {13, 17}, {0, 17}
	};
	float blazeHandDScale = 3;
	int webcamWidth = 640;
	int webcamHeight = 480;



	std::vector<cv::Rect> convertHandRects(std::vector<PalmDetection> filteredDets);
	void DrawRects(cv::Mat& img, std::vector<cv::Rect> rects);
	void GetHandImages(cv::Mat& img, std::vector<cv::Rect> rects, std::vector<cv::Mat>& handImgs);
	std::vector<cv::Mat> PredictHandDetections(std::vector<cv::Mat>& imgs);
	std::vector<cv::Mat> DenormalizeHandLandmarks(std::vector<cv::Mat> imgs_landmarks, std::vector<cv::Rect> rects);
	void DrawHandDetections(cv::Mat img, std::vector<cv::Mat> denorm_imgs_landmarks);
};

// Fill out your copyright notice in the Description page of Project Settings.


#include "Blaze.h"

Blaze::Blaze()
{
	this->blazePalm = cv::dnn::readNet("c:/blazepalm_old.onnx");
	this->blazeHand = cv::dnn::readNet("c:/blazehand.onnx");
}

Blaze::~Blaze()
{
}


void Blaze::ResizeAndPad(
    cv::Mat& srcimg, cv::Mat& img256,
    cv::Mat& img128, float& scale, cv::Scalar& pad
)
{
    float h1, w1;
    int padw, padh;

    cv::Size size0 = srcimg.size();
    if (size0.height >= size0.width) {
        h1 = 256;
        w1 = 256 * size0.width / size0.height;
        padh = 0;
        padw = static_cast<int>(256 - w1);
        scale = size0.width / static_cast<float>(w1);
    }
    else {
        h1 = 256 * size0.height / size0.width;
        w1 = 256;
        padh = static_cast<int>(256 - h1);
        padw = 0;
        scale = size0.height / static_cast<float>(h1);
    }

    //UE_LOG(LogTemp, Log, TEXT("scale value: %f, size0.height: %d, size0.width : %d, h1 : %f"), scale, size0.height, size0.width, h1);

    int padh1 = static_cast<int>(padh / 2);
    int padh2 = static_cast<int>(padh / 2) + static_cast<int>(padh % 2);
    int padw1 = static_cast<int>(padw / 2);
    int padw2 = static_cast<int>(padw / 2) + static_cast<int>(padw % 2);
    pad = cv::Scalar(static_cast<float>(padh1 * scale), static_cast<float>(padw1 * scale));


    cv::resize(srcimg, img256, cv::Size(w1, h1));
    cv::copyMakeBorder(img256, img256, padh1, padh2, padw1, padw2, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));

    cv::resize(img256, img128, cv::Size(128, 128));
    cv::cvtColor(img256, img256, cv::COLOR_BGR2RGB);
    cv::cvtColor(img128, img128, cv::COLOR_BGR2RGB);
}



std::vector<Blaze::PalmDetection> Blaze::PredictPalmDetections(cv::Mat& img)
{
    std::vector<Blaze::PalmDetection> beforeNMSResults;
    std::vector<Blaze::PalmDetection> afterNMSResults;
    std::vector<float> scores;
    std::vector<int> indices;
    std::vector<cv::Rect> boundingBoxes;


    cv::Mat tensor;
    img.convertTo(tensor, CV_32F);
    tensor = tensor / 127.5 - 1;
    cv::Mat blob = cv::dnn::blobFromImage(tensor, 1.0, tensor.size(), 0, false, false, CV_32F);
    std::vector<cv::String> outNames(2);
    outNames[0] = "regressors";
    outNames[1] = "classificators";

    blazePalm.setInput(blob);
    std::vector<cv::Mat> outputs;
    blazePalm.forward(outputs, outNames);

    cv::Mat classificator = outputs[0];
    cv::Mat regressor = outputs[1];


    for (int y = 0; y < 16; ++y) {
        for (int x = 0; x < 16; ++x) {
            for (int a = 0; a < 2; ++a) {
                PalmDetection res = GetPalmDetection(regressor, classificator, 8, 2, x, y, a, 0);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);

                    cv::Point2d startPt = cv::Point2d(res.xmin * blazePalmSize, res.ymin * blazePalmSize);
                    cv::Point2d endPt = cv::Point2d(res.xmax * blazePalmSize, res.ymax * blazePalmSize);

                    boundingBoxes.push_back(cv::Rect(startPt, endPt));
                    scores.push_back(res.score);
                }
            }
        }
    }

    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x) {
            for (int a = 0; a < 6; ++a) {
                PalmDetection res = GetPalmDetection(regressor, classificator, 16, 6, x, y, a, 512);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);
                    cv::Point2d startPt = cv::Point2d(res.xmin * blazePalmSize, res.ymin * blazePalmSize);
                    cv::Point2d endPt = cv::Point2d(res.xmax * blazePalmSize, res.ymax * blazePalmSize);

                    boundingBoxes.push_back(cv::Rect(startPt, endPt));
                    scores.push_back(res.score);
                }
            }
        }
    }


    cv::dnn::NMSBoxes(boundingBoxes, scores, palmMinScoreThresh, palmMinNMSThresh, indices);

    for (int i = 0; i < indices.size(); i++) {
        int idx = indices[i];
        afterNMSResults.push_back(beforeNMSResults[idx]);
    }

    return afterNMSResults;
}

Blaze::PalmDetection Blaze::GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
    int stride, int anchor_count, int column, int row, int anchor, int offset) {

    Blaze::PalmDetection res;

    int index = (int(row * 128 / stride) + column) * anchor_count + anchor + offset;
    float origin_score = regressor.at<float>(0, index, 0);
    float score = sigmoid(origin_score);
    if (score < palmMinScoreThresh) return res;

    float x = classificator.at<float>(0, index, 0);
    float y = classificator.at<float>(0, index, 1);
    float w = classificator.at<float>(0, index, 2);
    float h = classificator.at<float>(0, index, 3);

    x += (column + 0.5) * stride - w / 2;
    y += (row + 0.5) * stride - h / 2;

    res.ymin = (y) / blazePalmSize;
    res.xmin = (x) / blazePalmSize;
    res.ymax = (y + h) / blazePalmSize;
    res.xmax = (x + w) / blazePalmSize;

    //if ((res.ymin < 0) || (res.xmin < 0) || (res.xmax > 1) || (res.ymax > 1)) return res;

    res.score = score;

    std::vector<cv::Point2d> kpts;
    for (int key_id = 0; key_id < palmMinNumKeyPoints; key_id++)
    {
        float kpt_x = classificator.at<float>(0, index, 4 + key_id * 2);
        float kpt_y = classificator.at<float>(0, index, 5 + key_id * 2);
        kpt_x += (column + 0.5) * stride;
        kpt_x  = kpt_x / blazePalmSize;
        kpt_y += (row + 0.5) * stride;
        kpt_y  = kpt_y / blazePalmSize;
        //UE_LOG(LogTemp, Log, TEXT("kpt id(%d) : (%f, %f)"), key_id, kpt_x, kpt_y);
        res.kp_arr[key_id] = cv::Point2d(kpt_x, kpt_y);

    }
    return res;
}

float Blaze::sigmoid(float x) {
    return 1 / (1 + exp(-x));
}


std::vector<Blaze::PalmDetection> Blaze::DenormalizePalmDetections(std::vector<Blaze::PalmDetection> detections, int width, int height, cv::Scalar pad)
{

    std::vector<Blaze::PalmDetection> denormDets;

    int scale = 0;
    if (width > height)
        scale = width;
    else
        scale = height;

    for (auto& det : detections)
    {
        Blaze::PalmDetection denormDet;
        denormDet.ymin = det.ymin * scale - pad[0];
        denormDet.xmin = det.xmin * scale - pad[1];
        denormDet.ymax = det.ymax * scale - pad[0];
        denormDet.xmax = det.xmax * scale - pad[1];
        denormDet.score = det.score;

        for (int i = 0; i < palmMinNumKeyPoints; i++)
        {
            cv::Point2d pt_new = cv::Point2d(det.kp_arr[i].x * scale - pad[1], det.kp_arr[i].y * scale - pad[0]);
            //UE_LOG(LogTemp, Log, TEXT("denorm kpt id(%d) : (%f, %f)"), i, pt_new.x, pt_new.y);
            denormDet.kp_arr[i] = pt_new;
        }
        denormDets.push_back(denormDet);
    }
    return denormDets;
}

void Blaze::DrawPalmDetections(cv::Mat& img, std::vector<Blaze::PalmDetection> denormDets)
{
    for (auto& denormDet : denormDets)
    {
        cv::Point2d startPt = cv::Point2d(denormDet.xmin, denormDet.ymin);
        cv::Point2d endPt = cv::Point2d(denormDet.xmax, denormDet.ymax);
        cv::rectangle(img, cv::Rect(startPt, endPt), cv::Scalar(255, 0, 0), 1);

        for (int i = 0; i < palmMinNumKeyPoints; i++)
            cv::circle(img, denormDet.kp_arr[i], 5, cv::Scalar(255, 0, 0), -1);

        std::string score_str = std::to_string(static_cast<int>(denormDet.score * 100))+ "%";
        cv::putText(img, score_str, cv::Point(startPt.x, startPt.y - 20), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 255), 2);

    }
}

void Blaze::DrawDetsInfo(cv::Mat& img, std::vector<Blaze::PalmDetection> filteredDets, std::vector<Blaze::PalmDetection> normDets, std::vector<Blaze::PalmDetection> denormDets)
{
    //puttext num of filtered/denorm dets
    std::string dets_size_str = "filtered dets : " + std::to_string(filteredDets.size()) + ", norm dets : " + std::to_string(normDets.size()) + ", denorm dets : " + std::to_string(denormDets.size());
    cv::putText(img, dets_size_str, cv::Point(30, 30), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(255, 0, 0), 2);

    for (int i = 0; i < filteredDets.size(); i++)
    {
        auto& det = filteredDets.at(i);

        std::ostringstream oss;
        oss << "filteredDets : (" << det.xmin << ", " << det.ymin << "),(" <<
            det.xmax << ", " << det.ymax << ")";
        std::string det_str = oss.str();
        cv::putText(img, det_str, cv::Point(30, 50 + 20 * i), cv::FONT_HERSHEY_SIMPLEX, 0.4, cv::Scalar(0, 0, 255), 2);
    }

}




std::vector<Blaze::PalmDetection> Blaze::FilteringDets(std::vector<Blaze::PalmDetection> detections, int width, int height)
{
    std::vector<Blaze::PalmDetection> filteredDets;

    for (auto& denormDet : detections)
    {
        cv::Point2d startPt = cv::Point2d(denormDet.xmin, denormDet.ymin);
        if (startPt.x < 10 || startPt.y < 10)
            continue;
        if (startPt.x > width || startPt.y > height)
            continue;
        int w = denormDet.xmax - denormDet.xmin;
        int y = denormDet.ymax - denormDet.ymin;
        if ((w * y < 40 * 40 )|| (w * y  > (width * 0.7) * (height * 0.7)))
            continue;
        filteredDets.push_back(denormDet);
    }
    return filteredDets;
}







std::vector<cv::Rect> Blaze::convertHandRects(std::vector<PalmDetection> filteredDets)
{
    std::vector<cv::Rect> handRects;
    for (auto& det : filteredDets)
    {
        int width = det.xmax - det.xmin; 
        int height = det.ymax - det.ymin;
        int large_length = 0;

        int center_x = static_cast<int>((det.xmax + det.xmin) / 2);
        int center_y = static_cast<int>((det.ymax + det.ymin) / 2);

        if (width > height)
            large_length = height * blazeHandDScale;
        else
            large_length = height * blazeHandDScale;

        int start_x = center_x - static_cast<int>(large_length / 2);
        int start_y = center_y - static_cast<int>(large_length / 2) - large_length * 0.1;

        int end_x = center_x + static_cast<int>(large_length / 2);
        int end_y = center_y + static_cast<int>(large_length / 2) - large_length * 0.1;


        start_x = std::max(start_x, 0);
        start_y = std::max(start_y, 0);
        end_x = std::min(end_x, webcamWidth);
        end_y = std::min(end_y, webcamHeight);

        cv::Point2d startPt = cv::Point2d(start_x, start_y);
        cv::Point2d endPt = cv::Point2d(end_x, end_y);

        cv::Rect handRect(startPt, endPt);
        handRects.push_back(handRect);
    }

    return handRects;
}

void Blaze::DrawRects(cv::Mat& img, std::vector<cv::Rect> rects)
{
    for (auto& rect : rects)
    {
        cv::rectangle(img, rect, cv::Scalar(255, 255, 255), 1);
    }
}

void Blaze::GetHandImages(cv::Mat& img, std::vector<cv::Rect> rects, std::vector<cv::Mat>& handImgs)
{
    for (auto& rect : rects)
    {

        //std::cout << "img size : " << img.size() << ", rect : " << rect << std::endl;

        //cv::Mat roi(cv::Size(rect.width, rect.height), img.type(), cv::Scalar(0, 0, 0))

        cv::Mat roi = img(rect).clone();
        cv::cvtColor(roi, roi, cv::COLOR_BGR2RGB);
        cv::resize(roi, roi, cv::Size(blazeHandSize, blazeHandSize));
        handImgs.push_back(roi);
    }
}



std::vector<cv::Mat> Blaze::PredictHandDetections(std::vector<cv::Mat>& imgs)
{
    std::vector<cv::Mat> imgs_landmarks;


    for (auto& img : imgs)
    {
        cv::Mat tensor;
        img.convertTo(tensor, CV_32F);
        tensor = tensor / blazeHandSize;
        cv::Mat blob = cv::dnn::blobFromImage(tensor, 1.0, tensor.size(), 0, false, false, CV_32F);
        std::vector<cv::String> outNames(3);
        outNames[0] = "hand_flag";
        outNames[1] = "handedness";
        outNames[2] = "landmarks";



        blazeHand.setInput(blob);
        std::vector<cv::Mat> outputs;
        blazeHand.forward(outputs, outNames);

        cv::Mat handflag = outputs[0];
        cv::Mat handedness = outputs[1];
        cv::Mat landmarks = outputs[2];


        imgs_landmarks.push_back(landmarks);
    }
    return imgs_landmarks;
}

std::vector<cv::Mat> Blaze::DenormalizeHandLandmarks(std::vector<cv::Mat> imgs_landmarks, std::vector<cv::Rect> rects)
{
    std::vector<cv::Mat> denorm_imgs_landmarks;

    for (int i = 0; i < imgs_landmarks.size(); i++)
    {
        cv::Mat landmarks = imgs_landmarks.at(i);
        cv::Rect rect = rects.at(i);
        cv::Mat squeezed = landmarks.reshape(0, landmarks.size[1]);

        //std::cout << "squeezed size: " << squeezed.size[0] << "," << squeezed.size[1] << std::endl;

        for (int j = 0; j < squeezed.size[0]; j++)
        {
            squeezed.at<float>(j, 0) = squeezed.at<float>(j, 0) * rect.width + rect.x;
            squeezed.at<float>(j, 1) = squeezed.at<float>(j, 1) * rect.height + rect.y;
            squeezed.at<float>(j, 2) = squeezed.at<float>(j, 2) * rect.height * -1;


        }
        denorm_imgs_landmarks.push_back(squeezed);
    }

    //std::cout << std::endl;
    return denorm_imgs_landmarks;
}

void Blaze::DrawHandDetections(cv::Mat img, std::vector<cv::Mat> denorm_imgs_landmarks)
{
    for (auto& denorm_landmarks : denorm_imgs_landmarks)
    {
        for (int i = 0; i < denorm_landmarks.size[0]; i++)
        {
            float x = denorm_landmarks.at<float>(i, 0);
            float y = denorm_landmarks.at<float>(i, 1);

            cv::Point pt(x, y);

            cv::circle(img, pt, 5, cv::Scalar(0, 255, 0), -1);

        }

        for (auto& connection : HAND_CONNECTIONS)
        {
            int startPtIdx = connection[0];
            int endPtIdx = connection[1];

            float startPtX = denorm_landmarks.at<float>(startPtIdx, 0);
            float startPtY = denorm_landmarks.at<float>(startPtIdx, 1);
            float endPtX = denorm_landmarks.at<float>(endPtIdx, 0);
            float endPtY = denorm_landmarks.at<float>(endPtIdx, 1);
            cv::line(img, cv::Point(startPtX, startPtY), cv::Point(endPtX, endPtY), cv::Scalar(255, 255, 255), 3);


        }
    }
}

데스크탑 게임모드베이스 헤더, cpp

// Fill out your copyright notice in the Description page of Project Settings.

#pragma once

#include "Blaze.h"

#include "Windows/AllowWindowsPlatformTypes.h"
#include <Windows.h>
#include "Windows/HideWindowsPlatformTypes.h"

#include "PreOpenCVHeaders.h"
#include <opencv2/opencv.hpp>
#include "PostOpenCVHeaders.h"

#include "CoreMinimal.h"
#include "GameFramework/GameModeBase.h"
#include "DesktopGameModeBase.generated.h"

/**
 * 
 */
UCLASS()
class HANDDESKTOP_API ADesktopGameModeBase : public AGameModeBase
{
	GENERATED_BODY()

protected:
	// Called when the game starts or when spawned
	virtual void BeginPlay() override;

public:
	UFUNCTION(BlueprintCallable)
	void ReadFrame();




	int monitorWidth = 1920;
	int monitorHeight = 1080;
	UPROPERTY(EditAnywhere, BlueprintReadWrite)
	UTexture2D* imageTextureScreen1;
	UPROPERTY(EditAnywhere, BlueprintReadWrite)
	UTexture2D* imageTextureScreen2;
	UPROPERTY(EditAnywhere, BlueprintReadWrite)
	UTexture2D* imageTextureScreen3;
	cv::Mat imageScreen1;
	cv::Mat imageScreen2;
	cv::Mat imageScreen3;

	void ScreensToCVMats();
	void CVMatsToTextures();



	int webcamWidth = 640;
	int webcamHeight = 480;

	cv::VideoCapture capture;
	cv::Mat webcamImage;
	UPROPERTY(EditAnywhere, BlueprintReadWrite)
	UTexture2D* webcamTexture;
	void MatToTexture2D(const cv::Mat InMat);


	//var and functions with blaze
	Blaze blaze;
	cv::Mat img256;
	cv::Mat img128;
	float scale;
	cv::Scalar pad;


};

// Fill out your copyright notice in the Description page of Project Settings.

#include "DesktopGameModeBase.h"


void ADesktopGameModeBase::BeginPlay()
{
	Super::BeginPlay();
	blaze = Blaze();

	capture = cv::VideoCapture(0);
	if (!capture.isOpened())
	{
		UE_LOG(LogTemp, Log, TEXT("Open Webcam failed"));
		return;
	}
	else
	{
		UE_LOG(LogTemp, Log, TEXT("Open Webcam Success"));
	}
	capture.set(cv::CAP_PROP_FRAME_WIDTH, webcamWidth);
	capture.set(cv::CAP_PROP_FRAME_HEIGHT, webcamHeight);


	webcamTexture = UTexture2D::CreateTransient(monitorWidth, monitorHeight, PF_B8G8R8A8);


	imageScreen1 = cv::Mat(monitorHeight, monitorWidth, CV_8UC4);
	imageScreen2 = cv::Mat(monitorHeight, monitorWidth, CV_8UC4);
	imageScreen3 = cv::Mat(monitorHeight, monitorWidth, CV_8UC4);
	imageTextureScreen1 = UTexture2D::CreateTransient(monitorWidth, monitorHeight, PF_B8G8R8A8);
	imageTextureScreen2 = UTexture2D::CreateTransient(monitorWidth, monitorHeight, PF_B8G8R8A8);
	imageTextureScreen3 = UTexture2D::CreateTransient(monitorWidth, monitorHeight, PF_B8G8R8A8);

}


void ADesktopGameModeBase::ReadFrame()
{
	if (!capture.isOpened())
	{
		return;
	}
	capture.read(webcamImage);

	/*
	get filtered detections
	*/
	blaze.ResizeAndPad(webcamImage, img256, img128, scale, pad);
	//UE_LOG(LogTemp, Log, TEXT("scale value: %f, pad value: (%f, %f)"), scale, pad[0], pad[1]);
	std::vector<Blaze::PalmDetection> normDets = blaze.PredictPalmDetections(img128);
	std::vector<Blaze::PalmDetection> denormDets = blaze.DenormalizePalmDetections(normDets, webcamWidth, webcamHeight, pad);
	std::vector<Blaze::PalmDetection> filteredDets = blaze.FilteringDets(denormDets, webcamWidth, webcamHeight);





	std::vector<cv::Rect> handRects = blaze.convertHandRects(filteredDets);
	std::vector<cv::Mat> handImgs;
	blaze.GetHandImages(webcamImage, handRects, handImgs);
	std::vector<cv::Mat> imgs_landmarks = blaze.PredictHandDetections(handImgs);
	std::vector<cv::Mat> denorm_imgs_landmarks = blaze.DenormalizeHandLandmarks(imgs_landmarks, handRects);



	//draw hand rects/ plam detection/ dets info/ hand detection
	blaze.DrawRects(webcamImage, handRects);
	blaze.DrawPalmDetections(webcamImage, filteredDets);
	blaze.DrawDetsInfo(webcamImage, filteredDets, normDets, denormDets);
	blaze.DrawHandDetections(webcamImage, denorm_imgs_landmarks);

	//cv::mat to utexture2d
	MatToTexture2D(webcamImage);


	/*
	모니터 시각화
	*/
	ScreensToCVMats();
	CVMatsToTextures();
}


void ADesktopGameModeBase::MatToTexture2D(const cv::Mat InMat)
{
	if (InMat.type() == CV_8UC3)//example for pre-conversion of Mat
	{
		cv::Mat resizedImage;
		cv::resize(InMat, resizedImage, cv::Size(monitorWidth, monitorHeight));
		cv::Mat bgraImage;
		//if the Mat is in BGR space, convert it to BGRA. There is no three channel texture in UE (at least with eight bit)
		cv::cvtColor(resizedImage, bgraImage, cv::COLOR_BGR2BGRA);

		//actually copy the data to the new texture
		FTexture2DMipMap& Mip = webcamTexture->GetPlatformData()->Mips[0];
		void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
		FMemory::Memcpy(Data, bgraImage.data, bgraImage.total() * bgraImage.elemSize());//copy the data
		Mip.BulkData.Unlock();
		webcamTexture->PostEditChange();
		webcamTexture->UpdateResource();
	}
	else if (InMat.type() == CV_8UC4)
	{
		//actually copy the data to the new texture
		FTexture2DMipMap& Mip = webcamTexture->GetPlatformData()->Mips[0];
		void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
		FMemory::Memcpy(Data, InMat.data, InMat.total() * InMat.elemSize());//copy the data
		Mip.BulkData.Unlock();
		webcamTexture->PostEditChange();
		webcamTexture->UpdateResource();
	}
	//if the texture hasnt the right pixel format, abort.
	webcamTexture->PostEditChange();
	webcamTexture->UpdateResource();
}



void ADesktopGameModeBase::ScreensToCVMats()
{
	HDC hScreenDC = GetDC(NULL);
	HDC hMemoryDC = CreateCompatibleDC(hScreenDC);
	int screenWidth = GetDeviceCaps(hScreenDC, HORZRES);
	int screenHeight = GetDeviceCaps(hScreenDC, VERTRES);

	HBITMAP hBitmap = CreateCompatibleBitmap(hScreenDC, screenWidth, screenHeight);
	HBITMAP hOldBitmap = (HBITMAP)SelectObject(hMemoryDC, hBitmap);

	//screen 1
	BitBlt(hMemoryDC, 0, 0, screenWidth, screenHeight, hScreenDC, 0, 0, SRCCOPY);
	GetBitmapBits(hBitmap, imageScreen1.total() * imageScreen1.elemSize(), imageScreen1.data);

	//screen 2
	BitBlt(hMemoryDC, 0, 0, screenWidth, screenHeight, hScreenDC, 1920, 0, SRCCOPY);
	GetBitmapBits(hBitmap, imageScreen2.total() * imageScreen2.elemSize(), imageScreen2.data);

	//screen 3
	BitBlt(hMemoryDC, 0, 0, screenWidth, screenHeight, hScreenDC, 3840, 0, SRCCOPY);
	GetBitmapBits(hBitmap, imageScreen3.total() * imageScreen3.elemSize(), imageScreen3.data);
	SelectObject(hMemoryDC, hOldBitmap);


	DeleteDC(hScreenDC);
	DeleteDC(hMemoryDC);

	DeleteObject(hBitmap);
	DeleteObject(hOldBitmap);

}


void ADesktopGameModeBase::CVMatsToTextures()
{
	for (int i = 0; i < 3; i++)
	{
		if (i == 0)
		{
			FTexture2DMipMap& Mip = imageTextureScreen1->GetPlatformData()->Mips[0];
			void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
			FMemory::Memcpy(Data, imageScreen1.data, imageScreen1.total() * imageScreen1.elemSize());//copy the data
			Mip.BulkData.Unlock();

			imageTextureScreen1->PostEditChange();
			imageTextureScreen1->UpdateResource();
		}
		else if (i == 1)
		{
			FTexture2DMipMap& Mip = imageTextureScreen2->GetPlatformData()->Mips[0];
			void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
			FMemory::Memcpy(Data, imageScreen2.data, imageScreen2.total() * imageScreen2.elemSize());//copy the data
			Mip.BulkData.Unlock();

			imageTextureScreen2->PostEditChange();
			imageTextureScreen2->UpdateResource();
		}
		else if (i == 2)
		{
			FTexture2DMipMap& Mip = imageTextureScreen3->GetPlatformData()->Mips[0];
			void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
			FMemory::Memcpy(Data, imageScreen3.data, imageScreen3.total() * imageScreen3.elemSize());//copy the data
			Mip.BulkData.Unlock();

			imageTextureScreen3->PostEditChange();
			imageTextureScreen3->UpdateResource();
		}


	}
}

저작자표시

'컴퓨터과학 > 언리얼' 카테고리의 다른 글

HandDesktop12 - 손 관절 회전 방향 구하기(fail?) (0)	2024.02.01
HandDesktop11 - 손 메시 제어하기 (0)	2024.01.31
HandDesktop09 - 기존 문제들 원인 찾음, blaze 갈아엎기, 블라즈팜 문제해결 (0)	2024.01.26
HandDesktop08 - blaze hand 사용하기 (0)	2024.01.26
HandDesktop07 - blaze hand 사용 준비하기 (0)	2024.01.26

HandDesktop09 - 기존 문제들 원인 찾음, blaze 갈아엎기, 블라즈팜 문제해결

ggamji 2024. 1. 26. 17:54

2024. 1. 26. 17:54

방금 파이썬으로 블라즈 팜, 핸즈 하기전에 어제까지만해도

미디어파이프파이토치 코드 기준으로 따라가려다가 (논문에서 설명한걸 따라간거겟지만)

팜의 회전방향 고려해서 어파인변환하고 엄청 지저분해졌는데

굳이 어파인 변환까지 하지않아도

색상영역로 손추출 -> 블라즈핸드로도 충분히 손자세를 잘 잡아낼수 있었다.

그런 이유로 블라즈를 다 날리고 깔끔하게 다시 정리해볼 생각

한참 해맨 문제 정리

1. 느린 웹캠 시작시간

그리고 왜 시작이 느린가 싶더니

노트북 웹캠은 빨리 시작하고, usb 웹캠은 시작이 늦었다.

스트리밍 해상도가 높을떈 더 느렸고

노트북 웹캠을 쓰는게 나을듯

2. 손 검출이 잘 안되던 이유

블라즈 팜할때 RGB가 아닌 BGR 상태를 입력으로 준게 문제였다.

패딩이 문제가 아니라.

블라즈 갈아엎기

기존에 하면서 엉망이된

블라즈 다시 시작

// Fill out your copyright notice in the Description page of Project Settings.

#pragma once
#include "PreOpenCVHeaders.h"
#include <opencv2/opencv.hpp>
#include "PostOpenCVHeaders.h"

#include "CoreMinimal.h"

/**
 * 
 */
class HANDDESKTOP_API Blaze
{
public:
	Blaze();
	~Blaze();

	cv::dnn::Net blazePalm;
	cv::dnn::Net blazeHand;



	// var and funcs for blazepalm
	struct PalmDetection {
		float ymin;
		float xmin;
		float ymax;
		float xmax;
		cv::Point2d kp_arr[7];
		float score;
	};


	int blazeHandSize = 256;
	int blazePalmSize = 128;
	float palmMinScoreThresh = 0.4;
	float palmMinNMSThresh = 0.2;
	int palmMinNumKeyPoints = 7;

	void ResizeAndPad(
		cv::Mat& srcimg, cv::Mat& img256,
		cv::Mat& img128, float& scale, cv::Scalar& pad
	);

	std::vector<PalmDetection> PredictPalmDetections(cv::Mat& img);
	PalmDetection GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
		int stride, int anchor_count, int column, int row, int anchor, int offset);
	float sigmoid(float x);
	std::vector<PalmDetection> DenormalizePalmDetections(std::vector<PalmDetection> detections, int width, int height, cv::Scalar pad);
	void DrawPalmDetections(cv::Mat& img, std::vector<Blaze::PalmDetection> denormDets);
	std::vector<PalmDetection> FilteringDets(std::vector<PalmDetection> detections, int width, int height);



};

패딩은 문제가 없었고, 색변환이 문제였으므로

리사이즈앤패드 맨뒤에 bgr2rgb 추가하였음.

// Fill out your copyright notice in the Description page of Project Settings.


#include "Blaze.h"

Blaze::Blaze()
{
	this->blazePalm = cv::dnn::readNet("c:/blazepalm_old.onnx");
	this->blazeHand = cv::dnn::readNet("c:/blazehand.onnx");
}

Blaze::~Blaze()
{
}


void Blaze::ResizeAndPad(
    cv::Mat& srcimg, cv::Mat& img256,
    cv::Mat& img128, float& scale, cv::Scalar& pad
)
{
    float h1, w1;
    int padw, padh;

    cv::Size size0 = srcimg.size();
    if (size0.height >= size0.width) {
        h1 = 256;
        w1 = 256 * size0.width / size0.height;
        padh = 0;
        padw = static_cast<int>(256 - w1);
        scale = size0.width / static_cast<float>(w1);
    }
    else {
        h1 = 256 * size0.height / size0.width;
        w1 = 256;
        padh = static_cast<int>(256 - h1);
        padw = 0;
        scale = size0.height / static_cast<float>(h1);
    }

    //UE_LOG(LogTemp, Log, TEXT("scale value: %f, size0.height: %d, size0.width : %d, h1 : %f"), scale, size0.height, size0.width, h1);

    int padh1 = static_cast<int>(padh / 2);
    int padh2 = static_cast<int>(padh / 2) + static_cast<int>(padh % 2);
    int padw1 = static_cast<int>(padw / 2);
    int padw2 = static_cast<int>(padw / 2) + static_cast<int>(padw % 2);
    pad = cv::Scalar(static_cast<float>(padh1 * scale), static_cast<float>(padw1 * scale));


    cv::resize(srcimg, img256, cv::Size(w1, h1));
    cv::copyMakeBorder(img256, img256, padh1, padh2, padw1, padw2, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));

    cv::cvtColor(img256, img256, cv::COLOR_BGR2RGB);
    cv::resize(img256, img128, cv::Size(128, 128));
    cv::cvtColor(img128, img128, cv::COLOR_BGR2RGB);
}



std::vector<Blaze::PalmDetection> Blaze::PredictPalmDetections(cv::Mat& img)
{
    std::vector<Blaze::PalmDetection> beforeNMSResults;
    std::vector<Blaze::PalmDetection> afterNMSResults;
    std::vector<float> scores;
    std::vector<int> indices;
    std::vector<cv::Rect> boundingBoxes;


    cv::Mat inputImg;
    cv::cvtColor(img, inputImg, cv::COLOR_BGR2RGB);

    cv::Mat tensor;
    inputImg.convertTo(tensor, CV_32F, 1 / blazePalmSize, 0);
    cv::Mat blob = cv::dnn::blobFromImage(tensor, 1.0, tensor.size(), 0, false, false, CV_32F);
    std::vector<cv::String> outNames(2);
    outNames[0] = "regressors";
    outNames[1] = "classificators";

    blazePalm.setInput(blob);
    std::vector<cv::Mat> outputs;
    blazePalm.forward(outputs, outNames);

    cv::Mat classificator = outputs[0];
    cv::Mat regressor = outputs[1];


    for (int y = 0; y < 16; ++y) {
        for (int x = 0; x < 16; ++x) {
            for (int a = 0; a < 2; ++a) {
                PalmDetection res = GetPalmDetection(regressor, classificator, 8, 2, x, y, a, 0);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);

                    cv::Point2d startPt = cv::Point2d(res.xmin, res.ymin);
                    cv::Point2d endPt = cv::Point2d(res.xmax, res.ymax);

                    boundingBoxes.push_back(cv::Rect(startPt, endPt));
                    scores.push_back(res.score);
                }
            }
        }
    }

    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x) {
            for (int a = 0; a < 6; ++a) {
                PalmDetection res = GetPalmDetection(regressor, classificator, 16, 6, x, y, a, 512);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);
                    cv::Point2d startPt = cv::Point2d(res.xmin, res.ymin);
                    cv::Point2d endPt = cv::Point2d(res.xmax, res.ymax);

                    boundingBoxes.push_back(cv::Rect(startPt, endPt));
                    scores.push_back(res.score);
                }
            }
        }
    }


    cv::dnn::NMSBoxes(boundingBoxes, scores, palmMinScoreThresh, palmMinNMSThresh, indices);

    for (int i = 0; i < indices.size(); i++) {
        int idx = indices[i];
        afterNMSResults.push_back(beforeNMSResults[idx]);
    }

    return afterNMSResults;
}

Blaze::PalmDetection Blaze::GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
    int stride, int anchor_count, int column, int row, int anchor, int offset) {

    Blaze::PalmDetection res;

    int index = (int(row * 128 / stride) + column) * anchor_count + anchor + offset;
    float origin_score = regressor.at<float>(0, index, 0);
    float score = sigmoid(origin_score);
    if (score < palmMinScoreThresh) return res;

    float x = classificator.at<float>(0, index, 0);
    float y = classificator.at<float>(0, index, 1);
    float w = classificator.at<float>(0, index, 2);
    float h = classificator.at<float>(0, index, 3);

    x += (column + 0.5) * stride - w / 2;
    y += (row + 0.5) * stride - h / 2;

    res.ymin = (y) / blazePalmSize;
    res.xmin = (x) / blazePalmSize;
    res.ymax = (y + h) / blazePalmSize;
    res.xmax = (x + w) / blazePalmSize;

    if ((res.ymin < 0) || (res.xmin < 0) || (res.xmax > 1) || (res.ymax > 1)) return res;

    res.score = score;

    std::vector<cv::Point2d> kpts;
    for (int key_id = 0; key_id < palmMinNumKeyPoints; key_id++)
    {
        float kpt_x = classificator.at<float>(0, index, 4 + key_id * 2);
        float kpt_y = classificator.at<float>(0, index, 5 + key_id * 2);
        kpt_x += (column + 0.5) * stride;
        kpt_x  = kpt_x / blazePalmSize;
        kpt_y += (row + 0.5) * stride;
        kpt_y  = kpt_y / blazePalmSize;
        //UE_LOG(LogTemp, Log, TEXT("kpt id(%d) : (%f, %f)"), key_id, kpt_x, kpt_y);
        res.kp_arr[key_id] = cv::Point2d(kpt_x, kpt_y);

    }
    return res;
}

float Blaze::sigmoid(float x) {
    return 1 / (1 + exp(-x));
}


std::vector<Blaze::PalmDetection> Blaze::DenormalizePalmDetections(std::vector<Blaze::PalmDetection> detections, int width, int height, cv::Scalar pad)
{

    std::vector<Blaze::PalmDetection> denormDets;

    int scale = 0;
    if (width > height)
        scale = width;
    else
        scale = height;

    for (auto& det : detections)
    {
        Blaze::PalmDetection denormDet;
        denormDet.ymin = det.ymin * scale - pad[0];
        denormDet.xmin = det.xmin * scale - pad[1];
        denormDet.ymax = det.ymax * scale - pad[0];
        denormDet.xmax = det.xmax * scale - pad[1];
        denormDet.score = det.score;

        for (int i = 0; i < palmMinNumKeyPoints; i++)
        {
            cv::Point2d pt_new = cv::Point2d(det.kp_arr[i].x * scale - pad[1], det.kp_arr[i].y * scale - pad[0]);
            //UE_LOG(LogTemp, Log, TEXT("denorm kpt id(%d) : (%f, %f)"), i, pt_new.x, pt_new.y);
            denormDet.kp_arr[i] = pt_new;
        }
        denormDets.push_back(denormDet);
    }
    return denormDets;
}


void Blaze::DrawPalmDetections(cv::Mat& img, std::vector<Blaze::PalmDetection> denormDets)
{
    for (auto& denormDet : denormDets)
    {
        cv::Point2d startPt = cv::Point2d(denormDet.xmin, denormDet.ymin);
        cv::Point2d endPt = cv::Point2d(denormDet.xmax, denormDet.ymax);
        cv::rectangle(img, cv::Rect(startPt, endPt), cv::Scalar(255, 0, 0), 1);

        for (int i = 0; i < palmMinNumKeyPoints; i++)
            cv::circle(img, denormDet.kp_arr[i], 5, cv::Scalar(255, 0, 0), -1);

        std::string score_str = std::to_string(static_cast<int>(denormDet.score * 100))+ "%";
        cv::putText(img, score_str, cv::Point(startPt.x, startPt.y - 20), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 255), 2);

    }
}

std::vector<Blaze::PalmDetection> Blaze::FilteringDets(std::vector<Blaze::PalmDetection> detections, int width, int height)
{
    std::vector<Blaze::PalmDetection> filteredDets;

    for (auto& denormDet : detections)
    {
        cv::Point2d startPt = cv::Point2d(denormDet.xmin, denormDet.ymin);
        if (startPt.x < 10 || startPt.y < 10)
            continue;
        if (startPt.x > width || startPt.y > height)
            continue;
        int w = denormDet.xmax - denormDet.xmin;
        int y = denormDet.ymax - denormDet.ymin;
        if ((w * y < 40 * 40 )|| (w * y  > (width * 0.7) * (height * 0.7)))
            continue;
        filteredDets.push_back(denormDet);
    }
    return filteredDets;
}

언리얼에선 확인하기 힘들어서 따로 c++로 코드 빼서 정리중

cvt color 시점때문에 손검출이 잘안되던걸 찾아서 수정했으나

손이 1개밖에 검출되지 않는다. 2개 안되는 원인 찾아야함.

#include <opencv2/opencv.hpp>


// var and funcs for blazepalm
struct PalmDetection {
    float ymin;
    float xmin;
    float ymax;
    float xmax;
    cv::Point2d kp_arr[7];
    float score;
};

int blazeHandSize = 256;
int blazePalmSize = 128;
float palmMinScoreThresh = 0.4;
float palmMinNMSThresh = 0.2;
int palmMinNumKeyPoints = 7;

void ResizeAndPad(
    cv::Mat& srcimg, cv::Mat& img256,
    cv::Mat& img128, float& scale, cv::Scalar& pad
);



std::vector<PalmDetection> PredictPalmDetections(cv::Mat& img);
PalmDetection GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
    int stride, int anchor_count, int column, int row, int anchor, int offset);
float sigmoid(float x);
std::vector<PalmDetection> DenormalizePalmDetections(std::vector<PalmDetection> detections, int width, int height, cv::Scalar pad);
void DrawPalmDetections(cv::Mat& img, std::vector<PalmDetection> denormDets);
std::vector<PalmDetection> FilteringDets(std::vector<PalmDetection> detections, int width, int height);


cv::dnn::Net blazePalm = cv::dnn::readNetFromONNX("c:/blazepalm_old.onnx");


int webcamWidth = 640;
int webcamHeight = 480;



cv::Mat img256;
cv::Mat img128;
float scale;
cv::Scalar pad;



int main() {
    cv::Mat frame;

    cv::VideoCapture cap(0);
    if (!cap.isOpened()) {
        std::cout << "Failed to open camera" << std::endl;
        return -1;
    }


    while (true) {
        bool ret = cap.read(frame);
        if (!ret)
        {
            break;
        }



        ResizeAndPad(frame, img256, img128, scale, pad);
        //UE_LOG(LogTemp, Log, TEXT("scale value: %f, pad value: (%f, %f)"), scale, pad[0], pad[1]);
        std::vector<PalmDetection> normDets = PredictPalmDetections(img128);
        std::vector<PalmDetection> denormDets = DenormalizePalmDetections(normDets, webcamWidth, webcamHeight, pad);
        std::vector<PalmDetection> filteredDets = FilteringDets(denormDets, webcamWidth, webcamHeight);



        std::string dets_size_str = "filtered dets : " + std::to_string(filteredDets.size()) + ", norm dets : " + std::to_string(normDets.size()) + ", denorm dets : " + std::to_string(denormDets.size());
        cv::putText(frame, dets_size_str, cv::Point(30, 30), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(255, 0, 0), 2);


        DrawPalmDetections(frame, filteredDets);


        cv::imshow("Camera Streaming", frame);
        cv::imshow("img256", img256);
        cv::imshow("img128", img128);

        if (cv::waitKey(1) == 'q') {
            break;
        }
    }

    cap.release();

    cv::destroyAllWindows();

    return 0;
}






void ResizeAndPad(
    cv::Mat& srcimg, cv::Mat& img256,
    cv::Mat& img128, float& scale, cv::Scalar& pad
)
{
    float h1, w1;
    int padw, padh;

    cv::Size size0 = srcimg.size();
    if (size0.height >= size0.width) {
        h1 = 256;
        w1 = 256 * size0.width / size0.height;
        padh = 0;
        padw = static_cast<int>(256 - w1);
        scale = size0.width / static_cast<float>(w1);
    }
    else {
        h1 = 256 * size0.height / size0.width;
        w1 = 256;
        padh = static_cast<int>(256 - h1);
        padw = 0;
        scale = size0.height / static_cast<float>(h1);
    }

    //UE_LOG(LogTemp, Log, TEXT("scale value: %f, size0.height: %d, size0.width : %d, h1 : %f"), scale, size0.height, size0.width, h1);

    int padh1 = static_cast<int>(padh / 2);
    int padh2 = static_cast<int>(padh / 2) + static_cast<int>(padh % 2);
    int padw1 = static_cast<int>(padw / 2);
    int padw2 = static_cast<int>(padw / 2) + static_cast<int>(padw % 2);
    pad = cv::Scalar(static_cast<float>(padh1 * scale), static_cast<float>(padw1 * scale));


    cv::resize(srcimg, img256, cv::Size(w1, h1));
    cv::copyMakeBorder(img256, img256, padh1, padh2, padw1, padw2, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
    cv::resize(img256, img128, cv::Size(128, 128));
    cv::cvtColor(img256, img256, cv::COLOR_BGR2RGB);
    cv::cvtColor(img128, img128, cv::COLOR_BGR2RGB);
}






std::vector<PalmDetection> PredictPalmDetections(cv::Mat& img)
{
    std::vector<PalmDetection> beforeNMSResults;
    std::vector<PalmDetection> afterNMSResults;
    std::vector<float> scores;
    std::vector<int> indices;
    std::vector<cv::Rect> boundingBoxes;


    cv::Mat tensor;
    img.convertTo(tensor, CV_32F);
    tensor = tensor / 127.5 - 1;
    cv::Mat blob = cv::dnn::blobFromImage(tensor, 1.0, tensor.size(), 0, false, false, CV_32F);
    std::vector<cv::String> outNames(2);
    outNames[0] = "regressors";
    outNames[1] = "classificators";

    blazePalm.setInput(blob);
    std::vector<cv::Mat> outputs;
    blazePalm.forward(outputs, outNames);

    cv::Mat classificator = outputs[0];
    cv::Mat regressor = outputs[1];


    for (int y = 0; y < 16; ++y) {
        for (int x = 0; x < 16; ++x) {
            for (int a = 0; a < 2; ++a) {
                PalmDetection res = GetPalmDetection(regressor, classificator, 8, 2, x, y, a, 0);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);

                    cv::Point2d startPt = cv::Point2d(res.xmin, res.ymin);
                    cv::Point2d endPt = cv::Point2d(res.xmax, res.ymax);

                    boundingBoxes.push_back(cv::Rect(startPt, endPt));
                    scores.push_back(res.score);
                }
            }
        }
    }

    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x) {
            for (int a = 0; a < 6; ++a) {
                PalmDetection res = GetPalmDetection(regressor, classificator, 16, 6, x, y, a, 512);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);
                    cv::Point2d startPt = cv::Point2d(res.xmin, res.ymin);
                    cv::Point2d endPt = cv::Point2d(res.xmax, res.ymax);

                    boundingBoxes.push_back(cv::Rect(startPt, endPt));
                    scores.push_back(res.score);
                }
            }
        }
    }


    cv::dnn::NMSBoxes(boundingBoxes, scores, palmMinScoreThresh, palmMinNMSThresh, indices);

    for (int i = 0; i < indices.size(); i++) {
        int idx = indices[i];
        afterNMSResults.push_back(beforeNMSResults[idx]);
    }

    return afterNMSResults;
}

PalmDetection GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
    int stride, int anchor_count, int column, int row, int anchor, int offset) {

    PalmDetection res;

    int index = (int(row * 128 / stride) + column) * anchor_count + anchor + offset;
    float origin_score = regressor.at<float>(0, index, 0);
    float score = sigmoid(origin_score);
    if (score < palmMinScoreThresh) return res;

    float x = classificator.at<float>(0, index, 0);
    float y = classificator.at<float>(0, index, 1);
    float w = classificator.at<float>(0, index, 2);
    float h = classificator.at<float>(0, index, 3);

    x += (column + 0.5) * stride - w / 2;
    y += (row + 0.5) * stride - h / 2;

    res.ymin = (y) / blazePalmSize;
    res.xmin = (x) / blazePalmSize;
    res.ymax = (y + h) / blazePalmSize;
    res.xmax = (x + w) / blazePalmSize;

    if ((res.ymin < 0) || (res.xmin < 0) || (res.xmax > 1) || (res.ymax > 1)) return res;

    res.score = score;

    std::vector<cv::Point2d> kpts;
    for (int key_id = 0; key_id < palmMinNumKeyPoints; key_id++)
    {
        float kpt_x = classificator.at<float>(0, index, 4 + key_id * 2);
        float kpt_y = classificator.at<float>(0, index, 5 + key_id * 2);
        kpt_x += (column + 0.5) * stride;
        kpt_x = kpt_x / blazePalmSize;
        kpt_y += (row + 0.5) * stride;
        kpt_y = kpt_y / blazePalmSize;
        //UE_LOG(LogTemp, Log, TEXT("kpt id(%d) : (%f, %f)"), key_id, kpt_x, kpt_y);
        res.kp_arr[key_id] = cv::Point2d(kpt_x, kpt_y);

    }
    return res;
}

float sigmoid(float x) {
    return 1 / (1 + exp(-x));
}


std::vector<PalmDetection> DenormalizePalmDetections(std::vector<PalmDetection> detections, int width, int height, cv::Scalar pad)
{

    std::vector<PalmDetection> denormDets;

    int scale = 0;
    if (width > height)
        scale = width;
    else
        scale = height;

    for (auto& det : detections)
    {
        PalmDetection denormDet;
        denormDet.ymin = det.ymin * scale - pad[0];
        denormDet.xmin = det.xmin * scale - pad[1];
        denormDet.ymax = det.ymax * scale - pad[0];
        denormDet.xmax = det.xmax * scale - pad[1];
        denormDet.score = det.score;

        for (int i = 0; i < palmMinNumKeyPoints; i++)
        {
            cv::Point2d pt_new = cv::Point2d(det.kp_arr[i].x * scale - pad[1], det.kp_arr[i].y * scale - pad[0]);
            //UE_LOG(LogTemp, Log, TEXT("denorm kpt id(%d) : (%f, %f)"), i, pt_new.x, pt_new.y);
            denormDet.kp_arr[i] = pt_new;
        }
        denormDets.push_back(denormDet);
    }
    return denormDets;
}

void DrawPalmDetections(cv::Mat& img, std::vector<PalmDetection> denormDets)
{
    for (auto& denormDet : denormDets)
    {
        cv::Point2d startPt = cv::Point2d(denormDet.xmin, denormDet.ymin);
        cv::Point2d endPt = cv::Point2d(denormDet.xmax, denormDet.ymax);
        cv::rectangle(img, cv::Rect(startPt, endPt), cv::Scalar(255, 0, 0), 1);

        for (int i = 0; i < palmMinNumKeyPoints; i++)
            cv::circle(img, denormDet.kp_arr[i], 5, cv::Scalar(255, 0, 0), -1);

        std::string score_str = std::to_string(static_cast<int>(denormDet.score * 100)) + "%";
        cv::putText(img, score_str, cv::Point(startPt.x, startPt.y - 20), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 255), 2);

    }
}


std::vector<PalmDetection> FilteringDets(std::vector<PalmDetection> detections, int width, int height)
{
    std::vector<PalmDetection> filteredDets;

    for (auto& denormDet : detections)
    {
        cv::Point2d startPt = cv::Point2d(denormDet.xmin, denormDet.ymin);
        if (startPt.x < 10 || startPt.y < 10)
            continue;
        if (startPt.x > width || startPt.y > height)
            continue;
        int w = denormDet.xmax - denormDet.xmin;
        int y = denormDet.ymax - denormDet.ymin;
        if ((w * y < 40 * 40) || (w * y > (width * 0.7) * (height * 0.7)))
            continue;
        filteredDets.push_back(denormDet);
    }
    return filteredDets;
}

자꾸 NMS 결과가 1개밖에 안나와서 최종 박스가 1개밖에 안나왔다.

후보 박스들은 많은데

원인 계속 해매다보니

NMS때 사용하는 박스 좌표들이 0 ~ 1로 정규화되어있어 너무 작았기 때문

-> 입력 해상도 128 을 곱해준 결과 정상적으로 나옴

따로 만든 예제 코드

#include <opencv2/opencv.hpp>


// var and funcs for blazepalm
struct PalmDetection {
    float ymin;
    float xmin;
    float ymax;
    float xmax;
    cv::Point2d kp_arr[7];
    float score;
};

int blazeHandSize = 256;
int blazePalmSize = 128;
float palmMinScoreThresh = 0.4;
float palmMinNMSThresh = 0.4;
int palmMinNumKeyPoints = 7;

void ResizeAndPad(
    cv::Mat& srcimg, cv::Mat& img256,
    cv::Mat& img128, float& scale, cv::Scalar& pad
);



std::vector<PalmDetection> PredictPalmDetections(cv::Mat& img, cv::Mat& frame);
PalmDetection GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
    int stride, int anchor_count, int column, int row, int anchor, int offset);
float sigmoid(float x);
std::vector<PalmDetection> DenormalizePalmDetections(std::vector<PalmDetection> detections, int width, int height, cv::Scalar pad);
void DrawPalmDetections(cv::Mat& img, std::vector<PalmDetection> denormDets);
std::vector<PalmDetection> FilteringDets(std::vector<PalmDetection> detections, int width, int height);


cv::dnn::Net blazePalm = cv::dnn::readNetFromONNX("c:/blazepalm_old.onnx");


int webcamWidth = 640;
int webcamHeight = 480;



cv::Mat img256;
cv::Mat img128;
float scale;
cv::Scalar pad;



int main() {
    cv::Mat frame;

    cv::VideoCapture cap(0);
    if (!cap.isOpened()) {
        std::cout << "Failed to open camera" << std::endl;
        return -1;
    }


    while (true) {
        bool ret = cap.read(frame);
        if (!ret)
        {
            break;
        }



        ResizeAndPad(frame, img256, img128, scale, pad);
        //UE_LOG(LogTemp, Log, TEXT("scale value: %f, pad value: (%f, %f)"), scale, pad[0], pad[1]);
        std::vector<PalmDetection> normDets = PredictPalmDetections(img128, frame);
        std::vector<PalmDetection> denormDets = DenormalizePalmDetections(normDets, webcamWidth, webcamHeight, pad);
        std::vector<PalmDetection> filteredDets = FilteringDets(denormDets, webcamWidth, webcamHeight);



        std::string dets_size_str = "filtered dets : " + std::to_string(filteredDets.size()) + ", norm dets : " + std::to_string(normDets.size()) + ", denorm dets : " + std::to_string(denormDets.size());
        cv::putText(frame, dets_size_str, cv::Point(30, 30), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(255, 0, 0), 2);


        DrawPalmDetections(frame, filteredDets);


        cv::imshow("Camera Streaming", frame);
        cv::imshow("img256", img256);
        cv::imshow("img128", img128);

        if (cv::waitKey(1) == 'q') {
            break;
        }
    }

    cap.release();

    cv::destroyAllWindows();

    return 0;
}






void ResizeAndPad(
    cv::Mat& srcimg, cv::Mat& img256,
    cv::Mat& img128, float& scale, cv::Scalar& pad
)
{
    float h1, w1;
    int padw, padh;

    cv::Size size0 = srcimg.size();
    if (size0.height >= size0.width) {
        h1 = 256;
        w1 = 256 * size0.width / size0.height;
        padh = 0;
        padw = static_cast<int>(256 - w1);
        scale = size0.width / static_cast<float>(w1);
    }
    else {
        h1 = 256 * size0.height / size0.width;
        w1 = 256;
        padh = static_cast<int>(256 - h1);
        padw = 0;
        scale = size0.height / static_cast<float>(h1);
    }

    //UE_LOG(LogTemp, Log, TEXT("scale value: %f, size0.height: %d, size0.width : %d, h1 : %f"), scale, size0.height, size0.width, h1);

    int padh1 = static_cast<int>(padh / 2);
    int padh2 = static_cast<int>(padh / 2) + static_cast<int>(padh % 2);
    int padw1 = static_cast<int>(padw / 2);
    int padw2 = static_cast<int>(padw / 2) + static_cast<int>(padw % 2);
    pad = cv::Scalar(static_cast<float>(padh1 * scale), static_cast<float>(padw1 * scale));


    cv::resize(srcimg, img256, cv::Size(w1, h1));
    cv::copyMakeBorder(img256, img256, padh1, padh2, padw1, padw2, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
    cv::resize(img256, img128, cv::Size(128, 128));
    cv::cvtColor(img256, img256, cv::COLOR_BGR2RGB);
    cv::cvtColor(img128, img128, cv::COLOR_BGR2RGB);
}






std::vector<PalmDetection> PredictPalmDetections(cv::Mat& img, cv::Mat& frame)
{
    std::vector<PalmDetection> beforeNMSResults;
    std::vector<PalmDetection> afterNMSResults;
    std::vector<float> scores;
    std::vector<int> indices;
    std::vector<cv::Rect> boundingBoxes;


    cv::Mat tensor;
    img.convertTo(tensor, CV_32F);
    tensor = tensor / 127.5 - 1;
    cv::Mat blob = cv::dnn::blobFromImage(tensor, 1.0, tensor.size(), 0, false, false, CV_32F);
    std::vector<cv::String> outNames(2);
    outNames[0] = "regressors";
    outNames[1] = "classificators";

    blazePalm.setInput(blob);
    std::vector<cv::Mat> outputs;
    blazePalm.forward(outputs, outNames);

    cv::Mat classificator = outputs[0];
    cv::Mat regressor = outputs[1];


    for (int y = 0; y < 16; ++y) {
        for (int x = 0; x < 16; ++x) {
            for (int a = 0; a < 2; ++a) {
                PalmDetection res = GetPalmDetection(regressor, classificator, 8, 2, x, y, a, 0);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);

                    cv::Point2d startPt = cv::Point2d(res.xmin * 128, res.ymin * 128);
                    cv::Point2d endPt = cv::Point2d(res.xmax * 128, res.ymax * 128);

                    boundingBoxes.push_back(cv::Rect(startPt, endPt));
                    scores.push_back(res.score);
                }
            }
        }
    }

    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x) {
            for (int a = 0; a < 6; ++a) {
                PalmDetection res = GetPalmDetection(regressor, classificator, 16, 6, x, y, a, 512);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);
                    cv::Point2d startPt = cv::Point2d(res.xmin * 128, res.ymin * 128);
                    cv::Point2d endPt = cv::Point2d(res.xmax * 128, res.ymax * 128);

                    boundingBoxes.push_back(cv::Rect(startPt, endPt));
                    scores.push_back(res.score);
                }
            }
        }
    }






    std::vector<PalmDetection> denormBeforeNMSResults = DenormalizePalmDetections(beforeNMSResults, 640, 480, cv::Scalar(80, 0));


    for (auto& denormBeforeNMSResult : denormBeforeNMSResults)
    {


        cv::Point2d startPt = cv::Point2d(denormBeforeNMSResult.xmin, denormBeforeNMSResult.ymin);
        cv::Point2d endPt = cv::Point2d(denormBeforeNMSResult.xmax, denormBeforeNMSResult.ymax);
        cv::rectangle(frame, cv::Rect(startPt, endPt), cv::Scalar(0, 0, 255), 1);


        std::string score_str = std::to_string(static_cast<int>(denormBeforeNMSResult.score * 100));
        cv::putText(frame, score_str, cv::Point(startPt.x, startPt.y - 10), cv::FONT_HERSHEY_SIMPLEX, 0.4, cv::Scalar(0, 0, 255), 2);



        for (int i = 0; i < palmMinNumKeyPoints; i++)
            cv::circle(frame, denormBeforeNMSResult.kp_arr[i], 3, cv::Scalar(0, 0, 255), -1);
    }


    cv::dnn::NMSBoxes(boundingBoxes, scores, palmMinScoreThresh, palmMinNMSThresh, indices);













    std::cout << "index check : " << std::endl;
    for (auto& index : indices)
    {
        std::cout << index << std::endl;
    }
    std::cout << std::endl << std::endl;


    for (int i = 0; i < indices.size(); i++) {
        int idx = indices[i];
        afterNMSResults.push_back(beforeNMSResults[idx]);
    }










    std::vector<PalmDetection> denormAfterNMSResults = DenormalizePalmDetections(afterNMSResults, 640, 480, cv::Scalar(80, 0));


    for (auto& denormAfterNMSResult : denormAfterNMSResults)
    {


        cv::Point2d startPt = cv::Point2d(denormAfterNMSResult.xmin, denormAfterNMSResult.ymin);
        cv::Point2d endPt = cv::Point2d(denormAfterNMSResult.xmax, denormAfterNMSResult.ymax);
        cv::rectangle(frame, cv::Rect(startPt, endPt), cv::Scalar(0, 255, 255), 2);


        for (int i = 0; i < palmMinNumKeyPoints; i++)
            cv::circle(frame, denormAfterNMSResult.kp_arr[i], 6, cv::Scalar(0, 255, 255), -1);
    }












    return afterNMSResults;
}

PalmDetection GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
    int stride, int anchor_count, int column, int row, int anchor, int offset) {

    PalmDetection res;

    int index = (int(row * 128 / stride) + column) * anchor_count + anchor + offset;
    float origin_score = regressor.at<float>(0, index, 0);
    float score = sigmoid(origin_score);
    if (score < palmMinScoreThresh) return res;

    float x = classificator.at<float>(0, index, 0);
    float y = classificator.at<float>(0, index, 1);
    float w = classificator.at<float>(0, index, 2);
    float h = classificator.at<float>(0, index, 3);

    x += (column + 0.5) * stride - w / 2;
    y += (row + 0.5) * stride - h / 2;

    res.ymin = (y) / blazePalmSize;
    res.xmin = (x) / blazePalmSize;
    res.ymax = (y + h) / blazePalmSize;
    res.xmax = (x + w) / blazePalmSize;

    //std::cout << "score : " << score << ", coord : " << res.ymin << ", " << res.xmin << ", " << res.ymax << ", " << res.xmax << std::endl;
    //if ((res.ymin < 0) || (res.xmin < 0) || (res.xmax > 1) || (res.ymax > 1)) return res;

    res.score = score;

    std::vector<cv::Point2d> kpts;
    for (int key_id = 0; key_id < palmMinNumKeyPoints; key_id++)
    {
        float kpt_x = classificator.at<float>(0, index, 4 + key_id * 2);
        float kpt_y = classificator.at<float>(0, index, 5 + key_id * 2);
        kpt_x += (column + 0.5) * stride;
        kpt_x = kpt_x / blazePalmSize;
        kpt_y += (row + 0.5) * stride;
        kpt_y = kpt_y / blazePalmSize;
        //UE_LOG(LogTemp, Log, TEXT("kpt id(%d) : (%f, %f)"), key_id, kpt_x, kpt_y);
        res.kp_arr[key_id] = cv::Point2d(kpt_x, kpt_y);

    }
    return res;
}

float sigmoid(float x) {
    return 1 / (1 + exp(-x));
}


std::vector<PalmDetection> DenormalizePalmDetections(std::vector<PalmDetection> detections, int width, int height, cv::Scalar pad)
{

    std::vector<PalmDetection> denormDets;

    int scale = 0;
    if (width > height)
        scale = width;
    else
        scale = height;

    for (auto& det : detections)
    {
        PalmDetection denormDet;
        denormDet.ymin = det.ymin * scale - pad[0];
        denormDet.xmin = det.xmin * scale - pad[1];
        denormDet.ymax = det.ymax * scale - pad[0];
        denormDet.xmax = det.xmax * scale - pad[1];
        denormDet.score = det.score;

        for (int i = 0; i < palmMinNumKeyPoints; i++)
        {
            cv::Point2d pt_new = cv::Point2d(det.kp_arr[i].x * scale - pad[1], det.kp_arr[i].y * scale - pad[0]);
            //UE_LOG(LogTemp, Log, TEXT("denorm kpt id(%d) : (%f, %f)"), i, pt_new.x, pt_new.y);
            denormDet.kp_arr[i] = pt_new;
        }
        denormDets.push_back(denormDet);
    }
    return denormDets;
}

void DrawPalmDetections(cv::Mat& img, std::vector<PalmDetection> denormDets)
{
    for (auto& denormDet : denormDets)
    {
        cv::Point2d startPt = cv::Point2d(denormDet.xmin, denormDet.ymin);
        cv::Point2d endPt = cv::Point2d(denormDet.xmax, denormDet.ymax);
        cv::rectangle(img, cv::Rect(startPt, endPt), cv::Scalar(255, 0, 0), 1);

        for (int i = 0; i < palmMinNumKeyPoints; i++)
            cv::circle(img, denormDet.kp_arr[i], 5, cv::Scalar(255, 0, 0), -1);

        std::string score_str = std::to_string(static_cast<int>(denormDet.score * 100)) + "%";
        cv::putText(img, score_str, cv::Point(startPt.x, startPt.y - 20), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 255), 2);

    }
}


std::vector<PalmDetection> FilteringDets(std::vector<PalmDetection> detections, int width, int height)
{
    std::vector<PalmDetection> filteredDets;

    for (auto& denormDet : detections)
    {
        cv::Point2d startPt = cv::Point2d(denormDet.xmin, denormDet.ymin);
        if (startPt.x < 10 || startPt.y < 10)
            continue;
        if (startPt.x > width || startPt.y > height)
            continue;
        int w = denormDet.xmax - denormDet.xmin;
        int y = denormDet.ymax - denormDet.ymin;
        if ((w * y < 40 * 40) || (w * y > (width * 0.7) * (height * 0.7)))
            continue;
        filteredDets.push_back(denormDet);
    }
    return filteredDets;
}

이제 인게임에서도 손 잘 찾아낸다.

현재 블라즈 코드

// Fill out your copyright notice in the Description page of Project Settings.


#include "Blaze.h"

Blaze::Blaze()
{
	this->blazePalm = cv::dnn::readNet("c:/blazepalm_old.onnx");
	this->blazeHand = cv::dnn::readNet("c:/blazehand.onnx");
}

Blaze::~Blaze()
{
}


void Blaze::ResizeAndPad(
    cv::Mat& srcimg, cv::Mat& img256,
    cv::Mat& img128, float& scale, cv::Scalar& pad
)
{
    float h1, w1;
    int padw, padh;

    cv::Size size0 = srcimg.size();
    if (size0.height >= size0.width) {
        h1 = 256;
        w1 = 256 * size0.width / size0.height;
        padh = 0;
        padw = static_cast<int>(256 - w1);
        scale = size0.width / static_cast<float>(w1);
    }
    else {
        h1 = 256 * size0.height / size0.width;
        w1 = 256;
        padh = static_cast<int>(256 - h1);
        padw = 0;
        scale = size0.height / static_cast<float>(h1);
    }

    //UE_LOG(LogTemp, Log, TEXT("scale value: %f, size0.height: %d, size0.width : %d, h1 : %f"), scale, size0.height, size0.width, h1);

    int padh1 = static_cast<int>(padh / 2);
    int padh2 = static_cast<int>(padh / 2) + static_cast<int>(padh % 2);
    int padw1 = static_cast<int>(padw / 2);
    int padw2 = static_cast<int>(padw / 2) + static_cast<int>(padw % 2);
    pad = cv::Scalar(static_cast<float>(padh1 * scale), static_cast<float>(padw1 * scale));


    cv::resize(srcimg, img256, cv::Size(w1, h1));
    cv::copyMakeBorder(img256, img256, padh1, padh2, padw1, padw2, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));

    cv::resize(img256, img128, cv::Size(128, 128));
    cv::cvtColor(img256, img256, cv::COLOR_BGR2RGB);
    cv::cvtColor(img128, img128, cv::COLOR_BGR2RGB);
}



std::vector<Blaze::PalmDetection> Blaze::PredictPalmDetections(cv::Mat& img)
{
    std::vector<Blaze::PalmDetection> beforeNMSResults;
    std::vector<Blaze::PalmDetection> afterNMSResults;
    std::vector<float> scores;
    std::vector<int> indices;
    std::vector<cv::Rect> boundingBoxes;


    cv::Mat tensor;
    img.convertTo(tensor, CV_32F);
    tensor = tensor / 127.5 - 1;
    cv::Mat blob = cv::dnn::blobFromImage(tensor, 1.0, tensor.size(), 0, false, false, CV_32F);
    std::vector<cv::String> outNames(2);
    outNames[0] = "regressors";
    outNames[1] = "classificators";

    blazePalm.setInput(blob);
    std::vector<cv::Mat> outputs;
    blazePalm.forward(outputs, outNames);

    cv::Mat classificator = outputs[0];
    cv::Mat regressor = outputs[1];


    for (int y = 0; y < 16; ++y) {
        for (int x = 0; x < 16; ++x) {
            for (int a = 0; a < 2; ++a) {
                PalmDetection res = GetPalmDetection(regressor, classificator, 8, 2, x, y, a, 0);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);

                    cv::Point2d startPt = cv::Point2d(res.xmin * blazePalmSize, res.ymin * blazePalmSize);
                    cv::Point2d endPt = cv::Point2d(res.xmax * blazePalmSize, res.ymax * blazePalmSize);

                    boundingBoxes.push_back(cv::Rect(startPt, endPt));
                    scores.push_back(res.score);
                }
            }
        }
    }

    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x) {
            for (int a = 0; a < 6; ++a) {
                PalmDetection res = GetPalmDetection(regressor, classificator, 16, 6, x, y, a, 512);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);
                    cv::Point2d startPt = cv::Point2d(res.xmin * blazePalmSize, res.ymin * blazePalmSize);
                    cv::Point2d endPt = cv::Point2d(res.xmax * blazePalmSize, res.ymax * blazePalmSize);

                    boundingBoxes.push_back(cv::Rect(startPt, endPt));
                    scores.push_back(res.score);
                }
            }
        }
    }


    cv::dnn::NMSBoxes(boundingBoxes, scores, palmMinScoreThresh, palmMinNMSThresh, indices);

    for (int i = 0; i < indices.size(); i++) {
        int idx = indices[i];
        afterNMSResults.push_back(beforeNMSResults[idx]);
    }

    return afterNMSResults;
}

Blaze::PalmDetection Blaze::GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
    int stride, int anchor_count, int column, int row, int anchor, int offset) {

    Blaze::PalmDetection res;

    int index = (int(row * 128 / stride) + column) * anchor_count + anchor + offset;
    float origin_score = regressor.at<float>(0, index, 0);
    float score = sigmoid(origin_score);
    if (score < palmMinScoreThresh) return res;

    float x = classificator.at<float>(0, index, 0);
    float y = classificator.at<float>(0, index, 1);
    float w = classificator.at<float>(0, index, 2);
    float h = classificator.at<float>(0, index, 3);

    x += (column + 0.5) * stride - w / 2;
    y += (row + 0.5) * stride - h / 2;

    res.ymin = (y) / blazePalmSize;
    res.xmin = (x) / blazePalmSize;
    res.ymax = (y + h) / blazePalmSize;
    res.xmax = (x + w) / blazePalmSize;

    //if ((res.ymin < 0) || (res.xmin < 0) || (res.xmax > 1) || (res.ymax > 1)) return res;

    res.score = score;

    std::vector<cv::Point2d> kpts;
    for (int key_id = 0; key_id < palmMinNumKeyPoints; key_id++)
    {
        float kpt_x = classificator.at<float>(0, index, 4 + key_id * 2);
        float kpt_y = classificator.at<float>(0, index, 5 + key_id * 2);
        kpt_x += (column + 0.5) * stride;
        kpt_x  = kpt_x / blazePalmSize;
        kpt_y += (row + 0.5) * stride;
        kpt_y  = kpt_y / blazePalmSize;
        //UE_LOG(LogTemp, Log, TEXT("kpt id(%d) : (%f, %f)"), key_id, kpt_x, kpt_y);
        res.kp_arr[key_id] = cv::Point2d(kpt_x, kpt_y);

    }
    return res;
}

float Blaze::sigmoid(float x) {
    return 1 / (1 + exp(-x));
}


std::vector<Blaze::PalmDetection> Blaze::DenormalizePalmDetections(std::vector<Blaze::PalmDetection> detections, int width, int height, cv::Scalar pad)
{

    std::vector<Blaze::PalmDetection> denormDets;

    int scale = 0;
    if (width > height)
        scale = width;
    else
        scale = height;

    for (auto& det : detections)
    {
        Blaze::PalmDetection denormDet;
        denormDet.ymin = det.ymin * scale - pad[0];
        denormDet.xmin = det.xmin * scale - pad[1];
        denormDet.ymax = det.ymax * scale - pad[0];
        denormDet.xmax = det.xmax * scale - pad[1];
        denormDet.score = det.score;

        for (int i = 0; i < palmMinNumKeyPoints; i++)
        {
            cv::Point2d pt_new = cv::Point2d(det.kp_arr[i].x * scale - pad[1], det.kp_arr[i].y * scale - pad[0]);
            //UE_LOG(LogTemp, Log, TEXT("denorm kpt id(%d) : (%f, %f)"), i, pt_new.x, pt_new.y);
            denormDet.kp_arr[i] = pt_new;
        }
        denormDets.push_back(denormDet);
    }
    return denormDets;
}

void Blaze::DrawPalmDetections(cv::Mat& img, std::vector<Blaze::PalmDetection> denormDets)
{
    for (auto& denormDet : denormDets)
    {
        cv::Point2d startPt = cv::Point2d(denormDet.xmin, denormDet.ymin);
        cv::Point2d endPt = cv::Point2d(denormDet.xmax, denormDet.ymax);
        cv::rectangle(img, cv::Rect(startPt, endPt), cv::Scalar(255, 0, 0), 1);

        for (int i = 0; i < palmMinNumKeyPoints; i++)
            cv::circle(img, denormDet.kp_arr[i], 5, cv::Scalar(255, 0, 0), -1);

        std::string score_str = std::to_string(static_cast<int>(denormDet.score * 100))+ "%";
        cv::putText(img, score_str, cv::Point(startPt.x, startPt.y - 20), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 255), 2);

    }
}

std::vector<Blaze::PalmDetection> Blaze::FilteringDets(std::vector<Blaze::PalmDetection> detections, int width, int height)
{
    std::vector<Blaze::PalmDetection> filteredDets;

    for (auto& denormDet : detections)
    {
        cv::Point2d startPt = cv::Point2d(denormDet.xmin, denormDet.ymin);
        if (startPt.x < 10 || startPt.y < 10)
            continue;
        if (startPt.x > width || startPt.y > height)
            continue;
        int w = denormDet.xmax - denormDet.xmin;
        int y = denormDet.ymax - denormDet.ymin;
        if ((w * y < 40 * 40 )|| (w * y  > (width * 0.7) * (height * 0.7)))
            continue;
        filteredDets.push_back(denormDet);
    }
    return filteredDets;
}

손 검출도 잘되고 검출 스코어 임계치 올렸더니 오검출도 꽤줄음

class HANDDESKTOP_API Blaze
{
public:
	Blaze();
	~Blaze();

	cv::dnn::Net blazePalm;
	cv::dnn::Net blazeHand;



	// var and funcs for blazepalm
	struct PalmDetection {
		float ymin;
		float xmin;
		float ymax;
		float xmax;
		cv::Point2d kp_arr[7];
		float score;
	};


	int blazeHandSize = 256;
	int blazePalmSize = 128;
	float palmMinScoreThresh = 0.6;
	float palmMinNMSThresh = 0.2;
	int palmMinNumKeyPoints = 7;

	void ResizeAndPad(
		cv::Mat& srcimg, cv::Mat& img256,
		cv::Mat& img128, float& scale, cv::Scalar& pad
	);

	std::vector<PalmDetection> PredictPalmDetections(cv::Mat& img);
	PalmDetection GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
		int stride, int anchor_count, int column, int row, int anchor, int offset);
	float sigmoid(float x);
	std::vector<PalmDetection> DenormalizePalmDetections(std::vector<PalmDetection> detections, int width, int height, cv::Scalar pad);
	void DrawPalmDetections(cv::Mat& img, std::vector<Blaze::PalmDetection> denormDets);
	std::vector<PalmDetection> FilteringDets(std::vector<PalmDetection> detections, int width, int height);


};

저작자표시

'컴퓨터과학 > 언리얼' 카테고리의 다른 글

HandDesktop11 - 손 메시 제어하기 (0)	2024.01.31
HandDesktop10 - blazehand c++, unreal 적용 (0)	2024.01.29
HandDesktop08 - blaze hand 사용하기 (0)	2024.01.26
HandDesktop07 - blaze hand 사용 준비하기 (0)	2024.01.26
HandDesktop06 - blaze palm 사용하기 (0)	2024.01.25

HandDesktop08 - blaze hand 사용하기

ggamji 2024. 1. 26. 15:46

2024. 1. 26. 15:46

블라즈 핸드 사용하기위해

어파인 변환된 이미지와 역어파인변환 행렬, ROI 까지 준비다되었고

이제 블라즈핸드 추론 시작하면된다.

일단 파이썬으로

cvdnn로 블라즈 핸드 검출 결과를 출력시켜봄

import cv2
import time
import numpy as np
import traceback


def get_color_filtered_boxes(image):
    # 이미지를 HSV 색 공간으로 변환
    hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    # 살색 영역을 마스크로 만들기
    skin_mask = cv2.inRange(hsv_image, lower_skin, upper_skin)
    # 모폴로지 연산을 위한 구조 요소 생성
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
    # 모폴로지 열림 연산 적용
    skin_mask = cv2.morphologyEx(skin_mask, cv2.MORPH_OPEN, kernel)

    # 마스크를 이용하여 살색 영역 추출
    skin_image = cv2.bitwise_and(image, image, mask=skin_mask)

    # 살색 영역에 대한 바운딩 박스 추출
    contours, _ = cv2.findContours(skin_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    bounding_boxes = [cv2.boundingRect(cnt) for cnt in contours]
        
    # 크기가 작은 박스와 큰 박스 제거
    color_boxes = []
    for (x, y, w, h) in bounding_boxes:
        if w * h > 100 * 100:
            # 약간 박스 더크게
            #color_boxes.append((x - 30, y - 30, w + 60, h + 60))
            center_x = int((x + x + w) / 2)
            center_y = int((y + y + h) / 2)

            large = 0
            if w > h:
                large = w                
            if w < h:                
                large = h
            
            large = int(large * 0.7)
            color_boxes.append((center_x - large, center_y - large, 2 * large,  2 * large))


    return color_boxes


def landmark_inference(img):
    img = cv2.resize(img, (lm_infer_width, lm_infer_height))
    tensor = img / 127.5 - 1.0
    blob = cv2.dnn.blobFromImage(tensor.astype(np.float32), swapRB=False, crop=False)
    net.setInput(blob)
    preds = net.forward(outNames)

    return preds


# 살색 영역의 범위 지정 (HSV 색 공간)
lower_skin = np.array([0, 20, 70], dtype=np.uint8)
upper_skin = np.array([20, 255, 255], dtype=np.uint8)

img_width = 640
img_height = 480
lm_infer_width = 256
lm_infer_height = 256

net = cv2.dnn.readNet('blazehand.onnx')
#net = cv2.dnn.readNet('hand_landmark.onnx')

outNames = net.getUnconnectedOutLayersNames()
print(outNames)


cap = cv2.VideoCapture(1)

while True:
    time_start = time.time()
    try:
        roi = None
        ret, frame = cap.read()
        frame = cv2.resize(frame, (img_width, img_height))

        skin_image = frame.copy()
        # 크기가 작은 박스와 큰 박스 제거
        color_boxes = get_color_filtered_boxes(skin_image)

        # 바운딩 박스를 이미지에 그리기
        for (x, y, w, h) in color_boxes:
            cv2.rectangle(skin_image, (x, y), (x + w, y + h), (0, 255, 0), 2)


        for idx, color_box in enumerate(color_boxes):
            x, y, w, h = color_box
            cbox_ratio_width = w / lm_infer_width
            cbox_ratio_height = h / lm_infer_height

            roi = frame[y:y+h, x:x+w]
            preds = landmark_inference(roi)
        


        if cv2.waitKey(1) == ord('q'):
            break
        time_cur = time.time()
        cv2.putText(frame, f"time spend: {time_cur - time_start}", (0, 50), cv2.FONT_HERSHEY_SIMPLEX, 2, (125, 125, 125), 2)
        
        #frame = cv2.resize(frame, (320, 240))
        if roi is not None:
            cv2.imshow('roi', roi)
        cv2.imshow('Camera Streaming', frame)
        cv2.imshow('Skin Extraction', skin_image)
    except Exception as e:
        traceback.print_exc()

cap.release()
cv2.destroyAllWindows()

핸드 플레그와 핸디드니스 랜드마크가 나오는데

핸드플래그는 1에 가까운거봐서 손 점수일거같고

랜드마크는 1 x 21 x 3인거봐선 손 랜드마크 일듯 싶은데

핸드니스는 모르겟다.

미디어파이프파이토치 코드보면 여기도

플래그, 핸디드, 노말라이즈드 랜드마크로 받는다.

그 뒤에 보면 플래그는 손유무 점수가 맞아보이고

랜드마크를 그대로 반정규화후 드로잉해주는데 반정규화에는 역어파인변환 행렬을 사용한다.

내가 맨위에 사용한 코드의 경우

컬러박스로 손 검출해서 하는거라 어파인 행렬이 존재하지 않는다.

        cv2.imshow("Image", image)
    flags2, handed2, normalized_landmarks2 = hand_regressor(img.to(gpu))
    landmarks2 = hand_regressor.denormalize_landmarks(normalized_landmarks2.cpu(), affine2)
    
    
    

    for i in range(len(flags2)):
        landmark, flag = landmarks2[i], flags2[i]
        if flag>.5:
            draw_landmarks(frame, landmark[:,:2], HAND_CONNECTIONS, size=2)

반정규화 랜드마크 코드보면 해상도(256)으로 곱해준후

랜드마크를 역어파인 변환 시켜주는 내용인듯 싶은데

일단 맨 위 코드에 반정규화 해서 시각화하는걸로 일단 작성해보면

(일단 해상도로만 반정규화시켜서 실제 입력이미지가 아닌 256 x 256에 맞는 좌표가 나올듯)

    def denormalize_landmarks(self, landmarks, affines):
        landmarks[:,:,:2] *= self.resolution
        for i in range(len(landmarks)):
            landmark, affine = landmarks[i], affines[i]
            landmark = (affine[:,:2] @ landmark[:,:2].T + affine[:,2:]).T
            landmarks[i,:,:2] = landmark
        return landmarks

일단 기존의 데모 코드와 내가 만든 코드랑 섞어서 블라즈 핸드를 돌렸는데

좀 아쉬운 결과가 나옴 전처리 문젠것 같다

tensor = img / 127.5 - 1.0

를 주석 했을땐 완전 망했는데

주석안하고 사용하니 좀 나아진걸 봐선 전처리 문젠거같기도하고

import cv2
import time
import numpy as np
import traceback


def get_color_filtered_boxes(image):
    # 이미지를 HSV 색 공간으로 변환
    hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    # 살색 영역을 마스크로 만들기
    skin_mask = cv2.inRange(hsv_image, lower_skin, upper_skin)
    # 모폴로지 연산을 위한 구조 요소 생성
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
    # 모폴로지 열림 연산 적용
    skin_mask = cv2.morphologyEx(skin_mask, cv2.MORPH_OPEN, kernel)

    # 마스크를 이용하여 살색 영역 추출
    skin_image = cv2.bitwise_and(image, image, mask=skin_mask)

    # 살색 영역에 대한 바운딩 박스 추출
    contours, _ = cv2.findContours(skin_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    bounding_boxes = [cv2.boundingRect(cnt) for cnt in contours]
        
    # 크기가 작은 박스와 큰 박스 제거
    color_boxes = []
    for (x, y, w, h) in bounding_boxes:
        if w * h > 100 * 100:
            # 약간 박스 더크게
            #color_boxes.append((x - 30, y - 30, w + 60, h + 60))
            center_x = int((x + x + w) / 2)
            center_y = int((y + y + h) / 2)

            large = 0
            if w > h:
                large = w                
            if w < h:                
                large = h
            
            large = int(large * 0.7)
            color_boxes.append((center_x - large, center_y - large, 2 * large,  2 * large))


    return color_boxes


def landmark_inference(img):
    tensor = img / 127.5 - 1.0
    blob = cv2.dnn.blobFromImage(tensor.astype(np.float32), swapRB=False, crop=False)
    net.setInput(blob)
    preds = net.forward(outNames)

    return preds

def denormalize_landmarks(landmarks):
    landmarks[:,:,:2] *= 256
    return landmarks


def draw_landmarks(img, points, connections=[], color=(0, 255, 0), size=2):
    points = points[:,:2]
    for point in points:
        x, y = point
        x, y = int(x), int(y)
        cv2.circle(img, (x, y), size, color, thickness=size)

    for connection in connections:
        x0, y0 = points[connection[0]]
        x1, y1 = points[connection[1]]
        x0, y0 = int(x0), int(y0)
        x1, y1 = int(x1), int(y1)
        cv2.line(img, (x0, y0), (x1, y1), (0,0,0), size)




HAND_CONNECTIONS = [
    (0, 1), (1, 2), (2, 3), (3, 4),
    (5, 6), (6, 7), (7, 8),
    (9, 10), (10, 11), (11, 12),
    (13, 14), (14, 15), (15, 16),
    (17, 18), (18, 19), (19, 20),
    (0, 5), (5, 9), (9, 13), (13, 17), (0, 17)
]






# 살색 영역의 범위 지정 (HSV 색 공간)
lower_skin = np.array([0, 20, 70], dtype=np.uint8)
upper_skin = np.array([20, 255, 255], dtype=np.uint8)

img_width = 640
img_height = 480
lm_infer_width = 256
lm_infer_height = 256

net = cv2.dnn.readNet('blazehand.onnx')
#net = cv2.dnn.readNet('hand_landmark.onnx')

outNames = net.getUnconnectedOutLayersNames()
print(outNames)


cap = cv2.VideoCapture(1)

while True:
    time_start = time.time()
    try:
        roi = None
        ret, frame = cap.read()
        frame = cv2.resize(frame, (img_width, img_height))

        skin_image = frame.copy()
        # 크기가 작은 박스와 큰 박스 제거
        color_boxes = get_color_filtered_boxes(skin_image)

        # 바운딩 박스를 이미지에 그리기
        for (x, y, w, h) in color_boxes:
            cv2.rectangle(skin_image, (x, y), (x + w, y + h), (0, 255, 0), 2)


        for idx, color_box in enumerate(color_boxes):
            x, y, w, h = color_box
            cbox_ratio_width = w / lm_infer_width
            cbox_ratio_height = h / lm_infer_height

            roi = frame[y:y+h, x:x+w]
            roi = cv2.resize(roi, (lm_infer_width, lm_infer_height))


            preds = landmark_inference(roi)

            landmarks = preds[2]
            flag = preds[0]
            denorm_landmarks = denormalize_landmarks(landmarks)

            for i in range(len(flag)):
                landmark, flag = denorm_landmarks[i], flag[i]
                
                if flag>.5:
                    draw_landmarks(roi, landmark[:,:2], HAND_CONNECTIONS, size=2)




        if cv2.waitKey(1) == ord('q'):
            break
        time_cur = time.time()
        cv2.putText(frame, f"time spend: {time_cur - time_start}", (0, 50), cv2.FONT_HERSHEY_SIMPLEX, 2, (125, 125, 125), 2)
        
        #frame = cv2.resize(frame, (320, 240))
        if roi is not None:
            cv2.imshow('roi', roi)
        cv2.imshow('Camera Streaming', frame)
        cv2.imshow('Skin Extraction', skin_image)
    except Exception as e:
        traceback.print_exc()

cap.release()
cv2.destroyAllWindows()

미디어파이프 파이토치 데모코드에서 입력으로 사용하는 이미지 125,125 포인트 출력하면

0 ~ 1사이 값이 나온다.

extract_roi에서 정규화에 대한 내용은 못봣는데

해매다 보니 전처리 과정 어캐해야하는지 이해됫다.

bgr2rgb 하고 /256으로 나눠주면 끝

import cv2
import time
import numpy as np
import traceback


def get_color_filtered_boxes(image):
    # 이미지를 HSV 색 공간으로 변환
    hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    # 살색 영역을 마스크로 만들기
    skin_mask = cv2.inRange(hsv_image, lower_skin, upper_skin)
    # 모폴로지 연산을 위한 구조 요소 생성
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
    # 모폴로지 열림 연산 적용
    skin_mask = cv2.morphologyEx(skin_mask, cv2.MORPH_OPEN, kernel)

    # 마스크를 이용하여 살색 영역 추출
    skin_image = cv2.bitwise_and(image, image, mask=skin_mask)

    # 살색 영역에 대한 바운딩 박스 추출
    contours, _ = cv2.findContours(skin_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    bounding_boxes = [cv2.boundingRect(cnt) for cnt in contours]
        
    # 크기가 작은 박스와 큰 박스 제거
    color_boxes = []
    for (x, y, w, h) in bounding_boxes:
        if w * h > 100 * 100:
            # 약간 박스 더크게
            #color_boxes.append((x - 30, y - 30, w + 60, h + 60))
            center_x = int((x + x + w) / 2)
            center_y = int((y + y + h) / 2)

            large = 0
            if w > h:
                large = w                
            if w < h:                
                large = h
            
            large = int(large * 0.7)
            color_boxes.append((center_x - large, center_y - large, 2 * large,  2 * large))


    return color_boxes


def landmark_inference(img):
    #tensor = img / 127.5 - 1.0
    tensor = img / 256
    blob = cv2.dnn.blobFromImage(tensor.astype(np.float32), swapRB=False, crop=False)
    net.setInput(blob)
    preds = net.forward(outNames)

    return preds

def denormalize_landmarks(landmarks):
    landmarks[:,:,:2] *= 256
    return landmarks


def draw_landmarks(img, points, connections=[], color=(0, 255, 0), size=2):
    points = points[:,:2]
    for point in points:
        x, y = point
        x, y = int(x), int(y)
        cv2.circle(img, (x, y), size, color, thickness=size)

    for connection in connections:
        x0, y0 = points[connection[0]]
        x1, y1 = points[connection[1]]
        x0, y0 = int(x0), int(y0)
        x1, y1 = int(x1), int(y1)
        cv2.line(img, (x0, y0), (x1, y1), (0,0,0), size)




HAND_CONNECTIONS = [
    (0, 1), (1, 2), (2, 3), (3, 4),
    (5, 6), (6, 7), (7, 8),
    (9, 10), (10, 11), (11, 12),
    (13, 14), (14, 15), (15, 16),
    (17, 18), (18, 19), (19, 20),
    (0, 5), (5, 9), (9, 13), (13, 17), (0, 17)
]






# 살색 영역의 범위 지정 (HSV 색 공간)
lower_skin = np.array([0, 20, 70], dtype=np.uint8)
upper_skin = np.array([20, 255, 255], dtype=np.uint8)

img_width = 640
img_height = 480
lm_infer_width = 256
lm_infer_height = 256

net = cv2.dnn.readNet('blazehand.onnx')
#net = cv2.dnn.readNet('hand_landmark.onnx')

outNames = net.getUnconnectedOutLayersNames()
print(outNames)


cap = cv2.VideoCapture(1)

while True:
    time_start = time.time()
    try:
        roi = None
        ret, frame = cap.read()
        frame = cv2.resize(frame, (img_width, img_height))

        skin_image = frame.copy()
        # 크기가 작은 박스와 큰 박스 제거
        color_boxes = get_color_filtered_boxes(skin_image)

        # 바운딩 박스를 이미지에 그리기
        for (x, y, w, h) in color_boxes:
            cv2.rectangle(skin_image, (x, y), (x + w, y + h), (0, 255, 0), 2)


        for idx, color_box in enumerate(color_boxes):
            x, y, w, h = color_box
            cbox_ratio_width = w / lm_infer_width
            cbox_ratio_height = h / lm_infer_height

            roi = frame[y:y+h, x:x+w]
            roi = cv2.resize(roi, (lm_infer_width, lm_infer_height))
            lm_input = cv2.cvtColor(roi, cv2.COLOR_BGR2RGB)


            preds = landmark_inference(lm_input)

            landmarks = preds[2]
            flag = preds[0]
            denorm_landmarks = denormalize_landmarks(landmarks)

            for i in range(len(flag)):
                landmark, flag = denorm_landmarks[i], flag[i]
                
                if flag>.5:
                    draw_landmarks(roi, landmark[:,:2], HAND_CONNECTIONS, size=2)




        if cv2.waitKey(1) == ord('q'):
            break
        time_cur = time.time()
        cv2.putText(frame, f"time spend: {time_cur - time_start}", (0, 50), cv2.FONT_HERSHEY_SIMPLEX, 2, (125, 125, 125), 2)
        
        #frame = cv2.resize(frame, (320, 240))
        if roi is not None:
            cv2.imshow('roi', roi)
        cv2.imshow('Camera Streaming', frame)
        cv2.imshow('Skin Extraction', skin_image)
    except Exception as e:
        traceback.print_exc()

cap.release()
cv2.destroyAllWindows()

블라즈팜도 언리얼에 비해서 잘된다.

패딩을 넣은게 문제인가

import cv2
import time
import numpy as np

def sigmoid(x):
  return 1 / (1 + np.exp(-x))

def draw_rect(frame, regressor, classificator, stride, anchor_count, column, row, anchor, offset):
    index = (int(row * 128 / stride) + column) * anchor_count + anchor + offset

    score = sigmoid(regressor[index][0])
    if score < 0.5: return

    x, y, w, h = classificator[index][:4]

    x += (column + 0.5) * stride - w / 2
    y += (row    + 0.5) * stride - h / 2
    x = int(x)
    y = int(y)
    w = int(w)
    h = int(h)
    frame = cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 1)

net = cv2.dnn.readNet('palm_detection.onnx')
outNames = net.getUnconnectedOutLayersNames()
print(outNames)
cap = cv2.VideoCapture(0)
while True:
    time_start = time.time()
    ret, frame = cap.read()

    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame = cv2.resize(frame, dsize=(128, 128))
    #tensor = (frame / 127.5 - 1.0).reshape((128, 128, 3)).transpose(2, 0, 1)

    tensor = (frame / 127.5 - 1.0).reshape((128, 128, 3))
    print(tensor.shape)
    blob = cv2.dnn.blobFromImage(tensor.astype(np.float32), swapRB=False, crop=False)
    print(blob.shape)

    #blob = cv2.dnn.blobFromImage(tensor.astype(np.float32), 1.0, (128, 128), (127.5, 127.5, 127.5), swapRB=True)
    net.setInput(blob)
    preds = net.forward(outNames)
    regressor = preds[0]
    classifier = preds[1]
    """
    print(regressor.shape)
    print(classifier.shape)
    print()
    """
    for y in range(16):
        for x in range(16):
            for a in range(2):
                draw_rect(frame, regressor[0], classifier[0], 8, 2, x, y, a, 0)

    for y in range(8):
        for x in range(8):
            for a in range(6):
                draw_rect(frame, regressor[0], classifier[0], 16, 6, x, y, a, 512)
    
    frame = cv2.resize(frame, dsize=(640, 480))

    if cv2.waitKey(1) == ord('q'):
        break
    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
    time_cur = time.time()
    cv2.putText(frame, f"time spend: {time_cur - time_start}", (0, 50), cv2.FONT_HERSHEY_SIMPLEX, 2, (125, 125, 125), 2)
    cv2.imshow('Camera Streaming', frame)

cap.release()
cv2.destroyAllWindows()

저작자표시

'컴퓨터과학 > 언리얼' 카테고리의 다른 글

HandDesktop10 - blazehand c++, unreal 적용 (0)	2024.01.29
HandDesktop09 - 기존 문제들 원인 찾음, blaze 갈아엎기, 블라즈팜 문제해결 (0)	2024.01.26
HandDesktop07 - blaze hand 사용 준비하기 (0)	2024.01.26
HandDesktop06 - blaze palm 사용하기 (0)	2024.01.25
HandDesktop05 - 손 추정을 위한 blaze 추론 코드 살펴보기 (0)	2024.01.24

HandDesktop07 - blaze hand 사용 준비하기

ggamji 2024. 1. 26. 14:15

2024. 1. 26. 14:15

오탐 필터링하기

지난 글에 필터링 한다고 코드 추가했던거같지만

잘동작하지 않는데

좌표 값이 0에 가깝꺼니 0, 이미지보다 큰 경우가 발생

그래서 denormDets를 필터링하는 코드 추가

1. xmax, ymax가 10보다 작은경우 제거

2. xmin, ymin이 입력이미지(웹캠)보다 큰경우 제거

3. dets 영역이 너무 좁거나 50 x 40, 너무 큰경우 width0.7 x height0.7 제거

std::vector<Blaze::PalmDetection> Blaze::FilteringDets(std::vector<Blaze::PalmDetection> detections, int width, int height)
{
    std::vector<Blaze::PalmDetection> filteredDets;

    for (auto& denormDet : detections)
    {
        cv::Point2d startPt = cv::Point2d(denormDet.xmin, denormDet.ymin);
        if (startPt.x < 10 || startPt.y < 10)
            continue;
        if (startPt.x > width || startPt.y > height)
            continue;
        int w = denormDet.xmax - denormDet.xmin;
        int y = denormDet.ymax - denormDet.ymin;
        if ((w * y < 50 * 40 )|| (w * y  > (width * 0.7) * (height * 0.7)))
            continue;
        filteredDets.push_back(denormDet);
    }
    return filteredDets;
}

검출 성능이 아쉽긴하지만 위 조건은 잘 걸러내는편

검출결과 변환하기

이번에 할일은 검출결과를 변환

-> xy중심점, 스케일, 세타(방향각)을 반환

    def detection2roi(self, detection):
        """ Convert detections from detector to an oriented bounding box.

        Adapted from:
        # mediapipe/modules/face_landmark/face_detection_front_detection_to_roi.pbtxt

        The center and size of the box is calculated from the center 
        of the detected box. Rotation is calcualted from the vector
        between kp1 and kp2 relative to theta0. The box is scaled
        and shifted by dscale and dy.

        """
        if self.detection2roi_method == 'box':
            # compute box center and scale
            # use mediapipe/calculators/util/detections_to_rects_calculator.cc
            xc = (detection[:,1] + detection[:,3]) / 2
            yc = (detection[:,0] + detection[:,2]) / 2
            scale = (detection[:,3] - detection[:,1]) # assumes square boxes

        elif self.detection2roi_method == 'alignment':
            # compute box center and scale
            # use mediapipe/calculators/util/alignment_points_to_rects_calculator.cc
            xc = detection[:,4+2*self.kp1]
            yc = detection[:,4+2*self.kp1+1]
            x1 = detection[:,4+2*self.kp2]
            y1 = detection[:,4+2*self.kp2+1]
            scale = ((xc-x1)**2 + (yc-y1)**2).sqrt() * 2
        else:
            raise NotImplementedError(
                "detection2roi_method [%s] not supported"%self.detection2roi_method)

        yc += self.dy * scale
        scale *= self.dscale

        # compute box rotation
        x0 = detection[:,4+2*self.kp1]
        y0 = detection[:,4+2*self.kp1+1]
        x1 = detection[:,4+2*self.kp2]
        y1 = detection[:,4+2*self.kp2+1]
        #theta = np.arctan2(y0-y1, x0-x1) - self.theta0
        theta = torch.atan2(y0-y1, x0-x1) - self.theta0
        return xc, yc, scale, theta

블라즈해더에 필요한 변수, 함수선언

// Fill out your copyright notice in the Description page of Project Settings.

#pragma once
#define _USE_MATH_DEFINES
#include <math.h>

#include "PreOpenCVHeaders.h"
#include <opencv2/opencv.hpp>
#include "PostOpenCVHeaders.h"

#include "CoreMinimal.h"

/**
 * 
 */
class HANDDESKTOP_API Blaze
{
public:
	Blaze();
	~Blaze();

	cv::dnn::Net blazePalm;
	cv::dnn::Net blazeHand;


	//for resize and pad
	void ResizeAndPad(
		cv::Mat& srcimg, cv::Mat& img256,
		cv::Mat& img128, float& scale, cv::Scalar& pad
	);



	// var and funcs for blazepalm
	struct PalmDetection {
		float ymin;
		float xmin;
		float ymax;
		float xmax;
		cv::Point2d kp_arr[7];
		float score;
	};

	int blazePalmSize = 128;
	float palmMinScoreThresh = 0.4;
	float palmMinNMSThresh = 0.2;
	int palmMinNumKeyPoints = 7;

	std::vector<PalmDetection> PredictPalmDetections(cv::Mat& img);
	PalmDetection GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
		int stride, int anchor_count, int column, int row, int anchor, int offset);
	float sigmoid(float x);
	std::vector<PalmDetection> DenormalizePalmDetections(std::vector<PalmDetection> detections, int width, int height, cv::Scalar pad);
	void DrawPalmDetections(cv::Mat& img, std::vector<Blaze::PalmDetection> denormDets);
	std::vector<PalmDetection> FilteringDets(std::vector<PalmDetection> detections, int width, int height);


	//var and func for blazehand
	void Detections2ROI(std::vector<Blaze::PalmDetection> dets,
		std::vector<float>& vec_xc, std::vector<float>& vec_yc,
		std::vector<float>& vec_scale, std::vector<float>& vec_theta
	);
	float blazePalmDy = -0.5;
	float blazePalmDScale = 2.6;
	float blazePalmTheta0 = M_PI / 2;
};

센터점, 스케일, 방향 계산코드 작성

void Blaze::Detections2ROI(std::vector<Blaze::PalmDetection> dets, std::vector<float>& vec_xc, std::vector<float>& vec_yc, std::vector<float>& vec_scale, std::vector<float>& vec_theta)
{
    for (int i = 0; i < dets.size(); i++)
    {
        Blaze::PalmDetection det = dets.at(i);
        float xc = (det.xmax + det.xmin) / 2;
        float yc = (det.ymax + det.ymin) / 2;
        float scale = (det.xmax - det.xmin); //assumes square box

        yc = blazePalmDy * scale;
        scale *= blazePalmDScale;

        //compute box rot
        float theta = std::atan2(det.kp_arr[0].y - det.kp_arr[1].y, det.kp_arr[0].x - det.kp_arr[1].x) - blazePalmTheta0;

        vec_xc.push_back(xc);
        vec_yc.push_back(yc);
        vec_scale.push_back(scale);
        vec_theta.push_back(theta);
    }
}

det2roi로 중심점, 스케일, 세타들가지고 입력텐서, 역어파인변환한 이미지, 박스를 가져옴

    xc, yc, scale, theta = palm_detector.detection2roi(palm_detections.cpu())
    img, affine2, box2 = hand_regressor.extract_roi(frame, xc, yc, theta, scale)

    def extract_roi(self, frame, xc, yc, theta, scale):

        # take points on unit square and transform them according to the roi
        points = torch.tensor([[-1, -1, 1, 1],
                            [-1, 1, -1, 1]], device=scale.device).view(1,2,4)
        points = points * scale.view(-1,1,1)/2
        theta = theta.view(-1, 1, 1)
        R = torch.cat((
            torch.cat((torch.cos(theta), -torch.sin(theta)), 2),
            torch.cat((torch.sin(theta), torch.cos(theta)), 2),
            ), 1)
        center = torch.cat((xc.view(-1,1,1), yc.view(-1,1,1)), 1)
        points = R @ points + center

        # use the points to compute the affine transform that maps 
        # these points back to the output square
        res = self.resolution
        points1 = np.array([[0, 0, res-1],
                            [0, res-1, 0]], dtype=np.float32).T
        affines = []
        imgs = []
        for i in range(points.shape[0]):
            pts = points[i, :, :3].cpu().numpy().T
            M = cv2.getAffineTransform(pts, points1)
            img = cv2.warpAffine(frame, M, (res,res))#, borderValue=127.5)
            img = torch.tensor(img, device=scale.device)
            imgs.append(img)
            affine = cv2.invertAffineTransform(M).astype('float32')
            affine = torch.tensor(affine, device=scale.device)
            affines.append(affine)
        if imgs:
            imgs = torch.stack(imgs).permute(0,3,1,2).float() / 255.#/ 127.5 - 1.0
            affines = torch.stack(affines)
        else:
            imgs = torch.zeros((0, 3, res, res), device=scale.device)
            affines = torch.zeros((0, 2, 3), device=scale.device)

        return imgs, affines, points

대충 이런식으로 한다는 코드인것같긴한데

여러 이미지에 대한 점들을 한번에 처리하는 파이썬 코드를

다중 처리는 힘드니 개별 처리하도록 c++로 구현해야함

보다보니 정리된게

img는 역 어파인변환된 이미지, 블라즈 핸드 추론용

affine은 역어파인 행렬, 추론결과(정방향 좌표들) 반정규화하는데(정방향을 실제이미지방향으로) 사용

box는 손감싸는 박스 좌표

    img, affine2, box2 = hand_regressor.extract_roi(frame, xc, yc, theta, scale)

    if len(img) != 0:
        print(img.shape)
        # 파이토치 텐서를 넘파이 배열로 변환
        array = img.squeeze().numpy()
        array = np.transpose(array, (1, 2, 0))
        print(array.shape)
        print()
        # 넘파이 배열을 이미지로 변환
        image = cv2.cvtColor(array, cv2.COLOR_RGB2BGR)

        # 이미지를 표시
        cv2.imshow("Image", image)
    flags2, handed2, normalized_landmarks2 = hand_regressor(img.to(gpu))
    landmarks2 = hand_regressor.denormalize_landmarks(normalized_landmarks2.cpu(), affine2)

어파인변환이미지, 역어파인변환행렬, 박스좌표(기울어진)를 구하기 위해 별도 테스트 코드 작성

1. 원본 이미지

2. 어파인 변환 이미지

3. 어파인 변환이미지 -> 역어파인변환결과

#include <opencv2/opencv.hpp>

int main() {
    cv::Mat img = cv::imread("testimg.jpg");


    float theta = -30;
    cv::Rect box(260, 240, 120, 120);

    float scale = 2.6;

    // 회전 중심점 계산
    cv::Point2f center(box.x + box.width / 2, box.y + box.height / 2);

    // 회전 변환 행렬 계산
    cv::Mat rotationMatrix = cv::getRotationMatrix2D(center, theta, 1.0);


    // 원래 detection 꼭지점
    cv::Point2f originaDetPoints[4] = {
    cv::Point2f(box.x, box.y),
    cv::Point2f(box.x + box.width, box.y),
    cv::Point2f(box.x + box.width, box.y + box.height),
    cv::Point2f(box.x, box.y + box.height)
    };

    // 회전된 상자의 꼭지점 계산
    cv::Point2f startRotatedPoints[4];

    // 목표 꼭지점(블라즈 핸드는 256이므로
    cv::Point2f targetRotatedPoint[4] = {
        cv::Point2f(0, 0),
        cv::Point2f(256, 0),
        cv::Point2f(256, 256),
        cv::Point2f(0, 256)

    };

    for (int i = 0; i < 4; i++)
    {
        cv::Point2f startRotatedPoint = originaDetPoints[i] - center;
        float x = startRotatedPoint.x * std::cos(theta * CV_PI / 180) - startRotatedPoint.y * std::sin(theta * CV_PI / 180);
        float y = startRotatedPoint.x * std::sin(theta * CV_PI / 180) + startRotatedPoint.y * std::cos(theta * CV_PI / 180);
        x = x * scale;
        y = y * scale;
        startRotatedPoints[i] = cv::Point2f(x, y) + center;
    }

    // 회전된 상자 그리기
    for (int i = 0; i < 4; i++) {
        cv::line(img, startRotatedPoints[i], startRotatedPoints[(i + 1) % 4], cv::Scalar(0, 255, 0), 2);
    }


    // 어파인 변환 행렬 계산
    cv::Mat affineTransform_Mat = cv::getAffineTransform(startRotatedPoints, targetRotatedPoint);
    cv::Mat inv_affine_tranform_Mat;
    cv::invertAffineTransform(affineTransform_Mat, inv_affine_tranform_Mat);
    
    // 역/정 어파인변환 행렬로 이미지 변환
    cv::Mat affine_img;
    cv::Mat inv_affine_img;
    cv::warpAffine(img, affine_img, affineTransform_Mat, cv::Size(256, 256));
    cv::warpAffine(affine_img, inv_affine_img, inv_affine_tranform_Mat, img.size());


    // 결과 이미지 출력
    cv::imshow("img", img);
    cv::imshow("img2affine_img", affine_img);
    cv::imshow("affine_img2inv_affine_img", inv_affine_img);

    cv::waitKey(0);
    cv::destroyAllWindows();

    return 0;
}

이 코드를 언리엘에 적합하게 수정

블라즈 헤더에 핸드를 위한 변수, 함수 다시 정리

	//var and func for blazehand
	void Detections2ROI(std::vector<Blaze::PalmDetection> dets,
		std::vector<float>& vec_xc, std::vector<float>& vec_yc,
		std::vector<float>& vec_scale, std::vector<float>& vec_theta
	);
	float blazePalmDy = -0.5;
	float blazePalmDScale = 2.6;
	float blazePalmTheta0 = CV_PI / 2;

	void extract_roi(cv::Mat frame, std::vector<float>& vec_xc, std::vector<float>& vec_yc,
		std::vector<float>& vec_scale, std::vector<float>& vec_theta, std::vector<PalmDetection> denormDets,
		std::vector<cv::Mat>& imgs, std::vector<cv::Mat>& affines, std::vector<BoxROI>& boxROIs);

	int blazeHandSize = 256;
};

디텍션즈투ROI에서 scale이 잘 맞지 않아 디폴트스케일값으로 설정

센터, 스케일, 세타 벡터에 담기

void Blaze::Detections2ROI(std::vector<Blaze::PalmDetection> dets, std::vector<float>& vec_xc, std::vector<float>& vec_yc, std::vector<float>& vec_scale, std::vector<float>& vec_theta)
{
    for (int i = 0; i < dets.size(); i++)
    {
        Blaze::PalmDetection det = dets.at(i);
        float xc = (det.xmax + det.xmin) / 2;
        float yc = (det.ymax + det.ymin) / 2;
        float scale = blazePalmDScale;
        /*
        float scale = (det.xmax - det.xmin); //assumes square box

        yc = blazePalmDy * scale;
        scale *= blazePalmDScale;
        */

        //compute box rot
        float theta = std::atan2(det.kp_arr[0].y - det.kp_arr[1].y, det.kp_arr[0].x - det.kp_arr[1].x) - blazePalmTheta0;

        vec_xc.push_back(xc);
        vec_yc.push_back(yc);
        vec_scale.push_back(scale);
        vec_theta.push_back(theta);
    }
}

기존 센터 좌표, 세타, 스케일 가져오기

센터 점과 세타로 회전행렬 취득

원래 검출 점들 준비(작은 사각형)

roi 점들에 원래 검출점 잠깐담고

void Blaze::extract_roi(cv::Mat frame, std::vector<float>& vec_xc, std::vector<float>& vec_yc, 
    std::vector<float>& vec_scale, std::vector<float>& vec_theta, std::vector<Blaze::PalmDetection> denormDets,
    std::vector<cv::Mat>& vec_img, std::vector<cv::Mat>& vec_affine, std::vector<BoxROI>& vec_boxROI)
{
    cv::Mat img = frame.clone();
    for (int i = 0; i < vec_xc.size(); i++)
    {
        cv::Point2f center = cv::Point2f(vec_xc.at(i), vec_yc.at(i));

        Blaze::PalmDetection denormDet = denormDets.at(i);
        float theta = vec_theta.at(i);
        theta = theta * 180 / CV_PI;
        float scale = vec_scale.at(i);




        // 회전 변환 행렬 계산
        cv::Mat rotationMatrix = cv::getRotationMatrix2D(center, theta, 1.0);


        // 원래 detection 꼭지점
        cv::Point2f originaDetPoints[4] = {
            cv::Point2f(denormDet.xmin, denormDet.ymin),
            cv::Point2f(denormDet.xmax, denormDet.ymin),
            cv::Point2f(denormDet.xmax, denormDet.ymax),
            cv::Point2f(denormDet.xmin, denormDet.ymax)
        };

        // 회전된 상자의 꼭지점 계산
        cv::Point2f startRotatedPoints[4];

        // 목표 꼭지점(블라즈 핸드는 256이므로
        cv::Point2f targetRotatedPoint[4] = {
            cv::Point2f(0, 0),
            cv::Point2f(256, 0),
            cv::Point2f(256, 256),
            cv::Point2f(0, 256)

        };


        Blaze::BoxROI roi;
        for (int j = 0; j < 4; j++)
        {
            cv::Point2f startRotatedPoint = originaDetPoints[j] - center;
            float x = startRotatedPoint.x * std::cos(theta * CV_PI / 180) - startRotatedPoint.y * std::sin(theta * CV_PI / 180);
            float y = startRotatedPoint.x * std::sin(theta * CV_PI / 180) + startRotatedPoint.y * std::cos(theta * CV_PI / 180);
            x = x * scale;
            y = y * scale;
            startRotatedPoints[j] = cv::Point2f(x, y) + center;

            roi.pts[j] = startRotatedPoints[j];
        }



        // 어파인 변환 행렬 계산
        cv::Mat affineTransform_Mat = cv::getAffineTransform(startRotatedPoints, targetRotatedPoint);
        cv::Mat inv_affine_tranform_Mat;
        cv::invertAffineTransform(affineTransform_Mat, inv_affine_tranform_Mat);

        // 정 어파인변환 행렬로 이미지 변환
        cv::Mat affine_img;
        cv::warpAffine(img, affine_img, affineTransform_Mat, cv::Size(blazeHandSize, blazeHandSize));




        /*

        std::ostringstream oss_info;
        oss_info << "center(" << center.x << "," << center.y << "), scale : " << scale;
        std::string oss_info_str = oss_info.str();
        cv::putText(frame, oss_info_str, cv::Point(30, 370), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(255, 0, 255), 2);



        std::ostringstream oss_originaDetPoints;
        oss_originaDetPoints << "originaDetPoints pt0(" << originaDetPoints[0].x << "," << originaDetPoints[0].y << ")"
                << ", pt1(" << originaDetPoints[1].x << ", " << originaDetPoints[1].y << ")";
        std::string originaDetPoints_str = oss_originaDetPoints.str();
        cv::putText(frame, originaDetPoints_str, cv::Point(30, 400), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(255, 0, 255), 2);

        std::ostringstream oss_startRotatedPoints;
        oss_startRotatedPoints << "startRotatedPoint pt0(" << startRotatedPoints[0].x << "," << startRotatedPoints[0].y<<")"
            << ", pt1(" << startRotatedPoints[1].x << ", " << startRotatedPoints[1].y << ")";

        std::string startRotatedPoints_str = oss_startRotatedPoints.str();
        cv::putText(frame, startRotatedPoints_str, cv::Point(30, 420), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(255, 0, 255), 2);
        */




        vec_img.push_back(affine_img);
        vec_affine.push_back(inv_affine_tranform_Mat);
        vec_boxROI.push_back(roi);
    }
}

리드프레임엔 웹캠 이미지에 어파인 변환 손영상을 복붙하도록 수정

void ADesktopGameModeBase::ReadFrame()
{
	if (!capture.isOpened())
	{
		return;
	}
	capture.read(webcamImage);

	/*
	get filtered detections
	*/
	blaze.ResizeAndPad(webcamImage, img256, img128, scale, pad);
	//UE_LOG(LogTemp, Log, TEXT("scale value: %f, pad value: (%f, %f)"), scale, pad[0], pad[1]);
	std::vector<Blaze::PalmDetection> normDets = blaze.PredictPalmDetections(img128);
	std::vector<Blaze::PalmDetection> denormDets = blaze.DenormalizePalmDetections(normDets, webcamWidth, webcamHeight, pad);
	std::vector<Blaze::PalmDetection> filteredDets = blaze.FilteringDets(denormDets, webcamWidth, webcamHeight);



	/*
	get affined hand img, affine mat, ROI
	*/
	std::vector<float> vec_xc;
	std::vector<float> vec_yc;
	std::vector<float> vec_scale;
	std::vector<float> vec_theta;
	blaze.Detections2ROI(filteredDets, vec_xc, vec_yc, vec_scale, vec_theta);


	std::vector<cv::Mat> vec_img;
	std::vector<cv::Mat> vec_affine;
	std::vector<Blaze::BoxROI> vec_boxROI;

	blaze.extract_roi(webcamImage, vec_xc, vec_yc,
		vec_scale, vec_theta, filteredDets,
		vec_img, vec_affine, vec_boxROI);


	/*
	Draw
	*/
	blaze.DrawPalmDetections(webcamImage, filteredDets);

	std::string dets_size_str = "filtered dets : " + std::to_string(filteredDets.size()) + ", norm dets : " + std::to_string(normDets.size()) + ", denorm dets : " + std::to_string(denormDets.size());
	cv::putText(webcamImage, dets_size_str, cv::Point(30, 30), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(255, 0, 0), 2);

	for (int i = 0; i < filteredDets.size(); i++)
	{
		auto& det = filteredDets.at(i);


		std::ostringstream oss;
		oss << "denorm dets : (" << det.xmin << ", " << det.ymin << "),(" <<
			det.xmax << ", " << det.ymax << "), center : " << vec_xc.at(i) <<","<<vec_yc.at(i);
		std::string det_str = oss.str();
		cv::putText(webcamImage, det_str, cv::Point(30, 50 + 20 * i), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 255), 2);
	}



	std::ostringstream oss_vec;
	oss_vec << "vec_xc : " << vec_xc.size() << ", vec_scale : " << vec_scale.size() << ", vec_theta" << vec_theta.size();
	std::string oss_vec_str = oss_vec.str();
	cv::putText(webcamImage, oss_vec_str, cv::Point(30, 450), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 255), 2);



	std::ostringstream oss_mat;
	oss_mat << "vec_img : " << vec_img.size() << ", vec_affine : " << vec_affine.size() << ", vec_boxROI" << vec_boxROI.size();
	std::string oss_mat_str = oss_mat.str();
	cv::putText(webcamImage, oss_mat_str, cv::Point(30, 465), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 255), 2);




	for (int i = 0; i < vec_img.size(); i++)
	{
		cv::Mat tmp = vec_img.at(i);
		cv::resize(tmp, tmp, cv::Size(100, 100));
		cv::Mat roi = webcamImage(cv::Rect(400 + (i * 100), 300, 100, 100));
		tmp.copyTo(roi);
	}

	/*
	blaze.DrawPalmDetections(img128, normDets);
	cv::Mat roi1 = webcamImage(cv::Rect(1000, 500, img128.cols, img128.rows));
	img128.copyTo(roi1);
	*/


	MatToTexture2D(webcamImage);


	/*
	모니터 시각화
	*/
	ScreensToCVMats();
	CVMatsToTextures();
}

원래 extract roi 한 이후에는 추론하고 시각화가지만

먼저 roi 드로잉부터 구현해보면

    xc, yc, scale, theta = palm_detector.detection2roi(palm_detections.cpu())
    img, affine2, box2 = hand_regressor.extract_roi(frame, xc, yc, theta, scale)

    if len(img) != 0:
        print(img.shape)
        # 파이토치 텐서를 넘파이 배열로 변환
        array = img.squeeze().numpy()
        array = np.transpose(array, (1, 2, 0))
        print(array.shape)
        print()
        # 넘파이 배열을 이미지로 변환
        image = cv2.cvtColor(array, cv2.COLOR_RGB2BGR)

        # 이미지를 표시
        cv2.imshow("Image", image)
    flags2, handed2, normalized_landmarks2 = hand_regressor(img.to(gpu))
    landmarks2 = hand_regressor.denormalize_landmarks(normalized_landmarks2.cpu(), affine2)
    

    for i in range(len(flags)):
        landmark, flag = landmarks[i], flags[i]
        if flag>.5:
            draw_landmarks(frame, landmark[:,:2], FACE_CONNECTIONS, size=1)


    for i in range(len(flags2)):
        landmark, flag = landmarks2[i], flags2[i]
        if flag>.5:
            draw_landmarks(frame, landmark[:,:2], HAND_CONNECTIONS, size=2)

    draw_roi(frame, box)
    draw_roi(frame, box2)
    draw_detections(frame, face_detections)
    draw_detections(frame, palm_detections)

블라즈 헤더에 드로 roi 함수 추가하고

	void extract_roi(cv::Mat frame, std::vector<float>& vec_xc, std::vector<float>& vec_yc,
		std::vector<float>& vec_scale, std::vector<float>& vec_theta, std::vector<PalmDetection> denormDets,
		std::vector<cv::Mat>& imgs, std::vector<cv::Mat>& affines, std::vector<BoxROI>& boxROIs);

	int blazeHandSize = 256;

	void DrawROI(cv::Mat& frame, std::vector<BoxROI> vec_box);
};

드로 roi 구현하고

void Blaze::DrawROI(cv::Mat& frame, std::vector<Blaze::BoxROI> vec_box)
{
    for (auto& roi : vec_box)
    {
        cv::line(frame, roi.pts[0], roi.pts[1], cv::Scalar(125, 0, 0), 2);
        cv::line(frame, roi.pts[1], roi.pts[2], cv::Scalar(125, 0, 0), 2);
        cv::line(frame, roi.pts[2], roi.pts[3], cv::Scalar(125, 0, 0), 2);
        cv::line(frame, roi.pts[3], roi.pts[0], cv::Scalar(125, 0, 0), 2);
    }
}

데스크탑게임모드베이스 맨 끝에 드로ROI 추가



	for (int i = 0; i < vec_img.size(); i++)
	{
		cv::Mat tmp = vec_img.at(i);
		cv::resize(tmp, tmp, cv::Size(100, 100));
		cv::Mat roi = webcamImage(cv::Rect(500 + (i * 100), 350, 100, 100));
		tmp.copyTo(roi);
	}


	blaze.DrawROI(webcamImage, vec_boxROI);




	MatToTexture2D(webcamImage);


	/*
	모니터 시각화
	*/
	ScreensToCVMats();
	CVMatsToTextures();
}

손을 놓칠때가 많긴 하지만 일단 ROI 좌표도 잘 가져와서 드로잉 됨

손가락이 약간 잘리니 디폴트 스케일 값을 이렇게 수정

	float blazePalmDScale = 3;

저작자표시

'컴퓨터과학 > 언리얼' 카테고리의 다른 글

HandDesktop09 - 기존 문제들 원인 찾음, blaze 갈아엎기, 블라즈팜 문제해결 (0)	2024.01.26
HandDesktop08 - blaze hand 사용하기 (0)	2024.01.26
HandDesktop06 - blaze palm 사용하기 (0)	2024.01.25
HandDesktop05 - 손 추정을 위한 blaze 추론 코드 살펴보기 (0)	2024.01.24
HandDesktop04 - 다중 모니터 구현하기 (0)	2024.01.24

HandDesktop06 - blaze palm 사용하기

ggamji 2024. 1. 25. 14:08

2024. 1. 25. 14:08

우선 먼저 웹캠 스트리밍부터 다시해보자

BP_Screen 생성하고 이름을 BP_ScreenCam, 좀 작게 보도록 스케일을 줄여 배치

웹캠 영상을 위한 이미지와 택스처, 준비하고

 */
UCLASS()
class HANDDESKTOP_API ADesktopGameModeBase : public AGameModeBase
{
	GENERATED_BODY()

protected:
	// Called when the game starts or when spawned
	virtual void BeginPlay() override;

public:
	UFUNCTION(BlueprintCallable)
	void ReadFrame();




	int monitorWidth = 1920;
	int monitorHeight = 1080;
	UPROPERTY(EditAnywhere, BlueprintReadWrite)
	UTexture2D* imageTextureScreen1;
	UPROPERTY(EditAnywhere, BlueprintReadWrite)
	UTexture2D* imageTextureScreen2;
	UPROPERTY(EditAnywhere, BlueprintReadWrite)
	UTexture2D* imageTextureScreen3;
	cv::Mat imageScreen1;
	cv::Mat imageScreen2;
	cv::Mat imageScreen3;

	void ScreensToCVMats();
	void CVMatsToTextures();



	int webcamWidth = 1280;
	int webcamHeight = 720;

	cv::VideoCapture capture;
	cv::Mat webcamImage;
	UPROPERTY(EditAnywhere, BlueprintReadWrite)
	UTexture2D* webcamTexture;
	void MatToTexture2D(const cv::Mat InMat);

};

이전에 주석처리한 코드 고쳐서 수정

* 노트북 웹캠이 아니라 따로 연결한 웹캠쓸꺼라 비디오캡처 1

* 웹캠 set으로 1280x720 받아오도록 설정.

* 하지만 스크린이 FHD 사이즈라 텍스처 크기를 모니터 기준으로함



void ADesktopGameModeBase::BeginPlay()
{
	Super::BeginPlay();

	capture = cv::VideoCapture(1);

	if (!capture.isOpened())
	{
		UE_LOG(LogTemp, Log, TEXT("Open Webcam failed"));
		return;
	}
	else
	{
		UE_LOG(LogTemp, Log, TEXT("Open Webcam Success"));
	}
    	capture.set(cv::CAP_PROP_FRAME_WIDTH, webcamWidth);
	capture.set(cv::CAP_PROP_FRAME_HEIGHT, webcamHeight);
    
	webcamTexture = UTexture2D::CreateTransient(monitorWidth, monitorHeight, PF_B8G8R8A8);


	imageScreen1 = cv::Mat(monitorHeight, monitorWidth, CV_8UC4);
	imageScreen2 = cv::Mat(monitorHeight, monitorWidth, CV_8UC4);
	imageScreen3 = cv::Mat(monitorHeight, monitorWidth, CV_8UC4);
	imageTextureScreen1 = UTexture2D::CreateTransient(monitorWidth, monitorHeight, PF_B8G8R8A8);
	imageTextureScreen2 = UTexture2D::CreateTransient(monitorWidth, monitorHeight, PF_B8G8R8A8);
	imageTextureScreen3 = UTexture2D::CreateTransient(monitorWidth, monitorHeight, PF_B8G8R8A8);

}


void ADesktopGameModeBase::ReadFrame()
{

	if (!capture.isOpened())
	{
		return;
	}
	capture.read(webcamImage);
	MatToTexture2D(webcamImage);



	ScreensToCVMats();
	CVMatsToTextures();
}

MatToTexture2D가 기존에 imageTexture로 되있던걸 webcamTexture로 수정

* 웹캠 영상이 1280 x 720이라 FHD로 리사이징하여 처리

void ADesktopGameModeBase::MatToTexture2D(const cv::Mat InMat)
{
	if (InMat.type() == CV_8UC3)//example for pre-conversion of Mat
	{
		cv::Mat resizedImage;
		cv::resize(InMat, resizedImage, cv::Size(monitorWidth, monitorHeight));
		cv::Mat bgraImage;
		//if the Mat is in BGR space, convert it to BGRA. There is no three channel texture in UE (at least with eight bit)
		cv::cvtColor(resizedImage, bgraImage, cv::COLOR_BGR2BGRA);

		//Texture->SRGB = 0;//set to 0 if Mat is not in srgb (which is likely when coming from a webcam)
		//other settings of the texture can also be changed here
		//Texture->UpdateResource();

		//actually copy the data to the new texture
		FTexture2DMipMap& Mip = webcamTexture->GetPlatformData()->Mips[0];
		void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
		FMemory::Memcpy(Data, bgraImage.data, bgraImage.total() * bgraImage.elemSize());//copy the data
		Mip.BulkData.Unlock();
		webcamTexture->PostEditChange();
		webcamTexture->UpdateResource();
	}
	else if (InMat.type() == CV_8UC4)
	{
		//actually copy the data to the new texture
		FTexture2DMipMap& Mip = webcamTexture->GetPlatformData()->Mips[0];
		void* Data = Mip.BulkData.Lock(LOCK_READ_WRITE);//lock the texture data
		FMemory::Memcpy(Data, InMat.data, InMat.total() * InMat.elemSize());//copy the data
		Mip.BulkData.Unlock();
		webcamTexture->PostEditChange();
		webcamTexture->UpdateResource();
	}
	//if the texture hasnt the right pixel format, abort.
	webcamTexture->PostEditChange();
	webcamTexture->UpdateResource();
}

}

웹캠 영상 띄우기 위해

데스크톱 게임모드 베이스의 변수에

StrScreenWebcam 디폴트값은 BP_ScreenWebcam과

ScreenCam 추가

스크린웹캠을 위한 이벤트 그래프 추가

이벤트 틱쪽도 추가

웹캠 영상이 잘나오는걸 확인

영상 추론, 전처리, 후처리 코드를 모아둘

None타입 클래스 Blaze 생성

opencv를 사용할것이므로 Blaze.h에 와서 opencv 헤더부터 추가

#pragma once

#include "PreOpenCVHeaders.h"
#include <opencv2/opencv.hpp>
#include "PostOpenCVHeaders.h"

#include "CoreMinimal.h"

/**
 * 
 */
class HANDDESKTOP_API Blaze
{
public:
	Blaze();
	~Blaze();
};

먼저 모델로드부터 작성

#pragma once

#include "PreOpenCVHeaders.h"
#include <opencv2/opencv.hpp>
#include "PostOpenCVHeaders.h"

#include "CoreMinimal.h"

/**
 * 
 */
class HANDDESKTOP_API Blaze
{
public:
	Blaze();
	~Blaze();

	cv::dnn::Net blazePalm;
	cv::dnn::Net blazeHand;
};

최근 blazepam은 256 x 256 입력받으나

tflite2onnx로 변환을 못해서 구버전 blazepalm으로 변환시켜 사용

이 구버전 blazepalm은 128 x 128 입력받음

// Fill out your copyright notice in the Description page of Project Settings.


#include "Blaze.h"

Blaze::Blaze()
{
	this->blazePalm = cv::dnn::readNet("c:/blazepalm_old.onnx");
	this->blazeHand = cv::dnn::readNet("c:/blazehand.onnx");
}

Blaze::~Blaze()
{
}

이제 리사이징, 패딩 파이썬 코드를 cpp로 작성

def resize_pad(img):
    """ resize and pad images to be input to the detectors

    The face and palm detector networks take 256x256 and 128x128 images
    as input. As such the input image is padded and resized to fit the
    size while maintaing the aspect ratio.

    Returns:
        img1: 256x256
        img2: 128x128
        scale: scale factor between original image and 256x256 image
        pad: pixels of padding in the original image
    """

    size0 = img.shape
    if size0[0]>=size0[1]:
        h1 = 256
        w1 = 256 * size0[1] // size0[0]
        padh = 0
        padw = 256 - w1
        scale = size0[1] / w1
    else:
        h1 = 256 * size0[0] // size0[1]
        w1 = 256
        padh = 256 - h1
        padw = 0
        scale = size0[0] / h1
    padh1 = padh//2
    padh2 = padh//2 + padh%2
    padw1 = padw//2
    padw2 = padw//2 + padw%2
    img1 = cv2.resize(img, (w1,h1))
    img1 = np.pad(img1, ((padh1, padh2), (padw1, padw2), (0,0)))
    pad = (int(padh1 * scale), int(padw1 * scale))
    img2 = cv2.resize(img1, (128,128))
    
    // img256을 srcimg의 (0, 0) 좌표에 그리기
    cv::Mat roi1 = srcimg(cv::Rect(0, 0, img256.cols, img256.rows));
    img256.copyTo(roi1);

    // img128를 srcimg의 (300, 0) 좌표에 그리기
    cv::Mat roi2 = srcimg(cv::Rect(300, 0, img128.cols, img128.rows));
    img128.copyTo(roi2);
    return img1, img2, scale, pad

리사이즈엔 패드 cpp 코드

void Blaze::ResizeAndPad(
	cv::Mat& srcimg, cv::Mat& img256,
	cv::Mat& img128, float& scale, cv::Scalar& pad
)
{
    float h1, w1;
    int padw, padh;

    cv::Size size0 = srcimg.size();
    if (size0.height >= size0.width) {
        h1 = 256;
        w1 = 256 * size0.width / size0.height;
        padh = 0;
        padw = static_cast<int>(256 - w1);
        scale = size0.width / static_cast<float>(w1);
    }
    else {
        h1 = 256 * size0.height / size0.width;
        w1 = 256;
        padh = static_cast<int>(256 - h1);
        padw = 0;
        scale = size0.height / static_cast<float>(h1);
    }

    //UE_LOG(LogTemp, Log, TEXT("scale value: %f, size0.height: %d, size0.width : %d, h1 : %f"), scale, size0.height, size0.width, h1);

    int padh1 = static_cast<int>(padh / 2);
    int padh2 = static_cast<int>(padh / 2) + static_cast<int>(padh % 2);
    int padw1 = static_cast<int>(padw / 2);
    int padw2 = static_cast<int>(padw / 2) + static_cast<int>(padw % 2);
    pad = cv::Scalar(static_cast<float>(padh1 * scale), static_cast<float>(padw1 * scale));

    
    cv::resize(srcimg, img256, cv::Size(w1, h1));
    cv::copyMakeBorder(img256, img256, padh1, padh2, padw1, padw2, cv::BORDER_CONSTANT, cv::Scalar(0, 0, 0));
    cv::resize(img256, img128, cv::Size(128, 128));

    // img256을 srcimg의 (0, 0) 좌표에 그리기
    cv::Mat roi1 = srcimg(cv::Rect(0, 0, img256.cols, img256.rows));
    img256.copyTo(roi1);

    // img128를 srcimg의 (300, 0) 좌표에 그리기
    cv::Mat roi2 = srcimg(cv::Rect(300, 0, img128.cols, img128.rows));
    img128.copyTo(roi2);
}

데스크탑게임모드베이스로 돌아와

read frame 중에 동작하도록 약간 고쳐주자

리사이즈 앤 패드 후 스케일과 패드도 출력


void ADesktopGameModeBase::ReadFrame()
{

	if (!capture.isOpened())
	{
		return;
	}
	capture.read(webcamImage);


	Blaze::ResizeAndPad(webcamImage, img256, img128, scale, pad);
	UE_LOG(LogTemp, Log, TEXT("scale value: %f, pad value: (%f, %f)"), scale, pad[0], pad[1]);


	MatToTexture2D(webcamImage);


	ScreensToCVMats();
	CVMatsToTextures();
}

256x256, 128 x 128 이미지가 패드 붙여서 잘나온다.

스케일은 5, 패드는 280, 0이 나오는데 왜이렇게 나오는지 잠깐보면

지금 입력되는 이미지는 1280 x 720 해상도로

리사이즈 패드 루틴에서

위스가 기므로 아래 조건에 들어간다.

스케일을 구하기위해 h1은 256 x 720 / 1280 = 144

scale = 720 / 144 = 5

기존의 폭이 720이었으나 256 x 256이미지에서 패딩을 제외하면 144가 되므로 5배가 줄어든게 맞음

    int h1, w1, padw, padh;

    cv::Size size0 = srcimg.size();
    if (size0.height >= size0.width) {
        h1 = 256;
        w1 = 256 * size0.width / size0.height;
        padh = 0;
        padw = 256 - w1;
        scale = size0.width / static_cast<float>(w1);
    }
    else {
        h1 = 256 * size0.height / size0.width;
        w1 = 256;
        padh = 256 - h1;
        padw = 0;
        scale = size0.height / static_cast<float>(h1);
    }

위스가 길기때문에 패딩은 h방향으로 붙는데

padh = 256 - h1 = 256 - 144 = 112

padh1 = 112 / 2 = 56

pad = (padh1 * scale, 0) = (56 x 5 = 280, 0)

패딩 길이가 280이여야 위아래로 붙였을때 원본 이미지의 종횡 길이가 일치함

일단 리사이즈 앤 패드로 패딩처리된 256, 128 이미지, 스케일, 패딩(h길이, w길이) 값 가져옴

잠깐 파이썬으로 돌아와 보면

이제 팜 디텍션을 할 차례

이전에 차 시뮬레이터 만들면서 팜 디텍터만 이미 사용해봤었는데

(아래의 블라즈팜 오닉스 추론 예제 참고해서 c++ 코드로만들었음)

https://colab.research.google.com/drive/1dgKi8btAKu2ihB7XOJbNwBcs7246B79_?usp=sharing#scrollTo=AVtkwBtXXG7p

BlazePalm ONNX Inference Test

Colaboratory notebook

colab.research.google.com

mediapipepytorch는 파이썬, 토치 기준으로 작성되어있지만

여기선 c++, opencv dnn으로 블라즈팜 추론 수행해야함

    img1, img2, scale, pad = resize_pad(frame)

    if back_detector:
        normalized_face_detections = face_detector.predict_on_image(img1)
    else:
        normalized_face_detections = face_detector.predict_on_image(img2)
    normalized_palm_detections = palm_detector.predict_on_image(img1)

일단 블라즈팜 디텍터 코드를 들어가서보면

전처리 -> 추론 -> 텐서투 디텍션 -> NMS 필터링 순으로 동작된다.

먼저 전처리 코드부터 보자

    def predict_on_batch(self, x):
        """Makes a prediction on a batch of images.

        Arguments:
            x: a NumPy array of shape (b, H, W, 3) or a PyTorch tensor of
               shape (b, 3, H, W). The height and width should be 128 pixels.

        Returns:
            A list containing a tensor of face detections for each image in 
            the batch. If no faces are found for an image, returns a tensor
            of shape (0, 17).

        Each face detection is a PyTorch tensor consisting of 17 numbers:
            - ymin, xmin, ymax, xmax
            - x,y-coordinates for the 6 keypoints
            - confidence score
        """
        if isinstance(x, np.ndarray):
            x = torch.from_numpy(x).permute((0, 3, 1, 2))

        assert x.shape[1] == 3
        assert x.shape[2] == self.y_scale
        assert x.shape[3] == self.x_scale

        # 1. Preprocess the images into tensors:
        x = x.to(self._device())
        x = self._preprocess(x)

        # 2. Run the neural network:
        with torch.no_grad():
            out = self.__call__(x)

        # 3. Postprocess the raw predictions:
        detections = self._tensors_to_detections(out[0], out[1], self.anchors)

        # 4. Non-maximum suppression to remove overlapping detections:
        filtered_detections = []
        for i in range(len(detections)):
            faces = self._weighted_non_max_suppression(detections[i])
            faces = torch.stack(faces) if len(faces) > 0 else torch.zeros((0, self.num_coords+1))
            filtered_detections.append(faces)

        return filtered_detections

전처리 코드는 /255 뿐인데 옆에 주석으로 # 127.5 - 1.0으로 되어있다.

cvdnn으로 팜디텍터 전처리할때는 /127.5 - 1했었는데 여긴 왜 /255한진 모르겠다.

    def _preprocess(self, x):
        """Converts the image pixels to the range [-1, 1]."""
        return x.float() / 255.# 127.5 - 1.0

일단 전에 만들어둔 전처리 + 추론 코드는 이런식이다.

전처리로 정규화후

W x H x C 이미지를

N C W H 블롭으로 만들고

출력레이어 이름 지정, 출력 레이어 결과 받아올 outputs 준비

net.forward(출력결과받을행렬, 출력레이어이름들) 로 출력결과 받아옴

    cv::Mat tensor;
    inputImg.convertTo(tensor, CV_32F, 1 / 127.5, -1.0);
    cv::Mat blob = cv::dnn::blobFromImage(tensor, 1.0, tensor.size(), 0, false, false, CV_32F);
    std::vector<cv::String> outNames(2);
    outNames[0] = "regressors";
    outNames[1] = "classificators";

    net.setInput(blob);
    std::vector<cv::Mat> outputs;
    net.forward(outputs, outNames);

    cv::Mat classificator = outputs[0];
    cv::Mat regressor = outputs[1];

파이썬에서 텐서즈 투 디텍션을보면

꽤 복잡하다

스코어 텐서를 시그모이드해서 점수가져오고

디코드 박스로 박스가져오고

최소 스코어 임계치로 필터링하는 내용

    def _tensors_to_detections(self, raw_box_tensor, raw_score_tensor, anchors):
        """The output of the neural network is a tensor of shape (b, 896, 16)
        containing the bounding box regressor predictions, as well as a tensor 
        of shape (b, 896, 1) with the classification confidences.

        This function converts these two "raw" tensors into proper detections.
        Returns a list of (num_detections, 17) tensors, one for each image in
        the batch.

        This is based on the source code from:
        mediapipe/calculators/tflite/tflite_tensors_to_detections_calculator.cc
        mediapipe/calculators/tflite/tflite_tensors_to_detections_calculator.proto
        """
        assert raw_box_tensor.ndimension() == 3
        assert raw_box_tensor.shape[1] == self.num_anchors
        assert raw_box_tensor.shape[2] == self.num_coords

        assert raw_score_tensor.ndimension() == 3
        assert raw_score_tensor.shape[1] == self.num_anchors
        assert raw_score_tensor.shape[2] == self.num_classes

        assert raw_box_tensor.shape[0] == raw_score_tensor.shape[0]
        
        detection_boxes = self._decode_boxes(raw_box_tensor, anchors)
        
        thresh = self.score_clipping_thresh
        raw_score_tensor = raw_score_tensor.clamp(-thresh, thresh)
        detection_scores = raw_score_tensor.sigmoid().squeeze(dim=-1)
        
        # Note: we stripped off the last dimension from the scores tensor
        # because there is only has one class. Now we can simply use a mask
        # to filter out the boxes with too low confidence.
        mask = detection_scores >= self.min_score_thresh

        # Because each image from the batch can have a different number of
        # detections, process them one at a time using a loop.
        output_detections = []
        for i in range(raw_box_tensor.shape[0]):
            boxes = detection_boxes[i, mask[i]]
            scores = detection_scores[i, mask[i]].unsqueeze(dim=-1)
            output_detections.append(torch.cat((boxes, scores), dim=-1))

        return output_detections

    def _decode_boxes(self, raw_boxes, anchors):
        """Converts the predictions into actual coordinates using
        the anchor boxes. Processes the entire batch at once.
        """
        boxes = torch.zeros_like(raw_boxes)

        x_center = raw_boxes[..., 0] / self.x_scale * anchors[:, 2] + anchors[:, 0]
        y_center = raw_boxes[..., 1] / self.y_scale * anchors[:, 3] + anchors[:, 1]

        w = raw_boxes[..., 2] / self.w_scale * anchors[:, 2]
        h = raw_boxes[..., 3] / self.h_scale * anchors[:, 3]

        boxes[..., 0] = y_center - h / 2.  # ymin
        boxes[..., 1] = x_center - w / 2.  # xmin
        boxes[..., 2] = y_center + h / 2.  # ymax
        boxes[..., 3] = x_center + w / 2.  # xmax

        for k in range(self.num_keypoints):
            offset = 4 + k*2
            keypoint_x = raw_boxes[..., offset    ] / self.x_scale * anchors[:, 2] + anchors[:, 0]
            keypoint_y = raw_boxes[..., offset + 1] / self.y_scale * anchors[:, 3] + anchors[:, 1]
            boxes[..., offset    ] = keypoint_x
            boxes[..., offset + 1] = keypoint_y

        return boxes

이전에 스코어랑, 박스 좌표를 가져오는 후처리 코드 만들어둔게 있긴한데

여기서는 키포인트까지 담지는 않고, (스코어,x,y,w,h) 순으로 만들었엇다.

파이썬 코드는 (ymin,xmin,ymax,xmax,kp1x,kp1y, ...., kp6x, pk6y, score)로 17개인 형태


struct DetectResult {
    float score;
    int x;
    int y;
    int w;
    int h;
};

    
std::vector<ASimulatorGameModeBase::DetectResult> ASimulatorGameModeBase::getDetectResults(cv::Mat frame, cv::dnn::Net net, cv::Size detectInputSize, cv::Size detectOutputSize)
{
    std::vector<DetectResult> beforeNMSResults;
    std::vector<DetectResult> afterNMSResults;
    std::vector<float> scores;
    std::vector<int> indices;
    std::vector<cv::Rect> boundingBoxes;

    cv::Mat inputImg;
    cv::resize(frame, inputImg, detectInputSize);
    cv::cvtColor(inputImg, inputImg, cv::COLOR_BGR2RGB);

    cv::Mat tensor;
    inputImg.convertTo(tensor, CV_32F, 1 / 127.5, -1.0);
    cv::Mat blob = cv::dnn::blobFromImage(tensor, 1.0, tensor.size(), 0, false, false, CV_32F);
    std::vector<cv::String> outNames(2);
    outNames[0] = "regressors";
    outNames[1] = "classificators";

    net.setInput(blob);
    std::vector<cv::Mat> outputs;
    net.forward(outputs, outNames);

    cv::Mat classificator = outputs[0];
    cv::Mat regressor = outputs[1];


    for (int y = 0; y < 16; ++y) {
        for (int x = 0; x < 16; ++x) {
            for (int a = 0; a < 2; ++a) {
                DetectResult res = getDetectResult(frame, regressor, classificator, 8, 2, x, y, a, 0, detectInputSize, detectOutputSize);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);
                    boundingBoxes.push_back(cv::Rect(res.x, res.y, res.w, res.h));
                    scores.push_back(res.score);
                }
            }
        }
    }

    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x) {
            for (int a = 0; a < 6; ++a) {
                DetectResult res = getDetectResult(frame, regressor, classificator, 16, 6, x, y, a, 512, detectInputSize, detectOutputSize);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);
                    boundingBoxes.push_back(cv::Rect(res.x, res.y, res.w, res.h));
                    scores.push_back(res.score);
                }
            }
        }
    }

    cv::dnn::NMSBoxes(boundingBoxes, scores, 0.5, 0.3, indices);

    for (int i = 0; i < indices.size(); i++) {
        int idx = indices[i];
        afterNMSResults.push_back(beforeNMSResults[idx]);
    }

    return afterNMSResults;
}

일단 blaze.h로 돌아와 팜 디텍션에 필요한 변수 함수 구조체 선언

팜 디텍션에 키포인트 좌표도 필요하긴 한데 일단 동작되는지만 보기위해 기존 코드 비슷하게 감


/**
 * 
 */
class HANDDESKTOP_API Blaze
{
public:
	Blaze();
	~Blaze();

	cv::dnn::Net blazePalm;
	cv::dnn::Net blazeHand;


	//for resize and pad
	static void ResizeAndPad(
		cv::Mat& srcimg, cv::Mat& img256,
		cv::Mat& img128, float& scale, cv::Scalar& pad
	);



	// var and funcs for blazepalm
	struct PalmDetection {
		int x;
		int y;
		int w;
		int h;
		float score;
	};


	float palmMinScoreThresh = 0.5;
	float palmMinNMSThresh = 0.3;
	int palmMinNumKeyPoints = 6;

	std::vector<PalmDetection> PredictPalmDetections(cv::Mat& img);
	PalmDetection GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
		int stride, int anchor_count, int column, int row, int anchor, int offset);
	float sigmoid(float x);

};

기존 추론, 후처리 코드에는 반정규화도 포함되어있긴했지만

파이썬 코드 비슷하게 만들려고 따로 뺏음

일단 정규화된 팜 디텍션 검출 결과들 반환하는 코드 준비됬고

std::vector<Blaze::PalmDetection> Blaze::PredictPalmDetections(cv::Mat& img)
{
    std::vector<Blaze::PalmDetection> beforeNMSResults;
    std::vector<Blaze::PalmDetection> afterNMSResults;
    std::vector<float> scores;
    std::vector<int> indices;
    std::vector<cv::Rect> boundingBoxes;


    cv::Mat inputImg;
    cv::cvtColor(img, inputImg, cv::COLOR_BGR2RGB);

    cv::Mat tensor;
    inputImg.convertTo(tensor, CV_32F, 1 / 127.5, -1.0);
    cv::Mat blob = cv::dnn::blobFromImage(tensor, 1.0, tensor.size(), 0, false, false, CV_32F);
    std::vector<cv::String> outNames(2);
    outNames[0] = "regressors";
    outNames[1] = "classificators";

    blazePalm.setInput(blob);
    std::vector<cv::Mat> outputs;
    blazePalm.forward(outputs, outNames);

    cv::Mat classificator = outputs[0];
    cv::Mat regressor = outputs[1];


    for (int y = 0; y < 16; ++y) {
        for (int x = 0; x < 16; ++x) {
            for (int a = 0; a < 2; ++a) {
                PalmDetection res = GetPalmDetection(regressor, classificator, 8, 2, x, y, a, 0, detectInputSize, detectOutputSize);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);
                    boundingBoxes.push_back(cv::Rect(res.x, res.y, res.w, res.h));
                    scores.push_back(res.score);
                }
            }
        }
    }

    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x) {
            for (int a = 0; a < 6; ++a) {
                PalmDetection res = GetPalmDetection(regressor, classificator, 16, 6, x, y, a, 512, detectInputSize, detectOutputSize);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);
                    boundingBoxes.push_back(cv::Rect(res.x, res.y, res.w, res.h));
                    scores.push_back(res.score);
                }
            }
        }
    }


    cv::dnn::NMSBoxes(boundingBoxes, scores, palmMinScoreThresh, palmMinNMSThresh, indices);

    for (int i = 0; i < indices.size(); i++) {
        int idx = indices[i];
        afterNMSResults.push_back(beforeNMSResults[idx]);
    }

    return afterNMSResults;
}

Blaze::PalmDetection Blaze::GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
    int stride, int anchor_count, int column, int row, int anchor, int offset) {

    Blaze::PalmDetection res{0, 0, 0, 0, 0.0f};

    int index = (int(row * 128 / stride) + column) * anchor_count + anchor + offset;
    float origin_score = regressor.at<float>(0, index, 0);
    float score = sigmoid(origin_score);
    if (score < palmMinScoreThresh) return res;

    float x = classificator.at<float>(0, index, 0);
    float y = classificator.at<float>(0, index, 1);
    float w = classificator.at<float>(0, index, 2);
    float h = classificator.at<float>(0, index, 3);


    x += (column + 0.5) * stride - w / 2;
    y += (row + 0.5) * stride - h / 2;

    res.score = score;
    res.x = int(x);
    res.y = int(y);
    res.w = int(w);
    res.h = int(h);
    return res;
}

float Blaze::sigmoid(float x) {
    return 1 / (1 + exp(-x));
}

반정규화 과정은

패딩 처리된 256 x 256 이미지 기준 정규화된 좌표들을

패딩 처리되어 1280 x 1280 형태로 돌린 후 패딩을 제거해

1280 x 720이미지에 맞게 위치를 조정하는 내용

def denormalize_detections(detections, scale, pad):
    """ maps detection coordinates from [0,1] to image coordinates

    The face and palm detector networks take 256x256 and 128x128 images
    as input. As such the input image is padded and resized to fit the
    size while maintaing the aspect ratio. This function maps the
    normalized coordinates back to the original image coordinates.

    Inputs:
        detections: nxm tensor. n is the number of detections.
            m is 4+2*k where the first 4 valuse are the bounding
            box coordinates and k is the number of additional
            keypoints output by the detector.
        scale: scalar that was used to resize the image
        pad: padding in the x and y dimensions

    """
    detections[:, 0] = detections[:, 0] * scale * 256 - pad[0]
    detections[:, 1] = detections[:, 1] * scale * 256 - pad[1]
    detections[:, 2] = detections[:, 2] * scale * 256 - pad[0]
    detections[:, 3] = detections[:, 3] * scale * 256 - pad[1]

    detections[:, 4::2] = detections[:, 4::2] * scale * 256 - pad[1]
    detections[:, 5::2] = detections[:, 5::2] * scale * 256 - pad[0]
    return detections

일단 내가 가지고 있는 팜디텍션 모델은 구버전이라 256 x 256이 아닌 128 x 128 사용하므로 조금 다름

코드 작성하다보니 xmin, ymin, xmax, ymax로 다루고 있었는데 내코드에는 xywh를 쓰고 있었네

잠깐 기존 PalmDetection 내용을 (ymin,xmin,ymax,xmax, score) 형태로 수정

블라즈 헤더

	struct PalmDetection {
		int ymin;
		int xmin;
		int ymax;
		int xmax;
		float score;
	};

블라드 cpp

Blaze::PalmDetection Blaze::GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
    int stride, int anchor_count, int column, int row, int anchor, int offset) {

    Blaze::PalmDetection res{0, 0, 0, 0, 0.0f};

    int index = (int(row * 128 / stride) + column) * anchor_count + anchor + offset;
    float origin_score = regressor.at<float>(0, index, 0);
    float score = sigmoid(origin_score);
    if (score < palmMinScoreThresh) return res;

    float x = classificator.at<float>(0, index, 0);
    float y = classificator.at<float>(0, index, 1);
    float w = classificator.at<float>(0, index, 2);
    float h = classificator.at<float>(0, index, 3);


    x += (column + 0.5) * stride - w / 2;
    y += (row + 0.5) * stride - h / 2;

    res.ymin = int(y);
    res.xmin = int(x);
    res.ymax = int(y + h);
    res.xmax = int(x + w);
    res.score = score;
    return res;
}

블라즈팜 구버전은 128 사이즈 사용하므로

검출결과들 반정규화 코드는 이런식으로 작성

	// var and funcs for blazepalm
	struct PalmDetection {
		int ymin;
		int xmin;
		int ymax;
		int xmax;
		float score;
	};

	int blazePalmSize = 128;
	float palmMinScoreThresh = 0.5;
	float palmMinNMSThresh = 0.3;
	int palmMinNumKeyPoints = 6;

	std::vector<PalmDetection> PredictPalmDetections(cv::Mat& img);
	PalmDetection GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
		int stride, int anchor_count, int column, int row, int anchor, int offset);
	float sigmoid(float x);
	std::vector<PalmDetection> DenormalizePalmDetections(std::vector<PalmDetection> detections, float scale, cv::Scalar pad);
};

std::vector<Blaze::PalmDetection> Blaze::DenormalizePalmDetections(std::vector<Blaze::PalmDetection> detections, float scale, cv::Scalar pad)
{

    std::vector<Blaze::PalmDetection> denormDets;

    for (auto& det : detections)
    {
        Blaze::PalmDetection denormDet;
        denormDet.ymin = det.ymin * scale * blazePalmSize - pad[0];
        denormDet.xmin = det.xmin * scale * blazePalmSize - pad[1];
        denormDet.ymax = det.ymax * scale * blazePalmSize - pad[0];
        denormDet.xmax = det.xmax * scale * blazePalmSize - pad[1];
        denormDet.score = det.score;
        denormDets.push_back(denormDet);
    }
    return denormDets;
}

이제 시각화 코드도 작성


	// var and funcs for blazepalm
	struct PalmDetection {
		int ymin;
		int xmin;
		int ymax;
		int xmax;
		float score;
	};

	int blazePalmSize = 128;
	float palmMinScoreThresh = 0.5;
	float palmMinNMSThresh = 0.3;
	int palmMinNumKeyPoints = 6;

	std::vector<PalmDetection> PredictPalmDetections(cv::Mat& img);
	PalmDetection GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
		int stride, int anchor_count, int column, int row, int anchor, int offset);
	float sigmoid(float x);
	std::vector<PalmDetection> DenormalizePalmDetections(std::vector<PalmDetection> detections, float scale, cv::Scalar pad);
	void DrawPalmDetections(cv::Mat& img, std::vector<Blaze::PalmDetection> denormDets);

};

void Blaze::DrawPalmDetections(cv::Mat& img, std::vector<Blaze::PalmDetection> denormDets)
{
    for (auto& denormDet : denormDets)
    {
        cv::Point2d startPt = cv::Point2d(denormDet.xmin, denormDet.ymin);
        cv::Point2d endPt = cv::Point2d(denormDet.xmax, denormDet.ymax);
        cv::rectangle(img, cv::Rect(startPt, endPt), cv::Scalar(255, 0, 0), 1);
    }
}

일단 여기까지 기본적인 블라즈쪽 코드 구현은 다된듯하다.

하다가 꼬인게 데스크탑 게임모드베이스에서 블라즈 생성해서 쓰도록 수정

블라즈쪽에는 스태틱 함수를 그냥 함수로 변경

데스크탑 게임모드베이스 해더

	//var and functions with blaze
	Blaze blaze;
	cv::Mat img256;
	cv::Mat img128;
	float scale;
	cv::Scalar pad;

};

블라즈 헤더

class HANDDESKTOP_API Blaze
{
public:
	Blaze();
	~Blaze();

	cv::dnn::Net blazePalm;
	cv::dnn::Net blazeHand;


	//for resize and pad
	void ResizeAndPad(
		cv::Mat& srcimg, cv::Mat& img256,
		cv::Mat& img128, float& scale, cv::Scalar& pad
	);



	// var and funcs for blazepalm
	struct PalmDetection {
		int ymin;
		int xmin;
		int ymax;
		int xmax;
		float score;
	};

	int blazePalmSize = 128;
	float palmMinScoreThresh = 0.5;
	float palmMinNMSThresh = 0.3;
	int palmMinNumKeyPoints = 6;

	std::vector<PalmDetection> PredictPalmDetections(cv::Mat& img);
	PalmDetection GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
		int stride, int anchor_count, int column, int row, int anchor, int offset);
	float sigmoid(float x);
	std::vector<PalmDetection> DenormalizePalmDetections(std::vector<PalmDetection> detections, float scale, cv::Scalar pad);
	void DrawPalmDetections(cv::Mat& img, std::vector<Blaze::PalmDetection> denormDets);

};

데스크탑 게임모드베이스 cpp - 비긴플레이에 블라즈 생성

#include "DesktopGameModeBase.h"


void ADesktopGameModeBase::BeginPlay()
{
	Super::BeginPlay();
	blaze = Blaze();

	capture = cv::VideoCapture(1);
	if (!capture.isOpened())
	{
		UE_LOG(LogTemp, Log, TEXT("Open Webcam failed"));
		return;
	}
	else
	{
		UE_LOG(LogTemp, Log, TEXT("Open Webcam Success"));
	}

리드프레임에서 블라즈에 구현한 함수들 사용


void ADesktopGameModeBase::ReadFrame()
{
	/*
	웹캠 처리
	*/
	if (!capture.isOpened())
	{
		return;
	}
	capture.read(webcamImage);


	blaze.ResizeAndPad(webcamImage, img256, img128, scale, pad);
	//UE_LOG(LogTemp, Log, TEXT("scale value: %f, pad value: (%f, %f)"), scale, pad[0], pad[1]);
	std::vector<Blaze::PalmDetection> normDets = blaze.PredictPalmDetections(img128);
	std::vector<Blaze::PalmDetection> denormDets = blaze.DenormalizePalmDetections(normDets, scale, pad);
	blaze.DrawPalmDetections(webcamImage, denormDets);


	std::string dets_str = "norm dets : " + std::to_string(normDets.size()) + ", denorm dets : " + std::to_string(denormDets.size());
	cv::putText(webcamImage, dets_str, cv::Point(0, 50), cv::FONT_HERSHEY_SIMPLEX, 0.7, cv::Scalar(125, 125, 125), 2);

	MatToTexture2D(webcamImage);


	/*
	모니터 시각화
	*/
	ScreensToCVMats();
	CVMatsToTextures();
}

다 된줄 알았는데

프레딕트 팜 디텍션즈에서 PalmDetection 바꾼걸 반영안시켜놔서 에러났다.

바뀐 PalmDetection에 맞게 수정

std::vector<Blaze::PalmDetection> Blaze::PredictPalmDetections(cv::Mat& img)
{
    std::vector<Blaze::PalmDetection> beforeNMSResults;
    std::vector<Blaze::PalmDetection> afterNMSResults;
    std::vector<float> scores;
    std::vector<int> indices;
    std::vector<cv::Rect> boundingBoxes;


    cv::Mat inputImg;
    cv::cvtColor(img, inputImg, cv::COLOR_BGR2RGB);

    cv::Mat tensor;
    inputImg.convertTo(tensor, CV_32F, 1 / 127.5, -1.0);
    cv::Mat blob = cv::dnn::blobFromImage(tensor, 1.0, tensor.size(), 0, false, false, CV_32F);
    std::vector<cv::String> outNames(2);
    outNames[0] = "regressors";
    outNames[1] = "classificators";

    blazePalm.setInput(blob);
    std::vector<cv::Mat> outputs;
    blazePalm.forward(outputs, outNames);

    cv::Mat classificator = outputs[0];
    cv::Mat regressor = outputs[1];


    for (int y = 0; y < 16; ++y) {
        for (int x = 0; x < 16; ++x) {
            for (int a = 0; a < 2; ++a) {
                PalmDetection res = GetPalmDetection(regressor, classificator, 8, 2, x, y, a, 0);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);

                    cv::Point2d startPt = cv::Point2d(res.xmin, res.ymin);
                    cv::Point2d endPt = cv::Point2d(res.xmax, res.ymax);

                    boundingBoxes.push_back(cv::Rect(startPt, endPt));
                    scores.push_back(res.score);
                }
            }
        }
    }

    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x) {
            for (int a = 0; a < 6; ++a) {
                PalmDetection res = GetPalmDetection(regressor, classificator, 16, 6, x, y, a, 512);
                if (res.score != 0)
                {
                    beforeNMSResults.push_back(res);
                    cv::Point2d startPt = cv::Point2d(res.xmin, res.ymin);
                    cv::Point2d endPt = cv::Point2d(res.xmax, res.ymax);

                    boundingBoxes.push_back(cv::Rect(startPt, endPt));
                    scores.push_back(res.score);
                }
            }
        }
    }


    cv::dnn::NMSBoxes(boundingBoxes, scores, palmMinScoreThresh, palmMinNMSThresh, indices);

    for (int i = 0; i < indices.size(); i++) {
        int idx = indices[i];
        afterNMSResults.push_back(beforeNMSResults[idx]);
    }

    return afterNMSResults;
}

실행해봣는데 드로잉이 안되서 결과도 출력시켜봤더니

오검출도 오검출이지만 좌표가 이상하다.

	blaze.ResizeAndPad(webcamImage, img256, img128, scale, pad);
	//UE_LOG(LogTemp, Log, TEXT("scale value: %f, pad value: (%f, %f)"), scale, pad[0], pad[1]);
	std::vector<Blaze::PalmDetection> normDets = blaze.PredictPalmDetections(img128);
	std::vector<Blaze::PalmDetection> denormDets = blaze.DenormalizePalmDetections(normDets, scale, pad);
	blaze.DrawPalmDetections(webcamImage, denormDets);


	std::string dets_size_str = "norm dets : " + std::to_string(normDets.size()) + ", denorm dets : " + std::to_string(denormDets.size());
	cv::putText(webcamImage, dets_size_str, cv::Point(30, 50), cv::FONT_HERSHEY_SIMPLEX, 0.7, cv::Scalar(255, 0, 0), 2);

	int i = 1;
	for (auto& det : denormDets)
	{

		std::ostringstream oss;
		oss << "denorm dets : (" << det.xmin << ", " << det.ymin << "),(" << det.xmax << ", " << det.ymax << ")";

		std::string det_str = oss.str();
		cv::putText(webcamImage, det_str, cv::Point(30, 50 + 50 * i), cv::FONT_HERSHEY_SIMPLEX, 0.7, cv::Scalar(0, 0, 255), 2);
		i++;
	}

denormDets에 들어간 값이 50000넘는걸봐서 원래 반정규화된 체로 들어가는 느낌이라

normDets도 출력시켜봤더니 정상적인 좌표값이 나온다

이 모델 검출 결과가 처음부터 반정규화 된체로 나오는듯?

전에는 차시뮬레이터 만들때는 가능했는데 지금은 왜 이상한가 계속 찾다보니

좌표 데이터가 128 x 128 입력 이미지 기준으로 나오고 있었고

여기 반정규화 코드와 맞지 않았다.

잠깐 내가 blazehand.onnx를 어디서 얻었는지 생각 안나서 잠깐 찾아봤는데 링크남김

blazepalm.onnx도 다운 가능한데 옛날 모델이 아니라 내가 작성한 코드는 못쓴다.

https://storage.googleapis.com/ailia-models/blazepalm/blazepalm.onnx

https://storage.googleapis.com/ailia-models/blazehand/blazehand.onnx

다시 반정규화 문제 해결하는걸로 돌아와서

구버전 블라즈팜은 128 x 128 기준 좌표를 반환하므로 /128로 정규화

Blaze::PalmDetection Blaze::GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
    int stride, int anchor_count, int column, int row, int anchor, int offset) {

    Blaze::PalmDetection res{0, 0, 0, 0, 0.0f};

    int index = (int(row * 128 / stride) + column) * anchor_count + anchor + offset;
    float origin_score = regressor.at<float>(0, index, 0);
    float score = sigmoid(origin_score);
    if (score < palmMinScoreThresh) return res;

    float x = classificator.at<float>(0, index, 0);
    float y = classificator.at<float>(0, index, 1);
    float w = classificator.at<float>(0, index, 2);
    float h = classificator.at<float>(0, index, 3);


    x += (column + 0.5) * stride - w / 2;
    y += (row + 0.5) * stride - h / 2;

    res.ymin = (y) / blazePalmSize;
    res.xmin = (x) / blazePalmSize;
    res.ymax = (y + h) / blazePalmSize;
    res.xmax = (x + w) / blazePalmSize;
    res.score = score;
    return res;
}

반정규화 코드에서는 width, height 받아서 긴쪽으로 스케일링하여 1280 x 1280 크기 좌표들 만든뒤

패딩 만큼 -하여 조정

블라즈 해더 cpp

	std::vector<PalmDetection> PredictPalmDetections(cv::Mat& img);
	PalmDetection GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
		int stride, int anchor_count, int column, int row, int anchor, int offset);
	float sigmoid(float x);
	std::vector<PalmDetection> DenormalizePalmDetections(std::vector<PalmDetection> detections, int width, int height, cv::Scalar pad);
	void DrawPalmDetections(cv::Mat& img, std::vector<Blaze::PalmDetection> denormDets);

std::vector<Blaze::PalmDetection> Blaze::DenormalizePalmDetections(std::vector<Blaze::PalmDetection> detections, int width, int height, cv::Scalar pad)
{

    std::vector<Blaze::PalmDetection> denormDets;

    int scale = 0;
    if (width > height)
        scale = width;
    else
        scale = height;

    for (auto& det : detections)
    {
        Blaze::PalmDetection denormDet;
        denormDet.ymin = det.ymin * scale - pad[0];
        denormDet.xmin = det.xmin * scale - pad[1];
        denormDet.ymax = det.ymax * scale - pad[0];
        denormDet.xmax = det.xmax * scale - pad[1];
        denormDet.score = det.score;
        denormDets.push_back(denormDet);
    }
    return denormDets;
}

손없을때 오탐하긴 하지만 대충 쫒아가긴 한다.

기존에 만든 블라즈팜 코드에는 박스 검출만하고 키포인트는 뭔지 몰라서 구현안했는데

키포인트가 손관절인걸 알았고 필요하니 팜디텍션에 추가

구버전 블라즈 핸드 파이썬 코드에서 키포인트는 이렇게 구하였다.

def draw_patch(regressor, classificator, stride, anchor_count, column, row, anchor, offset, color):
  index = (int(row * 128 / stride) + column) * anchor_count + anchor + offset

  score = sigmoid(regressor[index][0])
  if score < 0.5: return

  x, y, w, h = classificator[index][:4]

  x += (column + 0.5) * stride - w / 2
  y += (row    + 0.5) * stride - h / 2

  rect = patches.Rectangle((x, y), w, h, color = color, fill = None)
  ax.add_patch(rect)

  for key in range(7):
    x = classificator[index][4 + key * 2]
    y = classificator[index][5 + key * 2]
    
    x += (column + 0.5) * stride
    y += (row    + 0.5) * stride
    
    rect = patches.Rectangle((x, y), 1, 1, facecolor = color)
    ax.add_patch(rect)

팜디텍션 구조체는 이렇게 고치고

	// var and funcs for blazepalm
	struct PalmDetection {
		float ymin;
		float xmin;
		float ymax;
		float xmax;
		cv::Point2d kp_arr[7];
		float score;
	};

키포인트 들은 이렇게 뽑아냄

* 이상한 값 필터링을위해 중간에 /blazemPalmSize 한뒤에 0보다 작거나 1보다 큰경우 나와서 조건줌

Blaze::PalmDetection Blaze::GetPalmDetection(cv::Mat regressor, cv::Mat classificator,
    int stride, int anchor_count, int column, int row, int anchor, int offset) {

    Blaze::PalmDetection res;

    int index = (int(row * 128 / stride) + column) * anchor_count + anchor + offset;
    float origin_score = regressor.at<float>(0, index, 0);
    float score = sigmoid(origin_score);
    if (score < palmMinScoreThresh) return res;

    float x = classificator.at<float>(0, index, 0);
    float y = classificator.at<float>(0, index, 1);
    float w = classificator.at<float>(0, index, 2);
    float h = classificator.at<float>(0, index, 3);


    x += (column + 0.5) * stride - w / 2;
    y += (row + 0.5) * stride - h / 2;

    res.ymin = (y) / blazePalmSize;
    res.xmin = (x) / blazePalmSize;
    res.ymax = (y + h) / blazePalmSize;
    res.xmax = (x + w) / blazePalmSize;
    res.score = score;
    
    if ((res.ymin < 0) || (res.xmin < 0) || (res.xmax > 1) || (res.ymax > 1)) return res;

    std::vector<cv::Point2d> kpts;
    for (int key_id = 0; key_id < palmMinNumKeyPoints; key_id++)
    {
        float kpt_x = classificator.at<float>(0, index, 4 + key_id * 2);
        float kpt_y = classificator.at<float>(0, index, 5 + key_id * 2);
        kpt_x += (column + 0.5) * stride;
        kpt_x  = kpt_x / blazePalmSize;
        kpt_y += (row + 0.5) * stride;
        kpt_y  = kpt_y / blazePalmSize;
        //UE_LOG(LogTemp, Log, TEXT("kpt id(%d) : (%f, %f)"), key_id, kpt_x, kpt_y);
        res.kp_arr[key_id] = cv::Point2d(kpt_x, kpt_y);

    }
    return res;
}

키포인트 반정규화 코드도 추가

std::vector<Blaze::PalmDetection> Blaze::DenormalizePalmDetections(std::vector<Blaze::PalmDetection> detections, int width, int height, cv::Scalar pad)
{

    std::vector<Blaze::PalmDetection> denormDets;

    int scale = 0;
    if (width > height)
        scale = width;
    else
        scale = height;

    for (auto& det : detections)
    {
        Blaze::PalmDetection denormDet;
        denormDet.ymin = det.ymin * scale - pad[0];
        denormDet.xmin = det.xmin * scale - pad[1];
        denormDet.ymax = det.ymax * scale - pad[0];
        denormDet.xmax = det.xmax * scale - pad[1];
        denormDet.score = det.score;

        for (int i = 0; i < palmMinNumKeyPoints; i++)
        {
            cv::Point2d pt_new = cv::Point2d(det.kp_arr[i].x * scale - pad[1], det.kp_arr[i].y * scale - pad[0]);
            //UE_LOG(LogTemp, Log, TEXT("denorm kpt id(%d) : (%f, %f)"), i, pt_new.x, pt_new.y);
            denormDet.kp_arr[i] = pt_new;
        }
        
        denormDets.push_back(denormDet);
    }
    return denormDets;
}

드로잉도 추가

void Blaze::DrawPalmDetections(cv::Mat& img, std::vector<Blaze::PalmDetection> denormDets)
{
    for (auto& denormDet : denormDets)
    {
        cv::Point2d startPt = cv::Point2d(denormDet.xmin, denormDet.ymin);
        cv::Point2d endPt = cv::Point2d(denormDet.xmax, denormDet.ymax);
        cv::rectangle(img, cv::Rect(startPt, endPt), cv::Scalar(255, 0, 0), 1);

        for (int i = 0; i < palmMinNumKeyPoints; i++)
            cv::circle(img, denormDet.kp_arr[i], 5, cv::Scalar(255, 0, 0), -1);

        std::string score_str = std::to_string(static_cast<int>(denormDet.score * 100))+ "%";
        cv::putText(img, score_str, cv::Point(startPt.x, startPt.y - 20), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 255), 2);

    }
}

원래 1280 x 720 이미지 사용했는데

종횡비도 너무 틀어지고

128 x 128 모델 입력으로 넣기엔 패딩까지 포함해서 너무 작아져서인지

스코어 임계치(0.35) 낮춰도 오판은 무시하더라도 손을 잘 찾질 못했다.

웹캠 해상도를 640 x 480으로 변경하여 사용하니 좀 나아진것같긴한데 여전하다.

차 시뮬레이터에서도 이정도로 못잡지는 않았는데 키포인트 코드 추가한뒤로 영잡질못한다

있다보니 괜찬아졋는데, 카메라 조명 조정되면서 뭔가 바뀐듯하기도하고

지금보니 블라즈팜이 왜 바뀐지 알겠다.

조금만 돌려도 너무 못찾음

블라즈 해더의 임계치 설정값은 이런식

	int blazePalmSize = 128;
	float palmMinScoreThresh = 0.4;
	float palmMinNMSThresh = 0.2;
	int palmMinNumKeyPoints = 7;

저작자표시

'컴퓨터과학 > 언리얼' 카테고리의 다른 글

HandDesktop08 - blaze hand 사용하기 (0)	2024.01.26
HandDesktop07 - blaze hand 사용 준비하기 (0)	2024.01.26
HandDesktop05 - 손 추정을 위한 blaze 추론 코드 살펴보기 (0)	2024.01.24
HandDesktop04 - 다중 모니터 구현하기 (0)	2024.01.24
HandDesktop03 - 다중 모니터 준비하기, 스트리밍 풀 문제 해결 (0)	2024.01.24

PREV 이전 1 2 3 4 5 NEXT 다음

깜지로 쓰려는 블로그