Improving Temporal Interpolation of Head and Body Pose using Gaussian
Process Regression in a Matrix Completion Setting
by
Stephanie Tan, Hayley Hung
2018
Abstract
This paper presents a model for head and body pose estimation (HBPE) when
labelled samples are highly sparse. The current state-of-the-art multimodal
approach to HBPE utilizes the matrix completion method in a transductive
setting to predict pose labels for unobserved samples. Based on this approach,
the proposed method tackles HBPE when manually annotated ground truth labels
are temporally sparse. We posit that the current state-of-the-art approach
oversimplifies the temporal sparsity assumption by using Laplacian smoothing.
Our final solution uses: i) Gaussian process regression in place of Laplacian
smoothing, ii) head and body coupling, and iii) nuclear norm minimization in
the matrix completion setting. The model is applied to the challenging SALSA
dataset for benchmarking against the state-of-the-art method. Our
formulation significantly outperforms the state of the art in this
setting: for example, with 5% of ground truth labels used as training data,
head pose accuracy and body pose accuracy are approximately 62% and 70%,
respectively. As well as
fitting a more flexible model to missing labels in time, we posit that our
approach also loosens the head and body coupling constraint, allowing for a
more expressive model of the head and body pose typically seen during
conversational interaction in groups. This provides a new baseline to improve
upon for future integration of multimodal sensor data for the purpose of HBPE.
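
To make the idea of replacing Laplacian smoothing with Gaussian process regression more concrete, the following is a minimal sketch of GP-based temporal interpolation of a pose angle from sparse labels. It is not the authors' implementation: the squared-exponential kernel, the hyperparameter values, the toy sinusoidal signal, and the names rbf_kernel and gp_interpolate are all assumptions made for illustration, and the sketch leaves out the matrix completion and head-body coupling parts of the proposed model.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=10.0, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of time stamps.
    (Assumed kernel; the abstract does not say which kernel is used.)"""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_interpolate(t_obs, y_obs, t_query, length_scale=10.0, noise=1e-4):
    """Zero-mean GP posterior mean and standard deviation at t_query."""
    K = rbf_kernel(t_obs, t_obs, length_scale) + noise * np.eye(len(t_obs))
    K_s = rbf_kernel(t_query, t_obs, length_scale)
    # Cholesky factorisation gives a numerically stable solve of K x = y.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    prior_var = np.ones(len(t_query))  # k(t, t) = variance = 1.0
    var = prior_var - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


# Toy sequence: 100 frames of a slowly varying pose angle in radians.
# The range is small, so angular wrap-around is ignored in this sketch.
t_all = np.arange(100, dtype=float)
true_angle = 0.8 * np.sin(2.0 * np.pi * t_all / 60.0)

# Keep labels for 5 of the 100 frames, i.e. 5% of the sequence.
t_obs = t_all[::20]
y_obs = true_angle[::20]

mean, std = gp_interpolate(t_obs, y_obs, t_all)
print("max abs interpolation error:", np.max(np.abs(mean - true_angle)))
```

Unlike a smoothness penalty alone, the GP posterior also provides a per-frame uncertainty (std above), which grows with distance from the nearest labelled frame.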
arXiv:1808.01837v1