VidQ: Video Query Using Optimized Audio-Visual Processing

Noor Felemban; Fidan Mehmeti; Thomas F. La Porta

doi:10.1109/TNET.2022.3215601

Noor Felemban^*, Fidan Mehmeti, Thomas F. La Porta

^*Corresponding author for this work

Computer Engineering Department (CE)

Research output: Contribution to journal › Article › peer-review

Abstract

As mobile devices become more prevalent in everyday life and the amount of recorded and stored videos increases, efficient techniques for searching video content become more important. When a user sends a query searching for a specific action in a large amount of data, the goal is to respond to the query accurately and fast. In this paper, we address the problem of responding to queries which search for specific actions in mobile devices in a timely manner by utilizing both visual and audio processing approaches. We build a system, called VidQ, which consists of several stages, and that uses various Convolutional Neural Networks (CNNs) and Speech APIs to respond to such queries. As the state-of-the-art computer vision and speech algorithms are computationally intensive, we use servers with GPUs to assist mobile users in the process. After a query is issued, we identify the different stages of processing that will take place. Then, we identify the order of these stages. Finally, solving an optimization problem that captures the system behavior, we distribute the process among the available network resources to minimize the processing time. Results show that VidQ reduces the completion time by at least 50% compared to other approaches.
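The placement step described in the abstract (an ordered set of processing stages distributed across available resources to minimize processing time) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the stage names, the per-stage timings, the single transfer cost, and the brute-force search are hypothetical and are not taken from the paper, which formulates and solves its own optimization problem.

```python
from itertools import product

# Hypothetical stage names and timings in seconds; none of these come from the paper.
STAGES = ["frame_sampling", "object_detection", "speech_to_text", "action_recognition"]
PROC_TIME = {
    "mobile": [1.0, 20.0, 15.0, 30.0],  # assumed time per stage on the mobile device
    "server": [0.5, 2.0, 1.5, 3.0],     # assumed time per stage on a GPU server
}
TRANSFER_TIME = 4.0         # assumed cost of moving data between mobile and server
START_LOCATION = "mobile"   # assumption: the video initially resides on the mobile device


def completion_time(assignment):
    """Total time for running the stages in order at the assigned locations."""
    total = 0.0
    location = START_LOCATION
    for stage_index, target in enumerate(assignment):
        if target != location:
            # Pay the transfer cost whenever consecutive stages change location.
            total += TRANSFER_TIME
            location = target
        total += PROC_TIME[target][stage_index]
    return total


def best_assignment():
    """Enumerate every mobile/server placement and return the fastest one."""
    candidates = product(PROC_TIME.keys(), repeat=len(STAGES))
    return min(candidates, key=completion_time)


if __name__ == "__main__":
    plan = best_assignment()
    for stage, location in zip(STAGES, plan):
        print(f"{stage}: {location}")
    print("completion time:", completion_time(plan))
```

Exhaustive enumeration is only practical for a handful of stages and resources; it is used here because it makes the objective easy to read, not because it reflects how VidQ solves the problem.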

Original language	English
Pages (from-to)	1338-1352
Number of pages	15
Journal	IEEE/ACM Transactions on Networking
Volume	31
Issue number	3
DOIs	https://doi.org/10.1109/TNET.2022.3215601
State	Published - 1 Jun 2023

Keywords

convolutional neural networks
deep learning
heuristics
Mobile networks
performance optimization

Cite this

@article{46cf2f9fac284a6ab7deab0879b87090,
  title = "VidQ: Video Query Using Optimized Audio-Visual Processing",
  abstract = "As mobile devices become more prevalent in everyday life and the amount of recorded and stored videos increases, efficient techniques for searching video content become more important. When a user sends a query searching for a specific action in a large amount of data, the goal is to respond to the query accurately and fast. In this paper, we address the problem of responding to queries which search for specific actions in mobile devices in a timely manner by utilizing both visual and audio processing approaches. We build a system, called VidQ, which consists of several stages, and that uses various Convolutional Neural Networks (CNNs) and Speech APIs to respond to such queries. As the state-of-the-art computer vision and speech algorithms are computationally intensive, we use servers with GPUs to assist mobile users in the process. After a query is issued, we identify the different stages of processing that will take place. Then, we identify the order of these stages. Finally, solving an optimization problem that captures the system behavior, we distribute the process among the available network resources to minimize the processing time. Results show that VidQ reduces the completion time by at least 50\% compared to other approaches.",
  keywords = "convolutional neural networks, deep learning, heuristics, Mobile networks, performance optimization",
  author = "Felemban, Noor and Mehmeti, Fidan and La Porta, Thomas F.",
  note = "Publisher Copyright: {\textcopyright} 1993-2012 IEEE.",
  year = "2023",
  month = jun,
  day = "1",
  doi = "10.1109/TNET.2022.3215601",
  language = "English",
  volume = "31",
  pages = "1338--1352",
  journal = "IEEE/ACM Transactions on Networking",
  issn = "1063-6692",
  number = "3",
}

TY - JOUR
T1 - VidQ: Video Query Using Optimized Audio-Visual Processing
AU - Felemban, Noor
AU - Mehmeti, Fidan
AU - La Porta, Thomas F.
PY - 2023/6/1
Y1 - 2023/6/1
AB - As mobile devices become more prevalent in everyday life and the amount of recorded and stored videos increases, efficient techniques for searching video content become more important. When a user sends a query searching for a specific action in a large amount of data, the goal is to respond to the query accurately and fast. In this paper, we address the problem of responding to queries which search for specific actions in mobile devices in a timely manner by utilizing both visual and audio processing approaches. We build a system, called VidQ, which consists of several stages, and that uses various Convolutional Neural Networks (CNNs) and Speech APIs to respond to such queries. As the state-of-the-art computer vision and speech algorithms are computationally intensive, we use servers with GPUs to assist mobile users in the process. After a query is issued, we identify the different stages of processing that will take place. Then, we identify the order of these stages. Finally, solving an optimization problem that captures the system behavior, we distribute the process among the available network resources to minimize the processing time. Results show that VidQ reduces the completion time by at least 50% compared to other approaches.

KW - convolutional neural networks
KW - deep learning
KW - heuristics
KW - Mobile networks
KW - performance optimization
UR - https://www.scopus.com/pages/publications/85141541511
U2 - 10.1109/TNET.2022.3215601
DO - 10.1109/TNET.2022.3215601
M3 - Article
AN - SCOPUS:85141541511
SN - 1063-6692
VL - 31
SP - 1338
EP - 1352
JO - IEEE/ACM Transactions on Networking
JF - IEEE/ACM Transactions on Networking
IS - 3
ER -
