TY - GEN
T1 - Multimodal NLP for Embedded Psychological Text Analysis
AU - Esmi, Nima
AU - Borhani, Fatemeh
AU - Nezhadmoghaddam, Maryam
AU - Parsa, Mina Mohammadi
AU - Shahbahrami, Asadollah
AU - Gaydadjiev, Georgi
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Accurate psychological text analysis on embedded devices is essential for privacy-preserving mental health monitoring; however, deploying transformer-based models in such constrained environments remains challenging due to limited memory, computational power, and energy budgets. We present a multimodal framework that integrates a lightweight Transformer (DistilBERT) with a CNN-based image branch (ResNet-50) accelerated by the Hailo-8L edge AI board. Text inputs are processed both as sequences and as two-dimensional image-like representations, enabling robust handling of noisy, informal text containing emojis and misspellings. Evaluations on the Depression Twitter dataset and the SST-2 benchmark demonstrate that the fusion model consistently outperforms text-only and image-only baselines, achieving up to 91.6% accuracy while remaining feasible for deployment on a Raspberry Pi 5. Overall, the proposed design offers a practical balance between efficiency and accuracy, enabling real-world, privacy-aware psychological monitoring directly on personal embedded devices.
KW - Edge AI
KW - Embedded systems
KW - Large language models
KW - Mental health monitoring
KW - Multimodal NLP
UR - https://www.scopus.com/pages/publications/105031371397
U2 - 10.1109/IoT69654.2025.11297713
DO - 10.1109/IoT69654.2025.11297713
M3 - Conference contribution
AN - SCOPUS:105031371397
T3 - Proceedings of 2025 9th International Conference on Internet of Things and Applications, IoT 2025
BT - Proceedings of 2025 9th International Conference on Internet of Things and Applications, IoT 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 9th International Conference on Internet of Things and Applications, IoT 2025
Y2 - 29 October 2025 through 30 October 2025
ER -