Research Engineer & AI Specialist
AI and Computer Vision Engineer with 4+ years of experience specializing in Vision-Language Models and Video LLMs. First-author contributor to publications at top-tier venues (ECCV 2024, EMNLP 2025).
Passionate about advancing AI through innovative research and practical applications
I am a Senior Computer Vision Engineer at CreativAI, specializing in large-scale video analytics and AI research. With a Master's degree in AI and Data Science from Ottawa University (GPA: A+), I combine deep research expertise with practical engineering skills.
My research centers on Vision-Language Models (VLMs) and Video Large Language Models, with published work at prestigious conferences including ECCV 2024 and EMNLP 2025. I've contributed to breakthrough projects in video understanding and multimodal AI.
I am currently working on large-scale video projects analyzing 1000+ hours of content using efficient algorithms, parallel computing, and video knowledge graphs. Previously, I collaborated with the VISION-CAIR lab at KAUST under the supervision of Prof. Mohammed El Hossieny.
Years Experience
Published Papers
Major Projects
Total Citations
Journey through innovative roles in AI and computer vision
Working on large-scale video projects analyzing 1000+ hours of content with efficient algorithms, parallel computations, and video knowledge graphs. Developing cutting-edge solutions for video analytics and insights aggregation.
Conducted research under the supervision of Prof. Mohammed El Hossieny, focusing on vision-language models, long video understanding, and video benchmarking. Published 3 papers at top-tier conferences.
Analyzed football matches using computer vision techniques including player detection, bird's-eye view generation, player tracking, jersey classification, and event detection.
Developed a virtual fitting room mobile application using GANs. Deployed it on AWS using the EC2, S3, ECS, and ECR services. Published a research paper in the IJACSA journal.
Academic foundation in AI and computer science
Contributing to the advancement of AI and computer vision
Developed an efficient retrieval mechanism for comprehending videos of arbitrary lengths such as movies and TV shows, advancing the field of long-form video understanding.
Designed a robust benchmark for extreme long video understanding, providing the research community with tools to evaluate multimodal models on extended video content.
Developed a multimodal Large Language Model specifically designed for video understanding, capable of processing temporal visual and textual data with state-of-the-art results.
Showcasing innovative solutions in AI and computer vision
Complete computer vision pipeline for football analysis including player detection, tracking, jersey classification, and event detection using YOLOv8 and custom algorithms.
Contributed to the official YOLOv8 library with improvements in object tracking algorithms, enhancing performance for real-time applications.
Advanced sentiment analysis for Arabic Twitter data using state-of-the-art models including AraBERTv2, MARBERT, and custom preprocessing techniques.
Deep learning-based network intrusion detection system using advanced neural network architectures for cybersecurity applications.
Bachelor's graduation project: Virtual reality environment with NPCs that respond to user speech using NLP models, creating immersive interactive experiences.
Netflix stock price prediction model incorporating Twitter sentiment analysis, combining financial data with social media sentiment for enhanced predictions.
Technical proficiencies and professional capabilities
May 2022
Mar 2022
Stanford/Coursera
Udacity Nanodegree
Let's collaborate on innovative AI and computer vision projects
kirolosatef1997@gmail.com
(+20) 1280636202
El Obour, Cairo, Egypt