Muhammad Naeem

Graduate Student in LIESMARS
Wuhan University, Hubei, China
Advisor: Prof. Xiongwu Xiao

Email: m.naeem4288@gmail.com
LinkedIn

Researcher with experience in deep learning, computer vision, and image generation spanning both industry and academia. I am currently pursuing a Master's in Photogrammetry and Remote Sensing at Wuhan University, China, under the supervision of Prof. Xiongwu Xiao in the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS).

Previously, I worked as a Senior AI Developer and Team Lead at Octaloop Technologies, leading the Computer Vision Team in projects like Virtual Room Staging, Virtual Try-on, and Video Game Streaming Highlights Generation, ensuring seamless AI integration and mentoring team members.

I earned my B.Sc. in Computer Science (major in AI) from COMSATS University Islamabad, Pakistan, where I worked under Dr. Jamal Hussain Shah.

Education

Wuhan University, China
Master's in Photogrammetry and Remote Sensing (LIESMARS)
September 2024 – June 2026
Relevant Coursework: Data Structures and Algorithms, Digital Image Processing, Machine Learning, Computer Vision, Pattern Recognition

COMSATS University Islamabad, Pakistan
Bachelor of Science in Computer Science
September 2019 – June 2023
Relevant Coursework: Data Structures and Algorithms, Digital Image Processing, Machine Learning, Computer Vision, Pattern Recognition

Experience

Octaloop Technologies
Senior AI Developer / Team Lead, Computer Vision Team
April 2024 – August 2025
- Led the Computer Vision Team, developing solutions for projects such as Virtual Room Staging, Virtual Try-on, and Video Game Streaming Highlights Generation using scene analysis.
- Supervised and mentored team members, ensuring smooth integration of AI functionality into client applications and fostering a collaborative environment.
- Tracked advances in AI and computer vision and applied recent techniques to improve project performance and client satisfaction.

NineSol Technologies
AI Developer
April 2023 – April 2024
- Developed and integrated the latest models into the backend of mobile and web applications, enhancing functionality with features such as Background Removal, Colorization, Document Scanning, Virtual Try-on, and Transcription.
- Stayed informed with the latest industry trends and emerging technologies, applying this knowledge to refine and advance features within diverse applications.
- Collaborated with other team members to smoothly embed advanced functionalities, ensuring seamless user experiences in mobile applications.

Projects

Advanced Point Cloud Upsampling for 3D Data Enhancement
Python, PyTorch, MinkowskiEngine, Open3D

Developed a novel method to improve the quality of 3D point cloud data, enabling better reconstruction of sparse and noisy 3D scans for applications like autonomous driving and augmented reality.
Integrated cutting-edge techniques from recent research to create an efficient and high-performing upsampling model, optimized for limited computational resources.
Evaluated the approach on standard 3D datasets, achieving significant improvements in accuracy and detail compared to existing solutions.

Virtual Staging
Python, PyTorch, Stable Diffusion, ControlNet, OpenCV, NumPy, Scikit-Learn

Implemented an inpainting approach using Stable Diffusion with ControlNet to furnish empty room images without altering existing items and color combinations.
Applied semantic segmentation on empty rooms to identify areas and items to preserve, generating binary images from segmented masks.
Used tailored prompts for bedrooms and living rooms, passing binary masked images, prompts, and empty room images to the Stable Diffusion inpainting model to generate furniture in the specified areas.
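
The mask-building step described above can be sketched as follows. This is a minimal illustration rather than the project's code: the class IDs, the `PRESERVE_IDS` tuple, and the `build_inpaint_mask` helper are hypothetical, and it assumes the segmentation model returns a 2-D array of integer class labels. White (255) marks pixels the inpainting model may repaint; black (0) marks pixels to keep.

```python
import numpy as np

# Hypothetical class IDs for regions that must stay untouched
# (for example walls, windows, doors); real IDs depend on the segmentation model.
PRESERVE_IDS = (1, 2, 3)


def build_inpaint_mask(label_map: np.ndarray, preserve_ids=PRESERVE_IDS) -> np.ndarray:
    """Return a uint8 mask: 255 where furniture may be generated, 0 where pixels are preserved."""
    keep = np.isin(label_map, preserve_ids)
    return np.where(keep, 0, 255).astype(np.uint8)


# Toy 3x4 label map: 0 = floor (free to furnish), 1 = wall, 2 = window.
labels = np.array([
    [1, 1, 2, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
])
mask = build_inpaint_mask(labels)
```

The resulting mask, the room-specific prompt, and the empty room image would then be passed together to the inpainting model.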

Video Game Streaming Highlights Generation
Python, PyTorch, OpenCV, NLTK, Scikit-Learn

Developed a system that generates highlights from 6-7 hour video game streams using three main approaches: Sound, Transcription, and Movement.
Implemented separate AI techniques for each approach to identify key timestamps, leveraging audio analysis, natural language processing, and computer vision.
Combined results from all three approaches in a specific sequence to produce the best highlights and generate a final video.
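
One way to combine the per-approach timestamps is sketched below. The actual combination sequence used in the project is not specified here, so this is an assumption: candidate clips from the three detectors are merged when they overlap or nearly touch, then ranked by how many detectors agree. The `Candidate` class, the `merge_candidates` helper, and the 5-second `gap` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    start: float  # seconds from stream start
    end: float
    source: str   # "sound", "transcription", or "movement"


def merge_candidates(candidates, gap=5.0):
    """Merge candidate clips that overlap or lie within `gap` seconds of each other.

    Returns a list of (start, end, sources) tuples sorted so that clips
    supported by more detectors come first, then by start time.
    """
    merged = []
    for c in sorted(candidates, key=lambda c: c.start):
        if merged and c.start <= merged[-1][1] + gap:
            start, end, sources = merged[-1]
            merged[-1] = (start, max(end, c.end), sources | {c.source})
        else:
            merged.append((c.start, c.end, {c.source}))
    return sorted(merged, key=lambda m: (-len(m[2]), m[0]))


candidates = [
    Candidate(120.0, 135.0, "sound"),
    Candidate(130.0, 150.0, "movement"),
    Candidate(152.0, 160.0, "transcription"),
    Candidate(900.0, 915.0, "sound"),
    Candidate(2000.0, 2010.0, "movement"),
    Candidate(2012.0, 2020.0, "sound"),
]
highlights = merge_candidates(candidates)
```

The top-ranked intervals could then be cut from the source stream and concatenated into the final video.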

Weed Detection Robotic Car
YOLO-v5, TensorFlow, Flask, Tkinter, VS Code, Colab, Roboflow, LabelImg

Compiled a diverse dataset comprising four categories of weed. Implemented YOLO-v5 for real-time object detection, achieving precise identification of weeds in agricultural fields.
Designed and built a robotic car prototype, driven by DC motors, that wirelessly streams live images from a smartphone camera to the YOLO-v5 model via FastAPI.
Integrated an Arduino-controlled LED system with a matrix-based system to precisely locate and eliminate detected weeds.
Created a website to showcase project insights, allowing users to engage with the trained YOLO-v5 model and evaluate its effectiveness.
Used data visualization to highlight the project's technological innovation and potential for sustainable agricultural practices.
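
The matrix-based localization mentioned above can be illustrated with a small sketch. The details of the real system are not given, so the 4x4 grid, the frame size, and the `bbox_to_cell` helper are assumptions: each detection box is mapped to the grid cell containing its centre, and that cell index could then select the matching LED.

```python
def bbox_to_cell(bbox, frame_size, grid=(4, 4)):
    """Map a detection box (x_min, y_min, x_max, y_max) in pixels to a (row, col) grid cell.

    The cell is chosen by the box centre; indices are clamped so that a box
    touching the right or bottom edge still maps to a valid cell.
    """
    x_min, y_min, x_max, y_max = bbox
    width, height = frame_size
    rows, cols = grid
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    col = min(int(cx * cols / width), cols - 1)
    row = min(int(cy * rows / height), rows - 1)
    return row, col


# A 640x480 frame divided into a hypothetical 4x4 grid.
cell = bbox_to_cell((500, 100, 600, 200), frame_size=(640, 480))
```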

Portrait Background Remover
Python, PyTorch, U-Net, NumPy, Scikit-Learn, OpenCV

Used U-Net structures for both rough and detailed portrait segmentation, addressing the problem of subject isolation in images.
Trained on images collected from the internet and on publicly available datasets such as COCO.

Grayscale Image Colorization
Python, PyTorch, U-Net, FastAPI, Colab, VS Code

Incorporated the two time-scale update rule (TTUR), a self-attention GAN, and training inflection points for stable, effective colorization.
Used U-Net architecture with model-specific backbone choices (resnet34/resnet101), spectral normalization, and self-attention.
Implemented Perceptual Loss based on VGG16 during NoGAN learning for realistic colorization.

Image to Talking Portrait
Python, Dlib, face-recognition, PyTorch, ffmpeg, OpenCV, FastAPI, Colab, VS Code

Created talking videos by transferring the lip and head movements of a reference video onto a source image, using Dlib for facial landmark identification.
Used Wav2Lip-hq for precise lip movement synchronization and ESRGAN for video up-sampling.
Integrated synchronized audio and used ffmpeg and OpenCV for efficient video processing.

Document Scanner
Python, OpenCV, PyTorch, FastAPI, Colab, VS Code

Created a Document Scanner with a Geometric Unwarping Transformer trained on the Doc3D dataset for accurate geometric unwarping.
Used the DocProj dataset to train an Illumination Correction Transformer that resolves illumination problems during document scanning.

Sky Changer
Python, PyTorch, U-Net, FastAPI, Colab, VS Code

Developed a model with three modules: low-res processing (HRNet-OCR), high-res processing (U-Net), and sky changing (multiband blender).
Enhanced segmentation in the low-res module using ASPP and improved the high-res module with learnable parameters.
Used the multiband blender for seamless sky replacement.

Text-to-Image Generator
Python, PyTorch, Hugging Face, Stable Diffusion, FastAPI, Colab, VS Code

Implemented Stable Diffusion for text-to-image generation, using fine-tuning and DreamBooth to produce high-quality, customized images.

Face Swap
Python, Dlib, PyTorch, face-recognition, TensorFlow, FastAPI, Colab, VS Code

Implemented pose-matching and facial point detection for accurate face swapping using Dlib.
Blended faces to ensure natural skin color transitions.

Achievements

Awarded a fully funded scholarship for my Master's at LIESMARS, Wuhan University, which is known globally for its research in photogrammetry, remote sensing, and geospatial information science.
Ranked 2nd in the BS Computer Science Final Year Project with the ‘Weed Detection System’.
Won 2nd place at the 2023 Hackathon with the ‘Weed Detection System’.