Community Project


Surgery Tools


Welcome to our very first AIMSS Summer Project showcase! Team-based community projects are a new way that we at AIMSS are engaging with our members, with the intention of creating an opportunity for participants to gain hands-on experience tackling different aspects of machine learning app development. Over the last four months, a number of our members have joined together to explore the world of medical object detection, plus the creation of an interactive framework to support it. AIMSS would like to extend a very humble thank-you to all those who contributed so many of their valuable hours over this summer to get this initiative off the ground.

Problem Statement and Motivation

A typical surgery requires an estimated 250 to 300 tools, and this number increases further with surgery complexity. Unfortunately, this means that roughly 1 in every 1,000 to 3,000 abdominal operations alone results in a retained surgical item -- in other words, tweezers, scalpels, needles, sponges, towels, and more occasionally remain in patients after surgery. Surgical staff perform strict counting procedures as a preventative measure, but mistakes can still happen in these high-stakes, high-speed environments. We were interested in prototyping a machine learning model that can detect, classify, and count surgical objects in an image. Downstream iterations of the model could one day act as another set of tool-counting “eyes” in the operating room.

Methods: A Tale of Two Teams

Thanks to the amazingly diverse experiences of our AIMSS community members, it was clear that the best way to begin was to group people based on the skills they were interested in improving. A number of students joined this project excited to hone their programming skills. Another significant group expressed interest in getting hands-on experience training and improving machine learning models. Thus, the Python Team and the Machine Learning Team began their respective journeys, each group taking on slightly different roles in creating the final application.


Python side:

Front-end development is user-focused: what does the user need to perform their tasks efficiently and easily? It takes into account how intuitive a design is and how much functionality it provides, and it is key to integrating the user interface and backend into one final product. For our first project, we decided to build everything in Python, leaning on the many libraries the ML application community provides. Streamlit -- a popular and lightweight UI framework -- let us easily code elements such as buttons and menus for the user to take advantage of. Creating the user interface started with conceptualizing what the user wants to see and what data we can provide. Our team created many concept designs of the UI they envisioned; the best components were taken from each concept and carefully added to our emerging product. The final step was integration: the ML components and workflow were added as functionalities of our UI components. Relevant data and scores are also shown in the output -- meaningful information that the user can utilize without sacrificing efficiency. In the end, a lightweight all-Python UI proved best for the scope of this project.
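As a flavour of the kind of glue code that sits between the model output and the UI, here is a minimal sketch (not the project's actual code) of how raw detections might be tallied into the per-class counts a user would see; the detection format and threshold are assumptions for illustration:

```python
from collections import Counter

def count_tools(detections, min_score=0.5):
    """Keep detections above a confidence threshold and tally them by class.

    Each detection is assumed to be a (class_name, confidence_score) pair.
    """
    kept = [name for name, score in detections if score >= min_score]
    return dict(Counter(kept))

# Example: three detections; the low-confidence Forceps is dropped.
detections = [("Scalpel", 0.91), ("Forceps", 0.84), ("Forceps", 0.31)]
print(count_tools(detections))  # {'Scalpel': 1, 'Forceps': 1}
```

A summary like this can then be rendered with any UI widget, keeping the display logic separate from the model itself.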

ML side:

Every attempt to solve a machine learning problem begins with data questions. What kind of data will I use? Where can I find it? What type of pre-processing is necessary? How will I split the data into train, validation, and test sets? These are the questions we asked ourselves to help define and narrow the scope of the problem. To train this object detection model, we needed robust labeled image data containing the classes of surgical tools we wanted to detect -- to that end, we combined this Kaggle dataset, this GitHub dataset, and hand-labeled images found on the internet to form a dataset of over 3000 images. The dataset includes seven classes: Scalpel, Retractor, Clamp, Scissors, CurvedScissors, Forceps, and BabcockForceps. A number of data augmentation methods were explored to more than double the size of the dataset, which had undergone the usual 80/10/10 random train/validation/test split.
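The 80/10/10 random split mentioned above can be sketched as follows; this is a minimal illustration rather than the project's actual pipeline, and the image filenames and seed are placeholders:

```python
import random

def split_dataset(items, train=0.8, val=0.1, seed=42):
    """Shuffle the items, then cut them into train/validation/test lists."""
    items = list(items)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# With ~3000 labeled images this yields roughly 2400/300/300.
images = [f"img_{i}.jpg" for i in range(3000)]
train_set, val_set, test_set = split_dataset(images)
print(len(train_set), len(val_set), len(test_set))  # 2400 300 300
```

Splitting before augmentation (as described above) matters: augmented copies of a training image must not leak into the validation or test sets.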


Once the dataset was defined, we began the long, iterative process of model training and improvement. We began with a robust, general pre-trained model, and then used our newly created dataset to tune the model to the problem we set out to tackle. Some of the factors we experimented with were the starting model (eventually settling on RetinaNet101), model hyperparameters, the loss function, and different methods of monitoring the training process. To choose the best experimental setting, we monitored performance based on the validation loss, which took into account the precision and recall scores as well as a measure of localization (i.e., how well the bounding box was drawn).
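Choosing a setting by validation loss boils down to tracking the minimum across epochs or experiments; a simplified sketch, with a made-up loss curve for illustration:

```python
def best_epoch(val_losses):
    """Return (epoch_index, loss) for the lowest validation loss."""
    best_i = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_i, val_losses[best_i]

# Hypothetical validation-loss curve: improves, then starts to overfit.
losses = [1.20, 0.85, 0.61, 0.58, 0.63, 0.70]
epoch, loss = best_epoch(losses)
print(epoch, loss)  # 3 0.58
```

In practice the same idea is usually handled by a checkpointing callback that saves the model whenever the validation loss reaches a new minimum.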



All of the work done by both of these sides culminated in a finished application that supports surgical tool detection, classification, and counting within an image. If you would like to try running this app yourself, please refer to our GitHub Repository for download and setup instructions.


With our final tuned model, on the test set, we achieved the following scores over all the classes:

  • Average precision: 0.693

  • Average recall: 0.706
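The precision and recall reported above follow the standard definitions: precision is the fraction of predicted detections that are correct, and recall is the fraction of true objects that were found. A minimal sketch in terms of true positives, false positives, and false negatives (the counts below are made up for illustration):

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# E.g. 7 correct detections, 3 spurious detections, 2 missed tools:
p, r = precision_recall(tp=7, fp=3, fn=2)
print(round(p, 2), round(r, 2))  # 0.7 0.78
```

For object detection, a prediction typically only counts as a true positive when its bounding box overlaps the ground-truth box sufficiently, which is how the localization quality mentioned earlier enters these scores.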


Functionally, the model is best at finding scalpels and scissors, whereas it seems to have the most trouble with finding forceps. 


While we have a functioning prototype right now, more work would be needed before this application is operating-room ready. For one, iterative rounds of model tuning and improvement could easily absorb much more time and resources. We would also likely need significantly more data covering many more classes of surgical tools. Furthermore, the application would need some tweaks to run on live video rather than still frames, as well as to interface directly with the surgical staff to make live recommendations.


Thank you again to all the community members who spent their time working to make the first AIMSS community project a smashing success! 


Sarah Davis
Harish Prabhakar
Rafay Osmani
David Li

ML Team:
Sarah Davis
Bahareh Behroozi
Celine Mayak
Charlie Wang
Heeba Paravez
Kevin Zhan
Sergey Khlynovskiy
Sumayah Ghalab

Python Team:
Harish Prabhakar
Abu Khalid
Mai Hunyh
Muktadir Mir
Sergey Khlynovskiy
Avnish Jadhav

If this is something that you found interesting, consider keeping in touch! AIMSS UAlberta hopes to run a community-based project every summer, so stay tuned to hear how you can get involved next year.