
The Need for Transit Data Cleansing and Standardization

While ensuring passengers have access to real-time data is the new norm in transit, there are important safeguards that must be put into place to be sure that information is both accurate and being used effectively by the transit agencies themselves.

by Mark Talbot and Scott Belcher
November 27, 2023


At a recent transit conference, an IT professional attending a data session, which one of the authors moderated, stood up and said:

“I ride the bus to work every day and rely on my agency’s app. The app provides scheduled arrival times and predicted arrival times. My bus is scheduled to arrive at 8:00 a.m. Most days my app says my bus will arrive at 8:00 a.m. and yet it sometimes arrives 10 minutes early, 10 minutes late, or not at all, and the app doesn’t reflect this. It is very frustrating. I don’t know how many customers we are losing because of this, but I assume it is a lot.”


Ensuring that projected arrival times are correct is difficult, and inaccurate projections have a number of causes, ranging from driver shortages to technology shortcomings. Regardless of the cause, they all result in poor data quality and unsatisfied customers.
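
One way an agency might put a number on the rider's complaint is to compare what the app predicted with when the bus actually arrived. The sketch below is illustrative only: the function names, the two-minute tolerance, and the sample observations are assumptions made for this example, not figures from any agency.

```python
from datetime import datetime
from statistics import mean


def arrival_errors_minutes(observations):
    """Return signed errors in minutes (actual minus predicted) and a no-show count.

    A missing actual arrival (None) means the bus never showed up, so it is
    counted separately rather than folded into the error figures.
    """
    errors, no_shows = [], 0
    for predicted, actual in observations:
        if actual is None:
            no_shows += 1
            continue
        delta = datetime.fromisoformat(actual) - datetime.fromisoformat(predicted)
        errors.append(delta.total_seconds() / 60)
    return errors, no_shows


def summarize(observations, tolerance_minutes=2.0):
    """Summarize prediction quality; the tolerance is an assumed threshold."""
    errors, no_shows = arrival_errors_minutes(observations)
    within = [e for e in errors if abs(e) <= tolerance_minutes]
    return {
        "mean_abs_error_min": mean(abs(e) for e in errors) if errors else None,
        "share_within_tolerance": len(within) / len(errors) if errors else None,
        "no_shows": no_shows,
    }


# Hypothetical observations for an 8:00 a.m. bus like the one in the anecdote.
sample = [
    ("2023-11-20T08:00:00", "2023-11-20T07:50:00"),  # 10 minutes early
    ("2023-11-21T08:00:00", "2023-11-21T08:10:00"),  # 10 minutes late
    ("2023-11-22T08:00:00", "2023-11-22T08:01:00"),  # close to the prediction
    ("2023-11-23T08:00:00", None),                   # never arrived
]
report = summarize(sample)
print(report)
```

Counting no-shows separately keeps a missed trip from being hidden inside an average, since a bus that never arrives has no error to measure.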

The management and effective use of data have become essential for high-performing transit operations. Buses have become computers on wheels that produce massive amounts of data from systems such as CAD-AVL, passenger counting, GNSS, vehicle health monitoring, and payment systems.

A 2023 National Academies report concluded “the sheer volume and diversity of that data is a problem for many agencies. They are not able to view or use all the data they collect; and, as a result, they may not be able to comprehend the value of the data available.” Many agencies struggle to optimize the use of data that they have.


GTFS Feeds

Most agencies that provide bus service make their data available to the public following the General Transit Feed Specification (GTFS) Schedule and GTFS Realtime (GTFS RT), which are open standards used to distribute relevant information about transit systems to riders.

GTFS Schedule data feeds include seven underlying text files: agency, stops, routes, trips, stop times, calendar, and calendar dates. These files are recommended to be updated on a weekly basis. GTFS RT is an additional standard that carries trip updates, service alerts, and vehicle positions, and is recommended to be updated every 30 seconds.
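
A first sanity check on a GTFS Schedule feed is simply confirming that the expected files are present in the published archive. The sketch below uses the specification's `.txt` file names; treating five files as always required and "at least one calendar file" as the rule is a simplifying assumption of this example, and the helper names are hypothetical.

```python
import io
import zipfile

# Assumption of this sketch: these five files must always be present,
# and at least one of the two calendar files must accompany them.
CORE_FILES = {"agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt"}
CALENDAR_FILES = {"calendar.txt", "calendar_dates.txt"}


def missing_gtfs_files(feed_zip):
    """Return the expected files that are absent from a GTFS zip archive."""
    with zipfile.ZipFile(feed_zip) as archive:
        present = set(archive.namelist())
    missing = sorted(CORE_FILES - present)
    if not (CALENDAR_FILES & present):
        missing.append("calendar.txt or calendar_dates.txt")
    return missing


def build_demo_feed(file_names):
    """Create an in-memory zip with empty files, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in file_names:
            archive.writestr(name, "")
    buffer.seek(0)
    return buffer


# Hypothetical incomplete feed: no stop times and no calendar file.
demo = build_demo_feed(["agency.txt", "stops.txt", "routes.txt", "trips.txt"])
problems = missing_gtfs_files(demo)
print(problems)
```

A presence check like this says nothing about whether the contents are accurate; it only catches a feed that is structurally incomplete before it reaches riders.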


Transit agencies are not required to use these standards, but they have become the norm for most agencies and a minimum requirement for an improved rider experience.

The GTFS feed is essential for planning and communications purposes. Unfortunately, its value is only as good as the quality of the data provided. As one transit agency put it, “making GTFS and GTFS-RT publicly available in real time is problematic. We are not comfortable with data quality. We don’t have enough time to reconcile it.” 

Accurate and Reliable Data

Agencies are required to report data to the Federal Transit Administration’s (FTA) National Transit Database (NTD).

The NTD is a national database that records the financial, operating, and asset condition of transit systems, helping to keep track of the industry and provide the public with information and statistics. Formula grant allocations are impacted by this data. An FTA representative stated that “NTD financial data is generally good. The quality of the data gets worse as you move to operational data. The quality of what is reported varies greatly across agencies.”

Many transit agencies do not have the resources or appropriate incentives to make sure their data is accurate and reliable. It is hard to determine a distinct ROI for cleaning and engineering data. Should it be measured by customer satisfaction ratings, riders lost or not recovered, or financial savings? The challenge is how best to draw a direct correlation between quality data and customer satisfaction.


One transit professional said he believes his agency’s data is good “because we are no longer getting as many customer complaints about our arrival predictions.”

Another creative transportation professional was able to derive a financial ROI from the agency’s investment in ensuring quality data, stating, “We saved hundreds of thousands of dollars last year by using data to help drive route decisions. We were able to eliminate an unnecessary route.”

The first step to ensuring data is reliable is to make sure it is clean. Dirty data, which is incomplete, incorrect, inaccurate, or irrelevant, can lead to misinformed decisions and missed opportunities.

To clean its data, a transit agency must remove corrupt or inaccurate data and then enhance what remains to ensure that it is complete, up to date, and reliable. Cleaning GTFS data, for example, can be a time-consuming and resource-intensive process that includes multiple steps. Much of it can be done by outside consultants who can run the data through machine learning algorithms to identify anomalies. Even so, it still requires visual inspection by agency staff at various stages.
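
To make the idea of dirty data concrete, the sketch below flags incomplete, incorrect, and out-of-order rows in a stop-times table. It is a rule-based stand-in, far simpler than the machine learning checks mentioned above, and it rests on stated assumptions: the column names follow GTFS `stop_times.txt`, rows for each trip are assumed to arrive already sorted by stop sequence, and the function name and sample rows are hypothetical.

```python
import csv
import io
import re

# GTFS times may run past 24:00:00 for trips that continue after midnight.
TIME_PATTERN = re.compile(r"^\d{1,2}:[0-5]\d:[0-5]\d$")


def to_seconds(hms):
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def find_dirty_rows(csv_text):
    """Return (line_number, reason) pairs for rows that need human review."""
    issues = []
    last_seen = {}  # trip_id -> (stop_sequence, arrival_seconds)
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        trip = (row.get("trip_id") or "").strip()
        stop = (row.get("stop_id") or "").strip()
        arrival = (row.get("arrival_time") or "").strip()
        sequence = (row.get("stop_sequence") or "").strip()
        if not trip or not stop:
            issues.append((line_number, "incomplete: missing trip_id or stop_id"))
            continue
        if not TIME_PATTERN.match(arrival) or not sequence.isdigit():
            issues.append((line_number, "incorrect: malformed time or sequence"))
            continue
        current = (int(sequence), to_seconds(arrival))
        previous = last_seen.get(trip)
        if previous and (current[0] <= previous[0] or current[1] < previous[1]):
            issues.append((line_number, "inaccurate: out of order within trip"))
        last_seen[trip] = current
    return issues


# Hypothetical sample with three deliberately dirty rows.
sample_csv = "\n".join([
    "trip_id,arrival_time,stop_id,stop_sequence",
    "T1,08:00:00,S1,1",
    "T1,08:05:00,S2,2",
    "T1,08:03:00,S3,3",   # arrives before the previous stop
    "T1,8:99:00,S4,4",    # impossible minute value
    "T2,25:10:00,,1",     # missing stop_id
])
issues = find_dirty_rows(sample_csv)
for line_number, reason in issues:
    print(line_number, reason)
```

The function only reports suspect rows and leaves the decision to correct or drop them to a person, in line with the visual inspection step described above.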

One consultant who focuses on helping agencies with transit data commented that “agencies will pay only for data management if it supports their NTD reporting obligations.” Our experience is that most large agencies and a number of small to mid-size agencies are making this investment. The common attribute they all share is that there is somebody on staff who understands the importance of quality data and is willing to be an evangelist.


The second key to obtaining quality data is data standards that establish the ground truth.


The Need for Standardization

The purpose of standardization is to ensure an organization’s data is consistent for all of its users and there is a uniform structure that makes the data easier to manage, analyze, and exchange across different systems and organizations.  

While GTFS and GTFS RT are open data standards, they are not consistently used across the country, across agencies, or even within the same organization. For example, an agency might have a GTFS-driven public schedule but not use GTFS RT for on-time predictions.

Additionally, many agencies have mobile app providers that generate their own arrival predictions rather than rely on the agency’s GTFS feed, because of concerns about its quality.

Standardization requires defining and implementing rules and formats to organize the data. It can provide numerous benefits to all organizations and the transit industry in particular. Key benefits can include:

  • Enabling different systems and applications to work together seamlessly. In transit this means ticketing, route scheduling, and passenger information systems would all be relying on the same set of information.

  • Data is higher quality because it is less prone to errors, inconsistencies, and duplications. Standards eliminate uncertainties and ensure that data is accurate, complete, and up to date.

  • Many organizations use a variety of software applications and data sources. Following a standard simplifies the process of integrating data from the variety of different sources by providing a common structure and format for all data.

  • Data is easier to analyze and report on. IT professionals will spend less time cleaning and preparing data and more time analyzing to make informed decisions.

  • It becomes easier to share with external vendors, customers, and stakeholders. This is especially important in transit when agencies rely on numerous vendors to help them supply services to customers and data sharing is an integral part of providing real-time travel information.

  • Data will maintain its relevance longer as technology and business requirements evolve. Consistently applying the same standard enables historical data sets to influence future planning decisions.
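
As a rough illustration of what defining and implementing rules and formats can look like in practice, the sketch below maps arrival predictions from two differently shaped vendor feeds onto one common record. Everything vendor-specific here is an assumption made for the example: the two vendors, their field names, and the `ArrivalPrediction` record are hypothetical and do not describe any real product or standard.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ArrivalPrediction:
    """Common record shape that every downstream system would consume."""
    route_id: str
    stop_id: str
    predicted_arrival: datetime  # always timezone-aware UTC


def from_vendor_a(record):
    # Hypothetical vendor A: camelCase keys and epoch seconds.
    return ArrivalPrediction(
        route_id=str(record["routeId"]).strip().upper(),
        stop_id=str(record["stopId"]).strip(),
        predicted_arrival=datetime.fromtimestamp(record["eta"], tz=timezone.utc),
    )


def from_vendor_b(record):
    # Hypothetical vendor B: different keys and ISO 8601 strings with a UTC offset.
    parsed = datetime.fromisoformat(record["arrival_iso"])
    return ArrivalPrediction(
        route_id=str(record["route"]).strip().upper(),
        stop_id=str(record["stop_code"]).strip(),
        predicted_arrival=parsed.astimezone(timezone.utc),
    )


# The same 8:00 a.m. Eastern prediction, as each hypothetical vendor reports it.
a = from_vendor_a({"routeId": " 12x ", "stopId": 4501, "eta": 1701090000})
b = from_vendor_b(
    {"route": "12X", "stop_code": "4501", "arrival_iso": "2023-11-27T08:00:00-05:00"}
)
print(a == b)
```

Once both feeds are expressed in the same shape, with one convention for identifiers and one for time, records can be compared, deduplicated, and shared without per-vendor special cases.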


The Benefits Derived from Standardization

Because of the lack of clear, consistent, and enforceable standards, regional solutions have proliferated. Efforts such as the Transit ITS Data Exchange Specification (TIDES) Google group, the MobilityData Slack group, and the MnDOT transit data specification have sprung up as various regional groups seek the benefits of common standards. Some of these groups are even beginning to require their vendors to comply with their data standards, which creates its own set of challenges as vendors are required to present their data in different formats for different clients.

Leadership at the federal and state levels to create and enforce national standards would ensure conformity across the public and private sectors. This would increase the utility of the data gathered and reduce the inefficiency in gathering and distributing real-time information, improving agencies’ operations and the customer experience.

In response to this need, the FTA has created a new Standards Development Program (SDP) to develop voluntary standards, best practices, guidance, and tools for the transit industry.

Unfortunately, this process will be a long-term investment. In the meantime, regional groups will grow and, ideally, evangelists and consultants will educate more agencies about the value of continuing to improve the quality of their data.


Standardized data provides a foundation for better decision-making. When data is consistent and reliable, organizations can make better-informed strategic and operational decisions that drive performance and customer satisfaction.

Transit agencies can benefit in several areas: fleet management, regulatory compliance, safety and security, and passenger experience.

Data cleansing and standardization in public transportation is essential for creating a more efficient, integrated, and passenger-friendly transit system. It promotes interoperability, efficiency, cost savings, and data quality; unlocks innovation; and improves the overall rider experience. Data standardization is a key enabler for the modernization and improvement of public transportation services.

About the Authors: Scott Belcher is President and CEO at SFB Consulting LLC, and Mark Talbot is Principal for EFCT Consulting LLC.
