METRO Magazine

The Need for Transit Data Cleansing and Standardization

While ensuring that passengers have access to real-time data is the new norm in transit, important safeguards must be put in place to make sure that the information is accurate and that transit agencies themselves use it effectively.

by Mark Talbot and Scott Belcher
November 27, 2023

Photo: Pexels/RDNE Stock project

At a recent transit conference, an IT professional attending a data session moderated by one of the authors stood up and said:

“I ride the bus to work every day and rely on my agency’s app. The app provides scheduled arrival times and predicted arrival times. My bus is scheduled to arrive at 8:00 a.m. Most days my app says my bus will arrive at 8:00 a.m. and yet it sometimes arrives 10 minutes early, 10 minutes late, or not at all, and the app doesn’t reflect this. It is very frustrating. I don’t know how many customers we are losing because of this, but I assume it is a lot.”

Ensuring that projected arrival times are correct is difficult in transit, and inaccuracies have many causes, ranging from driver shortages to technology shortcomings. Whatever the cause, the result is poor data quality and dissatisfied customers.

The management and effective use of data has become essential for high-performing transit operations. Buses have become computers on wheels that produce massive amounts of data from systems such as computer-aided dispatch/automatic vehicle location (CAD-AVL), passenger counting, GNSS, vehicle health monitoring, and payment systems.

A 2023 National Academies report concluded that “the sheer volume and diversity of that data is a problem for many agencies. They are not able to view or use all the data they collect; and, as a result, they may not be able to comprehend the value of the data available.” Many agencies struggle to make full use of the data they already have.

Photo: HART

GTFS Feeds

Most agencies that provide bus service publish their data following the General Transit Feed Specification (GTFS) Schedule and GTFS Realtime (GTFS-RT), open standards used to distribute information about transit systems to riders.

GTFS Schedule feeds are built around seven core text files (agency, stops, routes, trips, stop times, calendar, and calendar dates), which are recommended to be updated on a weekly basis. GTFS-RT is a companion standard that reports what is happening on the road through three types of real-time feeds (trip updates, service alerts, and vehicle positions), which are recommended to be refreshed every 30 seconds.
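
A feed that lacks one of these files is incomplete, so a simple first quality check is to confirm they are all present before publishing. The sketch below assumes the feed is a standard GTFS zip archive with its files at the top level; `missing_core_files` and `build_feed_zip` are hypothetical helper names. It treats calendar.txt and calendar_dates.txt as alternatives, since a feed needs at least one of them to define service dates, and treats the other five files as required, as they are for a typical fixed-route bus feed.

```python
import io
import zipfile

# Files treated as required for a fixed-route bus feed in this sketch.
CORE_FILES = {"agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt"}
# At least one of these must be present to define which days service runs.
SERVICE_FILES = {"calendar.txt", "calendar_dates.txt"}


def missing_core_files(feed_bytes: bytes) -> list[str]:
    """Return the core GTFS files absent from a zipped feed (empty list = all present)."""
    with zipfile.ZipFile(io.BytesIO(feed_bytes)) as zf:
        # Only top-level entries count; GTFS files should not sit in subfolders.
        names = {name for name in zf.namelist() if "/" not in name}
    missing = sorted(CORE_FILES - names)
    if not SERVICE_FILES & names:
        missing.append("calendar.txt or calendar_dates.txt")
    return missing


def build_feed_zip(names: list[str]) -> bytes:
    """Hypothetical helper: build an in-memory zip of empty files for testing."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "")
    return buf.getvalue()


feed = build_feed_zip(["agency.txt", "stops.txt", "routes.txt", "trips.txt", "calendar.txt"])
print(missing_core_files(feed))  # ['stop_times.txt']
```

Checking presence is only the first gate; a file can exist and still contain dirty data, which is where the cleaning steps discussed below come in.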

Transit agencies are not required to use these standards, but they have become the norm for most agencies and a minimum requirement for an improved rider experience.

The GTFS feed is essential for planning and communications purposes. Unfortunately, it is only as valuable as the quality of the data it contains. As one transit agency put it, “making GTFS and GTFS-RT publicly available in real time is problematic. We are not comfortable with data quality. We don’t have enough time to reconcile it.”
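
One quality check on a real-time feed that needs no manual reconciliation is whether each vehicle's latest report is recent. A bus that has stopped reporting is one way an app ends up showing a scheduled time while the bus is early, late, or missing. The sketch below assumes each vehicle's most recent position timestamp (GTFS-RT vehicle positions carry a POSIX `timestamp`) has already been decoded from the protobuf feed into a plain dictionary; `stale_vehicles` and both thresholds are hypothetical choices, with 30 seconds mirroring the refresh interval cited above.

```python
import time
from typing import Optional

# Hypothetical thresholds: 30 s mirrors the refresh interval cited above;
# the 5 s skew allowance tolerates small clock differences between systems.
MAX_AGE_S = 30.0
MAX_SKEW_S = 5.0


def stale_vehicles(
    positions: dict[str, int],
    now: Optional[float] = None,
    max_age_s: float = MAX_AGE_S,
    max_skew_s: float = MAX_SKEW_S,
) -> list[str]:
    """Return IDs of vehicles whose latest reported position looks stale or suspect.

    `positions` maps a vehicle ID to the POSIX timestamp (seconds) of its most
    recent vehicle position, assumed to be already decoded from the feed.
    """
    if now is None:
        now = time.time()
    flagged = []
    for vehicle_id, reported_at in positions.items():
        age = now - reported_at
        # Too old, or implausibly far in the future (clock skew or bad data).
        if age > max_age_s or age < -max_skew_s:
            flagged.append(vehicle_id)
    return sorted(flagged)


now = 1_700_000_000
print(stale_vehicles({"bus_101": now - 12, "bus_202": now - 300, "bus_303": now + 60}, now=now))
# ['bus_202', 'bus_303']
```

Passing `now` explicitly keeps the check deterministic for testing; in production it would default to the current clock.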

Accurate and Reliable Data

Agencies are required to report data to the National Transit Database (NTD), which is maintained by the Federal Transit Administration (FTA).

The NTD is a national database that records the financial, operating, and asset condition of transit systems, helping to track the industry and provide the public with information and statistics. The data also influences formula grant allocations. An FTA representative stated that “NTD financial data is generally good. The quality of the data gets worse as you move to operational data. The quality of what is reported varies greatly across agencies.”

Many transit agencies lack the resources or the incentives to make sure their data is accurate and reliable. It is hard to determine a clear return on investment (ROI) for cleaning and engineering data. Should it be measured in customer satisfaction ratings, riders lost or not recovered, or financial savings? The challenge is how best to draw a direct connection between quality data and customer satisfaction.

One transit professional said he believes his agency’s data is good “because we are no longer getting as many customer complaints about our arrival predictions.”

Another transportation professional found a creative way to derive a financial ROI from the agency’s investment in data quality, stating that “we saved hundreds of thousands of dollars last year by using data to help drive route decisions. We were able to eliminate an unnecessary route.”

The first step to ensuring data is reliable is to make sure it is clean. Dirty data, which is incomplete, incorrect, inaccurate, or irrelevant, can lead to misinformed decisions and missed opportunities.
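
To make these categories concrete, here is a minimal sketch, assuming a GTFS stops.txt row has already been read as raw strings. The `StopRow` type, the `classify_stop` function, and the labels are illustrative, not part of any standard. It tags a row as incomplete, incorrect, or inaccurate using three simple checks; spotting irrelevant data usually needs wider context (for example, whether any trip actually serves the stop), so that category is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopRow:
    """A few fields from a GTFS stops.txt row, kept as raw strings."""
    stop_id: str
    stop_lat: str
    stop_lon: str


def classify_stop(row: StopRow) -> Optional[str]:
    """Return a problem label for a dirty row, or None if it passes these checks."""
    if not row.stop_id or not row.stop_lat or not row.stop_lon:
        return "incomplete"  # a needed value is missing
    try:
        lat, lon = float(row.stop_lat), float(row.stop_lon)
    except ValueError:
        return "incorrect"  # the value is not a number at all
    # NaN fails both comparisons, so it is also caught here.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return "incorrect"  # outside the valid latitude/longitude range
    if lat == 0 and lon == 0:
        return "inaccurate"  # (0, 0) is a common placeholder, not a real stop
    return None


rows = [
    StopRow("S1", "27.9506", "-82.4572"),
    StopRow("S2", "", "-82.45"),
    StopRow("S3", "127.9", "-82.45"),
    StopRow("S4", "0", "0"),
]
print({r.stop_id: classify_stop(r) for r in rows})
# {'S1': None, 'S2': 'incomplete', 'S3': 'incorrect', 'S4': 'inaccurate'}
```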

To clean its data, a transit agency must remove corrupt or inaccurate data and then enhance it to ensure that it is complete, up to date, and reliable. Cleaning GTFS data, for example, can be a time-consuming and resource-intensive process with multiple steps. Much of it can be done by outside consultants, who can run the data through machine learning algorithms to identify anomalies. Even so, it still requires visual inspection by agency staff at various stages.
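
The consultants described here use machine learning to find anomalies. The sketch below deliberately swaps in something much simpler, a robust statistical rule (the modified z-score based on the median absolute deviation), to show the general workflow: flag suspicious records and route them to staff for visual inspection rather than deleting them automatically. The input format (hypothetical trip IDs mapped to observed travel times, in seconds, for one route segment) and the 3.5 cutoff, a commonly used default for this score, are assumptions.

```python
from statistics import median


def flag_for_review(travel_times_s: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Flag trips whose segment travel time is a robust outlier.

    Uses the modified z-score 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median. Scores above `threshold` are
    returned for human review, not removed.
    """
    values = list(travel_times_s.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread at all: any value that differs from the median is suspect.
        return sorted(trip for trip, v in travel_times_s.items() if v != med)
    return sorted(
        trip for trip, v in travel_times_s.items()
        if abs(0.6745 * (v - med) / mad) > threshold
    )


observed = {"t1": 300, "t2": 310, "t3": 295, "t4": 305, "t5": 1200, "t6": 0}
print(flag_for_review(observed))  # ['t5', 't6']
```

The median and MAD are used instead of the mean and standard deviation because a few extreme values (such as the 0-second and 1,200-second trips here) would otherwise inflate the spread and hide themselves.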

One consultant who focuses on helping agencies with transit data commented that “agencies will pay only for data management if it supports their NTD reporting obligations.” In our experience, most large agencies and a number of small to mid-size agencies are making this investment. What they all have in common is someone on staff who understands the importance of quality data and is willing to be an evangelist for it.

The second key to obtaining quality data is data standards that establish the ground truth.

Photo: Houston METRO

The Need for Standardization

The purpose of standardization is to ensure that an organization’s data is consistent for all of its users and follows a uniform structure that makes it easier to manage, analyze, and exchange across different systems and organizations.

While GTFS and GTFS-RT are open data standards, they are not used consistently across the country, across agencies, or even within the same organization. For example, an agency might publish a GTFS-driven public schedule but not use GTFS-RT for arrival predictions.

Additionally, many agencies’ mobile app providers generate their own arrival predictions because they do not trust the quality of the real-time GTFS feed coming from the buses.

Standardization requires defining and implementing rules and formats to organize the data. It can benefit any organization, and the transit industry in particular. Key benefits can include:

  • Enabling different systems and applications to work together seamlessly. In transit this means ticketing, route scheduling, and passenger information systems would all be relying on the same set of information.

  • Data is higher quality because it is less prone to errors, inconsistencies, and duplications. Standards eliminate uncertainties and ensure that data is accurate, complete, and up to date.

  • Many organizations use a variety of software applications and data sources. Following a standard simplifies the process of integrating data from the variety of different sources by providing a common structure and format for all data.

  • Data is easier to analyze and report on. IT professionals will spend less time cleaning and preparing data and more time analyzing it to make informed decisions.

  • Data becomes easier to share with external vendors, customers, and stakeholders. This is especially important in transit, where agencies rely on numerous vendors to help them deliver services to customers and data sharing is an integral part of providing real-time travel information.

  • Data will maintain its relevance longer as technology and business requirements evolve. Consistently applying the same standard enables historical data sets to influence future planning decisions.
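
Several of these benefits come down to mapping every source into one agreed structure. The sketch below illustrates that idea with two hypothetical passenger-counting vendors whose field names, ID types, and timestamp formats differ. The vendor formats and the `PassengerCount` record are invented for illustration and do not come from TIDES or any other real specification; vendor A is assumed to send ISO 8601 timestamps that include a UTC offset.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PassengerCount:
    """Hypothetical common record that every source is mapped into."""
    vehicle_id: str
    observed_at: datetime  # always timezone-aware, in UTC
    boardings: int
    alightings: int


def from_vendor_a(rec: dict) -> PassengerCount:
    # Hypothetical vendor A: short field names, ISO 8601 timestamps with an offset.
    return PassengerCount(
        vehicle_id=str(rec["veh"]),
        observed_at=datetime.fromisoformat(rec["ts"]).astimezone(timezone.utc),
        boardings=int(rec["ons"]),
        alightings=int(rec["offs"]),
    )


def from_vendor_b(rec: dict) -> PassengerCount:
    # Hypothetical vendor B: numeric vehicle IDs, timestamps in epoch milliseconds.
    return PassengerCount(
        vehicle_id=str(rec["VehicleID"]),
        observed_at=datetime.fromtimestamp(rec["EpochMs"] / 1000, tz=timezone.utc),
        boardings=int(rec["Boardings"]),
        alightings=int(rec["Alightings"]),
    )


a = from_vendor_a({"veh": "101", "ts": "2023-11-27T08:00:00+00:00", "ons": 4, "offs": 1})
b = from_vendor_b({"VehicleID": 101, "EpochMs": 1701072000000, "Boardings": 4, "Alightings": 1})
print(a == b)  # True: both sources now describe the same event the same way
```

Once both feeds are mapped to the same record, ticketing, scheduling, and reporting code only has to understand one format, and adding a third vendor means writing one more small adapter rather than changing every consumer.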

Photo: MARTA

The Benefits Derived from Standardization

Because of the lack of clear, consistent, and enforceable standards, regional solutions have proliferated. Efforts such as the Transit ITS Data Exchange Specification (TIDES) Google group, the MobilityData Slack group, and the MnDOT transit data specification have sprung up as various regional groups seek the benefits of common standards. Some are even beginning to require their vendors to comply with their data standards, which creates its own set of challenges, because vendors must then present their data in different formats for different clients.

Leadership at the federal and state levels to create and enforce national standards would ensure conformity across the public and private sectors. This would increase the utility of the data gathered and reduce inefficiency in gathering and distributing real-time information, improving agencies’ operations and the customer experience.

In response to this need, the FTA has created a new Standards Development Program (SDP) to develop voluntary standards, best practices, guidance, and tools for the transit industry.

Unfortunately, this process will be a long-term investment. In the meantime, regional groups will grow and ideally evangelists and consultants will educate more agencies about the value of continuing to improve the quality of their data.

Standardized data provides a foundation for better decision-making. When data is consistent and reliable, organizations can make better-informed strategic and operational decisions that drive performance and customer satisfaction.

Transit agencies can benefit in several areas: fleet management, regulatory compliance, safety and security, and passenger experience.

Data cleansing and standardization are essential in public transportation for creating a more efficient, integrated, and passenger-friendly transit system. Together they promote interoperability, efficiency, cost savings, and data quality; unlock innovation; and improve the overall rider experience. Data standardization is a key enabler for the modernization and improvement of public transportation services.

About the Authors: Scott Belcher is President and CEO at SFB Consulting LLC; and Mark Talbot is Principal for EFCT Consulting LLC.
