Language and Access: U.S. Federal Government Websites (LIS 572)

May 29, 2024

by Ben Tice, Aaron Black, Marcos Corona

Introduction

The aim of this project was to analyze usage statistics for federal government websites, focusing on visits in languages other than English. This data matters because access to government resources is a prerequisite for using the services that public tax dollars pay for. In short, public goods should be available to the whole public, regardless of what language a person speaks. Additionally, people outside of the U.S. should be able to learn what they need about and from our government, whether to take advantage of programs they may qualify for or to inform themselves about U.S. laws.

While brainstorming potential topics, we came across analytics.usa.gov and noticed that an overwhelming majority of visits to federal government websites were in English. However, we also noticed that roughly a quarter of visits came from outside the U.S. Both numbers surprised us. We had expected a higher percentage of visits in languages other than English, and we had expected an overwhelming majority of visits to come from inside the U.S. This data comes directly from the federal government’s Google Analytics account, which is otherwise referred to as the Digital Analytics Program.

Dataset

Government website usage statistics can be found at https://analytics.usa.gov. However, the site only tracks the past ninety days. The full dataset covering all agencies did not include the language and country codes we needed, so we selected five government agencies: Health and Human Services, the Nuclear Regulatory Commission, Homeland Security, the United States Postal Service, and the Department of State. We then joined the datasets for those five agencies into a single data frame, which we worked from. Our goal was to get a cross section of government website visits. Moreover, the visitor's country is only captured in the “language code,” which is a country and language pair, so we created a new column for just the country when it was relevant to do so.
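A minimal sketch of the joining and column-splitting steps, in plain Python rather than a data frame library. The sample rows, the `agency`, `visits`, and `country` names, and the assumption that a language code looks like `en-us` (language first, then country) are illustrative and are not taken from the actual exports.

```python
# Hypothetical per-agency rows; real exports from analytics.usa.gov may differ.
hhs_rows = [
    {"language": "English", "language_code": "en-us", "visits": 900},
    {"language": "Spanish", "language_code": "es-mx", "visits": 40},
]
state_rows = [
    {"language": "English", "language_code": "en-gb", "visits": 120},
    {"language": "other", "language_code": "", "visits": 15},
]


def combine(agency_tables):
    """Stack per-agency row lists into one table, tagging each row with its agency."""
    combined = []
    for agency, rows in agency_tables.items():
        for row in rows:
            combined.append({**row, "agency": agency})
    return combined


def country_from_code(language_code):
    """Return the country part of a code such as 'en-us', or None if it is absent.

    Assumes exactly two hyphen-separated parts; anything else yields None.
    """
    parts = language_code.strip().lower().split("-")
    if len(parts) == 2 and parts[1]:
        return parts[1]
    return None


def add_country_column(rows):
    """Return copies of the rows with an added 'country' key."""
    return [{**row, "country": country_from_code(row["language_code"])} for row in rows]


all_rows = add_country_column(combine({"hhs": hhs_rows, "state": state_rows}))
for row in all_rows:
    print(row["agency"], row["language_code"] or "(blank)", row["country"])
```

Rows with a blank or single-part code get `None` for the country, which keeps them in the table so they can still be counted as unknowns later.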

Ethical Concerns and Limitations

We recognized that this data raises ethical concerns related to privacy, security, accuracy, and transparency. Depending on the governmental agency, some data may have been collected from vulnerable populations. That could skew the data so that it is not representative of the populations measured, and it could reinforce harmful biases or stereotypes. We were therefore careful not to unintentionally perpetuate harmful stereotypes about communities whose primary language is not English. Moreover, while personal information is omitted and IP addresses are anonymized, people generally have no choice but to visit these sites to get the government information they need. This creates a power imbalance in which users are not meaningfully consenting to the government recording their information, even in an anonymized form.

There are also many limitations to the data. First, not all governmental agency websites are included, so the data is not representative of all government websites and services. Moreover, we limited ourselves to just five agencies to make the project more manageable. Second, the data only covers a ninety-day period, so current events could skew the usage statistics in ways that are unrepresentative of typical use over longer time frames. Third, a large amount of the data is incomplete because the language and language code entries are left blank or are simply marked “other” without further specification. As a result, certain groups of visitors may be undercounted, depending on which visitors fall into those unlabeled entries.
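One way to put a number on the third limitation is to compute the share of visits whose language is blank or marked "other". The sketch below uses made-up rows and an assumed `visits` column, so the figure it prints says nothing about the real dataset.

```python
def unknown_share(rows):
    """Fraction of visits whose language is blank or marked 'other'."""
    total = sum(row["visits"] for row in rows)
    if total == 0:
        return 0.0
    unknown = sum(
        row["visits"]
        for row in rows
        if (row.get("language") or "").strip().lower() in {"", "other"}
    )
    return unknown / total


# Hypothetical rows for illustration only.
sample_rows = [
    {"language": "English", "visits": 800},
    {"language": "other", "visits": 150},
    {"language": "", "visits": 50},
]

print(f"{unknown_share(sample_rows):.1%}")
```

Weighting by visits rather than counting rows matters here, because a single unlabeled row can stand for a large number of visitors.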

Findings

We made three graphs for our dataset. The first graph illustrates the top ten languages and the number of visits in each language. The second graph shows the top ten "language codes," each of which represents a country and language pair, and the number of visits for each. The third graph shows the top seven countries by visits in English.
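The rankings behind three graphs like these could be computed along the following lines. This is a sketch under assumptions: the rows are invented, the `visits` and `country` column names are hypothetical, and plotting is left out.

```python
from collections import Counter


def top_by_visits(rows, key, n=10):
    """Sum visits per distinct value of `key` and return the n largest totals."""
    totals = Counter()
    for row in rows:
        totals[row[key]] += row["visits"]
    return totals.most_common(n)


# Hypothetical combined rows for illustration only.
sample_rows = [
    {"language": "English", "language_code": "en-us", "country": "us", "visits": 900},
    {"language": "English", "language_code": "en-gb", "country": "gb", "visits": 120},
    {"language": "Spanish", "language_code": "es-mx", "country": "mx", "visits": 40},
    {"language": "English", "language_code": "en-us", "country": "us", "visits": 100},
]

# Graph 1: top ten languages by visits.
top_languages = top_by_visits(sample_rows, "language")

# Graph 2: top ten language codes by visits.
top_codes = top_by_visits(sample_rows, "language_code")

# Graph 3: top seven countries, counting English-language visits only.
english_rows = [row for row in sample_rows if row["language"] == "English"]
top_english_countries = top_by_visits(english_rows, "country", n=7)

print(top_languages)
print(top_codes)
print(top_english_countries)
```

Summing before ranking is the important step, since the same language or code can appear on several rows once the agency tables are joined.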

After reviewing the graphs, our findings are as follows. Generally, English has the most visits by a very wide margin. Surprisingly, English is even the top language for countries other than the U.S., as shown in the language codes graph. However, one potentially major concern we came across in our analysis is that a large chunk of the data is labeled "other". Unfortunately, we have no idea what that data could represent, and a lot of data doesn't even have an "other" label in either the language or language_code column. That data therefore represents a big unknown and a gap in understanding and representation in our dataset.

Data Visualizations

The first graph reflects the top 10 languages and the number of visits in each language.

The second graph illustrates the top 10 "language codes", each of which represents a country and language pair.

The third graph illustrates the first seven countries at the top of the data frame when sorting by visits.

Examples

As stated above, the aim of our analysis was to examine usage statistics of federal government websites in languages other than English. We determined that this data was important and interesting because access to government resources is a crucial piece of taking advantage of the different services that public tax dollars pay for. As such, we were surprised by how few visits came from other countries and in other languages, in both relative and absolute terms.

This surprised us because, in short, we think public goods should be available to the whole public, regardless of what language a person speaks. Additionally, people outside of the U.S. should be able to learn what they need about and from our government, whether to take advantage of programs they may qualify for or to inform themselves about U.S. laws.

Future Work

There are many groups that can potentially benefit from our analysis. To start, we, as law librarians, would use these insights to better curate multilingual resources and to assist patrons in navigating government websites. The insights would also let us provide targeted help, workshops, or guides in the languages most prevalent in our communities, enhancing public access to crucial information.

Additionally, we think designers, specifically web and UX designers, could take these findings into consideration to create more inclusive and accessible government websites. By understanding the language needs and preferences of users, designers can create experiences that serve non-English speakers more effectively.

Furthermore, we feel that policymakers could use these findings to influence policy decisions regarding digital inclusivity and language accessibility. Policymakers might advocate for increased funding or new regulations to ensure that government digital resources are accessible to all, regardless of language proficiency, potentially leading to the drafting of new guidelines for federal digital assets.

Finally, as members of the general public, we want to see language accessibility improve. We hope this leads to greater engagement with government resources, which in turn leads to higher satisfaction and an overall increase in civic participation. We see a lot of potential and utility in our dataset. The underlying philosophy is that making free governmental data more accessible will benefit everyone.
