Large documents often contain quite a few tables. Tables are useful: they can provide a structured overview of data that supports or contradicts a particular statement made in the accompanying text. If your goal is to analyze reports, tables can be especially useful because they provide more raw data. But analyzing tables takes a lot of effort, as one has to reason over them to answer questions about their contents.
But what if that process can be partially automated?
The Table Parser Transformer, or TAPAS, is a machine learning model that is capable of precisely that. Given a table and a question related to that table, it can provide the answer in a short amount of time.
In this tutorial, we will take a closer look at using Machine Learning for table parsing. We briefly cover how previous approaches relied on manually extracting logical forms, and how Transformer-based approaches have simplified parsing tables. We then take a look at the TAPAS Transformer for table parsing and how it works. This is followed by implementing a table parsing model yourself, using a pretrained and finetuned variant of TAPAS with HuggingFace Transformers.
After reading this tutorial, you will understand...

- What table parsing is and why it is useful when analyzing documents and reports.
- How TAPAS, an extension of BERT, approaches table parsing.
- How to implement a TAPAS-based table parsing model yourself with HuggingFace Transformers.
Let's take a look!
Ever since Vaswani et al. (2017) introduced the Transformer architecture back in 2017, the field of NLP has been on fire. Transformers have removed the need for recurrent segments, thereby avoiding the drawbacks of recurrent neural networks and LSTMs when creating sequence-based models. By relying on a mechanism called self-attention, built in with multiple so-called attention heads, models can weigh the relevance of every input token to every other token when building their representations.
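To give a flavor of what self-attention computes before we move on, here is a minimal, illustrative sketch of scaled dot-product attention in PyTorch. It is not the full multi-head attention used in real Transformers (which adds learned projections, masking and multiple heads), just the core weighted-mixing idea:

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(queries, keys, values):
    # queries, keys, values: (sequence_length, d_model) tensors for one example.
    d_model = queries.size(-1)
    # Pairwise similarity between every pair of tokens, scaled for numerical stability.
    scores = queries @ keys.transpose(-2, -1) / (d_model ** 0.5)
    # Normalize into attention weights that sum to 1 for each token.
    weights = F.softmax(scores, dim=-1)
    # Each output token is a weighted mixture of all value vectors.
    return weights @ values

x = torch.randn(5, 16)  # 5 tokens with 16-dimensional embeddings
print(scaled_dot_product_attention(x, x, x).shape)  # torch.Size([5, 16])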
As a consequence, Transformers have widely adopted the pretraining-finetuning paradigm: models are first pretrained on a massive but unlabeled dataset, acquiring general language capabilities, after which they are finetuned on a smaller but labeled, and hence task-focused, dataset.
The results are incredible: through subsequent improvements like GPT and BERT and a variety of finetuned models, Transformers can now be used for a wide variety of tasks, ranging from text summarization and machine translation to speech recognition. And today, we can also add table parsing to that list.
The BERT family of language models is a widely varied but very powerful family of models that relies on the encoder segment of the original Transformer. Proposed by Google, BERT employs Masked Language Modeling during pretraining and slightly adapts the architecture and embeddings in order to add more context to the processed representations.
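As a quick illustration of what Masked Language Modeling looks like in practice, the sketch below uses the standard HuggingFace fill-mask pipeline with a plain BERT checkpoint (the checkpoint name is just an example; any BERT-style masked language model would do):

from transformers import pipeline

# Minimal Masked Language Modeling demo: BERT predicts the most likely tokens for [MASK].
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("Paris is the [MASK] of France."):
    print(prediction["token_str"], round(prediction["score"], 3))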
TAPAS, which stands for Table Parser, is an extension of BERT proposed by Herzig et al. (2020), who are affiliated with Google. It is specifically tailored to table parsing - unsurprising, given its name. TAPAS accepts tables as input after they are flattened, essentially converting them into a 1D sequence.
By adding a variety of additional embeddings, however, table-specific structure and context can be harnessed during training. The model outputs a prediction for an aggregation operator (i.e., what to do with some outcome) and cell selection coordinates (i.e., which cells that operation should be applied to).
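Conceptually, flattening a table can be pictured as reading it row by row and concatenating the cells behind the question. The sketch below is purely illustrative - the real TapasTokenizer does this for you and additionally encodes row, column and rank information through the extra embeddings mentioned above:

def flatten_table(question, header, rows):
    # Illustrative linearization only: question, then header, then the cells row by row.
    tokens = ["[CLS]"] + question.split() + ["[SEP]"] + list(header)
    for row in rows:
        tokens += [str(cell) for cell in row]
    return tokens

print(flatten_table(
    "Which city has most inhabitants?",
    ["Cities", "Inhabitants"],
    [["Paris, France", "2.161"], ["London, England", "8.982"], ["Lyon, France", "0.513"]],
))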
TAPAS is covered in another article on this website, and I recommend going there if you want to understand how it works in great detail. For now, a visualization of its architecture will suffice - as this is a practical tutorial :)
Source: Herzig et al. (2020)
Let's now take a look at how you can implement a Table Parsing model yourself with HuggingFace Transformers. We'll first focus on the software requirements that you must install into your environment. You will then learn how to code a TAPAS based table parser for question answering. Finally, we will also show you the results that we got when running the code.
HuggingFace Transformers is a Python library that was created to democratize the application of state-of-the-art NLP models: Transformers. It can easily be installed with pip, by means of pip install transformers. To run it, you will also need PyTorch or TensorFlow as the backend, installed into the same environment (or vice versa: install HuggingFace Transformers into your existing PT/TF environment).
The code in this tutorial was created with PyTorch, but it may be relatively easy (possibly with a few adaptations) to run it with TensorFlow as well.
To run the code, you will need to install the following things into an environment:
- HuggingFace Transformers, via pip install transformers.
- PyTorch or TensorFlow as the backend.
- torch-scatter, which TAPAS needs. Replace 1.6.0 with your PyTorch version and ${CUDA} with your CUDA version, or use the CPU-only wheel:
    - GPU: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.6.0+${CUDA}.html
    - CPU: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.6.0+cpu.html
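If you want to verify that your environment is set up correctly before running the model, a quick optional sanity check can look like the snippet below (it only checks that the imports succeed and prints the installed versions):

# Optional sanity check: these imports must succeed for the TAPAS code below to run.
import torch
import torch_scatter
import transformers

print("PyTorch version:", torch.__version__)
print("Transformers version:", transformers.__version__)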
Compared to Pipelines and other pretrained models, running TAPAS requires you to do a few more things. Below, you can find the code for the TAPAS based model as a whole. But don't worry! I'll explain everything right now.
- First of all, we use the TapasTokenizer and TapasForQuestionAnswering imports from transformers - that is, HuggingFace Transformers. The tokenizer can be used for tokenization, and its result can subsequently be fed to the question answering model. TAPAS requires a specific way of tokenizing and presenting the input, and these TAPAS-specific tokenizer and QA model have this built in. Very easy! We also import pandas, which we'll need later.
- We then define the table as a dictionary with two columns - Cities and Inhabitants - and values (in millions of inhabitants) are provided for Paris, London and Lyon.
- In load_model_and_tokenizer, we initialize the tokenizer and question answering model with a finetuned variant of TAPAS - more specifically, google/tapas-base-finetuned-wtq, or TAPAS finetuned on WikiTable Questions (WTQ).
- In prepare_inputs, the table must be converted into a DataFrame before it can be tokenized. We use pandas for this purpose, and create the DataFrame from the dictionary (see the short note after this list about why all cell values are strings). We can then feed it to the tokenizer together with the queries, and return the results.
- In generate_predictions, we feed the tokenized inputs to our TAPAS model. Our tokenizer can subsequently be used to find the cell coordinates and aggregation operators that were predicted - recall that TAPAS predicts relevant cells (the coordinates) and an operator that must be executed to answer the question (the aggregation operator).
- In postprocess_predictions, we convert the predictions into a format that can be displayed on screen.
- In show_answers, we then actually visualize these answers.
- run_tapas combines all the other defs together in an end-to-end flow. This wasn't added directly to __main__ because it's best practice to keep as much functionality as possible within Python definitions.
- When the script runs, __main__ invokes run_tapas() - and therefore the whole model.
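One practical detail before the full listing: the TAPAS tokenizer expects every cell value in the DataFrame to be a string, which is why the inhabitant counts above are written as strings rather than floats. A small sketch of what the tokenization step produces (a single-row table, kept tiny on purpose):

import pandas as pd
from transformers import TapasTokenizer

tokenizer = TapasTokenizer.from_pretrained("google/tapas-base-finetuned-wtq")

# All cell values are strings, as TapasTokenizer expects.
table = pd.DataFrame({"Cities": ["Paris, France"], "Inhabitants": ["2.161"]})
inputs = tokenizer(table=table, queries=["Which city is listed?"], padding="max_length", return_tensors="pt")

# The tokenizer returns input_ids and attention_mask plus TAPAS-specific token_type_ids.
print({name: tensor.shape for name, tensor in inputs.items()})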
from transformers import TapasTokenizer, TapasForQuestionAnswering
import pandas as pd

# Define the table
data = {'Cities': ["Paris, France", "London, England", "Lyon, France"], 'Inhabitants': ["2.161", "8.982", "0.513"]}

# Define the questions
queries = ["Which city has most inhabitants?", "What is the average number of inhabitants?", "How many French cities are in the list?", "How many inhabitants live in French cities?"]


def load_model_and_tokenizer():
    """
    Load pretrained tokenizer and model.
    """
    # Load pretrained tokenizer: TAPAS finetuned on WikiTable Questions
    tokenizer = TapasTokenizer.from_pretrained("google/tapas-base-finetuned-wtq")

    # Load pretrained model: TAPAS finetuned on WikiTable Questions
    model = TapasForQuestionAnswering.from_pretrained("google/tapas-base-finetuned-wtq")

    # Return tokenizer and model
    return tokenizer, model


def prepare_inputs(data, queries, tokenizer):
    """
    Convert dictionary into data frame and tokenize inputs given queries.
    """
    # Prepare inputs
    table = pd.DataFrame.from_dict(data)
    inputs = tokenizer(table=table, queries=queries, padding='max_length', return_tensors="pt")

    # Return things
    return table, inputs


def generate_predictions(inputs, model, tokenizer):
    """
    Generate predictions for some tokenized input.
    """
    # Generate model results
    outputs = model(**inputs)

    # Convert logit outputs into predictions for table cells and aggregation operators
    predicted_table_cell_coords, predicted_aggregation_operators = tokenizer.convert_logits_to_predictions(
        inputs,
        outputs.logits.detach(),
        outputs.logits_aggregation.detach()
    )

    # Return values
    return predicted_table_cell_coords, predicted_aggregation_operators


def postprocess_predictions(predicted_aggregation_operators, predicted_table_cell_coords, table):
    """
    Compute the predicted operation and nicely structure the answers.
    """
    # Process predicted aggregation operators
    aggregation_operators = {0: "NONE", 1: "SUM", 2: "AVERAGE", 3: "COUNT"}
    aggregation_predictions_string = [aggregation_operators[x] for x in predicted_aggregation_operators]

    # Process predicted table cell coordinates
    answers = []
    for coordinates in predicted_table_cell_coords:
        if len(coordinates) == 1:
            # 1 cell
            answers.append(table.iat[coordinates[0]])
        else:
            # > 1 cell
            cell_values = []
            for coordinate in coordinates:
                cell_values.append(table.iat[coordinate])
            answers.append(", ".join(cell_values))

    # Return values
    return aggregation_predictions_string, answers


def show_answers(queries, answers, aggregation_predictions_string):
    """
    Visualize the postprocessed answers.
    """
    for query, answer, predicted_agg in zip(queries, answers, aggregation_predictions_string):
        print(query)
        if predicted_agg == "NONE":
            print("Predicted answer: " + answer)
        else:
            print("Predicted answer: " + predicted_agg + " > " + answer)


def run_tapas():
    """
    Invoke the TAPAS model.
    """
    tokenizer, model = load_model_and_tokenizer()
    table, inputs = prepare_inputs(data, queries, tokenizer)
    predicted_table_cell_coords, predicted_aggregation_operators = generate_predictions(inputs, model, tokenizer)
    aggregation_predictions_string, answers = postprocess_predictions(predicted_aggregation_operators, predicted_table_cell_coords, table)
    show_answers(queries, answers, aggregation_predictions_string)


if __name__ == '__main__':
    run_tapas()
Running the WTQ based TAPAS model against the questions specified above gives the following results:
Which city has most inhabitants?
Predicted answer: London, England
What is the average number of inhabitants?
Predicted answer: AVERAGE > 2.161, 8.982, 0.513
How many French cities are in the list?
Predicted answer: COUNT > Paris, France, Lyon, France
How many inhabitants live in French cities?
Predicted answer: SUM > 2.161, 0.513
This is great!

- The model correctly selects London, England as the city with the most inhabitants.
- For the average number of inhabitants, it predicts the AVERAGE operator - and all relevant cells are selected. True again.
- It predicts COUNT and two relevant cells - precisely what we mean - when we ask which French cities are in the list.
- The SUM operator and cells are also provided when the question is phrased differently, focusing on inhabitants instead.

Really cool!
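Note that TAPAS only predicts which cells are relevant and which operator to apply; it does not execute the aggregation itself. If you want actual numbers instead of, say, "SUM > 2.161, 0.513", a small post-processing helper can do the arithmetic. The execute_aggregation function below is a hypothetical addition, not part of TAPAS or HuggingFace Transformers, and assumes the selected cells parse as floats:

def execute_aggregation(operator, cell_values):
    # Hypothetical helper: turn a predicted operator plus selected cell strings into a final answer.
    if operator == "COUNT":
        return len(cell_values)
    if operator in ("SUM", "AVERAGE"):
        numbers = [float(value) for value in cell_values]
        return sum(numbers) if operator == "SUM" else sum(numbers) / len(numbers)
    # NONE: simply return the selected cell(s) as-is.
    return ", ".join(cell_values)

# For the last question this would yield 2.161 + 0.513 = 2.674 (million inhabitants).
print(execute_aggregation("SUM", ["2.161", "0.513"]))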
Transformers have really changed the world of language models. Harnessing the self-attention mechanism, they have removed the need for recurrent segments and hence sequential processing, allowing bigger and bigger models to be created that every now and then show human-like behavior - think GPT, BERT and DALL-E.
In this tutorial, we focused on TAPAS, an extension of BERT that can be used for table parsing. We specifically focused on the practical part: implementing this model for real-world usage by means of HuggingFace Transformers.
Reading it, you have learned...

- What table parsing is and why it is useful.
- How TAPAS extends BERT to answer questions over tables.
- How to implement a TAPAS-based table parsing model with HuggingFace Transformers.
I hope that this tutorial was useful for you! If it was, please let me know in the comments section below. Please do the same if you have any questions or other comments. I'd love to hear from you.
Thank you for reading MachineCurve today and happy engineering!
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30, 5998-6008.
Herzig, J., Nowak, P. K., Mßller, T., Piccinno, F., & Eisenschlos, J. M. (2020). Tapas: Weakly supervised table parsing via pre-training. arXiv preprint arXiv:2004.02349.
GitHub. (n.d.). Google-research/tapas. https://github.com/google-research/tapas
Google. (2020, April 30). Using neural networks to find answers in tables. Google AI Blog. https://ai.googleblog.com/2020/04/using-neural-networks-to-find-answers.html
HuggingFace. (n.d.). TAPAS - transformers 4.3.0 documentation. Hugging Face - On a mission to solve NLP, one commit at a time. https://huggingface.co/transformers/model_doc/tapas.html