Interview: Jaap van der Meer (TAUS Director)
Jaap van der Meer founded TAUS in 2004. He is a language industry pioneer and visionary who started his first translation company, INK, in the Netherlands in 1980. Jaap is a regular speaker at conferences and the author of many articles about technology, translation and globalization trends.
- Could you please give a short explanation of the activities of TAUS?
So TAUS: we started as a think tank focused on advanced technology, specifically machine translation, in 2005, with a group of early-adopting companies, mostly the big IT companies, the buyers of translation. At that time, obviously, the industry at large was not really interested in machine translation. It was a very bad word. But you can see how much change we’ve gone through in the eight years since we started TAUS, and with that, the activities of TAUS have also changed quite a bit. Now people are using machine translation, and we’re facing challenges: how do we change the business model, how do we adapt? It’s still an open question whether this is an evolutionary change or a disruptive one. But it seems more disruptive than evolutionary, so you need to change a lot in the whole environment, including the business model.
And this is where we’ve been asked to step up and step in, providing support. That’s why, for the last few years, we’ve described it like this: “TAUS is making a transition from a think tank to a platform for industry-shared services, industry-shared support services.” We’re not leaving the think tank function behind; we’re still… thinking (laughs), as well. But we need to follow through. We need to support those members who have started using machine translation and are innovating the whole environment. So this innovation comes with four, let’s keep this very clear, four different action lines.
The first one is support for the technology: the choices you need to make, the technology to use, and then how you set it up. How do you train engines with data and terminology? We provide tutorials, and we provide a knowledge base. Under this technology action line, we have also set up collaboration with universities worldwide: the Developing Talent Project. And we’ve become very involved with Moses, with open-source MT, because it helps a lot of companies become self-supporting and independent. I think that’s good, because they can then start creating real added value in their relationship with the market, especially with buyers. We’re also a member of a European consortium, the MosesCore Project, which gives us a special mission to promote it there too.
So that’s one action line, on technology. Then there is the data action line, which we started in July 2008 with a data repository. Basically, it’s sharing translation memories, but on such a scale that you don’t talk about translation memories anymore but about translation data. This is quite challenging, because it is disruptive, even revolutionary: people have invested in their translation memories, so why would they share? Well, if you share, you get more. It’s synergy, because it’s based on the principle of reciprocity: you give and you take. Nevertheless, we’ve seen it grow, and we now have 54 billion words and 2,200 language pairs in the database. We get 700,000 searches from translators every month now. TAUS search is integrated into memoQ, Trados and Lionbridge Translation Workspace, and the API is available. Every week we still get uploads of new data, and people download data to train engines. It’s functioning. We’d like it to grow further. That’s a major activity, of course.
Number three is metrics, which I talked about earlier this morning. As you heard, my argument is that if you want to grow in this industry, if you want to automate and innovate, you have to measure. If you don’t measure, you cannot grow. Think of any other industry: there are clear agreements, reference metrics, a definition of what they deliver. We don’t have that in our industry. And now, with the increased use of technology, machine translation, and lots of different types of content that people need to work on, having measurements, having metrics, is becoming urgent. Since it’s closely related to machine translation, it became natural for us to step up and do that. So we rallied the community to get together, discuss best practices, document them in a knowledge base, profile content and match it with the appropriate evaluation process. That whole knowledge base is on the site, and it creates a common language that people can refer to when discussing quality evaluation in any kind of buyer/vendor dialogue.
We’ve also added tools for quality evaluation, which will lead to benchmarking once enough data points have accumulated. So this, we think, is a very important action line.
The last action line is a difficult one, and I don’t think we will solve it. Probably nobody will. It’s interoperability: how do you make sure all the systems can work together? Of course, if you look back over the last couple of decades, a lot of effort has gone into developing standards for exchanging files between translation tools, like TMX and TBX, and yet there is no universal agreement on the exact specification of these standards or on their correct use. So a lot of companies have found workarounds. Looking forward, we probably shouldn’t spend too much time on these legacy standards, because the world is moving towards web-based translation, so we need a common API for web-based translation. Right now, everyone doing web-based translation (Google, Microsoft, lots of startup companies) develops their own API. I think there are already more than sixty proprietary APIs. With this type of translation, we’re in the jungle again.
So we’ve published an API. It’s freely available on the site for people to use. Some companies have started using it or referring to it, so there’s the beginning of something. But like every standard, it only becomes relevant and useful if people adopt it, so we don’t know yet. But we do invest some time and money in that as well.
So that’s a summary of the action lines. And, as I said, the think tank goes on: we try to influence governments, buyers and vendors on issues like copyright on translation data, which is a real inhibitor of innovation and change. We publish articles with our thoughts on how this could change, and we try to influence policy makers. We published quite a comprehensive report on the translation technology landscape, sort of painting the future. The key theme for this year is “Entering the Convergence Era.” Translation is basically becoming a utility. We’ve talked about that for a long time, but now it’s happening. Soon you’ll have smart eyewear with translation built in, from Google, from Baidu, and translation will be like electricity and water and the internet itself. That’s not the end of everything for the professional translator, because at the same time we’re saying there will be growth in boutique-style translation.
For transcreation, we make the comparison with water. Water comes out of the tap in every kitchen, often free or at least very cheap, but when you go to the shop, you can buy bottles of water. That’s the boutique-style water. It’s the same with translation: for certain types of content, people will always want something much better than the ubiquitous, imperfect machine translation output. So the ubiquitous availability of translation, which is what we will be talking about in a few years for every language in the world, will stimulate growth in demand for boutique-style translation. From every angle, from every perspective, things look good for the translation industry.
But you have to make choices now. And one of the fundamental choices now is whether you want to share and collaborate, because if you don’t, you may be left behind. That’s what we’re promoting at TAUS: sharing and collaboration. Specifically, working with the same metrics, the same data and the same best practices.
That’s the summary. And that was one question.
- Past achievements of TAUS. Could you elaborate on those?
Well, I think the main thing that we’re good at is getting the community together. We are now about 120 members, and we’re getting very strong support from the buyer side of the industry. Many of the big companies are very supportive in driving the change, and I think that’s a real achievement: getting mindshare for the agenda that we’re focusing on. And of course there are concrete deliverables. The data repository is a major effort. Personally, I dreamed of it. When we worked on the requirements back in 2007, I couldn’t imagine that searching a repository of 50 billion words would be possible. But right now, you just type in a term and say, “Give me all the sentences in which this term or phrase occurs,” and although it’s going into a database of 50 billion words, it returns a list of all the results, the sentences from Dell, Intel, Microsoft, everywhere, within a few seconds. It’s just incredible. And people use it as a pooling platform. I think that’s quite an achievement.
We’re not there yet with the metrics, but with the support we’re getting from, again, the buyer’s side, I think it’s looking very promising.
- What does TAUS mean to you personally?
(I can tell this is going to be a very personal question (laughs).)
I think it’s a crown on my career. It’s… I don’t know if I would like to see that published…
“A crown on my career,” you can publish that. I was going to say something like “a vocation,” but you know, that’s too heavy. But I do, because (maybe you have some kind of biography in this) I go back many years; I started in this industry in 1979. I’ve done a lot of interesting things, including running the biggest supplier companies in this industry, and I’ve always been fascinated by the technology. I started as a translator, and after just a month I thought (don’t want that to be published) it was very stupid work, because you don’t get an academic degree to translate a computer manual. So we had to automate this, and I feel that thirty years later we are getting close.
So “Helping the world communicate better” is our tagline, and I think if I look back on my career, let’s say, ten years from now, I can say, “I truly helped the world communicate better,” by creating this platform, by pushing this kind of innovation and automation almost against all odds. There have been times when people thought I was crazy, but it’s happened. It’s very rewarding.
Thank you very much for such a deep and personal answer.