Background

When you generate TFRecord files with MapReduce on YARN or Spark on YARN, Hadoop and TensorFlow can depend on different Protobuf versions, and the two versions conflict at run time.

Solutions

Option 1

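Do not submit a fat jar at run time. Instead, pass the Protobuf version you need with the -libjars option. Note that -libjars and -D are Hadoop generic options, so they are only parsed when the driver class runs through ToolRunner or GenericOptionsParser.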

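# Make the jars under LIB_PATH visible to the client-side JVM
export HADOOP_CLASSPATH=${LIB_PATH}/*

# Generic options (-D, -libjars) go after the main class and before any application arguments
hadoop jar your_tfrecord.jar \
your_class \
-Dmapreduce.job.user.classpath.first=true \
-libjars ${LIB_PATH}/protobuf-java-3.3.1.jar,${LIB_PATH}/tensorflow-hadoop-1.0.jar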
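
One way to check which Protobuf jar actually wins on the classpath is to ask the JVM where a class was loaded from. The following is a minimal sketch that uses only the Java standard library; the class name `ClassOrigin` and the method `locate` are hypothetical names introduced for illustration, and `com.google.protobuf.Message` is assumed to be a class that exists in the Protobuf jar you ship.

```java
import java.security.CodeSource;

public class ClassOrigin {
    /**
     * Returns the URL of the jar or directory a class was loaded from,
     * or a short description when the origin cannot be determined.
     */
    public static String locate(String className) {
        try {
            // initialize=false: look the class up without running its static initializers
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(bootstrap or unknown source)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not found on classpath)";
        }
    }

    public static void main(String[] args) {
        // Default class name is an assumption; pass another name as the first argument
        String name = args.length > 0 ? args[0] : "com.google.protobuf.Message";
        System.out.println(name + " -> " + locate(name));
    }
}
```

Calling `ClassOrigin.locate(...)` from inside the job (for example, logging the result once when a task starts) shows whether the task JVM resolved the class from the jar passed with -libjars or from the cluster's own Hadoop libraries.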

Option 2

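Build a fat jar and relocate (rename) the packages of the bundled classes, so that they cannot collide with the older jars installed on the cluster. The shade plugin also rewrites the references in your compiled classes, so the packaged program calls the renamed classes. Add the configuration below to pom.xml; it moves every class whose package starts with com.google to third.com.google.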

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.1.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <filters>
              <filter>
                <artifact>*:*</artifact>
                <excludes>
                  <exclude>META-INF/*.SF</exclude>
                  <exclude>META-INF/*.DSA</exclude>
                  <exclude>META-INF/*.RSA</exclude>
                  <exclude>META-INF/LICENSE</exclude>
                </excludes>
              </filter>
            </filters>
            <transformers>
              <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                <resource>META-INF/spring.handlers</resource>
              </transformer>
              <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                <resource>META-INF/spring.schemas</resource>
              </transformer>
            </transformers>
            <!-- When protobuf, guava, etc. conflict, relocate classes under com.google to third.com.google -->
            <relocations>
              <relocation>
                <pattern>com.google</pattern>
                <shadedPattern>third.com.google</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
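
After packaging, it can help to confirm that the relocation really took effect before submitting the job. The sketch below, which uses only the Java standard library, counts the .class entries in a jar under a given path prefix; `ShadedJarCheck` and `countClasses` are hypothetical names introduced for illustration, and the jar path is whatever your build produces.

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class ShadedJarCheck {
    /** Counts the .class entries in the jar whose entry name starts with the given prefix. */
    public static long countClasses(String jarPath, String prefix) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .filter(e -> e.getName().startsWith(prefix) && e.getName().endsWith(".class"))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java ShadedJarCheck <path-to-shaded-jar>");
            return;
        }
        // With the relocation above, classes are expected under third/com/google/
        // and none should remain under com/google/
        System.out.println("third/com/google/ classes: " + countClasses(args[0], "third/com/google/"));
        System.out.println("com/google/ classes:       " + countClasses(args[0], "com/google/"));
    }
}
```

If the second count is not zero, some com.google classes were bundled without being relocated, and the conflict with the cluster's jars may still occur.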

References