UDP "Invalid argument" on Docker Swarm on EC2

When sending UDP packets on EC2 with Docker, I sometimes get this strange error (not every message sent raises an exception), which never happens on our in-house cluster running OpenNebula. I have allowed all inbound and outbound traffic on every port on all my EC2 instances. Here is the exception:

    2017-01-19 10:01:53,170 - ERROR: Exception caught for address: 10.99.0.153
    Traceback (most recent call last):
      File "./server.py", line 56, in <module>
        sock.sendto(bytes('{}'.format(i), "utf-8"), (address, PORT))
    OSError: [Errno 22] Invalid argument

I run 5 c4.xlarge instances with Ubuntu Server 16.04 and Docker 1.12.6. They are all in the same Docker swarm.

I create a service and a subnet using the overlay driver. This service has a mount point used to collect the logs of every peer. I run 150 peers, each with a memory limit of 300 MB.

My Dockerfile:

    FROM debian:jessie

    RUN echo 'deb http://mirror.switch.ch/ftp/mirror/debian/ jessie-backports main' >> /etc/apt/sources.list && \
        apt-get -yqq update && \
        apt-get -yqq dist-upgrade && \
        apt-get -yqq install --no-install-recommends dnsutils wget curl ntp python3 && \
        apt-get -yqq clean

    CMD ["/opt/epto/container-start-script.sh"]

I use the following shell script as the CMD:

    #!/usr/bin/env bash
    MY_IP_ADDR=$(/bin/hostname -i)
    MY_IP_ADDR=($MY_IP_ADDR)
    ./server.py ${MY_IP_ADDR[0]}

And this is the actual Python script being run:

    #!/usr/bin/env python3
    import socketserver
    import sys
    import logging
    import threading
    import urllib.request
    import time
    import socket
    from random import randint

    PORT = 15342


    class MyUDPHandler(socketserver.BaseRequestHandler):
        """
        This class works similar to the TCP handler class, except that
        self.request consists of a pair of data and client socket, and since
        there is no connection the client address must be given explicitly
        when sending data back via sendto().
        """

        def handle(self):
            data = self.request[0].strip().decode("utf-8")
            logging.info("Message received from {} during loop {}".format(self.client_address[0], data))


    class ThreadedUDPServer(socketserver.ThreadingMixIn, socketserver.UDPServer):
        pass


    if __name__ == "__main__":
        HOST = sys.argv[1]
        logging.basicConfig(format='%(asctime)s - %(levelname)s: %(message)s', level=logging.INFO,
                            filename='/data/{}.test'.format(HOST))

        server = ThreadedUDPServer((HOST, PORT), MyUDPHandler)
        server.allow_reuse_address = True
        logging.info("Create server listening on {}:{}".format(HOST, PORT))
        logging.info("Server allow_reuse_address: {}".format(server.allow_reuse_address))
        server_thread = threading.Thread(target=server.serve_forever)
        server_thread.daemon = True
        server_thread.start()

        sleep_delay = randint(10, 180)
        logging.info("Sleeping for {}s".format(sleep_delay))
        time.sleep(sleep_delay)
        logging.info("Finished sleeping")

        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

        content = urllib.request.urlopen('http://epto-tracker:4321/REST/v1/admin/get_view').read()
        content = content.decode("utf-8")
        addresses = content.split('|')
        logging.info("View size: {}".format(len(addresses)))

        i = 0
        while True:
            logging.info("Loop {}".format(i))
            for address in addresses:
                try:
                    logging.info("Sending to {}".format(address))
                    sock.sendto(bytes('{}'.format(i), "utf-8"), (address, PORT))
                except:
                    logging.exception("Exception caught for address: {}".format(address))
            time.sleep(5)
            i += 1
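Since the error only shows up for some sends, it may help to record which destinations fail and with which errno, rather than only logging the traceback. The following is a minimal sketch under that assumption; the helper `send_and_record` and the `failures` counter are hypothetical names that do not exist in the script above.

```python
import errno
from collections import Counter

# Hypothetical bookkeeping, not part of the original script:
# maps (address, errno name) to the number of failed sends.
failures = Counter()


def send_and_record(sock, payload, address, port):
    """Send one datagram; return None on success, or the errno on failure."""
    try:
        sock.sendto(payload, (address, port))
        return None
    except OSError as exc:
        # Translate the numeric errno (e.g. 22) to its symbolic name (e.g. EINVAL).
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        failures[(address, name)] += 1
        return exc.errno
```

In the send loop, `send_and_record(sock, bytes('{}'.format(i), "utf-8"), address, PORT)` could stand in for the bare `sock.sendto(...)` call, and logging `failures` once per loop would show whether the failures cluster on particular peers or hosts.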

I create a second service on the same overlay network. It contains the tracker, which the nodes contact to get the network view:

Dockerfile:

    FROM python:3.5.2-alpine
    RUN pip install pydevd
    COPY tracker.py /code/
    WORKDIR /code
    EXPOSE 4321
    CMD [ "python", "./tracker.py" ]

The code file:

    # import pydevd
    import random
    import logging
    import time
    from http.server import HTTPServer, BaseHTTPRequestHandler

    available_peers = {}
    K = 25

    logging.basicConfig(format='%(levelname)s: %(message)s', level=logging.INFO)


    def florida_string(ip):
        available_peers[ip] = int(time.time())
        to_choose = list(available_peers.keys())
        logging.info("View size: {:d}".format(len(to_choose)))
        to_choose.remove(ip)
        if len(to_choose) > K:
            to_send = random.sample(to_choose, K)
        else:
            to_send = to_choose
        return '|'.join(to_choose).encode()


    class FloridaHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == '/REST/v1/admin/get_view':
                self.send_response(200)
                self.send_header("Content-type", "text/plain")
                self.end_headers()
                self.wfile.write(florida_string(self.client_address[0]))
            elif self.path == '/terminate':
                if self.client_address[0] in available_peers:
                    del available_peers[self.client_address[0]]
                    logging.info("Removed {:s}".format(self.client_address[0]))
                    logging.info("View size: {:d}".format(len(available_peers)))
                else:
                    logging.error("IP already removed or was never here")
                self.send_response(200)
                self.send_header("Content-type", "text/plain")
                self.end_headers()
                self.wfile.write(b"Success")
            else:
                self.send_response(404)
                self.send_header("Content-type", "text/plain")
                self.end_headers()
                self.wfile.write(b"Nothing here, content is at /REST/v1/admin/get_view\n")


    class FloridaServer:
        def __init__(self):
            self.server = HTTPServer(('', 4321), FloridaHandler)
            self.server.serve_forever()


    FloridaServer()
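One detail worth noting in the tracker's view function above: it computes a sample of at most `K` peers in `to_send`, but the value it returns is built from the full `to_choose` list, so each peer receives every other known address. Whether that is intended is not stated here. A minimal sketch of a bounded variant, assuming the goal was to cap the view at `K` entries (the helper name `sample_view` is hypothetical):

```python
import random

K = 25  # same view-size bound as in tracker.py


def sample_view(peers, requester, k=K):
    """Return at most k peer addresses, never including the requester."""
    candidates = [ip for ip in peers if ip != requester]
    if len(candidates) > k:
        # random.sample picks k distinct elements without replacement.
        return random.sample(candidates, k)
    return candidates
```

With 150 peers this would reduce the number of destinations per loop from 149 to 25 per peer, which changes the UDP load on the overlay network considerably and may matter when comparing the EC2 and OpenNebula runs.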

Has anyone run into this same error on EC2?