

      Ignite Realtime Blog: Openfire 4.6.8 Release

      news.movim.eu / PlanetJabber · Tuesday, 23 May, 2023 - 16:41 · 2 minutes

    The Ignite Realtime Community is happy to announce the 4.6.8 release of Openfire!

    We have made available a new release of this older version to address the issue that is the subject of security advisory CVE-2023-32315.

    We are aware that for some, the process of deploying a new major version of Openfire is not a trivial matter, as it may encompass a lot more than only updating the executables. Depending on the regulations that are in place, this process can require a lot of effort and take a long time to complete. To help users who currently run an older version of Openfire, we are also making this new release available in the older 4.6 branch. An upgrade to this version will, for some, require a lot less effort. Note well: although we are making a new version available in the 4.6 branch of Openfire, we strongly recommend that you upgrade to the latest version of Openfire (currently in the 4.7 branch), as that includes important fixes and improvements that are not available in 4.6.

    Download artifacts are available here, with the following sha256sum values:

    aa1947097895a6d41bc8d1ac29f6ea60507bce69caadc497b4794a2a4110dc20  openfire-4.6.8-1.i686.rpm
    346871c71eff8e3b085fecd2f8dce5bfbf387885cfa7aff2076d42bd7273f70b  openfire-4.6.8-1.noarch.rpm
    37e4cc510cc2a59de50288c0e3baa53dcc702631433a01873a9270eeb7c789db  openfire-4.6.8-1.x86_64.rpm
    e92c5a0b76da5964b2e3fa43686ad63db29ef891ec7266ab16fe3a93b06c9e01  openfire_4.6.8_all.deb
    c6e0e40c55a81276881e93469ce88a862226ce33e58c8811e760427b878ebed4  openfire_4_6_8_bundledJRE.exe
    1b4c209453fffb6a6310354b425995bb92c1f09944eed35a1fd61a30201355bc  openfire_4_6_8_bundledJRE_x64.exe
    6b19394dc3f275ca039f85af59ca4f2fc5f628e2505cb39e59f5cfa55d605788  openfire_4_6_8.exe
    b22fce993bce4930346183d5edc1e9e38827a47ed8f64c41486a105f574cc116  openfire_4_6_8.tar.gz
    7c5769c7c8869ce2dfbb93fbbf1ec97a4e8509d61f8c14ba3f6be20abd05e90e  openfire_4_6_8_x64.exe
    72f27d063446479e1d4ceb2a46ac838f5462dfca53032cfa068eb96ef08d0697  openfire_4_6_8.zip
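
    To verify a download against this list, recompute its digest and compare; for example, with GNU coreutils (using the checksum for the .deb package from the list above):

    $ sha256sum openfire_4.6.8_all.deb
    e92c5a0b76da5964b2e3fa43686ad63db29ef891ec7266ab16fe3a93b06c9e01  openfire_4.6.8_all.deb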
    

    If you have any questions, please stop by our community forum or our live groupchat. We are always looking for volunteers interested in helping out with Openfire development!

    For other release announcements and news, follow us on Twitter and Mastodon.



      Erlang Solutions: Entendiendo procesos y concurrencia

      news.movim.eu / PlanetJabber · Thursday, 18 May, 2023 - 13:06 · 8 minutes

    Welcome to the second chapter of the series “Elixir, 7 steps to start your journey”.

    In the first chapter, we talked about the Erlang virtual machine, the BEAM, and the characteristics that Elixir takes advantage of to develop systems that are:

    • Concurrent
    • Fault-tolerant
    • Scalable and
    • Distributed

    In this note, we will explain what concurrency means for Elixir and Erlang and why it is important for building fault-tolerant systems. At the end, you will find a small Elixir code example so you can see the advantages of concurrency in action.

    Concurrency

    Concurrency is the ability to carry out two or more tasks apparently at the same time.

    To understand why the word apparently is highlighted, let’s look at the following case:

    A person has to complete two activities, task A and task B.

    • They start task A, make some progress, and pause it.
    • They start task B, make some progress, pause it, and continue with task A.
    • They make more progress on task A, pause it, and continue with task B.

    And so they keep going with each one until both activities are finished.

    It is not that task A and task B are carried out at exactly the same time; rather, the person spends some time on each and switches between them. These time slices can be so short that the switching is imperceptible to us, which creates the illusion that the activities are happening simultaneously.

    Parallelism

    So far I had not mentioned anything about parallelism, because it is not a fundamental concept in the BEAM or for Elixir. But I remember that when I was learning to program it was hard for me to grasp the difference between parallelism and concurrency, so I will take advantage of this note to share a brief explanation with you.

    Let’s continue with the previous example. If we now bring in another person to complete the tasks, and both work at the same time, we are talking about parallelism.

    So we could have two or more people working in parallel, each one carrying out their activities concurrently. That is, concurrency may or may not be parallel.

    In Elixir, concurrency is achieved thanks to Erlang processes, which are created and managed by the BEAM.

    Processes

    In Elixir, all code runs inside processes, and an application can have hundreds or thousands of them running concurrently.

    How does it work?

    When the BEAM runs on a machine, by default it creates one thread on each available processor. In that thread there is a queue dedicated to specific tasks, and each queue in turn has a scheduler responsible for assigning time and priority to the tasks.

    So, on a multicore machine with two processors you can have two threads and two schedulers, which lets you parallelize tasks as much as possible. You can also adjust the BEAM’s settings to tell it which processors to use.

    As for the tasks, each one runs in an isolated process.

    It seems simple, but this very idea is the magic behind the scalability, distribution, and fault tolerance of a system built with Elixir.

    Let’s look at this last concept to understand why.

    Fault tolerance

    A system’s fault tolerance refers to its ability to handle errors without dying in the attempt. The goal is that no failure, no matter how critical, disables or blocks the system. Once again, this is achieved thanks to Erlang processes.

    Processes are isolated elements that do not share memory and communicate through message passing.

    This means that if something fails in process A, process B is not affected; it may not even find out about it. The system keeps working normally while the failure is fixed behind the scenes. If we add that the BEAM also provides error detection and recovery mechanisms by default, we can guarantee that the system runs uninterruptedly.
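
    As a quick illustration of that message passing, here is a minimal sketch you can paste into iex; the process and message names are made up for this example:

    # A process that waits for one message and replies to the sender
    greeter =
      spawn(fn ->
        receive do
          {:hello, from} -> send(from, {:ok, "hello from #{inspect(self())}"})
        end
      end)

    # Processes share no memory; they communicate only via send/receive
    send(greeter, {:hello, self()})

    receive do
      {:ok, reply} -> IO.puts(reply)
    after
      1_000 -> IO.puts("no reply")
    end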

    If you want to explore more about how processes work, you can check out this note: Understanding Processes for Elixir Developers.

    What does this look like in Elixir?

    We finally got to the code!

    Let’s walk through an example of creating processes that run concurrently in Elixir, and contrast it with the same exercise running sequentially.

    Ready? Don’t worry if you don’t understand some of the syntax; in general the language is very intuitive, and the goal for now is for you to witness the magic of concurrency in action.

    The first step is to create the processes.

    Spawn

    There are different ways to create processes in Elixir. As you progress you will find more sophisticated ways to do it; here we will use the basic one: the spawn function. Let’s get to work!

    We have 10 records with user information that we are going to insert into a database, but first we want to validate that the name does not contain strange characters and that the email contains an @.

    Suppose that validating each user takes a total of 2 seconds.

    1. Open a text editor and copy the following code. Save it in a file called procesos.ex
    defmodule Procesos do
      # We will use regular expressions to validate the name and email formats
      @email_valido ~r/^([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$/
      @nombre_valido ~r/\b([A-ZÀ-ÿ][-,a-z. ']+[ ]*)+/

      # We have a list of users, each with a name and an email address.
      # Each validar_usuarios_* function calls validar_usuario, which checks
      # the email and name formats and prints an ok/error message per record.

      # This function works SEQUENTIALLY
      def validar_usuarios_secuencialmente() do
        IO.puts("Validando usuarios secuencialmente...")

        usuarios = crear_usuarios()

        Enum.each(usuarios, fn elem -> validar_usuario(elem) end)
      end

      # This function works CONCURRENTLY, using spawn
      def validar_usuarios_concurrentemente() do
        IO.puts("Validando usuarios concurrentemente...")

        usuarios = crear_usuarios()

        Enum.each(usuarios, fn elem ->
          spawn(fn -> validar_usuario(elem) end)
        end)
      end

      def validar_usuario(usuario) do
        usuario
        |> validar_email()
        |> validar_nombre()
        |> imprimir_estatus()

        # Pause for 2 seconds to simulate inserting the record into the database
        Process.sleep(2000)
      end

      # Receives a user, validates the email format, and adds the
      # :email_valido key with the result.
      def validar_email(usuario) do
        if Regex.match?(@email_valido, usuario.email) do
          Map.put(usuario, :email_valido, true)
        else
          Map.put(usuario, :email_valido, false)
        end
      end

      # Receives a user, validates the name, and adds the
      # :nombre_valido key with the result.
      def validar_nombre(usuario) do
        if Regex.match?(@nombre_valido, usuario.nombre) do
          Map.put(usuario, :nombre_valido, true)
        else
          Map.put(usuario, :nombre_valido, false)
        end
      end

      # Receives a user that has already gone through email and name
      # validation and prints the message corresponding to its status.
      def imprimir_estatus(%{
            id: id,
            nombre: nombre,
            email: email,
            email_valido: email_valido,
            nombre_valido: nombre_valido
          }) do
        cond do
          email_valido && nombre_valido ->
            IO.puts("Usuario #{id} | #{nombre} | #{email} ... es válido")

          email_valido && !nombre_valido ->
            IO.puts("Usuario #{id} | #{nombre} | #{email} ... tiene un nombre inválido")

          !email_valido && nombre_valido ->
            IO.puts("Usuario #{id} | #{nombre} | #{email} ... tiene un email inválido")

          !email_valido && !nombre_valido ->
            IO.puts("Usuario #{id} | #{nombre} | #{email} ... es inválido")
        end
      end

      defp crear_usuarios do
        [
          %{id: 1, nombre: "Melanie C.", email: "melaniec@test.com"},
          %{id: 2, nombre: "Victoria Beckham", email: "victoriab@testcom"},
          %{id: 3, nombre: "Geri Halliwell", email: "gerih@test.com"},
          %{id: 4, nombre: "123456788", email: "melb@test.com"},
          %{id: 5, nombre: "Emma Bunton", email: "emmab@test.com"},
          %{id: 6, nombre: "Nick Carter", email: "nickc@test.com"},
          %{id: 7, nombre: "Howie Dorough", email: "howie.dorough"},
          %{id: 8, nombre: "", email: "ajmclean@test.com"},
          %{id: 9, nombre: "341AN L1ttr377", email: "Brian-Littrell"},
          %{id: 10, nombre: "Kevin Richardson", email: "kevinr@test.com"}
        ]
      end
    end
    

    2. Open a terminal, type iex, and compile the file we just created.

    $ iex

    Erlang/OTP 25 [erts-13.1.3] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit:ns]

    Interactive Elixir (1.14.0) - press Ctrl+C to exit (type h() ENTER for help)
    iex(1)> c("procesos.ex")
    [Procesos]

    3. Once you have done this, call the function that validates the records sequentially. It will take a while, since each record takes 2 seconds.

    iex(2)> Procesos.validar_usuarios_secuencialmente

    4. Now call the function that validates the records concurrently and observe the difference in times.

    iex(3)> Procesos.validar_usuarios_concurrentemente

    The difference is quite noticeable, don’t you think? This is because in step 3, with sequential evaluation, each process has to wait for the previous one to finish. In contrast, concurrent execution creates processes that run in isolation, so none of them depends on the previous one or is blocked by any other task.

    Imagine the difference when it comes to thousands or millions of tasks in a system!

    Concurrency is the basis for the other characteristics we mentioned at the beginning: distribution, scalability, and fault tolerance. Thanks to the BEAM, it becomes relatively easy to implement it in Elixir and take advantage of the benefits it brings.

    Now you know more about processes and concurrency, and especially about their importance for building highly reliable and fault-tolerant systems. Remember to practice, and come back to this note whenever you need to.

    Next chapter…

    In the next note we will talk about the libraries, frameworks, and all the resources that exist around Elixir. You will be surprised how easy and fast it is to create a project from scratch and see it working.

    Documentation and Resources

    Technical adviser:

    Raúl Chouza

    Style correction:

    If you have questions about this note or would like to dig deeper into the topic, you can write to me on Twitter at @loreniuxmr.

    See you in the next chapter!

    The post Entendiendo procesos y concurrencia appeared first on Erlang Solutions.


      Erlang Solutions: Understanding Elixir processes and concurrency

      news.movim.eu / PlanetJabber · Thursday, 18 May, 2023 - 13:05 · 7 minutes

    Welcome to the second chapter of the “Elixir, 7 steps to start your journey” series.

    In the first chapter, we talked about the Erlang Virtual Machine, the BEAM, and the characteristics that Elixir takes advantage of to develop systems that are:

    • Concurrent
    • Fault-tolerant
    • Scalable and
    • Distributed

    In this note, I’ll explain what concurrency means to Elixir and Erlang and why it’s essential for building fault-tolerant systems. You’ll also find a little Elixir code example to see the advantages of concurrency in action.

    Concurrency

    Concurrency is the ability to perform two or more tasks apparently at the same time.

    To understand why the word apparently is highlighted, let’s look at the following case:

    A person has to complete two activities, task A and task B.

    • Starts task A, moves forward a bit, and pauses.
    • Starts task B, moves forward, pauses, and continues with task A.
    • Goes ahead with task A, pauses, and continues with task B.

    And so it progresses with each one, until finishing both activities.

    It is not that task A and task B are carried out at precisely the same time; instead, the person spends time on each one and switches between them. These time slices can be so short that the change is invisible to us, producing the illusion that the activities are happening simultaneously.

    Parallelism

    So far, I haven’t mentioned anything about parallelism because it’s not a fundamental concept in the BEAM or for Elixir. But I remember that when I was learning to program, it took me a while to understand the difference between parallelism and concurrency, so I will take advantage of this note to give a brief explanation.

    Let’s continue with the previous example. If we now bring in another person to complete the tasks and they both work at the same time, we now achieve parallelism.

    So, we could have two or more people working in parallel, each carrying out their activities concurrently. That is, the concurrency may or may not be parallel.

    In Elixir, concurrency is achieved thanks to Erlang processes, which are created and managed by the BEAM.

    Processes

    In Elixir, all code runs inside processes, and an application can have hundreds or thousands of them running concurrently.

    How does it work?

    When the BEAM runs on a machine, it creates a thread on each available processor by default. In this thread, there is a queue dedicated to specific tasks, and each queue has a scheduler responsible for assigning a time and a priority to the tasks.

    So, on a multicore machine with two processors, you can have two threads and two schedulers, allowing you to parallelize tasks as much as possible. You can also adjust BEAM’s settings to indicate which processors to use.
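
    You can check this on a running system; for example, in iex (the output below assumes an 8-core machine):

    # Number of scheduler threads the BEAM started (by default, one per core)
    iex> System.schedulers_online()
    8

    # To pin the count yourself, pass the +S flag to the runtime:
    # $ elixir --erl "+S 2" -e "IO.inspect(System.schedulers_online())"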

    As for the tasks, each one is executed in an isolated process.

    It seems simple, but precisely this idea is the magic behind the scalability, distribution, and fault tolerance of a system built with Elixir.

    Let’s go deep into this last concept to understand why.

    Fault-tolerance

    The fault tolerance of a system refers to its ability to handle errors. The goal is that no failure, no matter how critical, disables or blocks the system, and this is again achieved thanks to Erlang processes.

    The processes are isolated elements that do not share memory and communicate through message passing.

    This means that if something goes wrong in process A, process B is unaffected. It may not even know about it. The system will continue functioning normally while the fault is fixed behind the scenes. And if we add that the BEAM also provides default mechanisms for error detection and recovery, we can guarantee that the system works uninterruptedly.
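
    To see this isolation in practice, here is a minimal sketch you can run in iex; the deliberate crash is contrived for the example:

    # Process A crashes immediately...
    spawn(fn -> raise "something went wrong in process A" end)

    # ...an error report is logged, but the calling process is unaffected:
    IO.puts("process B is still alive")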


    If you want to explore more about how the processes work, you can check this note: Understanding Processes for Elixir Developers .

    What does this look like in Elixir?

    Finally, the code!

    Let’s review an example of creating processes that run concurrently in Elixir. We are going to contrast it with the same exercise running sequentially.

    Ready? Don’t worry if you don’t understand some of the syntax; overall, the language is very intuitive, but the goal is to witness the magic of concurrency in action.

    The first step is to create the processes.

    Spawn

    There are different ways to create processes in Elixir. As you progress, you will find more sophisticated ways to do it; here, we will use the basic one: the spawn function. Let’s do it!

    We have ten records with user information that we will insert into a database, but first, we want to validate that the name does not contain random characters and that the email has an @.

    Suppose each user validation takes a total of 2 seconds.

    1. In your favorite text editor, copy the following code. Save it in a file called processes.ex
    defmodule Processes do
      # We are going to use regular expressions to validate the name and
      # email formats
      @valid_email ~r/^([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$/
      @valid_name ~r/\b([A-ZÀ-ÿ][-,a-z. ']+[ ]*)+/

      # There is a list of users, each with a name and an email address.
      # Each validate_users_* function calls validate_user, which checks the
      # email and name formats and prints an ok/error message for each record.

      # This function works SEQUENTIALLY
      def validate_users_sequentially() do
        IO.puts("Validating users sequentially...")

        users = create_users()

        Enum.each(users, fn elem -> validate_user(elem) end)
      end

      # This function works CONCURRENTLY, with spawn
      def validate_users_concurrently() do
        IO.puts("Validating users concurrently...")

        users = create_users()

        Enum.each(users, fn elem ->
          spawn(fn -> validate_user(elem) end)
        end)
      end

      def validate_user(user) do
        user
        |> validate_email()
        |> validate_name()
        |> print_status()

        # Pause for 2 seconds to simulate inserting the record into the database
        Process.sleep(2000)
      end

      # Receives a user, validates the format of the email, and adds the
      # :valid_email key with the result.
      def validate_email(user) do
        if Regex.match?(@valid_email, user.email) do
          Map.put(user, :valid_email, true)
        else
          Map.put(user, :valid_email, false)
        end
      end

      # Receives a user, validates the format of the name, and adds the
      # :valid_name key with the result.
      def validate_name(user) do
        if Regex.match?(@valid_name, user.name) do
          Map.put(user, :valid_name, true)
        else
          Map.put(user, :valid_name, false)
        end
      end

      # Receives a user that has already gone through email and name
      # validation and prints the message corresponding to its status.
      def print_status(%{
            id: id,
            name: name,
            email: email,
            valid_email: valid_email,
            valid_name: valid_name
          }) do
        cond do
          valid_email && valid_name ->
            IO.puts("User #{id} | #{name} | #{email} ... is valid")

          valid_email && !valid_name ->
            IO.puts("User #{id} | #{name} | #{email} ... has an invalid name")

          !valid_email && valid_name ->
            IO.puts("User #{id} | #{name} | #{email} ... has an invalid email")

          !valid_email && !valid_name ->
            IO.puts("User #{id} | #{name} | #{email} ... is invalid")
        end
      end

      defp create_users do
        [
          %{id: 1, name: "Melanie C.", email: "melaniec@test.com"},
          %{id: 2, name: "Victoria Beckham", email: "victoriab@testcom"},
          %{id: 3, name: "Geri Halliwell", email: "gerih@test.com"},
          %{id: 4, name: "123456788", email: "melb@test.com"},
          %{id: 5, name: "Emma Bunton", email: "emmab@test.com"},
          %{id: 6, name: "Nick Carter", email: "nickc@test.com"},
          %{id: 7, name: "Howie Dorough", email: "howie.dorough"},
          %{id: 8, name: "", email: "ajmclean@test.com"},
          %{id: 9, name: "341AN L1ttr377", email: "Brian-Littrell"},
          %{id: 10, name: "Kevin Richardson", email: "kevinr@test.com"}
        ]
      end
    end
    

    2. Open a terminal, type iex and compile the file we just created.

    $ iex
    
    
    Erlang/OTP 25 [erts-13.1.3] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit:ns]
    
    
    Interactive Elixir (1.14.0) - press Ctrl+C to exit (type h() ENTER for help)
    iex(1)> c("processes.ex")
    [Processes]

    3. Once you’ve done this, call the function that validates the records sequentially. Remember that it will take a little time since each record takes 2 seconds.

    iex(2)>  Processes.validate_users_sequentially

    4. Now call the function that validates the records concurrently and observe the difference in times.

    iex(3)>  Processes.validate_users_concurrently

    It’s pretty noticeable, don’t you think? This is because in step 3, with sequential evaluation, each process has to wait for the previous one to finish. Instead, concurrent execution creates processes that run in isolation, so none of them depends on the previous one or is blocked by any other task.
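
    If you want numbers rather than an eyeball comparison, you can time both calls with Erlang’s :timer.tc, which returns the elapsed time in microseconds (a quick sketch; exact figures will vary):

    # Sequential: ten users at 2 seconds each, roughly 20 seconds in total
    {micros, _} = :timer.tc(fn -> Processes.validate_users_sequentially() end)
    IO.puts("sequential: #{div(micros, 1000)} ms")

    # Concurrent: spawn/1 returns immediately, so the call itself is nearly
    # instant; all ten validations finish about 2 seconds later, in parallel
    {micros, _} = :timer.tc(fn -> Processes.validate_users_concurrently() end)
    IO.puts("concurrent: #{div(micros, 1000)} ms")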

    Imagine the difference with thousands or millions of tasks in a system!

    Concurrency is the foundation for the other features we mentioned: distribution, scalability, and fault tolerance. Thanks to the BEAM, implementing it in Elixir and taking advantage of it becomes relatively easy.

    Now, you know more about processes and concurrency, especially about the importance of this aspect in building highly reliable and fault-tolerant systems. Remember to practice and come back to this note when you need to.

    Next Chapter

    In the next note, we will talk about the libraries, frameworks, and all the resources that exist around Elixir. You will be surprised how easy and fast it is to create a project from scratch and see it working.

    Documentation and Resources

    Technical Adviser

    Style correction

    If you have questions about this story or want to go deeper, you can reach me on Twitter @loreniuxmr.

    See you in the next chapter!

    The post Understanding Elixir processes and concurrency appeared first on Erlang Solutions.

      This post is public: www.erlang-solutions.com/blog/understanding-elixir-processes-and-concurrency/


      Isode: Messaging Products Update – 19.0 Capabilities

      news.movim.eu / PlanetJabber · Tuesday, 16 May, 2023 - 16:24 · 5 minutes

    Below is a list of the new capabilities brought to our Messaging products in the 19.0 release. 19.0 adds a lot of extra functionality across the board for our messaging products, along with a complete rewrite of the codebase so that future releases and bug fixes can be developed more quickly. For the full release notes, please check the individual product updates, available from the customer portal and evaluation sections of our website.

    Dependencies

    Cobalt (version 1.3 or later) is needed to manage various capabilities in M-Switch 19.0.

    M-Switch, M-Store and M-Box depend on M-Vault 19.0. All of these products are part of R19.0 with common libraries, and so are commonly installed together.

    Product Activation

    All of the messaging products now use the new product activation. Product activation is managed with the Messaging Activation Server (MAS), which provides a Web interface to facilitate managing activation of messaging and other Isode products. MAS is provided as a tool, but installed as an independent component.

    M-Switch

    Product Activation

    There are a number of M-Switch features arising from the new product activation:

    • Various product options are encoded in the activation, restricting functionality to the M-Switch options purchased. The options available and any activation time limits are displayed by MConsole.
    • MConsole will correctly display the product name of the M-Switch being used (e.g., M-Switch MIXER, M-Switch Gateway, etc.).
    • MConsole views are restricted so that only those relevant to the activated options are shown (e.g., ACP 127 views will not be shown unless ACP 127 is activated).

    Use of Cobalt

    A number of functions have been moved from MConsole to Cobalt, which provides a general Web administrator interface. MConsole is now more focused on M-Switch server configuration and operation. Capabilities provided by Cobalt in support of M-Switch:

    • User and Role provisioning (replacing Internet Mail View)
    • Special function mailboxes
    • Redirections
    • Standard SMTP distribution lists
    • Military Distribution Lists
    • Profiler Configuration
    • File Transfer by Email (FTBE) account provisioning

    Directory and Authentication

    A number of enhancements have been made to improve the security of authentication. New configurations will require this improved security, and upgraded deployments are expected to switch to it.

    • Configuration of the default M-Vault configuration directory is simplified.
    • An option is provided to use a different M-Vault directory for users/operators, defaulting to the configuration directory.
    • M-Switch access to configuration and user directories will always authenticate using SASL SCRAM-SHA-1. This is particularly important for deployments not using TLS, as it ensures plain passwords are not sent over a link, while still using hashed passwords in M-Vault.
    • M-Vault directories created by MConsole will always have TLS enabled (where the product activation option allows).
    • Connections from M-Switch to M-Vault will use TLS by default.
    • Three modes can be configured for SMTP and SOM (MConsole) access to M-Switch:
      • SCRAM-SHA-1. This is the default and is a secure option suitable for most configurations.
      • PLAIN. This option is needed if authentication is done using pass-through to Active Directory. It should only be used on systems with TLS.
      • ANY. When this option is used, SOM/MConsole will use SCRAM-SHA-1. It is needed for SMTP setups that want to offer additional SASL mechanisms such as CRAM-MD5, which requires plain passwords to be stored in M-Vault.

    ACP 127

    An extensive set of enhancements has been provided for ACP 127.

    • Circuit control is extended from Enabled/Disabled to Enabled (Rx/Tx) / Rx Only / Disabled.
    • Enhanced OPSIG support for BRIPES, following the agreed document:
      • QRT/QRV. Supports remote enable/disable, including control from the top level of the circuit management UI.
      • ZES2 automatic handling on receive.
      • Service message option to send INT ZBZ.
      • Configurable option for a reliable circuit to send ZBZ5 to acknowledge receipt of an identified message.
      • The limiting-priority UI uses two-letter codes, but will still recognize single letters.
      • CHANNEL CHECK generation and response added.
    • Option to use “Y” for emergency messages.
    • Support for Community Variables (CV), a BRASS mechanism for using multiple crypto keys:
      • Configuration of the CVs available for each destination.
      • Display of CVs for queued messages.
      • CV audit logging.
    • Scheduled broadcasts to support MUs with constrained availability (e.g., submarines):
      • Periodic mode with GUI configuration.
      • UI showing which messages will be transmitted in which period, based on estimated transmission times.
      • Scheduled periods at the same time each day.
      • Explicitly scheduled fixed intervals on a specific day.
    • Extension to Routing Tree configuration to specify a specific channel. This makes it easier to utilize ACP 127 RI routing, which is needed in many ACP 127 configurations.
    • Improved mapping of CAD/AIG to SMTP.
    • Option to turn off message reassembly.
    • Improvements to monitoring of circuits using serial links.

    FAB (Frequency Assignment Broadcast)

    A subsystem is provided to support FAB, which is needed for older BRASS systems that do not support ALE. The M-Switch FAB architecture is described in https://www.isode.com/whitepapers/brass.html. The key points are listed below:

    • A new FAB Server component is provided to run black side and generate the FAB data stream(s).
    • Red/Black separation can be provided by M-Guard
    • The FAB Server can monitor a remote modem for link quality using a new SNR monitoring protocol provided by Icon-5066 3.0.
    • Circuits to support FAB use a new “anonymous” type, reflecting that they are not associated with a specific peer.
    • Support is provided for ARQ (STANAG 5066 COSS) circuits which operate automatically shore side and for direct to modem circuits which require a shore side operator.
    • There is an operator UI for each circuit that enables setting FAB status and controlling acceptance of messages.

    Profiler and Corrector

    1. Support of TLS for the Corrector UI and Manual Profiler.
    2. Improved message display, including the Security Label.
    3. Profiler configuration read from the directory, which enables Cobalt configuration of Profiler rules.

    Icon-Topo Support

    Isode’s Icon-Topo product automatically updates M-Switch configuration in support of MU Mobility. M-Switch enhancements made in support of this:

    • Show clearly in MConsole when External MTAs, Routing Tree Entries and Nexus are created by Icon-Topo.
    • Enhance Nexus and Diversion UI to better display Icon-Topo created information.

    Miscellaneous

    • Configure Warning Time based on Message Priority.
    • Tool to facilitate log and archive clear out

    M-Store

    No new features for R19.0.

    M-Box

    Improved Searching

    Message searching is extended with three new capabilities that are exposed in Harrier.

    • Choice to search based on SIC (Subject Indicator Code) which can be used on its own or in conjunction with options to search other parts of the message.
    • Option to filter search based on a choice of one or more message precedences, matching against the action or info precedence as appropriate for the logged in user.
    • Option to filter search based on selected security label.

      This post is public: www.isode.com/company/wordpress/messaging-products-update-19-0-capabilities/


      Isode: Directory Products Update – 19.0 Capabilities

      news.movim.eu / PlanetJabber · Tuesday, 16 May, 2023 - 16:23 · 1 minute

    Below is a list of the new capabilities brought to our Directory products in the 19.0 release. 19.0 adds a lot of extra functionality across the board for our directory products, along with a complete rewrite of the codebase so that future releases and bug fixes can be developed more quickly. For the full release notes, please check the individual product updates, available from the customer portal and evaluation sections of our website.

    Dependencies

    Use of several new 19.0 features depends on Cobalt 1.3 or later.

    M-Vault

    Product Activation

    M-Vault uses the new product activation. Product activation is managed with the Messaging Activation Server (MAS), which provides a Web interface to facilitate managing activation of messaging and other Isode products. MAS is provided as a tool, but installed as an independent component.

    Headless Setup

    M-Vault, in conjunction with Cobalt, provides a mechanism to set up a server remotely with a Web interface only. This complements setup on the server using the M-Vault Console GUI.

    Password Storage

    Password storage format defaults to SCRAM-SHA-1 (hashed). This hash format is preferred, as it enables use of SASL SCRAM-SHA-1 authentication, which avoids sending plain passwords. Storage of passwords in plain text (the previous default) is still allowed but discouraged.

    LDAP/AD Passthrough

    An LDAP Passthrough mechanism is added so that M-Vault users can be authenticated over LDAP against an entry in another directory. The key target for this mechanism is where there is a need to manage information in M-Vault, but to authenticate users with passwords against users provisioned in Microsoft Active Directory. This is particularly important for Isode applications such as M-Switch, M-Link, and Harrier, which utilize directory information not generally held in Active Directory.

    Cobalt provides capabilities to manage accounts utilizing LDAP Passthrough.

    OAuth Enhancements

    A number of enhancements have been made to OAuth, which was introduced in R18.1:

    • The OAUTH service has been integrated into the core M-Vault server, which simplifies configuration and improves security.
    • Operation without a Client Secret, validating the OAUTH Client using TLS Client Authentication. This improves security and resilience.
    • Client authentication using Windows SSO is allowed, so that Windows SSO can work for OAUTH Clients. This enables SSO to be used for Isode’s applications using OAuth.

    Sodium Sync

    • Some enhancements to Sodium Sync to improve operation on Windows Server.
    • Option that will improve performance for any remote server with a large round-trip time.
      This post is public: www.isode.com/company/wordpress/directory-products-update-19-0-capabilities/


      Erlang Solutions: MongooseIM 6.1: Handle more traffic, consume less resources

      news.movim.eu / PlanetJabber · Wednesday, 10 May, 2023 - 13:00 · 9 minutes

    MongooseIM is a highly customisable instant messaging backend that can handle millions of messages per minute, exchanged between millions of users from thousands of dynamically configurable XMPP domains. With the new 6.1.0 release it becomes even more cost-efficient, flexible and robust, thanks to the new arm64 Docker containers and the C2S process rework.

    Arm64 Docker containers

    Modern applications are often deployed in Docker containers. This solution simplifies deployment to cloud-based environments, such as Amazon Web Services (AWS) and Google Cloud. We believe this is a great choice for MongooseIM, and we also support Kubernetes by providing Helm Charts. Docker images are independent of the host operating system, but they need to be built for specific processor architectures. Amd64 (x86-64) CPUs have dominated the market for a long time, but recently arm64 (AArch64) has been taking over. Notable examples include the Apple Silicon and AWS Graviton processors. We made the decision to start publishing ARM-compatible Docker images with our latest 6.1.0 release.
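
    Since the published image is multi-arch, pulling it on an arm64 or amd64 host should transparently select the right variant; for example, assuming the mongooseim/mongooseim repository on Docker Hub and the 6.1.0 tag:

    $ docker pull mongooseim/mongooseim:6.1.0
    $ docker run -d -h mongooseim-1 --name mongooseim-1 -p 5222:5222 mongooseim/mongooseim:6.1.0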

    To ensure top performance, we have been load-testing MongooseIM for many years using our own tools, such as amoc and amoc-arsenal-xmpp .

    When we tested the latest Docker image on both amd64 and arm64 AWS EC2 instances, the results turned out to be much better than before – especially for arm64. The tested MongooseIM cluster consisted of two nodes, which is less than the recommended production size of three nodes. But the goal was to determine the maximum capability of a simple installation. Various compute-optimized instances were tested – including the 5th, 6th and 7th generations, all in the xlarge size. PostgreSQL (db.m6g.xlarge) was used for persistent storage, and three Amoc nodes (m6g.xlarge) were used for load generation. The three best-performing instance types were c6id (Intel Xeon Scalable, amd64), c6gd (AWS Graviton2, arm64) and c7g (AWS Graviton3, arm64).

    The two most important test scenarios were:

    • One-to-one messaging, where each user chats with their contacts.
    • Multi-user chat, where each user sends messages to chat rooms with 5 participants each.

    Several extensions were enabled to resemble a real-life use case. The two most important ones perform database write operations for each message; disabling them would improve performance.

    The results are summarized in the table below:

    Node instance type (size: xlarge)                                    c6id          c6gd          c7g
    One-to-one messages per minute per node                              240k          240k          300k
    Multi-user chat messages per minute per node (sent / received)       120k / 600k   120k / 600k   150k / 750k
    On-demand AWS instance pricing per node per hour (USD)               0.2016        0.1536        0.1445
    Instance cost per billion delivered one-to-one chat messages (USD)   14.00         10.67         8.03
    Instance cost per billion delivered multi-user chat messages (USD)   5.60          4.27          3.21

    For each instance, the table shows the highest possible message rates achievable without performance degradation. The load was scaled up for the c7g instances thanks to their better performance, making it possible to handle 600k one-to-one messages per minute in the whole cluster, which is 300k messages per minute per node. Should you need more, you can scale horizontally or vertically, and further tests showed an almost linear increase in performance – of course there are limits (especially on cluster size), but they are high. Maximum message rates for MUC Light were different, because each message was routed to five recipients, making it possible to send up to 300k messages per minute, but deliver 1.5 million.

    These results made it possible to calculate the cost of MongooseIM instances per 1 billion delivered messages, shown in the table above. It might be difficult to reach these numbers in production environments because of the necessary margin for handling bursts of traffic, but under heavy load you can get close. Notably, the database cost was actually higher than the cost of the MongooseIM instances themselves.

    C2S Process Rework

    We have completely reimplemented the handling of C2S (client-to-server) connections. Although the changes are mostly internal, you can benefit from them, even if you are not interested in the implementation details.

    The first change is about accepting incoming connections – instead of custom listener processes, the Ranch 2.1 library is now used. This introduces some new options, e.g. max_connections and reuse_port.
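
    Ranch is an Erlang library, but it is easy to drive from Elixir. As a rough illustration of the max_connections cap, here is a minimal, self-contained Ranch 2 listener sketch; the module name and port are invented, and this is a generic example, not MongooseIM’s actual listener code:

    defmodule EchoProtocol do
      @behaviour :ranch_protocol

      # Ranch 2 calls start_link/3 for every accepted connection
      def start_link(ref, transport, opts) do
        pid = :proc_lib.spawn_link(__MODULE__, :init, [ref, transport, opts])
        {:ok, pid}
      end

      def init(ref, transport, _opts) do
        # Complete the connection setup started by the acceptor
        {:ok, socket} = :ranch.handshake(ref)
        loop(socket, transport)
      end

      defp loop(socket, transport) do
        case transport.recv(socket, 0, 5_000) do
          {:ok, data} ->
            transport.send(socket, data)
            loop(socket, transport)

          _ ->
            transport.close(socket)
        end
      end
    end

    # Accept at most 1000 concurrent connections on port 5555
    :ranch.start_listener(
      :echo,
      :ranch_tcp,
      %{max_connections: 1000, socket_opts: [port: 5555]},
      EchoProtocol,
      []
    )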

    Prior to version 6.1.0, each open C2S connection was handled by two Erlang processes – the receiver process was responsible for XML parsing, while the C2S process would handle the decoded XML elements. They are now integrated into one, which means that the footprint of each session is smaller, and there is less internal messaging.

    C2S State Machine: Separation of Concerns

    The core XMPP operations are defined in RFC 6120, and we have reimplemented them from scratch in the new mongoose_c2s module. The most important benefit of this change from the user perspective is the vastly improved separation of concerns, making feature development much easier. A simplified version of the C2S state machine diagram is presented below. Error handling is omitted for simplicity. The “wait for session” state is optional, and you can disable it with the backwards_compatible_session configuration option.

    A similar diagram for version 6.0 would be much more complicated, because the former implementation had parts of multiple extensions scattered around its code:

    Functionality                            Described in                                Moved out to
    Stream resumption                        XEP-0198 Stream Management                  mod_stream_management
    AMP event triggers                       XEP-0079 Advanced Message Processing        mod_amp
    Stanza buffering for CSI                 XEP-0352 Client State Indication            mod_csi
    Roster subscription handling             RFC 6121 Instant Messaging and Presence     mod_roster
    Presence tracking                        RFC 6121 Instant Messaging and Presence     mod_presence
    Broadcasting PEP messages                XEP-0163 Personal Eventing Protocol         mod_pubsub
    Handling and using privacy lists         XEP-0016 Privacy Lists                      mod_privacy
    Handling and using blocking commands     XEP-0191 Blocking Command                   mod_blocking

    It is important to note that mod_presence is the only new module in the list. Others have existed before, but parts of their code were in the C2S module. By disabling unnecessary extensions, you can gain performance. For example, by omitting [mod_presence] from your configuration file you can skip all the server-side presence handling. Our load tests have shown that this could significantly reduce the total time needed to establish a connection. Moreover, disabling extensions is now 100% reliable and guarantees that no unwanted code would be executed.
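
    In configuration terms, disabling an extension means removing its section from mongooseim.toml; a sketch, assuming the standard module-section syntax (check the documentation for your version):

    # Modules are enabled by the presence of their section:
    [modules.mod_roster]

    # Deleting or commenting out the section below disables all
    # server-side presence handling:
    # [modules.mod_presence]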

    Easier extension development

    If you are interested in developing your custom extensions, it is now easier than ever, because mongoose_c2s uses the new C2S-related hooks and handlers and several new features of the gen_statem behaviour. C2S Hooks can be divided into the following categories, depending on the events that trigger them:

    Trigger                                      Hooks
    User session opening                         user_open_session
    User sends an XML element                    user_send_packet, user_send_xmlel, user_send_message, user_send_presence, user_send_iq
    User receives an XML element                 user_receive_packet, user_receive_xmlel, user_receive_message, user_receive_presence, user_receive_iq, xmpp_presend_element
    User session closing                         user_stop_request, user_socket_closed, user_socket_error, reroute_unacked_messages
    mongoose_c2s:call/3 or mongoose_c2s:cast/3   foreign_event

    Most of the hooks are triggered by XMPP traffic. The only exception is foreign_event, which can be triggered by modules on demand, making it possible to execute code in the context of a specific user’s C2S process.

    Modules add handlers to selected hooks. Such a handler performs module-specific actions and returns an accumulator, which can contain special options, allowing the module to:

    • Store module-specific data using state_mod, or replace the whole C2S state data with c2s_data.
    • Transition to a new state with c2s_state.
    • Perform arbitrary gen_statem transition actions with actions.
    • Stop the state machine gracefully (stop) or forcefully (hard_stop).
    • Deliver XML elements to the user with hooks triggered (route, flush) or without (socket_send).

    Example

    Let’s take a look at the handlers of the new mod_presence module. For the user_send_presence and user_receive_presence hooks, it updates the module-specific state (state_mod) storing the presence state. The handler for foreign_event is more complicated, because it handles the following events:

    • Event: {mod_presence, get_presence | get_subscribed}
      Handler logic: get the user’s presence information / subscribed users
      Trigger: mongoose_c2s:call(Pid, mod_presence, get_presence | get_subscribed)
    • Event: {mod_presence, {set_presence, Presence}}
      Handler logic: set the user’s presence information
      Trigger: mongoose_c2s:cast(Pid, mod_presence, {set_presence, Presence})
    • Event: {mod_roster, RosterItem}
      Handler logic: update the roster subscription state
      Trigger: mongoose_c2s:cast(Pid, mod_roster, RosterItem)

    The example shows how the coupling between extension modules remains loose and modules don’t call each other’s code directly.

    The benefits of gen_statem

    The following new gen_statem features are used in mongoose_c2s:

    Arbitrary term state – with the handle_event_function callback mode it is possible to use tuples as state names. An example is {wait_for_sasl_response, cyrsasl:sasl_state(), retries()}, which encodes the state of the SASL authentication process and the number of authentication retries left in the state tuple. Apart from the states shown in the diagram above, modules can introduce their own external states – they have the format {external, StateName}. An example is mod_stream_management, which causes a transition to the {external, resume} state when a session is closed.

    Multiple callback modules – to handle an external state, the callback module has to be changed; e.g. mod_stream_management uses the {push_callback_module, ?MODULE} transition action to provide its own handle_event function for the {external, resume} state.

    State timeouts – for all states before wait_for_session, the session terminates after the configurable c2s_state_timeout. The timeout tuple itself is {state_timeout, Timeout, state_timeout_termination}.

    Named timeouts – modules use these to trigger specific actions; e.g. mod_ping uses several timeouts to schedule ping requests and to wait for responses. The timeout tuple has the format {{timeout, ping | ping_timeout | send_ping}, Interval, fun ping_c2s_handler/2}. This feature is also used for traffic shaping, to pause the state machine if the traffic volume exceeds the limit.

    Self-generated events – this feature is used very often, for example when incoming XML data is parsed, an event {next_event, internal, XmlElement} is generated for each parsed XML element. The route and flush options of the c2s accumulator generate internal events as well.
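
    As a compact illustration of several of these features (arbitrary term states, a state timeout, a named timeout, and a self-generated internal event), here is a minimal sketch of a :gen_statem callback module written in Elixir; the module name, states, messages and intervals are invented for the example:

    defmodule StateMachineSketch do
      @behaviour :gen_statem

      def start_link(), do: :gen_statem.start_link(__MODULE__, [], [])

      # handle_event_function allows arbitrary terms (e.g. tuples) as states
      @impl :gen_statem
      def callback_mode(), do: :handle_event_function

      @impl :gen_statem
      def init([]) do
        actions = [
          # State timeout: stop if nothing happens within 5 seconds
          {:state_timeout, 5_000, :idle_too_long},
          # Named timeout: fire a :send_ping event in 30 seconds
          {{:timeout, :ping}, 30_000, :send_ping}
        ]

        {:ok, {:waiting, _retries = 3}, %{}, actions}
      end

      @impl :gen_statem
      def handle_event(:state_timeout, :idle_too_long, _state, _data) do
        {:stop, :normal}
      end

      def handle_event({:timeout, :ping}, :send_ping, _state, data) do
        # The named timeout fired: do the work, then re-arm it
        {:keep_state, data, [{{:timeout, :ping}, 30_000, :send_ping}]}
      end

      def handle_event(:info, {:tcp, _socket, bytes}, _state, data) do
        # Self-generated event: re-inject the received payload as an
        # internal event, handled by the clause below
        {:keep_state, data, [{:next_event, :internal, bytes}]}
      end

      def handle_event(:internal, _element, _state, data) do
        {:keep_state, data}
      end
    end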

    Summary

    MongooseIM 6.1.0 is full of improvements on many levels – both on the outside, like the arm64 Docker images, and deep inside, like the separation of concerns in mongoose_c2s. What is common to all of them is that we have load-tested them extensively, making sure that our new messaging server delivers what it promises and the performance is better than ever. There are no unpleasant surprises hidden underneath. After all, it is open source, and you are welcome to download, deploy, use and extend it free of charge. However, should you have a special use case, high performance requirements, or a need to reduce costs, don’t hesitate to contact us, and we will help you deploy, load test and maintain your messaging solution.

    The post MongooseIM 6.1: Handle more traffic, consume less resources appeared first on Erlang Solutions.

      This post is public: www.erlang-solutions.com/blog/mongooseim-6-1-handle-more-traffic-consume-less-resources/


      Kaidan: Kaidan 0.9: End-to-End Encryption & XMPP Providers

      news.movim.eu / PlanetJabber · Friday, 5 May, 2023 - 10:00 · 2 minutes


    It’s finally here: Kaidan with end-to-end encryption via OMEMO 2, Automatic Trust Management and support for XMPP Providers! Most of the work has been funded by NLnet via NGI Zero PET and NGI Assure with public money provided by the European Commission. We would also like to thank Radically Open Security (especially Christian Reitter) for a quick security evaluation during the NGI Zero project.

    Even though Kaidan is making good progress, please keep in mind that it is not yet a stable app. Do not expect it to work well on all supported systems. Moreover, we currently do not consider Kaidan’s security to be as good as the security of the dominating chat apps.

    There is a new overview of the features Kaidan supports. Have a look at that, or at the changelog, for more details.

    Encryption

    All messages sent by Kaidan can now be encrypted. If a contact supports the same encryption, Kaidan enables it by default, so you do not have to enable it yourself, and you will never need to worry about enabling it for new contacts. It is still possible to disable it per contact at any time.

    Additionally, all metadata that is encryptable, such as typing notifications, is encrypted too. The new Automatic Trust Management (ATM) makes trust management easier than before. The details are explained in a previous post .

    We worked hard on covering as many corner cases as possible. Encrypted sessions are initialized in the background to reduce the loading time. Kaidan even tries to repair sessions broken by other chat apps. But if you discover any strange behavior, please let us know!

    We decided to focus on future technologies, so Kaidan does not support OMEMO versions older than 0.8.1. Unfortunately, many other clients do not support the latest version yet; they only encrypt the body (text content) of a message, which is not compatible with newer OMEMO versions and ATM. We hope that other client developers will follow our lead soon.

    Screenshots: Kaidan in widescreen and in the default view

    XMPP Providers

    Kaidan introduced easy registration in version 0.5, and since then it has used its own list of XMPP providers. The new XMPP Providers project arose from that approach, and is intended to be used by various applications and services.

    Kaidan is now one of them: it uses XMPP Providers for its registration process instead of maintaining its own list of providers. Try it out and see how easy it can be to get an XMPP account with Kaidan!

    Changelog

    This release adds the following features:

    • End-to-end encryption with OMEMO 2 for messages, files and metadata including an easy trust management
    • XMPP Providers support for an easy onboarding
    • Message reactions for sending emojis upon a message
    • Read markers showing which messages a contact has read
    • Message drafts to send entered messages later after switching chats or restarting Kaidan
    • Message search for messages that are not yet loaded
    • New look of the chat background and message bubbles including grouped messages from the same author
    • Chat pinning for reordering chats
    • Public group chat search (without group chat support yet)
    • New contact and account details including the ability to change the own profile picture
    • Restored window position on start

    Download

    Or install Kaidan from your distribution:


      This post is public: kaidan.im/2023/05/05/kaidan-0.9.0/


      JMP: Newsletter: Jabber ID Discovery, New Referral Codes

      news.movim.eu / PlanetJabber · Monday, 1 May, 2023 - 16:59 · 4 minutes

    Hi everyone!

    Welcome to the latest edition of your pseudo-monthly JMP update!

    In case it’s been a while since you checked out JMP, here’s a refresher: JMP lets you send and receive text and picture messages (and calls) through a real phone number right from your computer, tablet, phone, or anything else that has a Jabber client.  Among other things, JMP has these features: Your phone number on every device; Multiple phone numbers, one app; Free as in Freedom; Share one number with multiple people.

    It has been a while since we got a newsletter out, and lots has been happening as we race towards our launch.

    For those who have experienced the issue with Google Voice participants not showing up properly in our MMS group texting stack: we have a new stack in testing right now. Let support know if you want to try it out; it has been working well so far for those already using it.

    If you check your account settings for the “refer a friend” option, you will now see two kinds of referral code. The list of one-time-use codes remains the same as always: a free month for your friend, and a free month’s worth of credit for you if they start paying. The new code at the top is multi-use, and you can post and share it as much as you like. It provides credit equivalent to an additional month to anyone who uses it on sign-up after their initial $15 deposit as normal, and then a free month’s worth of credit for you after that payment fully clears.

    We mentioned before that much of the team will be present at FOSSY, and we can now reveal why: there will be a conference track dedicated to XMPP, which we are helping to facilitate! The call for proposals ends May 14th. Sign up and come out this summer!

    For quite some time now, customers have been asked while registering if they would like to let others who know their phone number discover their Jabber ID, to enable upgrading to end-to-end encryption, video calls, etc. The first version of this feature is now live, and users of at least Cheogram Android and Movim can check the contact details of anyone they exchange SMS with to see if a Jabber ID is listed. We are happy to announce that we have also partnered with Quicksy to allow discovery of anyone registered with their app or directory as well.

    Tapbacks

    Jabber-side reactions are now translated, where possible, into the tapback pseudo-syntax recognized by many Android and iMessage users, so that your reactions appear in a native way to those users. In Cheogram Android you can swipe to reply to a message and enter a single emoji as the reply to send a reaction/tapback.

    Cheogram Android

    There have been two Cheogram Android releases since our last newsletter, with a third coming out today. You no longer need to add a contact to send a message or initiate a call. The app has seen the addition of moderation features for channel administrators, as well as respect for these moderation actions on display. For offensive media arriving from other sources, in avatars, or just not moderated quickly enough, users also have the ability to permanently block any media they see from their device.

    Cheogram Android has also gained some new sticker-related features, including default sticker packs and the ability to import any sticker pack made for Signal (browse signalstickers.com to find more sticker packs, and just tap “add to signal” to add them to Cheogram Android).

    There are also brand-new features today in 2.12.1-5, including a new onboarding flow that allows new users to register and pay for JMP before getting a Jabber ID, and then set up their very own Snikket instance, all from within the app. This flow also features some new introductory material about the Jabber network, which we will continue to refine over time:

    Screenshots: welcome to Cheogram Android; how the Jabber network works; welcome screen

    Notifications about new messages now use the conversation style on Android. This means that you can set separate priorities and sounds per conversation at the OS level on new enough versions of Android. There is also an option in each conversation’s menu to add that conversation to your home screen, something that has always been possible with the app, but hopefully this makes it more discoverable for some.

    For communities organizing in Jabber channels, sometimes it can be useful to notify everyone present about a message.  Cheogram Android now respects the attention element from members and higher in any channel or group chat.  To send a message with this priority attached, start the message body with @here (this will not be included in the actual message people see).

    WebXDC Logo

    This release also brings an experimental prototype supporting WebXDC, a specification that lets developers ship mini-apps that work inside your chats.  Take any *.xdc file and send it to a contact or group chat where everyone uses Cheogram Android, and you can play games, share notes, shopping lists, calendars, and more.  Please come by the channel to discuss the future of this technology on the Jabber network with us.

    To learn what’s happening with JMP between newsletters, here are some ways you can find out:

    Thanks for reading and have a wonderful rest of your week!

    This post is public: blog.jmp.chat/b/april-newsletter-2023

    • chevron_right

      Erlang Solutions: Re-implement our first blog scrapper with Crawly 0.15.0

      news.movim.eu / PlanetJabber · Tuesday, 25 April, 2023 - 16:02 · 14 minutes

    It has been almost four years since my first article about scraping with Elixir and Crawly was published. Since then, many things have changed, the most significant being the redesign of Erlang Solutions’ blog. As a result, the 2019 tutorial is no longer functional.

    This situation provided an excellent opportunity to update the original work and re-implement the Crawler using the new version of Crawly. By doing so, the tutorial will showcase several new features added to Crawly over the years and, more importantly, provide a functional version to the community. Hopefully, this updated tutorial will be beneficial to all.

    First of all, why is it broken now?

    This situation is reasonably expected! When a website gets a new design, usually everything is redone. The new layout results in new HTML, which makes all the old CSS/XPath selectors obsolete, to say nothing of new URL schemes. As a result, the XPath/CSS selectors that worked before the redesign refer to nothing afterwards, so we have to start from the very beginning. What a shame!
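    To make this concrete, here is a minimal Floki sketch of the failure mode (the old .post-title class is hypothetical, invented for illustration): the selector that used to extract the title silently matches nothing in the redesigned markup.

    # Hypothetical markup before and after a redesign:
    old_html = ~s(<h1 class="post-title">Web scraping with Elixir</h1>)
    new_html = ~s(<h1 class="page-title-sm">Web scraping with Elixir</h1>)

    {:ok, old_doc} = Floki.parse_document(old_html)
    {:ok, new_doc} = Floki.parse_document(new_html)

    # The old selector worked before the redesign...
    Floki.find(old_doc, ".post-title") |> Floki.text()
    # => "Web scraping with Elixir"

    # ...and finds nothing afterwards: no crash, just empty results.
    Floki.find(new_doc, ".post-title") |> Floki.text()
    # => ""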

    But of course, the web is made for more than just crawling. The web is made for people, not robots, so let’s adapt our robots!

    Our experience from a large-scale scraping platform is that a successful business usually runs at least one complete redesign every two years. Minor updates happen even more often, but remember that even minor updates can harm your web scrapers.

    Getting started

    Usually, I recommend starting by following the Quickstart guide from Crawly’s documentation pages . However, this time I have something else in mind. I want to show you the Crawly standalone version.

    Make it simple. In some cases, the data you need can be extracted from a relatively simple source. In these situations, it can be quite beneficial to avoid bootstrapping all the Elixir machinery (a new project, config, libs, dependencies). The idea is to deliver data that other applications can consume without any setup.

    Of course, the approach has some limitations and, at this stage, only works for simple projects. Some may get inspired by this article and improve it, so that future readers will be amazed by new possibilities. In any case, let’s get straight to it now!

    Bootstrapping 2.0

    As promised, here is the simplified version of the setup (compare it with the previous setup described here ):

    1. Create a directory for your project: mkdir erlang_solutions_blog
    2. Create a subdirectory that will contain the code of your spiders: mkdir erlang_solutions_blog/spiders
    3. Now, knowing that we want to extract the following fields: title, author, publishing_date, url, article_body, let’s define the following configuration for your project (erlang_solutions_blog/crawly.config):
    
    [{crawly, [
       {closespider_itemcount, 100},
       {closespider_timeout, 5},
       {concurrent_requests_per_domain, 15},
    
       {middlewares, [
               'Elixir.Crawly.Middlewares.DomainFilter',
               'Elixir.Crawly.Middlewares.UniqueRequest',
               'Elixir.Crawly.Middlewares.RobotsTxt',
               {'Elixir.Crawly.Middlewares.UserAgent', [
                   {user_agents, [
                       <<"Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0">>,
                       <<"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36">>
                       ]
                   }]
               }
           ]
       },
    
       {pipelines, [
               {'Elixir.Crawly.Pipelines.Validate', [{fields, [title, author, publishing_date, url, article_body]}]},
               {'Elixir.Crawly.Pipelines.DuplicatesFilter', [{item_id, title}]},
               {'Elixir.Crawly.Pipelines.JSONEncoder'},
               {'Elixir.Crawly.Pipelines.WriteToFile', [{folder, <<"/tmp">>}, {extension, <<"jl">>}]}
           ]
       }]
    }].
    
    

    You have probably noticed that this looks like an Erlang configuration file, which is exactly what it is. I would say that it’s not the perfect solution, and one possible improvement is simplifying it so the project can be configured in a plainer way. If you have ideas, write to me on GitHub’s discussions https://github.com/elixir-crawly/crawly/discussions .
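    For comparison, and as a hint at what a plainer configuration could look like: in a regular Mix-based project the same settings would normally live in config/config.exs in Elixir syntax, roughly like the sketch below (the same values translated, not a drop-in replacement for the standalone setup):

    import Config

    config :crawly,
      closespider_itemcount: 100,
      closespider_timeout: 5,
      concurrent_requests_per_domain: 15,
      middlewares: [
        Crawly.Middlewares.DomainFilter,
        Crawly.Middlewares.UniqueRequest,
        Crawly.Middlewares.RobotsTxt,
        {Crawly.Middlewares.UserAgent,
         user_agents: ["Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0"]}
      ],
      pipelines: [
        {Crawly.Pipelines.Validate, fields: [:title, :author, :publishing_date, :url, :article_body]},
        {Crawly.Pipelines.DuplicatesFilter, item_id: :title},
        Crawly.Pipelines.JSONEncoder,
        {Crawly.Pipelines.WriteToFile, folder: "/tmp", extension: "jl"}
      ]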

    4. The basic configuration is now done, and we can run the Crawly application to check that everything starts properly:

    docker run --name crawly \
      -d -p 4001:4001 \
      -v $(pwd)/spiders:/app/spiders \
      -v $(pwd)/crawly.config:/app/config/crawly.config \
      oltarasenko/crawly:0.15.0

    Notes:

    • 4001 is the default HTTP port used for spider management, so we forward it to the host.
    • The spiders directory is where the application expects spider files to be stored; we will add one there shortly.
    • Finally, the (admittedly ugly) configuration file is also mounted inside the Crawly container.

    Now you can see the Crawly Management user interface at localhost:4001.

    Crawly Management Tool

    Working on a new spider

    Now, let’s define the spider itself. Let’s start with the following boilerplate code (put it into erlang_solutions_blog/spiders/esl.ex ):

    defmodule ESLSpider do
     use Crawly.Spider
    
     @impl Crawly.Spider
     def init() do
       [start_urls: ["https://www.erlang-solutions.com/"]]
     end
    
     @impl Crawly.Spider
     def base_url(), do: "https://www.erlang-solutions.com"
    
     @impl Crawly.Spider
     def parse_item(response) do
       %{items: [], requests: []}
     end
    end

    This code defines an “ESLSpider” module that uses the “Crawly.Spider” behavior.

    The behavior requires three callbacks to be implemented: init(), base_url(), and parse_item(response).

    The “init()” function returns a list containing a single key-value pair. The key is “start_urls” and the value is a list containing a single URL string: “https://www.erlang-solutions.com/”. This means the spider will start crawling from this URL.

    The “base_url()” function returns a string representing the base URL for the spider; it is used to filter out requests that go outside the erlang-solutions.com website.
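    Conceptually, the filtering amounts to a substring check against base_url(). The sketch below is a simplified illustration of the idea, not Crawly’s actual middleware code:

    # Simplified illustration of domain filtering (not Crawly's real implementation):
    base_url = "https://www.erlang-solutions.com"

    allowed? = fn request_url -> String.contains?(request_url, base_url) end

    allowed?.("https://www.erlang-solutions.com/blog/web-scraping-with-elixir/")
    # => true  (stays in the crawl)
    allowed?.("https://twitter.com/ErlangSolutions")
    # => false (dropped by Crawly.Middlewares.DomainFilter)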

    The `parse_item(response)` function takes a response object as an argument and returns a map containing two keys: `items` and `requests`.
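    For illustration, a populated return value would look like the sketch below (the field values are made up; Crawly.Utils.request_from_url/1 wraps a URL into a request that the engine can schedule):

    %{
      items: [
        # Extracted data; the Validate pipeline checks it later:
        %{
          title: "Some post",
          author: "Someone",
          publishing_date: "1 Jan 2023",
          url: "https://www.erlang-solutions.com/blog/some-post/",
          article_body: "..."
        }
      ],
      requests: [
        # Follow-up pages for Crawly to schedule next:
        Crawly.Utils.request_from_url("https://www.erlang-solutions.com/blog/")
      ]
    }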

    Once the code is saved, we can run it via the Web interface (you will need to restart the Docker container or click the Reload spiders button in the Web interface).

    New Crawly Management UI

    Once the job is started, you can review the Scheduled Requests, Logs, or Extracted Items.

    Parsing the page

    Now we need to find CSS selectors to extract the required data. The same approach is described here https://www.erlang-solutions.com/blog/web-scraping-with-elixir/ under the “Extracting the data” section. I think one of the best ways to find relevant CSS selectors is simply to use Google Chrome’s Inspect option:

    So let’s connect to the Crawly shell and fetch the page using the fetcher, extracting this title:

    docker exec -it crawly /app/bin/crawly remote

    1> response = Crawly.fetch("https://www.erlang-solutions.com/blog/web-scraping-with-elixir/")
    2> document = Floki.parse_document!(response.body)
    3> title_tag = Floki.find(document, ".page-title-sm")
    [{"h1", [{"class", "page-title-sm mb-sm"}], ["Web scraping with Elixir"]}]
    4> title = Floki.text(title_tag)
    "Web scraping with Elixir"
    
    

    We are going to extract all the items this way. In the end, we come up with the following map of selectors representing the expected item:

    item =
     %{
       url: response.request_url,
       title: Floki.find(document, ".page-title-sm") |> Floki.text(),
       article_body: Floki.find(document, ".default-content") |> Floki.text(),
       author: Floki.find(document, ".post-info__author") |> Floki.text(),
       publishing_date: Floki.find(document, ".header-inner .post-info .post-info__item span") |> Floki.text()
      }
    
    requests = Enum.map(
     Floki.find(document, ".link-to-all") |> Floki.attribute("href"),
     fn url -> Crawly.Utils.request_from_url(url) end
    )
    
    

    At the end of it, we came up with the following code representing the spider:

    defmodule ESLSpider do
     use Crawly.Spider
    
     @impl Crawly.Spider
     def init() do
       [
         start_urls: [
           "https://www.erlang-solutions.com/blog/web-scraping-with-elixir/",
           "https://www.erlang-solutions.com/blog/which-companies-are-using-elixir-and-why-mytopdogstatus/"
         ]
       ]
     end
    
     @impl Crawly.Spider
     def base_url(), do: "https://www.erlang-solutions.com"
    
     @impl Crawly.Spider
     def parse_item(response) do
       {:ok, document} = Floki.parse_document(response.body)
    
       requests = Enum.map(
         Floki.find(document, ".link-to-all") |> Floki.attribute("href"),
         fn url -> Crawly.Utils.request_from_url(url) end
         )
    
       item = %{
         url: response.request_url,
         title: Floki.find(document, ".page-title-sm") |> Floki.text(),
         article_body: Floki.find(document, ".default-content") |> Floki.text(),
         author: Floki.find(document, ".post-info__author") |> Floki.text(),
         publishing_date: Floki.find(document, ".header-inner .post-info .post-info__item span") |> Floki.text()
       }
       %{items: [item], requests: requests}
     end
    end
    
    
    

    That’s all, folks! Thanks for reading!

    Well, not really. Let’s schedule this version of the spider again and see the results:

    Scraping results

    As you can see, the spider could only extract 34 items. This is quite interesting, as it’s pretty clear that Erlang Solutions’ blog contains far more. So why did we get only this amount? Can anything be done to improve it?

    Debugging your spider

    Some brilliant developers write everything just once, and everything works. Others, like me, have to spend time debugging the code.

    In my case, I start by exploring the logs. There is something there I don’t like:

    08:23:37.417 [info] Dropping item: %{article_body: “Scalable and Reliable Real-time MQTT Messaging Engine for IoT in the 5G Era.We work with proven, world leading technologies that provide a highly scalable, highly available distributed message broker for all major IoT protocols, as well as M2M and mobile applications.Available virtually everywhere with real-time system monitoring and management ability, it can handle tens of millions of concurrent clients.Today, more than 5,000 enterprise users are trusting EMQ X to connect more than 50 million devices.As well as being trusted experts in EMQ x, we also have 20 years of experience building reliable, fault-tolerant, real-time distributed systems. Our experts are able to guide you through any stage of the project to ensure your system can scale with confidence. Whether you’re hunting for a suspected bug, or doing due diligence to future proof your system, we’re here to help. Our world-leading team will deep dive into your system providing an in-depth report of recommendations. This gives you full visibility on the vulnerabilities of your system and how to improve it. Connected devices play an increasingly vital role in major infrastructure and the daily lives of the end user. To provide our clients with peace of mind, our support agreements ensure an expert is on hand to minimise the length and damage in the event of a disruption. Catching a disruption before it occurs is always cheaper and less time consuming. WombatOAM is specifically designed for the monitoring and maintenance of BEAM-based systems (including EMQ x). This provides you with powerful visibility and custom alerts to stop issues before they occur. As well as being trusted experts in EMQ x, we also have 20 years of experience building reliable, fault-tolerant, real-time distributed systems. Our experts are able to guide you through any stage of the project to ensure your system can scale with confidence. Whether you’re hunting for a suspected bug, or doing due diligence to future proof your system, we’re here to help. Our world-leading team will deep dive into your system providing an in-depth report of recommendations. This gives you full visibility on the vulnerabilities of your system and how to improve it. Connected devices play an increasingly vital role in major infrastructure and the daily lives of the end user. To provide our clients with peace of mind, our support agreements ensure an expert is on hand to minimise the length and damage in the event of a disruption. Catching a disruption before it occurs is always cheaper and less time consuming. WombatOAM is specifically designed for the monitoring and maintenance of BEAM-based systems (including EMQ x). This provides you with powerful visibility and custom alerts to stop issues before they occur. Because it’s written in Erlang!With its Erlang/OTP design, EMQ X fuses some of the best qualities of Erlang. A single node broker can sustain one million concurrent connections…but a single EMQ X cluster – which contains multiple nodes – can support tens of millions of concurrent connections. Inside this cluster, routing and broker nodes are deployed independently to increase the routing efficiency. Control channels and data channels are also separated – significantly improving the performance of message forwarding. EMQ X works on a soft real-time basis.
    No matter how many simultaneous requests are going through the system, the latency is guaranteed.Here’s how EMQ X can help with your IoT messaging needs?Erlang Solutions exists to build transformative solutions for the world’s most ambitious companies, by providing user-focused consultancy, high tech capabilities and diverse communities. Let’s talk about how we can help you.”, author: “”, publishing_date: “”, title: “”, url: “https://www.erlang-solutions.com/capabilities/emqx/”}. Reason: missing required fields

    The line above indicates that the spider dropped an item which is not an article but a general page. We want to exclude these URLs from our bot’s route.

    Try to avoid creating unnecessary load on a website when crawling.

    The following lines can achieve this:

    requests =
     Floki.find(document, ".link-to-all") |> Floki.attribute("href")
     |> Enum.filter(fn url -> String.contains?(url, "/blog/") end)
     |> Enum.map(&Crawly.Utils.request_from_url/1)
    

    Now, we can re-run the spider and see that we’re not hitting non-blog pages anymore (don’t forget to reload the spider’s code)!

    This optimised our crawler, but it still wasn’t enough to extract more items. (Among other things, it’s interesting to note that we can only get 35 articles from the “Keep reading” links, which indicates some possible directions for improving the cross-linking inside the blog itself.)

    Improving the extraction coverage

    When looking at how to extract more items, we should try to find a better source of links. One good way to do this is to explore the blog’s homepage, potentially with JavaScript turned off. Here is what I can see:

    Sometimes you need to switch JavaScript off to see more.

    As you can see, there are 14 pages (only 12 of which are working), and every page contains nine articles, so we expect ~100–108 articles in total.

    So let’s try to use this pagination as a source of new links! I have updated the init() function so it refers to the blog’s index, and also parse_item so it can use the information found there:

    @impl Crawly.Spider
     def init() do
       [
         start_urls: [
           "https://www.erlang-solutions.com/blog/page/2/?pg=2",
           "https://www.erlang-solutions.com/blog/web-scraping-with-elixir/",
           "https://www.erlang-solutions.com/blog/which-companies-are-using-elixir-and-why-mytopdogstatus/"
         ]
       ]
     end
    
    @impl Crawly.Spider
    def parse_item(response) do
     {:ok, document} = Floki.parse_document(response.body)
    
     case String.contains?(response.request_url, "/blog/page/") do
       false -> parse_article_page(document, response.request_url)
       true -> parse_index_page(document, response.request_url)
     end
    end
    
    defp parse_index_page(document, _url) do
     index_pages =
       document
       |> Floki.find(".page a")
       |> Floki.attribute("href")
       |> Enum.map(&Crawly.Utils.request_from_url/1)
    
     blog_posts =
       Floki.find(document, ".grid-card__content a.btn-link")
       |> Floki.attribute("href")
       |> Enum.filter(fn url -> String.contains?(url, "/blog/") end)
       |> Enum.map(&Crawly.Utils.request_from_url/1)
    
       %{items: [], requests: index_pages ++ blog_posts }
    end
    
    defp parse_article_page(document, url) do
     requests =
       Floki.find(document, ".link-to-all")
       |> Floki.attribute("href")
       |> Enum.filter(fn url -> String.contains?(url, "/blog/") end)
       |> Enum.map(&Crawly.Utils.request_from_url/1)
    
     item = %{
       url: url,
       title: Floki.find(document, ".page-title-sm") |> Floki.text(),
       article_body: Floki.find(document, ".default-content") |> Floki.text(),
       author: Floki.find(document, ".post-info__author") |> Floki.text(),
       publishing_date: Floki.find(document, ".header-inner .post-info .post-info__item span") |> Floki.text()
     }
     %{items: [item], requests: requests}
    end

    Running it again

    Now, finally, after adding all fixes, let’s reload the code and re-run the spider:

    So as you can see, we have extracted 114 items, which is quite close to what we expected!

    Conclusion

    Honestly speaking, running an open-source project is a complex thing. We have spent almost four years building Crawly and have progressed quite a bit with its capabilities, adding some bugs along the way as well.

    The example above shows how to get something running with Elixir/Floki, along with the slightly more complex process of debugging and fixing that sometimes comes up in practice.

    We want to thank Erlang Solutions for supporting the development and allocating help when needed!

    The post Re-implement our first blog scrapper with Crawly 0.15.0 appeared first on Erlang Solutions .

    This post is public: www.erlang-solutions.com/blog/re-implement-our-first-blog-scrapper-with-crawly-0-15-0-2/