

      Isode: Directory Products Update – 19.0 Capabilities

      news.movim.eu / PlanetJabber • 16 May, 2023 • 1 minute

    Below is a list of the new capabilities brought to our Directory products for the 19.0 release. 19.0 adds a lot of extra functionality across the board for our messaging products, along with a complete rewrite of the codebase so that future releases and bug fixes can be developed more quickly. For the full release notes, please check the individual product updates, available from the customer portal and evaluation sections of our website.

    Dependencies

    Use of several new 19.0 features depends on Cobalt 1.3 or later.

    M-Vault

    Product Activation

    M-Vault uses the new product activation. Product activation is managed with the Messaging Activation Server (MAS), which provides a Web interface to facilitate managing activation of messaging and other Isode products. MAS is provided as a tool but installed as an independent component.

    Headless Setup

    M-Vault, in conjunction with Cobalt, provides a mechanism to set up a server remotely with a Web interface only. This complements setup on the server using the M-Vault Console GUI.

    Password Storage

    Password storage format defaults to SCRAM-SHA-1 (hashed). This hash format is preferred as it enables use of SASL SCRAM-SHA-1 authentication, which avoids sending plain passwords. Storage of passwords in plain text (the previous default) is still allowed but discouraged.
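
    For illustration only, this is what SCRAM-style hashed storage means in general terms, following the generic RFC 5802 derivation. This is a plain Erlang/OTP sketch, not M-Vault's implementation; the salt size and iteration count here are arbitrary choices:

    %% Generic RFC 5802 verifier derivation using Erlang/OTP crypto (OTP 24.2+).
    %% The directory stores Salt, IterationCount, StoredKey and ServerKey,
    %% never the plain password.
    scram_sha1_verifier(Password) ->
        Salt = crypto:strong_rand_bytes(16),
        IterationCount = 4096,
        SaltedPassword = crypto:pbkdf2_hmac(sha, Password, Salt, IterationCount, 20),
        ClientKey = crypto:mac(hmac, sha, SaltedPassword, <<"Client Key">>),
        StoredKey = crypto:hash(sha, ClientKey),
        ServerKey = crypto:mac(hmac, sha, SaltedPassword, <<"Server Key">>),
        #{salt => Salt, iteration_count => IterationCount,
          stored_key => StoredKey, server_key => ServerKey}.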

    LDAP/AD Passthrough

    An LDAP Passthrough mechanism has been added so that M-Vault users can be authenticated over LDAP against an entry in another directory. The key target for this mechanism is where there is a need to manage information in M-Vault, but to authenticate users by password against accounts provisioned in Microsoft Active Directory. This is particularly important for Isode applications such as M-Switch, M-Link, and Harrier, which utilize directory information not generally held in Active Directory.

    Cobalt provides capabilities to manage accounts utilizing LDAP Passthrough.

    OAuth Enhancements

    A number of enhancements have been made to OAuth, which was introduced in R18.1:

    • The OAuth service has been integrated into the core M-Vault server, which simplifies configuration and improves security.
    • Operation without a Client Secret, validating the OAuth client using TLS Client Authentication. This improves security and resilience.
    • Client authentication using Windows SSO is now allowed, so that Windows SSO can work for OAuth clients. This enables SSO to be used by Isode's applications that use OAuth.

    Sodium Sync

    • Some enhancements to Sodium Sync improve operation on Windows Server.
    • A new option improves performance for any remote server with a large round-trip time.

      www.isode.com/company/wordpress/directory-products-update-19-0-capabilities/


      Erlang Solutions: MongooseIM 6.1: Handle more traffic, consume less resources

      news.movim.eu / PlanetJabber • 10 May, 2023 • 9 minutes

    MongooseIM is a highly customisable instant messaging backend that can handle millions of messages per minute, exchanged between millions of users from thousands of dynamically configurable XMPP domains. With the new release 6.1.0 it becomes even more cost-efficient, flexible and robust, thanks to the new arm64 Docker containers and the C2S process rework.

    Arm64 Docker containers

    Modern applications are often deployed in Docker containers. This solution simplifies deployment to cloud-based environments, such as Amazon Web Services (AWS) and Google Cloud. We believe this is a great choice for MongooseIM, and we also support Kubernetes by providing Helm Charts. Docker images are independent of the host operating system, but they need to be built for specific processor architectures. Amd64 (x86-64) CPUs have dominated the market for a long time, but recently arm64 (AArch64) has been taking over. Notable examples include the Apple Silicon and AWS Graviton processors. We made the decision to start publishing ARM-compatible Docker images with our latest 6.1.0 release.

    To ensure top performance, we have been load-testing MongooseIM for many years using our own tools, such as amoc and amoc-arsenal-xmpp .

    When we tested the latest Docker image on both amd64 and arm64 AWS EC2 instances, the results turned out to be much better than before – especially for arm64. The tested MongooseIM cluster consisted of two nodes, which is less than the recommended production size of three nodes. But the goal was to determine the maximum capability of a simple installation. Various compute-optimized instances were tested – including the 5th, 6th and 7th generations, all in the xlarge size. PostgreSQL ( db.m6g.xlarge) was used for persistent storage, and three Amoc nodes ( m6g.xlarge ) were used for load generation. The three best-performing instance types were c6id (Intel Xeon Scalable, amd64), c6gd (AWS Graviton2, arm64) and c7g (AWS Graviton3, arm64).

    The two most important test scenarios were:

    • One-to-one messaging, where each user chats with their contacts.
    • Multi-user chat, where each user sends messages to chat rooms with 5 participants each.

    Several extensions were enabled to resemble a real-life use case. The most important are:

    The first two extensions perform database write operations for each message, and disabling them would improve performance.

    The results are summarized in the table below:

    Node instance type (size: xlarge): c6id | c6gd | c7g
    One-to-one messages per minute per node: 240k | 240k | 300k
    Multi-user chat messages per minute per node: 120k sent / 600k received | 120k sent / 600k received | 150k sent / 750k received
    On-demand AWS instance pricing per node per hour (USD): 0.2016 | 0.1536 | 0.1445
    Instance cost per billion delivered one-to-one chat messages (USD): 14.00 | 10.67 | 8.03
    Instance cost per billion delivered multi-user chat messages (USD): 5.60 | 4.27 | 3.21

    For each instance type, the table shows the highest message rates achievable without performance degradation. The load was scaled up for the c7g instances thanks to their better performance, making it possible to handle 600k one-to-one messages per minute in the whole cluster, which is 300k messages per minute per node. Should you need more, you can scale horizontally or vertically, and further tests showed an almost linear increase in performance – of course there are limits (especially for cluster size), but they are high. Maximum message rates for MUC Light were different because each message was routed to five recipients, making it possible to send up to 300k messages per minute but deliver 1.5 million.

    The results allowed us to calculate the cost of MongooseIM instances per 1 billion delivered messages, as shown in the table above. Of course it might be difficult to reach these numbers in production environments because of the necessary margin for handling bursts of traffic, but during heavy load you can get close to them. The database cost was actually higher than the cost of the MongooseIM instances themselves.
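
    As a sanity check, the cost figures above follow directly from the message rate and the hourly instance price. A quick sketch in an Erlang shell (the fun below is just illustrative arithmetic, not part of MongooseIM):

    %% USD per billion delivered messages = hourly price / messages per hour * 1.0e9
    1> CostPerBillion = fun(HourlyUsd, MsgPerMinute) ->
           HourlyUsd / (MsgPerMinute * 60) * 1.0e9
       end.
    2> CostPerBillion(0.1445, 300000).  %% c7g, one-to-one: ~8.03
    3> CostPerBillion(0.1445, 750000).  %% c7g, multi-user chat (delivered): ~3.21
    4> CostPerBillion(0.2016, 240000).  %% c6id, one-to-one: ~14.00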

    C2S Process Rework

    We have completely reimplemented the handling of C2S (client-to-server) connections. Although the changes are mostly internal, you can benefit from them, even if you are not interested in the implementation details.

    The first change is about accepting incoming connections – instead of custom listener processes, the Ranch 2.1 library is now used. This introduces some new options, e.g. max_connections and reuse_port.
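
    For readers unfamiliar with Ranch, here is a minimal, self-contained sketch of what an acceptor pool with a connection cap looks like in Ranch 2.x. The example_echo module and the port are hypothetical, and this is not MongooseIM's listener code (MongooseIM exposes these options through its own configuration file):

    %% example_echo.erl – a toy ranch_protocol handler plus a capped listener.
    -module(example_echo).
    -behaviour(ranch_protocol).
    -export([start/0, start_link/3, init/3]).

    %% Start a TCP listener with at most 5000 concurrent connections.
    start() ->
        {ok, _} = application:ensure_all_started(ranch),
        {ok, _} = ranch:start_listener(example_listener, ranch_tcp,
                                       #{max_connections => 5000,
                                         socket_opts => [{port, 5555}]},
                                       ?MODULE, []).

    %% ranch_protocol callback: spawn a process that owns the accepted socket.
    start_link(Ref, Transport, Opts) ->
        {ok, proc_lib:spawn_link(?MODULE, init, [Ref, Transport, Opts])}.

    init(Ref, Transport, _Opts) ->
        {ok, Socket} = ranch:handshake(Ref),
        loop(Socket, Transport).

    %% Echo received data back until the peer disconnects or times out.
    loop(Socket, Transport) ->
        case Transport:recv(Socket, 0, 30000) of
            {ok, Data} -> Transport:send(Socket, Data), loop(Socket, Transport);
            _Other -> ok = Transport:close(Socket)
        end.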

    Prior to version 6.1.0, each open C2S connection was handled by two Erlang processes – the receiver process was responsible for XML parsing, while the C2S process would handle the decoded XML elements. They are now integrated into one, which means that the footprint of each session is smaller, and there is less internal messaging.

    C2S State Machine: Separation of Concerns

    The core XMPP operations are defined in RFC 6120 , and we have reimplemented them from scratch in the new mongoose_c2s module. The most important benefit of this change from the user perspective is the vastly improved separation of concerns , making feature development much easier. A simplified version of the C2S state machine diagram is presented below. Error handling is omitted for simplicity. The “wait for session” state is optional, and you can disable it with the backwards_compatible_session configuration option.

    A similar diagram for version 6.0 would be much more complicated, because the former implementation had parts of multiple extensions scattered around its code:

    Functionality | Described in | Moved out to
    Stream resumption | XEP-0198: Stream Management | mod_stream_management
    AMP event triggers | XEP-0079: Advanced Message Processing | mod_amp
    Stanza buffering for CSI | XEP-0352: Client State Indication | mod_csi
    Roster subscription handling | RFC 6121: Instant Messaging and Presence | mod_roster
    Presence tracking | RFC 6121: Instant Messaging and Presence | mod_presence
    Broadcasting PEP messages | XEP-0163: Personal Eventing Protocol | mod_pubsub
    Handling and using privacy lists | XEP-0016: Privacy Lists | mod_privacy
    Handling and using blocking commands | XEP-0191: Blocking Command | mod_blocking

    It is important to note that mod_presence is the only new module in the list. The others existed before, but parts of their code lived in the C2S module. By disabling unnecessary extensions, you can gain performance. For example, by omitting mod_presence from your configuration file you can skip all the server-side presence handling. Our load tests have shown that this can significantly reduce the total time needed to establish a connection. Moreover, disabling extensions is now 100% reliable and guarantees that no unwanted code will be executed.

    Easier extension development

    If you are interested in developing your custom extensions, it is now easier than ever, because mongoose_c2s uses the new C2S-related hooks and handlers and several new features of the gen_statem behaviour. C2S Hooks can be divided into the following categories, depending on the events that trigger them:

    Trigger | Hooks
    User session opening | user_open_session
    User sends an XML element | user_send_packet, user_send_xmlel, user_send_message, user_send_presence, user_send_iq
    User receives an XML element | user_receive_packet, user_receive_xmlel, user_receive_message, user_receive_presence, user_receive_iq, xmpp_presend_element
    User session closing | user_stop_request, user_socket_closed, user_socket_error, reroute_unacked_messages
    mongoose_c2s:call/3 or mongoose_c2s:cast/3 | foreign_event
    Most of the hooks are triggered by XMPP traffic. The only exception is foreign_event, which can be triggered by modules on demand, making it possible to execute code in the context of a specific user's C2S process.

    Modules add handlers to selected hooks. Such a handler performs module-specific actions and returns an accumulator, which can contain special options, allowing the module to:

    • Store module-specific data using state_mod, or replace the whole C2S state data with c2s_data.
    • Transition to a new state with c2s_state.
    • Perform arbitrary gen_statem transition actions with actions.
    • Stop the state machine gracefully (stop) or forcefully (hard_stop).
    • Deliver XML elements to the user either with hooks triggered (route, flush) or without (socket_send).

    Example

    Let’s take a look at the handlers of the new mod_presence module. For the user_send_presence and user_receive_presence hooks, it updates the module-specific state (state_mod), storing the presence state. The handler for foreign_event is more complicated, because it handles the following events:

    • Event {mod_presence, get_presence | get_subscribed} – gets the user's presence information or the list of subscribed users; triggered by mongoose_c2s:call(Pid, mod_presence, get_presence | get_subscribed).
    • Event {mod_presence, {set_presence, Presence}} – sets the user's presence information; triggered by mongoose_c2s:cast(Pid, mod_presence, {set_presence, Presence}).
    • Event {mod_roster, RosterItem} – updates the roster subscription state; triggered by mongoose_c2s:cast(Pid, mod_roster, RosterItem).

    The example shows how the coupling between extension modules remains loose and modules don’t call each other’s code directly.
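
    For instance, based on the triggers listed above, another module (or a remote shell on the node) could interact with a given user's session as sketched below, where Pid stands for that user's C2S process. This sketch is built only from the calls shown in the list above and is not a complete MongooseIM module:

    %% Synchronous queries answered by mod_presence's foreign_event handler:
    Presence = mongoose_c2s:call(Pid, mod_presence, get_presence),
    Subscribed = mongoose_c2s:call(Pid, mod_presence, get_subscribed),
    %% Asynchronous update of the stored presence:
    mongoose_c2s:cast(Pid, mod_presence, {set_presence, Presence}).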

    The benefits of gen_statem

    The following new gen_statem features are used in mongoose_c2s:

    Arbitrary term state – with the handle_event_function callback mode it is possible to use tuples for state names. An example is {wait_for_sasl_response, cyrsasl:sasl_state(), retries()}, which has the state of the SASL authentication process and the number of authentication retries left encoded in the state tuple. Apart from the states shown in the diagram above, modules can introduce their own external states – they have the format {external, StateName}. An example is mod_stream_management, which causes a transition to the {external, resume} state when a session is closed.

    Multiple callback modules – to handle an external state, the callback module has to be changed, e.g. mod_stream_management uses the {push_callback_module, ?MODULE} transition action to provide its own handle_event function for the {external, resume} state.

    State timeouts – for all states before wait_for_session, the session terminates after the configurable c2s_state_timeout. The timeout tuple itself is {state_timeout, Timeout, state_timeout_termination}.

    Named timeouts – modules use these to trigger specific actions, e.g. mod_ping uses several timeouts to schedule ping requests and to wait for responses. The timeout tuple has the format {{timeout, ping | ping_timeout | send_ping}, Interval, fun ping_c2s_handler/2} . This feature is also used for traffic shaping to pause the state machine if the traffic volume exceeds the limit.

    Self-generated events – this feature is used very often, for example when incoming XML data is parsed, an event {next_event, internal, XmlElement} is generated for each parsed XML element. The route and flush options of the c2s accumulator generate internal events as well.
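
    To make these mechanisms more concrete, here is a small, self-contained gen_statem module (a toy example, unrelated to MongooseIM's code) that uses the same OTP features: the handle_event_function callback mode with a tuple state, a state timeout, a named timeout and a self-generated internal event:

    -module(example_statem).
    -behaviour(gen_statem).
    -export([start_link/0, init/1, callback_mode/0, handle_event/4]).

    start_link() -> gen_statem:start_link(?MODULE, [], []).

    %% Arbitrary term states require the handle_event_function callback mode.
    callback_mode() -> handle_event_function.

    init([]) ->
        %% Tuple state plus a state timeout: if nothing happens within 5 seconds,
        %% the state_timeout event below fires and the machine stops.
        {ok, {waiting, 3}, #{}, [{state_timeout, 5000, give_up}]}.

    %% A named timeout, e.g. scheduling a ping 30 seconds from now.
    handle_event(cast, schedule_ping, _State, _Data) ->
        {keep_state_and_data, [{{timeout, ping}, 30000, send_ping}]};
    %% When the named timeout fires, feed a self-generated internal event back in.
    handle_event({timeout, ping}, send_ping, _State, _Data) ->
        {keep_state_and_data, [{next_event, internal, ping_sent}]};
    %% Handle the internal event by moving to a new tuple state.
    handle_event(internal, ping_sent, {waiting, Retries}, Data) when Retries > 0 ->
        {next_state, {waiting, Retries - 1}, Data};
    handle_event(state_timeout, give_up, _State, _Data) ->
        {stop, normal};
    handle_event(_Type, _Event, _State, _Data) ->
        keep_state_and_data.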

    Summary

    MongooseIM 6.1.0 is full of improvements on many levels – both on the outside, like the arm64 Docker images, and deep inside, like the separation of concerns in mongoose_c2s. What all of them have in common is that we have load-tested them extensively, making sure that our new messaging server delivers what it promises and that performance is better than ever. There are no unpleasant surprises hidden underneath. After all, it is open source, and you are welcome to download, deploy, use and extend it free of charge.

    However, should you have a special use case, high performance requirements or a need to reduce costs, don't hesitate to contact us, and we will help you deploy, load test and maintain your messaging solution.



      www.erlang-solutions.com/blog/mongooseim-6-1-handle-more-traffic-consume-less-resources/


      Kaidan: Kaidan 0.9: End-to-End Encryption & XMPP Providers

      news.movim.eu / PlanetJabber • 5 May, 2023 • 2 minutes

    OMEMO logo

    It’s finally there: Kaidan with end-to-end encryption via OMEMO 2, Automatic Trust Management and support for XMPP Providers! Most of the work has been funded by NLnet via NGI Zero PET and NGI Assure with public money provided by the European Commission. We would also like to thank Radically Open Security (especially Christian Reitter) for a quick security evaluation during the NGI Zero project.

    Even though Kaidan is making good progress, please keep in mind that it is not yet a stable app. Do not expect it to work well on all supported systems. Moreover, we currently do not consider Kaidan’s security to be as good as that of the dominant chat apps.

    There is a new overview of features Kaidan supports. Have a look at that or at the changelog for more details.

    Encryption

    All messages sent by Kaidan can now be encrypted. If a contact supports the same encryption, Kaidan enables it by default, so you do not have to enable it yourself and never need to worry about enabling it for new contacts. It is still possible to disable it for each contact at any time.

    Additionally, all metadata that is encryptable, such as typing notifications, is encrypted too. The new Automatic Trust Management (ATM) makes trust management easier than before. The details are explained in a previous post .

    We worked hard on covering as many corner cases as possible. Encrypted sessions are initialized in the background to reduce the loading time. Kaidan even tries to repair sessions broken by other chat apps. But if you discover any strange behavior, please let us know!

    We decided to focus on future technologies. Thus, Kaidan does not support OMEMO versions older than 0.8.1. Unfortunately, many other clients do not support the latest version yet. They only encrypt the body (text content) of a message, which is not compatible with newer OMEMO versions and ATM. But we hope that other client developers will follow our lead soon.

    Screenshots of Kaidan

    XMPP Providers

    Kaidan introduced easy registration in version 0.5 and has used its own list of XMPP providers since then. The new project XMPP Providers arose from that approach. The project is intended to be used by various applications and services.

    Kaidan is now one of them. It uses XMPP Providers for its registration process instead of maintaining its own list of providers. Try it out and see how easy it can be to get an XMPP account with Kaidan!

    Changelog

    This release adds the following features:

    • End-to-end encryption with OMEMO 2 for messages, files and metadata including an easy trust management
    • XMPP Providers support for an easy onboarding
    • Message reactions for sending emojis upon a message
    • Read markers showing which messages a contact has read
    • Message drafts to send entered messages later after switching chats or restarting Kaidan
    • Message search for messages that are not yet loaded
    • New look of the chat background and message bubbles including grouped messages from the same author
    • Chat pinning for reordering chats
    • Public group chat search (without group chat support yet)
    • New contact and account details including the ability to change the own profile picture
    • Restored window position on start

    Download

    Or install Kaidan from your distribution:

    Packaging status


      kaidan.im/2023/05/05/kaidan-0.9.0/


      JMP: Newsletter: Jabber ID Discovery, New Referral Codes

      news.movim.eu / PlanetJabber • 1 May, 2023 • 4 minutes

    Hi everyone!

    Welcome to the latest edition of your pseudo-monthly JMP update!

    In case it’s been a while since you checked out JMP, here’s a refresher: JMP lets you send and receive text and picture messages (and calls) through a real phone number right from your computer, tablet, phone, or anything else that has a Jabber client.  Among other things, JMP has these features: Your phone number on every device; Multiple phone numbers, one app; Free as in Freedom; Share one number with multiple people.

    It has been a while since we got a newsletter out, and lots has been happening as we race towards our launch.

    For those who have experienced the issue with Google Voice participants not showing up properly in our MMS group texting stack, we have a new stack in testing right now. Let support know if you want to try it out; it has been working well so far for those already using it.

    If you check your account settings for the “refer a friend” option, you will now see two kinds of referral code. The list of one-time-use codes remains the same as always: a free month for your friend, and a free month’s worth of credit for you if they start paying. The new code at the top is multi-use, and you can post and share it as much as you like. It provides credit equivalent to an additional month to anyone who uses it on sign-up after their initial $15 deposit as normal, and then a free month’s worth of credit for you after that payment fully clears.

    We mentioned before that much of the team will be present at FOSSY , and we can now reveal why: there will be a conference track dedicated to XMPP , which we are helping to facilitate!  Call for proposals ends May 14th. Sign up and come out this summer!

    For quite some time now, customers have been asked while registering whether they would like to enable others who know their phone number to discover their Jabber ID, to enable upgrading to end-to-end encryption, video calls, etc. The first version of this feature is now live, and users of at least Cheogram Android and Movim can check the contact details of anyone they exchange SMS with to see if a Jabber ID is listed. We are happy to announce that we have also partnered with Quicksy to allow discovery of anyone registered for their app or directory as well.

    Tapbacks: Jabber-side reactions are now translated where possible into the tapback pseudo-syntax recognized by many Android and iMessage users, so that your reactions will appear in a native way to those users. In Cheogram Android you can swipe to reply to a message and enter a single emoji as the reply to send a reaction/tapback.

    Cheogram Android: There have been two Cheogram Android releases since our last newsletter, with a third coming out today. You no longer need to add a contact to send a message or initiate a call. The app has gained moderation features for channel administrators, and it respects these moderation actions on display. For offensive media arriving from other sources, in avatars, or just not moderated quickly enough, users also have the ability to permanently block any media they see from their device.

    Cheogram Android has also gained some new sticker-related features, including default sticker packs and the ability to import any sticker pack made for Signal (browse signalstickers.com to find more sticker packs; just tap “add to Signal” to add them to Cheogram Android).

    There are also brand-new features today in 2.12.1-5 , including a new onboarding flow that allows new users to register and pay for JMP before getting a Jabber ID, and then set up their very own Snikket instance all from within the app.  This flow also features some new introductory material about the Jabber network which we will continue to refine over time:

    Screenshots: Welcome to Cheogram Android; How the Jabber network works; Welcome Screen

    Notifications about new messages now use the conversation style in Android. This means that you can set separate priority and sounds per conversation at the OS level on new enough versions of Android. There is also an option in each conversation’s menu to add that conversation to your home screen, something that has always been possible with the app, but hopefully this makes it more discoverable for some.

    For communities organizing in Jabber channels, sometimes it can be useful to notify everyone present about a message.  Cheogram Android now respects the attention element from members and higher in any channel or group chat.  To send a message with this priority attached, start the message body with @here (this will not be included in the actual message people see).

    WebXDC Logo

    This release also brings an experimental prototype supporting WebXDC .  This is an experimental specification to allow developers to ship mini-apps that work inside your chats.  Take any *.xdc file and send it to a contact or group chat where everyone uses Cheogram Android and you can play games, share notes, shopping lists, calendars, and more.  Please come by the channel to discuss the future of this technology on the Jabber network with us.

    To learn what’s happening with JMP between newsletters, here are some ways you can find out:

    Thanks for reading and have a wonderful rest of your week!


      blog.jmp.chat/b/april-newsletter-2023


      Erlang Solutions: Re-implement our first blog scrapper with Crawly 0.15.0

      news.movim.eu / PlanetJabber • 25 April, 2023 • 14 minutes

    It has been almost four years since my first article about scraping with Elixir and Crawly was published. Since then, many changes have occurred, the most significant being the redesign of the Erlang Solutions blog. As a result, the 2019 tutorial is no longer functional.

    This situation provided an excellent opportunity to update the original work and re-implement the Crawler using the new version of Crawly. By doing so, the tutorial will showcase several new features added to Crawly over the years and, more importantly, provide a functional version to the community. Hopefully, this updated tutorial will be beneficial to all.

    First of all, why is it broken now?

    This situation is to be expected! When a website gets a new design, usually everything is redone – the new layout results in new HTML, which makes all the old CSS/XPath selectors obsolete, not to mention new URL schemes. As a result, the XPath/CSS selectors that worked before refer to nothing after the redesign, so we have to start from the very beginning. What a shame!

    But of course, the web is made for more than just crawling. The web is made for people, not robots, so let’s adapt our robots!

    Our experience from a large-scale scraping platform is that a successful business usually does at least one complete redesign every two years. Minor updates occur even more often, but remember that even minor updates can break your web scrapers.

    Getting started

    Usually, I recommend starting by following the Quickstart guide from Crawly’s documentation pages . However, this time I have something else in mind. I want to show you the Crawly standalone version.

    The idea is to keep it simple. In some cases, the data you need can be extracted from a relatively simple source. In these situations, it might be quite beneficial to avoid bootstrapping all the Elixir machinery (a new project, config, libs, dependencies). The goal is to deliver data that other applications can consume without any setup.

    Of course, the approach will have some limitations and only work for simple projects at this stage. Some may get inspired by this article and improve it so that the following readers will be amazed by new possibilities. In any case, let’s get straight to it now!

    Bootstrapping 2.0

    As promised, here is the simplified version of the setup (compare it with the previous setup described here):

    1. Create a directory for your project: mkdir erlang_solutions_blog
    2. Create a subdirectory that will contain the code of your spiders: mkdir erlang_solutions_blog/spiders
    3. Now, knowing that we want to extract the following fields: title, author, publishing_date, url, article_body, let’s define the following configuration for your project (erlang_solutions_blog/crawly.config):
    
    [{crawly, [
       {closespider_itemcount, 100},
       {closespider_timeout, 5},
       {concurrent_requests_per_domain, 15},
    
       {middlewares, [
               'Elixir.Crawly.Middlewares.DomainFilter',
               'Elixir.Crawly.Middlewares.UniqueRequest',
               'Elixir.Crawly.Middlewares.RobotsTxt',
               {'Elixir.Crawly.Middlewares.UserAgent', [
                   {user_agents, [
                       <<"Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0">>,
                       <<"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36">>
                       ]
                   }]
               }
           ]
       },
    
       {pipelines, [
               {'Elixir.Crawly.Pipelines.Validate', [{fields, [title, author, publishing_date, url, article_body]}]},
               {'Elixir.Crawly.Pipelines.DuplicatesFilter', [{item_id, title}]},
               {'Elixir.Crawly.Pipelines.JSONEncoder'},
               {'Elixir.Crawly.Pipelines.WriteToFile', [{folder, <<"/tmp">>}, {extension, <<"jl">>}]}
           ]
       }]
    }].
    
    

    You have probably noticed that this looks like an Erlang configuration file, which is the case. I would say that it’s not the perfect solution, and one possible improvement is to simplify it so the project can be configured in a simpler way. If you have ideas, write to me in GitHub’s discussions: https://github.com/elixir-crawly/crawly/discussions

    4. The basic configuration is now done, and we can run the Crawly application to check that it starts this way:

    docker run --name crawly \
    -d -p 4001:4001 -v $(pwd)/spiders:/app/spiders \
    -v $(pwd)/crawly.config:/app/config/crawly.config \
    oltarasenko/crawly:0.15.0

    Notes:

    • 4001 is the default HTTP port used for spider management, so we forward it to the host.
    • The spiders directory is where spider files are expected to be stored; they will be added to the application later on.
    • Finally, the (admittedly ugly) configuration file is also mounted inside the Crawly container.

    Now you can see the Crawly Management User Interface at localhost:4001.

    Crawly Management Tool

    Working on a new spider

    Now, let’s define the spider itself. Let’s start with the following boilerplate code (put it into erlang_solutions_blog/spiders/esl.ex ):

    defmodule ESLSpider do
     use Crawly.Spider
    
     @impl Crawly.Spider
     def init() do
       [start_urls: ["https://www.erlang-solutions.com/"]]
     end
    
     @impl Crawly.Spider
     def base_url(), do: "https://www.erlang-solutions.com"
    
     @impl Crawly.Spider
     def parse_item(response) do
       %{items: [], requests: []}
     end
    end

    This code defines an “ESLSpider ” module that uses the “Crawly.Spider” behavior.

    The behavior requires three functions to be implemented:

    init(), base_url(), and parse_item(response).

    The “init()” function returns a list containing a single key-value pair. The key is “start_urls” and the value is a list containing a single URL string: “https://www.erlang-solutions.com/”. This means that the spider will start crawling from this URL.

    The “base_url()” function returns a string representing the base URL for the spider, used to filter out requests that go outside of erlang-solutions.com website.

    The `parse_item(response)` function takes a response object as an argument and returns a map containing two keys: `items` and `requests`

    Once the code is saved, we can run it via the Web interface (you will need to restart the docker container or click the “Reload spiders” button in the Web interface).


    New Crawly Management UI

    Once the job is started, you can review the Scheduled Requests, Logs, or Extracted Items.

    Parsing the page

    Now we need to find CSS selectors to extract the needed data. The same approach is already described here https://www.erlang-solutions.com/blog/web-scraping-with-elixir/ under the “extracting the data” section. I think one of the best ways to find relevant CSS selectors is simply to use Google Chrome’s inspect option:

    So let’s connect to the Crawly Shell and fetch data using the fetcher, extracting this title:

    docker exec -it crawly /app/bin/crawly remote

    1> response = Crawly.fetch("https://www.erlang-solutions.com/blog/web-scraping-with-elixir/")
    2> document = Floki.parse_document!(response.body)
    4> title_tag = Floki.find(document, ".page-title-sm")
    [{"h1", [{"class", "page-title-sm mb-sm"}], ["Web scraping with Elixir"]}]
    5> title = Floki.text(title_tag)
    "Web scraping with Elixir"
    
    

    We are going to extract all items this way. In the end, we came up with the following map of selectors representing the expected item:

    item =
     %{
       url: response.request_url,
       title: Floki.find(document, ".page-title-sm") |> Floki.text(),
       article_body: Floki.find(document, ".default-content") |> Floki.text(),
       author: Floki.find(document, ".post-info__author") |> Floki.text(),
       publishing_date: Floki.find(document, ".header-inner .post-info .post-info__item span") |> Floki.text()
      }
    
    requests = Enum.map(
     Floki.find(document, ".link-to-all") |> Floki.attribute("href"),
     fn url -> Crawly.Utils.request_from_url(url) end
    )
    
    

    At the end of it, we came up with the following code representing the spider:

    defmodule ESLSpider do
     use Crawly.Spider
    
     @impl Crawly.Spider
     def init() do
       [
         start_urls: [
           "https://www.erlang-solutions.com/blog/web-scraping-with-elixir/",
           "https://www.erlang-solutions.com/blog/which-companies-are-using-elixir-and-why-mytopdogstatus/"
         ]
       ]
     end
    
     @impl Crawly.Spider
     def base_url(), do: "https://www.erlang-solutions.com"
    
     @impl Crawly.Spider
     def parse_item(response) do
       {:ok, document} = Floki.parse_document(response.body)
    
       requests = Enum.map(
         Floki.find(document, ".link-to-all") |> Floki.attribute("href"),
         fn url -> Crawly.Utils.request_from_url(url) end
         )
    
       item = %{
         url: response.request_url,
         title: Floki.find(document, ".page-title-sm") |> Floki.text(),
         article_body: Floki.find(document, ".default-content") |> Floki.text(),
         author: Floki.find(document, ".post-info__author") |> Floki.text(),
         publishing_date: Floki.find(document, ".header-inner .post-info .post-info__item span") |> Floki.text()
       }
       %{items: [item], requests: requests}
     end
    end
    
    
    

    That’s all, folks! Thanks for reading!

    Well, not really. Let’s schedule this version of the spider again, and let’s see the results:

    Scraping results

    As you can see, the spider could only extract 34 items. This is quite interesting, as it’s pretty clear that Erlang Solution’s blog contains way more items. So why do we have only this amount? Can anything be done to improve it?

    Debugging your spider

    Some intelligent developers write everything just once, and everything works. Other people like me have to spend time debugging the code.

    In my case, I start by exploring the logs. There is something there I don’t like:

    08:23:37.417 [info] Dropping item: %{article_body: “Scalable and Reliable Real-time MQTT Messaging Engine for IoT in the 5G Era.We work with proven, world leading technologies that provide a highly scalable, highly available distributed message broker for all major IoT protocols, as well as M2M and mobile applications.Available virtually everywhere with real-time system monitoring and management ability, it can handle tens of millions of concurrent clients.Today, more than 5,000 enterprise users are trusting EMQ X to connect more than 50 million devices.As well as being trusted experts in EMQ x, we also have 20 years of experience building reliable, fault-tolerant, real-time distributed systems. Our experts are able to guide you through any stage of the project to ensure your system can scale with confidence. Whether you're hunting for a suspected bug, or doing due diligence to future proof your system, we're here to help. Our world-leading team will deep dive into your system providing an in-depth report of recommendations. This gives you full visibility on the vulnerabilities of your system and how to improve it. Connected devices play an increasingly vital role in major infrastructure and the daily lives of the end user. To provide our clients with peace of mind, our support agreements ensure an expert is on hand to minimise the length and damage in the event of a disruption. Catching a disruption before it occurs is always cheaper and less time consuming. WombatOAM is specifically designed for the monitoring and maintenance of BEAM-based systems (including EMQ x). This provides you with powerful visibility and custom alerts to stop issues before they occur. As well as being trusted experts in EMQ x, we also have 20 years of experience building reliable, fault-tolerant, real-time distributed systems. Our experts are able to guide you through any stage of the project to ensure your system can scale with confidence. Whether you're hunting for a suspected bug, or doing due diligence to future proof your system, we're here to help. Our world-leading team will deep dive into your system providing an in-depth report of recommendations. This gives you full visibility on the vulnerabilities of your system and how to improve it. Connected devices play an increasingly vital role in major infrastructure and the daily lives of the end user. To provide our clients with peace of mind, our support agreements ensure an expert is on hand to minimise the length and damage in the event of a disruption. Catching a disruption before it occurs is always cheaper and less time consuming. WombatOAM is specifically designed for the monitoring and maintenance of BEAM-based systems (including EMQ x). This provides you with powerful visibility and custom alerts to stop issues before they occur. Because it's written in Erlang!With it's Erlang/OTP design, EMQ X fuses some of the best qualities of Erlang. A single node broker can sustain one million concurrent connections…but a single EMQ X cluster – which contains multiple nodes – can support tens of millions of concurrent connections. Inside this cluster, routing and broker nodes are deployed independently to increase the routing efficiency. Control channels and data channels are also separated – significantly improving the performance of message forwarding. EMQ X works on a soft real-time basis.
No matter how many simultaneous requests are going through the system, the latency is guaranteed.Here's how EMQ X can help with your IoT messaging needs?Erlang Solutions exists to build transformative solutions for the world's most ambitious companies, by providing user-focused consultancy, high tech capabilities and diverse communities. Let's talk about how we can help you.”, author: “”, publishing_date: “”, title: “”, url: “https://www.erlang-solutions.com/capabilities/emqx/”}. Reason: missing required fields

    The line above indicates that the spider has dropped an item which is not an article, but a general page. We want to exclude these URLs from the routes of our bot.

    Try to avoid creating unnecessary load on a website when crawling.

    The following lines can achieve this:

    requests =
     Floki.find(document, ".link-to-all") |> Floki.attribute("href")
     |> Enum.filter(fn url -> String.contains?(url, "/blog/") end)
     |> Enum.map(&Crawly.Utils.request_from_url/1)
    

    Now, we can re-run the spider and see that we’re not hitting non-blog pages anymore (don’t forget to reload the spider’s code)!

    This optimised our crawler, but more was needed to extract more items. (Among other things, it’s interesting to note that we can only get 35 articles from the “Keep reading” links, which indicates some possible directions for improving the cross-linking inside the blog itself.)

    Improving the extraction coverage

    When looking at the possibility of extracting more items, we should try finding a better source of links. One good way to do it is by exploring the blog’s homepage, potentially with JavaScript turned off. Here is what I can see:

    Sometimes you need to switch JavaScript off to see more.

    As you can see, there are 14 Pages (only 12 of which are working), and every page contains nine articles. So we expect ~100–108 articles in total.

    So let’s try to use this pagination as a source of new links! I have updated the init() function so it refers to the blog’s index, and also parse_item so it can use the information found there:

    @impl Crawly.Spider
     def init() do
       [
         start_urls: [
           "https://www.erlang-solutions.com/blog/page/2/?pg=2",
           "https://www.erlang-solutions.com/blog/web-scraping-with-elixir/",
           "https://www.erlang-solutions.com/blog/which-companies-are-using-elixir-and-why-mytopdogstatus/"
         ]
       ]
     end
    
    @impl Crawly.Spider
    def parse_item(response) do
     {:ok, document} = Floki.parse_document(response.body)
    
     case String.contains?(response.request_url, "/blog/page/") do
       false -> parse_article_page(document, response.request_url)
       true -> parse_index_page(document, response.request_url)
     end
    end
    
    defp parse_index_page(document, _url) do
     index_pages =
       document
       |> Floki.find(".page a")
       |> Floki.attribute("href")
       |> Enum.map(&Crawly.Utils.request_from_url/1)
    
     blog_posts =
       Floki.find(document, ".grid-card__content a.btn-link")
       |> Floki.attribute("href")
       |> Enum.filter(fn url -> String.contains?(url, "/blog/") end)
       |> Enum.map(&Crawly.Utils.request_from_url/1)
    
       %{items: [], requests: index_pages ++ blog_posts }
    end
    
    defp parse_article_page(document, url) do
     requests =
       Floki.find(document, ".link-to-all")
       |> Floki.attribute("href")
       |> Enum.filter(fn url -> String.contains?(url, "/blog/") end)
       |> Enum.map(&Crawly.Utils.request_from_url/1)
    
     item = %{
       url: url,
       title: Floki.find(document, ".page-title-sm") |> Floki.text(),
       article_body: Floki.find(document, ".default-content") |> Floki.text(),
       author: Floki.find(document, ".post-info__author") |> Floki.text(),
       publishing_date: Floki.find(document, ".header-inner .post-info .post-info__item span") |> Floki.text()
     }
      %{items: [item], requests: requests}
     end

    Running it again

    Now, finally, after adding all fixes, let’s reload the code and re-run the spider:

    So as you can see, we have extracted 114 items, which looks quite close to what we expected!

    Conclusion

    Honestly speaking, running an open-source project is a complex thing. We have spent almost four years building Crawly and have progressed quite a bit with its capabilities – adding some bugs along the way as well.

    The example above shows how to run something with Elixir/Floki, as well as the slightly more complex process of debugging and fixing that sometimes comes up in practice.

    We want to thank Erlang Solutions for supporting the development and allocating help when needed!



      www.erlang-solutions.com/blog/re-implement-our-first-blog-scrapper-with-crawly-0-15-0-2/


      Isode: Red/Black 2.0 – New Capabilities

      news.movim.eu / PlanetJabber • 21 April, 2023 • 1 minute

    This major release adds significant new functionality and improvements to Red/Black, a management tool that allows you to monitor and control devices and servers across a network, with a particular focus on HF radio systems. A general summary is given in the white paper “Red/Black Overview”.

    Switch Device

    Support has been added for Switch-type devices, which can connect multiple devices and allow an operator (red or black side) to change switch connections. Physical switch connectivity is configured by an administrator. The switch column can be hidden, so that logical connectivity through the switch is shown.

    SNMP Support

    A device driver for SNMP devices is provided, including SNMPv3 authorization. Abstract device specifications are included in Red/Black for:

    • SNMP System MIB
    • SNMP Host MIB
    • SNMP UPS MIB
    • Leonardo HF 2000 radio
    • IES Antenna Switch
    • eLogic Radio Gateway

    Abstract device specifications can be configured for other devices with suitable SNMP MIBs.

    Further details are provided in the Isode white paper “Managing SNMP Devices in Red/Black”.

    Alert Handling

    The UI shows all devices that have alerts which have not been handled by an operator. It enables an operator to see all unhandled alerts for a device and to mark some or all of them as handled.

    Device Parameter Display and Management

    A number of improvements have been made to the way device parameters are handled:

    • Improved general parameter display
    • Display in multiple columns, with selectable number of columns and choice of style, to better support devices with large numbers of parameters
    • Parameter grouping
    • Labelled integer support, so that semantics can be added to values
    • Configurable Colours
    • Display of parameter Units
    • Configurable parameter icons
    • Optimized UI for Device refresh; enable/disable; power off; and reset
    • Integer parameters can specify “interval”
    • Parameters with limited integer values can be selected as drop down

    Top Screen Display

    The top screen display is improved.

    • Modes of “Device” (monitoring) and “Connectivity”, with UIs optimized for these functions
    • Reduced clutter when no device is being examined
    • Columns can be hidden/restored so that the display can be tuned to operator needs
    • Selected device parameters are shown on the top screen so that an operator can see critical device parameters without needing to inspect the device details
    • The UI clearly shows which links the user can modify, according to operator or administrator rights

      www.isode.com/company/wordpress/red-black-2-0-new-capabilities/


      ProcessOne: ejabberd 23.04

      news.movim.eu / PlanetJabber • 19 April, 2023 • 13 minutes

    This new ejabberd 23.04 release includes many improvements and bug fixes, as well as some new features.

    ejabberd 23.04

    A more detailed explanation of these topics and other features:

    Many improvements to SQL databases

    There are many improvements in the area of SQL databases (see #3980 and #3982 ):

    • Added support for migrating MySQL and MS SQL to new schema , fixed a long-standing bug, and many other improvements.
    • Regarding MS SQL, there are schema fixes, added support for new schema and the corresponding schema migration, along with other minor improvements and bugfixes.
    • The automated ejabberd tests now also run on updated schema databases, and support for running tests on MS SQL has been added.
    • Fixed other minor SQL schema inconsistencies, removed unnecessary indexes, and changed PostgreSQL SERIAL columns to BIGSERIAL columns.

    Please upgrade your existing SQL database; check the notes later in this document!

    Added mod_mam support for XEP-0425: Message Moderation

    XEP-0425: Message Moderation allows a Multi-User Chat (XEP-0045) moderator to moderate certain group chat messages, for example by removing them from the group chat history, as part of an effort to address and resolve issues such as message spam, inappropriate venue language, or revealing private personal information of others. It also allows moderators to correct a message on another user’s behalf, or flag a message as inappropriate, without having to retract it.

    Clients that currently support this XEP are Gajim, Converse.js and Monocles; Poezio and XMPP Web have read-only support.

    New mod_muc_rtbl module

    This new module implements Real-Time Block List for MUC rooms. It works by monitoring remote pubsub nodes according to the specification described in xmppbl.org .

    captcha_url option now accepts auto value

    In recent ejabberd releases, captcha_cmd got support for macros (in ejabberd 22.10 ) and support for using modules (in ejabberd 23.01 ).

    Now captcha_url gets an improvement: if set to auto , it tries to detect the URL automatically, taking into account the ejabberd configuration. This is now the default. This should be good enough in most cases, but manually setting the URL may be necessary when using port forwarding or very specific setups.

    Erlang/OTP 19.3 is deprecated

    This is the last ejabberd release with support for Erlang/OTP 19.3. If you have not already done so, please upgrade to Erlang/OTP 20.0 or newer before the next ejabberd release. See the ejabberd 22.10 release announcement for more details.

    About the binary packages provided for ejabberd:

    • The binary installers and container images now use Erlang/OTP 25.3 and Elixir 1.14.3.
    • The mix , ecs and ejabberd container images now use Alpine 3.17.
    • The ejabberd container image now supports an alternative build method, useful to work around a problem in QEMU and Erlang 25 when building the image for the arm64 architecture.

    Erlang node name in ecs container image

    The ecs container image is built using the files from docker-ejabberd/ecs and published in docker.io/ejabberd/ecs . This image generally gets only minimal fixes, no major or breaking changes, but in this release it got one change that requires administrator intervention.

    The Erlang node name is now fixed to ejabberd@localhost by default, instead of being variable based on the container hostname. If you previously allowed ejabberd to choose its node name (which was random), it will now create a new mnesia database instead of using the previous one:

    $ docker exec -it ejabberd ls /home/ejabberd/database/
    ejabberd@1ca968a0301a
    ejabberd@localhost
    ...
    

    A simple solution is to create a container that provides ERLANG_NODE_ARG with the old erlang node name, for example:

    docker run ... -e ERLANG_NODE_ARG=ejabberd@1ca968a0301a
    

    or in docker-compose.yml

    version: '3.7'
    services:
      main:
        image: ejabberd/ecs
        environment:
          - ERLANG_NODE_ARG=ejabberd@1ca968a0301a
    

    Another solution is to change the mnesia node name in the mnesia spool files.

    Other improvements to the ecs container image

    In addition to the previously mentioned change to the default erlang node name, the ecs container image has received other improvements:

    • For each commit to the docker-ejabberd repository that affects ecs and mix container images, those images are uploaded as artifacts and are available for download in the corresponding runs .
    • When a new release is tagged in the docker-ejabberd repository, the image is automatically published to ghcr.io/processone/ecs , in addition to being manually published to the Docker Hub.
    • There are new sections in the ecs README file: Clustering and Clustering Example .

    Documentation Improvements

    In addition to the usual improvements and fixes, some sections of the ejabberd documentation have been improved:

    Acknowledgments

    We would like to thank the following people for their contributions to the source code, documentation, and translation for this release:

    And also to all the people who help solve doubts and problems in the ejabberd chatroom and issue tracker.

    Updating SQL Databases

    These notes allow you to apply the SQL database schema improvements in this ejabberd release to your existing SQL database. Please consider which database you are using and whether it is the default or the new schema .

    PostgreSQL new schema:

    Fixes a long-standing bug in the new schema on PostgreSQL. The fix for all existing affected installations is the same:

    ALTER TABLE vcard_search DROP CONSTRAINT vcard_search_pkey;
    ALTER TABLE vcard_search ADD PRIMARY KEY (server_host, lusername);
    

    PostgreSQL default or new schema:

    Convert these columns to allow up to 2 billion rows in the tables below. This conversion requires full table rebuilds and will take a long time if the tables already have many rows. This step is optional: it is not necessary if the tables will never grow that large.

    ALTER TABLE archive ALTER COLUMN id TYPE BIGINT;
    ALTER TABLE privacy_list ALTER COLUMN id TYPE BIGINT;
    ALTER TABLE pubsub_node ALTER COLUMN nodeid TYPE BIGINT;
    ALTER TABLE pubsub_state ALTER COLUMN stateid TYPE BIGINT;
    ALTER TABLE spool ALTER COLUMN seq TYPE BIGINT;
    

    PostgreSQL or SQLite default schema:

    DROP INDEX i_rosteru_username;
    DROP INDEX i_sr_user_jid;
    DROP INDEX i_privacy_list_username;
    DROP INDEX i_private_storage_username;
    DROP INDEX i_muc_online_users_us;
    DROP INDEX i_route_domain;
    DROP INDEX i_mix_participant_chan_serv;
    DROP INDEX i_mix_subscription_chan_serv_ud;
    DROP INDEX i_mix_subscription_chan_serv;
    DROP INDEX i_mix_pam_us;
    

    PostgreSQL or SQLite new schema:

    DROP INDEX i_rosteru_sh_username;
    DROP INDEX i_sr_user_sh_jid;
    DROP INDEX i_privacy_list_sh_username;
    DROP INDEX i_private_storage_sh_username;
    DROP INDEX i_muc_online_users_us;
    DROP INDEX i_route_domain;
    DROP INDEX i_mix_participant_chan_serv;
    DROP INDEX i_mix_subscription_chan_serv_ud;
    DROP INDEX i_mix_subscription_chan_serv;
    DROP INDEX i_mix_pam_us;
    

    And now add an index that might be missing.

    In PostgreSQL:

    CREATE INDEX i_push_session_sh_username_timestamp ON push_session USING btree (server_host, username, timestamp);
    

    In SQLite:

    CREATE INDEX i_push_session_sh_username_timestamp ON push_session (server_host, username, timestamp);
    

    MySQL default schema:

    ALTER TABLE rosterusers DROP INDEX i_rosteru_username;
    ALTER TABLE sr_user DROP INDEX i_sr_user_jid;
    ALTER TABLE privacy_list DROP INDEX i_privacy_list_username;
    ALTER TABLE private_storage DROP INDEX i_private_storage_username;
    ALTER TABLE muc_online_users DROP INDEX i_muc_online_users_us;
    ALTER TABLE route DROP INDEX i_route_domain;
    ALTER TABLE mix_participant DROP INDEX i_mix_participant_chan_serv;
    ALTER TABLE mix_participant DROP INDEX i_mix_subscription_chan_serv_ud;
    ALTER TABLE mix_participant DROP INDEX i_mix_subscription_chan_serv;
    ALTER TABLE mix_pam DROP INDEX i_mix_pam_u;
    

    MySQL new schema:

    ALTER TABLE rosterusers DROP INDEX i_rosteru_sh_username;
    ALTER TABLE sr_user DROP INDEX i_sr_user_sh_jid;
    ALTER TABLE privacy_list DROP INDEX i_privacy_list_sh_username;
    ALTER TABLE private_storage DROP INDEX i_private_storage_sh_username;
    ALTER TABLE muc_online_users DROP INDEX i_muc_online_users_us;
    ALTER TABLE route DROP INDEX i_route_domain;
    ALTER TABLE mix_participant DROP INDEX i_mix_participant_chan_serv;
    ALTER TABLE mix_participant DROP INDEX i_mix_subscription_chan_serv_ud;
    ALTER TABLE mix_participant DROP INDEX i_mix_subscription_chan_serv;
    ALTER TABLE mix_pam DROP INDEX i_mix_pam_us;
    

    Add an index that might be missing:

    CREATE INDEX i_push_session_sh_username_timestamp ON push_session (server_host, username(191), timestamp);
    

    MS SQL:

    DROP INDEX [rosterusers_username] ON [rosterusers];
    DROP INDEX [sr_user_jid] ON [sr_user];
    DROP INDEX [privacy_list_username] ON [privacy_list];
    DROP INDEX [private_storage_username] ON [private_storage];
    DROP INDEX [muc_online_users_us] ON [muc_online_users];
    DROP INDEX [route_domain] ON [route];
    go
    

    The MS SQL schema was missing some tables added in earlier versions of ejabberd:

    CREATE TABLE [dbo].[mix_channel] (
        [channel] [varchar] (250) NOT NULL,
        [service] [varchar] (250) NOT NULL,
        [username] [varchar] (250) NOT NULL,
        [domain] [varchar] (250) NOT NULL,
        [jid] [varchar] (250) NOT NULL,
        [hidden] [smallint] NOT NULL,
        [hmac_key] [text] NOT NULL,
        [created_at] [datetime] NOT NULL DEFAULT GETDATE()
    ) TEXTIMAGE_ON [PRIMARY];
    
    CREATE UNIQUE CLUSTERED INDEX [mix_channel] ON [mix_channel] (channel, service)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);
    
    CREATE INDEX [mix_channel_serv] ON [mix_channel] (service)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);
    
    CREATE TABLE [dbo].[mix_participant] (
        [channel] [varchar] (250) NOT NULL,
        [service] [varchar] (250) NOT NULL,
        [username] [varchar] (250) NOT NULL,
        [domain] [varchar] (250) NOT NULL,
        [jid] [varchar] (250) NOT NULL,
        [id] [text] NOT NULL,
        [nick] [text] NOT NULL,
        [created_at] [datetime] NOT NULL DEFAULT GETDATE()
    ) TEXTIMAGE_ON [PRIMARY];
    
    CREATE UNIQUE INDEX [mix_participant] ON [mix_participant] (channel, service, username, domain)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);
    
    CREATE INDEX [mix_participant_chan_serv] ON [mix_participant] (channel, service)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);
    
    CREATE TABLE [dbo].[mix_subscription] (
        [channel] [varchar] (250) NOT NULL,
        [service] [varchar] (250) NOT NULL,
        [username] [varchar] (250) NOT NULL,
        [domain] [varchar] (250) NOT NULL,
        [node] [varchar] (250) NOT NULL,
        [jid] [varchar] (250) NOT NULL
    );
    
    CREATE UNIQUE INDEX [mix_subscription] ON [mix_subscription] (channel, service, username, domain, node)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);
    
    CREATE INDEX [mix_subscription_chan_serv_ud] ON [mix_subscription] (channel, service, username, domain)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);
    
    CREATE INDEX [mix_subscription_chan_serv_node] ON [mix_subscription] (channel, service, node)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);
    
    CREATE INDEX [mix_subscription_chan_serv] ON [mix_subscription] (channel, service)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);
    
    CREATE TABLE [dbo].[mix_pam] (
        [username] [varchar] (250) NOT NULL,
        [channel] [varchar] (250) NOT NULL,
        [service] [varchar] (250) NOT NULL,
        [id] [text] NOT NULL,
        [created_at] [datetime] NOT NULL DEFAULT GETDATE()
    ) TEXTIMAGE_ON [PRIMARY];
    
    CREATE UNIQUE CLUSTERED INDEX [mix_pam] ON [mix_pam] (username, channel, service)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);
    
    go
    

    The MS SQL schema also had some incompatible column types:

    ALTER TABLE [dbo].[muc_online_room] ALTER COLUMN [node] VARCHAR (250);
    ALTER TABLE [dbo].[muc_online_room] ALTER COLUMN [pid] VARCHAR (100);
    ALTER TABLE [dbo].[muc_online_users] ALTER COLUMN [node] VARCHAR (250);
    ALTER TABLE [dbo].[pubsub_node_option] ALTER COLUMN [name] VARCHAR (250);
    ALTER TABLE [dbo].[pubsub_node_option] ALTER COLUMN [val] VARCHAR (250);
    ALTER TABLE [dbo].[pubsub_node] ALTER COLUMN [plugin] VARCHAR (32);
    go
    

    … and the mqtt_pub table was incorrectly defined in the old schema:

    ALTER TABLE [dbo].[mqtt_pub] DROP CONSTRAINT [i_mqtt_topic_server];
    ALTER TABLE [dbo].[mqtt_pub] DROP COLUMN [server_host];
    ALTER TABLE [dbo].[mqtt_pub] ALTER COLUMN [resource] VARCHAR (250);
    ALTER TABLE [dbo].[mqtt_pub] ALTER COLUMN [topic] VARCHAR (250);
    ALTER TABLE [dbo].[mqtt_pub] ALTER COLUMN [username] VARCHAR (250);
    CREATE UNIQUE CLUSTERED INDEX [dbo].[mqtt_topic] ON [mqtt_pub] (topic)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);
    go
    

    … and the sr_group index/primary key was inconsistent with other databases:

    ALTER TABLE [dbo].[sr_group] DROP CONSTRAINT [sr_group_PRIMARY];
    CREATE UNIQUE CLUSTERED INDEX [sr_group_name] ON [sr_group] ([name])
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);
    go
    

    ChangeLog

    General

    • New s2s_out_bounce_packet hook
    • Re-allow anonymous connections for connections without client certificates ( #3985 )
    • Stop ejabberd_system_monitor before stopping node
    • captcha_url option now accepts auto value, and it’s the default
    • mod_mam : Add support for XEP-0425: Message Moderation
    • mod_mam_sql : Fix problem with results of mam queries using rsm with max and before
    • mod_muc_rtbl : New module for Real-Time Block List for MUC rooms ( #4017 )
    • mod_roster : Set roster name from XEP-0172, or the stored one ( #1611 )
    • mod_roster : Preliminary support to store extra elements in subscription request ( #840 )
    • mod_pubsub : Pubsub xdata fields max_item/item_expira/children_max use max not infinity
    • mod_vcard_xupdate : Invalidate vcard_xupdate cache on all nodes when vcard is updated

    Admin

    • ext_mod : Improve support for loading *.so files from ext_mod dependencies
    • Improve output in gen_html_doc_for_commands command
    • Fix ejabberdctl output formatting ( #3979 )
    • Log HTTP handler exceptions

    MUC

    • New command get_room_history
    • Persist none role for outcasts
    • Try to populate room history from mam when unhibernating
    • Make mod_muc_room:set_opts process persistent flag first
    • Allow passing affiliations and subscribers to create_room_with_opts command
    • Store state in db in mod_muc:create_room()
    • Make subscribers members by default

    SQL schemas

    • Fix a long-standing bug in new schema migration
    • update_sql command: Many improvements in new schema migration
    • update_sql command: Add support to migrate MySQL too
    • Change PostgreSQL SERIAL to BIGSERIAL columns
    • Fix minor SQL schema inconsistencies
    • Remove unnecessary indexes
    • New SQL schema migrate fix

    MS SQL

    • MS SQL schema fixes
    • Add new schema for MS SQL
    • Add MS SQL support for new schema migration
    • Minor MS SQL improvements
    • Fix MS SQL error caused by ORDER BY in subquery

    SQL Tests

    • Add support for running tests on MS SQL
    • Add ability to run tests on upgraded DB
    • Un-deprecate ejabberd_config:set_option/2
    • Use python3 to run extauth.py for tests
    • Correct README for creating test docker MS SQL DB
    • Fix TSQLlint warnings in MSSQL test script

    Testing

    • Fix Shellcheck warnings in shell scripts
    • Fix Remark-lint warnings
    • Fix Prospector and Pylint warnings in test extauth.py
    • Stop testing ejabberd with Erlang/OTP 19.3, as Github Actions no longer supports ubuntu-18.04
    • Test only with oldest OTP supported (20.0), newest stable (25.3) and bleeding edge (26.0-rc2)
    • Upload Common Test logs as artifact in case of failure

    ecs container image

    • Update Alpine to 3.17 to get Erlang/OTP 25 and Elixir 1.14
    • Add tini as runtime init
    • Set ERLANG_NODE fixed to ejabberd@localhost
    • Upload images as artifacts to Github Actions
    • Publish tag images automatically to ghcr.io

    ejabberd container image

    • Update Alpine to 3.17 to get Erlang/OTP 25 and Elixir 1.14
    • Add METHOD to build container using packages ( #3983 )
    • Add tini as runtime init
    • Detect runtime dependencies automatically
    • Remove unused Mix stuff: ejabberd script and static COOKIE
    • Copy captcha scripts to /opt/ejabberd-*/lib like the installers
    • Expose only HOME volume, it contains all the required subdirs
    • ejabberdctl: Don’t use .../releases/COOKIE , it’s no longer included

    Installers

    • make-binaries: Bump versions, e.g. erlang/otp to 25.3
    • make-binaries: Fix building with erlang/otp 25.x
    • make-packages: Fix for installers workflow, which didn’t find lynx

    Full Changelog

    https://github.com/processone/ejabberd/compare/23.01...23.04

    ejabberd 23.04 download & feedback

    As usual, the release is tagged in the git source repository on GitHub .

    The source package and installers are available on the ejabberd Downloads page. To verify the *.asc signature files, see How to verify the integrity of ProcessOne downloads .

    For convenience, there are alternative download locations such as the ejabberd DEB/RPM Packages Repository and the GitHub Release / Tags .

    The ecs container image is available in docker.io/ejabberd/ecs and ghcr.io/processone/ecs . The alternative ejabberd container image is available in ghcr.io/processone/ejabberd .

    If you think you’ve found a bug, please search or file a bug report at GitHub Issues .

    The post ejabberd 23.04 first appeared on ProcessOne .
    • chevron_right

      Sam Whited: Concord and Spring Road Linear Parks

      news.movim.eu / PlanetJabber • 15 April, 2023 • 4 minutes

    In my earlier review of Rose Garden and Jonquil public parks I mentioned the Mountain-to-River Trail ( M2R ), a mixed-use bicycle and walking trail that connects the two parks.

    The two parks I’m going to review today are also connected by the M2R trail in addition to the Concord Road Trail , but unlike the previous parks these are linear parks that are integrated directly into the trails!

    Since the linear parks aren’t very large and don’t have much in the way of amenities to talk about, we’ll veer outside of our Smyrna focus and discuss a few other highlights of the Concord Road Trail and the southern portion of the M2R trail, starting with the Chattahoochee River.

    Paces Mill

    • Amenities: 🏞️ 👟 🥾 🛶 💩 🚲
    • Transportation: 🚍 🚴 🚣

    The southern terminus of the M2R trail is at the Chattahoochee River National Recreation Area ’s Paces Mill Unit. In addition to the paved walking and biking trails, the park has several miles of unpaved hiking trail, fishing, and of course the river itself. Dogs are allowed and bags are available near the entrance. If you head north on the paved Rottenwood Creek Trail you’ll eventually connect to the Palisades West Trails, the Bob Callan Trail, and the Akers Mill East Trail, giving you access to one of the largest connected mixed-use trail systems in the Atlanta area!

    If, instead, you head out of the park to the south on the M2R trail you’ll quickly turn back north into the urban sprawl of the Atlanta suburbs. In approximately 2km you’ll reach the Cumberland Transfer Center where you can catch a bus to most anywhere in Cobb, or transfer to MARTA in Atlanta-proper. At this point the trail also forks for a more direct route to the Silver Comet Trail using the Silver Comet Cumberland Connector trail. We may take that trail another day, but for now we’ll continue north on the M2R trail. Just a bit further north there are also short connector trails to Cobb Galleria Center (an exhibit hall and convention center) and The Battery, a mixed-use development surrounding the Atlanta Braves baseball stadium.

    It’s at this point that the trail turns west along Spring Road where it coincides with the Spring Road Trail that connects to the previously-reviewed Jonquil Park (a total ride of ~3.7km). Shortly thereafter we reach our first actual un-reviewed Smyrna park: the Spring Road Linear Park.

    a map showing bike directions between the CRNRA at Paces Mill and Spring Road Linear Park

    Spring Road Linear Park

    • Amenities: 👟 💩
    • Transportation: 🚍 🚴

    The Spring Road Linear Park stretches 1.1km along the M2R Trail and is easily accessed by both bike (of course) and bus via CobbLinc Route 25 .

    The park does not have a sign or other markers, but does have several nice pull-offs with benches that make a good stop-over point on your way home to or from the buses at the Cumberland Transfer Center. If you’re out walking the dog, public trash cans and dog-poo bags are available on the east end of the park, but do keep in mind that the main trail is mixed-use, so dogs should be kept to one side of the trail to avoid incidents with bikes.

    a trail stretches out ahead with a side trail providing access from a neighborhood a small parklet to the side of the trail contains benches and dog poo bags

    After a short climb the trail turns north again and intersects with the Concord Road Trail and the Atlanta Road Trail. We could veer just off the trail near this point to reach Durham Park , the subject of a future review, but instead we’ll continue west, transitioning to the Concord Road Trail to reach our next park: Concord Road Linear Park.

    map of the bike trail between Spring Road Linear Park and Concord Road Linear Park

    Concord Road Linear Park

    • Amenities: 👟 🔧 📚 💩
    • Transportation: 🚍 🚴

    The Concord Road Linear Park sits in the middle of the mid-century Smyrna Heights neighborhood and has something special that’s not often found in poorly designed suburban neighborhoods: (limited) mixed-use zoning! A restaurant and bar (currently seafood) sits at the edge of the park along with a bike repair stand and bike parking.

    It’s worth commending Smyrna for creating this park at all; it may be small, but in addition to the mixed-use zoning it did something that’s also not often seen in the burbs: it removed part of Evelyn Street, disconnecting it from the nearest arterial road! In the war-on-cars this is a small but important victory that creates a quality-of-life improvement for everyone in the neighborhood, whether they bike, walk the dog, or just take a stroll over to the restaurants in the town square without having to be molested by cars.

    a street ends and becomes a walking path into a park

    Formerly part of Evelyn Street, now a path

    Silver Comet Concord Road Trail Head

    • Amenities: 🚻 🍳 👟 📚
    • Transportation: 🚴

    In our next review we’ll turn back and continue up the M2R trail to reach a few other parks, but if we were to continue we’d find that the Concord Road Trail continues for another 4km until it terminates at the Silver Comet Trail’s Concord Road Trail Head . This trail head sits at mile marker 2.6 on the Silver Comet Trail, right by the Concord Covered Bridge Historic District .

    The Silver Comet will likely be covered in future posts, so for now I’ll leave it there. Thanks for bearing with me while we take a detour away from the City of Smyrna’s parks; next time the majority of the post will be about parks within the city, I promise.

    map of the bike trail between Concord Road Linear Park and the Silver Comet Concord Road Trail Head
    • wifi_tethering open_in_new

      This post is public

      blog.samwhited.com /2023/04/concord-and-spring-road-linear-parks/

    • chevron_right

      Erlang Solutions: Optimising for Concurrency: Comparing and Contrasting the BEAM and JVM Virtual Machines

      news.movim.eu / PlanetJabber • 12 April, 2023 • 17 minutes

    In this post we will explore the internals of the BEAM virtual machine (VM) and compare them with the Java Virtual Machine, the JVM.

    The success of any programming language in the Erlang ecosystem can be attributed to three tightly coupled components:

    1. the semantics of the Erlang programming language, which is the foundation on which other languages are implemented,
    2. the OTP libraries and middleware used to build scalable architectures and concurrent, resilient systems, and
    3. the BEAM virtual machine, tightly coupled to the language semantics and OTP.

    Take any of these components on its own and you have a potential winner. Consider the three together and you have an undisputed winner for the development of scalable, resilient, soft real-time systems. Quoting Joe Armstrong:

    “You can copy Erlang’s libraries, but if they don’t run on the BEAM, you can’t emulate the semantics.”

    This idea is reinforced by Robert Virding’s First Rule of Programming, which states that “any sufficiently complicated concurrent program written in another language contains an ad hoc, informally specified, slow, bug-ridden implementation of half of Erlang.”

    In this post we will explore the internals of the BEAM virtual machine, compare some of them with the JVM, and point out the reasons why you should pay special attention to them. For a long time these components have been treated as a black box that we blindly trust without understanding what is behind them. It is time to change that!

    Highlights of the BEAM

    Erlang and the BEAM virtual machine were invented as a tool to solve a specific problem. They were developed by Ericsson to help implement telecom infrastructure handling both fixed and mobile networks. This infrastructure is by nature highly concurrent and has to scale. It has to work in real time and should ideally never fail. We don’t want our Hangouts call with our grandmother to suddenly drop because of an error, or an online game such as Fortnite to be interrupted because it needs an upgrade. The BEAM virtual machine is optimised to solve many of these challenges, thanks to features that work with a predictable concurrent programming model.

    The secret recipe is Erlang processes, which are lightweight, share no memory and are managed by schedulers capable of handling millions of them across multiple processors. The garbage collector runs on a per-process basis and is highly optimised to reduce its impact on other processes. As a result, the garbage collector does not affect the global soft real-time properties of the system. The BEAM is also the only widely used virtual machine at scale with a made-to-measure distribution model, which allows a program to run transparently across multiple machines.

    Highlights of the JVM

    The Java Virtual Machine (JVM) was developed by Sun Microsystems in an attempt to provide a platform where you “write code once” and run it anywhere. They created an object-oriented language similar to C++, but memory-safe, as its runtime error detection checks array bounds and pointer dereferences. The JVM ecosystem became extremely popular in the Internet era, making it the de-facto standard for enterprise server application development. That wide range of applicability was made possible by a virtual machine that caters for many use cases and by an impressive set of libraries suited to enterprise development.

    The JVM was designed with efficiency in mind. Most of its concepts are abstractions of features found in popular operating systems, such as the threading model, which is similar to operating system threads. The JVM is highly customisable, including the garbage collector and class loaders. Some state-of-the-art garbage collector implementations provide tunable features suited to a programming model based on shared memory.

    The JVM allows you to change the code while the program is running, and a JIT compiler lets bytecode be compiled to native machine code with the intent of speeding up parts of the application.

    Concurrency in the Java world is mostly about running applications in parallel threads, ensuring they are fast. Programming with concurrency primitives is a difficult task because of the challenges created by its shared-memory model. To overcome these difficulties, there have been attempts to simplify and unify the concurrent programming models, the most successful being the Akka framework.

    Concurrency and Parallelism

    When we talk about parallel code execution, we mean that parts of the code run at the same time on multiple processors or computers, while concurrent programming refers to handling events that arrive independently. Concurrent execution can be simulated on single-threaded hardware, while parallel execution cannot. Although this distinction may seem pedantic, the difference results in problems that have to be solved with very different approaches. Think of many cooks preparing a dish of pasta carbonara. In the parallel approach, the tasks are split among the cooks available, and each single portion is completed as quickly as it takes those cooks to finish their specific tasks. In a concurrent world, you would get one portion per cook, where each cook does all of the tasks.

    Use parallelism for speed and concurrency for scalability.

    Parallel execution tries to find an optimal decomposition of the problem into parts that are independent of each other. Boil the water, get the pasta, mix the egg, fry the ham, grate the cheese. The shared data (or, in our example, the dish to be served) is handled through locks, mutexes and other techniques that guarantee correct execution. Another way to look at this is that the data (or ingredients) are already present, and we want to use as many parallel CPU resources as possible to finish the job as quickly as we can.

    Concurrent programming, on the other hand, deals with many events arriving in the system at different times and tries to process all of them within a reasonable timeframe. On multi-processor or distributed architectures, part of the execution happens in parallel, but that is not a requirement. Another way to look at it is that the same cook boils the water, gets the pasta, mixes the eggs and so on, following a sequential algorithm that is always the same. What changes across processes (or cookings) is the data (or ingredients) to work on, which exist in multiple instances.

    The JVM is designed for parallelism, the BEAM for concurrency. They are two intrinsically different problems, requiring different solutions.

    The BEAM and Concurrency

    The BEAM provides lightweight processes to give context to the running code. These processes, also called actors, don’t share memory; they communicate through message passing, copying data from one process to another. Message passing is a feature the virtual machine implements through mailboxes owned by individual processes. Sending a message is a non-blocking operation, which means that sending a message from one process to another is almost instantaneous and the execution of the sender is not blocked. The messages sent are immutable data, copied from the stack of the sending process to the mailbox of the receiving one. This is achieved without needing locks or mutexes between the processes; the only locking on the mailbox happens when several processes send a message to the same recipient in parallel.
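
    As a minimal sketch of this model (the module and message names below are ours, not from the post), here is an Erlang snippet that spawns a lightweight process and exchanges a message with it:

    -module(ping_demo).
    -export([run/0]).

    run() ->
        Parent = self(),
        %% Spawn a lightweight process; it shares no memory with the parent.
        Pid = spawn(fun() ->
                        receive
                            {ping, From} ->
                                %% Reply by copying a message into the caller's mailbox.
                                From ! {pong, self()}
                        end
                    end),
        Pid ! {ping, Parent},          %% non-blocking send
        receive
            {pong, Pid} -> pong_received
        after 1000 ->
            timeout
        end.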

    Immutable data and message passing allow the programmer to write processes that work independently of each other and to focus on the functionality instead of on low-level memory handling and task scheduling. This simple design works not only within a single process, but also across multiple threads on a local machine running in the same VM and, using the built-in distribution, across the network with a cluster of VMs and machines. If the messages are immutable between processes, they can be sent to another thread (or machine) without locks, scaling almost linearly on distributed, multi-processor architectures. Processes are addressed in the same way on a local VM as in a cluster of VMs; sending messages works transparently regardless of the location of the receiving process.
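
    A small illustration of that location transparency, typed in the Erlang shell (the node name 'b@host' and the registered name stats_server are hypothetical):

    %% Sending to {RegisteredName, Node} looks exactly like a local send;
    %% the location of the receiving process is transparent to the sender.
    {stats_server, 'b@host'} ! {report, self()},
    receive
        {stats, Data} -> Data
    after 5000 ->
        no_reply
    end.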

    Because processes share no memory, data can be replicated for resilience and distributed for scale. This means you can have two instances of the same process on two separate machines, sharing state updates between them. If one machine fails, the other has a copy of the data and can continue handling requests, making the system fault tolerant. If both machines are operational, both processes can handle requests, providing scalability. The BEAM provides highly optimised primitives for all of this to work smoothly, while OTP (the standard library) provides the higher-level constructs that make programmers’ lives easier.

    Akka does a great job of replicating the higher-level constructs, but it is somewhat limited by the lack of primitives provided by the JVM that would allow it to be highly optimised for concurrency. While the JVM primitives enable a wider range of use cases, they make the development of distributed systems more complicated, as they have no default features for communication and often rely on a shared-memory model. For example, where in a distributed system do you place the shared memory? And what is the cost of accessing it?

    Scheduler

    We mentioned that one of the strongest features of the BEAM is the ability to break a program down into small, lightweight processes. Managing these processes is the task of the scheduler. Unlike the JVM, which maps its threads to OS threads and lets the operating system manage them, the BEAM comes with its own scheduler.

    By default, the scheduler starts one OS thread per core on the machine and optimises the workload between them. Each process consists of code to be executed and a state that changes over time. The scheduler picks the first process in the run queue that is ready to run and assigns it a certain number of reductions to execute, where each reduction is roughly equivalent to one command. Once the process has run out of reductions, is blocked by I/O, is waiting for a message or has completed its execution, the scheduler picks the next process from the run queue and dispatches it. This scheduling technique is called pre-emptive.
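
    A quick way to see this configuration from the Erlang shell, using standard introspection calls (a sketch, not a tuning guide):

    erlang:system_info(schedulers_online).   %% scheduler threads currently running
    erlang:system_info(logical_processors).  %% logical cores detected on the host
    erlang:statistics(run_queue).            %% processes waiting to be scheduled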

    We have mentioned the Akka framework several times; its biggest drawback is the need to annotate the code with scheduling points, as scheduling is not handled at the JVM level. By taking the control out of the programmer’s hands, soft real-time properties are preserved and guaranteed, as there is no risk of accidentally causing process starvation.

    Processes can be spread across all of the available scheduler threads to maximise CPU utilisation. There are many ways to tweak the scheduler, but doing so is rare and only required in borderline cases, as the default options cover most usage patterns.

    There is one sensitive topic that comes up frequently regarding schedulers: how to handle Natively Implemented Functions (NIFs). A NIF is a snippet of code written in C, compiled as a library and run in the same memory space as the BEAM for speed. The problem with NIFs is that they are not pre-emptive and can interfere with the schedulers. In recent BEAM versions, a new feature, dirty schedulers, was added to give better control over NIFs. Dirty schedulers are separate schedulers that run on different threads to minimise the disruption a NIF can cause in a system. The word dirty refers to the nature of the code run by these schedulers.

    Garbage Collector

    Modern programming languages use a garbage collector for memory management, and the languages running on the BEAM are no exception. Trusting the virtual machine to handle resources and manage memory is very handy when you want to write high-level concurrent code, as it simplifies the task. The underlying garbage collector implementation is fairly straightforward and efficient, thanks to the memory model based on immutable state. Data is copied, not mutated, and the fact that processes do not share memory removes process interdependencies which, as a consequence, do not need to be managed.

    Another feature of the BEAM garbage collector is that it runs only when needed, on a per-process basis, without affecting other processes waiting in the run queue. As a result, garbage collection in Erlang does not stop the world. It prevents latency spikes in processing, because the VM is never stopped as a whole; only specific processes are stopped, and never all of them at the same time. In practice it is just part of what a process does and is treated as another reduction. The garbage collector suspends a process for a very short interval, we are talking microseconds. As a result, there are many small bursts, triggered only when the process needs more memory. A single process usually doesn’t allocate large amounts of memory and is often short-lived, which further reduces the impact by immediately freeing all of its memory when it terminates. A feature of the JVM is the ability to swap garbage collectors, so by using a commercial collector it is also possible to achieve continuous, non-stopping collection on the JVM.
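
    As a small sketch of the per-process nature of collection (an Erlang shell snippet; the spawned process is just a placeholder), standard calls let you inspect and even trigger collection for a single process without touching the rest of the system:

    %% Per-process memory and GC counters, then a forced collection that
    %% affects only this one process.
    Pid = spawn(fun() -> receive stop -> ok end end),
    erlang:process_info(Pid, [memory, garbage_collection]),
    erlang:garbage_collect(Pid),
    Pid ! stop.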

    The garbage collector internals are discussed in an excellent post by Lukas Larsson. There are many intricate details, but it is optimised to handle immutable data efficiently, splitting the data between the stack and the heap for each process. The best approach is to do most of the work in short-lived processes.

    A question that often comes up on this topic is how much memory the BEAM uses. Digging a little, the virtual machine allocates big chunks of memory and uses custom allocators to store the data efficiently and minimise the overhead of system calls. This has two visible effects:

    1) The memory used decreases gradually after the space is no longer needed.

    2) Reallocating huge amounts of data could mean doubling the current working memory.

    The first effect can, if really necessary, be mitigated by tweaking the allocator strategies. The second is easy to monitor and plan for if you have visibility of the different types of memory usage. (One such monitoring tool that provides system metrics out of the box is WombatOAM.)
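
    To get a first look at that visibility from the Erlang shell without any extra tooling, the built-in memory report is a reasonable starting point (a sketch; the categories shown are the standard ones returned by the VM):

    erlang:memory().          %% [{total, _}, {processes, _}, {ets, _}, {binary, _}, ...]
    erlang:memory(processes). %% bytes currently used by all processes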

    Hot Code Loading

    Hot code loading is probably the most frequently cited unique feature of the BEAM. Hot code loading means that the application logic can be updated by changing the executable code in the system while retaining the internal process state. This is achieved by replacing the loaded BEAM files and instructing the virtual machine to replace the references to the code in the running processes.
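
    A minimal sketch of the mechanics from the Erlang shell ("worker" is a hypothetical module name; error handling omitted):

    %% Recompile and load a new version of a module while the node keeps
    %% running. Processes executing the old version continue until their
    %% next fully-qualified call into the module.
    {ok, worker} = compile:file("worker.erl"),
    code:purge(worker),                %% drop any lingering old code
    {module, worker} = code:load_file(worker).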

    It is a fundamental feature for guaranteeing no downtime in telecom infrastructure, where redundant hardware was put in place to handle spikes. Nowadays, in the era of containerisation, other techniques are also used to make updates to a production system. Those who have never used or needed it dismiss it as a less important feature, but it should not be underestimated in the development workflow. Developers can iterate faster by replacing part of their code without having to restart the system to test it. Even if the application is not designed to be upgradable in production, this can reduce the time needed to recompile and redeploy the system.

    When not to use the BEAM

    It is very much about the right tool for the job.

    Do you need a system that is extremely fast, but you are not concerned about concurrency? Do you want to handle a few events in parallel, fast? Do you need to crunch numbers for graphics, AI or analytics? Then go down the C++, Python or Java route. Telecom infrastructure does not need fast operations on floats, so speed was never a priority. With dynamic typing, which has to do all of the type checks at runtime, compiler optimisations are not as trivial. So number crunching is best left to the JVM, Go or other languages that compile natively. It is no surprise that floating-point operations in Erjang, the version of Erlang running on the JVM, were 5000% faster than on the BEAM. On the other hand, where we have seen the BEAM really shine is in using its concurrency to orchestrate number crunching, outsourcing the analysis to C, Julia, Python or Rust. You do the map outside the BEAM and the reduce within it.

    The mantra has always been fast enough. It takes a few hundred milliseconds for humans to perceive a stimulus (an event) and process it in their brain, meaning that micro- or nanosecond response times are not necessary for many applications. Nor would you use the BEAM for microcontrollers, as it is too resource hungry. But for embedded systems with a bit more processing power, where multi-core processors are becoming the norm, you need concurrency, and that is where the BEAM shines. Back in the 90s we were implementing telephony switches handling tens of thousands of subscribers, running on embedded boards with 16 MB of memory. How much memory does a Raspberry Pi have these days?

    And finally, hard real time. You probably would not want the BEAM to manage your airbag control system. You need hard guarantees, a real-time operating system and a language with no garbage collection or exceptions. An implementation of an Erlang VM running on bare metal, such as GRiSP, will give you similar guarantees.

    Conclusion

    Use the right tool for the job.

    If you are writing a soft real-time system that has to scale out of the box and never fail, and you want to do so without the hassle of reinventing the wheel, the BEAM is definitely the technology you are looking for. For many, it works like a black box. Not knowing how it works would be like driving a Ferrari and not being able to achieve optimal performance, or not understanding where that strange noise in the engine comes from. That is why you should learn more about the BEAM, understand its internals and be ready to fine-tune it. For those who have used Erlang and Elixir in anger, we have launched a one-day instructor-led course that will demystify and explain a lot of what you have seen, while preparing you to handle massive concurrency at scale. The course is available through our new instructor-led remote training; learn more here . We also recommend the book The BEAM by Erik Stenman and BEAM Wisdoms , a collection of articles by Dmytro Lytovchenko.

    The post Optimising for Concurrency: Comparing and Contrasting the BEAM and JVM Virtual Machines appeared first on Erlang Solutions .

    • wifi_tethering open_in_new

      This post is public

      www.erlang-solutions.com /blog/optimizacion-para-lograr-concurrencia-comparacion-y-contraste-de-las-maquinas-virtuales-beam-y-jvm/